
Abstract

Feature selection is the process of identifying a subset of the most useful features that produces results comparable to those of the original, complete feature set. Features carry the information about the data set. In high-dimensional data, each sample is described by many features; because the data sets are typically not task-specific, many features are irrelevant or redundant and should be pruned out or filtered for the purpose of classifying target objects. Given a set of features, the feature selection problem is to find a subset of features that "maximizes the learner's ability to classify patterns". This paper implements and experimentally evaluates a graph-theoretic clustering algorithm based on Boruvka's minimum spanning tree (MST) algorithm. The proposed algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e. the one most strongly related to the target classes, is selected from each cluster; the representative features from all clusters together form the final feature subset. Once the feature subset is found, the accuracy of a classifier, the time required for classification, and the proportion of features selected can be calculated.
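
The abstract describes the two-step pipeline only at a high level. The following is a minimal, hypothetical Python sketch of that idea, assuming absolute Pearson correlation as the relevance/redundancy measure and a simple edge-length threshold (`cut_threshold`) for splitting the MST into clusters; the paper's actual measure and cut rule may differ (e.g. symmetric uncertainty), and the function names are illustrative only.

```python
import numpy as np

def boruvka_mst(weights):
    """MST edges (i, j, w) of a complete graph with symmetric distance
    matrix `weights`, computed with Boruvka's algorithm and union-find."""
    n = len(weights)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    while len(mst) < n - 1:
        cheapest = {}  # component root -> cheapest outgoing edge
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                w = weights[i][j]
                for r in (ri, rj):
                    if r not in cheapest or w < cheapest[r][2]:
                        cheapest[r] = (i, j, w)
        for i, j, w in cheapest.values():
            ri, rj = find(i), find(j)
            if ri != rj:  # earlier merges this round may already connect them
                parent[ri] = rj
                mst.append((i, j, w))
    return mst

def select_features(X, y, cut_threshold=0.7):
    """Step 1: cluster features by cutting long edges of the MST built over
    a feature-distance graph. Step 2: keep, per cluster, the single feature
    most correlated with the target classes."""
    n_features = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson| between features
    dist = 1.0 - corr                            # redundant pairs -> short edges
    mst_edges = boruvka_mst(dist)

    # Removing MST edges longer than the threshold splits the tree into
    # connected components; each component is one feature cluster.
    parent = list(range(n_features))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, w in mst_edges:
        if w <= cut_threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for f in range(n_features):
        clusters.setdefault(find(f), []).append(f)

    # Representative feature = highest |correlation| with the class labels.
    relevance = [abs(np.corrcoef(X[:, f], y)[0, 1]) for f in range(n_features)]
    return sorted(max(c, key=lambda f: relevance[f]) for c in clusters.values())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)  # feature 1 duplicates 0
    y = (X[:, 0] + X[:, 5] > 0).astype(float)
    print(select_features(X, y))  # the redundant pair collapses to one feature
```

In this sketch, redundancy is handled by the clustering step (highly correlated features fall into the same MST component) and relevance by the representative-selection step, mirroring the division of labor the abstract describes.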

Keywords: Boruvka's Algorithm, Graph-Theoretic Clustering, Filter Method, Wrapper Method, Embedded Approach


Author Biographies

Yaswanth Kumar Alapati, R.V.R. & J.C. College of Engineering, Guntur, A.P

Assistant Professor, Dept. of Information Technology

K. Sindhu, R.V.R. & J.C. College of Engineering, Guntur, A.P

Assistant Professor, Department of CSE

S. Suneel, PRRM College of Engineering, Shahabad, Telangana

Assistant Professor, Department of CSE

How to Cite
Alapati, Y. K., Sindhu, K., & Suneel, S. (2015). Relevant Feature Selection from High-Dimensional Data Using MST Based Clustering. International Journal of Emerging Trends in Science and Technology, 2(03). Retrieved from http://igmpublication.org/ijetst.in/index.php/ijetst/article/view/544

Data Sets

The data sets can be downloaded at: http://archive.ics.uci.edu/ml/, http://tunedit.org/repo/Data/Text-wc, http://featureselection.asu.edu/datasets.php, and http://www.lsi.us.es/aguilar/datasets