Relevant Feature Selection from High-Dimensional Data Using MST Based Clustering

Published

Mar 11, 2015

Abstract

Feature selection is the process of identifying a subset of the most useful features that produces results comparable to those obtained with the entire original set of features. Features provide the information about the data set. In a high-dimensional data representation, each sample is described by many features. Because the data sets are typically not task-specific, many features are irrelevant or redundant and should be pruned or filtered out for the purpose of classifying target objects. Given a set of features, the feature selection problem is to find a subset of features that “maximizes the learner’s ability to classify patterns”. A graph-theoretic clustering algorithm based on Boruvka’s algorithm is implemented and experimentally evaluated in this paper. The proposed algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature, that is, the one most strongly related to the target classes, is selected from each cluster. All the representative features from the different clusters form the final feature subset. Once the feature subset has been found, the accuracy of a classifier, the time required for classification, and the proportion of features selected can be calculated.
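
The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the correlation measure or the rule used to split the spanning tree, so the sketch assumes absolute Pearson correlation as the similarity measure, edge weight 1 - |correlation|, and a hypothetical threshold parameter `cut` above which tree edges are removed. The function names `pearson`, `boruvka_mst`, and `select_features` are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def boruvka_mst(n, edges):
    """Boruvka's algorithm. edges is a list of (weight, u, v) tuples with u < v.
    Comparing whole tuples breaks weight ties consistently, which avoids cycles."""
    parent = list(range(n))
    mst, components = [], n
    while components > 1:
        cheapest = {}  # component root -> cheapest edge leaving that component
        for e in edges:
            _, u, v = e
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or e < cheapest[r]:
                    cheapest[r] = e
        if not cheapest:  # graph is disconnected; return a spanning forest
            break
        for w, u, v in sorted(set(cheapest.values())):
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
    return mst

def select_features(columns, target, cut=0.5):
    """Step 1: cluster features by cutting heavy MST edges (assumed rule).
    Step 2: keep the feature most correlated with the target in each cluster."""
    n = len(columns)
    edges = [(1 - abs(pearson(columns[i], columns[j])), i, j)
             for i in range(n) for j in range(i + 1, n)]
    parent = list(range(n))
    for w, u, v in boruvka_mst(n, edges):
        if w <= cut:  # keep only edges between strongly correlated features
            parent[find(parent, u)] = find(parent, v)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(parent, i), []).append(i)
    return sorted(max(members, key=lambda i: abs(pearson(columns[i], target)))
                  for members in clusters.values())

# Toy data: features 0 and 1 are nearly redundant, as are features 2 and 3.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 7],
    [3, 1, 2, 3, 1, 2],
    [3, 1, 2, 3, 1, 3],
]
target = [0, 0, 0, 1, 1, 1]
print(select_features(columns, target))  # [0, 3]
```

On this toy data the tree edge joining the two redundant pairs has weight above `cut`, so it is removed and one representative is kept per pair. Both the measure and the threshold are stand-ins; a different similarity measure or splitting rule would slot into `select_features` without changing the Boruvka step.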

Keywords: Boruvka’s Algorithm, Graph-Theoretic Clustering, Filter Method, Wrapper Method, Embedded Approach

Author Biographies

Yaswanth Kumar Alapati, R.V.R. & J.C. College of Engineering, Guntur, A.P

Assistant Professor, Dept. of Information Technology

K. Sindhu, R.V.R. & J.C. College of Engineering, Guntur, A.P

Assistant Professor, Department of CSE

S. Suneel, PRRM College of Engineering, Shahabad, Telangana

Assistant Professor, Department of CSE

How to Cite

Alapati, Y. K., Sindhu, K., & Suneel, S. (2015). Relevant Feature Selection from High-Dimensional Data Using MST Based Clustering. International Journal of Emerging Trends in Science and Technology, 2(03). Retrieved from https://igmpublication.org/ijetst.in/index.php/ijetst/article/view/544

References

1. H. Liu and R. Setiono, “A Probabilistic Approach to Feature Selection: A Filter Solution,” Proc. 13th Int’l Conf. Machine Learning, pp. 319-327, 1996.
2. K. Kira and L.A. Rendell, “The Feature Selection Problem: Traditional Methods and a New Algorithm,” Proc. 10th Nat’l Conf. Artificial Intelligence, pp. 129-134, 1992.
3. M.A. Hall, “Correlation-Based Feature Subset Selection for Machine Learning,” PhD dissertation, Univ. of Waikato, 1999.
4. U.M. Fayyad and K.B. Irani, “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning,” Proc. 13th Int’l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.
5. J. Lei, “Three Minimum Spanning Tree Algorithms,” University of California, Berkeley, May 2010.
6. I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
7. M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.
The data sets can be downloaded at: http://archive.ics.uci.edu/ml/, http://tunedit.org/repo/Data/Text-wc, http://featureselection.asu.edu/datasets.php, http://www.lsi.us.es/aguilar/datasets
