
Abstract

Feature selection is used to reduce the number of features in applications where hundreds or thousands of features are present in the data. Many feature selection methods have been proposed, most of which focus on finding relevant features. High-dimensional data has become very common with the growth of such applications, so there is a need to mine high-dimensional data effectively and efficiently. Clustering is a widely used data mining model that partitions data into a set of groups, each of which is called a cluster. The main goal of feature subset selection is to reduce the dimensionality of the data and to select a subset of useful features from these clusters. Feature selection has been shown to be very effective for efficient data mining on high-dimensional data. Popular social media data increasingly presents new challenges to feature selection. Social media data consists of posts, comments, images, tweets, and linked data that describes the relationships between social media users and the content they post. The nature of social media compounds the already challenging problem of feature selection because social media data is massive, noisy, and incomplete. Several algorithms have been applied to evaluate the efficiency and effectiveness of the selected features. Here we use a combination of the FAST and Linked Unsupervised Feature Selection (LUFS) algorithms for linked high-dimensional data.
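To make the cluster-then-select idea concrete, the sketch below shows a minimal, generic version of clustering-based feature subset selection: features are grouped by hierarchical clustering on a correlation-based distance, and one representative feature is kept from each cluster. The function name, the 1 - |correlation| distance, and the cluster count are illustrative assumptions only; this is not the exact FAST or LUFS procedure from the cited papers (FAST builds a minimum spanning tree over symmetric-uncertainty distances, and LUFS additionally exploits link information among users).

```python
# Minimal sketch of clustering-based feature subset selection.
# Assumption: features are grouped by hierarchical clustering on
# 1 - |correlation|, and the most "central" feature of each cluster
# is kept. Illustrative only; not the FAST or LUFS algorithm itself.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def select_representative_features(X, n_clusters=10):
    """Return indices of one representative feature per feature cluster.

    X: (n_samples, n_features) data matrix.
    """
    corr = np.corrcoef(X, rowvar=False)        # feature-feature correlation matrix
    dist = 1.0 - np.abs(corr)                  # strongly correlated features are "close"
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False) # condensed distance vector for linkage
    Z = linkage(condensed, method="average")   # hierarchical clustering of features
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # keep the member most correlated with the rest of its cluster,
        # i.e. the most representative feature of that cluster
        avg_corr = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
        selected.append(members[np.argmax(avg_corr)])
    return np.array(sorted(selected))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))             # toy high-dimensional data
    keep = select_representative_features(X, n_clusters=10)
    print("selected feature indices:", keep)
```

The sketch only conveys the general pattern of reducing dimensionality by clustering features and retaining one representative per cluster; the linked, noisy, and incomplete nature of social media data is what LUFS is designed to handle on top of this.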


How to Cite
Dharmale, N. V., & Shelke, S. N. (2015). Feature Selection using Clustering Algorithms: FAST and LUFS. International Journal of Emerging Trends in Science and Technology, 2(07). Retrieved from https://igmpublication.org/ijetst.in/index.php/ijetst/article/view/778

References

1. Qinbao Song, Jingjie Ni, and Guangtao Wang, “A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, 2013.
2. J. Tang and H. Liu, “Feature Selection with Linked Data in Social Media,” Proc. SIAM Int’l Conf. Data Mining, 2012.
3. L. Yu and H. Liu, “Efficiently Handling Feature Redundancy in High-Dimensional Data,” Proc. Ninth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’03), pp. 685-690, 2003.
4. R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,” Proc. IEEE Fifth Int’l Conf. Data Mining, pp. 581-584, 2005.
5. R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1/2, pp. 273-324, 1997.
6. J. Souza, “Feature Selection with a General Hybrid Algorithm,” PhD dissertation, Univ. of Ottawa, 2004.
7. F. Fleuret, “Fast Binary Feature Selection with Conditional Mutual Information,” J. Machine Learning Research, vol. 5, pp. 1531-1555, 2004.
8. A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, “A Feature Set Measure Based on Relief,” Proc. Fifth Int’l Conf. Recent Advances in Soft Computing, pp. 104-109, 2004.
9. I. Kononenko, “Estimating Attributes: Analysis and Extensions of RELIEF,” Proc. European Conf. Machine Learning, pp. 171-182, 1994.
10. L.C. Molina, L. Belanche, and A. Nebot, “Feature Selection Algorithms: A Survey and Experimental Evaluation,” Proc. IEEE Int’l Conf. Data Mining, pp. 306-313, 2002.
11. M. Dash, H. Liu, and H. Motoda, “Consistency Based Feature Selection,” Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109, 2000.
12. K. Kira and L.A. Rendell, “The Feature Selection Problem: Traditional Methods and a New Algorithm,” Proc. 10th Nat’l Conf. Artificial Intelligence, pp. 129-134, 1992.
13. M.A. Hall, “Correlation-Based Feature Subset Selection for Machine Learning,” PhD dissertation, Univ. of Waikato, 1999.
14. L. Yu and H. Liu, “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution,” Proc. 20th Int’l Conf. Machine Learning, vol. 20, no. 2, pp. 856-863, 2003.
15. G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. 11th Int’l Conf. Machine Learning, pp. 121-129, 1994.