
Abstract

Feature selection is used to reduce the number of features in applications where hundreds or thousands of features are present in the data. Many feature selection methods have been proposed, most of which focus on finding relevant features. High-dimensional data has become very common with the growth of such applications, so there is a need to mine high-dimensional data effectively and efficiently. Clustering is a widely used data mining model that partitions data into a set of groups, each of which is called a cluster. The main goal of feature subset selection is to reduce the dimensionality of the data and to select a subset of useful features from these clusters. Feature selection has been shown to be very effective for efficient data mining on high-dimensional data. Popular social media data increasingly presents new challenges to feature selection. Social media data consists of posts, comments, images, tweets, and linked data that describes the relationships between social media users and the content they post. The nature of social media compounds the already challenging problem of feature selection because social media data is massive, noisy, and incomplete. Several algorithms have been applied to evaluate the efficiency and effectiveness of the selected features. Here we use a combination of the FAST and Linked Unsupervised Feature Selection (LUFS) algorithms for linked high-dimensional data.
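To make the cluster-then-select idea concrete, the sketch below shows a minimal, generic version of clustering-based feature subset selection: features are grouped by hierarchical clustering on a correlation-based distance, and one representative feature is kept from each cluster. The function name, the 1 - |correlation| distance, and the cluster count are illustrative assumptions only; this is not the exact FAST or LUFS procedure from the cited papers (FAST builds a minimum spanning tree over symmetric-uncertainty distances, and LUFS additionally exploits link information among users).

```python
# Minimal sketch of clustering-based feature subset selection.
# Assumption: features are grouped by hierarchical clustering on
# 1 - |correlation|, and the most "central" feature of each cluster
# is kept. Illustrative only; not the FAST or LUFS algorithm itself.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def select_representative_features(X, n_clusters=10):
    """Return indices of one representative feature per feature cluster.

    X: (n_samples, n_features) data matrix.
    """
    corr = np.corrcoef(X, rowvar=False)        # feature-feature correlation matrix
    dist = 1.0 - np.abs(corr)                  # strongly correlated features are "close"
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False) # condensed distance vector for linkage
    Z = linkage(condensed, method="average")   # hierarchical clustering of features
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # keep the member most correlated with the rest of its cluster,
        # i.e. the most representative feature of that cluster
        avg_corr = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
        selected.append(members[np.argmax(avg_corr)])
    return np.array(sorted(selected))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))             # toy high-dimensional data
    keep = select_representative_features(X, n_clusters=10)
    print("selected feature indices:", keep)
```

The sketch only conveys the general pattern of reducing dimensionality by clustering features and retaining one representative per cluster; the linked, noisy, and incomplete nature of social media data is what LUFS is designed to handle on top of this.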


How to Cite
Dharmale, N. V., & Shelke, S. N. (2015). Feature Selection using Clustering Algorithms: FAST and LUFS. International Journal of Emerging Trends in Science and Technology, 2(07). Retrieved from https://igmpublication.org/ijetst.in/index.php/ijetst/article/view/778

References

1. Qinbao Song, Jingjie Ni, and Guangtao Wang, “A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, 2013.
2. J. Tang and H. Liu, “Feature Selection with Linked Data in Social Media,” Proc. SIAM Int’l Conf. Data Mining, 2012.
3. L. Yu and H. Liu, “Efficiently Handling Feature Redundancy in High-Dimensional Data,” Proc. Ninth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’03), pp. 685-690, 2003.
4. R. Butterworth, G. Piatetsky-Shapiro, and D.A. Simovici, “On Feature Selection through Clustering,” Proc. IEEE Fifth Int’l Conf. Data Mining, pp. 581-584, 2005.
5. R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1/2, pp. 273-324, 1997.
6. J. Souza, “Feature Selection with a General Hybrid Algorithm,” PhD dissertation, Univ. of Ottawa, 2004.
7. F. Fleuret, “Fast Binary Feature Selection with Conditional Mutual Information,” J. Machine Learning Research, vol. 5, pp. 1531-1555, 2004.
8. A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, “A Feature Set Measure Based on Relief,” Proc. Fifth Int’l Conf. Recent Advances in Soft Computing, pp. 104-109, 2004.
9. I. Kononenko, “Estimating Attributes: Analysis and Extensions of RELIEF,” Proc. European Conf. Machine Learning, pp. 171-182, 1994.
10. L.C. Molina, L. Belanche, and A. Nebot, “Feature Selection Algorithms: A Survey and Experimental Evaluation,” Proc. IEEE Int’l Conf. Data Mining, pp. 306-313, 2002.
11. M. Dash, H. Liu, and H. Motoda, “Consistency Based Feature Selection,” Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109, 2000.
12. K. Kira and L.A. Rendell, “The Feature Selection Problem: Traditional Methods and a New Algorithm,” Proc. 10th Nat’l Conf. Artificial Intelligence, pp. 129-134, 1992.
13. M.A. Hall, “Correlation-Based Feature Subset Selection for Machine Learning,” PhD dissertation, Univ. of Waikato, 1999.
14. L. Yu and H. Liu, “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution,” Proc. 20th Int’l Conf. Machine Learning, vol. 20, no. 2, pp. 856-863, 2003.
15. G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. 11th Int’l Conf. Machine Learning, pp. 121-129, 1994.