Linked Social Media Data Based Semi-Supervised Feature Selection Method
WANG Yi-Bing, PAN Zhi-Song, WU Jun-Qing, JIA Bo, HU Gu-Yu
College of Command Information System, PLA University of Science and Technology, Nanjing 210007
Abstract Social media networks produce massive amounts of high-dimensional, unlabeled data, which poses serious challenges for data processing. Meanwhile, existing pattern recognition algorithms cannot make effective use of the link graph information between data samples. A semi-supervised feature selection method based on linked relations (SSLFS) is proposed, which mines the link graph of the social media network and combines it with a small amount of supervised information. Through spectral analysis and a sparsity constraint, SSLFS selects feature subsets that preserve local manifold structure and sparsity. Experimental results on the Flickr dataset show that the feature subsets obtained by SSLFS are more effective for classification than those obtained by other methods.
Received: 13 May 2013