Classification Approach by Mining Betweenness Information beyond Data Points Themselves
GU Suhang1,2,3, WANG Shitong1,2
1. Institute of Digital Media, Jiangnan University, Wuxi 214122
2. Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122
3. Institute of Information Engineering and Technology, Changzhou Vocational Institute of Light Industry, Changzhou 213164
Abstract: Mining useful information beyond the data points themselves to guide classification and improve its accuracy is a subject worthy of study. In this paper, a network is first built to characterize the whole dataset, and side information, namely the betweenness information between every pair of data points, is mined from the network. The efficiency of subnetworks and the influence value of each network node are then computed iteratively using the concept of density. With the mined side information, a classification approach by mining betweenness information beyond data points themselves (CA-MBI) is developed. CA-MBI exhibits low time complexity together with high classification accuracy. Experimental results on synthetic and real-world datasets demonstrate the superior performance of CA-MBI compared with several benchmark classification algorithms.
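The abstract describes building a network over the whole dataset and measuring the efficiency of subnetworks. As a minimal illustrative sketch only (not the authors' exact formulation), the following builds a naive k-nearest-neighbor graph over a point set and computes the Latora-Marchiori global efficiency of the resulting network, i.e. the mean of the inverse shortest-path lengths over all node pairs; the function names and the choice of k are assumptions.

```python
from collections import deque

def build_knn_edges(points, k=2):
    # Naive kNN graph: connect each point to its k nearest neighbors
    # (squared Euclidean distance); O(n^2 log n), fine for a sketch.
    n = len(points)
    edges = set()
    for i in range(n):
        order = sorted(range(n),
                       key=lambda j: sum((a - b) ** 2
                                         for a, b in zip(points[i], points[j])))
        for j in order[1:k + 1]:          # order[0] is i itself (distance 0)
            edges.add(frozenset((i, j)))  # undirected edge
    return edges

def global_efficiency(n, edges):
    # Latora-Marchiori efficiency: average of 1/d(i, j) over all ordered
    # pairs, where d is the hop-count shortest path; unreachable pairs
    # contribute 0, so the measure is well defined on disconnected graphs.
    adj = {i: set() for i in range(n)}
    for e in edges:
        i, j = tuple(e)
        adj[i].add(j)
        adj[j].add(i)
    total = 0.0
    for s in range(n):
        dist = {s: 0}                     # BFS hop counts from source s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1)) if n > 1 else 0.0
```

For three mutually close points with k=2 the graph is complete and the efficiency is exactly 1; as the graph thins toward a chain, the efficiency drops, which is the kind of subnetwork-level signal a network-based classifier can exploit.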