1.Shanghai Key Laboratory of Intelligent Information Processing,School of Computer Science,Fudan University ,Shanghai 200433 2.Department of Computer Science Technology,Tongji University,Shanghai 201804
Abstract:The rapid increase of network applications generates a lot of networked data. Classification in networked data is recently an important research issue of data mining field. The state of the art techniques of classification in networked data is surveyed. Firstly, the basic concepts of classification in networked data are introduced. Then, major classification algorithms of networked data are reviewed in detail. Next, one challenging issue, classification in sparsely labeled networks, is reviewed extensively, and various solutions to this issue are discussed. Finally, the future development and expectation of networked data classification techniques are summarized.
[1] Newman M E J.Scientific Collaboration Networks I: Network Construction and Fundamental Results [EB/OL].[2010-06-16].http://pre.aps.org/abstract/PRE/v64/il/eo16131 [2] Cortes C,Pregibon D,Volinsky T C.Communities of Interest.Intelligent Data Analysis,2002,6(3): 211-219 [3] Sen P,Namata G M,Bilgic M,et al.Collective Classification in Network Data.AI Magazine,2008,29(3): 93-106 [4] Zhang N L,Poole D.A Simple Approach to Bayesian Network Computations // Proc of the 10th Canadian Conference on Artificial Intelligence.Vancouver,Canada,1994: 171-178 [5] Huang C.Inference in Belief Networks: A Procedural Guide.International Journal of Approximate Reasoning,1996,15(3): 225-263 [6] Dechter R.Bucket Elimination: A Unifying Framework for Probabilistic Inference // Proc of the 12th Annual Conference on Uncertainty in Artificial Intelligence.Portland,USA,1996: 211-219 [7] Brush S G.History of the Lenz-Ising Model.Reviews of Modern Physics,1967,39(4): 883-893 [8] Zucker S W,Hummel R A,Rosenfeld A.An Application of Relaxation Labeling to Line and Curve Enhancement.IEEE Trans on Computers,1997,26(4): 394-403 [9] Li Z S,Han Wang,Petrou M.Relaxation Labeling of Markov Random Fields // Proc of the 12th International Conference on Pattern Recognition.Jerusalem,Israel,1994,I: 488-492 [10] Chakrabarti S,Dom B,Indyk P.Enhanced Hypertext Categorization Using Hyperlinks // Proc of the ACM SIGMOD International Conference on Management of Data.Seattle,USA,1998: 307-318 [11] Carvalho R V,Cohen W W.On the Collective Classification of Email Speech Acts // Proc of the 28th ACM SIGIR Annual International Conference on Research and Development in Information Retrieval.Salvador,Brazil,2005: 345-352 [12] Taskar B,Pieter A,Koller D.Discriminative Probabilistic Models for Relational Data // Proc of the 18th Annual Conference on Uncertainty in Artificial Intelligence.Edmonton,Canada,2002: 485-492 [13] Hou Cuiqin,Jiao Licheng.Graph Based Co-Training Algorithm for Web Page Classification.Chinese Journal of Electronics,2009,37(10): 2173-2180 (in Chinese) (侯翠琴,焦李成.基于图的Co-training网页分类.电子学报,2009,37(10): 2173-2180) [14] Zhou Zhihua,Zhang Minling.Multi-Instance Multi-Label Learning with Application to Scene Classification // Proc of the 20th Annual Conference on Neural Information Processing Systems.Vancouver,Canada,2006: 1609-1616 [15] Zhang Minling,Zhou Zhihua.M3MIML: A Maximum Margin Method for Multi-Instance Multi-Label Learning // Proc of the 8th IEEE International Conference on Data Mining.Pisa,Italy,2008: 688-697 [16] Zhou Zhihua,Jiang Kai,Li Ming.Multi-Instance Learning Based Web Mining.Applied Intelligence,2005,22(2): 135-147 [17] Jin Rong,Wang Shijun,Zhou Zhihua.Learning a Distance Metric from Multi-Instance Multi-Label Data // Proc of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Miami,USA,2009: 896-902 [18] Li Yingxin,Ji Shuiwang,Kumar S,et al.Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning // Proc of the 21st International Joint Conference on Artificial Intelligence.Pasadena,USA,2009: 1445-1450 [19] Taskar B,Wong M F,Abbeel P,et al.Link Prediction in Relational Data // Proc of the 17th Annual Conference on Neural Information Processing Systems.Vancouver,Canada,2003: 659-666 [20] Tumulty K.Bushs Secret Spy Net Report [EB/OL].[2006-5-25].http://www.time.com/time/archive/preview /010987119402100.html [21] Macskassy S A,Provost F.Suspicion Scoring Based on Guilt-by-Aassociation,Collective Inference and Focused Data Access // Proc of the 1st International Conference on Intelligence Analysis.McLean,USA,2005:352-361 [22] Neville J,Simsek O,Jensen D,et al.Using Relational Knowledge Discovery to Prevent Securities Fraud // Proc of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Chicago,USA,2005: 449- 458 [23] Hill S,Provost F,Volinsky C.Network-Based Marketing: Identifying Likely Adopters via Consumer Networks.Statistical Science,2006,22(2): 256-276 [24] Segal E,Wang H,Koller D.Discovering Molecular Pathways from Protein Interaction and Gene Expression Data.Bioinformatics,2003,19(Z1): 264-272 [25] Lafferty J D,McCallum A,Pereira F C N.Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data // Proc of the 18th International Conference on Machine Learning.Williamstown,USA,2001: 282-289 [26] Chen Lei,Wainwright M,Cetin M,et al.Multitarget-Multisensor Data Association Using the Tree-Reweighted Max-Product Algorithm // Proc of the SPIE,2003,5096: 127-138 [27] Lu Qing,Getoor L.Link-Based Classification // Proc of the 20th International Conference on Machine Learning.Washington,USA,2003: 496-503 [28] Jensen D,Neville J,Gallagher B.Why Collective Inference Improves Relational Classification // Proc of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Seattle,USA,2004: 593- 598 [29] Neville J,Jensen D.Iterative Classification in Relational Data // Proc of the 15th AAAI Workshop on Statistical Relational Learning.Austin,USA,2000: 42-49 [30] McDowell L K,Gupta K M,Aha D W.Cautious Inference in Collective Classification // Proc of the 22nd AAAI Conference on Artificial Intelligence.Vancouver,Canada,2007: 596-601 [31] Friedman N,Getoor L,Koller D,et al.Learning Probabilistic Relational Models // Proc of the 16th International Joint Conference on Artificial Intelligence.Stockholm,Sweden,1999,Ⅱ: 1300-1309 [32] Getoor L.Advanced Methods for Knowledge Discovery from Complex Data.London,UK: Springer-Verlag,2005 [33] Macskassy S,Provost F.Classification in Networked Data: A Toolkit and a Univariate Case Study.Journal of Machine Learning Research,2007,8(1):935-983 [34] Geman S,Geman D.Stochastic Relaxation,Gibbs Distributions and the Bayesian Restoration of Images.IEEE Trans on Pattern Analysis and Machine Intelligence,1984,6(6): 721-741 [35] Macskassy S A,Provost F J.A Simple Relational Classifier // Proc of the 9th ACM SIGKDD Multi- Relational Data Mining Workshop.Washington,USA,2003: 64-76 [36] Liu Dayou,Yu Peng,Gao Ying,et al.Research Progress in Statistical Relational Learning.Journal of Computer Research and Development,2008,45(12): 2110-2119 (in Chinese) (刘大有,于 鹏,高 滢,等.统计关系学习研究进展.计算机研究与发展,2008,45(12): 2110 -2119 ) [37] Koller D,Pfeffer A.Probabilistic Frame-Based Systems // Proc of the 15th National Conference on Artificial Intelligence.Madison,USA,1998: 580-587 [38] Neville J,Jensen D.Relational Dependency Networks.Journal of Machine Learning Research,2007,8(3): 653- 692 [39] Neville J,Jensen D,Friedland L,et al.Learning Relational Probability Trees // Proc of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Washington,USA,2003: 625-630 [40] Stoer M,Wagner F.A Simple Min-Cut Algorithm.Journal of the ACM,1997,44(4): 585-591 [41] Blau M P.Inequality and Heterogeneity: A Primitive Theory of Social Structure.New York,USA: Free Press,1977 [42] Ahuja R,Orlin J,Magnanti T.Network Flows: Theory,Algorithms,and Applications.Upper Saddle River,USA: Prentice Hall,1992 [43] Kernighan W B,Lin S.An Efficient Heuristic Procedure for Partitioning Graphs.The Bell System Technical Journal,1970,49(1): 291-307 [44] Shi Jianbo,Malik J.Normalized Cuts and Image Segmentation.IEEE Trans on Pattern Analysis and Machine Intelligence,2000,22(8): 888-905 [45] Pearl J.Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.Orlando,USA: Morgan Kaufmann,1988 [46] Macskassy S A.Improving Learning in Networked Data by Combining Explicit and Mined Links // Proc of the 22nd AAAI Conference on Artificial Intelligence.Vancouver,Canada,2007: 590-595 [47] Gallagher B,Tong Hanghang,Eliassi-Rad T,et al.Using Ghost Edges for Classification in Sparsely Labeled Networks // Proc of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Las Vegas,USA,2008: 256-264 [48] Chapelle O,Scholkopf B,Zien A.Semi-Supervised Learning.Cambridge,USA: MIT Press,2006 [49] Zhu Xiaojin,Ghahramani Z,Lafferty J.Semi- Supervised Learning using Gaussian Fields and Harmonic Functions // Proc of the 20th International Conference on Machine Learning.Washington,USA,2003: 912-919 [50] Zhou Dengyong,Bousquet O,Lal T N,et al.Learning with Local and Global Consistency // Proc of the 17th Annual Conference on Neural Information Processing Systems.Vancouver,Canada,2004: 321-328 [51] Long Jun,Yin Jianping,Zhu En,et al.A Survey of Active Learning.Journal of Computer Research and Development,2008,45(Z1): 300-304 (in Chinese) (龙 军,殷建平,祝 恩,等.主动学习研究综述.计算机研究与发展,2008,45(Z1): 300 -304) [52] Cohn D,Atlas L,Ladner R.Improving Generalization with Active Learning.Machine Learning,1994,15(2): 201-221 [53] Lewis D D,Gale W A.A Sequential Algorithm for Training Text Classifiers // Proc of the 17th ACM SIGIR Conference on Research and Development in Information Retrieval.Dublin,Ireland,1994: 3-12 [54] Lewis D,Catlett J.Heterogeneous Uncertainty Sampling for Supervised Learning // Proc of the 11th International Conference on Machine Learning.New Brunswick,USA,1994: 148-156 [55] Seung S H,Opper M,Sompolinsky H.Query by Committee // Proc of the 5th Annual Workshop on Computational Learning Theory.Pittsburgh,USA,1992: 287-294 [56]Roy N,McCallum A.Toward Optimal Active Learning through Sampling Estimation of Error Reduction // Proc of the 18th International Conference on Machine Learning.San Francisco,USA,2001: 441-448 [57] Bilgic M,Getoor L.Link-Based Active Learning // Proc of the 23rd NIPS Workshop on Analyzing Networks and Learning with Graphs.Vancouver,Canada,2009: 136-142 [58] Macskassy S A.Using Graph-Based Metrics with Empirical Risk Minimization to Speed up Active Learning on Networked Data // Proc of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Paris,France,2009: 597-606 [59] McCallum A K,Nigam K,Rennie J,et al.Automating the Construction of Internet Portals with Machine Learning.Information Retrieval,2000,3(2): 127-163 [60] Craven M,Dipasquo D,Freitag D,et al.Learning to Extract Symbolic Knowledge from the World Wide Web // Proc of the 15th National Conference on Artificial Intelligence.Madison,USA,1998: 509-516 [61] Bilgic M,Namata G M,Getoor L.Combining Collective Classification and Link Prediction // Proc of the 7th IEEE International Conference on Data Mining.Omaha,USA,2007: 381-386 [62] Zhang Qianming,Shang Mingsheng,Lu Linyuan.Similarity-Based Classification in Partially Labeled Networks.International Journal of Modern Physics C,2010,21(6): 813-824 [63] Bilgic M,Getoor L.Reflect and Correct: A Misclassification Prediction Approach to Active Inference.ACM Trans on Knowledge Discovery from Data.2009,3(4):1-32