Abstract: The Tree Augmented Naive Bayesian classifier (TAN) often outperforms the Naive Bayesian classifier while maintaining the computational simplicity and robustness that characterize Naive Bayes. However, TAN usually requires continuous variables to be discretized beforehand. Handling mixed-mode data directly is important for representing data distributions well and avoiding the information loss caused by discretization. In this paper, the maximum likelihood function for hybrid data is derived, and a new classifier, the Extended Tree Augmented Naive Bayesian classifier (ETAN), is proposed. ETAN removes the restriction that continuous variables must be discretized and handles hybrid variables within the TAN framework. Experiments show that the classifier achieves good classification accuracy.
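To illustrate the mixed-mode idea described in the abstract, the sketch below shows how discrete and continuous attributes can be modeled jointly without prior discretization: discrete attributes use smoothed frequency estimates and continuous attributes use class-conditional Gaussian densities. This is only a minimal hybrid naive Bayes sketch under assumed Gaussian conditionals, not the paper's ETAN, which additionally learns tree-structured augmenting dependencies between attributes as in TAN; the class name HybridNaiveBayes and the attribute-type encoding are illustrative choices.

```python
import math
from collections import defaultdict

# Minimal sketch: a hybrid naive Bayes over mixed discrete/continuous attributes.
# Discrete attributes use Laplace-smoothed frequency estimates; continuous
# attributes use class-conditional Gaussian densities, so no discretization is needed.
# (Illustrative only; ETAN additionally learns augmenting edges between attributes.)

class HybridNaiveBayes:
    def __init__(self, attr_types):
        # attr_types: list of "d" (discrete) or "c" (continuous), one entry per attribute
        self.attr_types = attr_types

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = {c: y.count(c) / len(y) for c in self.classes}
        self.disc = defaultdict(lambda: defaultdict(int))  # (class, attr) -> value counts
        self.cont = {}                                      # (class, attr) -> (mean, variance)
        for j, t in enumerate(self.attr_types):
            for c in self.classes:
                vals = [x[j] for x, lab in zip(X, y) if lab == c]
                if t == "d":
                    for v in vals:
                        self.disc[(c, j)][v] += 1
                else:
                    m = sum(vals) / len(vals)
                    var = sum((v - m) ** 2 for v in vals) / max(len(vals) - 1, 1)
                    self.cont[(c, j)] = (m, max(var, 1e-6))
        return self

    def _log_likelihood(self, x, c):
        ll = math.log(self.priors[c])
        for j, t in enumerate(self.attr_types):
            if t == "d":
                counts = self.disc[(c, j)]
                total = sum(counts.values())
                ll += math.log((counts.get(x[j], 0) + 1) / (total + len(counts) + 1))
            else:
                m, var = self.cont[(c, j)]
                ll += -0.5 * math.log(2 * math.pi * var) - (x[j] - m) ** 2 / (2 * var)
        return ll

    def predict(self, x):
        return max(self.classes, key=lambda c: self._log_likelihood(x, c))


# Tiny usage example with one discrete and one continuous attribute.
X = [("red", 1.2), ("red", 0.9), ("blue", 3.1), ("blue", 2.8)]
y = ["A", "A", "B", "B"]
clf = HybridNaiveBayes(attr_types=["d", "c"]).fit(X, y)
print(clf.predict(("blue", 3.0)))  # expected: "B"
```

Extending this sketch toward a TAN-style structure would amount to conditioning each attribute on the class and one parent attribute chosen by a maximum-weight spanning tree over conditional mutual information, which is where the hybrid maximum likelihood derived in the paper comes into play.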