Learning Bayesian Network Structure with Hidden Variables
WANG ShuangCheng1,2, LIU XiHua3, TANG HaiYan2
1. Department of Information Science, Shanghai Lixin University of Commerce, Shanghai 201620
2. Risk Management Research Institute, Shanghai Lixin University of Commerce, Shanghai 201620
3. Economic Institute, Qingdao University, Qingdao 266071
Abstract: At present, methods for learning Bayesian network structure with hidden variables are mainly based on scoring-and-search combined with the EM algorithm, which is inefficient and unreliable. A new method for learning Bayesian network structure with hidden variables is presented. In this method, the Bayesian network structure without hidden variables is first built from the basic dependency relationships between variables, the basic structures between nodes, and the idea of dependency analysis. Hidden variables are then located according to the dimension of cliques in the moral graph of the Bayesian network. The values, dimension, and local structure of each hidden variable are determined using the dependency structure between variables, Gibbs sampling, and the MDL criterion. The method avoids the exponential complexity of standard Gibbs sampling and the main problems of existing algorithms for learning Bayesian network structure with hidden variables. Experimental results show that the algorithm can effectively learn Bayesian network structure with hidden variables.
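The clique-based step described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal, self-contained example assuming a toy DAG (the `parents` dictionary is hypothetical), showing how one might moralize a Bayesian network and flag large cliques in the moral graph as candidate locations for hidden variables.

```python
from itertools import combinations

# Hypothetical toy DAG for illustration: each node maps to its set of parents.
parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"C", "D"},
}

def moralize(parents):
    """Build the moral graph: keep parent-child edges (undirected)
    and 'marry' every pair of co-parents of the same node."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:  # undirected parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for u, v in combinations(ps, 2):  # marry co-parents
            adj[u].add(v)
            adj[v].add(u)
    return adj

def maximal_cliques(adj):
    """Brute-force maximal cliques; fine for tiny illustrative graphs,
    not for real networks (clique enumeration is exponential in general)."""
    nodes = list(adj)
    cliques = []
    for r in range(len(nodes), 0, -1):  # largest candidate sets first
        for cand in combinations(nodes, r):
            if all(v in adj[u] for u, v in combinations(cand, 2)):
                cset = set(cand)
                if not any(cset < c for c in cliques):  # keep only maximal ones
                    cliques.append(cset)
    return cliques

adj = moralize(parents)
# Cliques at or above some dimension threshold (3 here, an assumed value)
# are the candidate sites where a hidden variable might be introduced.
candidates = [c for c in maximal_cliques(adj) if len(c) >= 3]
```

On this toy DAG the moral graph gains the marriage edge A-B (co-parents of C), producing the maximal cliques {A, B, C} and {C, D, E}; both exceed the example threshold and would be examined as hidden-variable candidates. The threshold itself, and the subsequent determination of each hidden variable's values via Gibbs sampling and MDL, are beyond this sketch.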