Abstract: Conventional sample-based mutual information estimation methods cannot directly handle mixed features, that is, feature sets containing both numeric and nominal attributes. This paper proposes a general mutual information estimation method based on the Parzen window, the PG method, which handles mixed attributes directly. A feature selection criterion named hybrid mutual information (HMI) is also presented. Combining the PG estimation method with the HMI criterion yields a feature selection algorithm, PGHMI. Experimental results confirm the correctness of PG and the effectiveness of PGHMI.
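To make the idea of estimating mutual information for both attribute types concrete, the following is a minimal sketch, not the PG method itself, whose details are not given in this abstract. It assumes a Gaussian Parzen window for a numeric attribute (with Silverman's rule of thumb as the default bandwidth) and plain frequency counts for a nominal attribute. The function names `class_entropy`, `mi_numeric`, and `mi_nominal` are hypothetical and chosen only for illustration.

```python
import numpy as np


def class_entropy(labels):
    """Shannon entropy H(C) of nominal class labels, in nats."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mi_numeric(x, labels, h=None):
    """Parzen-window estimate of I(X; C) for one numeric attribute X.

    Assumption: Gaussian kernel; if h is None, the bandwidth follows
    Silverman's rule of thumb (x must then not be constant).
    """
    x = np.asarray(x, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    n = x.size
    if h is None:
        h = 1.06 * x.std(ddof=1) * n ** (-1.0 / 5.0)
    # Kernel matrix: k[i, j] = exp(-0.5 * ((x_i - x_j) / h)^2)
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    classes = np.unique(labels)
    onehot = (labels[:, None] == classes[None, :]).astype(float)
    # Posterior p(c | x_i): kernel mass from class c over total kernel mass
    post = k @ onehot
    post /= post.sum(axis=1, keepdims=True)
    safe = np.where(post > 0, post, 1.0)  # avoids log(0); 0 * log(1) = 0
    h_cond = -np.mean(np.sum(post * np.log(safe), axis=1))
    return class_entropy(labels) - float(h_cond)


def mi_nominal(x, labels):
    """Frequency-count estimate of I(X; C) for one nominal attribute X."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, ci = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, ci.max() + 1))
    np.add.at(joint, (xi, ci), 1.0)  # contingency table of counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    pc = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ pc)[nz])))
```

Both functions return an estimate in nats. A well-separated numeric attribute or a nominal attribute identical to the class label gives a value close to H(C), while an attribute that is independent of the class gives a value close to zero.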