Interestingness Rule Mining Algorithm Based on Information Entropy
JIN Zhou1,2, WANG Ru-Jing1
1 Bionic Computing and Intelligent Decision Laboratory, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
2 Department of Automation, University of Science and Technology of China, Hefei 230026
Abstract: With the development of data collection and storage techniques, traditional association rule mining generates an excessive number of disordered rules, which fail to match the interests of users. To solve this problem, an interestingness measure of association rules based on information entropy is proposed to mine interestingness rules. Correlation analysis for categorical variables is adopted to eliminate false and erroneous rules from the primitive rule set, and a framework for evaluating the interestingness degree of rules based on information entropy is proposed. Since the method does not depend on the prior knowledge of users, it represents the information hidden in the data accurately. Simulation results on both real and synthetic datasets show that the proposed algorithm outperforms traditional algorithms and discovers interestingness rules from large databases efficiently.
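The abstract names only the two ingredients of the approach: a correlation test for categorical variables that prunes false rules, and an entropy-based score that ranks the remaining rules by interestingness. The sketch below is one plausible reading of these ingredients, not the paper's actual formulas; the function names chi_square and entropy_interest, the binary-entropy scoring form, and the 3.84 pruning threshold are illustrative assumptions.

```python
import math

def chi_square(n, n_a, n_b, n_ab):
    """Chi-square statistic of the 2x2 contingency table for itemsets A and B.

    n: total transactions; n_a, n_b: support counts of A and B; n_ab: joint count.
    Rules whose antecedent and consequent are uncorrelated (e.g. chi2 < 3.84,
    the 95% critical value with one degree of freedom) can be pruned as false rules.
    """
    chi2 = 0.0
    for a_present in (True, False):
        for b_present in (True, False):
            observed = (n_ab if a_present and b_present
                        else n_a - n_ab if a_present
                        else n_b - n_ab if b_present
                        else n - n_a - n_b + n_ab)
            expected = ((n_a if a_present else n - n_a)
                        * (n_b if b_present else n - n_b)) / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2

def entropy_interest(p_a, p_b, p_ab):
    """Illustrative entropy-based interestingness of rule A -> B:
    the relative reduction in uncertainty about B once A is known to hold.
    p_a, p_b, p_ab are relative supports of A, B and A U B.
    The score can be negative when conditioning on A makes B less predictable.
    """
    def h(p):  # binary (Shannon) entropy in bits
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    p_b_given_a = p_ab / p_a          # confidence of the rule A -> B
    h_b = h(p_b)
    return 0.0 if h_b == 0.0 else (h_b - h(p_b_given_a)) / h_b
```

For instance, with supp(A) = 0.4, supp(B) = 0.5 and supp(A U B) = 0.35, the rule A -> B has confidence 0.875 and entropy_interest(0.4, 0.5, 0.35) ≈ 0.46, i.e. observing A removes roughly half of the uncertainty about B.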
金洲, 王儒敬. 基于信息熵的兴趣度规则挖掘算法[J]. 模式识别与人工智能, 2014, 27(6): 524-532.
JIN Zhou, WANG Ru-Jing. Interestingness Rule Mining Algorithm Based on Information Entropy[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(6): 524-532.