Abstract: The multi-valued and multi-labeled classifier (MMC) and the multi-valued and multi-labeled decision tree (MMDT) are two existing decision tree algorithms for multi-valued and multi-labeled data. Building on these two algorithms, a formula, sim3, is put forward to calculate the similarity between two label sets. By amending the formula that MMDT uses to measure the same-based similarity of label sets, a new decision tree algorithm, SCC_SP (similarity of same and consistent in constructing, same in predicting), is proposed that takes into account both the similarity and the appropriateness of the label set. Comparative experiments using the same prediction mechanism show that SCC_SP achieves higher accuracy than MMDT.