Abstract Traditional incremental methods for attribute reduction focus mainly on updating approximations. On large-scale datasets, however, these methods must evaluate every attribute and recompute attribute importance repeatedly, which raises time complexity and lowers efficiency. To address these problems, a parallel incremental acceleration strategy based on attribute trees is proposed. The key step is to cluster all attributes into multiple attribute trees so that attributes can be evaluated dynamically and in parallel. First, an appropriate attribute tree is selected for attribute evaluation according to an attribute tree correlation measure, which reduces time complexity. Then, a branch coefficient is added to the stopping criterion and is increased dynamically as the branch depth grows. As a result, the algorithm exits the loop automatically once the maximum threshold is reached, which avoids redundant computation and improves efficiency. Building on these improvements, an incremental dynamic attribute reduction algorithm based on attribute trees is proposed, and a parallel version of the algorithm is designed by combining it with the Spark parallel mechanism. Finally, experiments on multiple datasets show that the proposed algorithm significantly improves the search efficiency of attribute reduction on dynamically changing datasets while maintaining classification performance.
About the authors:
DING Weiping, Ph.D., professor. His research interests include data mining, machine learning, granular computing, evolutionary computing and big data analytics.
QIN Tingzhen, master's student. His research interests include data mining, granular computing and rough sets.
JU Hengrong, Ph.D., associate professor. His research interests include granular computing, rough sets, machine learning and knowledge discovery.
LI Ming, master's degree. His research interests include data mining, granular computing and big data analytics.
HUANG Jiashuang, Ph.D., associate professor. His research interests include brain network analysis and deep learning.
CHEN Yuepeng, Ph.D. candidate. His research interests include granular computing, rough sets and machine learning.
WANG Haipeng, master's student. His research interests include fuzzy theory, granular computing and deep learning.
