Abstract: To improve the efficiency of mining lag correlations between sequences in big data streams, this paper proposes a lag correlation mining method based on improved Boolean reduction and layered series. First, the big data stream sequence is transformed with an improved Boolean reduction that uses two sequence averages of the original data stream, which effectively decreases the computational cost of Boolean reduction. Second, the number of sequence elements is reduced by converting and reducing the sequence elements. This overcomes a drawback of the traditional algorithm, which must compute lag correlations over all sequence elements. Experiments show that the proposed method effectively reduces computation time and clearly improves computational accuracy.
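The abstract does not spell out the algorithm, so the sketch below shows only a point of reference: the brute-force baseline that computes the Pearson correlation between two sequences at every lag, which is roughly the exhaustive computation the abstract says the proposed method avoids. It also includes `boolean_encode`, a hypothetical mean-threshold Boolean transform. All function names are illustrative, and reading "Boolean transformation using sequence averages" as "1 if a value is at or above the mean" is an assumption, not the authors' improved Boolean reduction or layered-series scheme.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lag_correlations(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Brute-force baseline: corr(x[t], y[t + lag]) for every lag in 0..max_lag.

    Each lag costs O(n), so the full scan costs O(n * max_lag) per pair of streams.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if not 0 <= max_lag <= len(x) - 2:
        raise ValueError("max_lag must leave at least 2 overlapping points")
    n = len(x)
    return [pearson(x[: n - lag], y[lag:]) for lag in range(max_lag + 1)]


def best_lag(x: Sequence[float], y: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return the lag with the largest absolute correlation and that correlation."""
    corrs = lag_correlations(x, y, max_lag)
    lag = max(range(len(corrs)), key=lambda k: abs(corrs[k]))
    return lag, corrs[lag]


def boolean_encode(x: Sequence[float]) -> List[int]:
    """Hypothetical Boolean transform (assumption): 1 where value >= sequence mean, else 0."""
    m = sum(x) / len(x)
    return [1 if v >= m else 0 for v in x]


if __name__ == "__main__":
    x = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
    y = [0, 0, 0] + x[:-3]  # y is x delayed by 3 steps
    print(best_lag(x, y, max_lag=5))
    print(boolean_encode([1.0, 2.0, 3.0, 4.0]))  # [0, 0, 1, 1]
```

Because every candidate lag is rescanned over the whole window, the baseline's cost grows with both the window length and the lag range; reducing the number of sequence elements, as the abstract describes, shrinks the first factor.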
REN Yonggong, QIAN Haizhen, LANG Hongyu. Lag Correlation Mining Method Based on Improved Boolean Reduction and Layered Series for Big Data Stream. Pattern Recognition and Artificial Intelligence, 2016, 29(5): 455-463.