Lag Correlation Mining Method Based on Improved Boolean Reduction and Layered Series for Big Data Stream
REN Yonggong, QIAN Haizhen, LANG Hongyu
School of Computer and Information Technology, Liaoning Normal University, Dalian 116081
Abstract: To improve the efficiency of mining lag-correlated sequences in big data streams, a lag correlation mining method based on improved Boolean reduction and layered series is proposed. First, using two sequence averages of the original data stream, the stream sequence is transformed by the improved Boolean reduction, which effectively lowers the computational cost of Boolean reduction. Second, the sequence elements are converted and reduced, which decreases the number of elements in the sequence. The proposed method thereby overcomes a drawback of traditional algorithms, which compute lag correlations over all sequence elements. Experiments show that the proposed method effectively reduces computation time and clearly improves computational accuracy.
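The abstract does not spell out the transformation itself, so the following is only a minimal sketch of the general idea of lag correlation on Boolean-reduced series, not the authors' algorithm. It assumes a simple thresholding rule (a value maps to 1 when it exceeds the mean of its own series) and scores a lag by the fraction of matching Boolean values. The names `to_boolean`, `lag_agreement`, and `best_lag` are hypothetical and introduced here for illustration.

```python
from statistics import mean


def to_boolean(series):
    """Assumed reduction rule: 1 if the value exceeds the series mean, else 0."""
    m = mean(series)
    return [1 if x > m else 0 for x in series]


def lag_agreement(a, b, lag):
    """Fraction of positions t where a[t] equals b[t + lag]."""
    n = len(a) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the series length")
    matches = sum(1 for t in range(n) if a[t] == b[t + lag])
    return matches / n


def best_lag(x, y, max_lag):
    """Return (lag, score) for the lag in 0..max_lag with the highest agreement."""
    bx, by = to_boolean(x), to_boolean(y)
    scores = {lag: lag_agreement(bx, by, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


if __name__ == "__main__":
    x = [1, 9, 2, 8, 8, 1, 1, 9, 1, 2, 9, 9]
    y = [1, 1] + x[:-2]  # y repeats x two steps later
    print(best_lag(x, y, 3))
```

Comparing Boolean values is cheaper than computing a full correlation coefficient at every lag, which is the kind of saving the abstract attributes to Boolean reduction; the accuracy of such a shortcut depends on the thresholding rule, which is an assumption here.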
Received: 29 December 2014
About the authors: REN Yonggong, male, born in 1972, Ph.D., professor. His research interests include database technology, data mining, and intelligent information computing. E-mail: ryg@lnnu.edu.cn. QIAN Haizhen, male, born in 1986, master's student. His research interests include data mining. E-mail: 243127387@qq.com. LANG Hongyu, male, born in 1989, master's student. His research interests include data mining. E-mail: 15998416927@163.com.