Correlation Analysis on Multidimensional Data Streams Based on Base-Windows
QIAN Jiang-Bo1, WANG Zhi-Jie1, CHEN Hua-Hui1, DONG Yi-Hong1, XIE Zhi-Jun1, WANG Yong-Li2
1. College of Information Science and Engineering, Ningbo University, Ningbo 315211; 2. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094
Abstract: Correlation analysis of multidimensional data streams has received little study, and the existing work mostly addresses a single sliding-window model. An online correlation analysis algorithm, Base_win_CCA, is presented; it significantly reduces space and time complexity by analyzing the correlations of multidimensional data streams simultaneously. By dynamically maintaining statistics, the algorithm computes correlations over multiple windows flexibly and accurately. Theoretical analysis and experimental results indicate that the algorithm performs especially well when the windows are large and the numbers of data streams and users are high.
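The abstract does not give the details of Base_win_CCA, so the following is only a minimal sketch of the general idea it describes: keep summary statistics per base window and combine them to answer correlation queries over windows of different lengths. It assumes that "CCA" stands for canonical correlation analysis, that a query window is a whole number of base windows, and that the covariance matrices are positive definite. The names `BaseWindowStats`, `SlidingBaseWindows`, and `canonical_correlations` are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


class BaseWindowStats:
    """Additive summary of one base window of two streams X (n x p) and Y (n x q)."""

    def __init__(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        self.n = X.shape[0]
        self.sx = X.sum(axis=0)   # sum of X rows
        self.sy = Y.sum(axis=0)   # sum of Y rows
        self.sxx = X.T @ X        # sum of outer products x x^T
        self.syy = Y.T @ Y        # sum of outer products y y^T
        self.sxy = X.T @ Y        # sum of outer products x y^T


def canonical_correlations(stats):
    """Canonical correlations of the window formed by merging the given base windows."""
    n = sum(s.n for s in stats)
    mx = sum(s.sx for s in stats) / n
    my = sum(s.sy for s in stats) / n
    cxx = sum(s.sxx for s in stats) / n - np.outer(mx, mx)
    cyy = sum(s.syy for s in stats) / n - np.outer(my, my)
    cxy = sum(s.sxy for s in stats) / n - np.outer(mx, my)
    # Whiten with Cholesky factors: the singular values of
    # Lx^{-1} Cxy Ly^{-T} are the canonical correlations.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)
    m = np.linalg.solve(ly, m.T).T
    return np.linalg.svd(m, compute_uv=False)


class SlidingBaseWindows:
    """Keeps the summaries of the most recent base windows (assumed design)."""

    def __init__(self, max_base_windows):
        self.windows = deque(maxlen=max_base_windows)

    def push(self, X, Y):
        # Summarize a completed base window; the oldest summary is evicted
        # automatically once the deque is full.
        self.windows.append(BaseWindowStats(X, Y))

    def query(self, w):
        # Correlation over the w most recent base windows, without
        # revisiting the raw data.
        if not 1 <= w <= len(self.windows):
            raise ValueError("w must be between 1 and the number of stored base windows")
        return canonical_correlations(list(self.windows)[-w:])
```

In this sketch, users who ask for different window lengths share the same stored summaries, so each query costs time proportional to the number of base windows involved rather than to the number of raw tuples. Whether Base_win_CCA maintains exactly these statistics is not stated in the abstract.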