Lag Correlation Mining Method Based on Improved Boolean Reduction and Layered Series for Big Data Stream<sup>*</sup>

doi:10.16451/j.cnki.issn1003-6059.201605009


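Abstract (translated from the Chinese original): To improve the efficiency of mining lag-correlated sequences in big data streams, a lag correlation mining method based on improved Boolean reduction and layered series is proposed. The method applies a Boolean transformation to the big data stream sequences according to the sequence means of two segments of the original data stream, which effectively reduces the computational cost of Boolean reduction. By converting and then restoring sequence elements, it reduces the number of sequence elements and overcomes a drawback of traditional algorithms, which must compute the lag correlation between all elements of the data stream sequences. Experiments show that the method effectively reduces running time and improves computational efficiency while maintaining accuracy.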


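Keywords (translated from the Chinese original): improved Boolean reduction, big data stream, sliding window, lag correlation, layered series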

Abstract: To improve the efficiency of mining lag-correlated sequences in big data streams, a lag correlation mining method based on improved Boolean reduction and layered series is proposed. First, the big data stream sequence is Boolean-transformed according to the means of two sequences of the original data stream, which effectively decreases the computational cost of Boolean reduction. Second, through conversion and restoration of sequence elements, the number of sequence elements is reduced. The proposed method thereby overcomes the drawback of traditional algorithms, which compute lag correlations among all sequence elements. Experiments show that the proposed method effectively reduces computation time and improves computational efficiency while preserving accuracy.
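The general idea in the abstract (thresholding a stream against its mean, then searching for the lag at which two streams line up) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `booleanize`, `agreement`, and `best_lag` are hypothetical, the threshold is assumed to be the mean of the whole sequence rather than of two segments, and the simple agreement score stands in for whatever correlation measure the paper uses. The layered-series step is not modelled.

```python
from statistics import mean


def booleanize(seq):
    """Map each element to 1 if it is at or above the sequence mean, else 0.

    Assumption: a single whole-sequence mean is used as the threshold.
    """
    m = mean(seq)
    return [1 if x >= m else 0 for x in seq]


def agreement(a, b):
    """Fraction of positions at which two equal-length Boolean sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def best_lag(x, y, max_lag):
    """Return (lag, score) for the delay of y relative to x, in 0..max_lag,
    at which the Boolean forms of the two sequences agree most often.

    Ties are resolved in favour of the smallest lag.
    """
    bx, by = booleanize(x), booleanize(y)
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        n = len(bx) - lag  # length of the overlapping part at this lag
        score = agreement(bx[:n], by[lag:lag + n])
        if score > best[1]:
            best = (lag, score)
    return best


# Illustrative data: y is x delayed by two time steps (padded with zeros).
x = [2, 9, 8, 1, 3, 7, 1, 1, 9, 2]
y = [0, 0] + x[:-2]
print(best_lag(x, y, max_lag=3))  # (2, 1.0)
```

Comparing 0/1 values is cheaper than computing a full correlation coefficient over real-valued elements, which is the kind of saving the abstract attributes to the Boolean transformation.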

Key words: Improved Boolean Reduction, Big Data Stream, Sliding Window, Lag Correlation, Layered Series

Received: 2014-12-29

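Funding: Supported by the National Natural Science Foundation of China (No. F020806), the Natural Science Foundation of Liaoning Province (No. 201202119), the Science Program of Liaoning Province (No. 2013405003), and the Science and Technology Program of Dalian (No. 2013A16GX116)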

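About the authors: REN Yonggong (任永功), male, born in 1972, Ph.D., professor. His research interests include database technology, data mining, and intelligent information computing. E-mail: ryg@lnnu.edu.cn.
QIAN Haizhen (钱海振), male, born in 1986, master's student. His research interests include data mining. E-mail: 243127387@qq.com.
LANG Hongyu (郎泓钰), male, born in 1989, master's student. His research interests include data mining. E-mail: 15998416927@163.com.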

Cite this article:

REN Yonggong, QIAN Haizhen, LANG Hongyu. Lag Correlation Mining Method Based on Improved Boolean Reduction and Layered Series for Big Data Stream[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2016, 29(5): 455-463.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201605009 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2016/V29/I5/455
