Abstract: Existing algorithms cannot cluster uncertain data streams from the perspective of temporal evolution. This paper presents EAP-UStream, an evolutionary clustering algorithm for uncertain data streams based on affinity propagation. The concept of the change rate of an uncertain micro-cluster is introduced to account for how the factors that vary while the online uncertain data stream is being summarized into micro-clusters influence the offline clustering. The similarity between micro-clusters is measured in terms of the evolution of the uncertain data stream, and the concept of the coupling degree of uncertain micro-clusters is proposed. On this basis, an uncertain similarity matrix is constructed, and evolutionary clustering of uncertain data streams is carried out using affinity propagation. Experimental results demonstrate the effectiveness of EAP-UStream.
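The abstract describes a pipeline: summarize the stream into uncertain micro-clusters, build a similarity matrix over them, and cluster that matrix with affinity propagation. Because the change rate and coupling degree are not defined here, the following is only a minimal sketch under stated assumptions, not EAP-UStream itself. As a stand-in similarity, it assumes each micro-cluster is summarized by a centroid and a per-dimension variance, and scores a pair by the negative expected squared distance between two independent random vectors with those means and variances. It then runs the standard Frey and Dueck message-passing form of affinity propagation on the matrix. The function names `expected_sq_distance`, `similarity_matrix`, and `affinity_propagation`, the median preference, and the damping and stopping parameters are hypothetical illustrative choices.

```python
from statistics import median


def expected_sq_distance(c1, v1, c2, v2):
    """E[||X - Y||^2] for independent X, Y with means c1, c2 and per-dimension variances v1, v2."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) + sum(v1) + sum(v2)


def similarity_matrix(centroids, variances, preference=None):
    """Hypothetical stand-in similarity: negative expected squared distance between micro-clusters."""
    n = len(centroids)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i != k:
                s[i][k] = -expected_sq_distance(centroids[i], variances[i],
                                                centroids[k], variances[k])
    if preference is None:
        # Median of the off-diagonal similarities (a common choice giving a moderate number of clusters).
        preference = median([s[i][k] for i in range(n) for k in range(n) if i != k])
    for k in range(n):
        s[k][k] = preference  # the diagonal holds each point's preference to be an exemplar
    return s


def affinity_propagation(s, damping=0.5, max_iter=200, convergence_iter=15):
    """Damped responsibility/availability message passing on an n x n similarity matrix (n >= 2)."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, last, stable = [], None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else vals[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        # Stop once the exemplar set has stayed the same for convergence_iter iterations.
        if exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    if not exemplars:
        return [], []
    # Exemplars label themselves; every other point joins its most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k]) for i in range(n)]
    return exemplars, labels
```

For example, `affinity_propagation(similarity_matrix(centroids, variances))` takes lists of centroid tuples and matching variance tuples. Affinity propagation needs no preset number of clusters, only a similarity matrix. So under this sketch's assumptions, swapping in the paper's change-rate and coupling-degree measures would change only `similarity_matrix`.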