Clustering Method of Time Series Based on EMD and K-means Algorithm
LIU Hui-Ting1, NI Zhi-Wei2
1.School of Computer Science and Technology, Anhui University, Hefei 230039 2.Institute of Computer Network System, Hefei University of Technology, Hefei 230009
Abstract Dimension reduction of time series and noise in sequences filtering are important prerequisites for effective realization of time series clustering. A method is proposed to preprocess time series effectively. Firstly, the trend of a time sequence is got by using empirical mode decomposition method. Then, the trend series are divided into several segments by bottom-up algorithm. Finally, the piecewise series are translated into uniform sequences, and each of them is composed of -1, 0 and 1. To prove that the proposed method can achieve dimensionality reduction and filter out the noise from the data sequence, K-means algorithm is utilized to finish clustering of pretreated time series. Experimental results show clustering of pretreated data sequences is better than that of the original series.
[1] Last M, Klein Y, Kandel A. Knowledge Discovery in Time Series Databases. IEEE Trans on Systems, Man and Cybernetics , 2001, 31(1): 160-169 [2] Moon Y S, Kim J. Efficient Moving Average Transform-Based Subsequence Matching Algorithms in Time-Series Databases. Information Sciences: An International Journal, 2007, 177(23): 5415-5431 [3] Kim S W, Park D H, Lee H G. Efficient Processing of Subsequence Matching with the Euclidean Metric in Time-Series Databases. Information Processing Letters, 2004, 90(5): 253-260 [4] Kontaki M, Papadopoulos A N, Manolopoulos Y. Adaptive Similarity Search in Streaming Time Series with Sliding Windows. Data & Knowledge Engineering, 2007, 63(2): 478-502 [5] Zhang Haiqin, Cai Qingsheng. Time Series Similar Pattern Matching Based on Wavelet Transform. Chinese Journal of Computers, 2003, 26(3): 1-5 (in Chinese) (张海勤,蔡庆生.基于小波变换的时间序列相似模式匹配.计算机学报, 2003, 26(3): 1-5) [6] Zhang Min, Zhang Yanping, Cheng Jiaxing. Hierarchical Algorithm to Match Similar Time Series Pattern. Journal of Computer-Aided Design & Computer Graphics, 2005, 17(7): 1480-1485 (in Chinese) (张 旻,张燕平,程家兴.时间序列相似模式的分层匹配.计算机辅助设计与图形学学报, 2005, 17(7): 1480-1485) [7] Peng Z K, Tse P W, Chu F L. An Improved Hilbert-Huang Transform and Its Application in Vibration Signal Analysis. Journal of Sound and Vibration, 2005, 286(1/2): 187-205 [8] Keogh E, Chu S, Hart D, et al. An On-line Algorithm for Segmenting Time Series // Proc of the IEEE International Conference on Data Mining. San Jose, USA, 2001: 289-296 [9] Keogh E, Kasetty S. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration // Proc of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, 2002: 102-111 [10] Ordonez C, Omiecinski E. Efficient Disk-Based K-means Clustering for Relational Databases. IEEE Trans on Knowledge and Data Engineering, 2004, 16(8): 909-921 [11] Zhang Jianpei, Yang Yue, Yang Jing, et al. Algorithm for Initialization of K-means Clustering Center Based on Optimized-Division. Journal of System Simulation, 2009, 21(9): 2586-2590 (in Chinese) (张健沛,杨 悦,杨 静,等.基于最优划分的K-means初始聚类中心选取算法.系统仿真学报, 2009, 21(9): 2586-2590) [12] Zhang Yan, Sun Zhengxing, Li Wenhui. Texture Synthesis Based on Direction Empirical Mode Decomposition. Computers & Graphics, 2008, 32(2): 175-186