Online Segmentation Algorithm for Time Series Based on Hierarchical Clustering
DU Yi, LU DeTang , LI DaoLun, ZHA WenShu
Institute of Engineering and Science Software, University of Science and Technology of China, Hefei 230027 Key Laboratory of Software in Computing and Communication of Anhui Province, Hefei 230027
Abstract:How to segment sequential data in realtime is becoming one of the most important tasks in the time series mining domain. A new online segmentation algorithm called online segmentation algorithm for time series based on hierarchical clustering (OSHC) is presented. According to the order characteristics of sequence data, a novel Segment Feature List (SFList) is developed to save segmentation information. In the algorithm, time series are segmented effectively with one scan of the database and the time complexity is O(n). Historical information can also be inquired quickly by using the SFList. Experimental results show that the algorithm is efficient.
杜奕,卢德唐,李道伦,查文舒. 基于层次聚类的时间序列在线划分算法*[J]. 模式识别与人工智能, 2007, 20(3): 415-420.
DU Yi, LU DeTang , LI DaoLun, ZHA WenShu. Online Segmentation Algorithm for Time Series Based on Hierarchical Clustering. , 2007, 20(3): 415-420.
[1] Guralnik V, Srivastava J. Event Detection from Time Series Data // Proc of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, USA, 1999: 3342 [2] LI Aiguo, Qin Zheng. OnLine Segmentation of TimeSeries Data. Journal of Software, 2004, 15(11): 16711679 (in Chinese) (李爱国,覃 征.在线分割时间序列数据.软件学报, 2004,15(11): 16711679) [3] Huang Chao, Zhu Yangyong. An OnLine Algorithm for Segmenting Time Series Based on ARMA Model. Pattern Recognition and Artificial Intelligence, 2005, 18(2): 129134 (in Chinese) (黄 超,朱扬勇.基于ARMA模型的联机时间序列数据分割算法.模式识别与人工智能, 2005, 18(2): 129134) [4] Firoiu L. Segmenting Time Series with a Hybrid Neural NetworksHidden Markov Model // Proc of the 18th National Conference on Artificial Intelligence. Edmonton, Canada, 2002: 247252 [5] Chung F L, Fu T C, Ng V, et al. An Evolutionary Approach to PatternBased Time Series Segmentation. IEEE Trans on Evolutionary Computation, 2004, 8(5): 471489 [6] Haiminen N, Gionis A. Unimodal Segmentation of Sequences // Proc of the 4th IEEE International Conference on Data Mining. Brighton, UK, 2004: 106113 [7] Perng C S, Wang H X, Zhang S R, et al. Landmarks: A New Model for SimilarityBased Pattern Querying in Time Series Databases // Proc of 16th International Conference on Data Engineering. San Diego, USA, 2000: 3342 [8] Terzi E, Tsaparas P. Efficient Algorithms for Sequence Segmentation // Proc of the 6th SIAM International Conference on Data Mining. Bethesda, USA, 2006: 314325 [9] Keogh E, Kasetty S. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration // Proc of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, 2002: 102111 [10] Keogh E J, Chakrabarti K, Pazzani M J, et al. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowledge and Information Systems, 2001, 3(3): 263286 [11] Yi B K, Faloutsos C. Fast Time Sequence Indexing for Arbitrary Lp Norms // Proc of the 26th International Conference on Very Large Data Bases. Cairo, Egypt, 2000: 385394 [12] Keogh E, Chu S, Hart D, et al. An Online Algorithm for Segmenting Time Series // Proc of the IEEE International Conference on Data Mining. San Jose, USA, 2001: 289296 [13] Wu Jiangqin, Gao Wen. Time Series Clustering Algorithm and Its Application in Gesture Recognition. Pattern Recognition and Artificial Intelligence, 2005,18(1):15 (in Chinese) (吴江琴,高 文.时间序列聚类算法及其在手势识别中的应用.模式识别与人工智能, 2005, 18(1): 15) [14] Han Jiawei, Kamber M. Data Mining: Concepts and Techniques. San Francisco, USA: Morgan Kaufmann Publishers, 2001 [15] MacQueen J B. Some Methods for Classification and Analysis of Multivariate Observations // Proc of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA, 1967, Ⅰ: 281297 [16] Kim J S, Kim M H. On Effective Data Clustering in Bitemporal Databases // Proc of the 4th International Workshop on Temporal Representation and Reasoning. Daytona Beach, USA, 1997: 5461