Abstract: Shape features reflect the changing trends of a time series and retain sufficient information during dimensionality reduction. This helps improve the efficiency of subsequent time series data mining. A symbolic aggregate approximation based on shape features is proposed. It treats the mean and the shape feature of a sequence as two important characteristics and transforms them into strings by changing their domains of discourse. Compared with traditional methods, the proposed method improves the efficiency of time series data mining at an equal compression rate, because sufficient information is retained in the earlier dimensionality reduction stage.
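The abstract does not say how the shape feature is defined or discretized. The following is a minimal sketch of the general idea under stated assumptions, not the authors' method. Each segment emits two symbols. The first is its mean mapped through equiprobable Gaussian breakpoints, as in standard SAX. The second is a shape symbol taken from the segment's least-squares slope. The slope feature, the three shape symbols `-`/`0`/`+`, and the `flat_threshold` parameter are hypothetical stand-ins chosen for illustration.

```python
import math
from statistics import NormalDist


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / n)
    if sd == 0:
        return [0.0] * n
    return [(x - mu) / sd for x in series]


def gaussian_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into equiprobable regions, as in standard SAX."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def segment_slope(segment):
    """Least-squares slope over positions 0..m-1 (HYPOTHETICAL shape feature)."""
    m = len(segment)
    if m < 2:
        return 0.0
    x_bar = (m - 1) / 2
    y_bar = sum(segment) / m
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(segment))
    den = sum((i - x_bar) ** 2 for i in range(m))
    return num / den


def shape_sax(series, n_segments, alphabet="abcd", flat_threshold=0.1):
    """Encode each segment as a (mean symbol, shape symbol) pair.

    Mean symbol: segment mean mapped through Gaussian breakpoints (SAX).
    Shape symbol: '-' falling, '0' flat, '+' rising, by comparing the slope
    with the hypothetical flat_threshold.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    z = znormalize(series)
    width = len(z) // n_segments
    breakpoints = gaussian_breakpoints(len(alphabet))
    symbols = []
    for k in range(n_segments):
        seg = z[k * width:(k + 1) * width]
        mean = sum(seg) / width
        # The number of breakpoints below the mean selects the letter.
        mean_sym = alphabet[sum(1 for b in breakpoints if mean > b)]
        slope = segment_slope(seg)
        if slope > flat_threshold:
            shape_sym = "+"
        elif slope < -flat_threshold:
            shape_sym = "-"
        else:
            shape_sym = "0"
        symbols.append(mean_sym + shape_sym)
    return "".join(symbols)


print(shape_sax([0, 1, 2, 3, 6, 6, 6, 6], n_segments=2))  # a+d0
```

In the usage example, the first half rises from a low level (`a+`) and the second half is a high plateau (`d0`). Plain SAX would encode the two segments only as `ad` and would lose the rising trend. Encoding the shape alongside the mean doubles the string length for the same number of segments. A fair comparison "at an equal compression rate" would therefore need to account for that. The exact trade-off in the proposed method is not specified in this abstract.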