Dimension Reduction and Similarity Search for Time Series Based on Regression Coefficient
HUANG Chao1,2, ZHU YangYong1
1.Department of Computing and Information Technology, Fudan University, Shanghai 200433 2.School of Economics and Management, Southeast University, Nanjing 210096
Abstract:Dimension reduction is always necessary when similarity search is conducted in time series. The privious methods are of high time complexity and unintuitive (such as DFT and DWT), or can’t be used for accurate similarity search (such as PAA). This paper brings forward a new dimension reduction method which is called Piecewise Regression Approximation (PRA) for time series based on regression coefficient. The PRA method is of linear time complexity and not sensitive to independent noises. It is proved the similarity search based on PRA method satisfies the lowerbounding lemma, so it is practical and effective. The experiments conducted on reallife datasets validate our conclusions.
黄超,朱扬勇. 基于回归系数的时间序列维约简与相似性查找*[J]. 模式识别与人工智能, 2006, 19(1): 52-57.
HUANG Chao, ZHU YangYong. Dimension Reduction and Similarity Search for Time Series Based on Regression Coefficient. , 2006, 19(1): 52-57.
[1] Keogh E, Kasetty S. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In: Proc of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, 2002, 102-111 [2] Keogh E J, Chu S, Hart D, Pazzani M J. An Online Algorithm for Segmenting Time Series. In: Proc of the IEEE International Conference on Data Mining. Washington, USA: IEEE Computer Society, 2001, 289-296 [3] Faloutsos C, Ranganathan M, Manolopoulos Y. Fast Subsequence Matching in Time-Series Databases. In: Proc of the ACM SIGMOD International Conference on Management of Data. Minneapolis, USA, 1994, 419-429 [4] Chan K P, Fu A W. Efficient Time Series Matching by Wavelets. In: Proc of the 15th International Conference on Data Engineering. Sydney, Australia, 1999, 126-133 [5] Zhao H, Hou J R, Shi B L. Research on Similarity of Stochastic Non-Stationary Time Series Based on Wavelet-Fractal. Journal of Software, 2004, 15(5): 633-640 (in Chinese) (赵 慧, 侯建荣, 施伯乐. 随机非平稳时间序列数据的相似性研究. 软件学报, 2004, 15(5): 633-640) [6] Keogh E J, Pazzani M J. Relevance Feedback Retrieval of Time Series Data. In: Proc of the 22th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Berkeley, USA, 1999, 183-190 [7] Keogh E, Chakrabarti K, Mehrotra S, Pazzani M. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. ACM Trans on Database Systems, 2002, 27(2): 188-228 [8] Lin J, Keogh E, Lonardi S, Chiu B. A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In: Proc of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, USA, 2003, 2-11 [9] Mills T C. The Econometric Modelling of Financial Time Series. 2nd Edition. Cambridge, UK: Cambridge University Press, 1999 [10] Webb A R. Statistical Pattern Recognition. 2nd Edition. London, UK: John Wiley Press, 2002