Identification of Outlier Patterns in Multivariate Time Series
WENG XiaoQing (1, 2), SHEN JunYi (1)
(1) Institute of Computer Software, Xi’an Jiaotong University, Xi’an 710049
(2) Computer Center, Hebei University of Economics and Trade, Shijiazhuang 050061
Abstract: Multivariate time series (MTS) data are widely available in many fields, including finance, medicine, science, and engineering. An approach for identifying outlier patterns in MTS is proposed. A bottom-up segmentation algorithm first divides an MTS into non-overlapping subsequences. An extended Frobenius norm is then used to measure the similarity between two MTS subsequences, and the K-means algorithm clusters the subsequences into a number of classes. Finally, based on the definitions of outlier patterns, the outlier patterns in the MTS are identified from these classes. Experiments are performed on two real-world datasets, a stock market dataset and a brain-computer interface (BCI) dataset, and the experimental results demonstrate the effectiveness of the approach.
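The four-stage pipeline summarized above (segment, compare, cluster, flag) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. First, the extended Frobenius norm is replaced by the ordinary Frobenius distance between the covariance matrices of two subsequences; this stand-in works for subsequences of different lengths because every covariance matrix is n × n for n variables. Second, the paper's outlier-pattern definitions are replaced by a hypothetical rule that flags subsequences whose cluster holds at most a fraction `max_fraction` of all subsequences. All function names and parameters are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Series = List[List[float]]  # rows are time points, columns are variables
Bounds = Tuple[int, int]    # half-open [start, end) index range of a subsequence


def linear_fit_error(seg: Series) -> float:
    """Sum over variables of squared residuals of a least-squares line in t."""
    n = len(seg)
    if n < 3:
        return 0.0  # a straight line fits one or two points exactly
    t_mean = (n - 1) / 2.0
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    err = 0.0
    for j in range(len(seg[0])):
        ys = [row[j] for row in seg]
        y_mean = sum(ys) / n
        slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / s_tt
        err += sum((y - y_mean - slope * (t - t_mean)) ** 2 for t, y in enumerate(ys))
    return err


def bottom_up_segment(mts: Series, num_segments: int) -> List[Bounds]:
    """Bottom-up segmentation: start from two-point segments, then repeatedly merge
    the adjacent pair whose merged segment has the smallest linear-fit error."""
    bounds = [(i, min(i + 2, len(mts))) for i in range(0, len(mts), 2)]
    while len(bounds) > max(num_segments, 1):
        costs = [linear_fit_error(mts[bounds[i][0]:bounds[i + 1][1]])
                 for i in range(len(bounds) - 1)]
        i = costs.index(min(costs))
        bounds[i:i + 2] = [(bounds[i][0], bounds[i + 1][1])]
    return bounds


def covariance(seg: Series) -> List[float]:
    """Row-major flattened population covariance matrix of a subsequence
    (assumed feature; stands in for the paper's extended Frobenius norm input)."""
    n, m = len(seg), len(seg[0])
    means = [sum(row[j] for row in seg) / n for j in range(m)]
    return [sum((row[a] - means[a]) * (row[b] - means[b]) for row in seg) / n
            for a in range(m) for b in range(m)]


def frobenius_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Frobenius norm of the difference of two equally sized flattened matrices."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain K-means (Lloyd's algorithm) with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(frobenius_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: frobenius_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def small_cluster_members(labels: List[int], max_fraction: float = 0.2) -> List[int]:
    """Hypothetical outlier rule: indices whose cluster holds at most
    max_fraction of all items (not the paper's formal definition)."""
    sizes: Dict[int, int] = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_fraction * len(labels)]


def outlier_segments(mts: Series, num_segments: int, k: int,
                     max_fraction: float = 0.2) -> List[Bounds]:
    """Segment, describe each subsequence by its covariance, cluster, flag small clusters."""
    bounds = bottom_up_segment(mts, num_segments)
    labels = kmeans([covariance(mts[s:e]) for s, e in bounds], k)
    return [bounds[i] for i in small_cluster_members(labels, max_fraction)]
```

For example, on the toy feature vectors `[[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [10, 10]]` with `k=2`, `kmeans` returns `[0, 0, 0, 0, 1]`, and `small_cluster_members(labels, 0.25)` flags index `4`. For clarity, the segmentation loop recomputes every merge cost on each pass, which is quadratic in the number of segments; a faster version would recompute only the costs adjacent to the pair just merged.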