Detection of Outlier Samples in Multivariate Time Series
WENG XiaoQing (1, 2), SHEN JunYi (1)
1. Institute of Computer Software, Xi’an Jiaotong University, Xi’an 710049
2. Computer Center, Hebei University of Economics and Trade, Shijiazhuang 050061
Abstract: Multivariate time series (MTS) datasets are common in finance, multimedia and medicine. Outlier samples are MTS samples that differ significantly from the other samples in the dataset. This paper proposes a method for detecting outlier samples in an MTS dataset based on a local sparsity coefficient. An extended Frobenius norm is used to measure the similarity between two MTS samples, and k-nearest neighbor (kNN) searches are performed with a two-phase sequential scan. MTS samples that cannot be outlier candidates are pruned, which reduces the number of computations and comparisons. Experiments on two real-world datasets, a stock market dataset and a BCI (Brain Computer Interface) dataset, show the effectiveness of the proposed method.
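The abstract combines three ingredients: an extended Frobenius norm for comparing MTS samples, kNN search, and a local sparsity coefficient. The pure-Python sketch below wires them together under explicit assumptions. It assumes the extended Frobenius norm follows the Eros formulation of Yang and Shahabi, in which each sample's covariance eigenvectors are compared component by component, weighted by eigenvalues, and turned into a distance sqrt(2 - 2 * Eros). The dataset-wide weight vector (`eros_weights`), the brute-force single-pass kNN scan (not the two-phase scan or the pruning step described above), and the LOF-style ratio used as a sparsity score (`local_sparsity_scores`) are hypothetical stand-ins, not the paper's definitions. A small Jacobi eigen-solver is included so the block needs no third-party packages.

```python
import math
import random


def covariance(mts):
    """Sample covariance matrix (n x n) of one MTS sample given as m rows of n variables."""
    m, n = len(mts), len(mts[0])
    mu = [sum(row[j] for row in mts) / m for j in range(n)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in mts) / (m - 1)
             for j in range(n)] for i in range(n)]


def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors) sorted by descending eigenvalue; vectors[i] is
    the unit eigenvector belonging to values[i].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def principal_components(samples):
    """Per-sample covariance eigenvectors and normalized eigenvalues."""
    pcs, lams = [], []
    for mts in samples:
        vals, vecs = jacobi_eigen(covariance(mts))
        vals = [max(x, 0.0) for x in vals]  # clip tiny negative round-off
        total = sum(vals) or 1.0
        pcs.append(vecs)
        lams.append([x / total for x in vals])
    return pcs, lams


def eros_weights(lams):
    """Hypothetical dataset-wide weights: mean normalized eigenvalue per component."""
    w = [sum(l[i] for l in lams) / len(lams) for i in range(len(lams[0]))]
    total = sum(w)
    return [x / total for x in w]


def eros_distance(pa, pb, w):
    """Assumed Eros distance: sqrt(2 - 2 * sum_i w_i * |<a_i, b_i>|)."""
    sim = sum(wi * abs(sum(x * y for x, y in zip(a, b))) for wi, a, b in zip(w, pa, pb))
    return math.sqrt(max(0.0, 2.0 - 2.0 * sim))


def local_sparsity_scores(samples, k):
    """Brute-force kNN plus a hypothetical LSC-style score (higher = sparser than neighbors)."""
    pcs, lams = principal_components(samples)
    w = eros_weights(lams)
    n = len(samples)
    dist = [[eros_distance(pcs[i], pcs[j], w) for j in range(n)] for i in range(n)]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k] for i in range(n)]
    spars = [sum(dist[i][j] for j in knn[i]) + 1e-12 for i in range(n)]  # avoid division by zero
    return [sum(spars[i] / spars[j] for j in knn[i]) / k for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)

    def make_sample(sign):
        """Two-variable MTS whose second variable tracks +x or -x plus small noise."""
        rows = []
        for _ in range(30):
            x = rng.gauss(0, 1)
            rows.append([x, sign * x + rng.gauss(0, 0.1)])
        return rows

    data = [make_sample(1) for _ in range(6)] + [make_sample(-1)]
    scores = local_sparsity_scores(data, k=3)
    print(max(range(len(data)), key=lambda i: scores[i]))  # index of the most outlying sample
```

Because the comparison uses covariance eigenvectors rather than raw values, samples of different lengths can be compared directly. In the demo, the anti-correlated sample's principal directions are nearly orthogonal to those of the others, so its distances and its sparsity ratio stand out.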