A Density-Neighbors-Based Incremental Outlier Detection Algorithm
CAO Hui, SI Gang-Quan, ZHANG Yan-Bin, JIA Li-Xin
School of Electrical Engineering, Xi'an Jiaotong University, Xi'an 710049
Abstract To address incremental outlier detection on a dataset that is being updated, a density-neighbors-based incremental outlier detection algorithm is proposed. When the dataset is updated, the algorithm identifies the affected objects and establishes their density neighbor sequences based on the change in each object's k-density and in the k-densities of its neighbors. From an object's density neighbor sequence cost (DNSC) and the average DNSC of its k-distance neighbors, the algorithm calculates the incremental outlier factor (IOF) of each affected object; the IOF value indicates the degree to which the object is an outlier. This improves the effectiveness of incremental outlier detection. It also speeds up detection, since only the IOF values of the affected objects are recalculated. Experimental results show that the proposed algorithm achieves higher outlier detection quality than earlier incremental algorithms while reducing running time.
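The abstract does not give the formulas for DNSC or IOF, so the following Python sketch only illustrates the general shape of the computation under stated assumptions. It assumes that IOF is the ratio of an object's cost to the average cost of its k-distance neighbors (by analogy with LOF), and it uses a hypothetical stand-in cost (mean distance to the k-distance neighbors) in place of the paper's DNSC. The function names and the rule for picking affected objects are illustrative, not the authors' definitions.

```python
import math


def k_distance_neighbors(points, i, k):
    """Indices of all points within the k-distance of points[i] (ties included)."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]  # distance to the k-th nearest neighbor
    return [j for d, j in dists if d <= k_dist]


def stand_in_cost(points, i, k):
    """Hypothetical placeholder for DNSC (NOT the paper's definition):
    mean distance from points[i] to its k-distance neighbors."""
    nbrs = k_distance_neighbors(points, i, k)
    return sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)


def iof(points, i, k, cost=stand_in_cost):
    """Assumed ratio form: cost of object i divided by the average cost
    of its k-distance neighbors. Values near 1 suggest an inlier."""
    nbrs = k_distance_neighbors(points, i, k)
    avg = sum(cost(points, j, k) for j in nbrs) / len(nbrs)
    return cost(points, i, k) / avg


def affected_by_insert(points, new_index, k):
    """Assumed rule: the new object itself, plus every object whose
    k-distance neighborhood now contains the new object."""
    affected = {new_index}
    for i in range(len(points)):
        if i != new_index and new_index in k_distance_neighbors(points, i, k):
            affected.add(i)
    return affected


# A small square cluster, followed by one newly inserted distant object.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
points.append((5.0, 5.0))
new_index = len(points) - 1

# Only the affected objects have their factor recomputed.
scores = {i: iof(points, i, 2) for i in affected_by_insert(points, new_index, 2)}
```

In this toy dataset the inserted object receives a factor well above 1, while the cluster members keep a factor of about 1 and are not recomputed. A faithful implementation would replace `stand_in_cost` with the DNSC defined in the full paper and would likely need a wider affected set, since a change in one object's neighborhood can also change its neighbors' costs.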
Received: 24 November 2008