Robustness Analysis of Local Learning Algorithm Based on Nearest Neighbor
BI Hua, WANG Jue
Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100190 |
|
|
Abstract  Robustness in statistical inference means that a departure of the real data from the assumed sample distribution has little influence on the results of the inference, that is, the prediction performance of the algorithm degrades little. This paper introduces the research methods of statistical robustness into machine learning. The nearest neighbor estimation algorithm, a kind of local learning, converges to the Bayes optimal estimate when the number of samples is large, and under this convergence condition it is also a robust algorithm. Finally, experimental results on synthetic and real datasets demonstrate that the generalization performance of the nearest neighbor estimation algorithm is preserved when the training data contain outliers.
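To make the kind of robustness experiment described above concrete, the following is a minimal illustrative sketch, not the authors' code: a plain NumPy k-nearest-neighbor classifier trained on synthetic two-class Gaussian data in which a fraction of the training labels is flipped to simulate outliers, then evaluated on a clean test set. All function names, the 10% flip rate, and the choices of k are assumptions made for illustration only.

    # Illustrative sketch (not the paper's implementation): k-NN classification
    # on synthetic data with label outliers injected into the training set.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, noise_rate=0.0):
        """Two Gaussian classes in 2-D; optionally flip a fraction of labels."""
        y = rng.integers(0, 2, size=n)
        X = rng.normal(loc=0.0, scale=1.0, size=(n, 2)) + 2.0 * y[:, None]
        if noise_rate > 0.0:
            flip = rng.random(n) < noise_rate
            y = np.where(flip, 1 - y, y)   # label outliers
        return X, y

    def knn_predict(X_train, y_train, X_test, k):
        """Majority vote among the k nearest training points (Euclidean)."""
        preds = np.empty(len(X_test), dtype=int)
        for i, x in enumerate(X_test):
            d = np.linalg.norm(X_train - x, axis=1)
            nn = np.argsort(d)[:k]
            preds[i] = np.bincount(y_train[nn], minlength=2).argmax()
        return preds

    X_tr, y_tr = make_data(500, noise_rate=0.10)   # 10% flipped training labels
    X_te, y_te = make_data(2000, noise_rate=0.0)   # clean test set

    for k in (1, 15):
        acc = np.mean(knn_predict(X_tr, y_tr, X_te, k) == y_te)
        print(f"k={k:2d}  test accuracy = {acc:.3f}")

In this sketch, comparing a small k with a larger k on the clean test set gives one simple way to observe how averaging over more neighbors can dampen the influence of the flipped labels.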
|
Received: 06 March 2007
|
|
|
|
|
|
|
|