Unstable Cut-Points Based Sample Selection for Large Data Classification
WANG Xizhao 1, XING Sheng 2,3, ZHAO Shixin 2,4
1. College of Mathematics and Information Science, Hebei University, Baoding 071002
2. School of Management, Hebei University, Baoding 071002
3. College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001
4. Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang 050043
Abstract Traditional sample selection methods suffer from high computational complexity and heavy time consumption when they are used to compress large data sets. To address this problem, a sample selection method based on unstable cut-points is proposed for compressing large data sets. Since a convex function attains its extreme values at the endpoints of an interval, the endpoint degree of a sample is measured by generating the unstable cut-points of all attributes according to this basic property. Samples with higher endpoint degrees are selected, and the calculation of distances between samples is avoided. Thus, computational efficiency is improved without affecting classification accuracy. Experimental results show that the proposed algorithm effectively compresses large data sets with high imbalance ratios and has strong anti-noise ability.
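To make the core idea concrete, the following Python/NumPy sketch (not the paper's implementation; the function names, the tie handling in sorting, and the selection ratio are illustrative assumptions) counts, for each sample, how many attributes place it next to an unstable cut-point, i.e. a boundary between adjacent sorted samples whose class labels differ, and then keeps the samples with the highest counts:

import numpy as np

def endpoint_degree(X, y):
    # For each attribute, sort the samples; a cut between two adjacent
    # samples with different class labels is an unstable cut-point.
    # A sample's endpoint degree counts how many unstable cuts it flanks.
    n, d = X.shape
    degree = np.zeros(n, dtype=int)
    for j in range(d):
        order = np.argsort(X[:, j])
        labels = y[order]
        unstable = labels[:-1] != labels[1:]   # True where a cut is unstable
        degree[order[:-1][unstable]] += 1      # sample on the left of the cut
        degree[order[1:][unstable]] += 1       # sample on the right of the cut
    return degree

def select_samples(X, y, ratio=0.2):
    # Keep the fraction `ratio` of samples with the highest endpoint degree
    # (the ratio is a hypothetical parameter for this sketch).
    degree = endpoint_degree(X, y)
    k = max(1, int(ratio * len(y)))
    keep = np.argsort(-degree)[:k]
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy two-class problem
X_sel, y_sel = select_samples(X, y, ratio=0.2)
print(X_sel.shape)                             # (200, 5)

Because each attribute is only sorted once, such a procedure costs O(dn log n) and never forms a pairwise distance matrix, which is where the claimed efficiency gain over distance-based sample selection would come from.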
Received: 03 May 2016
About the authors: WANG Xizhao, born in 1963, Ph.D., professor. His research interests include machine learning and pattern recognition. XING Sheng (corresponding author), born in 1982, Ph.D. candidate, lecturer. His research interests include machine learning. ZHAO Shixin, born in 1978, Ph.D. candidate, lecturer. Her research interests include machine learning.