Abstract: As a significant clue for video indexing and retrieval, audio detection and classification has attracted much attention. Based on a prior model of news video structure, a selective ensemble of support vector machines (SENSVM) is proposed to detect and classify news audio into four types: silence, music, speech, and speech with music background. Experiments on real news audio clips totaling 8514 s show that the proposed audio classification method reaches an average accuracy of 98.2%, which is much better than that of the existing SVM-based method or the traditional threshold-based method.
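The general idea of a selective ensemble can be illustrated with a minimal sketch: train several base classifiers, keep only a subset that performs well on held-out data, and fuse the kept members by majority vote. The sketch below is an assumption-laden illustration, not the SENSVM procedure itself: the selection rule (keep members whose validation accuracy is at least the mean), the function names `select_members` and `ensemble_vote`, and the toy label sequences are all hypothetical, and the base classifiers are represented only by their predicted labels rather than by trained SVMs.

```python
from collections import Counter

# The four audio types named in the abstract (identifier spelling is ours).
LABELS = ("silence", "music", "speech", "speech_with_music")


def accuracy(preds, truth):
    """Fraction of clips whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def select_members(val_preds, val_truth):
    """Assumed selection rule: keep base classifiers whose validation
    accuracy is at least the mean accuracy over all base classifiers."""
    scores = [accuracy(p, val_truth) for p in val_preds]
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s >= mean]


def ensemble_vote(member_preds):
    """Majority vote per clip over the selected members' predictions."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_preds)]


# Toy validation data: predictions from four hypothetical base classifiers.
truth = ["silence", "music", "speech", "speech_with_music", "speech", "music"]
val_preds = [
    ["silence", "music", "speech", "speech_with_music", "speech", "speech"],
    ["silence", "music", "speech", "speech", "speech", "music"],
    ["music", "music", "music", "music", "music", "music"],
    ["silence", "speech", "speech", "speech_with_music", "speech", "music"],
]

kept = select_members(val_preds, truth)               # indices of retained members
fused = ensemble_vote([val_preds[i] for i in kept])   # one fused label per clip
```

In this toy setup the weak third classifier is dropped, and the vote among the remaining three corrects each member's individual mistakes. A real system would replace the hard-coded predictions with outputs of SVMs trained on audio features, and could use a different selection strategy.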