Cross-Media Retrieval Method Based on Feature Subspace Learning
ZHANG Hong1,2, WU Fei1, ZHUANG Yue-Ting1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
2. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065
Abstract: The latent correlation between low-level features of different modalities is studied, and an optimization algorithm is proposed to improve the cluster quality of both image and audio datasets in the feature subspace. To speed up the convergence of the query process, three active learning strategies are incorporated into relevance feedback. In this way, the conditional probability of unlabeled samples around labeled examples is calculated. Experimental results show that the overall cross-media retrieval results are encouraging, and that the proposed algorithm effectively realizes mutual retrieval between image and audio data.