Abstract: The effectiveness of image representations based on the bag-of-visual-words (BoW) model is limited mainly by quantization error. To address this issue, this paper proposes an improved image representation based on multiple visual codebooks that covers both visual codebook construction and feature coding. The proposed method consists of two stages: 1) multiple visual codebook construction, in which compact and complementary visual codebooks are generated iteratively; and 2) image representation, in which visual words are first selected from each individual codebook, the coding coefficients are then determined by regularized linear regression, and the image is finally represented by combining the coded features with a spatial pyramid structure. Experimental results on several benchmark image classification datasets demonstrate consistent and significant improvements from the proposed method.
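The coding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that word selection means taking the `k` nearest words from each codebook and that the regularized linear regression is ridge regression. The function name `encode_descriptor` and the parameters `k` and `lam` are hypothetical, and the codebooks here are random placeholders rather than iteratively learned ones. Spatial pyramid pooling is omitted.

```python
import numpy as np

def encode_descriptor(x, codebooks, k=2, lam=1e-3):
    """Code one local descriptor over several codebooks (illustrative sketch)."""
    selected, positions, offset = [], [], 0
    for cb in codebooks:
        # Squared Euclidean distance from x to every word of this codebook.
        dists = np.sum((cb - x) ** 2, axis=1)
        # Assumed selection rule: keep the k nearest words of each codebook.
        nearest = np.argsort(dists)[:k]
        selected.append(cb[nearest])
        positions.append(nearest + offset)
        offset += cb.shape[0]
    B = np.vstack(selected)          # (m, d) matrix of selected words
    idx = np.concatenate(positions)  # their positions in the concatenated code
    # Ridge regression: minimize ||x - B.T @ c||^2 + lam * ||c||^2.
    gram = B @ B.T + lam * np.eye(B.shape[0])
    c = np.linalg.solve(gram, B @ x)
    code = np.zeros(offset)
    code[idx] = c
    return code

# Placeholder data: two random codebooks of 4-dimensional words.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)), rng.normal(size=(6, 4))]
x = rng.normal(size=4)
code = encode_descriptor(x, codebooks)
```

The resulting `code` has one entry per visual word across all codebooks, with nonzero coefficients only at the selected words. In a full pipeline, such codes would be pooled over the cells of a spatial pyramid to form the image-level representation.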