Abstract:The existing image hierarchical representation methods are strict in feed-forward style, and therefore it is not able to solve problems like local ambiguities well. In this paper, a probabilistic model is proposed to learn and deduce all layers of the hierarchy together. Specifically, a recursive probabilistic decomposition process is taken into account, and a generative model based on latent Dirichlet allocation with pyramidal multilayer structure is derived. Two important properties of the proposed probabilistic model are demonstrated: adding an additional representation layer to improve the performance of the flat model and adopting a full Bayesian approach which is better than a feed-forward implementation of the model. Experimental results on a standard recognition dataset show that the proposed method outperforms the existing hierarchical approaches, and it improves the classification and the learning accuracy with better performance.
[1] Lowe D G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110 [2] Ahmed A, Yu Kai, Xu Wei, et al. Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks // Proc of the 10th European Conference on Computer Vision. Marseille, France, 2008: 69-82 [3] Lazebnik S, Schmid C, Ponce J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. New York, USA, 2006: 2169-2178 [4] Yang Jianchao, Yu Kai, Gong Yihong, et al. Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA, 2009: 1794-1801 [5] Olshausen B A, Field D J. Sparse Coding with an Over-Complete Basis Set: A Strategy Employed by V1? Vision Research, 1997, 37(23): 3311-3325 [6] Boureau Y L, Bach F, LeCun Y, et al. Learning Mid-Level Features for Recognition // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 2559-2566 [7] Fritz M, Black M J, Bradski G R, et al. An Additive Latent Feature Model for Transparent Object Recognition // Proc of the 23rd An-nual Conference on Neural Information Processing Systems. Vancouver, Canada, 2009: 558-566 [8] Mutch J, Lowe D G. Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields. International Journal of Computer Vision, 2008, 80(1): 45-47 [9] Rolls E, Deco G. Computational Neuroscience of Vision. Oxford, UK: Oxford University Press, 2002 [10] Lee H, Grosse R, Ranganath R, et al. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Repre- sentations // Proc of the 26th Annual International Conference on Machine Learning. Montreal, Canada, 2009: 609-616 [11] Ranzato M A, Huang Fujie, Boureau Y L, et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition[EB/OL].[2012-05-30].http://www.cs.nyu.edu/~ylan/files/publi/ranzato-cvpr-07.pdf [12] Serre T, Wolf L, Bileschi S, et al. Robust Object Recognition with Cortex-Like Mechanisms. IEEE Trans on Pattern Analysis and Machine Intelligence, 2007, 29(3): 411-426 [13] Sivic J, Russell B C, Efros A A, et al. Discovering Objects and Their Locations in Images // Proc of the 10th IEEE International Conference on Computer Vision. Beijing, China, 2005, I: 370-377 [14] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003, 3(1): 993-1022 [15] Blei D M, Griffiths T L, Jordan M I, et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process[EB/OL]. [2012-07-25]. http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2003_AA03.pdf [16] Ferguson T S. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1973, 1(3): 209-230 [17] Heinrich G. Parameter Estimation for Text Analysis[EB/OL]. [2012-07-25]. http://www.arbylon.net/publications/text-est.pdf [18] Li Feifei, Fergus R, Perona P. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. Computer Vision and Image Understanding, 2004, 106(1): 59-70 [19] Kavukcuoglu K, Sermanet P, Boureau Y L, et al. Learning Convolutional Feature Hierarchies for Visual Recognition[EB/OL]. [2012-07-25]. http://yann.lecun.com/exdb/publis/pdf/koray-nips-10.pdf [20] Fidler S, Boben M, Leonardis A. Similarity-Based Cross-Layered Hierarchical Representation for Object Categorization[EB/OL].[2012-06-10].http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4587409&tag=1