Abstract: Semantic analysis is both a central concern and a major difficulty of high-level interpretation in image understanding. It faces two key issues: the semantic gap between text and images, and the polysemy of textual descriptions. Focusing on the semantization of image ontology, this paper first summarizes the semantic features of images and the ways context is expressed, and then reviews three families of methods: generative, discriminative, and descriptive-grammar approaches. Objective benchmarks and evaluation criteria for semantic vocabularies are also summarized. Finally, directions for further research on semantics in image understanding are discussed.