Abstract: Vision-based human motion analysis comprises four tasks: moving object detection, moving object classification, human tracking, and activity recognition and description. It has broad application prospects in many fields, such as intelligent visual surveillance, virtual reality, intelligent human-computer interfaces, video compression, and computer-aided clinical diagnosis. This paper presents a comprehensive survey of vision-based human motion analysis organized around these four tasks, and discusses the open challenges and future directions.