Abstract: This paper proposes a sparse representation method for human interaction that fuses a trajectory feature, which captures global change, with spatio-temporal features, which emphasize local movement. First, a sparse representation of the trajectory feature is obtained with the bag-of-words model. Next, multi-level spatio-temporal features are produced by a three-layer spatio-temporal pyramid and encoded by sparse coding, and a multi-scale max-pooling algorithm is applied to obtain the local sparse feature. Finally, the two kinds of sparse features are weighted and concatenated to form the sparse representation of human interaction. A dynamic latent conditional random field model is used to verify the proposed representation, and the experimental results demonstrate its effectiveness.
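The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sparse codes of the local descriptors are already computed, reduces the spatio-temporal pyramid to a temporal-only pyramid with 1, 2, and 4 cells per level, and uses a single hypothetical fusion weight. All function and parameter names (`bow_histogram`, `multiscale_max_pool`, `fuse`, `weight`) are illustrative assumptions.

```python
import math


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def multiscale_max_pool(codes, positions, levels=3):
    """Max-pool absolute sparse codes over 1, 2, 4, ... temporal cells and concatenate.

    Assumption: `positions` holds a normalized time in [0, 1] for each code.
    """
    dim = len(codes[0])
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        for cell in range(cells):
            best = [0.0] * dim  # an empty cell contributes zeros
            for code, pos in zip(codes, positions):
                if min(int(pos * cells), cells - 1) == cell:
                    best = [max(b, abs(c)) for b, c in zip(best, code)]
            pooled.extend(best)
    return pooled


def fuse(global_feat, local_feat, weight=0.5):
    """Weight the two feature vectors and concatenate them (hypothetical weighting scheme)."""
    return [weight * g for g in global_feat] + [(1.0 - weight) * v for v in local_feat]


# Toy inputs: 2-D trajectory descriptors, a 2-word codebook, 2-D sparse codes.
codebook = [[0.0, 0.0], [1.0, 1.0]]
trajectory_descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]
sparse_codes = [[0.0, 0.5], [-0.8, 0.0], [0.2, 0.0]]
times = [0.1, 0.4, 0.9]

global_feature = bow_histogram(trajectory_descriptors, codebook)
local_feature = multiscale_max_pool(sparse_codes, times, levels=3)
interaction_feature = fuse(global_feature, local_feature, weight=0.4)
```

With three levels the pooled local feature has (1 + 2 + 4) cells times the code dimension, so the fused vector in this toy example has 2 + 14 = 16 entries. In practice the fusion weight would be chosen on validation data.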