Action Recognition Fused with RGB-D Multi-modal Features and Radius-Margin Bounded Extreme Learning
LIU Tianliang1, CHEN Kehu1, DAI Xiubin1, LUO Jiebo2
1.Jiangsu Provincial Key Laboratory of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2.Department of Computer Science, University of Rochester, Rochester 14627, USA
To address the limited discriminative ability of single-modal visual features across all categories of human actions, a human action recognition approach based on an exemplar multiple kernel learning-extreme learning machine (MKL-ELM) with multi-modal visual feature fusion from RGB-D videos is proposed. Firstly, robust and dense moving-pose features are extracted from human motion via skeleton surface fitting and dense trajectories. Sparse histogram of oriented principal components (SHOPC) features describing 3D body geometry are perceived from the normal planes of dense point clouds, and histogram of 3D gradient orientation (HOG3D) features embedding human appearance textures are extracted in the spatio-temporal neighborhoods of body joints in the given videos. A modified MKL-ELM with a radius-margin bound is exploited to fuse the given multi-modal visual features. Then, a set of representative exemplars for each human action is mined with the contrast data technique. Finally, each sample is hierarchically classified by the designed exemplar MKL-ELM model with a greedy prediction strategy to recognize human actions from the fused features and the given exemplars. Experiments show that, compared with traditional methods, the proposed action recognition method achieves significant advantages in classification accuracy and computational efficiency.
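To make the fusion-and-classification stage concrete, the following Python sketch illustrates one simplified reading of the pipeline: per-modality RBF kernels computed over the moving-pose, SHOPC and HOG3D descriptors are combined into a single kernel, which is then fed to a kernel ELM classifier. This is only a minimal sketch under stated assumptions, not the paper's method: the kernel weights are uniform placeholders rather than learned MKL weights, the radius-margin bound and the exemplar-based greedy hierarchical prediction are omitted, and all feature matrices are synthetic stand-ins.

# Minimal sketch of multi-kernel fusion with a kernel extreme learning machine (ELM).
# Assumptions (not from the paper): uniform kernel weights instead of learned MKL
# weights, RBF kernels per modality, no radius-margin term, synthetic toy features.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """RBF kernel matrix between row-wise feature sets X and Z."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

def fuse_kernels(kernels, weights=None):
    """Convex combination of per-modality kernel matrices."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))  # placeholder uniform weights
    return sum(w * K for w, K in zip(weights, kernels))

class KernelELM:
    """Kernel ELM classifier: beta = (I/C + K)^-1 T with one-hot targets T."""
    def __init__(self, C=10.0):
        self.C = C

    def fit(self, K_train, y):
        n = K_train.shape[0]
        T = np.eye(y.max() + 1)[y]                      # one-hot label matrix
        self.beta = np.linalg.solve(np.eye(n) / self.C + K_train, T)
        return self

    def predict(self, K_test_train):
        return np.argmax(K_test_train @ self.beta, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy per-modality descriptors standing in for moving-pose, SHOPC and HOG3D features.
    n_train, n_test = 60, 20
    y_train = rng.integers(0, 3, n_train)
    modalities_train = [rng.standard_normal((n_train, d)) + y_train[:, None] for d in (30, 20, 25)]
    modalities_test = [rng.standard_normal((n_test, d)) for d in (30, 20, 25)]

    K_train = fuse_kernels([rbf_kernel(X, X, 0.05) for X in modalities_train])
    K_test = fuse_kernels([rbf_kernel(Xt, X, 0.05)
                           for Xt, X in zip(modalities_test, modalities_train)])

    clf = KernelELM(C=10.0).fit(K_train, y_train)
    print(clf.predict(K_test))

In the reported approach, the uniform weights would instead be optimized under the radius-margin bounded MKL-ELM objective, and prediction would proceed hierarchically over the mined action exemplars with a greedy strategy.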
刘天亮,陈克虎,戴修斌,罗杰波. 融合RGB-D多模特征和半径边缘约束超限学习的动作识别[J]. 模式识别与人工智能, 2018, 31(10): 958-964.
LIU Tianliang, CHEN Kehu, DAI Xiubin, LUO Jiebo. Action Recognition Fused with RGB-D Multi-modal Features and Radius-Margin Bounded Extreme Learning. Pattern Recognition and Artificial Intelligence, 2018, 31(10): 958-964.
[1] POPPE R. A Survey on Vision-Based Human Action Recognition. Image and Vision Computing, 2010, 28(6): 976-990.
[2] 袁 立,田子茹.基于融合特征的行人再识别方法.模式识别与人工智能, 2017, 30(3): 269-278.
(YUAN L, TIAN Z R. Person Re-identification Based on Multi-feature Fusion. Pattern Recognition and Artificial Intelligence, 2017, 30(3): 269-278.)
[3] CIPPITELLI E, GASPARRINI S, GAMBI E, et al. A Human Activity Recognition System Using Skeleton Data from RGBD Sensors. Computational Intelligence and Neuroscience, 2016. DOI: 10.1155/2016/4351435.
[4] WANG H, KLÄSER A, SCHMID C, et al. Action Recognition by Dense Trajectories // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2011: 3169-3176.
[5] ZANFIR M, LEORDEANU M, SMINCHISESCU C. The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2013: 2752-2759.
[6] RAHMANI H, MAHMOOD A, HUYNH D Q, et al. HOPC: Histogram of Oriented Principal Components of 3D Pointclouds for Action Recognition // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2014: 742-757.
[7] SUNG J, PONCE C, SELMAN B, et al. Unstructured Human Activity Detection from RGBD Images // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2012: 842-849.
[8] LI X Q, ZHANG Y, LIAO D. Mining Key Skeleton Poses with Latent SVM for Action Recognition. Applied Computational Intelligence and Soft Computing, 2017. DOI: 10.1155/2017/5861435.
[9] YANG X D, ZHANG C Y, TIAN Y L. Recognizing Actions Using Depth Motion Maps-Based Histograms of Oriented Gradients // Proc of the 20th ACM International Conference on Multimedia. New York, USA: ACM, 2012: 1057-1060.
[10] YANG X D, TIAN Y L. Super Normal Vector for Activity Recognition Using Depth Sequences // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2014: 804-811.
[11] GÖNEN M, ALPAYDIN E. Multiple Kernel Learning Algorithms. Journal of Machine Learning Research, 2011, 12: 2211-2268.
[12] WANG C Y, WANG Y Z, YUILLE A L. An Approach to Pose-Based Action Recognition // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2013: 915-922.
[13] ZHANG M L, ZHOU Z H. ML-KNN: A Lazy Learning Approach to Multi-label Learning. Pattern Recognition, 2007, 40(7): 2038-2048.
[14] CORTES C, VAPNIK V. Support-Vector Networks. Machine Learning, 1995, 20(3): 273-297.
[15] DENG C W, HUANG G B, XU J, et al. Extreme Learning Machines: New Trends and Applications. Science China Information Sciences, 2015, 58(2): 1-16.
[16] LIN L, WANG K Z, ZUO W M, et al. A Deep Structured Model with Radius-Margin Bound for 3D Human Activity Recognition. International Journal of Computer Vision, 2016, 118(2): 256-273.
[17] 夏利民,时晓亭.基于关键帧的复杂人体动作行为识别.模式识别与人工智能, 2016, 29(2): 154-162.
(XIA L M, SHI X T. Recognition of Complex Human Behavior Based on Key Frames. Pattern Recognition and Artificial Intelligence, 2016, 29(2): 154-162.)
[18] WALLACH H M. Topic Modeling: Beyond Bag-of-Words // Proc of the 23rd International Conference on Machine Learning. New York, USA: ACM, 2006: 977-984.
[19] ESCALERA S, GONZÁLEZ J, BARÓ X, et al. Multi-modal Gesture Recognition Challenge 2013: Dataset and Results // Proc of the International Conference on Multimodal Interaction. New York, USA: ACM, 2013: 445-452.
[20] LI W Q, ZHANG Z Y, LIU Z C. Action Recognition Based on a Bag of 3D Points // Proc of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Washington, USA: IEEE, 2010: 9-14.
[21] LI L J, LI F F. What, Where and Who? Classifying Events by Scene and Object Recognition // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2007. DOI: 10.1109/ICCV.2007.4408872.