Discriminatively Trained Action Recognition Model Based on Hierarchical Part Tree
QIAN Yinzhong1,2,3, SHEN Yifan1,2
1. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433
2. School of Computer Science, Fudan University, Shanghai 200433
3. School of Software, Changzhou College of Information Technology, Changzhou 213164
Abstract: Action recognition based on body pose in static images is studied in this paper. A hierarchical part tree structure is proposed, in which each node is represented by a collection of poselets to capture its pose variations, and pairs of linked nodes are constrained to form a pictorial structure. Grounded on this structure, a discriminatively trained action recognition model based on the hierarchical part tree is presented. In addition to a deformation cost, the pairwise potential function of the model introduces a co-occurrence cost. Since each parent part contains its child parts and the relative position of linked nodes is modeled by a normal distribution, the matching procedure is inferred efficiently within the framework of distance transforms and message passing. Three models with different numbers of nodes, obtained by trimming the tree, are comparatively evaluated on two datasets. Experimental results demonstrate that the coarse parts in the first two layers are highly salient for action recognition, the body parts in the third layer further improve recognition performance, and the anatomical stick parts in the fourth layer contribute little to action recognition.
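The abstract describes tree-structured inference in which the pairwise potential combines a deformation cost (relative positions of linked nodes modeled as normal distributions) and a co-occurrence cost, solved by distance transforms and message passing. The following Python snippet is a minimal sketch of that idea, assuming a single poselet type per node, shared deformation parameters for all edges, and a brute-force message computation in place of the generalized distance transform; the names (Node, pass_message, score_tree) are illustrative and are not the authors' implementation.

```python
import numpy as np

class Node:
    """One part in the hierarchical tree, with an appearance score map."""
    def __init__(self, name, unary, children=None):
        self.name = name              # part name
        self.unary = unary            # HxW appearance (unary) score map
        self.children = children or []

def gaussian_deformation(h, w, mean, var):
    """Quadratic (log-Gaussian) penalty for placing the child at each
    location, centered on the expected offset from the parent."""
    ys, xs = np.mgrid[0:h, 0:w]
    return ((ys - mean[0]) ** 2 + (xs - mean[1]) ** 2) / (2.0 * var)

def pass_message(child_score, offset, var, cooccurrence):
    """Max-product message from child to parent.
    Brute-force stand-in for the generalized distance transform:
    for each parent location, take the best child location minus the
    deformation penalty, plus a constant co-occurrence bonus."""
    h, w = child_score.shape
    msg = np.full((h, w), -np.inf)
    for py in range(h):
        for px in range(w):
            penalty = gaussian_deformation(h, w,
                                           (py + offset[0], px + offset[1]),
                                           var)
            msg[py, px] = np.max(child_score - penalty) + cooccurrence
    return msg

def score_tree(node, offset=(0.0, 0.0), var=4.0, cooccurrence=0.5):
    """Bottom-up message passing: a node's score map is its unary map
    plus the messages collected from all of its children."""
    total = node.unary.copy()
    for child in node.children:
        total += pass_message(score_tree(child, offset, var, cooccurrence),
                              offset, var, cooccurrence)
    return total

# Toy usage: a two-layer tree (whole body -> upper / lower body).
h, w = 8, 8
rng = np.random.default_rng(0)
root = Node("body", rng.random((h, w)),
            [Node("upper_body", rng.random((h, w))),
             Node("lower_body", rng.random((h, w)))])
root_score = score_tree(root)
print("best root location:", np.unravel_index(np.argmax(root_score),
                                              root_score.shape))
```

In the full model each edge would carry its own learned mean offset, variance and co-occurrence weights over poselet mixtures, and the inner maximization would be computed in linear time with the generalized distance transform rather than the quadratic loop shown here.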