Temporal Group Deep Network Action Recognition Algorithm Based on Attention Mechanism

HU Zhengping1,2, DIAO Pengcheng1, ZHANG Ruixue1, LI Shufang1, ZHAO Mengyao1

1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; 2. Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao 066004
Abstract  Inspired by the mechanism of human visual perception, a temporal group deep network action recognition algorithm based on the attention mechanism is proposed within the deep learning framework. To address the inability of local temporal information to describe complex actions of long duration, a grouped sparse sampling strategy is adopted: the video is divided into groups and sparsely sampled, enabling video-level temporal modeling at a low computational cost. In the recognition stage, channel attention mapping is introduced to further exploit global feature information and capture classification-relevant interest points, and the channel features are recalibrated to improve the representational ability of the network. Experimental results on the UCF101 and HMDB51 datasets show that the proposed algorithm achieves high recognition accuracy.
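As a concrete illustration of the two components described in the abstract, the sketch below shows a grouped (segment-wise) sparse sampling routine and an SE-style channel-attention recalibration block in PyTorch. It is a minimal sketch under assumed settings, not the authors' implementation: the segment count, reduction ratio, and all function and class names are illustrative.

```python
# Minimal sketch (not the authors' implementation): grouped sparse sampling of
# video frames plus SE-style channel recalibration. `num_segments` and
# `reduction` are assumed hyperparameters chosen for illustration.
import torch
import torch.nn as nn


def sample_segment_indices(num_frames: int, num_segments: int) -> torch.Tensor:
    """Split the video into equal groups and draw one random frame per group."""
    seg_len = max(num_frames // num_segments, 1)
    offsets = torch.arange(num_segments) * seg_len        # start of each group
    jitter = torch.randint(0, seg_len, (num_segments,))   # random pick inside the group
    return (offsets + jitter).clamp_(max=num_frames - 1)  # one frame index per group


class ChannelAttention(nn.Module):
    """Channel-attention mapping that recalibrates feature channels (squeeze-and-excitation style)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from the backbone
        w = x.mean(dim=(2, 3))           # global average pooling over space ("squeeze")
        w = self.fc(w)                   # per-channel importance weights ("excitation")
        return x * w[:, :, None, None]   # recalibrate the channel responses


if __name__ == "__main__":
    frame_idx = sample_segment_indices(num_frames=300, num_segments=8)
    features = torch.randn(4, 256, 14, 14)   # dummy backbone features
    print(frame_idx.tolist(), ChannelAttention(256)(features).shape)
```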
Received: 17 June 2018
Fund: Supported by the General Program of the National Natural Science Foundation of China (No. 61771420) and the Natural Science Foundation of Hebei Province (No. F2016203422)
Corresponding Author: HU Zhengping, Ph.D., professor. His research interests include pattern recognition.
About the authors: DIAO Pengcheng, master student. His research interests include action recognition; ZHANG Ruixue, master student. Her research interests include video classification; LI Shufang, Ph.D. candidate. Her research interests include pattern recognition; ZHAO Mengyao, Ph.D. candidate. Her research interests include video anomaly detection.