Human Action Recognition Fusing Two-Stream Networks and SVM
TONG Anyang1,2, TANG Chao1,2, WANG Wenjian3
1. School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601
2. Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601
3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
Abstract: The traditional two-stream convolutional neural network struggles to model long-duration motion, and its generalization ability degrades when long-range temporal information is lost. Therefore, a human action recognition method fusing a two-stream network with a support vector machine is proposed. Firstly, the RGB image of each video frame and the vertical component of its corresponding dense optical flow are extracted, capturing the spatial and temporal information of the actions in the video. These inputs are used to pre-train the spatial-stream and temporal-stream networks, respectively, and features are extracted after pre-training. Secondly, the equal-dimension feature vectors extracted by the two streams are fused in parallel to strengthen the representation ability of the feature vectors. Finally, the fused feature vectors are fed into a linear support vector machine for training and classification. Experimental results on standard public datasets show that the proposed method achieves good classification performance.
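For illustration, the following is a minimal sketch of the pipeline summarized above, assuming OpenCV's Farnebäck dense optical flow, off-the-shelf torchvision backbones as stand-ins for the two pre-trained streams, and scikit-learn's LinearSVC as the linear support vector machine. All names, the backbones, and the element-wise fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative only): two-stream feature extraction,
# parallel fusion, and linear SVM classification. Assumes OpenCV,
# PyTorch/torchvision, and scikit-learn; the backbones and fusion rule
# are stand-ins, not the authors' exact networks.
import cv2
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

def vertical_flow(prev_gray, curr_gray):
    """Farnebäck dense optical flow between two grayscale frames;
    only the vertical (y) component is kept, as in the abstract."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow[..., 1]

# Stand-ins for the pre-trained spatial and temporal streams: ResNet-18
# truncated before its classifier, so each stream emits a 512-d vector.
# (A real temporal stream would take stacked flow maps as input, with
# its first convolution adapted to the number of flow channels.)
spatial_net = torch.nn.Sequential(
    *list(models.resnet18(weights="DEFAULT").children())[:-1]).eval()
temporal_net = torch.nn.Sequential(
    *list(models.resnet18(weights="DEFAULT").children())[:-1]).eval()

def fused_features(rgb_batch, flow_batch):
    """Extract equal-dimension features from both streams and fuse them
    in parallel; an element-wise average stands in for the paper's rule."""
    with torch.no_grad():
        f_spatial = spatial_net(rgb_batch).flatten(1)     # (N, 512)
        f_temporal = temporal_net(flow_batch).flatten(1)  # (N, 512)
    return ((f_spatial + f_temporal) / 2).numpy()

# Linear SVM trained on the fused features; X_* are fused feature
# matrices and y_* the action labels.
clf = LinearSVC(C=1.0)
# clf.fit(fused_features(rgb_train, flow_train), y_train)
# print(clf.score(fused_features(rgb_test, flow_test), y_test))
```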