Group Activity Recognition Based on Regional Feature Fusion Network
YANG Xingming1, FAN Loumiao1
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601
Abstract: Existing group activity recognition methods do not take full advantage of the spatial information in a scene, and their computational complexity is high. To address these problems, a group activity recognition method based on regional feature fusion is proposed. First, a convolutional neural network (CNN) extracts regional features of the scene; these features are then split, arranged, and combined into a series of regional feature sequences according to their spatial positions. Finally, a long short-term memory (LSTM) network fuses the regional feature sequences. In addition, multilevel and multimodal strategies are adopted to improve the performance of the proposed method. Experiments on the Collective Activity and Volleyball datasets show that the proposed method achieves better performance.
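The step of arranging regional features into sequences by spatial position can be illustrated with a minimal sketch. The abstract does not specify the actual arrangement, so the row-major and column-major scans, the function name `region_sequences`, and the toy grid below are all assumptions for illustration only; in the described method, such sequences would then be passed to an LSTM for fusion.

```python
from typing import Dict, List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [row][col] -> C-dimensional feature


def region_sequences(feature_map: FeatureMap) -> Dict[str, List[Vector]]:
    """Arrange the regional features of an H x W grid into ordered sequences.

    The two orderings (row-major and column-major scans) are illustrative
    assumptions, not the arrangement used in the paper.
    """
    if not feature_map or not feature_map[0]:
        raise ValueError("feature map must contain at least one region")
    height = len(feature_map)
    width = len(feature_map[0])
    if any(len(row) != width for row in feature_map):
        raise ValueError("feature map rows must have equal length")

    # Scan each row left to right, moving from the top row to the bottom row.
    row_major = [feature_map[r][c] for r in range(height) for c in range(width)]
    # Scan each column top to bottom, moving from the left column to the right.
    col_major = [feature_map[r][c] for c in range(width) for r in range(height)]
    return {"row_major": row_major, "col_major": col_major}


# Toy 2 x 3 grid in which each regional feature is simply [row, col].
toy_map = [[[float(r), float(c)] for c in range(3)] for r in range(2)]
sequences = region_sequences(toy_map)
```

Each sequence visits every region exactly once, so a recurrent model reading it sees the same features in a different spatial order; which orderings are useful is a design choice the abstract leaves open.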
Received: 26 July 2019

Fund: Supported by the Natural Science Foundation of Anhui Province (No. 1808085MF168)
Corresponding Author: YANG Xingming, Ph.D., associate professor. His research interests include computer control, the Internet of Things, image processing, and machine learning.

About the author: FAN Loumiao, master's student. His research interests include image processing and deep learning.