Temporal Action Unit Perception Based Open Set Action Recognition

doi:10.16451/j.cnki.issn1003-6059.202309004


Authors: YANG Kaixiang, GAO Junyu, FENG Yangbo, XU Changsheng


Abstract: Open set action recognition requires a model not only to classify the categories seen in the training set accurately, but also to reject unknown actions that never appear in the training set. Most existing methods treat an action as a whole and ignore the fact that it can be decomposed into finer-grained action units. To address this issue, a temporal action unit perception based method for open set action recognition is proposed in this paper. Firstly, an action unit relation module is designed to learn fine-grained action unit features and to obtain the relational patterns between actions and action units. Unknown actions are then identified by the different degrees to which known and unknown actions activate the action units. Secondly, an action unit temporal module is designed to model the temporal information of action units. The temporal characteristics of action units are exploited to further separate known and unknown actions that are confused because they look alike. Finally, the relational patterns and the temporal information of action units are considered jointly, so that the model can distinguish known actions from unknown actions. Experiments on three action recognition datasets show that the proposed method performs favorably.
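
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of rejecting a video whose action-unit activations are all weak. It is not the authors' model: the prototype vectors, the cosine-similarity activation, the max pooling over segments, the threshold value, and every function name are hypothetical choices made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def unit_activations(segments, unit_prototypes):
    """Peak similarity over all video segments, one value per hypothetical action unit."""
    return [max(cosine(seg, proto) for seg in segments) for proto in unit_prototypes]

def is_unknown(segments, unit_prototypes, threshold=0.8):
    """Reject the video as unknown when no action unit is strongly activated.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return max(unit_activations(segments, unit_prototypes)) < threshold

# Toy data: two action-unit prototypes in a 3-dimensional feature space.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
known_video = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.0]]    # segments close to the prototypes
unknown_video = [[0.1, 0.1, 1.0], [0.0, 0.2, 0.9]]   # segments far from both prototypes

print(is_unknown(known_video, prototypes))    # False
print(is_unknown(unknown_video, prototypes))  # True
```

This toy version uses only the strength of activation. The temporal module described in the abstract would additionally take the order in which action units occur into account, which this sketch does not attempt.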

Key words: Open Set Recognition, Action Recognition, Action Unit, Feature Alignment, Temporal Perception

Received: 2023-07-05

CLC number: TP183, TP391.41

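Funding: Supported by the National Natural Science Foundation of China (No. 62102415, 62036012, 62236008, U21B2044, 61721004, 62072286, 62072455, 62002355), the Beijing Natural Science Foundation (No. L201001), and the Open Research Project of Zhejiang Lab (No. 2022RC0AB02)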

Corresponding author: GAO Junyu, Ph.D., associate researcher. His research interests include computer vision and multimedia computing. E-mail: junyu.gao@nlpr.ia.ac.cn.

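About the authors: YANG Kaixiang, master's student. His research interests include open set action recognition. E-mail: 15005379839@139.com. FENG Yangbo, Ph.D. candidate. His research interests include computer vision, multimedia and action recognition. E-mail: ybfeng6@gmail.com. XU Changsheng, Ph.D., researcher. His research interests include multimedia analysis and retrieval, pattern recognition and computer vision. E-mail: csxu@nlpr.ia.ac.cn.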

Cite this article:

YANG Kaixiang, GAO Junyu, FENG Yangbo, XU Changsheng. Temporal Action Unit Perception Based Open Set Action Recognition. Pattern Recognition and Artificial Intelligence, 2023, 36(9): 806-817.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202309004 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2023/V36/I9/806
