Progress in Attribution-Guided Adaptive Visual Perception and Structure Understanding

doi:10.16451/j.cnki.issn1003-6059.202312004

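Abstract

Machines extract human-understandable information from the environment through adaptive perception, and thereby build human-like intelligence in open scenarios. Because attribute knowledge is class-agnostic, perception models and algorithms built on it have attracted wide attention. This paper first introduces the tasks related to attribute knowledge-guided adaptive visual perception and structure understanding and analyzes their applicable scenarios. It then summarizes representative work on four key aspects: 1) methods for extracting attribute knowledge of visual primitives, covering low-level geometric attributes and high-level cognitive attributes; 2) attribute knowledge-guided weakly supervised visual perception, including weakly supervised learning and unsupervised learning when data labels are limited; 3) unsupervised autonomous learning from images, including self-supervised contrastive learning and unsupervised commonality learning; 4) structured representation and understanding of scene images and its applications. Finally, it discusses the shortcomings of current research and analyzes valuable potential research directions, such as constructing large-scale multi-attribute benchmark datasets, multi-modal attribute knowledge extraction, scene generalization of attribute knowledge perception models, developing lightweight attribute knowledge-guided models, and practical applications of scene image representation.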


Keywords: Adaptive Perception, Structure Understanding, Attribute Knowledge, Weakly Supervised Learning, Unsupervised Learning

Abstract:

Machines extract human-understandable information from the environment via adaptive perception to build intelligent systems in open-world scenarios. Because attribute knowledge is class-agnostic, attribution-guided perception methods and models have been established and widely studied. In this paper, the tasks involved in attribution-guided adaptive visual perception and structure understanding are first introduced, and their applicable scenarios are analyzed. The representative research on four key aspects is then summarized. Basic visual attribute knowledge extraction methods cover low-level geometric attributes and high-level cognitive attributes. Attribute knowledge-guided weakly supervised visual perception includes weakly supervised learning and unsupervised learning under data label restrictions. Image self-supervised learning covers self-supervised contrastive learning and unsupervised commonality learning. Structured representation and understanding of scene images and their applications are introduced as well. Finally, challenges and potential research directions are discussed, such as the construction of large-scale benchmark datasets with multiple attributes, multi-modal attribute knowledge extraction, scene generalization of attribute knowledge perception models, the development of lightweight attribute knowledge-guided models, and the practical applications of scene image representation.

Key words: Adaptive Perception, Structure Understanding, Attribution Knowledge, Weakly-Supervised Learning, Unsupervised Learning

Received: 2023-10-07

CLC number:

TP 37

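Funding:

Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2018AAA0100400), the Tianjin Natural Science Foundation Distinguished Young Scholars Program (No. 20JCJQJC00020), the National Natural Science Foundation of China (No. 62325109, U21B2013), and the Fundamental Research Funds for the Central Universities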

Corresponding author: YANG Jufeng, Ph.D., professor. Main research interest: computer vision. E-mail: yangjufeng@nankai.edu.cn.

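About the authors: ZHANG Zhicheng, Ph.D. candidate. Main research interest: computer vision. E-mail: gloryzzc6@sina.com.
CHENG Mingming, Ph.D., professor. Main research interest: computer vision. E-mail: cmm@nankai.edu.cn.
LIN Weiyao, Ph.D., professor. Main research interest: computer vision. E-mail: wylin@sjtu.edu.cn.
TANG Jin, Ph.D., professor. Main research interest: computer vision. E-mail: tangjin@ahu.edu.cn.
LI Chenglong, Ph.D., professor. Main research interest: computer vision. E-mail: lcl1314@foxmail.com.
LIU Chenglin, Ph.D., research professor. Main research interests: pattern recognition, machine learning, and document analysis and recognition. E-mail: liucl@nlpr.ia.ac.cn.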

Cite this article:

ZHANG Zhicheng, YANG Jufeng, CHENG Mingming, LIN Weiyao, TANG Jin, LI Chenglong, LIU Chenglin. Progress in Attribution-Guided Adaptive Visual Perception and Structure Understanding. Pattern Recognition and Artificial Intelligence, 2023, 36(12): 1104-1126.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202312004 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2023/V36/I12/1104

[1] WU Y H, LIU Y, ZHAN X, et al.P2T: Pyramid Pooling Transformer for Scene Understanding. IEEE Transactions on Pattern Ana-lysis and Machine Intelligence, 2023, 45(11): 12760-12771.
[2] FAN D P, ZHANG J, XU G, et al.Salient Objects in Clutter. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2344-2366.
[3] LI S Y, LIU H B, QIAN R, et al.TA²N: Two-Stage Action Alignment Network for Few-Shot Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1404-1411.
[4] GUO W Y, ZHANG Y, YANG J F, et al.Re-Attention for Visual Question Answering. IEEE Transactions on Image Processing, 2021, 30: 6730-6743.
[5] WANG T, XU N, CHEN K A, et al.End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 10777-10786.
[6] LIU Y, GU Y C, ZHANG X Y, et al.Lightweight Salient Object Detection via Hierarchical Visual Perception Learning. IEEE Transactions on Cybernetics, 2021, 51(9): 4439-4449.
[7] FENG T L, LIU J X, YANG J F.Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 2850-2860.
[8] LECUN Y, BENGIO Y, HINTON G. Deep Learning. Nature, 2015, 521(7553): 436-444.
[9] HUANG H Z, WANG Y, HU Q H, et al.Class Specific Semantic Reconstruction for Open Set Recognition. IEEE Transactions on Pa-ttern Analysis and Machine Intelligence, 2023, 45(4): 4214-4228.
[10] GAN R T, FAN J S, WANG Y X, et al.Interact with Open Scenes: A Life-Long Evolution Framework for Interactive Segmentation Models // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 5688-5697.
[11] ZHU F, CHENG Z, ZHANG X Y, et al.OpenMix: Exploring Outlier Samples for Misclassification Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 12074-12083.
[12] LI J C, XIE C Y, WU X Y, et al. What Makes Good Open-Vocabulary Detector: A Disassembling Perspective[C/OL].[2023-09-20]. https://arxiv.org/pdf/2309.00227.pdf.
[13] MAO B J, ZHANG X B, WANG L F, et al.Learning from the Target: Dual Prototype Network for Few Shot Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1953-1961.
[14] LI X C, XIA X B, ZHU F, et al.Dynamics-Aware Loss for Lear-ning with Label Noise. Pattern Recognition, 2023, 144. DOI: 10.1016/j.patcog.2023.109835.
[15] CHENG Z, ZHU F, ZHANG X Y, et al.Adversarial Training with Distribution Normalization and Margin Balance. Pattern Recognition, 2023, 136. DOI: 10.1016/j.patcog.2022.109182.
[16] ZHANG Y, TI?O P, LEONARDIS A, et al. A Survey on Neural Network Interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence, 2021, 5(5): 726-742.
[17] WU Y H, GAO S H, MEI J, et al.JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation. IEEE Transactions on Image Processing, 2021, 30: 3113-3126.
[18] TANG K H, ZHANG H W, WU B Y, et al.Learning to Compose Dynamic Tree Structures for Visual Contexts // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 6612-6621.
[19] FAN J S, ZHANG Z X.Memory-Based Cross-Image Contexts for Weakly Supervised Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5): 6006-6020.
[20] HUANG Y, WANG Y M, ZENG Y N, et al. MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-Text Matching[C/OL].[2023-09-20]. https://papers.nips.cc/paper_files/paper/2022/file/3379ce104189b72d5f7baaa03ae81329-Paper-Conference.pdf.
[21] TIAN K, ZHANG C H, WANG Y, et al.Knowledge Mining and Transferring for Domain Adaptive Object Detection // Proc of the IEEE/CVF International Conference on Computer Vision. Wa-shington, USA: IEEE, 2021: 9113-9122.
[22] YU H Y, LI T, YU W C, et al.Regularized Graph Structure Learning with Semantic Knowledge for Multi-variates Time-Series Forecasting // Proc of the 31st International Joint Conference on Artificial Intelligence. San Francisco, USA: IJCAI, 2022: 2362-2368.
[23] FUKUSHIMA K.Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position-Neocognitron. IEICE Technical Report A, 1979, 62(10): 658-665.
[24] LOWE D G.Object Recognition from Local Scale-Invariant Features // Proc of the 7th IEEE International Conference on Computer Vision. Washington, USA: IEEE, 1999. DOI: 10.1109/ICCV.1999.790410
[25] LOWE D G.Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[26] DETONE D, MALISIEWICZ T, RABINOVICH A.SuperPoint: Self-Supervised Interest Point Detection and Description // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Washington, USA: IEEE, 2018: 337-349.
[27] LIU Y, SHEN Z H, LIN Z X, et al. GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs[C/OL].[2023-09-20]. https://arxiv.org/pdf/1911.05932.pdf.
[28] LIN W Y, HE X Y, DAI W R, et al.Key-Point Sequence Lossless Compression for Intelligent Video Analysis. IEEE MultiMedia, 2020, 27(3): 12-22.
[29] DUDA R O, HART P E.Use of Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, 1972, 15(1): 11-15.
[30] ZHANG Z H, LI Z X, BI N, et al.PPGNet: Learning Point-Pair Graph for Line Segment Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 7098-7107.
[31] XUE N, BAI S, WANG F D, et al.Learning Attraction Field Map for Robust Line Segment Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 1595-1603.
[32] LEE J T, KIM H U, LEE C, et al.Semantic Line Detection and its Applications // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 3249-3257.
[33] HAN Q, ZHAO K, XU J, et al.Deep Hough Transform for Semantic Line Detection // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 249-265.
[34] ZHAO K, HAN Q, ZHANG C B, et al.Deep Hough Transform for Semantic Line Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 4793-4806.
[35] KANOPOULOS N, VASANTHAVADA N, BAKER R L.Design of an Image Edge Detection Filter Using the Sobel Operator. IEEE Journal of Solid-State Circuits, 1988, 23(2): 358-367.
[36] SORIA X, RIBA E, SAPPA A.Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection // Proc of the IEEE Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2020: 1912-1921.
[37] LIU C, YANG J M, CEYLAN D, et al.PlaneNet: Piece-Wise Planar Reconstruction From a Single RGB Image // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2018: 2579-2588.
[38] LIU C, KIM K, GU J W, et al.PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 4445-4454.
[39] WANG T, LING H B.Gracker: A Graph-Based Planar Object Tracker. IEEE Transactions on Pattern Analysis and Machine Inte-lligence, 2017, 40(6): 1494-1501.
[40] ZHANG Z C, LIU S Z, YANG J F.Multiple Planar Object Trac-king // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 23460-23470.
[41] ZHANG Z C, CHEN S, WANG Z C, et al.PlaneSeg: Building a Plug-In for Boosting Planar Region Segmentation. IEEE Transactions on Neural Networks and Learning Systems, 2023. DOI: 10.1109/TNNLS.2023.3262544
[42] LIU J J, LIU Z A, PENG P, et al.Rethinking the U-Shape Structure for Salient Object Detection. IEEE Transactions on Image Processing, 2021, 30: 9030-9042.
[43] WU Y H, LIU Y, ZHANG L, et al.EDN: Salient Object Detection via Extremely-Downsampled Network. IEEE Transactions on Image Processing, 2022, 31: 3125-3136.
[44] WU Y H, LIU Y, ZHANG L, et al.Regularized Densely-Connec-ted Pyramid Network for Salient Instance Segmentation. IEEE Tran-sactions on Image Processing, 2021, 30: 3897-3907.
[45] LIU J J, HOU Q B, LIU Z A, et al.PoolNet+: Exploring the Potential of Pooling for Salient Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 887-904.
[46] FAN D P, CHENG M M, LIU Y, et al.Structure-Measure: A New Way to Evaluate Foreground Maps// Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 4558-4567.
[47] ZHOU T, FAN D P, CHENG M M, et al.RGB-D Salient Object Detection: A Survey. Computational Visual Media, 2021, 7: 37-69.
[48] FAN D P, ZHAI Y J, BORJI A, et al.BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network // Proc of the 16th European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 275-292.
[49] ZHAI Y J, FAN D P, YANG J F, et al.Bifurcated Backbone Strategy for RGB-D Salient Object Detection. IEEE Transactions on Image Processing, 2021, 30: 8727-8742.
[50] WU Y H, LIU Y, XU J, et al.MobileSal: Extremely Efficient RGB-D Salient Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 10261-10269.
[51] GAO S H, TAN Y Q, CHENG M M, et al.Highly Efficient Salient Object Detection with 100K Parameters // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 702-721.
[52] CHENG M M, GAO S H, BORJI A, et al.A Highly Efficient Model to Study the Semantics of Salient Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 8006-8021.
[53] MINSKY M.The Society of Mind. New York, USA: Simon & Schuster, 1988.
[54] LIU S Z, ZHANG X, YANG J F.SER30K: A Large-Scale Dataset for Sticker Emotion Recognition // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 33-41.
[55] ZHAO S J, GE Y X, QI Z A, et al. Sticker820K: Empowering Interactive Retrieval with Stickers[C/OL].[2023-09-20]. https://arxiv.org/pdf/2306.06870.pdf.
[56] WANG L J, GUO W Y, YAO X X, et al.Multimodal Event-Aware Network for Sentiment Analysis in Tourism. IEEE MultiMedia, 2021, 28(2): 49-58.
[57] WEN C S, JIA G L, YANG J F.DIP: Dual Incongruity Perceiving Network for Sarcasm Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 2540-2550.
[58] ESTRADA M L B, CABADA R Z, BUSTILLOS R O, et al. Opi-nion Mining and Emotion Recognition Applied to Learning Environments. Expert Systems with Applications, 2020, 150. DOI: 10.1016/j.eswa.2020.113265.
[59] BORTH D, CHEN T, JI R T, et al.SentiBank: Large-Scale Ontology and Classifiers for Detecting Sentiment and Emotions in Visual Content // Proc of the 21st ACM International Conference on Multimedia. New York, USA: ACM, 2013: 459-460.
[60] SUN M, YANG J F, WANG K, et al.Discovering Affective Regions in Deep Convolutional Neural Networks for Visual Sentiment Prediction // Proc of the IEEE International Conference on Multi-media and Expo. Washington, USA: IEEE, 2016. DOI: 10.1109/ICME.2016.7552961.
[61] SHE D Y, YANG J F, CHENG M M, et al.WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection. IEEE Transactions on Multimedia, 2020, 22(5): 1358-1371.
[62] YANG Y, JIA J, ZHANG S M, et al.How Do Your Friends on Social Media Disclose Your Emotions? Proceedings of the AAAI Conference on Artificial Intelligence, 2014, 28(1): 306-312.
[63] YANG J Y, LI J, LI L D, et al.Seeking Subjectivity in Visual Emo-tion Distribution Learning. IEEE Transactions on Image Processing, 2022, 31: 5189-5202.
[64] WANG L J, ZHANG X, JIANG N, et al.D²S: Dynamic Distribution Supervision for Multi-label Facial Expression Recognition // Proc of the IEEE International Conference on Multimedia and Expo. Washington, USA: IEEE, 2022. DOI: 10.1109/ICME52920.2022.9859687.
[65] YANG J F, SUN M, SUN X X.Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network. Proceedings of the AAAI Conference on Artificial Intelligence, 2017, 31(1): 224-230.
[66] YANG J F, SHE D Y, SUN M.Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network // Proc of the 26th International Joint Conference on Artificial Intelligence. San Francisco, USA: IJCAI, 2017: 3266-3272.
[67] YANG J F, SHE D Y, LAI Y K, et al.Retrieving and Classifying Affective Images via Deep Metric Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 491-498.
[68] YAO X X, SHE D Y, ZHAO S C, et al.Attention-Aware Polarity Sensitive Embedding for Affective Image Retrieval // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2019: 1140-1150.
[69] JIA G L, YANG J F.S²-VER: Semi-Supervised Visual Emotion Recognition // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 493-509.
[70] YOU Q Z, LUO J B, JIN H L, et al.Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks. Proceedings of the AAAI conference on Artificial Intelligence, 2015, 29(1): 381-388.
[71] PAN J C, WANG S F, FANG L.Representation Learning through Multimodal Attention and Time-Sync Comments for Affective Video Content Analysis // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 42-50.
[72] ZHAO S C, JIA G L, YANG J F, et al.Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies. IEEE Signal Processing Magazine, 2021, 38(6): 59-73.
[73] ZHAO S C, MA Y S, GU Y, et al.An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(1): 303-311.
[74] ZHANG Z C, YANG J F.Temporal Sentiment Localization: Listen and Look in Untrimmed Videos // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 199-208.
[75] ZHANG Z C, WANG L J, YANG J F.Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 18888-18897.
[76] LI P, YANG Y, ZHAO W D, et al.Evaluation of Image Fire Detection Algorithms Based on Image Complexity. Fire Safety Journal, 2021, 121. DOI: 10.1016/j.firesaf.2021.103306.
[77] DAI L C, ZHANG K, ZHENG X S, et al.Visual Complexity of Shapes: A Hierarchical Perceptual Learning Model. The Visual Computer, 2022, 38: 419-432.
[78] OLIVIA A, MACK M L, SHRESTHA M, et al.Identifying the Perceptual Dimensions of Visual Complexity of Scenes // Proc of the Annual Meeting of the Cognitive Science Society. New York, USA: ACM, 2004: 1041-1046.
[79] CHEN Y Q, DUAN J, ZHU Y, et al.Research on the Image Complexity Based on Neural Network // Proc of the International Conference on Machine Learning and Cybernetics. Washington, USA: IEEE, 2015: 295-300.
[80] SARAEE E, JALAL M, BETKE M.Visual Complexity Analysis Using Deep Intermediate-Layer Features. Computer Vision and Image Understanding, 2020, 195. DOI: 10.1016/j.cviu.2020.102949.
[81] FENG T L, ZHAI Y J, YANG J F, et al.IC9600: A Benchmark Dataset for Automatic Image Complexity Assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8577-8593.
[82] KHOSLA A, RAJU A S, TORRALBA A, et al.Understanding and Predicting Image Memorability at a Large Scale // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2015: 2390-2398.
[83] SHOKRI R, STRONATI M, SONG C Z, et al.Membership Infe-rence Attacks against Machine Learning Models // Proc of the IEEE Symposium on Security and Privacy. Washington, USA: IEEE, 2017: 3-18.
[84] ARPIT D, JASTRZ?BSKI S, BALLAS N, et al. A Closer Look at Memorization in Deep Networks // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 233-242.
[85] WANG Z, BOVIK A C, SHEIKH H R, et al.Image Quality Asse-ssment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[86] GIROD B.What's Wrong with Mean Squared Error // WASTON A B, ed. Digital Images and Human Vision. Cambridge, USA: MIT Press, 1993: 207-220.
[87] ZHANG R, ISOLA P, EFROS A A, et al.The Unreasonable Effe-ctiveness of Deep Features as a Perceptual Metric // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2018: 586-595.
[88] DING K Y, MA K D, WANG S Q, et al.Image Quality Assessment: Unifying Structure and Texture Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(5): 2567-2581.
[89] ROY S, MITRA S, BISWAS S, et al.Test Time Adaptation for Blind Image Quality Assessment // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 16742-16751.
[90] LIU X L, VAN DE WEIJER J, BAGDANOV A D. RankIQA: Learning from Rankings for No-Reference Image Quality Assessment // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 1040-1049.
[91] SU S L, YAN Q S, ZHU Y, et al.Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 3664-3673.
[92] ZHANG W X, LI D Q, MA C, et al.Continual Learning for Blind Image Quality Assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 2864-2878.
[93] KANG L, YE P, LI Y, et al.Convolutional Neural Networks for No-Reference Image Quality Assessment // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2014: 1733-1740.
[94] PAN D, SHI P, HOU M, et al.Blind Predicting Similar Quality Map for Image Quality Assessment // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2018: 6373-6382.
[95] MURRAY N, MARCHESOTTI L, PERRONNIN F.AVA: A Large-Scale Database for Aesthetic Visual Analysis // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2012: 2408-2415.
[96] ZHANG X D, GAO X B, LU W, et al.A Gated Peripheral-Foveal Convolutional Neural Network for Unified Image Aesthetic Prediction. IEEE Transactions on Multimedia, 2019, 21(11): 2815-2826.
[97] ZHUANG B H, LIU L Q, LI Y, et al.Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning from Web Data // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 2915-2924.
[98] NAYAK G, GHOSH R, JIA X W, et al. Weakly Supervised Cla-ssification Using Group-Level Labels[C/OL].[2023-09-20]. https://arxiv.org/pdf/2108.07330v1.pdf.
[99] XU Y H, QIAN Q, LI H, et al.Weakly Supervised Representation Learning with Coarse Labels // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 10573-10581.
[100] JIANG B, WANG L L, CHENG J, et al.GPENs: Graph Data Learning with Graph Propagation-Embedding Networks. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(8): 3925-3938.
[101] ZHENG Z H, YE R G, WANG P, et al.Localization Distillation for Dense Object Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 9397-9406.
[102] ZHANG S Q, LI C L, JIA Z, et al.DiagIoU Loss for Object Detection. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(12): 7671-7683.
[103] CHEN Z M, CHEN K, LIN W Y, et al.PIoU Loss: Towards Accu-rate Oriented Object Detection in Complex Environments // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 195-211.
[104] ZHANG D W, HAN J W, CHENG G, et al.Weakly Supervised Object Localization and Detection: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(9): 5866-5885.
[105] SHAO F F, CHEN L, SHAO J, et al.Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey. Neurocomputing, 2022, 496: 192-207.
[106] ZHANG Y M, CHEN T.Weakly Supervised Object Recognition and Localization with Invariant High Order Features[C/OL]. [2023-09-20].https://bmvc10.dcs.aber.ac.uk/proc/conference/paper47/paper47.pdf.
[107] TANG Y X, WANG X F, DELLANDREA E, et al.Fusing Generic Objectness and Deformable Part-Based Models for Weakly Supervised Object Detection // Proc of the IEEE International Conference on Image Processing. Washington, USA: IEEE, 2014: 4072-4076.
[108] SIVA P, RUSSELL C, XIANG T, et al.Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2013: 3238-3245.
[109] SHI Z Y, HOSPEDALES T M, XIANG T.Bayesian Joint Mode-lling for Object Localisation in Weakly Labelled Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(10): 1959-1972.
[110] CINBIS R G, VERBEEK J, SCHMID C.Multi-fold MIL Training for Weakly Supervised Object Localization // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Wa-shington, USA: IEEE, 2014: 2409-2416.
[111] DESELAERS T, ALEXE B, FERRARI V.Localizing Objects While Learning Their Appearance // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2010: 452-466.
[112] SINGH K K, XIAO F Y, LEE Y J.Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 3548-3556.
[113] LI D, HUANG J B, LI Y L, et al.Weakly Supervised Object Localization with Progressive Domain Adaptation // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 3512-3520.
[114] SHI M J, CAESAR H, FERRARI V.Weakly Supervised Object Localization Using Things and Stuff Transfer // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 3401-3410.
[115] ZHANG D W, HAN J W, ZHAO L, et al.Leveraging Prior-Knowledge for Weakly Supervised Object Detection under a Collaborative Self-Paced Curriculum Learning Framework. International Journal of Computer Vision, 2019, 127: 363-380.
[116] SANG H B, NI Z L, HE H Y, et al.Trace-Level Invisible Enhanced Network for 6D Pose Estimation // Proc of the IEEE International Conference on Multimedia and Expo. Washington, USA: IEEE, 2022. DOI: 10.1109/ICME52920.2022.9859613.
[117] JIANG P T, HAN L H, HOU Q B, et al.Online Attention Accumulation for Weakly Supervised Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 7062-7077.
[118] LIU Y, WU Y H, WEN P S, et al.Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(3): 1415-1428.
[119] LIN Z, DUAN Z P, ZHANG Z, et al.KnifeCut: Refining Thin Part Segmentation with Cutting Lines // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 809-817.
[120] 侯淇彬,韩凌昊,刘姜江,等.互联网图像驱动的语义分割自主学习.中国科学(信息科学), 2021, 51(7): 1084-1099.
(HOU Q B, HAN L H, LIU J J, et al.Autonomous Learning of Semantic Segmentation from Internet Images. Scientia Sinica Informationis, 2021, 51(7): 1084-1099.)
[121] MEI J, CHENG M M, XU G, et al.SANet: A Slice-Aware Network for Pulmonary Nodule Detection. IEEE Transactions on Pa-ttern Analysis and Machine Intelligence, 2022, 44(8): 4374-4387.
[122] CHEN J, LI Z H, LUO J B, et al.Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection // Proc of the IEEE/CVF Conference on Computer Vision and Pa-ttern Recognition. Washington, USA: IEEE, 2020: 9898-9908.
[123] LIU Q, RAMANATHAN V, MAHAJAN D, et al.Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 13963-13973.
[124] LI Y X, XU N, YANG W J, et al.Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective. International Journal of Computer Vision, 2022, 130(10): 2408-2424.
[125] WANG X L, JABRI A, EFROS A A.Learning Correspondence from the Cycle-Consistency of Time // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 2566-2576.
[126] LI X T, LIU S F, DE MELLO S, et al.Joint-Task Self-Supervised Learning for Temporal Correspondence // Proc of the 33rd International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2019: 318-328.
[127] YAN L Q, WANG Q F, MA S Q, et al.Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(1): 393-406.
[128] LIN F C, XIE H T, LIU C B, et al.Bilateral Temporal Re-Aggregation for Weakly-Supervised Video Object Segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4498-4512.
[129] LIU P D, HE Z B, YAN X Y, et al.WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations // Proc of the 29th ACM International Conference on Multimedia. New York, USA: ACM, 2021: 2995-3004.
[130] ZHANG Z, JIN W D, XU J, et al.Gradient-Induced Co-Saliency Detection // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 455-472.
[131] LI Y X, LIN W Y, WANG T, et al.Video Summarization via Cluster-Based Object Tracking and Type-Based Synopsis // Proc of the IEEE Conference on Multimedia Information Processing and Retrieval. Washington, USA: IEEE, 2020: 113-116.
[132] LOCATELLO F, WEISSENBORN D, UNTERTHINER T, et al. Object-Centric Learning with Slot Attention[C/OL].[2023-09-20]. https://arxiv.org/pdf/2006.15055.pdf.
[133] VO V H, SIZIKOVA E, SCHMID C, et al. Large-Scale Unsupervised Object Discovery[C/OL].[2023-09-20]. https://arxiv.org/abs/2106.06650.
[134] GAO S H, LI Z Y, YANG M H, et al.Large-Scale Unsupervised Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 7457-7476.
[135] YAO X X, ZHAO S C, XU P F, et al.Multi-source Domain Ada-ptation for Object Detection // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 3253-3262.
[136] QIAN R, LI Y X, LIU H B, et al.Enhancing Self-Supervised Video Representation Learning via Multi-level Feature Optimization // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 7970-7981.
[137] CHEN S, XUE J H, CHANG J L, et al.SSL++: Improving Self-supervised Learning by Mitigating the Proxy Task-Specificity Problem. IEEE Transactions on Image Processing, 2021, 31: 1134-1148.
[138] YAO X X, ZHAO S C, LAI Y K, et al.APSE: Attention-Aware Polarity-Sensitive Embedding for Emotion-Based Image Retrieval. IEEE Transactions on Multimedia, 2020, 23: 4469-4482.
[139] VAN DEN OORD A, KALCHBRENNER N, ESPEHOLT L, et al. Conditional Image Generation with PixelCNN Decoders // Proc of the 32nd International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2016: 4797-4805.
[140] VAN DEN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel Recurrent Neural Networks // Proc of the 33rd International Conference on Machine Learning. San Diego, USA: JMLR, 2016: 1747-1756.
[141] KINGMA D P, DHARIWAL P.Glow: Generative Flow with Invertible 1×1 Convolutions // Proc of the 32nd International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2016: 10236-10245.
[142] DINH L, SOHL-DICKSTEIN J, BENGIO S.Density Estimation Using Real NVP[C/OL]. [2023-09-20].https://arxiv.org/pdf/1605.08803.pdf.
[143] VAN DEN OORD A, VINYALS O, KAVUKCUOGLU K, et al. Neural Discrete Representation Learning // Proc of the 31st International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 6309-6318.
[144] HE K M, CHEN X L, XIE S N, et al. Masked Autoencoders Are Scalable Vision Learners // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 15979-15988.
[145] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proc of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Long and Short Papers). Stroudsburg, USA: ACL, 2019: 4171-4186.
[146] WEI C, FAN H Q, XIE S N, et al. Masked Feature Prediction for Self-Supervised Visual Pre-Training // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 14648-14658.
[147] LI S Y, WU D, WU F, et al. Architecture-Agnostic Masked Image Modeling - From ViT Back to CNN // Proc of the 40th International Conference on Machine Learning. San Diego, USA: JMLR, 2023: 20149-20167.
[148] HOU Z J, SUN F, CHEN Y K, et al. MILAN: Masked Image Pretraining on Language Assisted Representation[C/OL]. [2023-09-20]. https://arxiv.org/pdf/2208.06049.pdf.
[149] ZENG D L, LIAO M Y, TAVAKOLIAN M, et al. Deep Learning for Scene Classification: A Survey[C/OL]. [2023-09-20]. https://arxiv.org/abs/2101.10531.
[150] SIAGIAN C, ITTI L. Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(2): 300-312.
[151] REN Z, QIAN K, ZHANG Z X, et al. Deep Scalogram Representations for Acoustic Scene Classification. IEEE/CAA Journal of Automatica Sinica, 2018, 5(3): 662-669.
[152] WANG L, SNG D. Deep Learning Algorithms with Applications to Video Analytics for A Smart City: A Survey[C/OL]. [2023-09-20]. https://arxiv.org/abs/1512.03131.
[153] LÓPEZ-CIFUENTES A, ESCUDERO-VIÑOLO M, BESCÓS J, et al. Semantic-Aware Scene Recognition. Pattern Recognition, 2020, 102. DOI: 10.1016/j.patcog.2020.107256.
[154] TONG Z H, SHI D X, YAN B Z, et al. A Review of Indoor-Outdoor Scene Classification // Proc of the 2nd International Conference on Control, Automation and Artificial Intelligence. New York, USA: ACM, 2017: 469-474.
[155] CHENG G, HAN J W, LU X Q. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proceedings of the IEEE, 2017, 105(10): 1865-1883.
[156] XIA G S, HU J W, HU F, et al. AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965-3981.
[157] MESAROS A, HEITTOLA T, VIRTANEN T. TUT Database for Acoustic Scene Classification and Sound Event Detection // Proc of the 24th European Signal Processing Conference. Washington, USA: IEEE, 2016: 1128-1132.
[158] LOWRY S, SÜNDERHAUF N, NEWMAN P, et al. Visual Place Recognition: A Survey. IEEE Transactions on Robotics, 2016, 32(1): 1-19.
[159] ARANDJELOVIC R, GRONAT P, TORII A, et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1437-1451.
[160] BROWN M, SÜSSTRUNK S. Multispectral SIFT for Scene Category Recognition // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2011: 177-184.
[161] VISWANATHAN D G. Features from Accelerated Segment Test (FAST) [C/OL]. [2023-09-20]. https://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV1011/AV1FeaturefromAcceleratedSegmentTest.pdf.
[162] BAY H, ESS A, TUYTELAARS T, et al. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 2008, 110(3): 346-359.
[163] JEEVAN P P, VISWANATHAN K, ANANDU A S, et al. WaveMix: A Resource-Efficient Neural Network for Image Analysis[C/OL]. [2023-09-20]. https://arxiv.org/abs/2205.14375.
[164] WANG Q L, XIE J T, ZUO W M, et al. Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2582-2597.
[165] ZHANG Y F, LI Y X, ZHAO M B, et al. Object Distance Estimation Based on Stereo Regional Disparity Regression. Journal of Image and Graphics, 2021, 26(7): 1604-1613. (in Chinese)
[166] ZHANG Y F, LI Y X, ZHAO M B, et al. A Regional Regression Network for Monocular Object Distance Estimation // Proc of the IEEE International Conference on Multimedia and Expo Workshops. Washington, USA: IEEE, 2020. DOI: 10.1109/ICMEW46912.2020.9106012.
[167] LIU H B, LI J G, LI D, et al. Learning Scale-Consistent Attention Part Network for Fine-Grained Image Recognition. IEEE Transactions on Multimedia, 2021, 24: 2902-2913.
[168] FAN J H, LIU H B, YANG W J, et al. Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 4643-4651.
[169] GUO M H, LU C Z, HOU Q B, et al. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation[C/OL]. [2023-09-20]. https://arxiv.org/pdf/2209.08575v1.pdf.
[170] MEI J, LI R J, GAO W, et al. CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery. IEEE Transactions on Image Processing, 2021, 30: 8540-8552.
[171] PRANGEMEIER T, REICH C, KOEPPL H. Attention-Based Transformers for Instance Segmentation of Cells in Microstructures // Proc of the IEEE International Conference on Bioinformatics and Biomedicine. Washington, USA: IEEE, 2020: 700-707.
[172] LI S Y, LIU H B, FEI M J, et al. Temporal Alignment via Event Boundary for Few-shot Action Recognition[C/OL]. [2023-09-20]. https://www.bmvc2021-virtualconference.com/assets/papers/0878.pdf.
[173] LIU H B, LÜ W X, SEE J, et al. Task-adaptive Spatial-Temporal Video Sampler for Few-Shot Action Recognition // Proc of the 30th ACM International Conference on Multimedia. New York, USA: ACM, 2022: 6230-6240.
[174] LI Y X, LIN W Y, SEE J, et al. CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 510-527.
[175] LI Y X, ZHANG B S, LI J, et al. LSTC: Boosting Atomic Action Detection with Long-Short-Term Context // Proc of the 29th ACM International Conference on Multimedia. New York, USA: ACM, 2021: 2158-2166.
[176] QIAN R, HU D, DINKEL H, et al. Multiple Sound Sources Localization from Coarse to Fine // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 292-308.
[177] LI C L, LIU L, LU A D, et al. Challenge-Aware RGBT Tracking // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 222-237.
[178] CHANG X J, REN P Z, XU P F, et al. A Comprehensive Survey of Scene Graphs: Generation and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 1-26.
[179] ZAREIAN A, KARAMAN S, CHANG S F. Bridging Knowledge Graphs to Generate Scene Graphs // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 606-623.
[180] LI H S, ZHU G M, ZHANG L, et al. Scene Graph Generation: A Comprehensive Survey. Neurocomputing, 2023. DOI: 10.1016/j.neucom.2023.127052.
[181] LI Y K, OUYANG W L, ZHOU B L, et al. Factorizable Net: An Efficient Subgraph-Based Framework for Scene Graph Generation // Proc of the European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 346-363.
[182] LAFFERTY J, MCCALLUM A, PEREIRA F C N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data // Proc of the 18th International Conference on Machine Learning. San Diego, USA: JMLR, 2001: 282-289.
[183] BORDES A, USUNIER N, GARCIA-DURAN A, et al. Translating Embeddings for Modeling Multi-relational Data // Proc of the 26th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2013: 2787-2795.
[184] DAI B, ZHANG Y Q, LIN D H. Detecting Visual Relationships with Deep Relational Networks // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 3298-3308.
[185] CONG W L, WANG W, LEE W C. Scene Graph Generation via Conditional Random Fields[C/OL]. [2023-09-20]. https://arxiv.org/pdf/1811.08075.pdf.
[186] ZHANG H W, KYAW Z, CHANG S F, et al. Visual Translation Embedding Network for Visual Relation Detection // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 3107-3115.
[187] HUNG Z S, MALLYA A, LAZEBNIK S. Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 3820-3832.
[188] XU D F, ZHU Y K, CHOY C B, et al. Scene Graph Generation by Iterative Message Passing // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 3097-3106.
[189] PLUMMER B A, MALLYA A, CERVANTES C M, et al. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 1946-1955.