模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2022 Vol. 35, Issue 6, Published 2022-06-25

Deep Learning Based Object Detection and Recognition
483 Multi-scale Gradient Adversarial Examples Generation Network
SHI Lei, ZHANG Xiaohan, HONG Xiaopeng, LI Jiliang, DING Wenjie, SHEN Chao
Traditional person re-identification (ReID) adversarial attack methods hold some limitations, such as dependence on the registry (gallery) to generate adversarial examples and a single mode of example generation. To address these problems, an efficient ReID adversarial attack model, the multi-scale gradient adversarial example generation network (MSG-AEGN), is put forward. MSG-AEGN is based on multi-scale gradient adversarial networks. A multi-scale network structure is adopted to obtain different semantic levels of the input images and the intermediate features of the generator. An attention module converts the intermediate features of the generator into multi-scale weights, thereby modulating the image pixels. Finally, the network outputs high-quality adversarial examples to confuse ReID models. On this basis, an improved adversarial loss function based on the average distance of image features and the triplet loss is proposed to constrain and guide the training of MSG-AEGN. Experiments on three pedestrian ReID datasets, namely Market1501, CUHK03 and DukeMTMC-ReID, show that the proposed method produces promising attack effects on mainstream ReID models based on both deep convolutional neural networks and transformer networks. Moreover, MSG-AEGN requires low attack energy and achieves high structural similarity between adversarial examples and the original images.
2022 Vol. 35 (6): 483-496 [Abstract] ( 618 ) [HTML 1KB] [ PDF 2839KB] ( 323 )
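The improved adversarial loss above combines an average feature-distance term with the standard triplet loss. A minimal sketch of the triplet term on L2 feature distances (the margin value is illustrative, not taken from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet margin loss: push the anchor-negative distance to exceed
    the anchor-positive distance by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In an attack setting such as MSG-AEGN, a loss of this shape is optimized in reverse: the generator perturbs the image so that features of the same identity drift apart while impostor features move closer.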
497 Expression Recognition Based on Residual Attention Mechanism and Pyramid Convolution
BAO Zhilong, CHEN Huahui
With the widespread application of deep learning, facial expression recognition technology has developed rapidly. However, extracting multi-scale features and exploiting key features efficiently remains a challenge for facial expression recognition networks. To solve these problems, pyramid convolution is employed to extract multi-scale features effectively, and a spatial-channel attention mechanism is introduced to enhance the expression of key features. An expression recognition network based on the residual attention mechanism and pyramid convolution is constructed to improve recognition accuracy. A multi-task convolutional neural network is utilized for face detection, cropping and alignment, and the preprocessed images are then fed into the feature extraction network. Meanwhile, the network is trained by combining softmax loss and center loss to narrow the differences within the same expression and enlarge the distances between different expressions. Experiments show that the proposed network achieves high accuracy on the Fer2013 and CK+ datasets with a small number of parameters, making it well suited to realistic expression recognition scenarios.
2022 Vol. 35 (6): 497-506 [Abstract] ( 354 ) [HTML 1KB] [ PDF 892KB] ( 363 )
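The joint supervision above adds a center-loss term to the softmax cross-entropy, pulling each feature toward the center of its class. A minimal sketch of the center-loss term (shapes and the 1/2 weighting follow the common formulation; the combination weight with softmax loss is a tunable hyperparameter not shown here):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared L2 distance between each
    feature vector and the center of its ground-truth class.
    features: (N, D), labels: (N,), centers: (num_classes, D)."""
    diffs = features - centers[labels]            # (N, D)
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))
```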
507 Lightweight Face Detection Algorithm with Multi-scale Feature Fusion
WANG Jian, SONG Xiaoning
Due to the limited computing capacity and storage resources of mobile devices, designing an efficient and high-precision face detector remains an open challenge. In this paper, a lightweight face detection algorithm with multi-scale feature fusion (LFDMF) is proposed, in which the multi-level detection structure, usually regarded as a core component of face detection, is removed. Firstly, an existing lightweight backbone network is introduced to encode the input image. Then, the proposed neck network is utilized to expand the receptive field of the feature map, fusing multi-scale information from different receptive fields into a single-level feature map. Finally, the proposed multi-task-sensitive detection head performs face classification, bounding-box regression and key-point detection on this single-level feature map. Compared with face detectors such as RetinaFace and DSFD, LFDMF achieves higher accuracy with a lower computational burden. LFDMF is built in three sizes: the large model, LFDMF-L, achieves state-of-the-art performance on the Wider Face dataset, while the medium model, LFDMF-M, and the small model, LFDMF-S, deliver impressive performance with few parameters and little computation.
2022 Vol. 35 (6): 507-515 [Abstract] ( 461 ) [HTML 1KB] [ PDF 1626KB] ( 451 )
516 Cross-Domain Person Re-identification Method Based on Point-by-Point Feature Matching
YANG Ping, WU Xiaohong, HE Xiaohai, CHEN Honggang, LIU Qiang, LI Bo
To improve the poor generalization and cross-domain capability of existing direct cross-dataset person re-identification methods, a cross-domain person re-identification method based on point-by-point feature matching is proposed. The model only needs to be trained on the source domain and tested on the target domain to achieve good results. Firstly, to improve the robustness of the network to the style and color of cross-domain pedestrian images, instance normalization (IN) layers are introduced into the ResNet50 backbone to extract image features. Secondly, the multi-head self-attention module of the Transformer is combined with convolution to enhance the representation ability of the features. Finally, by establishing a point-by-point feature mapping relationship in the deep features, image matching is treated as a point-by-point search for local optima, improving the model's resistance to viewpoint changes in unknown scenes and enhancing its generalization. The experimental results show the advantages of the proposed method in improving generalization ability.
2022 Vol. 35 (6): 516-525 [Abstract] ( 318 ) [HTML 1KB] [ PDF 1393KB] ( 349 )
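Instance normalization, introduced above to reduce sensitivity to image style and color, normalizes each channel of each sample over its own spatial statistics (unlike batch normalization, which pools statistics across the batch). A minimal NumPy sketch, assuming NCHW layout and omitting the learnable affine parameters:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization: zero-mean, unit-variance per sample
    and per channel, over the spatial dimensions of an (N, C, H, W) array."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```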
526 RGB-D Salient Object Detection Based on Spatial Constrained and Self-Mutual Attention
YUAN Xiao, XIAO Yun, JIANG Bo, TANG Jin
Aiming at the problem of RGB-D salient object detection, an RGB-D salient object detection method based on pyramid spatial constrained self-mutual attention is proposed. Firstly, a spatial constrained self-mutual attention module is introduced to learn multi-modal feature representations with spatial context awareness by exploiting the complementarity of multi-modal features. Meanwhile, the pairwise relationships between query positions and their surrounding areas are calculated to integrate self-attention and mutual attention, and thus the contextual features of the two modalities are aggregated. Then, to obtain more complementary information, a pyramid structure is applied to a set of spatial constrained self-mutual attention modules to adapt to receptive-field features under different spatial constraints and to learn local and global feature representations. Finally, the multi-modal fusion module is embedded into a two-branch encoder-decoder network to solve the RGB-D salient object detection task. Experiments on four benchmark datasets show the strong competitiveness of the proposed method in RGB-D salient object detection.
2022 Vol. 35 (6): 526-535 [Abstract] ( 325 ) [HTML 1KB] [ PDF 1697KB] ( 216 )
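The self-mutual attention described above computes pairwise relationships between query positions and surrounding areas across two modalities. Stripped of the spatial constraint and the pyramid, its cross-modal core resembles scaled dot-product attention in which one modality's features query the other's. A minimal single-head sketch (function and argument names are illustrative, not from the paper):

```python
import numpy as np

def mutual_attention(query_feats, context_feats):
    """One modality (query_feats, shape (M, D)) attends over the other
    (context_feats, shape (K, D)) via scaled dot-product attention."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (M, K)
    scores -= scores.max(axis=-1, keepdims=True)          # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context_feats                        # (M, D)
```

Self-attention is the special case where both arguments are the same modality's features; the paper's module integrates both forms under a spatial constraint.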
536 Textile Defect Detection Combining Attention Mechanism and Adaptive Memory Fusion Network
DENG Shishuang, DI Lan, LIANG Jiuzhen, JIANG Daihong
To solve the problems of high cost, low precision and slow speed in defect detection during textile production, a textile defect detection model combining an attention mechanism and an adaptive memory fusion network is proposed. Firstly, an improved attention module is introduced into the YOLOv5 backbone to build an SCNet feature extraction network and improve the ability to extract textile defect features. Then, an adaptive memory feature fusion network is proposed to enhance the transfer of shallow localization information and effectively mitigate the confounding effect generated during feature fusion. Thus, feature scale invariance is improved while feature information from the backbone is incorporated into the feature fusion layer. Finally, the control distance intersection over union loss function is introduced into the model to increase detection accuracy. Experiments on the ZJU-Leaper and Tianchi textile defect datasets show that the proposed model achieves higher detection accuracy and speed.
2022 Vol. 35 (6): 536-547 [Abstract] ( 347 ) [HTML 1KB] [ PDF 3131KB] ( 298 )
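The exact form of the control distance IoU loss above is specific to the paper, but it belongs to the family of distance-penalized IoU losses. The closely related distance-IoU (DIoU) loss, which such variants extend, can be sketched as IoU minus a penalty given by the squared center distance over the squared diagonal of the smallest enclosing box:

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    1 - IoU + (squared center distance) / (squared enclosing-box diagonal)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between box centers
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return 1.0 - iou + rho2 / c2
```

Unlike plain IoU loss, the distance penalty keeps the gradient informative even when the boxes do not overlap.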
548 Real-Time Fire Detection Method with Multi-feature Fusion on YOLOv5
ZHANG Dasheng, XIAO Hanguang, WEN Jie, XU Yong
In natural scenes, the accuracy of fire detection is affected by weather conditions, light intensity and background interference. To achieve real-time, accurate fire detection in complex scenarios, an efficient fire detection method based on improved YOLOv5 is proposed, combining focal loss, the complete intersection over union (CIoU) loss function and multi-feature fusion. The focal loss function is introduced to alleviate the imbalance between positive and negative samples and make full use of the information in difficult samples. Meanwhile, combining the static and dynamic features of fires, a multi-feature fusion method is designed to eliminate false alarms. To address the scarcity of fire datasets, a large-scale, high-quality fire dataset on the order of 100,000 images is constructed (http://www.yongxu.org/databases.html). Experiments show that the accuracy, speed, precision and generalization ability of the proposed method are significantly improved.
2022 Vol. 35 (6): 548-561 [Abstract] ( 634 ) [HTML 1KB] [ PDF 10581KB] ( 617 )
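The focal loss mentioned above down-weights well-classified examples so that training focuses on hard, misclassified ones. A minimal binary sketch using the commonly cited defaults α = 0.25, γ = 2 (the paper's exact settings are not stated here):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p in (0, 1)
    with label y in {0, 1}: -alpha_t * (1 - p_t)^gamma * log(p_t).
    With gamma = 0 and alpha = 0.5 it reduces to scaled cross-entropy."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The modulating factor (1 - p_t)^γ is what shrinks the contribution of confident predictions: a sample predicted at 0.9 contributes far less than one predicted at 0.5.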
562 Composite Deep Neural Network for Human Activities Recognition in Video
HUANG Min, SHANG Ruixing, QIAN Huimin
Aiming at the deficiencies of 3D convolutional neural networks and two-stream convolutional neural networks for human activity recognition in video, a composite deep neural network combining the two-stream and 3D convolutional architectures is proposed. An improved residual (2+1)D convolutional neural network is utilized in both the temporal and spatial sub-networks of the two-stream architecture, learning behavior representations and classifiers from the RGB frames and the optical flow of the video, respectively, and the classification results of the temporal and spatial streams are then combined. Furthermore, during training, stochastic gradient descent with momentum improved by the gradient centralization algorithm is adopted to improve generalization performance without changing the network structure. Experimental results show that the proposed network achieves higher accuracy on UCF101 and HMDB51.
2022 Vol. 35 (6): 562-570 [Abstract] ( 374 ) [HTML 1KB] [ PDF 640KB] ( 312 )
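Gradient centralization, used above to improve SGD with momentum, removes the mean from each weight gradient before the optimizer update, which requires no change to the network itself. A minimal sketch for multi-dimensional weight gradients, centering each output slice as in the original GC formulation (this shows only the centering step, not the full SGD-with-momentum update):

```python
import numpy as np

def centralize_gradient(grad):
    """Gradient centralization: for weights with >= 2 dimensions,
    subtract the mean of each output slice's gradient; leave
    1-D gradients (biases, BN parameters) unchanged."""
    if grad.ndim >= 2:
        axes = tuple(range(1, grad.ndim))
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad
```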
 

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China  Tel: 0551-65591176  Fax: 0551-65591176  Email: bjb@iim.ac.cn
Supported by Beijing Magtech  Email: support@magtech.com.cn