模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2021 Vol. 34 Issue 9, Published 2021-09-25

Deep Learning Algorithms and Their Applications in Image and Vision
777 Interpretable Object Detection Method for Remote Sensing Image Based on Deep Reinforcement Learning
ZHAO Jiaqi, ZHANG Di, ZHOU Yong, CHEN Silin, TANG Jialan, YAO Rui
With the rapid development of remote sensing technology, object detection for remote sensing images is widely applied in many fields, such as resource exploration, urban planning and natural disaster assessment. To cope with the complex backgrounds and small target scales of remote sensing images, an interpretable object detection method for remote sensing images based on deep reinforcement learning is proposed. Firstly, deep reinforcement learning is applied to the region proposal network of Faster R-CNN, and the detection accuracy on remote sensing images is improved by modifying the reward function. Secondly, the detection speed and portability of the model are improved by lightening the original backbone network with its large number of parameters. Finally, the interpretability of the hidden-layer representations in the model is quantified using the network dissection method to endow the model with concepts interpretable to humans. Experiments on three public remote sensing datasets show that the performance of the proposed method is improved, and its effectiveness is verified by the improved network dissection method.
2021 Vol. 34 (9): 777-786
787 Visual Tracking Algorithm Based on Rotation Adaptation, Multi-feature Fusion and Multi-template Learning
DU Chenjie, YANG Yuxiang, WU Han, HE Zhiwei, GAO Mingyu
Visual target tracking remains a hard problem due to unpredictable target rotation and external interference. To address this issue, a target tracking algorithm based on rotation adaptation, multi-feature fusion and multi-template learning (RA-MFML) is proposed. Firstly, a multi-template learning model with complementary characteristics is constructed. The global filter template is used for tracking the target; when the decision filter template judges the global filter template to be contaminated, the correction filter template corrects it. Then, the color histogram is regarded as complementary visual information and fused adaptively with the feature maps of VGGNet-19, improving the discriminative ability of the global filter template for object appearance. Finally, a rotation adaptation strategy is proposed: the improved tracking confidence is utilized to estimate the optimal rotation angle of the tracking box, alleviating the performance degradation of the global filter template caused by target rotation. Experiments on the OTB-2013 and OTB-2015 datasets demonstrate that RA-MFML is superior in success rate and precision.
2021 Vol. 34 (9): 787-797
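The rotation adaptation step described above amounts to searching candidate rotation angles for the tracking box and keeping the angle with the highest tracking confidence. A minimal sketch, where the scoring function is a toy stand-in for the paper's improved tracking confidence:

```python
def best_rotation(angles, confidence):
    """Pick the candidate rotation angle that maximizes tracking confidence.

    `confidence` is any callable mapping an angle to a score; the paper
    uses an improved tracking-confidence measure, which this stands in for.
    """
    return max(angles, key=confidence)

# Toy confidence peaked at 10 degrees (illustrative only).
score = lambda a: -(a - 10) ** 2
candidates = [-20, -10, 0, 10, 20]
print(best_rotation(candidates, score))  # -> 10
```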
798 Person Re-identification Based on Fusion Relationship Learning Network
WU Ziqiang, CHANG Hong, MA Bingpeng
There are two problems in person re-identification methods based on graph convolutional networks (GCN). When graphs are built from feature maps, the semantic information represented by the graph nodes is not salient, and the selection of feature blocks for building the graph relies only on the relative distance among feature blocks, ignoring their content similarity. To settle these two problems, a person re-identification algorithm based on a fusion relationship learning network (FRLN) is proposed in this paper. By employing an attention mechanism, the maximum attention model makes the most important feature block more salient and assigns semantic information to it. The affinities of feature blocks are evaluated by a fusion similarity metric covering both distance and content, making the metric more comprehensive. Thus, neighbor feature blocks are selected comprehensively and better input graph structures are provided for the GCN, from which more robust structural relationship features are extracted. Experiments on the iLIDS-VID and MARS datasets verify the effectiveness of FRLN.
2021 Vol. 34 (9): 798-808
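The fusion similarity metric above scores pairs of feature blocks by both spatial distance and content. A minimal sketch, assuming a simple weighted blend of proximity and cosine similarity (the abstract only states that both aspects are used; the exact formula is an assumption):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def fusion_similarity(i, j, feats, positions, alpha=0.5):
    """Blend spatial proximity and content similarity of two feature blocks."""
    proximity = 1.0 / (1.0 + abs(positions[i] - positions[j]))  # closer is higher
    content = cosine(feats[i], feats[j])                        # similar is higher
    return alpha * proximity + (1 - alpha) * content

def select_neighbors(i, feats, positions, k):
    """Rank the other blocks by fused similarity and keep the top k,
    giving the GCN a graph built from both distance and content."""
    others = [j for j in range(len(feats)) if j != i]
    others.sort(key=lambda j: fusion_similarity(i, j, feats, positions),
                reverse=True)
    return others[:k]
```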
809 Zero-Shot Attribute Recognition Based on De-redundancy Features and Semantic Relationship Constraint
ZHANG Guimei, LONG Bangyao, ZENG Jiexian, HUANG Junyang
Generative zero-shot recognition methods are affected by redundant information and domain shift while generating features, and thus their recognition accuracy is poor. To deal with this problem, a zero-shot attribute recognition method based on de-redundancy features and a semantic relationship constraint is proposed. Firstly, the visual features are mapped to a new feature space, and their redundancy is removed via cross-correlation information: redundant visual features are discarded while the correlation of the categories is preserved. Reducing the interference of redundant information in the recognition process improves the accuracy of zero-shot recognition. Then, a knowledge transfer model is established using the semantic relationship between the seen and unseen classes, and a semantic relationship loss is introduced to constrain the knowledge transfer process. Consequently, the visual features produced by the generator better reflect the semantic relationship between the seen and unseen classes, and the domain shift problem between them is alleviated as well. Finally, a cycle-consistency structure is introduced to make the generated pseudo-features closer to the real features. Experiments show that the proposed method improves the accuracy of zero-shot recognition tasks with better generalization performance.
2021 Vol. 34 (9): 809-823
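One way to read the semantic relationship constraint above: the pairwise similarities among generated visual features should match the pairwise similarities among the class semantic embeddings. A sketch of such a loss (the paper's exact loss form is not given in the abstract, so this reconstruction is an assumption):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def semantic_relation_loss(class_embeddings, visual_features):
    """Mean squared gap between pairwise class-semantic similarities and
    pairwise similarities of the corresponding generated visual features.

    Zero loss means the generated features reproduce the semantic
    relationships between classes exactly.
    """
    n = len(class_embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(class_embeddings[i], class_embeddings[j])
            v = cosine(visual_features[i], visual_features[j])
            total += (s - v) ** 2
            count += 1
    return total / count if count else 0.0
```

Minimizing this term during generator training would push generated features toward preserving the seen/unseen class structure, which is the stated goal of the constraint.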
824 Saliency Background Guided Network for Weakly-Supervised Semantic Segmentation
BAI Xuefei, LI Wenjing, WANG Wenjian
Weakly-supervised semantic segmentation methods based on image-level annotation mostly rely on the initial response of the class activation map to locate the object region to be segmented. However, the class activation map only focuses on the most discriminative areas of the object and suffers from shortcomings including small target areas and blurred boundaries, so the final segmentation result is incomplete. To overcome this problem, a saliency background guided network for weakly-supervised semantic segmentation is proposed. Firstly, the background seed region is generated through image saliency mapping and background iteration, and then it is fused with the class activation map generated by the classification network. Thus, effective pseudo pixel labels for training the semantic segmentation model are obtained. The segmentation process does not depend entirely on the most discriminative object regions: information complementation is implemented between the image saliency background features and the class activation response map. Consequently, the pixel labels are more accurate and the performance of the segmentation network is improved. Experiments on the PASCAL VOC 2012 dataset verify the effectiveness of the proposed method and show a significant improvement in segmentation performance.
2021 Vol. 34 (9): 824-835
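The fusion of background seeds and class activation maps into pseudo labels can be sketched with a simple per-pixel rule: the saliency background seed claims confident background, a sufficiently strong CAM class claims foreground, and everything else is ignored during training. The exact fusion rule and threshold in the paper may differ; this is an illustration of the idea:

```python
def fuse_pseudo_labels(background_seed, cam, threshold=0.5, ignore=255):
    """Combine a saliency-derived background seed with a class activation
    map (CAM) into per-pixel pseudo labels.

    background_seed: 2-D grid of booleans (True = confident background).
    cam: 2-D grid of per-pixel {class_id: activation} dicts.
    A pixel is labelled background (0) by the seed, foreground by the
    strongest CAM class above `threshold`, and `ignore` otherwise.
    """
    h, w = len(cam), len(cam[0])
    labels = [[ignore] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if background_seed[y][x]:
                labels[y][x] = 0                       # confident background
            else:
                cls, act = max(cam[y][x].items(), key=lambda kv: kv[1])
                if act >= threshold:
                    labels[y][x] = cls                 # confident foreground
    return labels
```

The `ignore` value mirrors the common segmentation convention of excluding uncertain pixels from the training loss.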
836 3D Point Cloud Classification Algorithm Based on Residual Edge Convolution
DU Zijin, CAO Feilong, YE Hailiang, LIANG Jiye
The irregularity and disorder of 3D point clouds make their classification challenging. Therefore, a 3D point cloud classification algorithm based on residual edge convolution is designed, which learns a discriminative shape descriptor directly from the point cloud for target classification. Firstly, an edge convolution block with residual learning is designed for feature extraction on the point cloud. In the edge convolution block, a local graph is constructed from the input point cloud through the K-nearest neighbor algorithm, and local features are extracted and aggregated via convolution and max pooling, respectively. Subsequently, global features are extracted from the original point features through a multi-layer perceptron and combined with the local features in a residual learning manner. Finally, a deep convolutional neural network is constructed with the convolution block as the basic unit to classify 3D point clouds. The organic combination of local and global features is considered comprehensively, and the deeper structure makes the final shape descriptor more abstract and discriminative. Experiments on two challenging datasets, ModelNet40 and ScanObjectNN, show that the proposed method obtains superior classification results.
2021 Vol. 34 (9): 836-843
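The core of the edge convolution block above is: build a K-nearest-neighbor graph over the points, compute an edge feature per neighbor, and max-pool over the neighborhood. A minimal sketch, with a plain callable standing in for the learned convolution weights and the residual/MLP global-feature path omitted for brevity:

```python
def knn(points, k):
    """Indices of the k nearest neighbours of each point (squared Euclidean)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: dist2(points[i], points[j]))[:k]
            for i in range(len(points))]

def edge_conv(points, feats, k, weight):
    """One edge-convolution step with max aggregation.

    For each point i and neighbour j, the edge feature is
    weight([f_i, f_j - f_i]); outputs over the k neighbours are max-pooled.
    `weight` stands in for the learned convolution.
    """
    out = []
    for i, nbrs in enumerate(knn(points, k)):
        edge_feats = [weight(feats[i] + [fj - fi for fi, fj
                                         in zip(feats[i], feats[j])])
                      for j in nbrs]
        out.append([max(vals) for vals in zip(*edge_feats)])  # max pooling
    return out
```

Stacking such blocks, each with a residual connection to a per-point MLP, yields the deep network the abstract describes.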
844 Smoke Density Measurement Method Based on Dual-Channel Deep CNN
MO Hongfei, XIE Zhenping
In existing methods for measuring smoke density from video images, features are mainly extracted manually, and external conditions such as ambient light and background are required to be known. To improve the directness and practicability of smoke density measurement, the correspondence between smoke images and their density values is established through the aerosolization equation, and a smoke density dataset is built accordingly. A smoke density measurement method based on a dual-channel deep convolutional neural network (DCCNN) is proposed to realize end-to-end direct measurement of smoke density. In DCCNN, 1×1 convolutions are used for channel data fusion, and skip connections are introduced to speed up the training of the network. A self-attention mechanism is also introduced to learn the importance of hidden features automatically. Finally, the features extracted from the two channels are combined to obtain comprehensive measurement results. Experiments show that DCCNN achieves lower mean absolute error and higher comprehensive performance.
2021 Vol. 34 (9): 844-852
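The 1×1 convolution used for channel fusion above has no spatial extent, so it reduces to a per-pixel weighted sum across channels; that is why it can mix the two channels' data without touching spatial structure. A minimal sketch with scalar weights standing in for a learned kernel:

```python
def conv1x1(channels, weights, bias=0.0):
    """1x1 convolution: a per-pixel weighted sum across input channels.

    channels: list of equally sized 2-D grids (one per input channel).
    weights: one scalar per channel, standing in for a learned kernel.
    Because a 1x1 kernel sees a single pixel position at a time, it only
    mixes channel information, leaving spatial layout unchanged.
    """
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[y][x] for wt, ch in zip(weights, channels)) + bias
             for x in range(w)] for y in range(h)]
```

A real DCCNN layer would have many such kernels (one per output channel) plus a nonlinearity; this shows a single output channel.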
853 Multi-branch Cooperative Network for Person Re-identification
ZHANG Lei, WU Xiaofu, ZHANG Suofei, YIN Zirui
Designing multi-branch networks to learn rich feature representations is one of the important directions in person re-identification (Re-ID). To address the limited feature representation learned by a single branch, a multi-branch cooperative network for person Re-ID (BC-Net) is proposed. A powerful feature representation for person Re-ID is obtained by extracting features from four cooperative branches: a local branch, a global branch, a relational branch and a contrastive branch. The proposed network can be applied to different backbone networks; OSNet and ResNet are adopted as backbones for verification. Extensive experiments show that BC-Net achieves state-of-the-art performance on popular Re-ID datasets.
2021 Vol. 34 (9): 853-862
863 Human Action Recognition Fusing Two-Stream Networks and SVM
TONG Anyang, TANG Chao, WANG Wenjian
It is difficult for the traditional two-stream convolutional neural network to capture long-range motion information, and when long-time stream information is lost, the generalization ability of the model decreases. Therefore, a human action recognition method fusing two-stream networks and a support vector machine is proposed. Firstly, the RGB image of each frame in the video and its corresponding dense optical flow sequence diagram in the vertical direction are extracted to obtain the spatial and temporal information of the actions in the video. The information is input into the spatial-domain and temporal-domain networks for pre-training, after which feature extraction is carried out. Secondly, the feature vectors of the same dimension extracted from the two streams are fused in parallel to improve the representation ability of the feature vectors. Finally, the fused feature vectors are input into a linear support vector machine for training and classification. Experimental results on standard public datasets show that the proposed method achieves good classification performance.
2021 Vol. 34 (9): 863-870
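The pipeline above fuses two same-dimension feature vectors "in parallel" and hands the result to a linear classifier. A minimal sketch, assuming element-wise weighted-sum fusion (one common parallel-fusion rule; the abstract does not pin down the exact rule) and a linear decision function of the kind a trained linear SVM would apply:

```python
def parallel_fuse(spatial, temporal, alpha=0.5):
    """Parallel fusion of two feature vectors of the same dimension
    via an element-wise weighted sum (an assumed fusion rule)."""
    assert len(spatial) == len(temporal)
    return [alpha * s + (1 - alpha) * t for s, t in zip(spatial, temporal)]

def linear_classify(fused, weight_rows, biases):
    """Score each action class with a linear model, as a trained linear
    SVM's decision function would, and return the top-scoring class index."""
    scores = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weight_rows, biases)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In practice the weights and biases would come from fitting the SVM on fused training features; here they are placeholders for illustration.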
2021 Vol. 34 (9): 871-872
Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn