模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2023 Vol. 36, Issue 3, Published 2023-03-25

Surveys and Reviews
191 Autonomous Driving Safety Challenge: Behavior Decision-Making and Motion Planning
GUAN Xin, SHI Jiamin, CHEN Shitao, LIU Jianyi, ZHENG Nanning
In the development of autonomous driving technology, safety is always regarded as the top priority. The behavior decision-making and motion planning systems, as key components of the technology, place high demands on intelligence: they need to continuously formulate optimal strategies and behaviors according to the changing environment to ensure safe driving. In this survey, behavior decision-making and motion planning systems are expounded. Firstly, the theory and applications of rule-based, supervised learning-based and reinforcement learning-based decision algorithms are introduced. Then, sampling-based, graph search-based, numerical optimization-based and interaction-based planning algorithms for motion planning are discussed and their designs are analyzed. Behavior decision-making and motion planning are further examined from the perspective of safety, and the advantages and disadvantages of the various methods are compared. Finally, future research directions and challenges for safety in the field of autonomous driving are anticipated.
2023 Vol. 36 (3): 191-210
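To make the sampling-based planning family mentioned in the abstract above concrete, below is a minimal 2D RRT sketch in Python. The circular obstacle, step size, goal bias and bounds are illustrative assumptions, not parameters from the survey.

    # Minimal 2D RRT sketch illustrating sampling-based motion planning.
    # All parameters (step size, goal bias, obstacle) are illustrative assumptions.
    import math, random

    def collides(p, obstacle=((4.0, 4.0), 1.5)):
        # Hypothetical circular obstacle: centre (4, 4), radius 1.5.
        (cx, cy), r = obstacle
        return math.hypot(p[0] - cx, p[1] - cy) < r

    def rrt(start, goal, step=0.5, iters=5000, goal_bias=0.1, bounds=(0.0, 10.0)):
        nodes, parent = [start], {0: None}
        for _ in range(iters):
            # Sample a random point, occasionally biased toward the goal.
            target = goal if random.random() < goal_bias else (
                random.uniform(*bounds), random.uniform(*bounds))
            # Extend the nearest tree node one step toward the sample.
            i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
            near = nodes[i]
            d = math.dist(near, target)
            new = target if d < step else (
                near[0] + step * (target[0] - near[0]) / d,
                near[1] + step * (target[1] - near[1]) / d)
            if collides(new):
                continue
            nodes.append(new)
            parent[len(nodes) - 1] = i
            if math.dist(new, goal) < step:      # close enough: backtrack the path
                path, j = [goal], len(nodes) - 1
                while j is not None:
                    path.append(nodes[j]); j = parent[j]
                return path[::-1]
        return None

    print(rrt((1.0, 1.0), (9.0, 9.0)))

The survey also covers graph search-based, numerical optimization-based and interaction-based planners; this sketch only illustrates the sampling-based category.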
Papers and Reports
211 A Deep Takagi-Sugeno-Kang Fuzzy Classifier for Imbalanced Data
BIAN Zekang, ZHANG Jin, WANG Shitong
Inspired by ensemble learning, a deep Takagi-Sugeno-Kang fuzzy classifier for imbalanced data (ID-TSK-FC) is proposed to enhance the generalization capability of the TSK fuzzy classifier on imbalanced data while maintaining good linguistic interpretability. ID-TSK-FC is composed of an imbalanced global linear regression sub-classifier (IGLRc) and several imbalanced TSK fuzzy sub-classifiers (I-TSK-FCs). Following the human cognitive behavior of "from wholly coarse to locally fine" and the stacked generalization principle, ID-TSK-FC first trains an IGLRc on all training samples to obtain a wholly coarse classification result. Then, the nonlinear samples in the original training set are identified according to the output of IGLRc. Next, several I-TSK-FCs are generated on these nonlinear samples using a stacked depth structure to achieve a locally fine result. Finally, the minimum distance voting principle is applied to the outputs of the stacked IGLRc and all I-TSK-FCs to obtain the final output of ID-TSK-FC. The experimental results confirm that ID-TSK-FC not only holds interpretability based on feature importance, but also achieves at least comparable generalization capability and linguistic interpretability.
2023 Vol. 36 (3): 211-224
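As a rough, hypothetical illustration of the "wholly coarse to locally fine" stacking idea described above, the sketch below uses ordinary scikit-learn models in place of the paper's IGLRc and I-TSK-FC fuzzy sub-classifiers, and a simplified margin rule in place of minimum distance voting. It is not the authors' algorithm, only the general coarse-to-fine structure.

    # Coarse-to-fine stacked classifier sketch in the spirit of the abstract above.
    # A ridge classifier stands in for the global linear sub-classifier (IGLRc),
    # a decision tree for the locally fine sub-classifier (illustrative choices).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import RidgeClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic imbalanced data (roughly 9:1 class ratio).
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    # Stage 1: a "wholly coarse" global linear sub-classifier on all samples.
    coarse = RidgeClassifier(class_weight="balanced").fit(X, y)

    # Stage 2: samples the linear model gets wrong are treated as the
    # "nonlinear" subset and handed to a locally fine sub-classifier.
    hard = coarse.predict(X) != y
    fine = DecisionTreeClassifier(max_depth=5, class_weight="balanced").fit(X[hard], y[hard])

    def predict(x):
        # Simplified combination rule (not the paper's minimum distance voting):
        # trust the fine model on samples close to the coarse decision boundary.
        margin = np.abs(coarse.decision_function(x))
        return np.where(margin < 0.5, fine.predict(x), coarse.predict(x))

    print((predict(X) == y).mean())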
225 Calculation Method of RGBD Scene Flow Combining Gaussian Mixture Model and Multi-channel Bilateral Filtering
WANG Zige, LI Yingying, GE Liyue, CHEN Zhen, ZHANG Congxuan
To improve the computational accuracy and robustness of existing RGBD scene flow calculation methods under complex motion scenarios such as large displacement and motion occlusion, a calculation method of RGBD scene flow combining a Gaussian mixture model and multi-channel bilateral filtering is proposed. Firstly, a Gaussian mixture based optical flow clustering and segmentation model is constructed to extract target motion information from the optical flow and optimize the depth map segmentation results layer by layer, yielding high-confidence hierarchical depth motion segmentation. Then, an RGBD scene flow estimation model combining the Gaussian mixture model and multi-channel bilateral filtering is established, with the multi-channel bilateral filtering optimization introduced to overcome the edge-blurring problem of scene flow computation. Finally, experiments on the Middlebury and MPI-Sintel datasets demonstrate that the proposed method exhibits higher accuracy and robustness in complex motion scenarios such as large displacements and motion occlusions, particularly in edge preservation.
2023 Vol. 36 (3): 225-241
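The first stage described above, clustering optical flow into motion layers with a Gaussian mixture, can be pictured with the toy sketch below. The synthetic flow field and the number of mixture components are illustrative assumptions; the paper's layer-by-layer depth optimization and bilateral filtering stages are not reproduced here.

    # Toy sketch: cluster per-pixel optical flow vectors with a Gaussian mixture
    # to obtain motion layers (first stage of the pipeline described above).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    h, w = 64, 64
    flow = np.zeros((h, w, 2), dtype=np.float32)
    flow[:, : w // 2] = (3.0, 0.0)      # left half: moving right (e.g. background)
    flow[:, w // 2 :] = (0.0, -2.0)     # right half: moving up (e.g. foreground object)
    flow += 0.1 * np.random.randn(h, w, 2).astype(np.float32)   # sensor noise

    # Fit a 2-component Gaussian mixture on the (u, v) flow vectors and read the
    # per-pixel component assignment back as a motion-layer segmentation map.
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(flow.reshape(-1, 2)).reshape(h, w)
    print(np.bincount(labels.ravel()))   # pixel count per motion layer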
Researches and Applications
242 Texture and Depth Feature Enhancement Based Two-Stream Face Presentation Attack Detection Method
SUN Rui, FENG Huidong, SUN Qijing, SHAN Xiaoquan, ZHANG Xudong
Face presentation attack is a technique of presenting faces in front of cameras using photos, videos and other media to spoof face recognition systems. Most of the existing face presentation attack detection methods apply depth features for supervised classification while ignoring effective fine-grained information and the correlation between depth and texture information. Therefore, a texture and depth feature enhancement based two-stream face presentation attack detection method is proposed. One stream extracts facial texture features through a central difference convolutional network, capturing spoofing texture patterns more robustly than an ordinary convolutional network. The other stream generates depth-map information through a generative adversarial network to improve robustness to appearance changes and image quality differences. In the feature enhancement module, a central edge loss is designed to fuse and enhance the two types of complementary features. The experimental results on four datasets show that the proposed method achieves superior performance in both intra-dataset and cross-dataset tests.
2023 Vol. 36 (3): 242-251
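Below is a minimal PyTorch sketch of central difference convolution, the operator underlying the texture stream described above. The blending factor theta and the channel sizes are illustrative assumptions and are not taken from the paper.

    # Minimal central difference convolution (CDC) sketch: a vanilla convolution
    # minus a theta-weighted "central" term built from the per-kernel weight sums.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CentralDifferenceConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
            self.theta = theta  # blends vanilla convolution with the central-difference term

        def forward(self, x):
            out = self.conv(x)                                   # vanilla convolution
            # Central-difference term: each kernel collapsed to its element sum,
            # applied at the centre pixel only (a 1x1 convolution, no padding needed).
            kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
            out_center = F.conv2d(x, kernel_sum)
            return out - self.theta * out_center

    x = torch.randn(1, 3, 32, 32)
    print(CentralDifferenceConv2d(3, 16)(x).shape)   # torch.Size([1, 16, 32, 32])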
252 Double-Branch Multi-attention Mechanism Based Sharpness-Aware Classification Network
JIANG Wentao, ZHAO Linlin, TU Chao
The key to image classification methods based on convolutional neural networks is extracting distinctive and important features. To focus on crucial features and enhance the generalization ability of the model, a double-branch multi-attention mechanism based sharpness-aware classification network (DAMSNet) is proposed. Based on the ResNet-34 residual network, the size of the convolutional kernel in the input layer is modified and the max pooling layer is removed to reduce the loss of original image features. Then, a double-branch multi-attention mechanism module is designed and embedded into the residual branch to extract global and local contextual information in both the channel and spatial domains. Additionally, the sharpness-aware minimization (SAM) algorithm is introduced and combined with a stochastic gradient descent optimizer to simultaneously minimize the loss value and the loss sharpness, seeking parameters whose neighborhoods have uniformly low loss and thereby enhancing the generalization ability of the network. Experiments on the CIFAR-10, CIFAR-100 and SVHN datasets demonstrate that DAMSNet achieves high classification accuracy and effectively enhances the generalization ability of the network.
2023 Vol. 36 (3): 252-267
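The sharpness-aware minimization step combined with SGD, as described above, can be sketched as follows. The tiny model, random data and the neighborhood radius rho are illustrative assumptions, not the paper's configuration.

    # Sketch of one SAM training step wrapped around SGD: compute gradients,
    # perturb the weights toward the worst nearby point, recompute gradients
    # there, restore the weights, then apply the SGD update.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))   # illustrative batch
    rho = 0.05   # radius of the neighborhood in which loss sharpness is probed

    def sam_step():
        # 1) First pass: gradients at the current weights.
        loss = loss_fn(model(x), y)
        loss.backward()
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm() for p in model.parameters() if p.grad is not None]))
        # 2) Perturb weights to the (approximate) worst point in the rho-ball.
        eps = []
        with torch.no_grad():
            for p in model.parameters():
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                eps.append(e)
        opt.zero_grad()
        # 3) Second pass: gradients at the perturbed weights.
        loss_fn(model(x), y).backward()
        # 4) Restore the original weights and apply the SGD update using the
        #    sharpness-aware gradients.
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)
        opt.step()
        opt.zero_grad()
        return loss.item()

    print(sam_step())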
268 Lightweight End-to-End Architecture for Streaming Speech Recognition
YANG Shuying, LI Xin
In streaming speech recognition, chunk-based recognition breaks parallelism and consumes more resources, while context-restricted self-attention struggles to capture global information. Therefore, a lightweight chunk-based end-to-end acoustic recognition method, CFLASH-Transducer, is proposed. It combines fast linear attention with a single head (FLASH) and convolutional neural networks (CNNs) to capture fine-grained local features. An Inception V2 network is introduced into the convolutional block to extract multi-scale local features of the speech signal. A coordinate attention mechanism is adopted to capture the location information of the features and the interconnections among channels. Depthwise separable convolution is utilized for feature enhancement and smooth transitions between layers. The recurrent neural network transducer (RNN-T) architecture is employed for training and decoding to process audio. The global attention computed within the current chunk is passed to subsequent chunks as a hidden variable, connecting the information of the chunks, retaining training parallelism and cross-chunk correlation, and avoiding the growth of computational cost with sequence length. CFLASH-Transducer achieves high recognition accuracy on the open-source dataset THCHS30, with a streaming recognition accuracy loss of less than 1% compared with offline recognition.
2023 Vol. 36 (3): 268-279
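How a carried state lets a chunked encoder see beyond the current chunk can be illustrated with generic chunk-wise linear attention (elu+1 feature map) as sketched below. This is not the FLASH layer or the RNN-T decoder used in the paper; the feature map, chunk size and tensor shapes are illustrative assumptions.

    # Toy sketch: chunk-wise linear attention with state carried between chunks,
    # so every query attends to all chunks processed so far without recomputing
    # full self-attention over the whole sequence.
    import torch
    import torch.nn.functional as F

    def phi(x):                       # positive feature map for linear attention
        return F.elu(x) + 1.0

    def stream(q, k, v, chunk=16):
        d = q.shape[-1]
        S = torch.zeros(d, v.shape[-1])    # running sum of phi(k)^T v (carried state)
        z = torch.zeros(d)                 # running sum of phi(k)     (carried state)
        outs = []
        for s in range(0, q.shape[0], chunk):
            qc, kc, vc = q[s:s+chunk], k[s:s+chunk], v[s:s+chunk]
            # Update the carried state with this chunk's keys and values ...
            S = S + phi(kc).transpose(0, 1) @ vc
            z = z + phi(kc).sum(dim=0)
            # ... then attend: each query in the chunk sees all chunks so far.
            num = phi(qc) @ S
            den = (phi(qc) @ z).unsqueeze(-1) + 1e-6
            outs.append(num / den)
        return torch.cat(outs, dim=0)

    T, d = 64, 32
    q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
    print(stream(q, k, v).shape)       # torch.Size([64, 32])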
 

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China    Tel: 0551-65591176    Fax: 0551-65591176    Email: bjb@iim.ac.cn