Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2023 Vol. 36 Issue 11, Published 2023-11-25

   
0
CHENG Xiang
2023 Vol. 36 (11): 0-0
967 Synesthesia of Machines Towards Intelligent Multi-modal Sensing-Communication Integration
CHENG Xiang, ZHANG Haotian, LI Sijiang, HUANG Ziwei, YANG Zonghui, GAO Shijian, BAI Lu, ZHANG Jia'nan, ZHENG Xinhu, YANG Liuqing
The integrated sensing and communications(ISAC) technique is limited to the sharing of radar sensing and communications at the spectrum and hardware levels, and it fails to enhance the performance of communication and sensing in future emerging application scenarios. In scenarios involving massive multi-modal sensing and communication data, ISAC should evolve towards the incorporation of multi-modal sensing, namely intelligent multi-modal sensing-communication integration. Inspired by human synesthesia, a paradigm for intelligent multi-modal sensing-communication integration, synesthesia of machines(SoM), is systematically established and discussed in this paper. Firstly, three typical operational modes of SoM, SoM-evoke, SoM-enhance and SoM-concert, are systematically summarized, and the purposes and methods of the mutual assistance and enhancement between communications and multi-modal sensing are thus given comprehensively. Then, the data foundation of SoM research, the mixed multi-modal sensing and communication(M3SC) simulation dataset, and the theoretical foundation of SoM research, the SoM mechanism, are discussed. Finally, the current research status of SoM is reviewed and future research directions are prospected.
2023 Vol. 36 (11): 967-986
987 Sensing Image Data Based Unmanned Aerial Vehicle Channel Path Loss Prediction
SUN Mingran, HUANG Ziwei, BAI Lu, CHENG Xiang, ZHANG Hongguang, FENG Tao
To facilitate the application and development of 6G unmanned aerial vehicle(UAV)-to-ground wireless communications, improve the theoretical foundation of UAV-to-ground communication systems and meet the safety and efficiency requirements of 6G communications, sensing image data based UAV channel path loss prediction in a 6G UAV-to-ground communication scenario is studied. Firstly, based on AirSim and Wireless InSite, a sensing data simulation platform and a channel data simulation platform respectively, a mixed sensing and communication integration dataset for a dynamic UAV-to-ground communication scenario is established to explore the mapping relationship between physical space and electromagnetic space. Secondly, based on the established dataset, the mapping relationship between sensing images in physical space and channel path loss in electromagnetic space is built and real-time 6G UAV-to-ground path loss prediction is achieved. Finally, the prediction results of the proposed model are compared with the test set through simulation tests, and the results verify the accuracy of the proposed model.
2023 Vol. 36 (11): 987-996
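As background for the abstract above: the quantity being predicted, channel path loss, is conventionally described by the log-distance model. A minimal sketch follows; the function name, parameters and the 28 GHz example are illustrative assumptions, not taken from the paper:

```python
import math

def log_distance_path_loss(d_m, f_hz, n=2.0, d0_m=1.0):
    """Log-distance path loss in dB: free-space path loss at the
    reference distance d0 plus 10*n*log10(d/d0), where n is the
    path loss exponent (n = 2 corresponds to free space)."""
    c = 3e8  # speed of light, m/s
    fspl_d0 = 20 * math.log10(4 * math.pi * d0_m * f_hz / c)
    return fspl_d0 + 10 * n * math.log10(d_m / d0_m)

# Free space (n = 2) at 28 GHz, 100 m link distance:
pl = log_distance_path_loss(100.0, 28e9)
```

A learned image-to-path-loss model effectively replaces this closed form with a scene-dependent mapping, which is what makes the sensing data useful.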
997 Multi-modality Sensing Aided Beam Prediction for mmWave V2V Communications
WEN Weibo, ZHANG Haotian, GAO Shijian, CHENG Xiang, YANG Liuqing
To ensure the transmission reliability of vehicular communication networks, precisely aligned beamforming for millimeter-wave communication using massive multi-input multi-output(mMIMO) technology is urgently required. In highly dynamic vehicular communication scenarios, traditional beam alignment schemes incur significant resource overhead and struggle to establish reliable links within the coherence time. To address this critical challenge, a scheme of multi-modality sensing aided beam prediction for mmWave V2V communications is proposed. Two non-RF sensing modalities, vision and light detection and ranging(LiDAR) point clouds, are integrated, and deep neural networks are employed for feature extraction and integration of the multi-modal information. Accurate matching and deep fusion of image-space semantic information and physical-space location information are achieved through perspective projection. Through collaborative sensing coordinate calibration and vehicle position prediction, the features of the physical environment are accurately mapped to the angular-domain channel, enabling real-time and precise beam prediction. The experimental results on the mixed multi-modal sensing-communication dataset(M3SC) show that the proposed scheme achieves high angle tracking accuracy and a high achievable communication rate.
2023 Vol. 36 (11): 997-1008
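To illustrate the final step such a pipeline must perform, here is a minimal sketch of mapping a predicted angle to a DFT codebook beam for a half-wavelength uniform linear array. The array size, codebook construction and function are illustrative assumptions, not the paper's design:

```python
import numpy as np

def best_beam_index(pred_angle_deg, n_ant=16, n_beams=16):
    """Pick the DFT codebook beam whose response best matches a
    steering vector at the predicted angle (half-wavelength ULA)."""
    k = np.arange(n_ant)
    theta = np.deg2rad(pred_angle_deg)
    steer = np.exp(1j * np.pi * k * np.sin(theta)) / np.sqrt(n_ant)
    # DFT codebook: beams uniformly sampling sin(angle) over [-1, 1)
    sin_grid = -1 + 2 * np.arange(n_beams) / n_beams
    codebook = np.exp(1j * np.pi * np.outer(k, sin_grid)) / np.sqrt(n_ant)
    gains = np.abs(steer.conj() @ codebook)  # beamforming gain per beam
    return int(np.argmax(gains))

idx = best_beam_index(0.0)  # boresight maps to the mid-grid beam
```

The sensing-aided scheme's value is in predicting the angle early and accurately; the codebook lookup itself is cheap.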
1009 Multimodal Fusion-Based Semantic Transmission for Road Object Detection
ZHU Zengle, WEI Zhiwei, ZHANG Rongqing, YANG Liuqing
In extreme scenarios with long-tail effects, collaborative perception involving multiple vehicles and sensors can provide effective sensory information for vehicles. However, the differentiation in heterogeneous data, coupled with bandwidth constraints and diverse data formats, makes it challenging for vehicles to achieve unified and efficient scheduling in processing. To organically integrate multi-sensor information among different vehicles under limited communication bandwidth, a Transformer-based semantic communication framework for multimodal fusion object detection is proposed in this paper. Unlike traditional data transmission solutions, self-attention mechanisms are utilized in the proposed framework to fuse data from different modalities, focusing on exploring the semantic correlations and dependencies among modal data. The framework helps vehicles transmit information and collaborate under limited communication resources, thereby enhancing their understanding of complex road conditions. The experimental results on the Teledyne FLIR Free ADAS Thermal dataset show that the proposed model performs well in multimodal object detection semantic communication tasks, with object detection accuracy significantly improved and transmission costs reduced by half.
2023 Vol. 36 (11): 1009-1018
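As a toy illustration of the fusion mechanism the abstract describes, a minimal scaled dot-product self-attention over modality tokens; the identity projections and the tiny feature vectors are illustrative assumptions, and the paper's Transformer is far richer:

```python
import numpy as np

def self_attention_fuse(tokens):
    """Minimal scaled dot-product self-attention over modality tokens.
    tokens: (n_tokens, d) array; identity Q/K/V projections for brevity."""
    q = k = v = tokens
    d = tokens.shape[1]
    scores = q @ k.T / np.sqrt(d)                   # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v  # each output token mixes information across modalities

# One hypothetical RGB token and one thermal/IR token:
rgb_tok = np.array([1.0, 0.0, 0.0, 0.0])
ir_tok = np.array([0.0, 1.0, 0.0, 0.0])
fused = self_attention_fuse(np.stack([rgb_tok, ir_tok]))
```

The point of the sketch: after attention, each modality's token already carries a weighted share of the other modality, which is what makes transmitting fused semantics cheaper than transmitting raw data.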
1019 V2X-Enabled Cooperative Perception with Localization and Communication Constraints
MAO Ruiqing, JIA Yukuan, SUN Yuxuan, ZHOU Sheng, NIU Zhisheng
With the continuous development of vehicle-to-everything networks, cooperative perception enabled connected autonomous driving becomes an important component of future intelligent transportation systems. It effectively addresses the inherent limitations of traditional stand-alone intelligence in perception and computing capabilities. However, most existing cooperative perception algorithms rely on accurate positioning information for data fusion, ignoring the constraints of communication bandwidth and communication delay. In this paper, a feature-level cooperative perception algorithm for localization- and communication-constrained conditions is proposed. The matching of different perspective information is achieved without relying on accurate positions and poses, while the robustness of the proposed algorithm to communication delay is maintained and the amount of communication data is dynamically adjusted according to the channel state. The traditional two-stage perception paradigm is combined with deep metric learning, utilizing regional feature maps for cross-perspective information matching to overcome the impact of localization errors and communication delays. Moreover, the number of regional feature maps transmitted through V2X communication can be dynamically adjusted in real time to adapt to different channel conditions, and thus the amount of communication data is changed. Experimental results show that the proposed algorithm exhibits significant cooperative gains in various scenarios, maintains perception accuracy under certain communication delays, and effectively reduces the required amount of transmitted data.
2023 Vol. 36 (11): 1019-1028
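The dynamic adjustment described above can be pictured as a budgeted scheduler: score each regional feature map, then greedily keep the highest-scoring ones that fit the current channel budget. This is a hypothetical sketch, not the paper's algorithm; the scores, sizes and budget are made-up values:

```python
def select_regions(region_scores, bytes_per_region, byte_budget):
    """Greedily pick the highest-scoring regional feature maps that
    fit within the byte budget implied by the current channel state."""
    order = sorted(range(len(region_scores)),
                   key=lambda i: region_scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + bytes_per_region[i] <= byte_budget:
            chosen.append(i)
            used += bytes_per_region[i]
    return sorted(chosen)

# Good channel: two of three regions fit; bad channel: fewer would.
sel = select_regions([0.9, 0.2, 0.7], [400, 400, 400], 900)
```

A real scheme would derive the budget from the estimated channel rate and the tolerable delay, but the trade-off shape is the same.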
1029 Spatial-Channel Attention Multi-sensor Fusion Based on Bird's-Eye View
JI Yuzhe, CHEN Yijie, YANG Liuqing, ZHENG Xinhu

Object perception based on bird's-eye view(BEV) is one of the hot issues, but studies on multi-sensor fusion for BEV are still insufficient. Therefore, a multi-sensor fusion module based on spatial-channel attention is proposed. Spatial errors between multiple sensors can be effectively corrected by adding local attention mechanisms to the features of different modalities. By using transpose attention operations, the image and point cloud data are fully integrated to resolve the heterogeneity between different modal semantics. Consequently, the fused BEV features achieve more comprehensive and accurate perception by effectively combining the unique information of each sensor without introducing spatial misalignment. Experiments on the nuScenes dataset and extensive ablation studies show that the proposed fusion module effectively improves the accuracy of object detection. Visualization results demonstrate that the fused features capture more complete and accurate information, especially for distant object detection.

2023 Vol. 36 (11): 1029-1040
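To make the "channel attention" half of the abstract concrete, a minimal squeeze-and-excitation-style channel attention sketch; the shapes, the random bottleneck weights and the NumPy setting are illustrative assumptions, not the proposed module:

```python
import numpy as np

def channel_attention(feat, reduction=2):
    """Squeeze-and-excitation-style channel attention (illustrative):
    global-average-pool each channel, pass the statistics through a
    small bottleneck, and rescale channels by sigmoid gates in (0, 1).
    feat: (C, H, W) array; weights are seeded for reproducibility."""
    c = feat.shape[0]
    squeeze = feat.mean(axis=(1, 2))                 # (C,) channel statistics
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeeze, 0)             # ReLU bottleneck
    gate = 1 / (1 + np.exp(-(w2 @ hidden)))          # per-channel sigmoid gate
    return feat * gate[:, None, None]

out = channel_attention(np.ones((4, 8, 8)))
```

A spatial branch would do the analogous pooling across channels to gate each (H, W) location; combining both is what "spatial-channel attention" refers to.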
1041 Construction of Roadside Multi-source Data Space Consistency Dataset and Research on Evaluation Methods
CHEN Zhiwei, ZHANG Haolin, YAN Yuchen, CHEN Shitao

The spatial consistency of multi-source sensing data is the foundation for the fusion of roadside multi-modal data, playing a crucial role in vehicle-to-infrastructure cooperation and roadside intelligence. However, existing roadside multi-modal datasets predominantly focus on recognition tasks such as object detection, lacking the various spatial transformation information between multi-source sensors. This deficiency hinders research into the spatial consistency problem of multi-source roadside data. Therefore, a dataset specifically designed for the study of the spatial consistency problem in roadside multi-source data, InfraCalib(https://github.com/chenzhiwei888/InfraCalib-Dataset), is constructed in this paper. The dataset comprises over 230,000 frames of image and point cloud data, collected by two roadside smart mobile devices, covering diverse changes in scenes, modalities, lighting, device spatial positions and sensor postures. By matching feature key point pairs to correlate the multi-modal data, a perspective-n-point(PnP) problem is constructed, and the extrinsic parameter matrix is solved using the minimum reprojection error method, serving as an approximate ground-truth label. Finally, an experimental analysis of classic feature matching algorithms is conducted on the InfraCalib dataset, and quantitative evaluation indicators for the calibration of extrinsic parameters between multi-source sensors are discussed.

2023 Vol. 36 (11): 1041-1058
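The "minimum reprojection error" criterion used to produce the approximate ground-truth labels can be sketched as follows. The intrinsics, pose and point correspondences are made-up illustrative values; for the identity pose chosen here, the projected 3-D points land exactly on the 2-D keypoints:

```python
import numpy as np

def mean_reprojection_error(K, R, t, pts3d, pts2d):
    """Project 3-D points into the image with extrinsics (R, t) and
    intrinsics K, then average the pixel distance to matched 2-D keypoints.
    PnP solvers minimize exactly this quantity over (R, t)."""
    cam = R @ pts3d.T + t.reshape(3, 1)   # points in the camera frame
    uv = K @ cam                          # homogeneous pixel coordinates
    uv = uv[:2] / uv[2]                   # perspective divide
    return float(np.mean(np.linalg.norm(uv.T - pts2d, axis=1)))

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
pts3d = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
pts2d = np.array([[320.0, 240.0], [420.0, 240.0]])
err = mean_reprojection_error(K, R, t, pts3d, pts2d)
```

Evaluating a candidate extrinsic matrix then amounts to computing this error over the matched key point pairs, which is also a natural quantitative indicator for the dataset's calibration benchmark.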

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China  Tel: 0551-65591176  Fax: 0551-65591176  Email: bjb@iim.ac.cn
Supported by Beijing Magtech  Email: support@magtech.com.cn