V2X-Enabled Cooperative Perception with Localization and Communication Constraints
MAO Ruiqing1, JIA Yukuan1, SUN Yuxuan1,2, ZHOU Sheng1, NIU Zhisheng1
1. Department of Electronic Engineering, Tsinghua University, Beijing 100084; 2. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044
Abstract: With the continued development of vehicle-to-everything (V2X) networks, connected autonomous driving enabled by cooperative perception is becoming an important component of future intelligent transportation systems, because it addresses the inherent limits of stand-alone vehicle intelligence in perception and computing capability. However, most existing cooperative perception algorithms rely on accurate positioning information for data fusion and ignore the constraints of communication bandwidth and communication delay. This paper proposes a feature-level cooperative perception algorithm for localization- and communication-constrained conditions. The algorithm matches information from different perspectives without relying on accurate positions and poses, remains robust to communication delay, and dynamically adjusts the amount of transmitted data according to the channel state. Specifically, it combines the traditional two-stage perception paradigm with deep metric learning and uses regional feature maps for cross-perspective information matching to overcome the impact of localization errors and communication delays. The number of regional feature maps transmitted over V2X communication is adjusted in real time to suit the channel conditions, which in turn changes the amount of communication data. Experimental results show that the proposed algorithm achieves significant cooperative gains in various scenarios, maintains perception accuracy under certain communication delays, and effectively reduces the amount of data that must be transmitted.
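The two mechanisms named in the abstract, channel-adaptive selection of regional feature maps and position-free cross-perspective matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_regions` and `match_regions`, the confidence-ranked selection rule, the fixed per-region payload size, and the greedy nearest-neighbor matching with a distance threshold are all assumptions made for illustration. The embeddings are assumed to come from an already trained metric-learning network, which is not shown.

```python
import math


def select_regions(confidences, bytes_per_region, budget_bytes):
    """Return indices of the regions to transmit under a byte budget.

    Assumption: every regional feature map costs the same number of bytes,
    and regions are ranked by detection confidence (highest first).
    """
    k = max(0, int(budget_bytes // bytes_per_region))
    # sorted() is stable, so equal confidences keep their original order.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    return order[:k]


def match_regions(ego_emb, remote_emb, max_dist):
    """Greedy one-to-one matching of region embeddings by Euclidean distance.

    Assumption: embeddings of the same object seen from two viewpoints lie
    close together, so no vehicle position or pose is needed for matching.
    Pairs farther apart than max_dist are left unmatched.
    """
    candidates = sorted(
        (math.dist(e, r), i, j)
        for i, e in enumerate(ego_emb)
        for j, r in enumerate(remote_emb)
    )
    used_ego, used_remote, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # all remaining candidates are even farther apart
        if i in used_ego or j in used_remote:
            continue
        pairs.append((i, j))
        used_ego.add(i)
        used_remote.add(j)
    return pairs


if __name__ == "__main__":
    # A poorer channel means a smaller budget and fewer regions sent.
    print(select_regions([0.9, 0.2, 0.7, 0.5], 1000, 2500))
    print(match_regions([[0.0, 0.0], [1.0, 0.0]],
                        [[1.0, 0.1], [0.05, 0.0], [5.0, 5.0]], 0.5))
```

In this sketch the sender shrinks or grows the byte budget with the channel state, and the receiver pairs incoming regions with its own by embedding distance alone. A real system would likely replace the greedy step with an optimal assignment and learn the distance threshold from validation data.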