[1] IOANNIDOU A, CHATZILARI E, NIKOLOPOULOS S, et al. Deep Learning Advances in Computer Vision with 3D Data: A Survey. ACM Computing Surveys, 2018, 50(2). DOI: 10.1145/3042064.
[2] HUANG K, SHI B T, LI X, et al. Multi-modal Sensor Fusion for Auto Driving Perception: A Survey[C/OL].[2023-08-12]. https://arxiv.org/pdf/2202.02703v1.pdf.
[3] FENG D, HAASE-SCHÜTZ C, ROSENBAUM L, et al. Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(3): 1341-1360.
[4] FAYYAD J, JARADAT M A, GRUYER D, et al. Deep Learning Sensor Fusion for Autonomous Vehicle Perception and Localization: A Review. Sensors, 2020, 20(15). DOI: 10.3390/s20154220.
[5] PANG S, MORRIS D, RADHA H. CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2020: 10386-10393.
[6] ASVADI A, GARROTE L, PREMEBIDA C, et al. Multimodal Vehicle Detection: Fusing 3D-LiDAR and Color Camera Data. Pattern Recognition Letters, 2018, 115: 20-29.
[7] VORA S, LANG A H, HELOU B, et al. PointPainting: Sequential Fusion for 3D Object Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 4603-4611.
[8] XIE L, XIANG C, YU Z X, et al. PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12460-12467.
[9] LIANG M, YANG B, CHEN Y, et al. Multi-task Multi-sensor Fusion for 3D Object Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 7337-7345.
[10] YOU Y R, WANG Y, CHAO W L, et al. Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving[C/OL].[2023-08-12]. https://arxiv.org/pdf/1906.06310v1.pdf.
[11] WANG J D, WEI Z, ZHANG T, et al. Deeply-Fused Nets[C/OL].[2023-08-12]. https://arxiv.org/abs/1605.07716.
[12] PHILION J, FIDLER S. Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D // Proc of the 16th European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 194-210.
[13] HUANG J J, HUANG G, ZHU Z, et al. BEVDet: High-Performance Multi-camera 3D Object Detection in Bird-Eye-View[C/OL].[2023-08-12]. https://arxiv.org/pdf/2112.11790v1.pdf.
[14] XIE B Z, YU Z D, ZHOU D Q, et al. M2BEV: Multi-camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation[C/OL].[2023-08-12]. https://arxiv.org/pdf/2204.05088.pdf.
[15] ZHOU Y, TUZEL O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2018: 4490-4499.
[16] YAN Y, MAO Y X, LI B Y. SECOND: Sparsely Embedded Convolutional Detection. Sensors, 2018, 18(10). DOI: 10.3390/s18103337.
[17] ZHOU Y, SUN P, ZHANG Y, et al. End-to-End Multi-view Fusion for 3D Object Detection in LiDAR Point Clouds // Proc of the Conference on Robot Learning. San Diego, USA: JMLR, 2020: 923-932.
[18] LECUN Y, BENGIO Y, HINTON G. Deep Learning. Nature, 2015, 521: 436-444.
[19] LIU Z J, TANG H T, AMINI A, et al. BEVFusion: Multi-task Multi-sensor Fusion with Unified Bird's-Eye View Representation // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2023: 2774-2781.
[20] BORSE S, KLINGNER M, KUMAR V R, et al. X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation // Proc of the IEEE/CVF Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2023: 3286-3296.
[21] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 9992-10002.
[22] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature Pyramid Networks for Object Detection // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 936-944.
[23] BAI X Y, HU Z Y, ZHU X G, et al. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 1080-1089.
[24] SZEGEDY C, LIU W, JIA Y Q, et al. Going Deeper with Convolutions // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2015. DOI: 10.1109/CVPR.2015.7298594.
[25] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the Inception Architecture for Computer Vision // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 2818-2826.
[26] ZHANG H, WU C R, ZHANG Z Y, et al. ResNeSt: Split-Attention Networks // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 2735-2745.
[27] EL-NOUBY A, TOUVRON H, CARON M, et al. XCiT: Cross-Covariance Image Transformers[C/OL].[2023-08-12]. https://arxiv.org/pdf/2106.09681.pdf.
[28] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: A Multimodal Dataset for Autonomous Driving // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 11618-11628.
[29] GEIGER A, LENZ P, STILLER C, et al. Vision Meets Robotics: The KITTI Dataset. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.
[30] LOSHCHILOV I, HUTTER F. Decoupled Weight Decay Regularization[C/OL].[2023-08-12]. https://arxiv.org/abs/1711.05101.