Three-Dimensional Rotation Equivariant Self-Supervised Learning Vector Network Combined with Diffusion Model
SHEN Kedi1, ZHAO Jieyu1, XIE Min1
1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211
Abstract: Some networks for processing 3D data lack rotation equivariance and therefore struggle to process 3D objects under unknown rotations and to estimate the resulting pose changes. To address this problem, a three-dimensional rotation-equivariant self-supervised learning vector network combined with a diffusion model is proposed. The network learns the rotation information of 3D objects, handles the pose change estimation task, and refines the overall pose estimate using local pose information denoised by the diffusion model. In the equivariant vector network, scalar features are lifted to vector representations using vector neurons. Self-supervised learning is carried out without labeled data, enabling the network to learn the vector information of 3D targets and to perform rotation reconstruction and pose change estimation on 3D data. Meanwhile, to address local deviations in the pose estimation results, a diffusion model is constructed to refine the overall pose change estimate. The model learns local pose information through the noising and denoising process and effectively removes noise from the local poses. Experiments demonstrate that the proposed network estimates pose changes in 3D space when the test data are randomly rotated and outperforms comparable networks. Moreover, the proposed model achieves superior performance on the reassembly task compared with current state-of-the-art methods and refines the overall pose information through local pose information.
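To make the equivariance property concrete, the following minimal sketch (PyTorch-style Python; the class name VNLinear, the channel sizes, and the self-test are illustrative assumptions, not the authors' released code) shows how a vector-neuron linear layer acts only on the channel dimension of vector-valued features, so that a rotation applied to the input commutes with the layer.

```python
import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Illustrative vector-neuron linear layer: each 'neuron' is a 3D vector,
    and the learned weights mix channels only, never the three spatial
    coordinates, so f(x R) = f(x) R for any rotation R."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels) * 0.02)

    def forward(self, x):
        # x: (batch, in_channels, 3) vector features
        return torch.einsum('oi,bic->boc', self.weight, x)

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = VNLinear(8, 16)
    x = torch.randn(2, 8, 3)
    # Random proper rotation from a QR decomposition (sign-fixed to det = +1).
    q, _ = torch.linalg.qr(torch.randn(3, 3))
    if torch.det(q) < 0:
        q[:, 0] = -q[:, 0]
    # Rotating the input then applying the layer equals applying the layer
    # then rotating the output.
    print(torch.allclose(layer(x @ q), layer(x) @ q, atol=1e-5))  # True
```

Because the weights never mix the three spatial coordinates, a global rotation of the input propagates unchanged through every such layer, which is the property the proposed network relies on for rotation reconstruction and pose change estimation.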
Received: 08 January 2025
Fund: National Natural Science Foundation of China (No. 62471266), Natural Science Foundation of Zhejiang Province (No. LZ22F020001), 2025 Key Technological Innovation Program of Ningbo City (No. 2023Z224)
Corresponding Author: ZHAO Jieyu, Ph.D., professor. His research interests include image graphics technology, natural interaction, machine learning, and computer vision.
About authors: SHEN Kedi, master's student. His research interests include machine learning and computer vision. XIE Min, Ph.D. candidate. Her research interests include machine learning, pattern recognition, and computer vision.