LKDD-Net: Lightweight Keypoint and Deformable Descriptor Extraction Network
FANG Baofu1,2, ZHANG Keao1,2, WANG Hao1,2, YUAN Xiaohui3
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; 2. Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education of China, Hefei University of Technology, Hefei 230009, China; 3. Department of Computer Science and Engineering, University of North Texas, Denton, Texas 76201, United States
Abstract: Keypoint extraction is a crucial step in visual simultaneous localization and mapping (VSLAM). Existing deep learning-based keypoint extraction methods suffer from low efficiency and fail to meet real-time requirements, and they do not provide the geometric invariance required by descriptors. To address these issues, a lightweight keypoint and deformable descriptor extraction network (LKDD-Net) is proposed. A lightweight module is introduced into the backbone network to improve the efficiency of feature extraction, and a deformable convolution module is applied in the descriptor decoder to extract deformable descriptors, so LKDD-Net obtains keypoint locations and deformable descriptors simultaneously. To verify the effectiveness of LKDD-Net, a visual odometry system based on LKDD-Net is designed. Experiments on the HPatches and TUM public datasets show that LKDD-Net runs in real time on a GPU, with keypoint extraction taking as little as 8.3 ms, while maintaining high accuracy across various scenarios. The visual odometry system built on LKDD-Net outperforms both traditional VSLAM systems and VSLAM systems based on deep learning keypoint extraction, and it successfully tracks all six sequences of the TUM dataset, demonstrating stronger robustness.
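To make the deformable descriptor decoder described above concrete, the following is a minimal PyTorch sketch of a descriptor head that predicts per-pixel sampling offsets and applies a deformable convolution. It is an illustration only, not the authors' implementation: the module name DeformableDescriptorHead, the 128-channel input, and the 256-dimensional descriptor size are assumptions; only torchvision.ops.DeformConv2d and its (input, offset) calling convention come from the public torchvision API.

# Hypothetical sketch of a deformable descriptor decoder in the spirit of
# the LKDD-Net description; layer sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformableDescriptorHead(nn.Module):
    """Predicts per-pixel sampling offsets, then applies a deformable
    convolution so each descriptor aggregates features from a learned,
    geometry-adaptive neighborhood instead of a fixed 3x3 grid."""
    def __init__(self, in_ch: int = 128, desc_dim: int = 256, k: int = 3):
        super().__init__()
        # Two offsets (dx, dy) per kernel sample -> 2 * k * k channels.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, desc_dim, kernel_size=k, padding=k // 2)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        offsets = self.offset(feat)           # (B, 2*k*k, H, W)
        desc = self.deform(feat, offsets)     # (B, desc_dim, H, W)
        return F.normalize(desc, p=2, dim=1)  # unit-length dense descriptors

if __name__ == "__main__":
    head = DeformableDescriptorHead()
    dense_desc = head(torch.randn(1, 128, 60, 80))  # e.g., a 480x640 image at 1/8 resolution
    print(dense_desc.shape)  # torch.Size([1, 256, 60, 80])

The design intent the sketch captures is that the offset branch lets the descriptor's receptive field deform with local geometry, which is one plausible way to obtain the geometric invariance the abstract attributes to deformable descriptors.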