Abstract: To address the problems of depth quality and non-linear classification in large-scale RGB-D datasets, a 3D object recognition method is proposed based on a convolutional-recursive neural network (CNN-RNN) and a kernel extreme learning machine (KELM). First, a depth coding algorithm is introduced to correct missing values and noise in the original depth cue and to normalize the point cloud to a standard angle; the original depth and the encoded depth are then fused into a new depth cue. Second, multi-cue hierarchical features are learned with the CNN-RNN, and a two-way spatial pyramid pooling method is applied to each cue. Finally, a KELM is constructed as the classifier to recognize 3D objects. Experimental results demonstrate that the proposed method effectively improves both 3D object recognition accuracy and classification efficiency.
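The final classification stage can be illustrated with a minimal sketch of a kernel extreme learning machine. It follows the commonly used closed-form KELM solution, in which the output weights alpha are obtained by solving (Omega + I/C) alpha = T, where Omega is the kernel matrix of the training features and T is the one-hot label matrix. The abstract does not specify the kernel or any hyperparameters, so the Gaussian (RBF) kernel, the values of `C` and `gamma`, the class name `KernelELM`, and the toy 2D data standing in for CNN-RNN features are all assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KernelELM:
    """Hypothetical minimal KELM classifier (RBF kernel is an assumption)."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization coefficient (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        omega = rbf_kernel(X, X, self.gamma)
        n = X.shape[0]
        # Closed-form output weights: (Omega + I/C) alpha = T.
        self.alpha_ = np.linalg.solve(omega + np.eye(n) / self.C, T)
        self.X_train_ = X
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train_, self.gamma)
        # Pick the class whose output score is largest.
        return self.classes_[np.argmax(K @ self.alpha_, axis=1)]


def demo():
    """Train on two synthetic, well-separated clusters and classify two points."""
    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=-2.0, scale=0.3, size=(20, 2))
    X1 = rng.normal(loc=2.0, scale=0.3, size=(20, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * 20 + [1] * 20)
    model = KernelELM(C=100.0, gamma=0.5).fit(X, y)
    return model.predict(np.array([[-2.0, -2.0], [2.0, 2.0]]))
```

Calling `demo()` trains the sketch on the two synthetic clusters and returns the predicted labels for one query point near each cluster centre. Because training reduces to a single linear solve rather than iterative optimization, this formulation is often chosen when classification efficiency matters, which matches the motivation stated in the abstract.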