Abstract: Path integral control is derived from stochastic optimal control. It is a numerical iterative method that solves optimal control problems for continuous nonlinear systems with fast convergence and without a system model. This paper proposes a policy improvement algorithm based on path integral reinforcement learning for the target-directed locomotion of a snake-like robot. The path integral reinforcement learning approach is used to learn the parameters of the robot's serpentine equation, so that in simulation the robot reaches the target position quickly without contacting obstacles. Moreover, using the prior knowledge gained in simulation, the robot also completes the task well in a real environment. Experimental results verify the validity of the proposed algorithm.
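To make the idea of learning gait parameters by path integral policy improvement more concrete, the following is a minimal sketch and not the algorithm as implemented in the paper. It assumes a common serpenoid gait form, alpha*sin(omega*t + i*beta) + gamma, for the serpentine equation, uses a simplified episode-level (black-box) variant of the path integral update, and replaces the robot simulation with a hypothetical quadratic cost. All function names, parameter values, and the target vector are illustrative assumptions.

```python
import math
import random


def serpenoid_angles(theta, n_joints, t):
    """Joint angles at time t for an ASSUMED serpenoid gait form.

    theta = (alpha, omega, beta, gamma): amplitude, temporal frequency,
    phase shift between joints, and steering offset.
    """
    alpha, omega, beta, gamma = theta
    return [alpha * math.sin(omega * t + i * beta) + gamma
            for i in range(n_joints)]


def toy_cost(theta):
    """Hypothetical stand-in for a rollout cost from a simulator.

    A real cost would run the gait and penalize distance to the target
    and obstacle contact; here it is a simple quadratic bowl.
    """
    target = (0.6, 2.0, 0.5, 0.0)  # illustrative values only
    return sum((a - b) ** 2 for a, b in zip(theta, target))


def pi2_update(theta, cost_fn, rng, n_rollouts=20, sigma=0.1, h=10.0):
    """One simplified path-integral-style update (model-free).

    Samples perturbed parameter vectors, evaluates their costs, and
    moves theta by the cost-weighted average of the perturbations.
    """
    eps = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([p + e for p, e in zip(theta, ek)]) for ek in eps]
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0  # avoid division by zero if all costs are equal
    # Low-cost rollouts receive exponentially larger weights.
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    return [p + sum(w * ek[j] for w, ek in zip(weights, eps)) / total
            for j, p in enumerate(theta)]


def learn(theta0, cost_fn, iterations=200, seed=0):
    """Repeat the update from an initial parameter guess."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iterations):
        theta = pi2_update(theta, cost_fn, rng)
    return theta


if __name__ == "__main__":
    start = [0.2, 1.0, 0.1, 0.3]
    learned = learn(start, toy_cost)
    print("initial cost:", toy_cost(start))
    print("learned cost:", toy_cost(learned))
    print("joint angles at t=0:", serpenoid_angles(learned, 8, 0.0))
```

In this sketch the update needs only rollout costs, which reflects the model-free property stated in the abstract. In a real setting, `toy_cost` would be replaced by a simulated or physical rollout of the robot.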