OCPA Bionic Autonomous Learning System and Its Application to Robot Posture Balance Control
CAI Jian-Xian1,2, RUAN Xiao-Gang1
1. College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124
2. Institute of Disaster Prevention, Sanhe 065201
Abstract: An operant conditioning probabilistic automaton (OCPA) bionic autonomous learning system is constructed for the nonlinear, strongly coupled, and complex dynamics of a two-wheeled self-balancing robot. The OCPA learning system is a probabilistic automaton based on Skinner's operant conditioning, and its main characteristic is that it simulates the operant conditioning mechanism of biological organisms. It possesses a bionic self-organizing capability, comprising self-learning and adaptive functions, so the OCPA automaton can be used to describe, simulate, and design various self-organizing systems. The convergence of the operant conditioning learning algorithm of the OCPA learning system is proved theoretically. Simulation and experimental results on two-wheeled robot posture balance control show that the OCPA learning system requires no model of the robot, and that the robot's balancing skills are formed, developed, and perfected gradually by simulating the biological operant conditioning mechanism.
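The core idea described in the abstract, a probabilistic automaton whose action probabilities are strengthened or weakened by the outcome of each chosen action, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the linear reward-penalty update rule, and the learning rate are illustrative assumptions standing in for the paper's actual operant conditioning learning algorithm.

```python
import random

class OperantConditioningAutomaton:
    """Minimal probabilistic automaton in the spirit of operant conditioning:
    each state holds a probability distribution over actions, and the
    probability of the chosen action is reinforced or weakened according
    to the reward it produced (an assumption; the paper's update rule
    and its convergence proof are given in the full text)."""

    def __init__(self, n_states, n_actions, learning_rate=0.1):
        self.n_actions = n_actions
        self.lr = learning_rate
        # start from uniform action probabilities in every state
        self.probs = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

    def choose(self, state):
        # sample an action from the state's current probability distribution
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.probs[state]):
            acc += p
            if r < acc:
                return a
        return self.n_actions - 1

    def reinforce(self, state, action, reward):
        # linear reward-penalty update: raise the chosen action's probability
        # for positive reward, lower it for negative reward, then renormalize
        p = self.probs[state]
        p[action] = min(1.0, max(1e-6, p[action] + self.lr * reward))
        total = sum(p)
        self.probs[state] = [x / total for x in p]
```

In a balance-control setting, the state would be a discretized robot posture and the reward a measure of how close the robot stays to upright; repeated reinforcement gradually concentrates probability on the balancing actions, mirroring how the abstract says the skill is "formed, developed and perfected gradually".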