Abstract: To address the behavioral cognition problem of agents, a computational model of behavioral cognition based on the coordination of the cerebellum and the basal ganglia is proposed. Its central algorithm is an operant conditioning learning algorithm, which comprises an evaluation mechanism, an action selection mechanism, a tropism mechanism, and a coordination mechanism between the cerebellum and the basal ganglia. In the early stage of learning, the learning signals come not only from the inferior olive but also from the substantia nigra. Convergence of the algorithm is guaranteed in the sense of entropy. Using the proposed method, a motor nerve cognitive system for a self-balancing two-wheeled robot is built, with RBF neural networks as the approximators of the actor and of the evaluation function. Simulation results show that, compared with the Actor-Critic method that uses only the basal ganglia mechanism, the proposed method learns faster and fails fewer times. Decreasing the temperature in the late stage of learning further increases the learning speed, the vibration eventually disappears, and the learning effect is improved.
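The role of temperature and entropy can be illustrated with a minimal sketch. It assumes that action selection follows a Boltzmann (softmax) distribution over action values, which the abstract does not state explicitly; the function names, the action values, and the annealing schedule (`t_start`, `decay`, `t_min`) are hypothetical and chosen only for illustration. The sketch shows that lowering the temperature reduces the entropy of the action distribution, so selection becomes progressively more deterministic.

```python
import math


def boltzmann_probs(values, temperature):
    """Boltzmann (softmax) action probabilities for the given action values."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def annealed_entropies(values, t_start=5.0, decay=0.8, steps=20, t_min=0.05):
    """Entropy of the action distribution at each step of a geometric
    temperature schedule (an assumed schedule, not the paper's)."""
    temperature = t_start
    history = []
    for _ in range(steps):
        history.append(entropy(boltzmann_probs(values, temperature)))
        temperature = max(t_min, temperature * decay)
    return history


if __name__ == "__main__":
    # Hypothetical action values for three candidate actions.
    for step, h in enumerate(annealed_entropies([0.2, 0.5, 0.1])):
        print(f"step {step:2d}  entropy {h:.4f}")
```

With a high temperature the three actions are chosen almost uniformly (entropy close to ln 3); as the temperature falls, probability mass concentrates on the highest-valued action and the entropy approaches zero.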