|
|
Data-Driven Optimal Stabilization Control and Simulation Based on Reinforcement Learning
LU Chaolun1, LI Yongqiang1, FENG Yuanjing1
1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023
|
|
Abstract The Q-learning algorithm is applied to the optimal stabilization control problem when only data, rather than a model of the plant, are available. Because the state space and the control space are continuous, Q-learning can only be implemented approximately, so the proposed approximate Q-learning algorithm yields a suboptimal controller. Although the obtained controller is suboptimal, simulations on a strongly nonlinear plant show that the closed-loop domain of attraction of the proposed algorithm is broader, and the cost function smaller, than those of the linear quadratic regulator and the deep deterministic policy gradient method.
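To illustrate the idea of implementing Q-learning approximately from plant data alone, the following Python sketch performs fitted Q-iteration on sampled transitions and returns a greedy (hence suboptimal) controller. The pendulum-like plant, quadratic feature map, discretized control grid, discount factor and sample sizes are illustrative assumptions for this sketch only; they are not details taken from the paper.

```python
import numpy as np

# Minimal sketch of approximate (fitted) Q-learning from transition data only.
# The plant model below is used solely to generate sample transitions; the
# learning step itself sees only the tuples (x, u, x_next, cost).

def plant(x, u, dt=0.05):
    # Hypothetical pendulum-like nonlinear plant (assumed, not from the paper).
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (np.sin(theta) + u)])

def stage_cost(x, u):
    # Quadratic stage cost x'x + 0.1 u^2 (assumed weights).
    return x @ x + 0.1 * u ** 2

def features(x, u):
    # Quadratic polynomial basis in (x, u): a simple parametric model of Q(x, u).
    z = np.array([x[0], x[1], u])
    return np.concatenate(([1.0], z, np.outer(z, z)[np.triu_indices(3)]))

rng = np.random.default_rng(0)
N = 500
X = rng.uniform(-1.0, 1.0, size=(N, 2))            # sampled states
U = rng.uniform(-2.0, 2.0, size=N)                  # sampled controls
Xn = np.array([plant(x, u) for x, u in zip(X, U)])  # observed successor states
C = np.array([stage_cost(x, u) for x, u in zip(X, U)])

u_grid = np.linspace(-2.0, 2.0, 11)                 # discretized controls for the greedy step
Phi = np.array([features(x, u) for x, u in zip(X, U)])
Phi_next = np.array([[features(xn, uc) for uc in u_grid] for xn in Xn])
w = np.zeros(Phi.shape[1])
gamma = 0.98                                        # discount used only to stabilize this sketch

# Fitted Q-iteration: regress Q(x, u) onto stage cost plus minimized successor value.
for _ in range(30):
    targets = C + gamma * (Phi_next @ w).min(axis=1)
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def controller(x):
    # Suboptimal controller: greedy over the control grid w.r.t. the fitted Q-function.
    return u_grid[np.argmin([features(x, uc) @ w for uc in u_grid])]

print(controller(np.array([0.5, 0.0])))
```

The function approximator and the control-grid minimization are exactly where the approximation (and hence the suboptimality mentioned in the abstract) enters: with continuous state and control spaces, the exact Q-function and the exact greedy minimization are replaced by a parametric model and a finite search.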
|
Received: 11 October 2018
|
|
Fund: Supported by National Natural Science Foundation of China (No. 61703369), Zhejiang Key Research and Development Program (No. 2017C03039) and Major Science and Technology Project of Wenzhou (No. ZS2017007)
About the authors: LU Chaolun, Ph.D. candidate. His research interests include data-driven control and reinforcement learning. LI Yongqiang, Ph.D., lecturer. His research interests include data-driven control and optimal control. FENG Yuanjing (corresponding author), Ph.D., professor. His research interests include image processing and intelligent optimization.
|
|
|
|
|
|