[1] OSA T, PAJARINEN J, NEUMANN G, et al. An Algorithmic Perspective on Imitation Learning. Foundations and Trends® in Robotics, 2018, 7(1/2): 1-179.
[2] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction. Cambridge, USA: The MIT Press, 1998.
[3] AKKAYA I, ANDRYCHOWICZ M, CHOCIEJ M, et al. Solving Rubik's Cube with a Robot Hand[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1910.07113.pdf.
[4] LEVINE S, FINN C, DARRELL T, et al. End-to-End Training of Deep Visuomotor Policies. Journal of Machine Learning Research, 2016, 17(1): 1334-1373.
[5] FAZELI N, OLLER M, WU J, et al. See, Feel, Act: Hierarchical Learning for Complex Manipulation Skills with Multisensory Fusion. Science Robotics, 2019, 4(26). DOI: 10.1126/scirobotics.aav3123.
[6] FISAC J F, AKAMETALU A K, ZEILINGER M N, et al. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems. IEEE Transactions on Automatic Control, 2019, 64(7): 2737-2752.
[7] KROEMER O, NIEKUM S, KONIDARIS G. A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms. Journal of Machine Learning Research, 2021, 22: 1-82.
[8] BELLMAN R. On the Theory of Dynamic Programming. Proceedings of the National Academy of Sciences of the United States of America, 1952, 38(8): 716-719.
[9] MOERLAND T M, BROEKENS J, JONKER C M. Model-Based Reinforcement Learning: A Survey[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2006.16712v3.pdf.
[10] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the Game of Go without Human Knowledge. Nature, 2017, 550(7676): 354-359.
[11] SUTTON R S. Dyna, An Integrated Architecture for Learning, Planning, and Reacting. ACM SIGART Bulletin, 1991, 2(4): 160-163.
[12] SUTTON R S. Planning by Incremental Dynamic Programming // Proc of the 8th International Conference on Machine Learning. San Diego, USA: JMLR, 1991: 353-357.
[13] SUTTON R S, SZEPESVÁRI C, GERAMIFARD A, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping // Proc of the 24th Conference on Uncertainty in Artificial Intelligence. New York, USA: ACM, 2008: 528-536.
[14] PARR R, LI L H, TAYLOR G, et al. An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning // Proc of the 25th International Conference on Machine Learning. San Diego, USA: JMLR, 2008: 752-759.
[15] HESTER T, STONE P. Learning and Using Models // WIERING M, VAN OTTERLO M, eds. Reinforcement Learning. Berlin, Germany: Springer, 2012: 111-141.
[16] JONG N K, STONE P. Model-Based Function Approximation in Reinforcement Learning // Proc of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems. New York, USA: ACM, 2007: 1-8.
[17] HESTER T, STONE P. TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots. Machine Learning, 2013, 90: 385-429.
[18] MÜLLER K R, SMOLA A J, RÄTSCH G, et al. Predicting Time Series with Support Vector Machines // Proc of the International Conference on Artificial Neural Networks. Berlin, Germany: Springer, 1997: 999-1004.
[19] FU J, LEVINE S, ABBEEL P. One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2016: 4019-4026.
[20] WAHLSTRÖM N, SCHÖN T B, DEISENROTH M P. From Pixels to Torques: Policy Learning with Deep Dynamical Models[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1502.02251v2.pdf.
[21] DEISENROTH M P, FOX D, RASMUSSEN C E. Gaussian Processes for Data-Efficient Learning in Robotics and Control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(2): 408-423.
[22] KOPICKI M, ZUREK S, STOLKIN R, et al. Learning Modular and Transferable Forward Models of the Motions of Push Manipulated Objects. Autonomous Robots, 2017, 41(5): 1061-1082.
[23] LIN L J. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching. Machine Learning, 1992, 8: 293-321.
[24] VAN HASSELT H P, HESSEL M, ASLANIDES J. When to Use Parametric Models in Reinforcement Learning? // Proc of the 33rd International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2019: 14345-14356.
[25] DER KIUREGHIAN A, DITLEVSEN O. Aleatory or Epistemic? Does It Matter? Structural Safety, 2009, 31(2): 105-112.
[26] LANG T, TOUSSAINT M, KERSTING K. Exploration in Relational Domains for Model-Based Reinforcement Learning. Journal of Machine Learning Research, 2012, 13: 3725-3768.
[27] MOERLAND T M, BROEKENS J, JONKER C M. Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1705.00470.pdf.
[28] SCHOLZ J, LEVIHN M, ISBELL C, et al. A Physics-Based Model Prior for Object-Oriented MDPs. Proceedings of Machine Learning Research, 2014, 32(2): 1089-1097.
[29] HÖGMAN V, BJÖRKMAN M, MAKI A, et al. A Sensorimotor Learning Framework for Object Categorization. IEEE Transactions on Cognitive and Developmental Systems, 2016, 8(1): 15-25.
[30] DEISENROTH M P, RASMUSSEN C E. PILCO: A Model-Based and Data-Efficient Approach to Policy Search // Proc of the 28th International Conference on Machine Learning. San Diego, USA: JMLR, 2011: 465-472.
[31] GAL Y, MCALLISTER R T, RASMUSSEN C E. Improving PILCO with Bayesian Neural Network Dynamics Models[C/OL]. [2021-12-26]. http://mlg.eng.cam.ac.uk/yarin/PDFs/DeepPILCO.pdf.
[32] KURUTACH T, CLAVERA I, DUAN Y, et al. Model-Ensemble Trust-Region Policy Optimization[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1802.10592v1.pdf.
[33] KE N R, SINGH A, TOUATI A, et al. Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1903.01599v2.pdf.
[34] VENKATRAMAN A, HEBERT M, BAGNELL J A. Improving Multi-step Prediction of Learned Time Series Models // Proc of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI, 2015: 3024-3030.
[35] ABBEEL P, NG A Y. Learning First-Order Markov Models for Control[C/OL]. [2021-12-26]. http://ai.stanford.edu/~ang/papers/nips04-controlmodel.pdf.
[36] ASADI K, CATER E, MISRA D, et al. Towards a Simple Approach to Multi-step Model-Based Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1811.00128.pdf.
[37] MISHRA N, ABBEEL P, MORDATCH I. Prediction and Control with Temporal Segment Models // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 2459-2468.
[38] FINN C, GOODFELLOW I, LEVINE S. Unsupervised Learning for Physical Interaction through Video Prediction // Proc of the 30th International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2016: 64-72.
[39] FINN C, LEVINE S. Deep Visual Foresight for Planning Robot Motion // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2017: 2786-2793.
[40] EBERT F, FINN C, DASARI S, et al. Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1812.00568.pdf.
[41] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-Level Control through Deep Reinforcement Learning. Nature, 2015, 518(7540): 529-533.
[42] VAN HASSELT H, GUEZ A, SILVER D. Deep Reinforcement Learning with Double Q-Learning // Proc of the 30th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI, 2016: 2094-2100.
[43] WANG Z Y, SCHAUL T, HESSEL M, et al. Dueling Network Architectures for Deep Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1511.06581.pdf.
[44] HAUSKNECHT M, STONE P. Deep Recurrent Q-Learning for Partially Observable MDPs[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1507.06527v4.pdf.
[45] BELLEMARE M G, DABNEY W, MUNOS R. A Distributional Perspective on Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1707.06887.pdf.
[46] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized Experience Replay[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1511.05952.pdf.
[47] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy Networks for Exploration[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1706.10295.pdf.
[48] HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning // Proc of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI, 2018: 3215-3222.
[49] KINGMA D P, WELLING M. Auto-Encoding Variational Bayes[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1312.6114v10.pdf.
[50] WATTER M, SPRINGENBERG J T, BOEDECKER J, et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1506.07365.pdf.
[51] ZHANG M, VIKRAM S, SMITH L, et al. SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning. Proceedings of Machine Learning Research, 2019, 97: 7444-7453.
[52] HA D, SCHMIDHUBER J. World Models[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1803.10122.pdf.
[53] DIUK C, COHEN A, LITTMAN M L. An Object-Oriented Representation for Efficient Reinforcement Learning // Proc of the 25th International Conference on Machine Learning. San Diego, USA: JMLR, 2008: 240-247.
[54] FRAGKIADAKI K, AGRAWAL P, LEVINE S, et al. Learning Visual Predictive Models of Physics for Playing Billiards[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1511.07404.pdf.
[55] KANSKY K, SILVER T, MÉLY D A, et al. Schema Networks: Zero-Shot Transfer with a Generative Causal Model of Intuitive Physics // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 1809-1818.
[56] VAN STEENKISTE S, CHANG M, GREFF K, et al. Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and Their Interactions[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1802.10353v1.pdf.
[57] BATTAGLIA P W, PASCANU R, LAI M, et al. Interaction Networks for Learning about Objects, Relations and Physics[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1612.00222.pdf.
[58] KIPF T, VAN DER POL E, WELLING M. Contrastive Learning of Structured World Models[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1911.12247.pdf.
[59] WATTERS N, MATTHEY L, BOSNJAK M, et al. COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1905.09275v1.pdf.
[60] BURGESS C P, MATTHEY L, WATTERS N, et al. MONet: Unsupervised Scene Decomposition and Representation[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1901.11390.pdf.
[61] VEERAPANENI R, CO-REYES J D, CHANG M, et al. Entity Abstraction in Visual Model-Based Reinforcement Learning. Proceedings of Machine Learning Research, 2020, 100: 1439-1456.
[62] MA X, CHEN S W, HSU D, et al. Contrastive Variational Reinforcement Learning for Complex Observations. Proceedings of Machine Learning Research, 2021, 155: 959-972.
[63] SERMANET P, LYNCH C, CHEBOTAR Y, et al. Time-Contrastive Networks: Self-Supervised Learning from Video // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2018: 1134-1141.
[64] GHOSH D, GUPTA A, LEVINE S. Learning Actionable Representations with Goal-Conditioned Policies[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1811.07819.pdf.
[65] JONSCHKOWSKI R, BROCK O. Learning State Representations with Robotic Priors. Autonomous Robots, 2015, 39(3): 407-428.
[66] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-Driven Exploration by Self-Supervised Prediction // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 2778-2787.
[67] AGRAWAL P, NAIR A, ABBEEL P, et al. Learning to Poke by Poking: Experiential Learning of Intuitive Physics[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1606.07419.pdf.
[68] SHELHAMER E, MAHMOUDIEH P, ARGUS M, et al. Loss Is Its Own Reward: Self-Supervision for Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1612.07307v1.pdf.
[69] ZHANG A, SATIJA H, PINEAU J. Decoupling Dynamics and Reward for Transfer Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1804.10689.pdf.
[70] SAWADA Y. Disentangling Controllable and Uncontrollable Factors of Variation by Interacting with the World[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1804.06955v2.pdf.
[71] THOMAS V, BENGIO E, FEDUS W, et al. Disentangling the Independently Controllable Factors of Variation by Interacting with the World[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1802.09484.pdf.
[72] CHOI J, GUO Y J, MOCZULSKI M, et al. Contingency-Aware Exploration in Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1811.01483.pdf.
[73] GRIMM C, BARRETO A, SINGH S, et al. The Value Equivalence Principle for Model-Based Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2011.03506.pdf.
[74] GRIMM C, BARRETO A, FARQUHAR G, et al. Proper Value Equivalence[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2106.10316v1.pdf.
[75] FARQUHAR G, BAUMLI K, MARINHO Z, et al. Self-Consistent Models and Values[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2110.12840.pdf.
[76] SCHRITTWIESER J, ANTONOGLOU I, HUBERT T, et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 2020, 588(7839): 604-609.
[77] SILVER D, VAN HASSELT H, HESSEL M, et al. The Predictron: End-to-End Learning and Planning. Proceedings of Machine Learning Research, 2017, 70: 3191-3199.
[78] OH J, SINGH S, LEE H. Value Prediction Network[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1707.03497.pdf.
[79] LENZ I, KNEPPER R, SAXENA A. DeepMPC: Learning Deep Latent Features for Model Predictive Control[C/OL]. [2021-12-26]. http://www.roboticsproceedings.org/rss11/p12.pdf.
[80] HUBER A, GERDTS M. A Dynamic Programming MPC Approach for Automatic Driving along Tracks and Its Realization with Online Steering Controllers. IFAC-PapersOnLine, 2017, 50(1): 8686-8691.
[81] ZHANG T H, KAHN G, LEVINE S, et al. Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2016: 528-535.
[82] HAFNER D, LILLICRAP T, FISCHER I, et al. Learning Latent Dynamics for Planning from Pixels // Proc of the 36th International Conference on Machine Learning. San Diego, USA: JMLR, 2019: 2555-2565.
[83] VOLPI N C, WU Y, OGNIBENE D. Towards Event-Based MCTS for Autonomous Cars // Proc of the APSIPA Annual Summit and Conference. Washington, USA: IEEE, 2017: 420-427.
[84] HESTER T, QUINLAN M, STONE P. RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2012: 85-90.
[85] LEVINE S, KOLTUN V. Guided Policy Search[C/OL]. [2021-12-26]. http://proceedings.mlr.press/v28/levine13.pdf.
[86] LEVINE S, ABBEEL P. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics // Proc of the 27th International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2014, I: 1071-1079.
[87] KALWEIT G, BOEDECKER J. Uncertainty-Driven Imagination for Continuous Deep Reinforcement Learning. Proceedings of Machine Learning Research, 2017, 78: 195-206.
[88] GU S X, LILLICRAP T, SUTSKEVER I, et al. Continuous Deep Q-Learning with Model-Based Acceleration // Proc of the 33rd International Conference on Machine Learning. San Diego, USA: JMLR, 2016: 2829-2838.
[89] FEINBERG V, WAN A, STOICA I, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1803.00101v1.pdf.
[90] BUCKMAN J, HAFNER D, TUCKER G, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion // Proc of the 32nd Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2018: 8224-8234.
[91] LUO Y P, XU H Z, LI Y Z, et al. Algorithmic Framework for Model-Based Deep Reinforcement Learning with Theoretical Guarantees[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1807.03858v3.pdf.
[92] JANNER M, FU J, ZHANG M, et al. When to Trust Your Model: Model-Based Policy Optimization[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1906.08253v2.pdf.
[93] LAI H, SHEN J, ZHANG W N, et al. Bidirectional Model-Based Policy Optimization[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2007.01995.pdf.
[94] HEESS N, WAYNE G, SILVER D, et al. Learning Continuous Control Policies by Stochastic Value Gradients // Proc of the 28th International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2015, II: 2944-2952.
[95] CLAVERA I, FU V, ABBEEL P. Model-Augmented Actor-Critic: Backpropagating through Paths[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2005.08068.pdf.
[96] HAFNER D, LILLICRAP T, BA J, et al. Dream to Control: Learning Behaviors by Latent Imagination[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1912.01603v1.pdf.
[97] TAMAR A, LEVINE S, ABBEEL P, et al. Value Iteration Networks[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1602.02867v1.pdf.
[98] SRINIVAS A, JABRI A, ABBEEL P, et al. Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control. Proceedings of Machine Learning Research, 2018, 80: 4732-4741.
[99] KARKUS P, HSU D, LEE W S. QMDP-Net: Deep Learning for Planning under Partial Observability[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1703.06692.pdf.
[100] GUEZ A, WEBER T, ANTONOGLOU I, et al. Learning to Search with MCTSnets[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1802.04697.pdf.
[101] RACANIÈRE S, WEBER T, REICHERT D P, et al. Imagination-Augmented Agents for Deep Reinforcement Learning // Proc of the 31st International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2017: 5694-5705.
[102] PASCANU R, LI Y J, VINYALS O, et al. Learning Model-Based Planning from Scratch[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1707.06170v1.pdf.
[103] NAIR A, PONG V, DALAL M, et al. Visual Reinforcement Learning with Imagined Goals[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1807.04742v2.pdf.
[104] LOWREY K, RAJESWARAN A, KAKADE S, et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1811.01848v1.pdf.
[105] TODOROV E, EREZ T, TASSA Y. MuJoCo: A Physics Engine for Model-Based Control // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2012: 5026-5033.
[106] SEKAR R, RYBKIN O, DANIILIDIS K, et al. Planning to Explore via Self-Supervised World Models[C/OL]. [2021-12-26]. https://arxiv.org/pdf/2005.05960.pdf.
[107] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight Experience Replay[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1707.01495v1.pdf.
[108] KOENIG N, HOWARD A. Design and Use Paradigms for Gazebo, an Open-Source Multi-robot Simulator // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2004, III: 2149-2154.
[109] TOBIN J, FONG R, RAY A, et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2017: 23-30.
[110] PENG X B, ANDRYCHOWICZ M, ZAREMBA W, et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2018: 3803-3810.
[111] ANDRYCHOWICZ M, BAKER B, CHOCIEJ M, et al. Learning Dexterous In-Hand Manipulation. International Journal of Robotics Research, 2020, 39(1): 3-20.
[112] BOUSMALIS K, IRPAN A, WOHLHART P, et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2018: 4243-4250.
[113] JAMES S, WOHLHART P, KALAKRISHNAN M, et al. Sim-to-Real via Sim-to-Sim: Data-Efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 12627-12637.
[114] CHRISTIANO P, SHAH Z, MORDATCH I, et al. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1610.03518v1.pdf.
[115] NAGABANDI A, CLAVERA I, LIU S M, et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning[C/OL]. [2021-12-26]. https://arxiv.org/pdf/1803.11347v6.pdf.