Tao Yan, Wenan Zhang, Simon X. Yang, and Li Yu