Tao Yan, Wenan Zhang, Simon X. Yang, and Li Yu