CPG BASED RL ALGORITHM LEARNS TO CONTROL OF A HUMANOID ROBOT LEG

Önder Tutsoy

References

[1] A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, Artificial Neural Networks, 13, 1983, 81–93.
[2] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge: The MIT Press, 1998), 1–300.
[3] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, 9–44.
[4] D. Kenji, Reinforcement learning in continuous time and space, Neural Computation, 12, 2000, 219–245.
[5] M.R. Lagoudakis, R. Parr, and M. Littman, Least-squares methods in reinforcement learning for control, Methods and Applications of Artificial Intelligence, 2308, 2002, 249–260.
[6] L. Busoniu, B.D. Schutter, and R. Babuska, Decentralized reinforcement learning control of a robotic manipulator, Control, Automation, Robotics and Vision, 9th International Conf., 2006, 1–6.
[7] H. Benbrahim and J.A. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems, 22, 1996, 283–302.
[8] G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi, and G. Cheng, Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, International Journal of Robotic Research, 27, 2008, 213–228.
[9] G.L. Liu, M.K. Habib, K. Watanabe, and K. Izumi, Central pattern generators based on Matsuoka oscillators for the locomotion of biped robots, Artificial Life and Robotics, 12, 2008, 263–269.
[10] N. Zeitlin, Reinforcement learning methods to enable automatic tuning of legged robots, Technical report, University of California, Berkeley, 2012.
[11] A. Kralj, R.J. Jaeger, and M. Munih, Analysis of standing up and sitting down in humans: Definitions and normative data presentation, Journal of Biomechanics, 23, 1990, 1123–1138.
[12] C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, Fall detection from human shape and motion history using video surveillance, 21st International Conf. on Advanced Information Networking and Applications Workshops, AINAW '07, 2007, 875–880.
[13] J. Morimoto and K. Doya, Reinforcement learning of dynamic motor sequence: Learning to stand up, Proc. IEEE/RSJ International Conf. on Intelligent Robots and Systems, 1998, 1721–1726.
[14] J.N. Tsitsiklis, On the convergence of optimistic policy iteration, The Journal of Machine Learning Research, 3, 2002, 59–72.
[15] M. Geist and O. Pietquin, Parametric value function approximation: A unified view, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011, 9–16.
[16] F.L. Lewis and D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9, 2009, 32–50.
[17] J.N. Tsitsiklis and B.V. Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 1997, 674–690.
[18] L. Baird, Residual algorithms: Reinforcement learning with function approximation, Proc. of the Twelfth International Conf. on Machine Learning, 1995, 30–37.
[19] H. Kimura and S. Kobayashi, An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function, Proc. of the Fifteenth International Conf. on Machine Learning, 1998, 278–286.
[20] K. Matsuoka, N. Ohyama, A. Watanabe, and M. Ooshima, Control of a giant swing robot using a neural oscillator, Advances in Natural Computation, Lecture Notes in Computer Science, 3611, Springer, 2005, 274–282.
[21] M. Tokic, Adaptive ε-greedy exploration in reinforcement learning based on value differences, Proc. of the 33rd Annual German Conf. on Advances in Artificial Intelligence, 2010, 203–210.
[22] O. Tutsoy, M. Brown, and H. Wang, Reinforcement learning algorithm application and multi-body system design by using MapleSim and Modelica, International Conf. on Advanced Mechatronic Systems (ICAMechS), 2012, 650–655.
[23] Y. Nakamura, T. Mori, M.A. Sato, and S. Ishii, Reinforcement learning for a biped robot based on a CPG-actor-critic method, Neural Networks, 20, 2007, 723–735.
[24] MapleSim used in the creation of breakthrough vehicle driving simulator technology, Maplesoft [Online], 2012, http://www.maplesoft.com/company/publications/articles/.
[25] P. Goossens and T. Richard, Using symbolic technology to derive inverse kinematic solutions for actuator control development, Maplesoft, Waterloo, Canada, Nov. 2011 [Online]. Available: http://www.maplesoft.com/Whitepapers/Mathmod2012_pgoossens_trichard_preprint.
