CPG BASED RL ALGORITHM LEARNS TO CONTROL OF A HUMANOID ROBOT LEG

Önder Tutsoy

References

[1] A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, Artificial Neural Networks, 13, 1983, 81–93.
[2] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge: The MIT Press, 1998), 1–300.
[3] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, 9–44.
[4] D. Kenji, Reinforcement learning in continuous time and space, Neural Computation, 12, 2000, 219–245.
[5] M.R. Lagoudakis, R. Parr, and M. Littman, Least-squares methods in reinforcement learning for control, Methods and Applications of Artificial Intelligence, 2308, 2002, 249–260.
[6] L. Busoniu, B.D. Schutter, and R. Babuska, Decentralized reinforcement learning control of a robotic manipulator, Control, Automation, Robotics and Vision, 9th International Conf., 2006, 1–6.
[7] H. Benbrahim and J.A. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems, 22, 1996, 283–302.
[8] G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi, and G. Cheng, Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, International Journal of Robotic Research, 27, 2008, 213–228.
[9] G.L. Liu, M.K. Habib, K. Watanabe, and K. Izumi, Central pattern generators based on Matsuoka oscillators for the locomotion of biped robots, Artificial Life and Robotics, 12, 2008, 263–269.
[10] N. Zeitlin, Reinforcement learning methods to enable automatic tuning of legged robots, Technical report, University of California, Berkeley, 2012.
[11] A. Kralj, R.J. Jaeger, and M. Munih, Analysis of standing up and sitting down in humans: Definitions and normative data presentation, Journal of Biomechanics, 23, 1990, 1123–1138.
[12] C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, Fall detection from human shape and motion history using video surveillance, 21st International Conf. on Advanced Information Networking and Applications Workshops, AINAW '07, 2007, 875–880.
[13] J. Morimoto and K. Doya, Reinforcement learning of dynamic motor sequence: Learning to stand up, Proc. IEEE/RSJ International Conf. on Intelligent Robots and Systems, 1998, 1721–1726.
[14] J.N. Tsitsiklis, On the convergence of optimistic policy iteration, The Journal of Machine Learning Research, 3, 2002, 59–72.
[15] M. Geist and O. Pietquin, Parametric value function approximation: A unified view, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011, 9–16.
[16] F.L. Lewis and D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9, 2009, 32–50.
[17] J.N. Tsitsiklis and B.V. Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 1997, 674–690.
[18] L. Baird, Residual algorithms: Reinforcement learning with function approximation, Proc. of the Twelfth International Conf. on Machine Learning, 1995, 30–37.
[19] H. Kimura and S. Kobayashi, An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function, Proc. of the Fifteenth International Conf. on Machine Learning, 1998, 278–286.
[20] K. Matsuoka, N. Ohyama, A. Watanabe, and M. Ooshima, Control of a giant swing robot using a neural oscillator, Advances in Natural Computation, Lecture Notes in Computer Science, 3611, Springer, 2005, 274–282.
[21] M. Tokic, Adaptive ε-greedy exploration in reinforcement learning based on value differences, Proc. of the 33rd Annual German Conf. on Advances in Artificial Intelligence, 2010, 203–210.
[22] O. Tutsoy, M. Brown, and H. Wang, Reinforcement learning algorithm application and multi-body system design by using MapleSim and Modelica, International Conf. on Advanced Mechatronic Systems (ICAMechS), 2012, 650–655.
[23] Y. Nakamura, T. Mori, M.A. Sato, and S. Ishii, Reinforcement learning for a biped robot based on a CPG-actor-critic method, Neural Networks, 20, 2007, 723–735.
[24] MapleSim used in the creation of breakthrough vehicle driving simulator technology, Maplesoft [Online], 2012, http://www.maplesoft.com/company/publications/articles/.
[25] P. Goossens and T. Richard, Using symbolic technology to derive inverse kinematic solutions for actuator control development, Maplesoft, Waterloo, Canada, Nov. 2011 [Online]. Available: http://www.maplesoft.com/Whitepapers/Mathmod2012_pgoossens_trichard_preprint.
