CPG BASED RL ALGORITHM LEARNS TO CONTROL OF A HUMANOID ROBOT LEG
Önder Tutsoy
DOI: 10.2316/Journal.206.2015.2.206-4185
International Journal of Robotics and Automation, 2015