Research Progress About Deep Reinforcement Learning, 210–217 (SI)
Liu Liu and Lin-hui Chen
DOI: 10.2316/J.2023.201-0371
From Journal: (201) Mechatronic Systems and Control, 2023