Tao Yan, Wenan Zhang, Simon X. Yang, and Li Yu
Reinforcement learning, maximum entropy, robotic manipulation, hindsight experience replay
The key challenges in applying reinforcement learning (RL) to complex robotic control tasks are fragile convergence, very high sample complexity, and the need to shape a reward function. In this work, we present a soft actor-critic (SAC) style algorithm, an off-policy actor-critic RL method based on the maximum entropy RL framework, in which the actor aims to maximize the expected reward while also maximizing the entropy of the policy. This effectively improves both the stability of the algorithm's performance and its robustness to modeling and estimation errors. Moreover, we combine SAC with hindsight experience replay (HER), a transition replay scheme that makes policy learning from sparse rewards more efficient. Finally, the effectiveness of the proposed method is verified on a range of manipulation tasks in a simulated environment.
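For reference, the maximum entropy objective described above is commonly written as follows (this is the standard formulation from Haarnoja et al.'s SAC work; the temperature coefficient $\alpha$ and the notation are conventions of that literature, not symbols given in this abstract):

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\,\cdot \mid s_t)\big) \Big]
$$

Here $r(s_t, a_t)$ is the reward, $\mathcal{H}$ denotes policy entropy, and $\alpha$ trades off reward maximization against exploration; larger $\alpha$ encourages more stochastic policies.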
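The HER component can be illustrated with a minimal sketch of the common "future" relabeling strategy (Andrychowicz et al.): each transition is stored again with its goal replaced by a goal actually achieved later in the same episode, so that sparse-reward trajectories still yield successful examples. The tuple layout, the `compute_reward` interface, and the use of scalar goals below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def her_relabel(episode, compute_reward, k=4):
    """Relabel an episode with the 'future' HER strategy.

    episode: list of (obs, action, next_obs, goal, achieved_goal) tuples.
    Returns transitions of the form (obs, action, next_obs, goal, reward),
    including k extra copies per step whose goal is sampled from the
    achieved goals of later steps in the same episode.
    """
    relabeled = []
    for t, (obs, action, next_obs, goal, achieved) in enumerate(episode):
        # Original transition, with the sparse reward recomputed.
        relabeled.append((obs, action, next_obs, goal,
                          compute_reward(achieved, goal)))
        for _ in range(k):
            # Pick an achieved goal from the current or a future step
            # and pretend it was the intended goal all along.
            future = random.randint(t, len(episode) - 1)
            new_goal = episode[future][4]
            relabeled.append((obs, action, next_obs, new_goal,
                              compute_reward(achieved, new_goal)))
    return relabeled

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward typical of manipulation benchmarks:
    0 on success, -1 otherwise (scalar goals for simplicity;
    real tasks use vector-valued goals and a distance metric)."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0
```

In an actual SAC+HER pipeline, the relabeled transitions would simply be pushed into the off-policy replay buffer alongside the originals, which is what makes HER compatible with any off-policy learner such as SAC.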