Policy Improvement via Planning in Maximum Entropy Reinforcement Learning
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
This thesis examines maximum entropy reinforcement learning, an alternative formulation of the traditional reinforcement learning paradigm. Maximum entropy policies prioritize actions that lead to states where an agent has more choice among high-value future trajectories. Here we propose two novel algorithms rooted in this framework that use planning-based approaches to improve policy learning. The first is a model-based algorithm that improves value function estimates via Monte Carlo Tree Search. The second uses a model-free, heuristic-based approach to improve deep exploration in challenging environments. Finally, we present a preliminary analysis comparing optimal maximum entropy policies to optimal policies under the traditional reinforcement learning objective.
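For context, the maximum entropy objective augments the standard expected return with a policy entropy bonus, so the agent is rewarded for keeping its action distribution broad in addition to collecting reward. A common formulation (a general sketch of the framework; the thesis may use different notation or weighting) is

    J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
    \quad \text{where } \mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big],

with \rho_\pi the state-action distribution induced by the policy \pi and \alpha a temperature parameter trading off reward against entropy; setting \alpha = 0 recovers the traditional reinforcement learning objective.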
Type
Electronic Thesis (text)
Degree Name
M.S.
Degree Level
Masters
Degree Program
Graduate College, Computer Science