Extending the Temporal Range of Hierarchical Deep Reinforcement Learning

Open Access
De La Fuente Duran, Andres
Area of Honors:
Computer Science
Bachelor of Science
Document Type:
Thesis Supervisors:
  • Vasant Gajanan Honavar, Thesis Supervisor
  • John Joseph Hannan, Honors Advisor
Keywords:
  • machine learning
  • reinforcement learning
  • deep neural networks
  • temporal abstraction
  • hierarchical learning
  • artificial intelligence
Abstract:
In Reinforcement Learning, a long-standing problem is that of extending the temporal range of an agent, that is, its capability to learn effective behaviors in environments where positive feedback arrives only after a long delay. Conceptually, this is analogous to the notion of long-term strategy or planning. In this paper I propose an approach for improving temporal range that builds on a few recent developments. One such development is the successful creation of a Neural Network model for use in Reinforcement Learning, called the Deep Q Network. Despite the success of this model in achieving cutting-edge performance, it still struggles with long-term delayed rewards. Recent work has proposed a hierarchical organization of Deep Q Networks in order to achieve a certain level of temporal abstraction. This has been shown to be an improvement over a single Deep Q Network in the context of long-term delayed rewards. However, it still relies on the Deep Q Network, which has a weakness: its updates (the steps in which the weights of the model are altered in response to new observations) are based only on limited information. More specifically, its architecture has no built-in persistence of information. Consider trying to read this paragraph while only being able to remember a few random words you have read so far. Recurrent Neural Networks, by contrast, are structured so that information can persist over long spans of time. Here I propose a model which incorporates a Recurrent Neural Network called ‘Long Short-Term Memory’ into the Hierarchical Deep Q Network approach, with the goal of improving temporal range.
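The idea in the abstract can be illustrated with a minimal, untrained sketch. It is not the thesis's implementation: the names `LSTMQCell`, `run_episode`, and `goal_period`, the layer sizes, the random weights, the omission of bias terms, and the fixed-interval goal selection are all assumptions made for illustration. The sketch only shows how a recurrent state lets two Q-networks, a goal-selecting one and an action-selecting one, produce values that depend on the history of observations rather than on the current observation alone.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMQCell:
    """Tiny LSTM cell with a linear Q-value head (hypothetical, untrained, no biases)."""

    def __init__(self, input_size, hidden_size, num_outputs, seed):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.gates = {name: matrix(hidden_size, input_size + hidden_size) for name in "ifog"}
        self.q_head = matrix(num_outputs, hidden_size)

    def initial_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h, c = state
        xh = list(x) + list(h)

        def gate(name, squash):
            return [squash(sum(w * v for w, v in zip(row, xh))) for row in self.gates[name]]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        # The cell state c is what carries information across time steps.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        q = [sum(w * hk for w, hk in zip(row, h_new)) for row in self.q_head]
        return q, (h_new, c_new)


def greedy(q_values):
    return max(range(len(q_values)), key=lambda k: q_values[k])


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


OBS_SIZE, NUM_GOALS, NUM_ACTIONS = 3, 2, 4  # illustrative sizes
meta = LSTMQCell(OBS_SIZE, 8, NUM_GOALS, seed=1)  # upper level: picks goals
controller = LSTMQCell(OBS_SIZE + NUM_GOALS, 8, NUM_ACTIONS, seed=2)  # lower level: picks actions


def run_episode(observations, goal_period=2):
    """Greedy rollout; a new goal every `goal_period` steps (a simplifying assumption)."""
    meta_state = meta.initial_state()
    ctrl_state = controller.initial_state()
    goal, trace = 0, []
    for t, obs in enumerate(observations):
        if t % goal_period == 0:
            goal_q, meta_state = meta.step(obs, meta_state)
            goal = greedy(goal_q)
        action_q, ctrl_state = controller.step(list(obs) + one_hot(goal, NUM_GOALS), ctrl_state)
        trace.append((goal, greedy(action_q), action_q))
    return trace


# Two histories that end in the same observation.
history_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
history_b = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
trace_a = run_episode(history_a)
trace_b = run_episode(history_b)
print(trace_a[-1][2])
print(trace_b[-1][2])
```

The two printed Q-value vectors differ even though the final observation is identical, because the recurrent state summarizes what came before. A purely feed-forward Q-network given only the final observation and the same goal would return the same values in both cases.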