A Two Time-Scale Update Rule Ensuring Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER

Markus Holzleitner, Jose Arjona Medina, Marius-Constantin Dinu, Andreu Vall, Lukas Gruber, Sepp Hochreiter

Research output: Chapter in Book/Report/Conference proceeding, Conference proceedings, peer-review

Abstract

We prove, under commonly used assumptions, the convergence of episodic (that is, sequence-based) actor-critic-like reinforcement learning algorithms for which the policy becomes more greedy during learning. The most prominent example of such algorithms is the recently introduced RUDDER method, which speeds up learning in delayed-reward problems through reward redistribution. RUDDER simultaneously learns a policy and a reward redistribution network, similar to actor-critic methods. We show the convergence of RUDDER, and the result can be generalized to similar actor-critic-like algorithms. In contrast to previous convergence proofs for actor-critic-like methods, we consider whole episodes as learning examples, undiscounted reward, and a policy that becomes more greedy during learning. We employ recent techniques from two time-scale stochastic approximation theory, equipped with a controlled Markov process to account for the policy becoming more greedy. We expect our framework to be useful for proving convergence of other algorithms based on reward shaping or on attention mechanisms.
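
As a rough illustration of the two time-scale idea mentioned in the abstract, the following toy sketch couples a fast iterate and a slow iterate with different step-size schedules. It is not RUDDER and not the authors' proof setting: the scalar dynamics, the noise level, the function name `two_timescale`, and the step-size exponents are all assumptions chosen for illustration. The only property it is meant to show is that the slow step size shrinks faster than the fast one, so the fast iterate can track its target while the slow iterate drifts.

```python
import random


def two_timescale(steps=20000, seed=0):
    """Toy two time-scale stochastic approximation (illustrative only).

    Fast iterate w tracks 2 * theta (loosely, a critic-like quantity).
    Slow iterate theta is nudged until w reaches 1 (loosely, an actor-like
    update). The joint fixed point is theta = 0.5, w = 1.0.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for n in range(1, steps + 1):
        a = 1.0 / n          # slow step size (hypothetical schedule)
        b = 1.0 / n ** 0.6   # fast step size; a / b -> 0 as n grows
        # Fast time scale: noisy step of w towards 2 * theta.
        w += b * (2.0 * theta - w + rng.gauss(0.0, 0.1))
        # Slow time scale: noisy step of theta that reduces (1 - w).
        theta += a * (1.0 - w + rng.gauss(0.0, 0.1))
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale()
    print(f"theta = {theta:.3f}, w = {w:.3f}")
```

Both step-size sequences sum to infinity and are square-summable, which matches the usual conditions in stochastic approximation; the abstract's additional ingredients (episodic samples, a controlled Markov process, a policy that grows greedier) are deliberately left out of this sketch.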
Original language: English
Title of host publication: Neural Information Processing Systems Foundation (NeurIPS 2019), 2019
Number of pages: 8
Publication status: Published - 2019

Fields of science

  • 305907 Medical statistics
  • 202017 Embedded systems
  • 202036 Sensor systems
  • 101004 Biomathematics
  • 101014 Numerical mathematics
  • 101015 Operations research
  • 101016 Optimisation
  • 101017 Game theory
  • 101018 Statistics
  • 101019 Stochastics
  • 101024 Probability theory
  • 101026 Time series analysis
  • 101027 Dynamical systems
  • 101028 Mathematical modelling
  • 101029 Mathematical statistics
  • 101031 Approximation theory
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 102018 Artificial neural networks
  • 102019 Machine learning
  • 102032 Computational intelligence
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 305905 Medical informatics
  • 202035 Robotics
  • 202037 Signal processing
  • 103029 Statistical physics
  • 106005 Bioinformatics
  • 106007 Biostatistics

JKU Focus areas

  • Digital Transformation
