Abstract
We prove, under commonly used assumptions, the convergence of episodic, that is, sequence-based, actor-critic-like reinforcement learning algorithms for which the policy becomes more greedy during learning. The most prominent example of such algorithms is the recently introduced RUDDER method, which speeds up learning in delayed-reward problems by reward redistribution. RUDDER is based on simultaneously learning a policy and a reward redistribution network, similarly to actor-critic methods. We show the convergence of RUDDER, and the proof can be generalized to similar actor-critic-like algorithms. In contrast to previous convergence proofs for actor-critic-like methods, we consider whole episodes as learning examples, undiscounted reward, and a policy that becomes more greedy during learning. We employ recent techniques from two time-scale stochastic approximation theory that are equipped with a controlled Markov process to account for the policy becoming more greedy. We expect our framework to be useful for proving the convergence of other algorithms based on reward shaping or on attention mechanisms.
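To make the setting concrete, the sketch below shows generic coupled two time-scale stochastic approximation iterates of the kind such convergence arguments build on. The notation ($\theta_n$, $\omega_n$, step sizes $a(n)$, $b(n)$, drift functions $g$, $h$, noise terms $M^{(i)}$) is standard textbook notation assumed here for illustration, not the paper's exact formulation.

```latex
% Minimal sketch (generic, assumed notation) of two time-scale stochastic
% approximation iterates; requires amsmath for the align environment.
\begin{align}
  \omega_{n+1} &= \omega_n + a(n)\,\bigl[\, g(\theta_n, \omega_n) + M^{(1)}_{n+1} \bigr]
      && \text{(fast scale: critic-like parameters)} \\
  \theta_{n+1} &= \theta_n + b(n)\,\bigl[\, h(\theta_n, \omega_n) + M^{(2)}_{n+1} \bigr]
      && \text{(slow scale: policy parameters)}
\end{align}
% Step sizes satisfy the usual Robbins-Monro conditions,
%   \sum_n a(n) = \sum_n b(n) = \infty, \quad \sum_n \bigl(a(n)^2 + b(n)^2\bigr) < \infty,
% together with b(n)/a(n) \to 0, so the slow iterate \theta_n sees the fast
% iterate \omega_n as effectively converged. The M^{(i)}_{n+1} are martingale
% difference noise terms; a controlled Markov process (here: the exploration
% policy that becomes more greedy) additionally drives the state distribution.
```

In actor-critic-style analyses the critic-like component is typically placed on the faster scale; how RUDDER's reward redistribution network and policy are assigned to the two scales is specified in the paper itself and not asserted by this sketch.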
| Original language | English |
| --- | --- |
| Title of host publication | Neural Information Processing Systems Foundation (NeurIPS 2019), 2019 |
| Number of pages | 8 |
| Publication status | Published - 2019 |
Fields of science
- 305907 Medical statistics
- 202017 Embedded systems
- 202036 Sensor systems
- 101004 Biomathematics
- 101014 Numerical mathematics
- 101015 Operations research
- 101016 Optimisation
- 101017 Game theory
- 101018 Statistics
- 101019 Stochastics
- 101024 Probability theory
- 101026 Time series analysis
- 101027 Dynamical systems
- 101028 Mathematical modelling
- 101029 Mathematical statistics
- 101031 Approximation theory
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 102018 Artificial neural networks
- 102019 Machine learning
- 102032 Computational intelligence
- 102033 Data mining
- 305901 Computer-aided diagnosis and therapy
- 305905 Medical informatics
- 202035 Robotics
- 202037 Signal processing
- 103029 Statistical physics
- 106005 Bioinformatics
- 106007 Biostatistics
JKU Focus areas
- Digital Transformation