TY - UNPB
T1 - History Compression via Language Models in Reinforcement Learning
AU - Paischer, Fabian
AU - Adler, Thomas
AU - Patil, Vihang
AU - Bitto-Nemling, Angela
AU - Holzleitner, Markus
AU - Lehner, Sebastian
AU - Eghbal-Zadeh, Hamid
AU - Hochreiter, Sepp
PY - 2022
AB - In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores the token embeddings, which are retrieved by queries obtained through a random but fixed projection of the observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample-efficient than its competitors. On the Minigrid and Procgen environments, HELM achieves new state-of-the-art results.
DO - 10.48550/arXiv.2205.12258
M3 - Preprint
T3 - arXiv.org
ER -