xLSTM: Extended Long Short-Term Memory

Activity: Talk or presentation › Invited talk (science-to-science)

Description

The Long Short-Term Memory (LSTM) has stood the test of time and contributed to numerous Deep Learning success stories; in particular, LSTMs constituted the first Large Language Models. However, the advent of Transformer technology marked the dawn of a new era, and Transformers have become the driving force of today's LLMs. We ask: How far do we get when scaling LSTMs to billions of parameters? We enhance the LSTM with exponential gating and with a new matrix memory structure governed by a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks, which are then stacked into residual xLSTM architectures. xLSTM has a constant memory size and linear compute complexity in the context length, goes beyond pairwise token interactions, can directly modify memory content, and has high abstraction capabilities. xLSTM compares favorably to state-of-the-art Transformers and State Space Models, both in performance and in scaling.
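To make the matrix memory and the covariance update rule mentioned above more concrete, the following is a minimal NumPy sketch of a single recurrent step with exponential input gating. It is an illustration under stated assumptions, not the authors' reference implementation: the function and parameter names (mlstm_step, i_pre, f_pre, o_pre) are hypothetical, and the per-token query/key/value projections are assumed to be computed elsewhere.

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_pre, f_pre, o_pre):
    """One step of a matrix-memory LSTM cell with exponential gating
    and a covariance-style memory update (illustrative sketch only).

    C       : (d, d) matrix memory
    n       : (d,)   normalizer state
    q, k, v : (d,)   query, key, value projections of the current token
    *_pre   : scalar gate pre-activations
    """
    i_gate = np.exp(i_pre)                  # exponential input gate
    f_gate = 1.0 / (1.0 + np.exp(-f_pre))   # forget gate (sigmoid here)
    o_gate = 1.0 / (1.0 + np.exp(-o_pre))   # output gate

    # Covariance-style update: the memory accumulates key-value outer
    # products, so its size stays constant in the context length.
    C = f_gate * C + i_gate * np.outer(v, k)
    n = f_gate * n + i_gate * k

    # Read out by querying the matrix memory, with a stabilizing normalizer.
    h = o_gate * (C @ q) / max(abs(float(n @ q)), 1.0)
    return C, n, h
```

In practice the exponential gate is numerically stabilized (for instance by tracking a running maximum of the log-gate values); that detail is omitted here for brevity.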
Period: 26 Jul 2024
Event title: ICML 2024 Workshop: Next Generation of Sequence Modeling Architectures
Event type: Workshop
Location: Vienna, Austria
Degree of Recognition: International

Fields of science

  • 102001 Artificial intelligence
  • 102019 Machine learning

JKU Focus areas

  • Digital Transformation