
xLSTM: LSTM is so back

Activity: Talk or presentation, Invited talk, Science-to-science

Description

Long Short-Term Memory (LSTM) networks have withstood the test of time, forming the foundation of many early deep learning breakthroughs, including the first generation of Large Language Models (LLMs). However, Transformers have since overshadowed LSTMs and become the dominant architecture for LLMs.
We revisit the potential of LSTMs and ask: can LSTMs be scaled up to compete with Transformers?
We introduce xLSTM, a significantly enhanced version of LSTM featuring exponential gating and a novel matrix memory with covariance-based updates. Our kernel implementations of xLSTM adhere to scaling laws and train faster than Transformers. More crucially, xLSTM inference is linear-time in the number of produced tokens, in stark contrast to the quadratic complexity of attention mechanisms, making it highly efficient for deployment.
A 7-billion-parameter xLSTM model achieves performance comparable to state-of-the-art Transformer models while offering significantly faster inference. We are currently distilling large Transformer models into xLSTM variants with accelerated inference. Additionally, we have built xLSTM time-series foundation models, which already outperform leading approaches such as Chronos (Amazon), TimesFM (Google), and Moirai (Salesforce).
xLSTM is already seeing real-world adoption: companies like Spleenlab and Festo have successfully integrated it into commercial products.
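
To make the terms "exponential gating" and "matrix memory with covariance-based updates" from the description more concrete, here is a minimal NumPy sketch of one recurrent step of a matrix-memory cell. It loosely follows the publicly described mLSTM formulation, but it is an illustration under assumptions, not the speakers' kernel implementation: the function names (`mlstm_step`, `run_sequence`), the sigmoid forget gate, the single-head layout, and the omission of projections and an output gate are all choices made here for brevity.

```python
import numpy as np


def mlstm_step(C, n, m, q, k, v, i_pre, f_pre):
    """One recurrent step of a matrix-memory cell with exponential gating.

    C: (d_v, d_k) matrix memory, n: (d_k,) normalizer, m: scalar stabilizer.
    q, k: (d_k,) query/key, v: (d_v,) value.
    i_pre, f_pre: scalar gate pre-activations (hypothetical inputs here).
    """
    k = k / np.sqrt(k.shape[0])             # scale keys as in attention
    log_f = -np.logaddexp(0.0, -f_pre)      # log(sigmoid(f_pre)), numerically safe
    m_new = max(log_f + m, i_pre)           # running max keeps exponentials bounded
    i_gate = np.exp(i_pre - m_new)          # stabilized exponential input gate
    f_gate = np.exp(log_f + m - m_new)      # stabilized forget gate
    C_new = f_gate * C + i_gate * np.outer(v, k)   # covariance-style rank-1 update
    n_new = f_gate * n + i_gate * k
    h = C_new @ q / max(abs(n_new @ q), 1.0)       # normalized read-out
    return C_new, n_new, m_new, h


def run_sequence(qs, ks, vs, i_pres, f_pres):
    """Process T tokens one at a time with a fixed-size state."""
    d_k, d_v = ks.shape[1], vs.shape[1]
    C, n, m = np.zeros((d_v, d_k)), np.zeros(d_k), 0.0
    hs = []
    for t in range(qs.shape[0]):
        C, n, m, h = mlstm_step(C, n, m, qs[t], ks[t], vs[t], i_pres[t], f_pres[t])
        hs.append(h)
    return np.stack(hs), C
```

The state (`C`, `n`, `m`) has the same size at every step, so each new token costs a constant amount of work and a sequence of T tokens costs O(T) in total. Attention, by contrast, revisits all earlier tokens for each new one.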
Period: 03 July 2025
Event title: International Joint Conference on Neural Networks
Event type: Conference
Location: Rome, Italy
Degree of recognition: International

Fields of science

  • 101019 Stochastics
  • 102003 Image processing
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 102001 Artificial Intelligence
  • 202017 Embedded Systems
  • 101016 Optimisation
  • 101015 Operations Research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational Intelligence
  • 102004 Bioinformatics
  • 102013 Human-Computer Interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data Mining
  • 102 Computer Sciences
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine Learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU focus areas

  • Digital Transformation