Memory Concepts for Large Language Models

Activity: Talk or presentation › Invited talk (science-to-science)

Description

Currently, the most successful Deep Learning architecture for large language models is the transformer. The attention mechanism of the transformer is equivalent to modern Hopfield networks and is therefore an associative memory. However, this associative memory has disadvantages: quadratic complexity in the sequence length when mutually associating sequence elements, a restriction to pairwise associations, limited ability to modify the memory, and insufficient abstraction capabilities. Its memory grows with the context. In contrast, recurrent neural networks (RNNs) like LSTMs have linear complexity, associate each sequence element with a representation of all previous elements, can directly modify their memory content, and have high abstraction capabilities. Their memory is of fixed size, independent of the context. However, RNNs cannot store sequence elements that were rare in the training data, since RNNs have to learn to store. Transformers can store rare or even new sequence elements, which, besides their high parallelizability, is one of the main reasons why they outperformed RNNs in language modelling. I think that future successful Deep Learning architectures should comprise both of these memories: attention for implementing episodic memories and RNNs for implementing short-term memories and abstraction.
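
The contrast drawn above between attention as a growing associative memory and an RNN as a fixed-size, compressive memory can be illustrated with a small sketch. The following NumPy toy code is an illustrative assumption and not part of the talk: attention_memory performs Hopfield-style retrieval whose storage and pairwise cost grow with the number of stored elements, while recurrent_memory compresses the whole sequence into a state of fixed size d_state (all function names and dimensions are hypothetical).

# Minimal sketch (assumed, not from the talk): two toy memory mechanisms in NumPy.
import numpy as np

def attention_memory(queries, keys, values, beta=1.0):
    # Hopfield-style associative retrieval: softmax(beta * Q K^T) V.
    # The key/value store grows with the context; scoring is pairwise, hence quadratic.
    scores = beta * queries @ keys.T                 # (n_queries, n_stored) pairwise associations
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values                          # weighted recall of stored patterns

def recurrent_memory(inputs, d_state=8, seed=0):
    # RNN-style compression: a fixed-size state summarises all previous elements,
    # giving linear cost in the sequence length but requiring the network to learn to store.
    rng = np.random.default_rng(seed)
    d_in = inputs.shape[1]
    W_h = rng.normal(scale=0.1, size=(d_state, d_state))
    W_x = rng.normal(scale=0.1, size=(d_state, d_in))
    h = np.zeros(d_state)
    for x in inputs:                                 # one update per element: O(sequence length)
        h = np.tanh(W_h @ h + W_x @ x)               # memory size stays fixed, independent of context
    return h

if __name__ == "__main__":
    seq = np.random.default_rng(1).normal(size=(16, 4))  # toy sequence of 16 elements
    recalled = attention_memory(seq, seq, seq)            # episodic: grows with context
    summary = recurrent_memory(seq)                       # short-term/abstractive: fixed size
    print(recalled.shape, summary.shape)                  # (16, 4) (8,)

Running the sketch shows the point of the abstract in miniature: the attention output has one row per stored element, whereas the recurrent summary stays at d_state entries no matter how long the sequence becomes.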
Period: 07 Feb 2024
Event title: Distinguished Lecture Series
Event type: Other
Location: Germany

Fields of science

  • 101031 Approximation theory
  • 102 Computer Sciences
  • 305901 Computer-aided diagnosis and therapy
  • 102033 Data mining
  • 102032 Computational intelligence
  • 101029 Mathematical statistics
  • 102013 Human-computer interaction
  • 305905 Medical informatics
  • 101028 Mathematical modelling
  • 101027 Dynamical systems
  • 101004 Biomathematics
  • 101026 Time series analysis
  • 202017 Embedded systems
  • 101024 Probability theory
  • 305907 Medical statistics
  • 102019 Machine learning
  • 202037 Signal processing
  • 102018 Artificial neural networks
  • 103029 Statistical physics
  • 202036 Sensor systems
  • 202035 Robotics
  • 106005 Bioinformatics
  • 106007 Biostatistics
  • 101019 Stochastics
  • 101018 Statistics
  • 101017 Game theory
  • 101016 Optimisation
  • 102001 Artificial intelligence
  • 101015 Operations research
  • 102004 Bioinformatics
  • 101014 Numerical mathematics
  • 102003 Image processing

JKU Focus areas

  • Digital Transformation