Attention Approximates Sparse Distributed Memory

Date Posted: October 20, 2021
Date Recorded: October 19, 2021
Speaker(s): Trenton Bricken, Harvard University
Description: 

Abstract: While Attention has come to be an important mechanism in deep learning, it emerged out of a heuristic process of trial and error, providing limited intuition for why it works so well. Here, we show that Transformer Attention closely approximates Sparse Distributed Memory (SDM), a biologically plausible associative memory model, under certain data conditions. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
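To make the claimed correspondence concrete, here is a minimal numerical sketch (not code from the talk or paper). The paper's derivation is stated for SDM's binary vectors and Hamming-distance circle intersections; the sketch instead assumes continuous unit-norm keys and queries as a simplified stand-in, and the dimension, inverse temperature `beta`, and random data are illustrative choices. It shows that softmax Attention weights coincide with an exponentially decaying function of query-to-key distance, the kind of weighting an SDM read applies to its stored patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 64  # number of stored patterns, key/query dimension (illustrative)

# Unit-norm keys, random values, and a query near the first key.
keys = rng.standard_normal((n, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.standard_normal((n, d))
query = keys[0] + 0.1 * rng.standard_normal(d)
query /= np.linalg.norm(query)

beta = 8.0  # inverse temperature, playing the role of Attention's score scaling

# Transformer-style Attention read: softmax over scaled dot products.
scores = beta * (keys @ query)
attn_weights = np.exp(scores - scores.max())
attn_weights /= attn_weights.sum()
attn_read = attn_weights @ values

# SDM-style read: weight each stored pattern by an exponentially decaying
# function of its squared distance to the query. For unit-norm vectors,
# exp(-beta * |k - q|^2 / 2) is proportional to exp(beta * k.q), so the two
# weightings agree exactly after normalization.
dist_sq = np.sum((keys - query) ** 2, axis=1)
sdm_weights = np.exp(-beta * dist_sq / 2)
sdm_weights /= sdm_weights.sum()
sdm_read = sdm_weights @ values

print(np.allclose(attn_weights, sdm_weights))        # True
print(float(np.linalg.norm(attn_read - sdm_read)))   # ~0.0
```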
