Attention Approximates Sparse Distributed Memory
Date Posted:
October 20, 2021
Date Recorded:
October 19, 2021
Speaker(s):
Trenton Bricken, Harvard University
Description:
Abstract: While Attention has come to be an important mechanism in deep learning, it emerged out of a heuristic process of trial and error, providing limited intuition for why it works so well. Here, we show that Transformer Attention closely approximates Sparse Distributed Memory (SDM), a biologically plausible associative memory model, under certain data conditions. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
TRENTON BRICKEN: A lot of this work was pursued during COVID isolation alone in my bedroom, and it's really exciting to now be sharing it in front of a lot of people in person. So the title of this work is "Attention Approximates Sparse Distributed Memory." It was done by myself in collaboration with Cengiz Pehlevan, and I'm now advised by Dr. Kreiman. And let's get into it.
So first of all, why should you care? We show that the heuristic Attention operation can be implemented with simple properties of high dimensional vectors in a biologically plausible fashion. The Transformer and its Attention operation are incredibly powerful, but they were heuristically developed, and the softmax operation in Attention is particularly important but also heuristic. Now, the intersection of hyperspheres that's used in Sparse Distributed Memory closely approximates the softmax and Attention operations, both in theory and in the trained Transformer models that we investigate. So SDM pre-empted Attention by approximately 30 years, having been developed back in 1988, and it meets a high bar for biological plausibility. In particular, it maps compellingly onto the unique wiring of the cerebellum.
So as an overview of this presentation, I'm just going to give a summary of the Sparse Distributed Memory, and I'm going to give a summary of Transformer Attention. I'll then show the relationship and how SDM can also interpret the Transformer more broadly. And then if there's time, which there probably won't be, I want to give a review of SDM's biological plausibility in the cerebellum. So we'll see how we do. I'm going to prioritize visual intuition and then try and get into some of the math and please ask questions whenever.
OK. First an overview of Sparse Distributed Memory. So the fundamental question or problem it's trying to solve is how can the brain write and read memories in order to retrieve the correct one later. And so there are a few considerations around this. First, we want high memory capacity. We also want robustness to query noise when we're trying to retrieve a memory later. We also in this case want our system to be biologically plausible. And we also want fault tolerance in that we're resistant to cell death-- neuron death.
What makes SDM unique versus other memory models? And this comes from its name. So it's sparse in that it works in a high dimensional vector space and neurons exist in only a fraction of possible locations in that space. Secondly, it's distributed in that the read and write operations you're doing apply to all nearby neurons in that vector space.
So first, getting into the SDM write operation, we're storing patterns in nearby neurons. So in green here we have our first pattern. It's appearing in our high dimensional binary vector space. We'll move to continuous later, but for now we're in a binary space where we're using Hamming distance. And the green pattern is going to activate neurons in this write radius around it. So the neurons are these hollow black circles. And it writes itself in. And just as a side note, this pattern can either write itself in, in an auto-associative setting, or it can write in a different pattern-- point to a different pattern-- in a hetero-associative setting. For example, if I'm remembering the alphabet, then my first pattern here would be the letter A. It would write in a pointer to the letter B. And then if I query for B, it would point to C, et cetera.
I clicked too far. OK. So now we've written that green pattern into nearby neurons, and so you can see that the green pattern is inside the hollow black circles, and we're also keeping track of the original pattern location. And that will be important in a second. Yep.
AUDIENCE: What space are these neurons, what physical space in the brain?
TRENTON BRICKEN: So I don't want to get too much into biological plausibility right now. But you could think the dendrites that the neuron has correspond to a vector, which represents a particular location in this high dimensional space. So the neurons all have an address in this space. They exist in a particular location. And then a pattern as it's processed by sensory stimuli kind of up the layers of the brain will have some vector representation that will also define a location in this space.
AUDIENCE: I had the same question. Are they physically near each other?
TRENTON BRICKEN: No. The neural population code for the two vectors would be similar. Yeah. Thanks for asking.
AUDIENCE: If their addresses are similar that means the neurons that they're synapsing from are similar? They have a high input similarity?
TRENTON BRICKEN: The dendrites of a neuron-- so what it is sensitive to, what it's activated by, will be close in space. So the pattern corresponds to a population code or a particular vector, and neurons that have a high similarity to or looking for patterns like that will be close in space, and therefore they'll be activated.
AUDIENCE: Mathematically, it's a binary effect?
TRENTON BRICKEN: Yes. High dimensional binary vectors.
AUDIENCE: When you say that they store--
TRENTON BRICKEN: I'll get more into the math in a second. Yeah. For now, you can just think of these neurons as if they have a storage vector, and they're storing this pattern inside of it. So here we're writing in a second pattern. It has a different location in the space, and it's also activating nearby neurons. And then it's writing itself in. And now note that some of the neurons are storing both the green and orange patterns. And you can think of this as it's storing a set of patterns, but in reality each neuron has one storage vector. And we're doing a summation of the patterns that we're storing. And because we're in a high dimensional space you can think of that as a superposition of the two patterns being stored with minimal crosstalk between.
So finally we're going to write in a third pattern. And you can see that it's stored again. And something that's important for later is, note that although these patterns are ephemeral-- so they've disappeared-- their original locations can be triangulated based upon the nearby neurons that are storing that pattern.
So now we're getting into the read operation. And so in this case, I have my pink query xi. And it is also going to be activating nearby neurons. Those neurons are going to output the patterns that they've stored. All of those patterns are going to be aggregated, and then we're doing a majority operation. So it will converge towards whatever the dominant pattern was. And so in this case, the query is getting one green, two orange, and four blue patterns, and the blue pattern dominates such that it will converge towards blue. Also that's fitting because if you look at the location of the query, it's closest to where the blue pattern was originally written in.
Another way of looking at this, where we abstract away the actual neurons is just consider the original pattern location. And then we can just look at the intersection between the original write circle and the read circle for our query. And so in the bottom right here I've replaced the number of patterns with the size of these circle intersections. And it's this circle intersection relationship here that will be crucial to mapping onto Attention later.
So now getting into more of the mathematical formalism, here I have my blue pattern and the intersection with the query. And I'm defining that intersection, and the number of neurons in it, with a cardinality operator: the neurons that are in the pattern's circle intersected with the neurons that are in the query's circle. And now I'm going to break down this equation for the read operation one step at a time. So first of all, for each pattern, which is denoted by the bold p at the far right, we're weighting it by the size of its circle intersection. We're then summing over all of the patterns. We're then normalizing by the circle intersection weights, because we ultimately want to be able to compute a majority, so we need that normalizing constant. And in this case, because we're in a binary space, we need to map back to 0 or 1, and so we have this majority rule function g.
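As a side note, the write and read operations described above can be sketched in a few lines of numpy. This is a minimal toy, not the paper's implementation: the dimension, neuron count, and radius are arbitrary choices, and bits are accumulated as +/-1 counters so that the majority rule g is just a threshold at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_neurons, radius = 64, 2000, 26   # dimension, neuron count, write/read radius

# Each neuron has a fixed random binary address in {0,1}^n and a storage vector.
addresses = rng.integers(0, 2, size=(num_neurons, n))
storage = np.zeros((num_neurons, n))

def hamming(v, M):
    """Hamming distance from vector v to each row of M."""
    return (v != M).sum(axis=1)

def write(pattern):
    """Auto-associative write: every neuron within the radius stores the
    pattern, with bits accumulated as +/-1 counters (a superposition)."""
    active = hamming(pattern, addresses) <= radius
    storage[active] += 2 * pattern - 1

def read(query):
    """Sum the contents of all neurons within the radius of the query, then
    apply the majority rule g to map each bit back to 0 or 1."""
    active = hamming(query, addresses) <= radius
    totals = storage[active].sum(axis=0)
    return (totals > 0).astype(int)
```

With these parameters, a stored pattern can typically be recovered even from a query with several corrupted bits, because the corrupted query's read circle still overlaps the neurons that the original write activated.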
OK, I'm actually not sure if I addressed your question with this, because I'm focusing on the high level relationship to Attention. Sparse Distributed Memory talks more about the way the patterns are stored, but it is the superposition. So I'm doing a superposition of all patterns that activate that neuron. Yeah.
AUDIENCE: So does the storage move where the neuron is in space?
TRENTON BRICKEN: So it's a key value pairing. So the neuron has an address where it exists in the space, and then it has a storage vector that it will return. And you could think of that biologically as the dendrites that activate that neuron, and then the synapses it has with its efferent connections downstream.
AUDIENCE: [INAUDIBLE]
TRENTON BRICKEN: In order to show the relationship to Attention, I'm abstracting away the actual neurons. So this slide doesn't show the synaptic weights. But there's a one-to-one mapping between the neuron perspective and the pattern perspective. I just can't get into the SDM biological plausibility. I have slides on it at the very end that hopefully we can get to. But I want to focus on the relationship to a Transformer Attention here.
AUDIENCE: What is the background of this? Past memory?
TRENTON BRICKEN: Yeah. So it was invented by Kanerva. Yeah. So he published his book on it in 1988. There were a couple of extensions that were developed in the few years after, and then it kind of disappeared over time. And I'm still not sure why. Maybe you have a better idea for why that was.
AUDIENCE: There were earlier associative memory [INAUDIBLE].
TRENTON BRICKEN: Yes. Yeah, absolutely.
AUDIENCE: So this is kind of connecting them.
TRENTON BRICKEN: Yes. I don't talk about related work here, but Sparse Distributed Memory can actually be written as a generalization of [INAUDIBLE] networks. So they're a special case.
AUDIENCE: [INAUDIBLE] dynamics [INAUDIBLE].
TRENTON BRICKEN: Yes, there are still some differences, but they're very closely related. So now giving a short overview of Transformer Attention for those who are unfamiliar with it. And just quickly, Transformers are one of the state of the art deep learning models across many modalities right now. And so on the left here, the Transformer's being used to predict the next word and generate text. In this case, it has a wonderful story about unicorns that you can read on OpenAI's website. The Attention operation is also applied in three different locations in AlphaFold 2, which was recently used for protein folding prediction.
It's also used for almost every Google English search query now, and it doesn't just do language processing. It's also moved into image classification and generation tasks. And so as a kind of fun one here, you have this model input, which is half of an image and then it needs to predict what the other half of the image would look like. And you can see that here it's getting the shadows correct in some different examples. And the original's on the far right.
So the core thing that makes the Transformer unique is the Attention operation. And I have this slide here just to show where it's doing next word prediction. That's the example we're going to be working with. And in this case: the animal didn't cross the street because it blank. And our query to the system is the word it, and we're then looking back at the rest of the sequence, deciding how to pay attention to it to predict what word comes next. And so here our word it has connections with previous words that it's using to then predict what comes next.
And so diving more into how that works exactly, I'm going to work with a simpler phrase-- the cat sat on the blank. And there are four things that we do in order to predict the next word. So first, we take each of our input words, and we create what are called keys, values, and queries. Each word aside from the last one turns into both a key and a value, and then the last word becomes a query. We then take the query and compare it to each of our keys, and the way we do that is with a dot product operation. We then take the size of those dot products with each input, and we normalize. And we're using the softmax function for normalization, which is crucial to the relationship that I'll show later. We'll get into that. And then finally, based upon the weights-- so how much attention we decide to pay to each of our inputs, which is based on how similar the query was to the keys-- we're then going to take the corresponding value that's paired to each key and do a summation operation over each of those values. And then from that we'll do some projections and then predict the next word.
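The four steps above can be sketched directly in numpy. This is a toy single-query version: the random vectors stand in for the projected inputs (in a real Transformer these would be the learned projections of the word embeddings), and the dimensions are illustrative.

```python
import numpy as np

def softmax(x, beta=1.0):
    e = np.exp(beta * (x - x.max()))    # subtract max for numerical stability
    return e / e.sum()

def attention(query, keys, values, beta=1.0):
    """Single-query Attention: softmax over query-key dot products, then a
    weighted summation of the value vectors."""
    weights = softmax(keys @ query, beta)   # steps 2-3: dot products, softmax
    return weights @ values                 # step 4: weighted sum of values

# Toy stand-ins for the projected inputs (in a real Transformer these would be
# W_K y, W_V y, and W_Q y for learned projection matrices W).
rng = np.random.default_rng(0)
d = 64
keys = rng.normal(size=(4, d))          # "the", "cat", "sat", "on" as keys
values = rng.normal(size=(4, d))        # ...and their paired values
query = rng.normal(size=d)              # the final "the" as the query
out = attention(query, keys, values)
```

The output is a single d-dimensional vector: a superposition of the value vectors, weighted by how well each key matched the query.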
So just as a fictional example-- we're really going to work through this-- the cat sat on the blank. So our query word here is the. And you could hypothetically think that query is a vector that's going to have a high dot product with, or be similar to, keys that are either nouns or their associated verbs. And so in this case, you'd think it might have a high similarity with the words cat and sat-- their keys. So it gives a large weight to the cat and sat value vectors. And the cat value vector you might think of as containing some superposition of other animals that are related to cats, and maybe words that rhyme-- for example, the word mat. This is totally fictional; I'm just trying to give intuition, right?
Then the sat value vector that corresponds to its key-- we're paying attention to it-- contains things that are sat on, including mat. And so what you could have is, when you're paying attention to both the cat and sat value vectors, you're doing the summation of them, and then you have all of the different possible vectors in superposition. And in this case, I have a weight of 3 on mat and then a weight of 1 on mouse and sofa, and it goes on and on. But you can think that mat would dominate in the summation operation, allowing us to then correctly predict the next word that comes.
And I guess one piece of intuition here is that what you pay attention to and what you should extract from it are different. And so that's why we have the key value pairing: we're paying attention to the keys, then we're extracting the information that's in the values. So quickly, just to show you some of the notation for this, because we're going to map it onto SDM: here I have my query that's being updated, and it's being updated using this equation. And the Ws-- the projection matrices-- you can kind of ignore.
Yeah, I break this down more clearly here. So first we're doing a dot product between our keys and our queries. So that's shown by this operation here. And so the Ys are the actual inputs; we then do this projection of them, and then we do a dot product.
Then we have the softmax operation. And the way this is actually defined is an exponential over a sum of exponentials. And to give you some intuition for it, softmax normalizes the weights, but it makes large values larger. And this relationship is crucial to SDM, so that's why I'm spending a lot of time on it. So just as a demo here, I have these inputs on my x-axis, and they each have some value, ranging from 0 to 5. And on the second plot here, I just do a normal normalization, in which case that largest value, which has an index of 4 and a value of 5, will just have a normalized value of 0.3. But if I'm using a softmax operation, then depending on my beta coefficient in the softmax, it'll have a value of 0.6. So it becomes much pointier, peakier than it would if I was just doing an ordinary normalization. And so just relating all this back to the equations: again, I do my softmax operation, and then I use these normalized weights to weight the summation of the value vectors we talked about before. And that gives you that full equation and hopefully some intuition for it.
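The demo in that plot is easy to reproduce. For inputs valued 0 through 5, plain normalization gives the largest input a weight of 5/15, about 0.33, while softmax with beta = 1 gives it about 0.63, matching the "peakier" behavior described:

```python
import numpy as np

x = np.arange(6.0)          # the demo inputs, valued 0 through 5

plain = x / x.sum()         # ordinary normalization: largest input gets 5/15
beta = 1.0
e = np.exp(beta * x)
soft = e / e.sum()          # softmax: an exponential over a sum of exponentials

print(round(plain[-1], 2))  # 0.33
print(round(soft[-1], 2))   # 0.63 -- the largest value gets much peakier
```

Raising beta sharpens the peak further (approaching a hard max), while beta near 0 flattens the weights toward uniform.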
OK, so how does Transformer Attention approximate SDM? Well, it turns out that in a high dimensional space, if I have two hyperspheres, then as I pull apart those hyperspheres, the size of their circle intersection-- the number of neurons that they share-- will decay approximately exponentially. So in this figure on the right here, on the x-axis I'm showing the Hamming distance between those two circles as I pull them apart. And on the y-axis I'm showing, in log space, the number of neurons that exist in that circle intersection. And because in log space this plot is approximately linear, it means that the number of neurons in the circle intersection decays approximately exponentially.
This is just one set of SDM parameters, but I'm using n = 64 dimensions, which is the normal dimensionality used in Transformer Attention. And do note here that the exponential approximation doesn't hold for all Hamming distances. It works best for patterns and queries that are close to each other, when the circle intersection is large. But that's the regime that we care about, because when we do the softmax operation and then have our normalizing constant, anything that's far away basically drops to 0.
So here we have our equation for the circle intersection. We can write it as approximately exponential, with a coefficient outside of the exponent and then the beta coefficient inside of it. And there are two things that we need to do to make this relationship a good one. First, we need SDM to be continuous, and so what we need there is a mapping from our Hamming distance into cosine similarity, where we're L2 normalizing our vectors and then taking their dot product. And this equation here is just the linear mapping between Hamming distance and cosine similarity.
And then we also need the beta coefficient inside our exponential to be a correct value, so that it can fit our exponential decay. And the way we can do that is in closed form, with just a log linear regression on our circle intersection. And so here I'm just redefining the Attention operation we went through before in the SDM notation-- no other tricks. And this is the real money slide, where it's the relationship between SDM and Attention. So I've expanded out the softmax operation on the right here into the exponential over the sum of exponentials. I have the SDM equation presented before, and the extent to which the circle intersection in SDM approximates an exponential is the extent to which SDM and Attention converge.
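The computation just described can be sketched concretely. This is an illustrative reconstruction, not the paper's code: it counts the exact circle intersection combinatorially, fits the decay rate by log linear regression, and converts the per-bit rate into a coefficient on cosine similarity using the linear mapping cos = 1 - 2d/n. The radius r = 26 is an arbitrary choice for the sketch.

```python
import numpy as np
from math import comb

def circle_intersection(n, r, d):
    """Number of points in {0,1}^n within Hamming distance r of both of two
    centers that are themselves Hamming distance d apart. A candidate point
    differs from the first center on k of the d disagreeing bits and on j of
    the n-d agreeing bits."""
    total = 0
    for k in range(d + 1):
        for j in range(n - d + 1):
            if k + j <= r and (d - k) + j <= r:
                total += comb(d, k) * comb(n - d, j)
    return total

n, r = 64, 26
ds = np.arange(0, 21)       # the close-together regime where the fit is good
sizes = np.array([circle_intersection(n, r, int(d)) for d in ds], dtype=float)

# Log linear regression: log|intersection| ~ intercept - beta_d * d, so the
# intersection decays approximately as exp(-beta_d * d).
slope, intercept = np.polyfit(ds, np.log(sizes), 1)
beta_d = -slope

# Hamming distance maps linearly to cosine similarity, cos = 1 - 2d/n, so the
# equivalent coefficient on cosine similarity would be beta = beta_d * n / 2.
beta = beta_d * n / 2
```

The quality of the linear fit to log(sizes) over this regime is what makes the softmax a good approximation to the normalized circle intersection weights.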
So now I just have some results, first in theory. And so I have two plots with different SDM settings, for small and large Hamming distances that we're using as the circle radii. And in blue I have the actual circle intersection, which I'm normalizing-- just a basic normalization, right? And then in orange I'm fitting an exponential, with the beta coefficient from my log linear regression, to the circle intersection, and then I'm using the softmax equation. So you can see the quality of these approximations in these two different settings. And the subplots I have there are log plots. And so you can see that in this case, with a larger Hamming distance, the exponential approximation only holds for closer points. But by the time I'm at a Hamming distance of 20 here, where it drops off, you can see that my normalized weight is basically zero.
So we've talked about the relationship between SDM and Attention. But how does SDM relate to the Transformer more broadly? And so one way that we can look at this is: to what extent do Transformers use beta coefficients in their Attention operation that are similar to those of optimal versions of SDM? So depending on what I want my SDM system to do-- if I want to store the maximum number of memories possible, versus if I want to have my system very robust to query noise-- I will use different Hamming radii. And in order to compute this I need to assume that my patterns are random, and so this won't apply exactly to the real world, where of course data is on some lower dimensional correlated manifold. But I can still get these values with random patterns and see how they map onto the beta coefficients that Transformers decide to use.
And so in this case, I'm using the key query normalized Attention variant of the Transformer, which actually learns its beta coefficient, so it makes it very easy to look at this-- because normal Transformers don't learn their beta coefficient; you have to kind of infer what it would be from the size of the dot products between queries and patterns. And so this histogram shows the learned beta coefficients across Attention heads, across layers, for this Transformer model. And the vertical red lines are three different definitions of optimal SDM. So on the far left, we're maximizing for query noise: we want our queries to be as noisy as possible but still work. In the middle, we're maximizing signal to noise ratio. And on the far right, we're maximizing memory capacity. And you can see that the learned beta coefficients fall within these bounds, and also skew towards max query noise, which I think makes sense, because if you're maximizing memory capacity, you're assuming your queries are noise free, and if you're training a model in a deep learning environment with out-of-distribution data always appearing, of course, that's not going to be the case.
So beyond Attention, how can we interpret other parts of the Transformer? Well there's some interesting work showing-- yeah?
AUDIENCE: These values are the translation of the terms.
TRENTON BRICKEN: OK. Yeah. Thank you. SDM and Attention use different notation. So I just needed to--
AUDIENCE: Where do keys and queries come from? You're just using the information [INAUDIBLE].
TRENTON BRICKEN: Yeah.
AUDIENCE: Something historic.
TRENTON BRICKEN: Yeah.
AUDIENCE: So values are the patterns.
TRENTON BRICKEN: And again, with those pattern pointers you can be in an auto- or hetero-associative setting. So the pattern pointer can either equal the address-- it can point to itself-- or, exactly, yeah. And so in a Transformer setting, of course, where you're trying to predict the next thing, it would be hetero-associative.
AUDIENCE: Associate A to B.
TRENTON BRICKEN: Yes. Exactly. Yeah. And this work has been accepted to NeurIPS. We don't have a preprint yet, but the camera ready version of the paper will be out a week from today. So there's some interesting work that we cite in the paper that other people have done on the feedforward part of the Transformer. And I should have said before, this is the whole Transformer architecture with each of the operations laid out. And so we can actually interpret that feedforward layer as doing a long-term version of Attention, and so then we can interpret it as doing a long-term version of SDM. And by long term here, I mean that when I'm doing normal Attention, my keys and values are a function of my receptive field-- the current inputs that I'm looking at. This longer term Attention is instead independent of the particular inputs that I'm working with. It'll store longer term memories across the whole of training-- multiple epochs.
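That feedforward-as-long-term-Attention interpretation can be sketched as follows. This is an illustrative reconstruction of the idea, with arbitrary dimensions: the rows of the first weight matrix play the role of keys and the rows of the second play the role of values, but unlike in Attention they are fixed parameters rather than functions of the current inputs, and a ReLU stands in for the softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256

# In Attention, keys and values are computed from the current inputs. In the
# feedforward layer they are learned parameters, fixed across inputs: rows of
# the first weight matrix act as keys, rows of the second as values -- a
# long-term memory written over the whole of training.
K = rng.normal(size=(d_ff, d_model))    # first-layer weights as "keys"
V = rng.normal(size=(d_ff, d_model))    # second-layer weights as "values"

def feedforward(x):
    """Transformer feedforward layer, relu(x W1^T) W2: attention-like, but
    with a ReLU in place of the softmax over the learned keys."""
    scores = K @ x                      # match the input against every key
    weights = np.maximum(scores, 0.0)   # ReLU instead of softmax
    return weights @ V                  # weighted sum of the stored values
```

Under this reading, each hidden unit is one stored key-value memory, and the layer retrieves a weighted superposition of the values whose keys match the input.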
AUDIENCE: [INAUDIBLE]
TRENTON BRICKEN: Exactly. Yeah. And that actually relates to the neuron versus pattern perspectives of SDM that I talked about a while back. We can also interpret layer norm, which has been shown to be really important when people have tried to get rid of it. And this is in the sense that, in order to do SDM, I need cosine similarity, so I need to L2 norm my vectors. And the key query normalized variant of Attention, which leads to some small improvements, actually uses L2 norm instead of layer norm. So you can kind of think of this work as retroactively predicting this improvement, with layer norm approximating the L2 norm.
AUDIENCE: [INAUDIBLE]
TRENTON BRICKEN: Yeah so basically I look at my vector, I compute for all the vectors a mean and standard deviation, and then I normalize by that. If you're familiar with the batch norm operation, it's kind of similar to that.
AUDIENCE: [INAUDIBLE]
TRENTON BRICKEN: But it's a function of-- I think you have a running average of all the things you've seen. It's not just within the batch. Like when I do batch norm, I compute the mean of everything in that training batch.
AUDIENCE: Normalized over [INAUDIBLE].
TRENTON BRICKEN: Yeah. But they're quite similar. And I think that they're putting everything on a similar scale the same way that L2 norm would. So beyond these connections SDM has a number of extensions that we think could be useful in further improving the Transformer. So one, SDM has some close relationships to vector symbolic architectures. There's also some work showing that you could have multiple value vectors corresponding to each key. There are variants of self-attention where you're not having every single input be its own query. And there are other forms of external memory storage techniques.
And so in summary, the intersection between two hyperspheres approximates an exponential, and this allows SDM's read and write operations to approximate Attention, in theory and in the tests that we run. And so as sort of big picture future research questions-- which we certainly don't have answers to yet, but I'm interested in exploring-- is the Transformer so successful because it's performing a key cognitive operation? And it's worth me pointing out that the cerebellum, or cerebellar-like architectures, are ubiquitous across a large number of organisms. And so maybe there's some key operation that it's doing. And given how successful the Transformer has been empirically across multiple modalities, is SDM the correct theory for how the cerebellum is functioning?
And I think I'm pretty much out of time and want to leave time for questions, so I won't get into biological plausibility. It's in the appendix of the paper that will be out soon. But it's quite exciting in how it maps to each of the cell types. And I'll just go to the thank you slide.