How Deep Sparse Networks Avoid the Curse of Dimensionality: Efficiently Computable Functions are Compositionally Sparse

Publication Type: CBMM Memos
Year of Publication: 2022
Authors: Poggio, TA
Date Published: 10/2022
Abstract

The main claim of this perspective is that compositional sparsity of the target function, which corresponds to the task to be learned, is the key principle underlying machine learning. Mhaskar and Poggio (2016) proved that sparsity of compositional target functions (when the functions are on the reals, the constituent functions are also required to be smooth) naturally leads to sparse deep networks for approximation and thus for optimization. This is the case for most CNNs in current use, in which the known sparse graph of the target function is reflected in the sparse connectivity of the network. When the graph of the target function is unknown, I conjecture that transformers are able to implement a flexible version of sparsity (selecting which input tokens interact in the MLP layer) through the self-attention layers.

Surprisingly, the assumption of compositional sparsity of the target function is not restrictive in practice, since I prove here that for computable functions (if on the reals, with Lipschitz continuous derivatives) compositional sparsity is equivalent to efficient computability, that is, Turing computability in polynomial time.
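
For intuition, a compositionally sparse function is one that can be written as a directed acyclic graph of constituent functions, each depending on only a small (constant) number of variables. The sketch below is an illustrative example of this idea, not code from the memo: a function of eight inputs built as a binary tree of two-argument constituents, mirroring the sparse connectivity of a deep network in which each unit combines only a few inputs. The particular constituent function h is an arbitrary smooth choice made here for illustration.

import math

# Illustrative sketch (not from the memo): a compositionally sparse function
# on 8 variables, written as a binary tree of 2-ary constituent functions.
# Each constituent depends on only two inputs, so the composition graph is
# sparse, mirroring the sparse (local) connectivity of a deep network.

def h(a, b):
    # A smooth 2-ary constituent function; any Lipschitz-smooth choice works.
    return math.tanh(a + 2.0 * b)

def f_sparse(x):
    # f(x1,...,x8) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8)))
    assert len(x) == 8
    level1 = [h(x[0], x[1]), h(x[2], x[3]), h(x[4], x[5]), h(x[6], x[7])]
    level2 = [h(level1[0], level1[1]), h(level1[2], level1[3])]
    return h(level2[0], level2[1])

print(f_sparse([0.1 * i for i in range(8)]))

In such a structure, approximating each two-argument constituent requires a number of parameters that scales with its local dimension (two), not with the full input dimension (eight), which is the mechanism by which compositional sparsity avoids the curse of dimensionality.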

DSpace@MIT

https://hdl.handle.net/1721.1/145776

CBMM Memo No: 138

CBMM Relationship: 

  • CBMM Funded