How Deep Sparse Networks Avoid the Curse of Dimensionality: Efficiently Computable Functions are Compositionally Sparse

Publication Type: CBMM Memos
Year of Publication: 2022
Authors: Poggio, TA
Date Published: 10/2022
Abstract

The main claim of this perspective is that compositional sparsity of the target function, which corresponds to the task to be learned, is the key principle underlying machine learning. Mhaskar and Poggio (2016) proved that sparsity of the compositional target functions (when the functions are on the reals, the constituent functions are also required to be smooth) naturally leads to sparse deep networks for approximation and thus for optimization. This is the case for most CNNs in current use, in which the known sparse graph of the target function is reflected in the sparse connectivity of the network. When the graph of the target function is unknown, I conjecture that transformers are able to implement a flexible version of sparsity (selecting which input tokens interact in the MLP layer) through the self-attention layers.

Surprisingly, the assumption of compositional sparsity of the target function is not restrictive in practice, since I prove here that for computable functions (if on the reals, with Lipschitz continuous derivatives) compositional sparsity is equivalent to efficient computability, that is, Turing computability in polynomial time.
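
As a minimal illustration (not part of the memo itself), the sketch below assumes a binary-tree compositional structure: a function of eight variables built entirely from two-argument constituent functions, so its graph is sparse and only a handful of constituents, each with a small number of inputs, are needed. A deep network whose connectivity mirrors this graph can reflect the same sparsity. The names h and f are hypothetical, chosen only for this example.

    # Illustrative sketch (hypothetical, not the memo's construction):
    # a compositionally sparse function of d = 8 variables built as a
    # depth-3 binary tree of 2-ary constituent functions.
    import numpy as np

    def h(a, b):
        # A smooth constituent function of only 2 arguments
        # (arbitrary choice for this sketch).
        return np.tanh(a + 0.5 * b)

    def f(x):
        # Only 7 constituent functions in total, each depending on
        # 2 inputs, instead of one dense function of all 8 variables.
        y1 = h(x[0], x[1])
        y2 = h(x[2], x[3])
        y3 = h(x[4], x[5])
        y4 = h(x[6], x[7])
        z1 = h(y1, y2)
        z2 = h(y3, y4)
        return h(z1, z2)

    x = np.random.rand(8)
    print(f(x))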

DSpace@MIT

https://hdl.handle.net/1721.1/145776

CBMM Memo No: 138

Research Area: 

CBMM Relationship: 

  • CBMM Funded