Theoretical Frameworks for Intelligence

Understanding intelligence and the brain requires theories at different levels, including the biophysics of single neurons, algorithms and circuits, overall computations and behavior, and a theory of learning. Over the past few decades, advances have been made in many of these areas from multiple perspectives. In fact, several major contributors to these advances are members of our team.

This theoretical foundation provides a common framework for fields as diverse as computer science, cognitive science, and neuroscience. Recent successes in intelligent systems applications – from Google to Watson – would not have been possible without these developments. For the first time, we have the beginnings of a unifying and useful mathematics of brains, minds, and machines with rigorous foundations, demonstrated applicability in almost every area of cognitive and neural science, and real practical value for building intelligent systems.

Invariant Representation Learning for Speech Recognition

The recognition of sound categories and speech in the human brain is remarkably robust to signal variations that preserve identity. Apart from any contextual inference imposed through complex language models, the lower-level neural representation of speech sounds might also be important for recognition. The idea of hierarchical representations that handle invariance through successive feedforward maps, prominent in biologically plausible computational models for vision, is a starting point for developing computational models and forming hypotheses about representations in the auditory cortex.

An invariant representation of speech, in both biological and artificial systems, is crucial for improving the robustness of the acoustic-to-phonetic mapping, decreasing the sample complexity (i.e., the number of labeled examples needed), and enhancing the generalization performance of learning in the presence of distribution mismatch due to speech variability. In human brains, learning to associate sounds or words is the result of a few directed examples and the unsupervised observation of auditory objects and their transformations (across speakers, modes of speech, etc.). A key element of this might be the unsupervised learning of effective data representations, i.e., the mapping of the sensory data into a feature space that is resilient to (lexical or sub-lexical) identity-preserving transformations, such as changes in voices, speakers, and acoustic environments.

The project goal is to provide a theoretical and computational framework for speech representation learning while formulating plausibility hypotheses for learning and processing mechanisms in the human auditory cortex. The following research directions span both machine learning and neuroscience:

Algorithms for invariant representation learning in machines: An appropriate representation of the data (encoding, feature map, embedding) aims at facilitating a statistical learning problem. Data-adaptive, as opposed to deterministic, representations can be learned in a supervised or unsupervised way by imposing criteria on the representation map, for example the preservation of distances or the reconstruction accuracy. We are interested in representations that are invariant to class-preserving transformations and selective for class-specific properties, so that multiple categories can be separated with reduced sample complexity.

Invariant speech representations in brains: We formulate hypotheses about feedforward processing in the human auditory cortex, using the visual cortex paradigm (e.g., Hubel-Wiesel cells, hierarchical models, invariance). We are interested in the invariance and selectivity properties of auditory representations, the form of auditory spectro-temporal receptive fields (STRFs) in hierarchical organizations, the acoustic-to-phonetic mapping of speech sounds in the human brain, the types of associations for learning sound categories, and the levels and parts of the representation of “auditory objects”.

Speech recognition from a few labeled examples: A case study involves systems for speech recognition (words or phonemes) that require fewer resources to achieve high accuracy in classification tasks. We consider several aspects of the underlying speech representations, such as signal scales, input domain representations, network structures (shallow vs. multilayer/hierarchical), and ways of sampling templates and transformations (explicit or learned, discriminatively or unsupervised).
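To make the notions of templates, transformations, and invariance concrete, the following is a minimal sketch and not the project's actual method. It assumes a toy setting in which the identity-preserving transformations are circular shifts of a short vector (a stand-in for, say, a shift along the time or frequency axis of a speech frame), and it pools the projections of a signal onto all shifted copies of each template. The function names (circular_shifts, invariant_signature) and the choice of max and mean pooling are illustrative assumptions.

```python
def circular_shifts(vector):
    """Return every cyclic shift of a vector (its orbit under the shift group)."""
    n = len(vector)
    return [vector[k:] + vector[:k] for k in range(n)]


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def invariant_signature(signal, templates):
    """Pool the projections of `signal` onto each template's orbit.

    Shifting the signal only permutes the set of projections onto the
    shifted templates, so any pooling function that ignores order
    (here: max and mean) yields a shift-invariant feature.
    Assumption: shifts are a toy stand-in for speech variability.
    """
    signature = []
    for template in templates:
        projections = [dot(signal, shifted) for shifted in circular_shifts(template)]
        signature.append(max(projections))
        signature.append(sum(projections) / len(projections))
    return signature


# Hypothetical integer-valued "signal" and "templates" for illustration.
signal = [1, 4, -2, 0, 3, 5]
templates = [[1, 0, -1, 2, 0, 1], [0, 2, 1, -1, 3, 0]]

original = invariant_signature(signal, templates)
shifted = invariant_signature(signal[2:] + signal[:2], templates)
print(original == shifted)
```

In this toy case the signature has two entries per template and is unchanged by any circular shift of the input, while signals that are not shifts of one another generally receive different signatures. Real speech variability (speakers, voices, acoustic environments) is not a simple shift group, which is why the text considers transformations that are sampled or learned rather than specified explicitly.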

The Center for Brains, Minds & Machines

Tomaso Poggio

Lorenzo Rosasco

Georgios Evangelopoulos

Fabio Anselmi

Chiyuan Zhang

Stephen Voinea
