%0 Conference Paper
%B workshop on "AI for Social Good", NIPS 2018
%D 2018
%T The Language of Fake News: Opening the Black-Box of Deep Learning Based Detectors
%A Nicole O'Brien
%A Sophia Latessa
%A Georgios Evangelopoulos
%A Xavier Boix
%X <p>The digital information age has generated new outlets for content creators to publish so-called “fake news”, a new form of propaganda that is intentionally designed to mislead the reader. With the widespread effects of the fast dissemination of fake news, efforts have been made to automate the process of fake news detection. A promising solution that has come up recently is to use machine learning, and specifically deep neural networks (which have been successful in natural language processing), to detect patterns in news sources and articles. However, deep networks come with a lack of transparency in the decision-making process, i.e. the “black-box problem”, which obscures their reliability. In this paper, we open this “black box” and show that the emergent representations from deep neural networks capture subtle but consistent differences in the language of fake and real news: signatures of exaggeration and other forms of rhetoric. Unlike previous work, we test the transferability of the learning process to novel news topics. Our results demonstrate the generalization capabilities of deep learning to detect fake news in novel subjects only from language patterns.</p>
%C Montreal, Canada
%8 11/2018
%G eng
%U http://hdl.handle.net/1721.1/120056

%0 Generic
%D 2017
%T Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets
%A Andrea Tacchetti
%A Stephen Voinea
%A Georgios Evangelopoulos
%X <p>The complexity of a learning task is increased by transformations in the input space that preserve class identity. Visual object recognition, for example, is affected by changes in viewpoint, scale, illumination or planar transformations. While drastically altering the visual appearance, these changes are orthogonal to recognition and should not be reflected in the representation or feature encoding used for learning. We introduce a framework for weakly supervised learning of image embeddings that are robust to transformations and selective to the class distribution, using sets of transforming examples (orbit sets), deep parametrizations and a novel orbit-based loss. The proposed loss combines a discriminative, contrastive part for orbits with a reconstruction error that learns to rectify orbit transformations. The learned embeddings are evaluated in distance-metric-based tasks, such as one-shot classification under geometric transformations, as well as face verification and retrieval under more realistic visual variability. Our results suggest that orbit sets, suitably computed or observed, can be used for efficient, weakly supervised learning of semantically relevant image embeddings.</p>
%8 03/2017
%1 <p><a href="https://arxiv.org/abs/1703.04775v1">arXiv:1703.04775v1</a></p>
%2 <p><a href="http://hdl.handle.net/1721.1/107446">http://hdl.handle.net/1721.1/107446</a></p>

%0 Conference Paper
%B AAAI Spring Symposium Series, Science of Intelligence
%D 2017
%T Representation Learning from Orbit Sets for One-shot Classification
%A Andrea Tacchetti
%A Stephen Voinea
%A Georgios Evangelopoulos
%A Tomaso Poggio
%X <p>The sample complexity of a learning task is increased by transformations that do not change class identity. For example, visual object recognition, i.e. the discrimination or categorization of distinct semantic classes, is affected by changes in viewpoint, scale, illumination or planar transformations. We introduce a weakly supervised framework for learning robust and selective representations from sets of transforming examples (orbit sets). We train deep encoders that explicitly account for the equivalence up to transformations of orbit sets and show that the resulting encodings contract the intra-orbit distance and preserve identity either by preserving reconstruction or by increasing the inter-orbit distance. We explore a loss function that combines a discriminative term and a reconstruction term that uses a decoder-encoder map to learn to rectify transformation-perturbed examples, and demonstrate the validity of the resulting embeddings for one-shot learning. Our results suggest that a suitable definition of orbit sets is a form of weak supervision that can be exploited to learn semantically relevant embeddings.</p>
%I AAAI
%G eng
%U https://www.aaai.org/ocs/index.php/SSS/SSS17/paper/view/15357

%0 Generic
%D 2017
%T Symmetry Regularization
%A F. Anselmi
%A Georgios Evangelopoulos
%A Lorenzo Rosasco
%A Tomaso Poggio
%X <p>The properties of a representation, such as smoothness, adaptability, generality, equivariance/invariance, depend on restrictions imposed during learning. In this paper, we propose using data symmetries, in the sense of equivalences under transformations, as a means for learning symmetry-adapted representations, i.e., representations that are equivariant to transformations in the original space. We provide a sufficient condition to enforce the representation, for example the weights of a neural network layer or the atoms of a dictionary, to have a group structure and specifically the group structure in an unlabeled training set. By reducing the analysis of generic group symmetries to permutation symmetries, we devise an analytic expression for a regularization scheme and a permutation invariant metric on the representation space. Our work provides a proof of concept on why and how to learn equivariant representations, without explicit knowledge of the underlying symmetries in the data.</p>
%8 05/2017
%2 <p><a href="http://hdl.handle.net/1721.1/109391">http://hdl.handle.net/1721.1/109391</a></p>

%0 Conference Paper
%B INTERSPEECH-2015
%D 2015
%T Discriminative Template Learning in Group-Convolutional Networks for Invariant Speech Representations
%A Chiyuan Zhang
%A Stephen Voinea
%A Georgios Evangelopoulos
%A Lorenzo Rosasco
%A Tomaso Poggio
%I International Speech Communication Association (ISCA)
%C Dresden, Germany
%8 09/2015
%G eng
%U http://www.isca-speech.org/archive/interspeech_2015/i15_3229.html

%0 Generic
%D 2014
%T A Deep Representation for Invariance And Music Classification
%A Chiyuan Zhang
%A Georgios Evangelopoulos
%A Stephen Voinea
%A Lorenzo Rosasco
%A Tomaso Poggio
%K Audio Representation
%K Hierarchy
%K Invariance
%K Machine Learning
%K Theories for Intelligence
%X <p>Representations in the auditory cortex might be based on mechanisms similar to those of the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed of similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification.</p>
%8 03/2014
%1 <p><a href="https://arxiv.org/abs/1404.0400v1">arXiv:1404.0400v1</a></p>
%2 <p><a href="http://hdl.handle.net/1721.1/100163">http://hdl.handle.net/1721.1/100163</a></p>

%0 Conference Paper
%B ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing
%D 2014
%T A Deep Representation for Invariance and Music Classification
%A Chiyuan Zhang
%A Georgios Evangelopoulos
%A Stephen Voinea
%A Lorenzo Rosasco
%A Tomaso Poggio
%K acoustic signal processing
%K signal representation
%K unsupervised learning
%I IEEE
%C Florence, Italy
%8 05/04/2014
%G eng
%U http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6854954
%R 10.1109/ICASSP.2014.6854954

%0 Generic
%D 2014
%T Learning An Invariant Speech Representation
%A Georgios Evangelopoulos
%A Stephen Voinea
%A Chiyuan Zhang
%A Lorenzo Rosasco
%A Tomaso Poggio
%K Theories for Intelligence
%X <p>Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates — such as specific phones or words — together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features.</p>
%8 06/2014
%1 <p><a href="http://arxiv.org/abs/1406.3884">arXiv:1406.3884</a></p>
%2 <p><a href="http://hdl.handle.net/1721.1/100186">http://hdl.handle.net/1721.1/100186</a></p>

%0 Conference Paper
%B INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association
%D 2014
%T Phone Classification by a Hierarchy of Invariant Representation Layers
%A Chiyuan Zhang
%A Stephen Voinea
%A Georgios Evangelopoulos
%A Lorenzo Rosasco
%A Tomaso Poggio
%K Hierarchy
%K Invariance
%K Neural Networks
%K Speech Representation
%X <p>We propose a multi-layer feature extraction framework for speech, capable of providing invariant representations. A set of templates is generated by sampling the result of applying smooth, identity-preserving transformations (such as vocal tract length and tempo variations) to arbitrarily-selected speech signals. Templates are then stored as the weights of “neurons”. We use a cascade of such computational modules to factor out different types of transformation variability in a hierarchy, and show that it improves phone classification over baseline features. In addition, we describe empirical comparisons of a) different transformations which may be responsible for the variability in speech signals and of b) different ways of assembling template sets for training. The proposed layered system is an effort towards explaining the performance of recent deep learning networks and the principles by which the human auditory cortex might reduce the sample complexity of learning in speech recognition. Our theory and experiments suggest that invariant representations are crucial in learning from complex, real-world data like natural speech. Our model is built on basic computational primitives of cortical neurons, thus making an argument about how representations might be learned in the human auditory cortex.</p>
%I International Speech Communication Association (ISCA)
%C Singapore
%G eng
%U http://www.isca-speech.org/archive/interspeech_2014/i14_2346.html

%0 Generic
%D 2014
%T Speech Representations based on a Theory for Learning Invariances
%A Stephen Voinea
%A Chiyuan Zhang
%A Georgios Evangelopoulos
%A Lorenzo Rosasco
%A Tomaso Poggio
%X <p>Recognition of sounds and speech from a small number of labelled examples (like humans do) depends on the properties of the representation of the acoustic input. We formulate the problem of extracting robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain, which requires the memory-based, unsupervised storage of acoustic templates, such as specific phones or words, together with all the transformations of each that normally occur. A quasi-invariant representation for a speech signal can be obtained by projecting it to a number of template orbits, i.e., each one a set of transformed template signals, and computing the associated one-dimensional empirical probability distributions. The computations are performed by modules of filtering and pooling that can be used for obtaining a mapping in single- or multilayer architectures. We consider several aspects of such representations, including different signal scales (word vs. frame), input domains (raw waveforms vs. frequency filterbank responses), structures (shallow vs.&nbsp;multilayer/hierarchical), and ways of sampling from template orbit sets given a set of observations (explicit vs. learned). Preliminary empirical evaluations for learning to separate speech phones and words are given on TIMIT and subsets of TI-DIGITS.</p>
%B SANE 2014 - Speech and Audio in the Northeast
%8 10/2014
%9 poster presentation

%0 Conference Paper
%B INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association
%D 2014
%T Word-level Invariant Representations From Acoustic Waveforms
%A Stephen Voinea
%A Chiyuan Zhang
%A Georgios Evangelopoulos
%A Lorenzo Rosasco
%A Tomaso Poggio
%K Invariance
%K Speech Representation
%K Theories for Intelligence
%X <p>Extracting discriminant, transformation-invariant features from raw audio signals remains a serious challenge for speech recognition. The issue of speaker variability is central to this problem, as changes in accent, dialect, gender, and age alter the sound waveform of speech units at multiple scales (phonemes, words, or phrases). Approaches for dealing with this variability have typically focused on analyzing the spectral properties of speech at the level of frames, on par with frame-level acoustic modeling usually applied to speech recognition systems. In this paper, we propose a framework for representing speech at the whole-word level and extracting features from the acoustic, temporal domain, without the need for spectral encoding or pre-processing. Leveraging recent work on unsupervised learning of invariant sensory representations, we extract a signature for a word by first projecting its raw waveform onto a set of templates and their transformations, and then forming empirical estimates of the resulting one-dimensional distributions via histograms. The representation and relevant parameters are evaluated for word classification on a series of datasets with increasing speaker-mismatch difficulty, and the results are compared to those of an MFCC-based representation.</p>
%I International Speech Communication Association (ISCA)
%C Singapore
%G eng
%U http://www.isca-speech.org/archive/interspeech_2014/i14_2385.html