Treebank of Learner English (TLE)

Publication Type: Dataset
Year of Publication: 2016
Authors: Berzak, Y, Kenney, J, Spadine, C, Wang, JXian, Lam, L, Mori, KSophie, Garza, S, Katz, B
Date Published: 08/2016

The majority of the English text available worldwide is generated by non-native speakers. Learner language introduces a variety of challenges and is of paramount importance for the scientific study of language acquisition as well as for Natural Language Processing. Despite the ubiquity of non-native English, there has been no publicly available syntactic treebank for English as a Second Language (ESL). To address this shortcoming, we released the Treebank of Learner English (TLE), a first-of-its-kind resource for non-native English, containing 5,124 sentences manually annotated with part-of-speech (POS) tags and syntactic dependency trees. Full syntactic analyses are provided for both the original and the error-corrected versions of each sentence. We also introduced annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. We envision the treebank supporting a wide range of linguistic and computational research on language learning, as well as the automatic processing of ungrammatical language.

