Corpus Linguistics

Last update: December 15th, 2010


Tutorials

Nadja Nesselhauf's web tutorial
Corpus Linguistics (McEnery/Wilson) - companion website
The Use of Corpora in Language Studies (McEnery/Wilson)

Overview of corpora available to date

Well-known and influential corpora (Richard Z. Xiao)

English corpora online

BYU Corpus Collection, which includes:
» TIME Magazine Corpus of American English (100 million words, 1920s - 2000s)
» Corpus of Contemporary American English (COCA, 385 million words from 1990 - present)
» BYU-BNC (British National Corpus, BNC, 100 million words, 1980s - 1993)
» Oxford English Dictionary corpus (37 million words, c900s - 2000s)
Infosite for the British National Corpus
BNCweb interface
American National Corpus
Michigan Corpus of Academic Spoken English (MICASE)
Michigan Corpus of Upper-Level Student Papers (MICUSP)
WebCorp: The Web as Corpus
The International Corpus of English Homepage
Collins WordbanksOnline (The Bank of English)
The CHILDES System

German corpora online

Liste deutschsprachiger Korpora
DWDS Corpus
Textkorpora des IDS - Übersicht
COSMAS II (IDS Mannheim)
NEGRA Corpus
Wortschatz Deutsch

Parallel / Translation corpora (German-English or English-German)

Parallel German-English Corpus "de-news", Daily News 1996-2000
English/German translation corpus (TU Chemnitz)
German(-English) parallel corpora (Europarl and German News)
OPUS - an open source parallel corpus

Other languages

French
Spanish
Portuguese
Polish

Learner Corpora

International Corpus of Learner English and other learner corpora
Louvain International Database of Spoken English Interlanguage
Cambridge Learner Corpus
Corpus of Academic Learner English (CALE)
Corpora 4 Learning - a website providing links and references for the use of corpora, corpus linguistics and corpus analysis in the context of language learning and teaching

Diachronic corpora


Link collections

Michael Barlow's page on "Text Corpora and Corpus Linguistics"
David Lee's Bookmarks for Corpus-based Linguists

(Web-based) tools

ParaConc - a bilingual/multilingual concordancer that can be used in contrastive analyses, language learning, and translation studies/training (free demo available)
Phrases in English (PIE)
SketchEngine
GlossaNet

Institutions

University Centre for Computer Corpus Research on Language (UCREL), Lancaster, UK
International Computer Archive of Modern and Medieval English (ICAME)
Centre for Corpus Linguistics, University of Birmingham
Centre for English Corpus Linguistics, University of Louvain, Belgium
HUB Korpuslinguistik
Corpora List

Journals

Language Learning & Technology
International Journal of Corpus Linguistics
Corpus Linguistics and Linguistic Theory
Corpora
ICAME Journal
Literary and Linguistic Computing