For more information on texts and composition, click on the
icon at the top of the page of each corpus.
Texts (95% available in full-text data)
Focus / strengths
COCA: Corpus of Contemporary American English
on 2012-2015 update)
520 million words / 220,000 texts. US, 1990-2015.
Best coverage of all types of genres (informal to formal): spoken, fiction, magazines, newspapers, academic. The most widely used corpus of English.
COHA: Corpus of Historical American English
400 million words / 107,000 texts. US, 1810-2009.
100x as large as the next-largest historical corpus.
GloWbE: Global Web-based English
1.9 billion words / 1.8 million texts. 20 countries.
About 60% blogs (very informal). Recent: 2013. Useful for comparing varieties of English: American, British, Australian, etc. 100x as large as the next-largest corpus of English dialects.
New full-text data (December 2016)
NOW: News on the Web
4.89 billion words / 6.0+ million texts (as of early Dec 2016; continually growing). 20 countries.
The most up-to-date corpus of English. 4-5 million words added each day (130 million each month, 1.5 billion each year). Wide range of online newspapers and magazines
(technology, entertainment, sports, politics,
1.9 billion words / 4.4 million texts.
Best corpus for specialized language across an almost unlimited range of topics: science, entertainment, technology, history, sports, etc.
Corpus del Español (Spanish)
2.0 billion words / 2.0 million texts. 21 countries.
Well-annotated corpus of Spanish. All of the strengths of GloWbE (above), but for Spanish.