Full-text corpus data


You can now download the Coronavirus Corpus for offline use. To date, this is about 963 million words of data that you would have on your own machine. The Coronavirus Corpus contains data on the medical, social, cultural, and economic impact of the coronavirus (COVID-19) in 126,615 texts from online magazines and newspapers in 20 different English-speaking countries from 1 Jan 2020 to the current time. Most importantly, the corpus grows by 3-5 million words of data each day. This translates to about 90-110 million words each month (for example, there are more than 100 million words for ).

When you purchase the full-text data from the Coronavirus Corpus, you get all of the data up through the previous month, as well as data for the next 12 months. In other words, with one license you get all of the data up to the present time, as well as the next year of data. This is different from the NOW Corpus, where these would be two separate licenses.

For example, if you purchase both datasets on 15 April 2021, you would have the data from 1 Jan 2020 to 31 March 2021 (which was released on 3 April 2021), as well as the data for one more year: Apr 2021 to Mar 2022.
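If you want to check what a license would cover for a different purchase date, the rule in the example above can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function names `add_months` and `license_coverage` are made up for this sketch, and it assumes that the previous month's data has already been released on the day you purchase.

```python
from datetime import date, timedelta


def add_months(year, month, delta):
    """Shift a (year, month) pair by `delta` months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def license_coverage(purchase, corpus_start=date(2020, 1, 1)):
    """Hypothetical helper: the data covered by one license.

    Assumes the rule described in the text: existing data runs through
    the end of the month before purchase, and the 12 monthly updates
    start with the month of purchase.
    """
    first_of_purchase_month = purchase.replace(day=1)
    # Last day of the month before the purchase month.
    existing_end = first_of_purchase_month - timedelta(days=1)
    first_update = (purchase.year, purchase.month)
    # The twelfth update month is 11 months after the first one.
    last_update = add_months(purchase.year, purchase.month, 11)
    return {
        "existing": (corpus_start, existing_end),
        "updates": (first_update, last_update),
    }


coverage = license_coverage(date(2021, 4, 15))
print(coverage["existing"])  # 1 Jan 2020 through 31 Mar 2021
print(coverage["updates"])   # (2021, 4) through (2022, 3)
```

For a purchase on 15 April 2021 this gives the same two periods as the example: existing data from 1 Jan 2020 to 31 March 2021, and updates for Apr 2021 to Mar 2022.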

                            Time period                                   Size                                  Samples
1   Through current month   Jan 2020 - month of purchase                  (Currently) about 963 million words   Database, WordLemPoS, Text, Sources, Lexicon (samples are Jan-May 2020)
2   Next 12 months          The 12-month period after month of purchase                                         Database, WordLemPoS, Text, Sources, Lexicon (samples are from February 2021)

Note also that the monthly updates will be released at the beginning of the following month. You will be notified by email as soon as the update is available, and you will have ten days to download the data.