Full-text corpus data

You can now download the Coronavirus Corpus for offline use. To date, this gives you about 1.42 billion words of data on your own machine. The Coronavirus Corpus contains data on the medical, social, cultural, and economic impact of the coronavirus (COVID-19), in texts from online magazines and newspapers in 20 different English-speaking countries from 1 Jan 2020 to the current time. Most importantly, the corpus grows by 3-5 million words of data each day. This translates to about 90-110 million words each month (for example, there are more than 100 million words for Nov 2021).

When you purchase the full-text data from the Coronavirus Corpus, you get all of the data up through the previous month. You can also purchase an annual subscription, which will give you data for the next 12 months.

For example, if you purchase both datasets on 15 September 2022, you would have the data from January 2020 - August 2022 (which was released on 1 September 2022; see the samples in #2 below), and an annual subscription would give you the data for one more year: September 2022 - August 2023.
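The date arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the rule as described above (one-time purchase covers January 2020 through the month before purchase; the subscription covers the 12 months starting with the month of purchase). The function names `add_months` and `coverage` are hypothetical and are not part of any tool supplied with the corpus.

```python
from datetime import date

# First month of data in the corpus, as stated in the text (Jan 2020).
CORPUS_START = (2020, 1)

def add_months(year, month, delta):
    """Shift a (year, month) pair by `delta` months (delta may be negative)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def coverage(purchase):
    """Return the (start, end) months covered by each dataset.

    Assumes the rule described in the example: the one-time purchase runs
    through the month before purchase, and the annual subscription covers
    the 12 months beginning with the month of purchase.
    """
    one_time = (CORPUS_START, add_months(purchase.year, purchase.month, -1))
    subscription = (
        (purchase.year, purchase.month),
        add_months(purchase.year, purchase.month, 11),
    )
    return one_time, subscription

one_time, subscription = coverage(date(2022, 9, 15))
print(one_time)      # ((2020, 1), (2022, 8))
print(subscription)  # ((2022, 9), (2023, 8))
```

For a purchase on 15 September 2022 this reproduces the ranges given above: January 2020 - August 2022 for the one-time purchase, and September 2022 - August 2023 for the subscription.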

#  Purchase             Time period                                   Size                                   Samples
1  One-time purchase    Jan 2020 - month of purchase                  (Currently) about 1.42 billion words   Database, WordLemPoS, Text, Sources, Lexicon (samples are Jan-May 2020)
2  Annual subscription  The 12-month period after month of purchase                                          Database, WordLemPoS, Text, Sources, Lexicon

If you purchase just #1 above, you pay the price of one corpus. There is a discount for purchasing both datasets (#1 and #2) at the same time.

Note also that the monthly updates will be released at the beginning of the following month. You will be notified by email as soon as the update is available, and you will have ten days to download the data.