Full-text corpus data


You can now download the NOW corpus for offline use, including monthly updates via a subscription. In total, this is about 21.7 billion words of data that you can have on your own machine. There's nothing else like it.

The NOW corpus contains data from 36,808,275 texts from online magazines and newspapers in 20 different English-speaking countries, from 2010 to the present (see sources). At 21.7 billion words, it is by far the largest corpus (in any language) that is available in full-text format. Most importantly, the corpus grows by 8-10 million words of data each day. This translates to about 250 million words each month and about 2.5-3.0 billion words each year. (See totals by month.) If you're interested in what's going on in English up to and including right now, this is by far the best corpus available.

When you purchase the full-text data from NOW, you get all of the data from 2010 up through the previous month. You can also purchase an annual subscription, which will give you data for the next 12 months (typically about 1.5 billion words each year).

For example, if you purchased both datasets on 15 April 2025, you would receive the data from January 2010 through March 2025 (which was released on 1 April 2025), and the annual subscription would give you the data for one more year: April 2025 through March 2026.
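
The date arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the rule described above (data through the month before purchase, then a 12-month subscription starting with the purchase month). The function names `add_months` and `coverage` are hypothetical and are not part of any tool supplied with the corpus.

```python
from datetime import date


def add_months(year, month, delta):
    """Shift a (year, month) pair by `delta` months (delta may be negative)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def coverage(purchase):
    """Return the (first month, last month) range covered by each option.

    Assumes the rule stated in the text: the one-time purchase runs from
    January 2010 through the month before purchase, and the subscription
    covers the purchase month plus the following 11 months.
    """
    last_included = add_months(purchase.year, purchase.month, -1)
    sub_start = (purchase.year, purchase.month)
    sub_end = add_months(purchase.year, purchase.month, 11)
    return {
        "one_time": ((2010, 1), last_included),
        "subscription": (sub_start, sub_end),
    }


print(coverage(date(2025, 4, 15)))
# {'one_time': ((2010, 1), (2025, 3)), 'subscription': ((2025, 4), (2026, 3))}
```

A purchase made in January works the same way: the one-time data would end with December of the previous year.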

Note that the samples for all years (the first of the two rows below) are quite large, since they contain 215 million words of data: 1,688 MB for wordLemPoS, 1,096 MB for database, and 463 MB for text. If you have limited bandwidth, you might want to just download the data from 2024, which is "only" 21.7 million words of data.

#  Option                Time period                                   Size                                   Samples
1  One-time purchase     2010 - month of purchase                      (Currently) about 21.7 billion words   Database, WordLemPoS, Text, Sources, Lexicon
2  Annual subscription   The 12-month period after month of purchase                                          From 2024: Database, WordLemPoS, Text
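
If you want a rough idea of how much disk space the full data might need, one option is to scale the all-years sample sizes quoted above by word count. The sketch below assumes that file size grows linearly with the number of words and that the full data is stored the same way as the samples. Neither assumption is confirmed here, so treat the results as order-of-magnitude estimates only. The function name `estimate_full_size_gb` is hypothetical.

```python
# Figures quoted in the note above for the all-years sample.
SAMPLE_WORDS = 215_000_000
SAMPLE_MB = {"wordLemPoS": 1688, "database": 1096, "text": 463}

# Approximate size of the full corpus, as stated in the text.
FULL_WORDS = 21_700_000_000


def estimate_full_size_gb(sample_mb, sample_words=SAMPLE_WORDS, full_words=FULL_WORDS):
    """Scale sample sizes linearly by word count.

    Linear scaling is an assumption made for illustration, not a
    published figure. Uses 1 GB = 1000 MB.
    """
    scale = full_words / sample_words
    return {fmt: mb * scale / 1000 for fmt, mb in sample_mb.items()}


for fmt, gb in estimate_full_size_gb(SAMPLE_MB).items():
    print(f"{fmt}: roughly {gb:.0f} GB")
```

Because the full corpus is about 100 times the size of the sample, each estimate comes out at about 100 times the corresponding sample size.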

If you purchase just #1 above, you pay the price of one corpus. If you purchase the subscription as well, you receive a discount for purchasing both corpora (#1 and #2) at the same time.

Note also that the monthly updates will be released at the beginning of the following month. You will be notified by email as soon as the update is available, and you will have ten days to download the data.

Notes about limitations with the texts and metadata