You can now
download the
NOW
corpus for offline use, including monthly
updates via a subscription. In total, this is about
21.7 billion words of
data that you can have on your own machine. There's
nothing else like it.
The NOW
corpus contains data from 36,808,275
texts from online magazines and newspapers in 20
different English-speaking countries from 2010 to the
current time (see
sources).
At 21.7 billion words, it is by far the largest corpus (in any language) that
is available in full-text format. Most importantly, the
corpus grows by 8-10 million words of data each day. This translates to
about 250 million words each month and about 2.5-3.0
billion words each year. (See
totals by month.) If you're interested in what's going on
in English up to and including right now, this is by far
the best corpus available.
When you purchase the
full-text data from NOW, you get all of the data from
2010 up
through the previous month. You can also purchase an
annual subscription, which will give you data for the
next 12 months (typically about 1.5 billion words each year).
For example, if you
purchase both datasets on 15 April 2025, you would
have the data from January 2010 - March 2025 (which
was released on 1 April 2025),
and an annual subscription would give you the data for
one more year: April 2025 - March 2026.
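The coverage rule in this example can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function names `add_months` and `coverage` are hypothetical, and the January 2010 start date is taken from the example above.

```python
from datetime import date


def add_months(year, month, delta):
    """Return the (year, month) pair that lies `delta` months away."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def coverage(purchase):
    """Return the months covered for a given purchase date.

    Assumption: the full-text data runs from January 2010 through the
    month before the purchase, and the subscription covers the next
    12 months, starting with the month of purchase.
    """
    base_start = (2010, 1)
    base_end = add_months(purchase.year, purchase.month, -1)
    sub_start = (purchase.year, purchase.month)
    sub_end = add_months(purchase.year, purchase.month, 11)
    return base_start, base_end, sub_start, sub_end


if __name__ == "__main__":
    base_start, base_end, sub_start, sub_end = coverage(date(2025, 4, 15))
    print("Full-text data:", base_start, "through", base_end)
    print("Subscription:  ", sub_start, "through", sub_end)
```

For a purchase on 15 April 2025, the sketch gives full-text data through March 2025 and a subscription covering April 2025 through March 2026, matching the example above.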
Note that the samples for
all years (the first of the two rows below) are quite
large, since they contain 215 million words of data:
1,688 MB for wordLemPoS, 1,096 MB for database, and 463
MB for text. If you have limited bandwidth, you might
want to just download the data from 2024, which is
"only" 21.7 million words of data.
If you purchase only #1
above, you pay the price of
one corpus. If you purchase the subscription as
well, there is a
discount for purchasing both
corpora (#1 and #2) at the same time.
Note also that the monthly updates will be
released at the beginning of the following month. You will be notified
by email as soon as the update is available, and you will have
ten days to download the data.
Notes about
limitations with the
texts and metadata