Full-text corpus data


Once you have the full-text data on your computer, there is no end to what you can do with it. The following are just a few ideas:

  • Create your own frequency lists -- for the entire corpus, for specific genres (COCA, e.g. Fiction), dialects (GloWbE, e.g. Australia), time periods (COHA, e.g. 1950s-1960s), topics (Wikipedia, e.g. molecular biology), websites/dates (NOW, e.g. Wall Street Journal from Sep-Oct 2016), very informal language (TV, Movies, or SOAP corpus), or specific sub-genres (COCA, e.g. Newspaper-Financial). (See the frequency-list sketch after this list.)

  • Find collocates -- the most common words occurring near a given word -- which provide great insight into the meaning and usage of that word. (See the collocate sketch below.)

  • Create your own n-gram lists -- the most common strings containing whatever words you want. (See the n-gram sketch below.)

  • Generate your own concordance lines -- thousands or tens of thousands of lines of data for any list of words -- without the limits imposed by the web interfaces for the corpora at English-Corpora.org. (See the concordance sketch below.)

  • If you're a computational linguist, you can do all of the things that are possible only with full-text data -- sentiment analysis, topic modeling, named entity recognition, advanced regex searches, creating treebanks, etc.
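
To make the frequency-list idea concrete, here is a minimal Python sketch. It assumes the data has been downloaded in a one-token-per-line word/lemma/PoS format with tab-separated columns; the directory name, file layout, and column order are assumptions, so adjust them to whatever your copy of the full-text data actually uses.

```python
from collections import Counter
from pathlib import Path

def frequency_list(files, column=0):
    """Count tokens across a set of full-text files.

    Assumes one token per line, tab-separated as word<TAB>lemma<TAB>PoS;
    column 0 counts word forms, column 1 counts lemmas.
    """
    counts = Counter()
    for path in files:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                fields = line.rstrip("\n").split("\t")
                if len(fields) > column and fields[column]:
                    counts[fields[column].lower()] += 1
    return counts

# Hypothetical file layout -- adjust the directory and glob to your download.
files = sorted(Path("coca_wlp").glob("*.txt"))
for word, freq in frequency_list(files, column=1).most_common(20):
    print(f"{freq}\t{word}")
```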
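
For collocates, a simple window-based count is enough to get started, assuming the same hypothetical one-token-per-line format. A real study would normally add an association measure such as MI or log-likelihood on top of the raw counts.

```python
from collections import Counter

def collocates(path, node, window=4):
    """Count word forms within `window` tokens of `node`.

    Assumes one token per line, tab-separated, word form in the first column.
    """
    with open(path, encoding="utf-8") as fh:
        words = [line.rstrip("\n").split("\t")[0].lower() for line in fh if line.strip()]
    counts = Counter()
    for i, w in enumerate(words):
        if w == node:
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[words[j]] += 1
    return counts

# File name is illustrative only.
print(collocates("coca_wlp/acad_2012.txt", "evidence").most_common(15))
```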
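
An n-gram list is just a frequency list over sliding windows of tokens. This sketch (same assumed format) counts 3-grams of word forms; counting lemma or PoS strings only means selecting a different column.

```python
from collections import Counter

def ngram_list(path, n=3, column=0):
    """Count n-grams of a chosen column (0 = word form, 1 = lemma, 2 = PoS).

    Assumes one token per line, tab-separated word<TAB>lemma<TAB>PoS.
    """
    with open(path, encoding="utf-8") as fh:
        tokens = [line.rstrip("\n").split("\t")[column].lower()
                  for line in fh if line.strip()]
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

# File name is illustrative only.
for gram, freq in ngram_list("coca_wlp/fic_2012.txt", n=3).most_common(10):
    print(freq, " ".join(gram))
```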
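
Concordance (KWIC) lines are equally straightforward once the text is a token list. This sketch prints a fixed-width left and right context around every hit; the input format and file name are, again, assumptions.

```python
def concordance(path, node, width=8):
    """Print simple KWIC lines for `node` (word-form match, case-insensitive).

    Assumes one token per line, tab-separated, word form in the first column.
    """
    with open(path, encoding="utf-8") as fh:
        tokens = [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]
    for i, w in enumerate(tokens):
        if w.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            print(f"{left:>50}  [{w}]  {right}")

# File name is illustrative only.
concordance("coca_wlp/news_2015.txt", "budget")
```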

Note that "pre-packaged" COCA-based frequency lists, collocates, and n-grams are all available (see samples) for those who don't want to extract their own data. But the full-text data gives you much more control over what you extract and how.

  • Remember that in your queries you can search by word form, lemma (e.g. walk = walks, walked, walking), part of speech, or any combination of these. This can be very useful, for example, for advanced work on syntactic constructions. (See the query sketch after this list.)

  • You can compare one section of the corpus to another -- for example, words that occur much more often in Magazine-Financial than in magazines in general (COCA), adjectives that are much more common in Great Britain than in the United States (GloWbE), or the collocates of a word in the 1800s versus the 1900s (COHA). (Some of this data is already available in "pre-packaged" form, but you will have much more control over the data; see the comparison sketch below.)
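
As a sketch of combining word form, lemma, and part of speech in a query: the function below filters the assumed one-token-per-line word/lemma/PoS stream by any mix of the three. The tag prefix in the example call is purely illustrative; use whatever tagset your copy of the data comes with.

```python
def match(path, word=None, lemma=None, pos_prefix=None):
    """Yield (word, lemma, PoS) tuples matching all of the given criteria.

    Assumes one token per line, tab-separated word<TAB>lemma<TAB>PoS.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            w, l, p = fields[:3]
            if word and w.lower() != word:
                continue
            if lemma and l.lower() != lemma:
                continue
            if pos_prefix and not p.startswith(pos_prefix):
                continue
            yield w, l, p

# e.g. every form of the lemma "walk" tagged as a verb (tag prefix is illustrative)
for w, l, p in match("coca_wlp/fic_2012.txt", lemma="walk", pos_prefix="v"):
    print(w, l, p)
```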
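
Comparing one section to another then reduces to building two frequency lists (e.g. with the frequency_list() sketch above) and ranking words by their normalized frequency ratio; a fuller treatment would use a keyness statistic such as log-likelihood.

```python
def compare_sections(freq_a, freq_b, min_freq=20):
    """Rank words by per-million rate in section A relative to section B.

    `freq_a` and `freq_b` are Counters, e.g. from the frequency_list() sketch.
    """
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    ratios = {}
    for word, fa in freq_a.items():
        if fa < min_freq:
            continue
        rate_a = fa * 1_000_000 / size_a
        # +1 smoothing so words absent from section B do not divide by zero.
        rate_b = (freq_b.get(word, 0) + 1) * 1_000_000 / size_b
        ratios[word] = rate_a / rate_b
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# e.g. compare_sections(freq_mag_financial, freq_mag_all)[:50]
```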

Using the lexicon and sources files:

  • You have access to a file that lists all of the sources in the corpora (along with their associated metadata). You can use this data to create your own "sub-corpora" -- texts from just a particular year, a certain source, or whatever other criteria you want. (See the sources-file sketch below.)

  • Because you have access to a lexicon of all word forms in the corpus (millions of entries for each corpus), you can add any features you want for any word -- pronunciation, meaning, etc. -- and then use that as part of your search. (See the lexicon sketch below.)
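
As a sketch of the sub-corpus idea: the snippet below reads the sources file and keeps the text IDs whose metadata matches some criteria, which can then be used to filter the corpus files themselves. The delimiter and column names are assumptions, so check the header row of the sources file in your download.

```python
import csv

def select_texts(sources_path, **criteria):
    """Return the set of textIDs whose metadata matches every criterion.

    Assumes a tab-separated sources file with a header row containing columns
    such as textID, year, genre, and source (the names are assumptions).
    """
    keep = set()
    with open(sources_path, encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if all(row.get(key) == value for key, value in criteria.items()):
                keep.add(row["textID"])
    return keep

# e.g. a sub-corpus of 1995 newspaper texts (field values are illustrative)
newspaper_1995 = select_texts("sources.txt", genre="NEWS", year="1995")
```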
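
And a sketch of extending the lexicon: load it, attach a feature of your own (here a hypothetical pronunciation file keyed on the word form), and the added field is then available to any later search. File names and column names are, again, assumptions.

```python
import csv

def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

# Assumed columns: wordID, word, lemma, PoS (check your lexicon file's header).
lexicon = load_tsv("lexicon.txt")

# Hypothetical extra resource: word form -> pronunciation.
pron = {row["word"]: row["pron"] for row in load_tsv("my_pronunciations.txt")}

# Attach the new feature; entries without a pronunciation get None.
for entry in lexicon:
    entry["pron"] = pron.get(entry["word"])

# The added field can now drive a search -- here just a placeholder filter.
matches = [e for e in lexicon if e["pron"] and e["pron"].endswith("d")]
```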

Basically, anything that you can do with the corpora online, you can do with this data -- and much more. But because the data is on your own computer, there are no limits on how many queries you can do each day; you don't have to worry about hundreds of other people using the corpora at the same time you are; and you can even leave programs running overnight to search the hundreds of millions (or billions) of words in complex ways. The possibilities are endless.