Full-text corpus data


The full-text samples below cover about 1% of the corpus, or about 14 million words. They are a random sample of the ~95,000 websites: every website whose ID ends in '53', e.g. website #3953, website #29453, website #70253, etc.
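The selection rule can be illustrated with a minimal Python sketch. The function name `in_sample` and the assumption that website IDs run from 1 to 95,000 without gaps are hypothetical and used only for illustration; the actual ID range of the corpus may differ.

```python
def in_sample(website_id: int, suffix: int = 53) -> bool:
    """Return True if the last two digits of the website ID equal `suffix`."""
    return website_id % 100 == suffix


# Hypothetical ID range: assumes IDs run from 1 to 95,000 with no gaps.
all_ids = range(1, 95_001)
sample_ids = [i for i in all_ids if in_sample(i)]

# The example IDs given in the text all satisfy the rule.
print([in_sample(i) for i in (3953, 29453, 70253)])

# Share of websites selected under the assumed ID range.
print(len(sample_ids), len(sample_ids) / len(all_ids))
```

Under these assumptions, keeping one two-digit suffix out of a hundred selects roughly 1% of the websites, which is consistent with the sample size described above.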
 
Format (overview)     Download samples
Text                  0  1  2  3  4  5  6  7  8  9
Word / lemma / PoS    0  1  2  3  4  5  6  7  8  9
Database              0  1  2  3  4  5  6  7  8  9
Shared files          Lexicon (see notes), sources