Full-text corpus data


Note: this data is based on corpora that were created solely by Mark Davies, Professor of Linguistics at Brigham Young University. As the result of an agreement between BYU and Mark Davies, all transactions regarding payments and licenses for this data are made solely with Mark Davies, rather than with BYU.

 

Complete full-text data from iWeb (See sample)
    Academic *    $495    License agreement
    Commercial    $995    License agreement

(Just) URLs data from iWeb (See sample)
    Academic *    $145    License agreement
    Commercial    $295    License agreement

Notes:
* The "complete" full-text data also includes the URLs data
* When you purchase the complete full-text data, you have access to all three formats: database, word/lemma/PoS, and text (running text).
* The iWeb full-text data is about 20% more expensive than the other BYU full-text data, but iWeb is much larger than those corpora (e.g. 25x as large as COCA)
* Approx. size of full-text data (uncompressed): Text: 78 GB; Database: 480 GB; WordLemPoS: 660 GB
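
Because the uncompressed data is large, it may be worth checking free disk space before downloading. The following is a minimal sketch, not part of the data or any official tool: the function names are hypothetical, the sizes are the approximate figures listed above, and 1 GB is assumed to mean 10^9 bytes.

```python
import shutil

# Approximate uncompressed sizes from the note above, in GB.
APPROX_SIZE_GB = {"text": 78, "database": 480, "wordlempos": 660}

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def required_gb(formats):
    """Sum the approximate uncompressed size (in GB) of the chosen formats."""
    return sum(APPROX_SIZE_GB[name] for name in formats)


def has_room(path, formats):
    """Return True if the drive holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb(formats) * BYTES_PER_GB


if __name__ == "__main__":
    chosen = ["text", "wordlempos"]  # hypothetical selection
    print("Space needed (GB):", required_gb(chosen))
    print("Enough room here:", has_room(".", chosen))
```

The compressed downloads will be smaller than these figures, so this estimates the space needed after unpacking, not the download size.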

These are the steps to obtain the data:

1. Download and fill out the license agreement. It states that you will not give the data to anyone outside your university or company (which also means that you cannot post it on the web). You just need to fill in your name and, if applicable, your company, and then send it back to us as an attachment. * Note that for an academic license you must use an academic email address (e.g. *.edu or *.ac.edu).
2. Once we receive the license agreement, we'll send you a request for payment from PayPal.
3. You make the payment with a credit card at PayPal. Note that you do not need a PayPal account to make the payment.
4. As soon as we receive confirmation of the payment, we'll send you the link to download the data.