Restrictions on use of the corpora
You must agree to these restrictions in order to obtain the data, or else obtain a waiver from us for a particular point listed below.
[ Data ]
1. In no case can substantial amounts of the full-text data (typically, a total of 50,000 words or more) be distributed outside the organization listed on the license agreement. For example, you cannot create a large word list or set of n-grams and then distribute it to others, and you cannot copy 70,000 words from different texts and then place them on a website where users from outside your organization would have access to the data.
2. The data cannot be placed on a network (including the Internet) unless access to the data is limited (via restricted login, password, etc.) to members of the organization listed on the license agreement. For example, the data cannot be placed on another corpus site that indexes the data and then makes it available to end users, because that other corpus site would then have access to the data.
3. In addition to the full-text data itself, #2 also applies to derived frequency, collocate, n-gram, concordance, and similar data that is based on the corpus.
4. If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words (e.g. the word occurs 3,403 times in the corpus) or the rank order (e.g. it is the 304th most common word). (Note: it is acceptable to use the frequency data to place words and phrases in "frequency bands", e.g. words 1-1,000, 1,001-3,000, 3,001-10,000, etc. However, there should not be more than about 20 frequency bands in your application.)
[ License ]
5. Academic licenses: a license is only valid for one campus. So if you are part of a research group with members at universities X, Y, and Z, for example, each university needs to purchase the data separately.
6. Academic licenses: you cannot use the data to create software or products that will be sold to others.
7. Academic licenses: students in your undergraduate classes cannot have access to substantial portions of the data (e.g. 50,000 words or more). Graduate students can have access to the data for work on theses and dissertations. The data is primarily intended for use in research, not teaching. If you need corpus data for undergraduate classes, please use the standard web interface for the corpora.
8. Academic and commercial licenses: supervisors will make their best efforts to ensure that other employees or students who have access to the data are aware of these restrictions.
9. Commercial licenses: large companies with employees at several different sites (especially in different countries) may need to contact us for a special license.
[ General ]
10. There are no refunds, unless you find that the samples you have examined are not representative of the "full" data that you download (which is not the case).
11. Any publications or products that are based on the data should contain a reference to the source of the data: https://www.corpusdata.org.
12. Note that a small, unique change will be made to each set of data, and this will serve as a "fingerprint" to identify you as the unique source of the datasets that you download. Automated Google searches are run daily to find copies of the data on the Web. If we find the data online and it is the data that was sent to you (and we will be able to determine that this is the case), then you will be required to contact the administrators of that website to have the data removed.
If you want to purchase the data, please indicate that you have read and agree to these conditions of use before you send us the license agreement. Thanks.