The ECI Multilingual Corpus I
The European Corpus Initiative (ECI) was founded to oversee the
acquisition and preparation of a large multilingual corpus, and
supports existing and projected national and international efforts to
carefully design, collect and publish large-scale multilingual written
and spoken corpora.
ECI has produced Multilingual Corpus I (ECI/MCI) of over 98 million
words, covering most of the major European languages, as well as
Turkish, Japanese, Russian, Chinese, Malay and more. The primary
focus in this effort is on textual material of all kinds, including
transcriptions of spoken material.
The ECI/MCI is now available at a price of DFl 80 (for payments made
by credit card or Eurocheque); DFl 95 (for payments by bank transfer);
or DFl 120 (for payments by cheques other than Eurocheques).
What's in it?
Just a sampling of the contents of the CD-ROM:
- German newspaper texts from the Frankfurter Rundschau
from July 1992 - March 1993.
Provided by Universität Gesamthochschule, Paderborn, Germany.
Approximately 34 million words.
- French newspaper texts from Le Monde,
consisting of material from September 1989, October 1989, and
Provided by LIMSI CNRS, France.
Approximately 4.1 million words.
- Extracts from the Leiden Corpus of Dutch,
consisting of newspapers, transcribed speech, etc.
Provided by Instituut voor Nederlandse Lexicologie, Leiden, Holland.
Approximately 5.5 million words.
- International Labour Organisation (ILO)
"Official Bulletin, B Series". Vols LXVII(1984) - LXXII(1989).
Parallel texts in English, French and Spanish
provided by the International Labour Organisation.
Approximately 5 million words.
About the User Agreement
As some of the data is restricted, purchasers of the ECI/MCI CD-ROM will need
to sign a licence agreement that restricts use of the data to
research purposes. You can either
- print out a PostScript
version of the licence agreement, which you will then need to
complete by hand, sign and send back to ELSNET at the address below, or
- download the LaTeX source, which
will allow you to print out a version with your name (and research
group, if applicable) as part of the text. Again, you will need to
sign and return a hard copy of this form.
Please note that we cannot supply you with a copy of the corpus
until we have received a completed copy of the licence agreement from
you. Please do not send payment with your signed licence
agreement. An invoice will be sent to you, with the CD, after
the agreement has been received.
We interpret the aim of the ECI/MCI Agreement, and of our
efforts in providing this data, as follows:
The ECI/MCI is made available for scientific research without
royalties. All copyrighted materials submitted for inclusion in the
collection remain the exclusive property of the copyright holders for
all other purposes. You should not redistribute the data that you get
from us, nor should you sell it, or charge for access to it, or
otherwise put it to any direct commercial use. Commercial
application of "analytical materials" derived from the text, such as
statistical tables or grammar rules, is not ruled out, as long
as copyright law is observed, but as the application of copyright law
in this area is unclear, anyone intending such exploitation should contact
the original providers.
Copyright holders who agree to make material available are being very
generous. Their contributions will make possible a resource of great
general utility for research and development in language technology
and linguistics. It is not our intent to deprive them of any revenues
that they should receive in the ordinary course of their business.
Thus it would be a violation of trust, as well as a violation of
copyright law, for you to republish a dictionary or other work to be
distributed under the user agreement, whether in print or electronic form.
Send completed licence agreements to:
OTS, Utrecht University
3512 JK Utrecht

For further information about ECI, contact:
Henry S. Thompson (email@example.com)
2 Buccleuch Place,
Edinburgh EH8 9LW, UK