The ECI Multilingual Corpus I


The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora.

ECI has produced the Multilingual Corpus I (ECI/MCI), a collection of over 98 million words covering most of the major European languages, as well as Turkish, Japanese, Russian, Chinese, Malay and more. The primary focus of this effort is on textual material of all kinds, including transcriptions of spoken material.

The ECI/MCI is now available at a price of DFl 80 (for payments made by credit card or Eurocheque), DFl 95 (for payments by bank transfer), or DFl 120 (for payments by cheque other than Eurocheque).

What's in it?

Just a sampling of the contents of the CD-ROM:

Look here for a complete listing of the contents.

About the User Agreement

As some of the data is restricted, purchasers of the ECI/MCI CD-ROM must sign a license agreement that restricts use of the data to research purposes. You can either
Please note that we cannot supply you with a copy of the corpus until we have received a completed copy of the license agreements from you. Please do not send payment with your signed license agreements. An invoice will be sent to you, with the CD, after the agreements have been received.
We interpret the aim of the ECI/MCI Agreement, and of our efforts in providing this data, as follows:

The ECI/MCI is made available for scientific research without royalties. All copyrighted materials submitted for inclusion in the collection remain the exclusive property of the copyright holders for all other purposes. You should not redistribute the data that you get from us, nor should you sell it, or charge for access to it, or otherwise put it to any direct commercial use. Commercial application of "analytical materials" derived from the text, such as statistical tables or grammar rules, is not ruled out, as long as copyright law is observed, but as the application of copyright law in this area is unclear, anyone intending such exploitation should contact the original providers.
Copyright holders who agree to make material available are being very generous. Their contributions will make possible a resource of great general utility for research and development in language technology and linguistics. It is not our intent to deprive them of any revenues that they should receive in the ordinary course of their business. Thus it would be a violation of trust, as well as a violation of copyright law, for you to republish a dictionary or other work to be distributed under the user agreement, whether in print or electronic form.

Send completed license agreements to:
OTS, Utrecht University
Trans 10
3512 JK Utrecht
The Netherlands
For further information about ECI, contact:
Henry S. Thompson
2 Buccleuch Place,
Edinburgh EH8 9LW, UK