Sharing Linguistic Data


This article originally appeared in the Nov-Dec 1993 issue of Language Industry Monitor.

Tools, rules, and lots of text: the RELATOR project will make more of these vital resources available to industry and academia.

While modestly funded (ECU 500,000), the LRE RELATOR project is nonetheless a timely new undertaking to set up a basic repository for written and spoken linguistic data, rules, and tools in Europe. As developers are well aware, building robust NLP and speech applications with wide coverage requires enormous amounts of raw data. The Linguistic Data Consortium (LDC) at the University of Pennsylvania has been addressing this need by gradually making such materials available for English and a growing number of other languages, but it nonetheless makes sense for a similar kind of organization to be established in Europe. A European counterpart to the LDC would in principle be better placed to respond to the needs of European developers; it would also be the logical channel for the distribution of European materials.
    RELATOR will not be starting completely from scratch but will be able to build on earlier like-minded efforts by European organizations such as ELSNET, ESCA, and the EACL. The RELATOR consortium’s initial goals will be to assemble an industrial advisory board for obtaining potential user input, get a small distributed network operational, and acquire some experience in CD-ROM pressing.
    Gathering such materials will not be a trivial task. Companies and organizations tend to be wary of sharing what might be considered strategic data. Then there is a thicket of legal and intellectual property issues. Yet another hurdle will be to make all such materials available in a standard format. In the final tally, however, such an initiative must succeed. As language technology moves laterally across more and more languages, no single company or organization will be able to gather the requisite linguistic data on its own.

COPYRIGHT © 1993 BY LANGUAGE INDUSTRY MONITOR
