Dictionary Standards: Lex Get It Straight

By Andrew Joscelyne


This article originally appeared in the Jul/Aug 1991 issue of Language Industry Monitor

Instigated in response to a European Ministerial decision in Rome in 1990 and bankrolled to the generous tune of Fr250m ($48m), Genelex is a 4-year, 250 man-year project aimed at developing a generic electronic dictionary, initially for French, Italian, and Spanish. Motivating the project is the realization that dictionary-building costs in language engineering applications tend to rise exponentially. If a way could be found to reuse existing lexical resources by designing a metamodel which could accommodate all possible dictionary entries, plus a set of tools allowing gateways between this super-format and any local application that might require lexical material, then natural language applications could be implemented in trans-European communications as a whole faster and more cheaply.
    Pitching itself in the same strategic (but certainly not funding) class as the MITI-funded Electronic Dictionary Research effort, Genelex is a competitive, product-driven affair based on a bottom-up association of teams which have committed themselves to the industrialization of their product. This way, risks are shared, as well as any eventual profits; the ultimate goal is to position the results of their labors on the market, not in the research lab. In the Genelex consortium, the cream of French language engineering (teams from Bull, Gsi-Erli, Hachette, IBM Research, LADL at University of Paris VII, and SEMA GROUP) has mixed its resources in with a rich Italian ingredient (the prestigious Pisa Research Consortium; Servedi, itself a subsidiary of publishers Utet and Paravia; and Lexicon, with the University of Salerno as sub-contractor) and a generous dollop of Spanish pimento (publisher Salvat, Tecsidel, and the Autonomous University of Barcelona).
    Despite the suggestive suffix, do not confuse Genelex with either the MULTILEX or ACQUILEX projects, even though data or theory feeds and shareouts between these various lexicon projects will be encouraged. MULTILEX is a three-year ESPRIT II (i.e., pre-competitive) project which began in December 1990 with the aim of proposing standards for “pre-theoretic” (theory-neutral) lexicon technology: first, lexical database formats and, second, a box of software tools for handling the compilation or conversion of existing lexical resources into the proposed standard format. ACQUILEX sounds like a sub-routine in the MULTILEX project: its target is the design of an explicit and standardized representation of language in order to build a computational model of a dictionary entry (for any lexical database). Particular emphasis will be placed on developing methods for extracting semantic information from large machine-readable dictionaries accessible via the proposed standardized format.

COPYRIGHT © 1991 BY LANGUAGE INDUSTRY MONITOR
