Le Lexicaliste


This article originally appeared in the July-Aug 1992 issue of Language Industry Monitor

By Andrew Joscelyne

SITE is now marketing the dictionary generator it developed for the French PTT's CNET labs.

As the European leader in multilingual documentation, the French company SITE has gradually begun to market tools that were originally developed to meet in-house needs for large translation and document processing jobs. In the field of terminology management, Phénix and its slimmer brother Aquila are two well-known though apparently little-used SITE products that grew out of several years of solving terminology difficulties in projects for the aerospace and military equipment industries. The latest offspring is Le Lexicaliste, a highly ambitious dictionary generator which inaugurates a new generation of automation tools for creating and maintaining dictionaries and integrating them into systems.
    According to Le Lexicaliste project manager Dominique Maret, the stimulus for the system was a 1990 commission from CNET (Centre national d'études des télécommunications), the research wing of the French telecoms operator. As part of its ongoing work in building natural language understanding into the user interface of Minitel, the French videotex network, CNET needed a large, permanently upgradable database of French words, containing easily exploitable phonetic, morpho-syntactic and semantic information. CNET wanted the lexical processing done on a single workstation, and, given the scale of the investment, dictated that the product be reusable for future, unspecified lexicon-based applications. As a result, Le Lexicaliste bears the trademark of France Télécom and has a French monolingual interface and lexical base. The documentation and user interface, however, are currently being translated into English.
    The key feature of Le Lexicaliste is its “genericity.” The system has been designed to be rigorously application-independent, so that it remains compatible with current and future industry standards in computing, interfacing, and data exchange. Lexical entries can be imported from various sources (other dictionaries, files, term lists, etc.) and defined on the basis of multiple attribute criteria within a very rich set of possible linguistic data structures. When lexical data is required for a given application, a dictionary can be generated and exported to that application in SGML format. Le Lexicaliste runs under the Oracle RDBMS on Sun workstations and sports an X Window/Motif user interface.

Le Maximaliste
Dominique Maret believes Le Lexicaliste differs from other current generic dictionary projects, such as Genelex, Multilex, Aquilex, and the Japanese EDR project, because it adopts a “maximalist,” theory-independent approach to handling linguistic information: “We believe our design is much closer to the actual needs of potential applications. By building in full, open extendibility at every level, we can allow for special needs and usages in lexical development which cannot be foreseen at the time when the data structures are specified.”
    One of the first large-scale applications of Le Lexicaliste outside of its CNET application will be as a dictionary generator for Eurolang, the new consortium led by SITE that is developing an automatic translation project under an ECU 70 million Eureka contract.

SITE, 11 ave. Morane Saulnier, B.P. 189, 78143 Vélizy-Villacoublay Cedex, France. Tel +33 1 30 70 16 16, Fax +33 1 34 65 91 43.

COPYRIGHT © 1992 BY LANGUAGE INDUSTRY MONITOR
