Ingénia Langage Naturel: Targeting French Grammar


This article originally appeared in the Nov-Dec 1994 issue of Language Industry Monitor

By Andrew Joscelyne

The language processing industry is largely composed of small one-product companies trying to exploit the results of academic research. Their fortunes offer a handy indicator of market trends in the sector, as French company Langage Naturel’s profile suggests.

Founded by artificial intelligence researcher Alain Bonnet in 1990, Langage Naturel’s original ambition was to market the results of a comprehensive academic research program into the computer processing of French morphology, syntax and especially semantics. The flagship product was to be based on a semantic network that was generated automatically from the concepts used in word definitions found in standard published dictionaries. And not surprisingly, the original application was to be an add-on to information retrieval packages, providing beefed-up thesaurus help when searching for documents in French. Although the company’s basic strategy of putting language engineering products rather than services on the market has not changed, the products themselves have. As has the structure of the firm’s equity.

In mid-1994, Langage Naturel was taken over by Ingénia, a large French artificial intelligence-oriented technology group. Senior management at Ingénia believe that the language applications market is due to boom in three years’ time, and they want to strengthen the firm’s skills base. For Langage Naturel, the deal offered the additional benefit of synergy with some of the group’s ongoing language activities, and the chance to generate some real income. Now called Ingénia-Langage Naturel, the combined forces offer two product lines targeting distinct user groups. But the French semantic network no longer figures among them.
    Inherited essentially from Ingénia, the STIL toolbox comprises a set of functions designed for the niche market of inter-bank message handling. The system takes the textual data found in the free field of SWIFT-type fund transfer messages and “translates” them into fixed field data so that complete messages can be automatically batch-processed. The result is a radical acceleration of the circulation of messages. STIL draws on specialized dictionaries to handle features such as abbreviations, ellipsis and erroneous spellings. According to Ingénia-Langage Naturel’s director, Patrick Constant, customers for STIL include such banks as Crédit Lyonnais, the Royal Bank of Canada and Crédit Commercial de France. It generates three quarters of the company’s income.
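
The free-field-to-fixed-field idea can be illustrated with a minimal Python sketch. It is not STIL's actual method, which the article does not describe: the normalization dictionary, the field names and the patterns below are all hypothetical, chosen only to show how a dictionary of abbreviations and misspellings can feed a fixed-field extractor.

```python
import re

# Hypothetical dictionary of abbreviations and misspellings (an assumption,
# standing in for the "specialized dictionaries" mentioned in the article).
NORMALIZE = {
    "acct": "account",
    "acc": "account",
    "benef": "beneficiary",
    "beneficary": "beneficiary",
    "inv": "invoice",
}

# Hypothetical fixed fields, each recognized by keywords followed by a value.
FIELD_PATTERNS = {
    "beneficiary_account": re.compile(r"beneficiary account (\w+)"),
    "invoice_number": re.compile(r"invoice (?:no |number )?(\w+)"),
}


def normalize(text):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(NORMALIZE.get(token, token) for token in tokens)


def extract_fields(free_text):
    """Map a free-text instruction onto the fixed fields it mentions."""
    clean = normalize(free_text)
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(clean)
        if match:
            fields[name] = match.group(1)
    return fields


print(extract_fields("Pay benef. acct 12345678, inv no 778"))
```

Once every message yields the same dictionary of named fields, a batch process can route it without a clerk reading the free text, which is the speed-up the article describes.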

While STIL focuses on a niche need, Langage Naturel’s original product is being geared to a potentially broader market. Under the name Sylex, it consists of a set of three components, two of which are now on sale. Interestingly, it is the “Graphe” semantic network that failed to reach the user; it is now used only as an in-house resource. “We found that user needs in terms of document retrieval aids vary enormously,” says Constant. “So far, we have not been able to develop the right kind of interface to allow it to be used in applications.”
    Of the other two Sylex components, Lex is a comprehensive lexicon of French plus a set of lemmatization and other functions, while Base is a morphological and syntactic analyzer of French. The intended application field for Sylex is the analysis of text into syntactic categories. “What is important about this product,” says Constant, “is that it is one of the very few products on the market that leverages analysis beyond the character string level of information in texts.”

Constant sees three major areas of applications for Sylex: lemmatization for full-text indexing, syntactic analysis of texts, and statistical analysis of corpora. In other words, Sylex is a library of functions that can be used in tandem with other computing technology to deliver enhanced information about the form and content of texts. “Our emphasis is on tools that can adapt to any environment in which knowledge about linguistic structure will help people do their own jobs better,” summarizes Constant. “We see ourselves as a supplier of products that make other products work better.”
    The Sylex-Lex lemmatization function, for example, will recover the root and syntactic category of all full words in a text base. “We have integrated it into a number of text searching applications including French company Ergosum’s full-text DOXIS indexing system,” he says. “It offers better quality searches, and a better silence-to-noise ratio than character string-based competitors.”
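
To see why lemma-based indexing can beat character-string matching, consider the following Python sketch. It does not use the Sylex-Lex interface, which the article does not document: the toy lexicon, the function names and the sample sentences are assumptions made purely for illustration.

```python
# Toy lexicon mapping inflected forms to (lemma, category). Hypothetical data;
# a real lexicon such as the one described would be vastly larger.
LEXICON = {
    "banque": ("banque", "noun"),
    "banques": ("banque", "noun"),
    "achète": ("acheter", "verb"),
    "achètent": ("acheter", "verb"),
    "acheté": ("acheter", "verb"),
    "société": ("société", "noun"),
    "sociétés": ("société", "noun"),
}


def lemmatize(word):
    """Return (lemma, category) for a known full word, else None."""
    return LEXICON.get(word.lower())


def build_index(documents):
    """Index each document under the lemmas of its full words."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            entry = lemmatize(word.strip(".,;:!?"))
            if entry:  # function words are absent from the toy lexicon
                index.setdefault(entry[0], set()).add(doc_id)
    return index


def search(index, query_word):
    """Look up a query by its lemma rather than by its surface string."""
    entry = lemmatize(query_word)
    return sorted(index.get(entry[0], set())) if entry else []


docs = {
    1: "Les banques achètent la société.",
    2: "La banque a acheté deux sociétés.",
}
index = build_index(docs)
print(search(index, "achète"))
```

A plain substring search for "achète" would find document 1 (inside "achètent") but miss "acheté" in document 2. Looking up the lemma "acheter" retrieves both, which reduces the "silence" (missed documents) that Constant refers to.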
    The syntactic analyzer is Constant’s own brainchild, for which he claims the broadest coverage for any French parsing product now available. “The grammar consists of some twenty-five thousand lines of code, excluding the sixty-three thousand headwords in the lexicon,” he explains. “We have developed our own formalism which attempts to eliminate complexity in both code and grammar, offering a simple but exceedingly powerful descriptive framework.”
    Among other possible applications, one of the firm’s customers uses Sylex-Base to scan press agency dispatches in search of all the specific structures representing information about movements of capital for a specific range of companies — a process that can only work if the parser can handle fairly complex grammatical patterns. The ultimate aim is logical if ambitious: “In a few years, we hope to cover every significant grammatical structure in the language,” Constant says.
    The third field of application is text statistics. “To date, people analyzing corpora have only been using statistical tools that offer very rudimentary levels of evaluation. Using a combination of Lex and Base, we can push the analysis much further towards the representation of conceptual information in texts. Our tools reveal regularities in texts that help understanding.” For the corporate sector again, Sylex can generate a snapshot of such features as the grammatical phenomena that occur in texts, the ratio of positive to negative adjectives modifying nouns referring to a given firm, or adverbs co-occurring with certain types of verbs. This kind of data can deliver a fairly accurate “first impression” report of how a given company is being represented in newspaper articles, for example. Whether there is much of a market for this “pre-semantic” alternative to genuine text summarizing tools remains to be seen.
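
The adjective-ratio statistic mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sylex's implementation: it presumes a parser has already produced lemmatized (adjective, noun) modifier pairs, and the polarity word lists and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical polarity lists of lemmatized French adjectives.
POSITIVE = {"solide", "rentable", "dynamique"}
NEGATIVE = {"fragile", "déficitaire", "endetté"}


def polarity_counts(modifier_pairs, company_nouns):
    """Count positive and negative adjectives modifying the given nouns.

    modifier_pairs: iterable of (adjective_lemma, noun_lemma) tuples, assumed
    to come from a syntactic analysis of the corpus.
    company_nouns: set of noun lemmas that refer to the firm of interest.
    """
    counts = Counter()
    for adjective, noun in modifier_pairs:
        if noun not in company_nouns:
            continue  # the adjective modifies something other than the firm
        if adjective in POSITIVE:
            counts["positive"] += 1
        elif adjective in NEGATIVE:
            counts["negative"] += 1
    return counts


pairs = [
    ("solide", "groupe"),
    ("fragile", "groupe"),
    ("rentable", "groupe"),
    ("fragile", "marché"),
    ("grand", "groupe"),
]
counts = polarity_counts(pairs, {"groupe"})
print(counts["positive"], counts["negative"])
```

The step that needs a parser is deciding which noun each adjective modifies. A simple word-proximity count would also credit "fragile" from "marché fragile" to the firm if the two happened to appear close together.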
    Given the effort that has gone into the French language tools, Ingénia-Langage Naturel’s strategy is essentially product-based. “We don’t wish to compete on the rather well-supplied French NLP services market,” says Constant. “We have very reliable products (zero errors in the lemmatization engine) and we are beginning to gain attention on a market that has been disappointed by broken NLP promises. Although we’ve started developing an English parser using our formalism, the admittedly large English language market also has far more competitors. So we have decided to go all out for the French language market, where we think we can try and guarantee the best possible computer coverage of French in a fully portable environment.”

One final advantage of this concentration on the full syntactic coverage of one language is that with Ingénia-Langage Naturel’s multi-platform approach, Sylex-Base could eventually be plugged into a machine translation system to provide a French analysis module for any up and running translation system. It would theoretically provide greater grammatical coverage than any equivalent module developed from within the usual constraints of a multilingual development project. And it has the current advantage of being operational. Will such virtues be sufficient to expand the market base for what is in effect a machine tool for the language industry?

Ingénia-Langage Naturel, 92 bis ave Victor Cresson, F, 92130 Issy-les-Moulineaux, France; Tel: +33 1 47 36 29 00, Fax: +33 1 45 29 03 04

COPYRIGHT © 1994 BY LANGUAGE INDUSTRY MONITOR
