TEXTware's GestorLEX


This article originally appeared in the Nov/Dec 1991 issue of Language Industry Monitor.

New lexicography software from Denmark — and much more.

With the official launching of GestorLEX at the October Buchmesse in Frankfurt, the world is a computational lexicographer's workbench richer. Developed by enterprising Danish software house TEXTware, GestorLEX is something of a hybrid product: a strong database engine providing the platform for a structured editing environment based on SGML. It runs under OS/2, clothed in an elegant Presentation Manager jacket.
    GestorLEX is designed to liberate the dictionary compiler (or editor) from formatting cares by providing a clean separation between the content of dictionary entries and their final (printed or electronic) form. Each element in an entry is clearly defined, rather like the fields in a database. The formatting attributes assigned to each of these elements can be defined or changed at any time. A lexicographer need only enter the relevant information in a predefined template to do his or her job.
    Because of its database orientation, GestorLEX offers some sophisticated functions for dictionary building. Lexicographers can “hide” certain elements of an entry. Elements can have their own keyboard layouts, so that, for example, a phonetic character set becomes active when the pronunciation is being entered. Like any good database, GestorLEX supports content restrictions to ensure consistency throughout the dictionary; these might apply to word class indicators or abbreviations, for example.
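To make the separation of content and form concrete, here is a minimal Python sketch. It is not GestorLEX code (GestorLEX itself is built on SGML); it simply assumes that an entry is a set of named elements, that formatting lives in a separate table which can be swapped at any time, that some elements can be hidden, and that word class indicators are restricted to a fixed list. All names (`FORMATS`, `ALLOWED_WORD_CLASSES`, `validate`, `render`) are hypothetical.

```python
# Hypothetical illustration only, not GestorLEX's internals.

# Content: each element of the entry is a named field, like a database column.
entry = {
    "headword": "baht",
    "pronunciation": "ba:t",
    "word_class": "n",
    "definition": "the basic monetary unit of Thailand",
}

# Presentation: formatting attributes per element, kept apart from the
# entries so they can be changed at any time without touching the content.
FORMATS = {
    "headword": "<b>{}</b>",
    "pronunciation": "[{}]",
    "word_class": "<i>{}</i>",
    "definition": "{}",
}

# Content restriction: word class indicators must come from a fixed list.
ALLOWED_WORD_CLASSES = {"n", "v", "adj", "adv", "prep", "conj"}


def validate(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    value = entry.get("word_class")
    if value not in ALLOWED_WORD_CLASSES:
        problems.append(f"unknown word class: {value!r}")
    return problems


def render(entry, formats, hidden=()):
    """Apply the formatting table to each element, skipping hidden ones."""
    return " ".join(
        formats[name].format(value)
        for name, value in entry.items()
        if name in formats and name not in hidden
    )


print(validate(entry))         # []
print(render(entry, FORMATS))  # <b>baht</b> [ba:t] <i>n</i> the basic monetary unit of Thailand
```

Changing the look of the whole dictionary then means editing `FORMATS` once, while the entries themselves stay untouched.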
    GestorLEX has facilities to allow more than one person to work on a dictionary at a time. Using a built-in query function, the chief editor can pick out a subset of entries (one letter of the alphabet, say, or specific domains or even parts of speech) and export it onto a floppy disk for further editing on another machine. The exported entries are then “locked” in the main database, although they can still be viewed. This approach allows teams of lexicographers to work either “vertically,” i.e., editing all the elements of a given set of entries, or “horizontally,” dealing with just certain aspects. On a given dictionary project, an expert in phonetics, for example, can edit just the pronunciations while a colleague works on botanical terms, each at their own computer.
    Once the work is complete, the entries can be imported back into the system. While this copying and shifting process might seem less dynamic than a multiuser system running on a LAN, it lets lexicographers work remotely and gives the chief editor a chance to check new entries before merging them with the rest of the dictionary.
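The export, lock, and import cycle works much like a check-out/check-in scheme. The Python sketch below models it under simple assumptions: entries are kept in a mapping keyed by headword, a query is any predicate over an entry, and the chief editor's review is an `approve` callback. The `Dictionary` class and its methods are hypothetical and say nothing about GestorLEX's actual file formats or locking mechanism.

```python
import copy


class Dictionary:
    """Hypothetical main database with check-out locking (illustration only)."""

    def __init__(self, entries):
        self.entries = entries  # headword -> dict of named elements
        self.locked = set()     # headwords currently exported

    def export(self, query):
        """Copy out unlocked entries matching `query` and lock them here."""
        batch = {hw: copy.deepcopy(e) for hw, e in self.entries.items()
                 if hw not in self.locked and query(hw, e)}
        self.locked.update(batch)  # adds the exported headwords
        return batch

    def view(self, headword):
        """Locked entries can still be viewed."""
        return self.entries[headword]

    def edit(self, headword, element, value):
        """Edits in the main database are refused while an entry is exported."""
        if headword in self.locked:
            raise PermissionError(f"{headword!r} is exported and locked")
        self.entries[headword][element] = value

    def import_batch(self, batch, approve):
        """Merge returned entries the chief editor approves and unlock them.

        Rejected entries stay locked so they can be sent back for revision.
        """
        merged = []
        for hw, e in batch.items():
            if hw in self.locked and approve(hw, e):
                self.entries[hw] = e
                self.locked.discard(hw)
                merged.append(hw)
        return merged


# Usage: export the botanical terms, edit them elsewhere, then merge them back.
db = Dictionary({
    "baht": {"domain": "finance", "definition": "currency of Thailand"},
    "birch": {"domain": "botany", "definition": "a tree"},
})
batch = db.export(lambda hw, e: e["domain"] == "botany")
batch["birch"]["definition"] = "a deciduous tree of the genus Betula"
print(db.import_batch(batch, approve=lambda hw, e: True))  # ['birch']
```

This sketch locks whole entries, which suits “vertical” work; “horizontal” work, where one person edits only the pronunciations, would need locks on individual elements, which it does not attempt to model.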
    TEXTware president Jens Erlandsen says his company primarily had dictionary publishers in mind when developing GestorLEX, and they are already beginning to express their interest. But it could also be used for related activities, such as compiling encyclopedias or corporate terminology databases. GestorLEX was developed by TEXTware in collaboration with the large Danish reference publisher Gyldendal, under an agreement whereby Gyldendal obtained exclusive rights to GestorLEX within Denmark and TEXTware is allowed to market it abroad. GestorLEX may prove daunting competition for Compulexis, although a major new update of that venerable lexicographer's package is said to offer similar functionality and ease of use.

Electronic dictionary publishing
TEXTware is also highly active in the general field of electronic publishing, although its name is usually “shielded” behind those of its customers, mainly publishers. At last count, the twelve-strong company, half programmers and half (computational) linguists, has produced twenty titles, with more than a dozen others on the way. Customers include many Scandinavian publishers as well as Dutch, English, German, and French ones. Most of these titles are bilingual electronic dictionaries, but the range also includes several monolingual titles and two versions of the Bible.
    Erlandsen and colleagues call the TEXTware retrieval platform the Bookcase (not to be confused with Microsoft's Bookshelf). Because they share similar file formats, all of TEXTware's electronic dictionaries can be accessed at the same time by the memory-efficient (11 KB) retrieval software. For this reason, Erlandsen favors traditional magnetic media over CD-ROM for distribution, noting that most people have at most one CD-ROM drive, and so can consult only one disc at a time, whereas they can install as many dictionaries onto their hard disk from floppy disks as they have room for. With more than three years' experience producing electronic dictionaries and a complete set of tools developed for the purpose, Erlandsen estimates the average production time per title at around eight weeks. And everything, he adds, is SGML compliant.
    Erlandsen scoffs at fellow developers of electronic dictionaries who do not take advantage of the possibilities offered by the computer: “You can't just take the information and present it passively without thinking about it. You need to develop an overall editorial philosophy with regard to electronic publishing. Simple lookup functions aren't enough. You have to offer more. Otherwise, people get bored with your software and will stop using it.”
    What does Bookcase offer? Erlandsen demonstrates some of the features that distinguish TEXTware's software: a command that expands and contracts dictionary entries to control the amount of information displayed; a backtracking feature for retracing your steps through previous entries; and an “Occurrences” search facility, which searches the text of the entries themselves, so you can find an entry even when you are not sure of its headword (lemma). Searching for occurrences of “Thailand” and “currency,” for example, will return “baht.” Other forms of added value include the ability to handle phrases and so-called “synonym collecting.”
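One plausible way to build an “Occurrences” search is an inverted index that maps each word in the entry texts to the headwords whose entries contain it; a multi-word query is then answered by intersecting those sets. The Python sketch below assumes that approach. The article does not say how Bookcase actually does it, and the three entries and the helper names `build_index` and `occurrences` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(entries):
    """Map each lowercased word in an entry's text to the headwords containing it."""
    index = defaultdict(set)
    for headword, text in entries.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(headword)
    return index


def occurrences(index, *words):
    """Return headwords whose entries contain every one of the given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


# Illustrative entry texts, not taken from any TEXTware dictionary.
entries = {
    "baht": "the basic monetary unit (currency) of Thailand",
    "Bangkok": "the capital of Thailand",
    "yen": "the basic monetary unit (currency) of Japan",
}
index = build_index(entries)
print(occurrences(index, "Thailand", "currency"))  # {'baht'}
```

Neither “Thailand” nor “currency” is a headword here, yet their intersection leads straight to “baht,” which is the behavior the article describes.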

Getting back to your roots
Erlandsen says TEXTware has developed a lemmatization routine for a Danish monolingual dictionary, with Norwegian and Swedish versions soon to follow, which automatically derives the dictionary headword from an inflected form. This feature is very useful for finding cross-references in dictionaries. He adds, however, that it is very difficult to implement in bilingual dictionaries. In “active” bilingual dictionaries, where the source language is presumed to be the user's mother tongue, very little grammatical information is supplied. In “passive” bilingual dictionaries, on the other hand, there may be more data, but not always enough for the lemmatization routine to work. Adding a lemmatizer to either would require varying amounts of supplementary information and extra effort on the part of the lexicographer (not to mention generating the equivalent form in the target language). Monolingual dictionaries, however, offer by definition a full palette of lexical information, which makes it possible to lemmatize entries in software.
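Why the inflectional data in a monolingual dictionary matters can be shown with a small Python sketch. It assumes each entry records its endings in the style of a Danish dictionary entry such as bil -en, -er, -erne, plus any irregular forms written out in full; software can then generate every form and map it back to its headword. This is one assumed approach, not TEXTware's routine, and the entries and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: headword -> inflection info (endings appended to the
# headword, plus irregular forms in full). Illustrative data only.
ENTRIES = {
    "bil": {"endings": ["en", "er", "erne"]},                        # car
    "hus": {"endings": ["et", "e", "ene"]},                          # house
    "mand": {"endings": ["en"], "irregular": ["mænd", "mændene"]},  # man
}


def build_form_table(entries):
    """Generate every listed form of every entry and map it back to its lemma."""
    table = defaultdict(set)
    for lemma, info in entries.items():
        table[lemma].add(lemma)
        for ending in info.get("endings", []):
            table[lemma + ending].add(lemma)
        for form in info.get("irregular", []):
            table[form].add(lemma)
    return table


def lemmatize(table, form):
    """Return the possible headwords for a form (an empty set if unknown)."""
    return table.get(form.lower(), set())


table = build_form_table(ENTRIES)
print(lemmatize(table, "bilerne"))  # {'bil'}
```

Returning a set allows for forms that belong to more than one headword. It also illustrates Erlandsen's point about bilingual dictionaries: an entry that lists no endings leaves the routine nothing to generate from.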
    There are profound changes looming on the horizon, with the distinctions between publishing, broadcasting, and computing becoming less and less watertight. It is not yet clear whether the publishers of the future will be the software companies, or whether today's publishers will be the suppliers of tomorrow's software. How does a small Danish company fit into all of this? “The first order of business has been labelling text-based information. This is the drive behind SGML, and it means taking a content view of data. The next step will be to exploit linguistic knowledge about it. And that's where we come in. I see a place for our company positioned somewhere between the traditional publishers and the computer companies.”

GestorLEX costs DKR 15,000 (c. $2,000).

TEXTware a/s, Raadmandsgade 43, DK-2300 Copenhagen N, Denmark; +45 35 820077, Fax +45 35 820205

COPYRIGHT © 1991 BY LANGUAGE INDUSTRY MONITOR
