Termbase Design and TIF


This article originally appeared in the Mar-Apr 1993 issue of Language Industry Monitor.

When building a terminology database, should you take the relational database route or keep to the familiar structured free-text path? There are convincing arguments for both approaches.

The experts differ on the best strategy for designing termbases. Two models prevail: the relational database and structured free-text. cap debis built keyterm around a relational database engine, partly because of its mainframe and Unix background. Other PC-based packages have followed the structured free-text file format that originated with mtx, the widely used terminology package from LinguaTech. Most notable of these is TIF-compliant MultiTerm, which extended the mtx approach in a number of ways. The design debate is not an academic one; the ease with which users can set up a termbase, enter data, and search for terms significantly affects the usefulness of a termbase. Translators, in general, are not interested in becoming database experts; SQL is not usually a language in their arsenal.
    Traditional relational databases have not always been well suited to handling terminological or lexical data. Requirements such as fixed-length fields and fixed-length keys can frustrate efforts to work with text-oriented data. Some still do not support alternate character sets – devastating for multilingual applications. Newer packages do support variable-length fields and memo fields, but do not necessarily provide the editing and search functions within fields that are needed to handle large amounts of text. On the other hand, relational databases do offer extensive control over data entry, and the use of field types (e.g., dates) makes complex Boolean searches possible.
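
As an illustration of the last point, the following is a minimal sketch, not taken from any package discussed here, of how typed fields support both entry control and Boolean searching. It uses Python's standard sqlite3 module; the table layout, the two-letter language codes, and the sample terms are all assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table whose columns constrain what can be entered."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE term (
            id      INTEGER PRIMARY KEY,
            term    TEXT NOT NULL,
            lang    TEXT NOT NULL CHECK (length(lang) = 2),  -- entry control
            subject TEXT NOT NULL,
            entered TEXT NOT NULL  -- ISO 8601 date, so text order is date order
        )
        """
    )
    rows = [  # hypothetical sample data
        ("termbase", "en", "terminology", "1992-11-03"),
        ("Terminologiedatenbank", "de", "terminology", "1993-01-15"),
        ("memo field", "en", "databases", "1993-02-20"),
        ("champ mémo", "fr", "databases", "1991-06-01"),
    ]
    conn.executemany(
        "INSERT INTO term (term, lang, subject, entered) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def search(conn):
    """Boolean search: (English OR German) AND entered in 1993 or later."""
    cursor = conn.execute(
        "SELECT term FROM term "
        "WHERE (lang = 'en' OR lang = 'de') AND entered >= '1993-01-01' "
        "ORDER BY entered"
    )
    return [row[0] for row in cursor]


conn = build_demo_db()
print(search(conn))  # ['Terminologiedatenbank', 'memo field']
```

An attempt to insert a row with a three-letter language code is rejected by the CHECK constraint, which is the kind of data-entry control the paragraph above refers to.
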
    The structured free-text format pioneered by Alan Melby in the mtx software ten years ago is hierarchical rather than relational in nature. As a result, an mtx-type database is easier for an end user to set up. A well-designed structured free-text file is not flat like that of a simple “card file” program; rather, its fields can have complex internal relationships, thereby making it flexible enough to handle such things as synonyms and alternate entries.
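
To make the idea of a hierarchical record concrete, here is a small sketch of one entry with nested fields. It does not reproduce the mtx file format; the class names, field names, and sample terms are hypothetical and chosen only to show how synonyms can hang off a term inside a single record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    text: str
    synonyms: List[str] = field(default_factory=list)
    note: str = ""


@dataclass
class LanguageSection:
    lang: str
    terms: List[Term] = field(default_factory=list)


@dataclass
class Entry:
    subject: str
    definition: str
    languages: List[LanguageSection] = field(default_factory=list)


def all_forms(entry, lang):
    """Return every term and synonym recorded for one language of an entry."""
    forms = []
    for section in entry.languages:
        if section.lang == lang:
            for term in section.terms:
                forms.append(term.text)
                forms.extend(term.synonyms)
    return forms


# One record holds the whole hierarchy: entry -> language -> term -> synonyms.
entry = Entry(
    subject="terminology",
    definition="A database of terms and related information.",
    languages=[
        LanguageSection(
            "en",
            [Term("termbase", synonyms=["terminology database", "term bank"])],
        ),
        LanguageSection("de", [Term("Terminologiedatenbank")]),
    ],
)
print(all_forms(entry, "en"))
# ['termbase', 'terminology database', 'term bank']
```
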

Gregory Shreve, of Kent State University’s Department of Applied Linguistics, has argued for many years in favor of relational databases for terminology. Says Shreve, “Alan Melby and I have disagreed for a long time about the virtues of relational versus hierarchical databases. Fact is, hierarchical systems are an old software technology. The problem that most people have with relational databases is that they don’t understand the design process for setting up the relational tables, in particular the process of normalization.”
    “Given a good design, a relational system can be extremely flexible and, I believe, will outperform a database model such as that used in the mtx software. A properly set up relational database will have a number of primary relations that are linked by co-relations that consist primarily of keys. These provide extensive and flexible linkage between data items without duplicating any data elements. Further, with a good database design the ability to implement complex searches is enhanced. mtx-type databases are record-oriented; the logical organization within the record is up to the user and very flexible from that viewpoint. That trades off against the ability to manipulate the individual data categories that you gain in a relational system.”
    “A well designed relational system should be able to take the total pool of relations and generate a large number of different virtual or logical records from a common physical data pool. There is, in a sense, no single physical record, but a large, relationally linked data pool navigated in different ways by different users, who each have their own ‘views’ of the data. In my opinion, people who think that relational systems are too inflexible don’t understand their full power because they are still thinking about a single relation as the analogue of a physical record in a hierarchical system.”
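
The design Shreve describes can be sketched roughly as follows. This is one possible reading under stated assumptions, not a schema from keyterm or any other product: two primary relations, a linking relation made up only of keys, and two views that present different logical records over the same physical data. All table, column, and view names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    subject    TEXT NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE term (
    term_id   INTEGER PRIMARY KEY,
    lang      TEXT NOT NULL,
    term_text TEXT NOT NULL
);
-- Linking relation: nothing but keys, so no data element is duplicated.
CREATE TABLE concept_term (
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    term_id    INTEGER NOT NULL REFERENCES term(term_id),
    PRIMARY KEY (concept_id, term_id)
);
-- A translator's view: English/German pairs that share a concept.
CREATE VIEW en_de_pairs AS
SELECT en.term_text AS english, de.term_text AS german
FROM concept_term AS cen
JOIN term AS en ON en.term_id = cen.term_id AND en.lang = 'en'
JOIN concept_term AS cde ON cde.concept_id = cen.concept_id
JOIN term AS de ON de.term_id = cde.term_id AND de.lang = 'de';
-- A terminologist's view: one row per concept with its term count.
CREATE VIEW concept_summary AS
SELECT c.subject, c.definition, COUNT(ct.term_id) AS term_count
FROM concept AS c
LEFT JOIN concept_term AS ct ON ct.concept_id = c.concept_id
GROUP BY c.concept_id;
"""


def build_termbase():
    """Create the schema in memory and load one hypothetical concept."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO concept VALUES "
        "(1, 'databases', 'A field that holds free text of arbitrary length.')"
    )
    conn.executemany(
        "INSERT INTO term VALUES (?, ?, ?)",
        [(1, "en", "memo field"), (2, "de", "Memofeld"), (3, "fr", "champ mémo")],
    )
    conn.executemany(
        "INSERT INTO concept_term VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)]
    )
    return conn


conn = build_termbase()
print(conn.execute("SELECT * FROM en_de_pairs").fetchall())
# [('memo field', 'Memofeld')]
print(conn.execute("SELECT subject, term_count FROM concept_summary").fetchall())
# [('databases', 3)]
```

Neither view stores anything; each is a different path through the same three tables, which is the sense in which there is "no single physical record".
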

Alan Melby believes that the ideal implementation of a termbase lies somewhere between traditional databases with fixed-length fields at one end of the spectrum and completely unstructured text at the other. It consists of records of structured text with automatic validation and multiple indexed fields. He feels that hierarchical records are more intuitive than groups of relational tables and correspond more closely to the natural form of terminological and lexical data.
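
A rough sketch of that middle ground, records of structured text that are validated automatically and indexed on several fields, might look like this. The "field: value" line layout and the field names are assumptions made for the example; they are not the mtx or MultiTerm formats.

```python
from collections import defaultdict

# Hypothetical field inventory for the example.
ALLOWED_FIELDS = {"en", "de", "subject", "definition", "note"}
REQUIRED_FIELDS = {"en", "subject"}
INDEXED_FIELDS = ("en", "de", "subject")


def parse_record(text):
    """Parse 'field: value' lines into a dict mapping field -> list of values."""
    record = defaultdict(list)
    for line in text.strip().splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line {line!r}")
        record[name.strip()].append(value.strip())
    return dict(record)


def validate(record):
    """Return a list of problems; an empty list means the record is valid."""
    problems = [f"unknown field: {name}" for name in record
                if name not in ALLOWED_FIELDS]
    problems += [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS)
                 if name not in record]
    return problems


def build_index(records):
    """Index every value of every indexed field by (field, lowercased value)."""
    index = defaultdict(set)
    for number, record in enumerate(records):
        for name in INDEXED_FIELDS:
            for value in record.get(name, []):
                index[(name, value.lower())].add(number)
    return index


good = parse_record("""
en: termbase
en: term bank
de: Terminologiedatenbank
subject: terminology
""")
bad = parse_record("de: Memofeld\ncolour: blue")

print(validate(good))  # []
print(validate(bad))
# ['unknown field: colour', 'missing field: en', 'missing field: subject']

index = build_index([good])
# A Boolean AND across two indexed fields is a set intersection.
print(index[("en", "term bank")] & index[("subject", "terminology")])  # {0}
```
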
    Melby is not opposed to relational databases. Rather, he has had to be pragmatic. “Ten years ago, we were writing software for PC XTs and we couldn’t fit a relational database into a 100 KB TSR, so we adopted the structured free-text format. We have since extended this approach with data validation, automatic indexing, and Boolean search facilities, things which are normally only available with relational databases. The time may come, however, when relational databases are easier to set up and use.” A package like keyterm seems to be pointing in that direction. In the spirit of friendly competition, Melby would like to see both approaches pushed to their maximum capabilities and have the market decide.
    Whatever your philosophy, if your terminological data is TIF-conformant, you can have it both ways: you can convert your terminology to either structured text records or complex relational database tables. With TIF, your data is not imprisoned within a given system. Because it is a comprehensive – and extendable – common format, TIF frees software engineers to implement the approach that they feel is best.
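
The "both ways" conversion can be sketched as follows. The example deliberately does not reproduce TIF syntax; a plain Python dictionary stands in for an entry held in a neutral interchange form, and the output layouts and function names are hypothetical.

```python
def to_text_record(entry):
    """Render an entry as one hierarchical text record (hypothetical layout)."""
    lines = [f"subject: {entry['subject']}"]
    for lang, terms in entry["terms"].items():
        for term in terms:
            lines.append(f"  {lang}: {term}")
    return "\n".join(lines)


def to_rows(entry_id, entry):
    """Flatten the same entry into rows for a concept table and a term table."""
    concept_row = (entry_id, entry["subject"])
    term_rows = [
        (entry_id, lang, term)
        for lang, terms in entry["terms"].items()
        for term in terms
    ]
    return concept_row, term_rows


# Stand-in for an entry read from a neutral interchange file.
entry = {
    "subject": "terminology",
    "terms": {"en": ["termbase", "term bank"], "de": ["Terminologiedatenbank"]},
}

print(to_text_record(entry))
concept_row, term_rows = to_rows(1, entry)
print(concept_row)  # (1, 'terminology')
print(term_rows)
# [(1, 'en', 'termbase'), (1, 'en', 'term bank'), (1, 'de', 'Terminologiedatenbank')]
```

Both outputs are derived from the same source entry, so neither target system holds information the other cannot be given.
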
