Of Synonyms and Fuzzy Searches


This article originally appeared in the Nov-Dec 1993 issue of Language Industry Monitor

New thesaurus-building and full-text search tools are the latest offerings from IBM.

More of IBM’s vast linguistic programming resources are seeing the light of day. The latest offering is the first of its kind: a thesaurus development and maintenance package for OS/2 called IBM Thesaurus Management System. With a graphical programming toolkit, called Thesaurus Administrator/2, you can build a custom-made thesaurus for your organization, and one with more than just synonyms. The Administrator supports other kinds of relations, such as hierarchical or associative relations. You can even define relations of particular relevance to your organization, such as made-of. For using thesauri built with the Administrator, IBM will also be supplying a companion package, the Thesaurus End User System Toolkit/2. It retrieves information from the indexes and provides thesaurus data to applications via a fully documented API. IBM sees its Thesaurus Management System as a potentially valuable addition to document management, information retrieval, and translation support systems.
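To make the idea of a thesaurus with more than synonyms concrete, here is a minimal sketch in Python. It is not IBM's API or data model, which the article does not describe; the class `Thesaurus` and its methods `define_relation`, `relate`, and `related` are hypothetical names chosen for illustration, as are the sample terms.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding typed relations between terms (illustrative only)."""

    def __init__(self):
        self._known = set()             # names of all defined relations
        self._symmetric = set()         # relations that hold in both directions
        self._links = defaultdict(set)  # (relation, term) -> related terms

    def define_relation(self, name, symmetric=False):
        """Register a relation type, built-in or organization-specific."""
        self._known.add(name)
        if symmetric:
            self._symmetric.add(name)

    def relate(self, relation, source, target):
        """Record that `source` stands in `relation` to `target`."""
        if relation not in self._known:
            raise KeyError(f"undefined relation: {relation}")
        self._links[(relation, source)].add(target)
        if relation in self._symmetric:
            self._links[(relation, target)].add(source)

    def related(self, relation, term):
        """Return the terms linked to `term` by `relation`, sorted."""
        return sorted(self._links.get((relation, term), ()))


thesaurus = Thesaurus()
thesaurus.define_relation("synonym", symmetric=True)     # classic thesaurus link
thesaurus.define_relation("broader")                     # hierarchical
thesaurus.define_relation("associated", symmetric=True)  # associative
thesaurus.define_relation("made-of")                     # custom, as in the article

thesaurus.relate("synonym", "car", "automobile")
thesaurus.relate("broader", "car", "vehicle")
thesaurus.relate("associated", "car", "garage")
thesaurus.relate("made-of", "car", "steel")

print(thesaurus.related("synonym", "automobile"))
print(thesaurus.related("made-of", "car"))
```

Marking a relation as symmetric means a synonym link entered once can be followed from either term, while a directed relation such as made-of is stored one way only.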

The first program written to use Thesaurus data is, not surprisingly, also from IBM. It is SearchMaster/2, a free-text search and retrieval package designed for standalone workstations and LANs. SearchMaster permits precise or fuzzy searches, with Boolean and proximity searches of words, phrases, and compound words. Its wildcard functions allow for character masking, front, middle, and end-masking, and word-masking. The linguistic functions of SearchMaster are based on dictionaries for thirteen languages (the base product is supplied with two dictionaries, English and your choice of a second). A unique feature is that SearchMaster can perform multilingual searches if the appropriate dictionaries are installed and the texts are stored in one of the following file formats, which explicitly identify national languages: WordPerfect, Microsoft RTF, and SearchMaster DSFT. According to project manager Sebastian Goeser, efforts are underway to render other packages, both from IBM and from third parties, Thesaurus Administrator/2-compatible. Recognizing that it is a heterogeneous world out there, IBM wisely supplies client versions of the SearchMaster software for Windows and DOS as well as OS/2. The package has been localized for the following languages: English, French, German, Italian, and Spanish.
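The masking, proximity, and fuzzy searches mentioned above can be illustrated with a short Python sketch. It does not reproduce SearchMaster's query syntax or its dictionary-based linguistics, neither of which is documented here; the mask characters `?` and `*`, the function names, and the sample sentence are all assumptions, and the fuzzy match relies on plain string similarity from the standard library.

```python
import re
from difflib import get_close_matches
from fnmatch import fnmatchcase


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mask_matches(tokens, mask):
    """Words matching a mask: '?' hides one character, '*' any run of them."""
    return [t for t in tokens if fnmatchcase(t, mask.lower())]


def within_proximity(tokens, first, second, max_distance):
    """True if words matching the two masks occur within max_distance words."""
    first_pos = [i for i, t in enumerate(tokens) if fnmatchcase(t, first)]
    second_pos = [i for i, t in enumerate(tokens) if fnmatchcase(t, second)]
    return any(abs(i - j) <= max_distance for i in first_pos for j in second_pos)


def fuzzy_matches(tokens, word, cutoff=0.8):
    """Words that resemble `word` closely enough to tolerate a misspelling."""
    return get_close_matches(word.lower(), sorted(set(tokens)), cutoff=cutoff)


text = "IBM supplies a thesaurus toolkit and a text retrieval package"
tokens = tokenize(text)

print(mask_matches(tokens, "*aurus"))    # front-masking
print(mask_matches(tokens, "t*t"))       # middle-masking
print(mask_matches(tokens, "retriev*"))  # end-masking
print(mask_matches(tokens, "te?t"))      # single-character masking
print(within_proximity(tokens, "thesaurus", "retrieval", 5))
print(fuzzy_matches(tokens, "thesarus"))
```

A production package would run such queries against a prebuilt index rather than scanning the tokens each time; the linear scan here only keeps the example short.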
    As always, when IBM launches a linguistically sophisticated package, it supports lots of languages right from the start, a tribute to the concern’s extensive linguistic resources and parallel international development efforts. This is something its competitors can scarcely touch. However, IBM is a late arrival. PC-based full-text retrieval packages have been on the market for years; packages like ZyIndex have enjoyed tremendous success recently as users grapple with ever-expanding hard disks. With OS/2, IBM seems to be making great strides in the operating system venue. But in the application arena, you have to wonder: when will IBM become a leader again, not a follower?

Prices: SearchMaster ±US$300, Thesaurus Manager/2 administrator ±US$3000, toolkit ±US$150. Prices vary by country.

IBM Deutschland, German Software Development Lab, Hanns Klemmstr. 45, D-71003 Böblingen, Germany; Tel +49 7031 16 6567, Fax +49 7031 16 6736
