A Nose for Text


This article originally appeared in the July-Aug 1994 issue of Language Industry Monitor

Will Oracle do for textbases what it did for the relational database?

“We’d like to see ConText become the equivalent of Dolby noise-reduction for the IT industry,” says a confident Brett Newbold of Oracle’s Text Server Division. If it is up to Newbold and colleagues, you won’t be finding ConText built into the operating system of every PC, as the analogy with Dolby would suggest, but rather it will be a standard interface that will come between you and that vast world of largely text-based digital information out there. This past May, Oracle announced the “immediate availability” of Oracle ConText, a “revolutionary” text analysis system running under Unix and Windows. ConText is currently integrated with Oracle Book 2.1, an online document viewing tool, for which it generates “back-of-the-book” indexes as well as summaries and synopses of documents.
    Newbold states that ConText is a fully application-independent linguistic technology whose output can be applied to dozens of applications, such as filtering, routing, indexing, and automatically creating hypertext links. Initially, ConText will not be offered as an end-user package, as befits a development tool, but rather it will be licensed to a small number of Oracle strategic technology partners, such as Dow Jones and University Microfilms, both major sources of text-based data. If ConText were deployed on top of something like DowVision, the online news service that Dow Jones currently offers, we might soon see the rise of personalized news delivery services. Another possibility for Oracle is an Internet-based service. “Oracle,” says Newbold, “is very interested in the Internet these days.”
    ConText will also be at the heart of Oracle’s forthcoming TextServer3, which Newbold describes as “the first commercial information retrieval system which uses full parses for indexing full-text and is designed from the ground up to exploit massively parallel processing.” ConText is based on a linguistic database of English originally conceived by Kelly Wical of Artificial Linguistics, a company whose assets Oracle acquired several years ago. The ConText database contains six hundred thousand entries with up to one thousand attributes apiece, and represents three hundred person-years of labor, making it one of the major linguistic engineering efforts of our time. ConText is not the first commercial product based on this linguistic database; Oracle has also been marketing CoAuthor, a “terminology and style manager” for Interleaf, based on this technology.
    For years, machine translation has been regarded as the Holy Grail of language processing and has correspondingly been the subject of inordinate attention. ConText is just one of the signs that the burgeoning field of information retrieval may in fact have more surprises in store over the next few years.

Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065, USA; Tel: +1 415 506 3188, Fax: +1 415 506 7103

COPYRIGHT © 1994 BY LANGUAGE INDUSTRY MONITOR
