PolyDoc’s Docucentric Universe | This article originally appeared in the Jan/Feb 1995 issue of Language Industry Monitor. Combining workflow technology and sophisticated writing support tools is the specialty of this Maastricht-based company. Few new companies have impressed us as much in recent years as PolyDoc. Less than two years old, the company has developed an appealing, down-to-earth approach based on sound, anything-but-radical linguistic principles and off-the-shelf software packages for tackling some of the obstacles which have prevented more widespread use of language technology within the documentation process. “We see ourselves as a bridge between language technology and business,” says Huub Rutten, co-founder of PolyDoc, based in Maastricht, the Netherlands. In essence, that means combining off-the-shelf lingtech products with document creation and management tools to create custom workflow environments. PolyDoc has developed a methodology for building such systems. Called Integrated Document Production Architecture (IDP/A), it encompasses an extensive linguistic analysis of a customer’s documentation and workflow patterns, the development of customized interfaces, the integration of telecommunications services, and, last but not least, access to linguistic software. As PolyDoc’s other co-founder, Frank van Ruyssevelt, puts it succinctly, “a company with a terminology problem isn’t interested in a terminology database.
They want a solution.” Not surprisingly, the companies turning to PolyDoc in growing numbers are large concerns. Rutten says the criteria for suitable application of the IDP/A methodology within an organization include highly repetitive documentation; fairly high outlays for documentation, i.e., more than NLG two million (US$ 1.5 million) per year; a large number of producers and consumers of text; and situations where style or the control and verification of language are key issues, for instance for marketing or legal reasons. In particular, Rutten sees document verification looming as a huge problem for companies doing business in Europe. “New EU regulations on liability place the burden of proof on you as a company,” explains Rutten. “Errors in documentation can be fatal. In such cases, texts have to be checked line by line.” But above and beyond this particularly difficult problem, Rutten frequently sees documentation within large organizations which is virtually out of control. “Time and time again, I encounter large, well-known companies whose documentation is in shambles. Some will have saved all their texts in word processor formats. Others will have big flat files with labels, or they’ve tried some kind of template system. Still others have tried full-text retrieval techniques or scanning and indexing. Yet they all have trouble retrieving and managing documentation.” PolyDoc, which was founded at the end of 1993 by Rutten and Van Ruyssevelt (both professional linguists), got its start within the framework of a Eureka project (a program of nationally funded projects which involve more than one European country). The project partners were the British Computer Task Group, the Austrian MD, Philips in Maagdenburg, and the Dutch chemical giant DSM, and its purpose was to develop and apply a methodology for designing multilingual documentation systems, which later evolved into IDP/A.
The documentation system which PolyDoc developed for DSM within the Eureka project (see sidebar) served as its launch and has remained, up to now, the largest and most comprehensive project the company has undertaken. As Van Ruyssevelt says, “it incorporates just about everything we do.” With the DSM system under its belt, PolyDoc has gone on to attract other customers, including the Dutch pension fund Algemeen Burgerlijk Pensioenfonds (ABP) and, more recently, British broadcaster Carleton. At the basis of PolyDoc’s approach to building documentation systems is a careful study of the production process. “That means going back to basics,” says Rutten. “You have to look at what people do, and how they work.” Rutten dismisses the conventional notion of a “user” as less than useful. “What is a user?” he asks rhetorically. “It is not a single fixed entity. Users are many things.” Rutten prefers to distill this mythical user down into tasks and roles which are supported by various tools on a case-by-case basis. Coupled with this analysis of the workflow is a rigorous study of the documentation that is produced. For Rutten, this text-based output is the heart and soul of a company’s knowledge base. “Too often this knowledge base is literally a ‘soup,’” says Rutten. “There is no logical structure to it.” This calls for a rigorous semantic analysis of such texts, based on sound linguistic principles. “A concept for us is a text corpus embodying an entity. Once we have defined a concept by virtue of its use, we can then label it by means of a meta-language.” “For example, take a passage in a technical specification which discusses the viscosity of a material. You might have a paragraph discussing this property in which the actual term viscosity doesn’t occur. You’re lost unless this passage is so labeled.
Having a list of relevant terminology just isn’t enough.” PolyDoc’s secret for developing systems in which a meta-language can be usefully employed is creating end-user tools which solicit the needed information as the text is created, and verify it. An important part of the IDP/A analysis process is understanding the flow of documents through an organization, the “workflow” so frequently mentioned nowadays. For PolyDoc, this encompasses the production process and lifecycle of a document, including a detailed profile of the various users and what their specific tasks are. The result is a table indicating the cost and time required for each step. The company then creates a front-end to help writers and other users create documents based on a clear understanding of the latter’s internal structure. PolyDoc builds its writers’ workbenches around off-the-shelf commercial software, glued together and augmented, where needed, by custom programming by the company’s software engineering team in Hoofddorp. Rutten characterizes the PolyDoc workbenches as “a mix of editing functions and data logistics.” For the ABP, PolyDoc recently completed a workbench for preparing special claims for disabilities related to World War II, many of which are only surfacing now in people’s later years. Such claims comprise thick dossiers with reports from doctors, therapists, and social workers, each of whom needs to evaluate the claim. As it did for the DSM application, PolyDoc broke down the ABP document (in this case, the claim) into discrete semantic units and built a grid which the various evaluators fill in to write the report. The program knows which units are mandatory and which are optional, and which are relevant to an individual, given their role in the process. It also prompts the user to enter keywords for the various semantic units where appropriate. To complete the offerings, PolyDoc incorporates a variety of writing aids: spelling and grammar checkers and dictionaries.
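The grid of semantic units described above can be pictured as a small data structure. What follows is a minimal Python sketch under stated assumptions, not PolyDoc’s implementation, which the article does not describe: the unit labels, role names, and function names are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticUnit:
    """One slot in the grid (all labels below are hypothetical)."""
    label: str                    # meta-language label for the unit
    roles: frozenset              # roles for which this unit is relevant
    mandatory: bool = True        # must be filled in before the report is complete
    wants_keywords: bool = False  # prompt the writer for keywords


# Hypothetical grid for a claim dossier; the real units are not given in the article.
ALL_ROLES = frozenset({"doctor", "therapist", "social_worker"})
CLAIM_GRID = [
    SemanticUnit("medical_history", frozenset({"doctor"}), True, True),
    SemanticUnit("therapy_progress", frozenset({"therapist"}), True),
    SemanticUnit("home_situation", frozenset({"social_worker"}), False, True),
    SemanticUnit("overall_assessment", ALL_ROLES, True),
]


def units_for(role, grid=CLAIM_GRID):
    """Units that are relevant to a person acting in the given role."""
    return [unit for unit in grid if role in unit.roles]


def missing_mandatory(role, entries, grid=CLAIM_GRID):
    """Labels of mandatory units that this role has left empty."""
    return [
        unit.label
        for unit in units_for(role, grid)
        if unit.mandatory and not entries.get(unit.label, "").strip()
    ]


def keyword_prompts(role, entries, grid=CLAIM_GRID):
    """Labels of filled-in units for which keywords should be requested."""
    return [
        unit.label
        for unit in units_for(role, grid)
        if unit.wants_keywords and entries.get(unit.label, "").strip()
    ]


if __name__ == "__main__":
    draft = {"medical_history": "Patient reports chronic pain since 1945."}
    print([unit.label for unit in units_for("doctor")])
    print(missing_mandatory("doctor", draft))
    print(keyword_prompts("doctor", draft))
```

Keeping the role list on each unit, rather than a separate form per role, mirrors the article’s point that a “user” is better modeled as a set of tasks and roles than as a single fixed entity.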
As the text is created, its terminology is verified and stored paragraph by paragraph, or even sentence by sentence, in a relational database (in this case Microsoft Access), which supports variable-length fields. Along with the text, numerical information or whatever other kind of data is relevant is stored as well. For searching, managing, and producing printouts of documents, PolyDoc uses Folio Views, a full-text search and presentation package. Layout information for documents is stored in Folio Views and the program extracts the needed information from the relational database to produce reports. Rutten acknowledges that language technology plays just a small part in such an application, but he points out that when you look at the entire process, a problem like translation may be a relatively minor item in terms of time and money. In other words, bigger gains can be found by improving efficiency elsewhere in the documentation process. That being said, having the document workflow under control places a company in an excellent position to exploit more ambitious techniques, like high-end MT. “You can only exploit language technology if you understand the production process, the workflow,” says Rutten. Moreover, after an initial adjustment, users like the workbenches, which give them a lot more control over their work. Because of this, Rutten likens a workbench to a “cockpit.” “WordPerfect is not a cockpit,” he adds. While Rutten has focused on the development of IDP/A, Van Ruyssevelt has been building up PolyDoc’s “LangTech Warehouse” by licensing a wide variety of commercial linguistic software packages. At the moment, PolyDoc has licensed more than seventy packages, including proofing tools, multilingual dictionaries, and translation aids. The main selection criterion, according to Van Ruyssevelt, has been ease of implementation, and that means primarily PC-based products. Rather than buy packages outright, PolyDoc negotiates licenses on a royalty basis.
This works out well for both parties: PolyDoc can thereby limit its upfront investment while the software developers can look forward to ongoing revenue. PolyDoc installs simpler writing tools, like dictionaries and spellcheckers, on each machine locally. But for more ambitious programs, like Globalink’s PowerTranslator line, the company has set up a dial-up server and has built telecommunication links into the workbenches. Either way, users pay on the basis of use; if programs are installed locally, usage statistics are uploaded on a daily basis, and PolyDoc, in turn, pays royalties to the programs’ developers. Van Ruyssevelt cites Inso (formerly InfoSoft) as one of PolyDoc’s most important suppliers; PolyDoc has licensed many of Inso’s writing tools. For its part, Inso, always on the lookout for linguistic resources, has shown interest in the data that PolyDoc is acquiring in the course of projects. PolyDoc has plans to open its LangTech Warehouse to outside subscribers and is currently developing a telecommunications interface for this service. Surveying the language technology landscape, Rutten acknowledges the unfulfilled promise, yet is deeply sceptical of what he calls the “traditional reaction” to this, namely: “we just need more technology.” “With a technology push, we just won’t make it,” he says. Rutten is correspondingly doubtful of the efforts of funding agencies to stimulate advancements in this field simply by pouring more money into R&D. “If we wait for the language technology folks to deliver the goods, we still won’t be anywhere in twenty years.” “We’ve been asking ourselves for a long time why people haven’t been buying language technology. I think we’ve discovered the answer: if you haven’t solved the document process problem, you can’t exploit language technology.
You have to have the architecture for it.” Perhaps most satisfying for Rutten the linguist is that IDP/A isn’t “just another theory.” Rutten: “We are now building working systems with it.” In fact, he adds, the actual implementation is irrelevant: “In principle, we should be able to write this stuff in Cobol too.” PolyDoc, Officebuilding ABC, Gaetano Martinolaan 59, NL-6229 GS Maastricht, The Netherlands; Tel: 043 821 574, Fax: 043 617477. COPYRIGHT © 1995 BY LANGUAGE INDUSTRY MONITOR
|