This article originally appeared in the May-June 1994 issue of Language Industry Monitor.

With Nokia Telecommunications’ decision to implement the Kielikone MT system in its document production process, the world has gained another commercial high-end MT system.

Helsinki, Finland: One of Finland’s largest companies recently licensed Kielikone’s Unix-based Finnish-English machine translation system; Kielikone has sold twelve thousand hand-held dictionaries in the past several years; it licenses its Finnish morphology and spellchecker to major OEM suppliers; and its electronic dictionary packages are enjoying growing popularity in Finland these days. With close ties to both the translation industry and the research world, Kielikone has achieved a unique position in this country of five million people. What is the Finnish secret?

After several years of evaluation and co-development, Nokia Telecommunications officially accepted the Kielikone machine translation system last fall and is currently implementing it within its documentation department (see sidebar). For the linguistic engineering community, this is important news, because it is (still) not every day that a new, high-end MT system is brought on-line in an industrial context. In Finland, news of the Kielikone system, which was formally announced in February, made a bit of a splash in the public eye as well: mainstream media coverage included appearances on TV (including one in which the “Key Deed of the Month” award was bestowed upon the company) and a feature article in the science and technology section of a major Finnish daily (written, fortunately, by Kielikone’s managing director, Harri Arnola).

The Kielikone MT system is the result of an unusually close collaboration between Kielikone and Nokia. As Leo Kulikov, managing director of the MT group, explains, Kielikone built the general lexicon while Nokia compiled the domain-specific lexicon (for “information technology”).
Strictly speaking, the Kielikone system is domain-independent and is not “tuned” for the Nokia domain, but the system has been exhaustively tested on Nokia texts. Nokia took responsibility for the dictionary development tools because it had its own extensive termbases on mainframe systems which it wanted to port to the MT system. However, Kulikov adds that Kielikone has its own dictionary interface tool under development. Currently, users can only add nominal lexical entries (no verbs), but later this year Kielikone will start looking for ways of allowing users to teach and tune the system without actually touching the grammar rules.

For development purposes, Kielikone has a carefully compiled five-thousand-sentence bilingual corpus, which it re-examines for changes whenever new grammar rules are added. The group grades the translation of each sentence as good, ugly, or bad. Says Kulikov, “Grammar rules can be dangerous.”

Nokia Telecom is Kielikone’s first genuine user, but the system has also been installed for test purposes at Trantex, a translation company, and Rautaruukki Oy, a steel company. Both companies are in the process of evaluating the possibility of using the system for production, and are partly supporting the development of the system. Additional support for development comes from TEKES, a research agency administered by the Finnish Ministry of Trade and Industry.

At the moment, the MT group is formally a part of Kielikone, although it is a separate cost center within the company. However, there are plans to spin the MT group off as a separate company, possibly as early as this summer.

According to Harri Arnola, Kielikone is also investigating the possibility of offering MT services in conjunction with a partner. He wonders whether such a service might not be able to address a “latent demand” for translations: texts that companies might consider translating if it could be done quickly and cheaply.
“A rough but ‘true’ translation of a patent, for example, might be sufficient, since such a translation would be carefully checked by translators and lawyers anyway,” Arnola points out.

While delivering a high-end commercial MT system is an impressive achievement, Kielikone has been surprisingly successful in more modest arenas as well, namely with consumer and OEM products. M.O.T. is Kielikone’s line of bilingual electronic dictionaries, available for the language pairs Finnish-English, Finnish-German, Finnish-Swedish, and Finnish-French. The company supplies “general” as well as “business” and “technical” dictionaries; a Finnish thesaurus is an additional option. M.O.T., which was developed entirely in-house, is available in DOS, Mac, Windows, and Unix versions.

While Kielikone is now clearly a commercial enterprise, it was originally established as a research project in 1982 to study natural language database interfaces for Finnish; it was financed by SITRA, a public venture capital fund. Early on, the project generated a tangible spin-off in the form of a high-quality morphological analyzer for Finnish, which made it possible to spellcheck the richly inflected Finnish language for the first time.

Since its inception as a research project more than ten years ago, Kielikone has been guided by Harri Arnola (formerly Jäppinen), currently Kielikone’s managing director. While Arnola appears to have quite successfully piloted the group technologically and organizationally into the commercial sector, his primary passion remains the development of the Kielikone parser. He currently divides his time between managerial activities at the Kielikone offices and further refinement of the parser at home, safely ensconced away from ringing telephones.
“I must admit,” says Arnola, “I’m very proud of the parser,” which he describes as a “deterministic dependency parser.” While parsers for English and other Western European languages tend to be based on constituent models (i.e., noun phrase, verb phrase, etc.), a dependency model uses the verb as a point of departure and builds up a structure around it (i.e., subject, object, indirect object, etc.). Arnola says it is well suited for inflectional languages like Finnish (and, incidentally, Japanese). He adds that once you have Finnish morphology licked, parsing the language comes easier, since so much grammatical information is encoded in the inflections.

While many of his colleagues will argue that semantic information is required to improve the accuracy of parsers, in particular to resolve ambiguity, Arnola remains highly sceptical of that approach. “Semantics is a swamp,” says Arnola. “It brings with it a raft of difficulties: data representation problems and great complexity problems.” The Kielikone system does not attempt any semantic processing, although Arnola points out that there is a “flavor” of semantics in the dependency model. For MT, Arnola feels pure syntactic processing can produce translations of sufficient quality.

Owing to the idiosyncrasies of the Finnish language, the small market, and probably a few other factors, Kielikone has achieved a position within Finland that is hard to parallel in any other country. Does Arnola have any insights to explain the company’s success? “You need a well thought-out theoretical basis for applied computational linguistics, such as spellchecking, morphology, or MT,” he says. “But you also have to keep implementation in mind early on. Certain theoretical decisions can have dire consequences for efficiency.” Because he was originally trained as an engineer, Arnola says he has an ingrained awareness of limited resources.
“An engineer only has a limited amount of concrete with which to build a bridge,” he says. The physical world has natural constraints; in the digital world such constraints are less visible, but they nonetheless remain. Previous MT projects, he points out, started out with strong theoretical foundations, but they did not keep efficiency in mind, and hence this became a major obstacle to building a working system.

Finland is set to join the European Union in January 1995. Does Arnola foresee any useful opportunities in terms of participating in EU-funded research programs? Arnola waxes a bit hesitant. “I don’t know ... all that paperwork,” he sighs. But surely there is paperwork attached to public funding in Finland? “Yes, but it is nowhere near as bad.”

A final thought: maybe Kielikone’s secret weapon is that, not having had to rely on Euro-funding, its researchers did not have to spend months writing proposals, they did not have to travel to meetings in various parts of Europe, they did not have to show their faces in Luxembourg once in a while, and they were not required to work with (geo-)politically correct partners. Instead, they could get some work done.

Prices: M.O.T., FIM1700 (±US$280) for a single language pair; additional two-way lexicons are FIM850 (±US$140); technical and business lexicons, FIM1450 (±US$240).

Kielikone Oy, Vattuniemenkuja 4, PL 126, Helsinki, SF-00211, Finland; Tel: +358 0 6820 211, Fax: +358 0 6820 167, Email: kkoy@kielikone.fi