This article originally appeared in the Sep/Oct 1993 issue of Language Industry Monitor

Why there might be a Japanese MT system in your future.

If there were proportionally the same level of MT activity in America as there is in Japan, a country of 130 million people, every major hardware manufacturer and a handful of adventurous system houses would have an in-house MT development project. There would be at least twenty commercial workstation-based systems on the market, with another dozen or so lurking in the wings. The major players would be co-funding the development of an ambitious lexical resource for an important language pair. And a variety of organizations, such as the Library of Congress and one or two major broadcasting networks, would have developed their own in-house systems. While the many obvious differences between these two countries preclude pursuing this admittedly facile comparison any further, this exercise should nonetheless provide a crude indication of the level of interest in MT in Japan.

There is probably no single country in the world with the need for translation (and, by association, machine translation) that Japan has, except perhaps Canada. It comes as no surprise, therefore, to discover that half of the world’s MT research is found on that densely populated archipelago. This country’s insatiable appetite for information, the commercial imperatives of its export activities, and its population’s comparative lack of foreign language skills drive the Japanese to build and experiment with MT systems relentlessly.

The Japanese MT experience didn’t materialize out of thin air, of course. Realizing that no one else was going to do it for them, the Japanese have been grappling with the sticky matter of mating their exceedingly complex language with computers for decades.
The introduction of the first wapuro (word processor) by Toshiba in 1978, with its sophisticated kana-kanji conversion routines, has had a profound effect on Japanese culture, giving kanji a new lease on life and producing a generation of wapurobaka (literally “wordprocessor idiots”), young office workers who are suspected of barely being able to write Japanese by hand. NEC, Fujitsu, and others quickly followed Toshiba into the wapuro arena. It is now a ¥300 billion-a-year business; the market leader is currently Sharp, with a market share of twenty-five percent. With the wapuro, Japanese writing accoutrements vaulted in one leap from the pre-industrial age to the digital era. Few Japanese ever mastered the Japanese typewriter, an ungainly behemoth with two thousand keys and seven shift states.

As if that weren’t enough, the Japanese have also been singularly successful in exploiting beyond their borders the computers and peripherals they designed to handle their language. Needing bit-mapped, high-resolution output technology, Japan unleashed twenty-four-pin dot matrix printers on an unsuspecting world. In its quest to build a smaller wapuro, Toshiba more or less invented the laptop computer. And of course the fax was also a response to the language processing challenge. It likewise had a revolutionary impact on Japanese business (the Japanese telex system never caught on) while spreading like wildfire throughout the rest of the world.

Kana-kanji conversion is a non-trivial task (current systems get it right about ninety percent of the time) and calls for morphological, syntactic, and even semantic analysis to ascertain the correct kanji character represented by a given string of kana characters. Japanese companies therefore acquired experience in this new field simply out of necessity.
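To make the difficulty concrete, here is a minimal sketch of one small part of the problem: choosing among homophonous kanji by looking at neighboring words. It is not how any commercial wapuro works. The candidate table, the co-occurrence counts, and the function name `convert` are all invented for illustration, and a real converter would also need segmentation and morphological and syntactic analysis.

```python
# Hypothetical homophone table: one kana string, several possible kanji.
CANDIDATES = {
    "はし": ["橋", "箸", "端"],   # bridge, chopsticks, edge
    "かみ": ["紙", "髪", "神"],   # paper, hair, god
}

# Invented co-occurrence counts between a kanji candidate and a context word.
COOCCURRENCE = {
    ("橋", "渡る"): 5,     # bridge + to cross
    ("箸", "食べる"): 5,   # chopsticks + to eat
    ("紙", "書く"): 5,     # paper + to write
    ("髪", "切る"): 5,     # hair + to cut
}


def convert(kana, context_words):
    """Pick the kanji candidate that co-occurs most with the context words.

    Returns the kana unchanged when no candidates are known. Ties fall back
    to the first candidate listed, standing in for a frequency-based default.
    """
    candidates = CANDIDATES.get(kana)
    if not candidates:
        return kana

    def score(kanji):
        return sum(COOCCURRENCE.get((kanji, word), 0) for word in context_words)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(convert("はし", ["渡る"]))    # context suggests "bridge"
    print(convert("はし", ["食べる"]))  # context suggests "chopsticks"
```

Even this toy shows why the task is hard: the same kana string maps to several characters, and only the surrounding words settle which one is meant.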
But this necessity has served them well; these pragmatic origins have proven fertile ground for subsequent experimentation with more advanced language processing technologies like MT. This pragmatism may also go some way toward explaining a more utilitarian, even opportunistic approach to MT than is found in the West. No doubt more than a few companies cherish the desire to replicate the smashing success of the wapuro.

Unfortunately, many Western researchers disdain Japanese MT because of this utilitarian approach. For them, the only “good” MT research being done in the world is in the US, where it is protected and sanctified by the guardians of theoretical purity. The notion that Japanese MT by definition lacks firm theoretical foundations is nonsense. Rather, Japanese MT developers are not dogmatic, something that cannot always be said of their American counterparts. As Kyoto University’s Makoto Nagao pointed out at the previous Summit in Washington, “[Japanese] manufacturers know well that a single linguistic theory cannot lead to a good MT system. They realize that a huge amount of language phenomena must be processed in an ad hoc manner.” CMU’s Masaru Tomita once put it another way: “it is fun to design an MT system, but very hard to develop it into a fully operational environment.”

Today, a system like ASTRANSAC proves that it isn’t where you start that matters but how far you get. ASTRANSAC is based on the ATN formalism, something out of vogue in academic circles, but after years of tinkering and improvising, and a hundred thousand lexical rules later, it is a working, commercial system. Likewise, the strength of the mainframe JICST system is probably the 500,000 technical terms in the system’s lexicon.

Four or five years ago, you might have looked around Japan, seen the systems being offered, acknowledged their existence, but wondered: where, though, are the users? There was indeed a time when Japanese MT systems were “bundled with the mainframe,” acquired for reasons of corporate prestige, or given away as presents, like baskets of ¥10,000 melons. Meeting a genuine Japanese MT user was once as rare as encountering a redhead on the Tokyo metro.

Have times changed? Consider these examples: Nikkei Printing uses NEC’s PIVOT and Sharp’s DUET to translate 18,000 pages of computer manuals into Japanese per year. The Japan Information Center of Science and Technology (JICST) translates 15,000 abstracts and 70,000 citations per year with the Japanese-English system it developed in-house. And Mazda churns out 1,200 pages of English automotive service manuals a month using ATLAS. At the other end of the spectrum, four hundred users each paid ¥50,000 within the past year to upgrade to version five of Bravice J/E, a PC-based MT system.

The bottom line is that Japan is quietly acquiring formidable experience using MT. That boils down to exploring the constraints and compromises required to exploit this generation of MT systems in areas where human translators are currently deployed, and using creative thinking to discover new domains where materials are not being translated at the moment.

But don’t be mistaken. Japanese MT developers are not getting rich; MT is still a long way from the mainstream of computing in Japan and in that regard does not differ from the situation in North America or Europe. Some suppliers are at the point where they are funding their ongoing development work from revenues generated, but they are all a long way from recouping their huge initial investment. “Are you kidding?” exclaimed Fujitsu’s Michael Beirne when asked whether his company had recovered its R&D investment in MT with ATLAS. “No way. But our company is in the telecoms business. We know that if we don’t stay with MT inevitably somebody else will do it.” As Nagao pointed out at the last Summit in Washington, “manufacturers have already invested huge amounts of money and manpower and therefore cannot withdraw from MT easily. A manufacturer cannot stop their R&D efforts unless other competing companies do so at the same time. Dropping out of this competition spells defeat in the future big markets of the information society.”

Unless you suspect some MITI-induced conspiracy or fit of collective madness, the people in Japanese companies who are paid to think ahead do appear to be thinking ahead. They must see the logical outcome of Japan’s economic evolution, as this manufacturing giant moves from the industrial into the information age in which communications play a vital role.

Ironically, as meticulous as the Japanese are in so many aspects of life, they appear remarkably tolerant of bad translations. Commented one Western observer, “in a country where very, very few people have a command of English, Japanese companies are desperate when it comes to export documentation and will put up with anything, as long as it looks like English.”

But the problem runs deeper. There is no technical writing tradition in Japan. Technical writing is not taught in schools or universities, possibly because students at that level are still mastering the complexities of the general language. It is not uncommon for Japanese companies to teach young recruits how to write. Moreover, the Japanese use ellipsis frequently, resulting in sentences without subjects or verbs. The implications for MT are obvious: lots of pre-editing.

Nearly all MT suppliers try to impress upon their customers the sizable return they get by thinking in terms of MT-friendly texts. No deep secrets here. “Writing shorter and clearer sentences,” says Fujitsu’s Kenji Sugiyama, “and ensuring that sentences have both a subject and main verb improve the quality of the MT output.” He adds that writing with MT in mind has had some welcome side effects: “the Japanese manuals have gotten better too,” something that must have Japanese language purists wringing their hands in despair. Relates Toshiba’s Amano, “We used to say, ‘Japanese hardware is great, the software is good, but the documentation is bad.’” But this is changing, he hastens to add.

Not all Japanese MT systems have been created equal. While many of the systems have been developed by protégés of Nagao, the doyen of Japanese MT, the systems in practice do vary, from the nearly direct Pensée system of Oki to sophisticated, semantically rich systems like Toshiba’s ASTRANSAC and Fujitsu’s ATLAS. Other systems boast a wholly different lineage, notably newcomer LogoVista. Based on the linguistic theories of Harvard University professor Susumu Kuno, LogoVista is being developed by Language Engineering Corp. of Belmont, Mass. (USA), with the support of a Japanese consortium which includes Catena-resource, the developer of STAR.

Today, there may not be one Japanese MT system that stands head and shoulders above the rest, but many of them do have noteworthy features and characteristics. While some originated on the mainframe, nearly all now run on workstations, albeit in some cases proprietary systems. Building (or downscaling) MT systems for workstations has brought with it some important gains; it also makes certain demands. The gains include better integration with existing document production software and, correspondingly, with the entire document production environment. Packages like ASTRANSAC and Argo are now designed to work with industry-standard publishing packages such as Interleaf and FrameMaker, important issues which can even outweigh linguistic matters. However, this greater accessibility makes correspondingly larger demands on the developer in terms of the system’s front end, the user interface.
Japanese MT developers realize this and have directed considerable effort in this direction in recent years. The Matsushita system, for example, which has not been commercially released, is a machine-aided translation system proper, whose strong point is its manual translating facilities and sophisticated online bilingual dictionaries. Most of the other Japanese MT systems, including Hitachi’s HICATS, NEC’s PIVOT, and Catena’s STAR, offer at the very least rudimentary bilingual editing facilities.

While initial prognostications (and over-eager sales pitches) may have led potential users to hope otherwise, the MT systems currently available in Japan are primarily (but not exclusively) suited to technical documentation; in that respect, the situation is again no different here than in North America or Europe. The suppliers are now quite candid about this, making explicit the domains for which their systems are suited. There are, after all, a lot of instruction and service manuals in the world. To help new users get up to speed, almost all of the suppliers offer supplementary lexicons for their systems, containing terminology for various technical domains.

However, whether MT will ever become more than an extension of the manufacturing process is not at all certain. A frequent complaint about MT systems in general is that the output is grammatical and correct but stated in a way that no human translator would express it. A very promising new technique which is receiving a lot of interest in Japan is Example-Based Machine Translation (EBMT). Very simply, it is based on the notion of using a corpus of bilingual phrases and sentences, together with a thesaurus system for substituting words, to generate translations. While it is unlikely that a system could be built solely using EBMT techniques, EBMT could be employed both for extending the coverage of MT systems into new domains and for producing more natural-sounding output.
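The EBMT idea can be sketched in a few lines. What follows is a toy illustration under stated assumptions, not a description of any system mentioned here: the example pairs, the romanized Japanese, the thesaurus classes, the word lexicon, and all function names are hypothetical. It finds the stored example closest to the input, scoring thesaurus neighbors as partial matches, and then substitutes the words that differ.

```python
# Hypothetical bilingual example base (English, romanized Japanese).
EXAMPLES = [
    ("replace the filter", "firuta o kokan suru"),
    ("check the oil level", "oiru reberu o tenken suru"),
]

# Hypothetical thesaurus: word -> semantic class.
THESAURUS = {
    "filter": "part", "gasket": "part", "belt": "part",
    "oil": "fluid", "coolant": "fluid",
}

# Hypothetical word-level bilingual lexicon used for substitution.
LEXICON = {
    "filter": "firuta", "gasket": "gasuketto", "belt": "beruto",
    "oil": "oiru", "coolant": "kuuranto",
}


def word_similarity(a, b):
    """1.0 for identical words, 0.5 for thesaurus classmates, else 0.0."""
    if a == b:
        return 1.0
    class_a, class_b = THESAURUS.get(a), THESAURUS.get(b)
    if class_a is not None and class_a == class_b:
        return 0.5
    return 0.0


def sentence_similarity(words, example_words):
    """Average word similarity; sentences of different length score 0."""
    if len(words) != len(example_words):
        return 0.0
    total = sum(word_similarity(a, b) for a, b in zip(words, example_words))
    return total / len(words)


def translate(sentence):
    """Translate by adapting the closest example, or return None."""
    words = sentence.lower().split()
    best_src, best_tgt = max(
        EXAMPLES, key=lambda ex: sentence_similarity(words, ex[0].split())
    )
    example_words = best_src.split()
    if sentence_similarity(words, example_words) == 0.0:
        return None
    target = best_tgt.split()
    for new_word, old_word in zip(words, example_words):
        if new_word == old_word:
            continue
        # Only substitute thesaurus neighbors that the lexicon covers.
        if word_similarity(new_word, old_word) == 0.0:
            return None
        if new_word not in LEXICON or old_word not in LEXICON:
            return None
        target = [LEXICON[new_word] if t == LEXICON[old_word] else t
                  for t in target]
    return " ".join(target)


if __name__ == "__main__":
    print(translate("replace the gasket"))
    print(translate("check the coolant level"))
    print(translate("open the door"))
```

The sketch gives up whenever no stored example is close enough, which illustrates the coverage limit noted above: an example base alone handles only inputs that resemble what it has already seen.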
In the meantime, maybe we should simply get used to Japanese-style English: no articles, few determiners, and everything in the singular. Says Toshiba’s Shin-ya Amano, “since the Meiji Restoration, Japanese people have grown accustomed to English-sounding Japanese. As a result, we have a higher tolerance for less than perfect MT output.” Is it now our turn?

MT Summit IV

The makers and shakers of the machine translation world converged this past July in Kobe, Japan’s answer to Seattle (or was that Rotterdam?). The tradition of a few short years dictates that the site of the biennial Machine Translation Summit rotate among Japan, Europe, and North America. This year, the circle was closed; the Summit returned to Japan, the site of the first Summit in 1987.

Unlike previous Summits, notable for their diversity, this year’s affair was largely a showcase for Far Eastern systems. If you weren’t interested in Japanese MT or don’t believe in committee-run projects, you might have been disappointed, because there wasn’t much else. Many people stayed home; only sixty of the three hundred attendees came from outside of Japan.

The Japanese Ministry of International Trade and Industry (MITI) picked up three-quarters of the Summit tab and we paid the piper; a significant portion of the program was dedicated to the big prestigious MITI-funded programs: the Advanced Telecommunications Research (ATR) program, the Electronic Dictionary Research (EDR) project, and the Multilingual Machine Translation (MMT) project. Presentations on international joint projects dominated the ceremonies, but we can’t say we weren’t warned: the theme of this year’s summit was “International Cooperation for Global Communication.”

ATR

The ATR program was launched in 1987 and, having recently moved into brand-new quarters in the Kyoto area, has commenced a seven-year project in automatic interpreting, or, if you prefer, speech translation of spontaneous utterances. The chosen domain is “international conference registrations.” Fifty international researchers are currently involved in the project, which is funded to the tune of ¥16 billion. ATR is also collaborating with Siemens and Karlsruhe University in Germany and CMU in the US in C-STAR, the Consortium for Speech Translation Advanced Research. Each of the partners is responsible for the speech recognition and synthesis components for their respective languages. Lots of publicity was drummed up for C-STAR in January by a televised demonstration, which was widely broadcast.

EDR

The Japan Electronic Dictionary Research (EDR) project officially winds up its seven-year development trajectory this year. The deliverables include 300,000-word mono- and bilingual lexicons (Japanese and English), a concept ontology (400,000 concepts), a co-occurrence dictionary, and Japanese and English corpora (250,000 sentences each). In a gesture of laudable solidarity, eight of Japan’s major MT players collectively funded thirty percent of the ¥14 billion put into the project, MITI picking up the rest.

However, most of the eight industrial partners have mature MT systems with large lexicons and it is not likely that in the immediate future they will rebuild their systems to take advantage of the EDR. While the designers of new systems may welcome the basic bilingual core lexicon, the usefulness of the extensive and potentially valuable semantic coding to others will largely depend on the choice of semantic items that were used and the consistency with which they were applied. As one potential customer, Koichi Takeda of IBM Japan, judiciously expressed it in a paper presented at the TMI, “the EDR provides lexical coverage for a huge number of words but not enough mapping specifications for phrasal or sentential representations.” While funded by a number of them, the EDR was developed largely independently of its likely users, and hence it remains unclear how useful a resource it will actually be to them.
MMT

The ambitious MMT project is approaching the testing and evaluation stage, according to Susumu Funaki, the general manager of the MT System Lab at the Center of the International Cooperation for Computerization (CICC) in Tokyo, the coordinator of the project. Another MITI-funded endeavor, MMT is an international collaboration between Japan, China, Indonesia, Thailand, and Malaysia to develop an interlingual MT system for the languages of these respective countries.

The project is obviously a direct response to Japan’s position as the economic powerhouse of the region. Funaki points out that there aren’t many Japanese translators, though, who can handle those languages, fewer still with the requisite technical background. At the moment, much translation between these languages is done via English. The project teams have also been developing OCR systems to skirt the input problems endemic to this part of the world. Judging by the size of the dictionaries and rule bases, it is highly unlikely that the project will produce a broad-coverage MT system by the end of 1994, but it is at least a start and there are no alternatives available yet.

Verbmobil

Yet another big international project spotlighted in Kobe was Verbmobil, a long-term, spoken language project similar to ATR that is being coordinated by the German Institute of AI (DFKI). Verbmobil is being funded by the German Ministry for Research and Technology (BMFT) to the amount of DM 60 million for the initial, four-year phase; project leader Wolfgang Wahlster of the DFKI was on hand to detail its aspirations. Wary of past failures, the consortium and its backers have set modest goals and have budgeted ample time (ten years) to accomplish them. The involvement of various industrial and academic research groups as well as the appointment of an illustrious advisory board would appear to serve as additional surety that this won’t go the way of Eurotra.
According to Wahlster, the long-term goal of the project is to build a portable translation device which two speakers of different languages could use to communicate at meetings. English will be used for the system’s internal dialog. The first two-year phase of Verbmobil is to result in a demonstration system with which a German speaker and a Japanese speaker, both with a reasonable passive knowledge of English, can make an appointment. Verbmobil is to track the conversation in English, and where the speakers lapse into their respective mother tongues it will spring to the rescue. There are obvious parallels with the work going on at the ATR and, indeed, the two institutions will be collaborating.

Dialog interpretation is probably the most difficult language processing task of all (it makes text-based MT look easy) and the Verbmobil project has met with some scepticism. It is appealing from a popular point of view (it is easy for layfolk to visualize) but raises expectations that linguists and engineers will have a difficult time satisfying. Stir into this mixture the additional complications of managing an international consortium and you have the potential for a grand disaster. However, from a commercial perspective, Wahlster suggests that industry is keenly interested in such interpersonal communication tools, more so than in such information assistance applications as ARPA’s flight information project ATIS, and there is consequently a strong economic impetus to go in this direction.

Retreating from the future to the present, John Hutchins provided a worthy review of the major trends and directions in MT over the past few years. As the author of a definitive history of early MT and editor of MT News International, John Hutchins is the field’s unofficial chronicler and one of the most well-informed observers on the scene.
He noted the beginning of a new era in MT at the end of the 1980s, with the approaches of the 70s being replaced by the unification and lexicalist formalisms of the 90s.

The presentation of Yorick Wilks (now at Sheffield) followed neatly on that of Hutchins; Wilks took one important research direction, corpus-based statistical MT (in particular the work at IBM’s Yorktown labs), and discussed its lasting implications for MT as a whole. You might not believe in trigrams, warned Wilks, but don’t be dogmatic. Gather your successes where you can find them. As a cautionary tale, he pointed out that certain distinguished personages in the linguistics world did very badly in ARPA’s TIPSTER conferences, while those participants shameless enough to cast theoretical allegiances to the wind did much better.

Muriel Vasconcellos supplied a welcome air of timeliness to the proceedings with her survey of MT use. Vasconcellos had nearly eighty respondents to a survey of major users of MT in North America, Europe, and Japan. They provided some useful information, including systems being used, volume, and materials translated. Yes, the bulk of the material is technical manuals. Vasconcellos totalled the annual translation volumes her respondents supplied; this amounted to 680,000 pages a year, and she estimated that the total annual volume of MT might be in the vicinity of 1.2 million pages. The question that everyone now wants to ask is almost impossible to answer: what proportion is this of the total translation volume? Probably less than five percent, but just how much no one knows for sure. In any case, whatever your personal opinion of MT, you can’t argue that it is not being used or does not exist.

If you came to Kobe from Mars with the hope of discovering for yourself the state of MT on planet Earth, you would probably think that MT was a Japanese invention, but something that Americans like to talk about, and something Europeans like to dream about.
Granted, there weren’t many Martians in Kobe, but there were people who didn’t attend the last Summit and did come to Kobe hoping for an overview of what was going on in this field. You can argue about what an MT Summit is or should be, but if it isn’t the right forum for learning more about MT in the here and now, what is?

However, glaringly absent from the program was attention to any of the new systems now going into operation or shortly to be deployed. The panel discussion “International Cooperation” was ninety minutes that could have been better spent detailing practical issues concerning such new systems as CMU’s Caterpillar system, which should be in operation by the beginning of next year, Kielekone Oy’s Finnish-English system, likewise ripe for deployment, or even Eurolang, whose begetters may lack finished goods at the moment but offer manifold compensation in terms of chutzpah. While the technical basis of IBM’s LMT was detailed at earlier Summits (as, for that matter, were CMU’s KANT and the Kielekone Oy system), IBM has been using the system extensively in-house in Europe, and it might have been interesting to hear more about that.

Margaret King’s panel on evaluation would likewise have gained immensely in relevance had CompuServe’s Mary Flanagan, present at the conference, been included in the proceedings. Flanagan has just been through an extensive evaluation process to select an MT system for online use, a potentially important breakthrough for this industry. With a paucity of here-and-now, this year’s Summit was more ceremonial than substantial.

A Translation Factory

Tokyo translation company IBS has built an assembly line around NEC’s PIVOT system. It’s one approach to staying competitive.

The translation business, like many Japanese service industries, has been hit hard by Japan’s recession, the first in living memory for many young Japanese. “Translation is the first to go,” says Keizo Sakurai, director of IBS, a translation company in the Tokyo suburb of Hachioji, about an hour from the Shinjuku train station in Tokyo. Sakurai says his company’s current monthly volume is two to three thousand pages, down from twelve thousand pages a month two years ago. Most of IBS’s business is English to Japanese; the remainder is Japanese to English.

IBS was founded ten years ago by Sakurai, himself a translator who started his career many years ago translating the good old way, by dictation. While Sakurai’s company may share the woes of the industry as a whole, IBS is markedly different in one way: it is one of the few translation companies in Japan completely committed to MT. Everything the company translates (except legal materials) is run through its MT system. IBS offers fast turnaround and consistent results at competitive prices. Sakurai likes to think of IBS as the “McDonald’s” of translation.

Sakurai is a brave man: over the years, he has experimented with a variety of MT systems within his company, including Bravice’s MicroPak, Oki’s Pensée, and Sharp’s DUET. These have each required investments in the respective proprietary PCs required to run them (until the advent of DOS/V, a surprisingly common phenomenon in Japan), and, indeed, the Pensée box can still be seen gathering dust in a corner of the IBS offices. Moving from one system to another has meant discarding the user dictionaries; with each system, Sakurai has started from scratch. Three years ago, IBS made the move to NEC’s PIVOT, the system the company currently uses.

Sakurai took the daring decision to commit his company to using MT solely. That meant, among other things, moving out of downtown Tokyo and hiring full-time translators, very much a rarity for Japanese translation companies, which traditionally rely almost completely on freelance translators. “I realized we could no longer continue working with freelancers,” says Sakurai.
“For MT, we needed to have a steady team of in-house translators. And that meant we needed to have more room. Office space here in Hachioji is about a fifth of what it costs in Tokyo.”

IBS compensates for its physical distance from its customers, most of them in downtown Tokyo, with telecommunications facilities. The fax, of course, is central; it is the primary vehicle for sending and receiving translations. IBS has one of the still uncommon Group IV faxes and the requisite ISDN link. Not all customers have G4 faxes, but those that do make the lives of Sakurai and his colleagues that much easier.

To render source language materials in machine-readable form, incoming texts (both Japanese and English) are processed by PC-based OCR software and converted to text files. Unfortunately, this is not an automatic process; it still requires manual intervention to block out paragraphs and clean up the OCR output. With a resolution of 400 dpi, the G4 fax is obviously far preferable to standard fax copy for OCR purposes. For outgoing translations, Sakurai likes to think of the G4 fax as an “on-site printer,” thereby going a long way towards obliterating the physical distance between customer and supplier. Once installed, the G4 fax is economical, too. Transmitting a page in less than ten seconds, it brings transmission costs anywhere in Japan to ¥10 per page.

IBS has invested heavily in infrastructure; alongside the G3 and G4 faxes, its compact quarters are packed with computers and peripherals, with a Novell network connecting everyone, including the company’s well-equipped, Mac-based DTP department. “Originally, we would have liked just to stick with translation,” explains Sakurai, “but customers also required layout services.” The PIVOT-based MT backbone consists of five Unix workstations and five X/Windows terminals; the ratio of one terminal to one workstation provides acceptable translation speed.
The NEC software provides what Sakurai considers to be the complete translation environment: a multilingual word-processing environment (texts in parallel columns aligned sentence by sentence), electronic dictionaries, and MT.

IBS has put considerable effort into building up PIVOT dictionaries. Along with its own basic user dictionaries, IBS has developed thirty-one customer-specific dictionaries, bringing the total to 700,000 to 800,000 terms online. What this hardworking PIVOT system may lack in linguistic finesse it compensates for with ample terminological brawn.

The company also maintains terminology databases in Lotus 1-2-3 running on laptops adjacent to the Unix workstations. These termbases are developed on the basis of terminology provided by major clients. IBS keeps the 1-2-3 termbases synchronized with the PIVOT dictionaries and uses them for verification purposes. For large jobs, IBS supplies its customers with a list of the terminology used; sometimes, says Sakurai, customers forget the terminology they’ve defined in the past and these 1-2-3 termbases help them refresh their memories. He says he’d like to see an MT system one day that would generate a glossary of terms at the end of a translation for review purposes.

A staff of six full-time Japanese translators currently pre-edits Japanese source texts and post-edits Japanese translations. In addition, Ian Wilson, a Brit, is employed to clean up the English OCR output prior to translation and to post-edit the English texts, a full-time task. During the transition to MT, all of Sakurai’s previous employees left, and now, for a variety of reasons, he prefers to hire people fresh from college or English vocational school, the chief requirements being a solid grasp of basic English and good typing skills. Typing has until only very recently been a rare skill in Japan, but now, with the widespread popularity of the wapuro, it is becoming more common.
Sakurai estimates that fifty percent of college students in Japan can now type.

“Experienced translators instinctively believe that they have most of the knowledge they need in their minds,” explains Sakurai. “They don’t readily submit themselves to an MT system.” Inexperienced translators lack this psychological baggage and don’t mind having the system do the bulk of the translation work. He believes his company compensates for its translators’ relative lack of experience with the depth and breadth of its PIVOT lexicons. With the assurance that the terminology is correct, post-editors need only concern themselves with cleaning up the grammar of the Japanese output (or input).

“MT also changes the relationship between manager and employee in a positive way,” says Sakurai. Rather than taking the adversarial stance of looking over a new recruit’s shoulder at his (or more likely her) work, Sakurai and his translator examine together how the MT system is doing and look at the problems it is having. IBS’s small team of translators help each other and give newcomers a hand getting up to speed. This kind of teamwork is “good for the company,” says Sakurai.

With the MT system as company knowledge base and a team of full-time translators wedded to it, Sakurai also feels more secure that his hard-won business won’t walk or get snatched away. Whereas Sakurai would hesitate to give the name of a customer to a freelance translator for obvious reasons, he is quite comfortable having his in-house translators contact customers, providing them with a name, a telephone number, and a fax number with each translation job. He even has a cordless telephone in his office so that translators can call customers while sitting in front of their workstations. “Customers gripe about all the calls they get from us,” he says, but it doesn’t faze him; the ends, after all, justify the means.
While there may be better MT systems on the market, Sakurai says PIVOT satisfies a number of crucial requirements: it is multiuser (i.e., its dictionaries can be shared across a network) and it is fast enough to allow multiple users to do interactive, trial-and-error translation. While Sakurai liked DUET and speaks highly of Sharp’s customer service, the system suffers from “weak hardware”; it only runs on the underpowered (for MT) Sharp laptops. As market leader, Sharp’s strategy may be more oriented towards developing a mass-market commodity product for personal use rather than the powerful server a company like IBS requires. With Unix-based PIVOT, in contrast, NEC addresses these needs. Most important, the IBS setup is extendable. Sakurai can double the number of translation workstations just by plugging additional workstations and inexpensive (¥500,000) X/Terminals into the network, something he hopes to do soon. Using MT in any context is a challenge, but translating Japanese into English brings with it its own particular problems. IBS translators spend a lot of time pre-editing Japanese texts prior to submitting them to PIVOT. Sakurai says the lack of technical writing skills in Japan means many of the texts they have to translate are poorly written. Verbose and ambiguous, they are difficult for humans to digest, let alone machines. In contrast, Sakurai singles out IBM’s manuals. Written with worldwide dissemination in mind, they are admirably suited for MT. IBS also operates a translation service on PC-VAN, which is, with more than 600,000 users, Japan’s largest online service. Unless requested otherwise, incoming texts are automatically processed by PIVOT with no manual intervention (pre- or post-editing) of any kind. Under the agreement with PC-VAN, texts are treated as confidential, so Sakurai has no idea what kind of material is translated by PC-VAN customers. However, if the customer wishes, post-editing is available.
The cost of raw MT is ¥350 per 180 words, post-edited MT is ¥1500 per 180 words, and rewritten MT is ¥4500 per words. Sakurai is very interested in high resolution output and inquires whether European translation companies are switching over to 1200 dpi plain paper laser printers, for example. A healthy half of his business and many of the desirable larger translation projects are contracted out by printing companies. In Japan, translation is seen as part of the printing process, and indeed one of Japan’s largest MT users is Nikkei Printing. Like others in the translation business, Sakurai hopes the downscaling of printing technology will offer him the possibility of gaining more control of the documentation production process. While IBS is set up to handle big projects, like last January’s thousand-page Chrysler Cherokee manual, much of IBS’s current translation load is, ironically, two- to three-page documents. “Two years ago, we turned that kind of work down,” says Sakurai. “Now we are living from it.” He estimates his company does three hundred such jobs a month at the moment. However, Sakurai feels that the company will be well positioned when the economy picks up. He believes he will be able to gear up for big projects much faster than competing translation companies. If and when the hefty contracts start coming in again, he says he can double or even triple capacity in two months, just by plugging more workstations and terminals into the network and hiring new help. This year, IBS is just breaking even; Keizo Sakurai is biding his time until things get better.

International Business Service (IBS), 1-24-5 Koyasu, Hachioji, Tokyo, Japan; Tel +81 426 46 8801, G3 fax +81 426 60 7002, G4 fax +81 426 60 7021

Fujitsu

Porting ATLAS to Unix workstations gave this venerable MT system a new lease on life.

Launched nearly ten years ago, Fujitsu’s ATLAS is the oldest of the commercially available Japanese MT systems.
The English-Japanese ATLAS I was introduced in 1984 and it was followed in 1985 by the Japanese-English ATLAS II system. In 1992, Fujitsu consolidated a number of advances with the introduction of the ATLAS S line; these run on Fujitsu’s S family of Unix SPARC workstations and employ a combination of syntactic and semantic processing. The ATLAS S systems share an interlingual architecture, and Fujitsu has been working on other language pairs for ATLAS, including Japanese-Korean and Japanese-German, but these have not been commercially released yet. While it is hard to pin down just how much ATLAS is used (there are a lot of “dormant” mainframe licenses), one Fujitsu customer who uses ATLAS extensively is the automobile manufacturer Matsuda (Mazda). According to Michael Beirne of Fujitsu Public Relations, Matsuda has been using ATLAS to translate a vast amount of automotive manufacturing documentation from Japanese into English. Matsuda has a factory in Michigan which was built by a Fujitsu subsidiary that specializes in building highly automated factories. After building one such factory for Matsuda in Japan, this subsidiary was contracted to build an identical one in Michigan. While the design specifications for the original plant were naturally all in Japanese, the subsidiary nonetheless needed to subcontract the actual construction to local contractors in the US, as is customary. Matsuda hadn’t anticipated this. Fortunately, another wing of Fujitsu could offer an MT system. Matsuda started using ATLAS for parts lists and texts of technical drawings and soon turned to technical specifications. With a bit of lexicon building, the company quickly improved the accuracy rate from sixty to eighty percent. Emboldened by this success, Matsuda then turned to the full documentation but discovered that it faced an enormous challenge. The original documentation was scattered throughout the original factory and much of it was available only in hardcopy form.
This highlights a common problem faced by early users of MT in Japan: the systems were not integrated properly within the document production system. Fujitsu’s “smart” OCR systems were developed partly to help Matsuda prepare this documentation for translation by ATLAS. Matsuda currently translates approximately 1200 pages a month of automotive-related manufacturing documentation. The company estimates that MT reduces translation time from twenty-two minutes per page to half that. ATLAS is also available through Nifty-Serve, the Japanese counterpart of CompuServe. Nifty-Serve’s 550,000 users can send text to be translated by ATLAS, paying by the character (¥1 per word E-J, ¥2 per word J-E). Additional post-editing services cost extra, depending on the service required (“rough” or “native-speaker” quality). Translating via Nifty-Serve has its limitations, most notably that users cannot build up or use special lexicons, and high-volume Nifty-Serve users are urged to license the software for on-site use. Nifty-Serve is operated by a joint venture partly owned by Fujitsu, which has the rights to exploit CompuServe “west of Hawaii.” Fujitsu makes abundantly explicit what kind of texts ATLAS is suitable for. No surprises here; ATLAS, like most MT systems, is best at technical manuals and, to a lesser extent and depending on the language pair, content scanning of technical materials. What it isn’t suitable for is correspondence, yet that is just what a lot of companies would like to use MT for. To answer this requirement, Fujitsu has developed a writing tool called ReadyPen, which is designed to facilitate writing business letters in English. This Windows-based product includes a basic editor, an address database, 160 sample letters, a bilingual sample sentence database, and a basic Japanese-English dictionary.
The package also includes Houghton-Mifflin’s grammar checker, which flags at least some of the agreement and article errors Japanese writers are prone to make while writing English.

Fujitsu Laboratories, 6-1 Marunouchi 1-chome, Chiyoda-ku, Tokyo 100, Japan; Tel +81 3 3215 5236, Fax +81 3 3216 9365

Toshiba

With ASTRANSAC, Toshiba got it right: connectable, compatible, and configurable. Where to now?

Having launched the Japanese wordprocessor revolution in the late 1970s, it was only natural that Toshiba should join the MT race. And, indeed, 1985 saw the introduction of the Japanese-English ASTRANSAC, followed by an English-Japanese version five years later. As Shin-Ya Amano, a Chief Research Scientist at Toshiba’s Kawasaki R&D Center, points out, Toshiba has been involved in NLP research for many years, since 1971 to be exact. Back then, NLP efforts were hampered by the lack of powerful computers, but today that is no longer a problem. ASTRANSAC runs on Sun workstations and Toshiba’s compact SPARC-based laptops. The latter may be small but they pack plenty of power: up to sixty MIPS. While these laptops may be portable and self-contained, the ASTRANSAC software is not designed to be an island in the office. The package includes built-in OCR software which uses the system’s MT dictionaries to improve recognition performance. In the all-important file compatibility department, ASTRANSAC can process FrameMaker documents. While the first generation of Japanese mainframe-based MT systems may have been lacking in interactive qualities, Toshiba has expended considerable effort in refining the user interface of ASTRANSAC, currently implemented in OpenLook. Like similar systems, ASTRANSAC displays source and target texts aligned in parallel, sentence by sentence, in its multilingual text editor.
If operated in interactive mode, the system will highlight ambiguities in the target text, such as the notorious prepositional phrase attachments, and the user can view the different readings. The Toshiba MT group has enhanced ASTRANSAC with customization facilities that allow users to “personalize” the translation output by adjusting various default values. Going from English to Japanese, you can choose “polite” or “normal” style, “you” omission, and the default translation for participial constructions: sequenced chronologically, causally, or temporally. Going from Japanese to English, you can choose between the passive or the imperative voice for subjectless Japanese sentences (which are quite common), default articles and determiners, and default tenses for verbs. ASTRANSAC has a basic lexicon of 50,000, technical lexicons for several fields, and space for up to 200,000 terms in its user dictionary. According to Amano, the ASTRANSAC basic lexicon has been fine-tuned with 100,000 lexical rules. Toshiba’s ASTRANSAC is a mature, well-established MT system by any standard and it enjoys a good reputation in the MT world. There are an estimated three hundred ASTRANSAC licenses (mostly E-J) at two hundred sites in Japan, such as Mitsui and Company (a large trading company) and the Japanese subsidiary of Unisys. ASTRANSAC sales and marketing is handled by a department in a remote part of the company, and Amano and his team seldom have contact with users now. While the development work is completed and the system now enjoys a solid base of users, this does not imply that there is no room for improvement. Customization facilities and lexical rules go part way to making the system more flexible and help broaden its coverage, but ASTRANSAC, like other comparable MT systems, has reached the upper boundaries of what the current technology permits.
Amano suggests that they are at a stage where there is not much more they can do to the existing system and new techniques are needed if they want to improve its output quality or expand the number of domains the system can handle. Amano also points out that many of the problems they would like to solve are not grammatical. Such common phenomena as ellipsis, apposition, coordination, and insertion are difficult pragmatic issues which computational linguists have yet to solve. They are stumbling blocks on the path to better systems. Even the use of articles and determiners, which trips up Japanese MT systems, is, strictly speaking, not a grammatical issue but one of world knowledge. As part of their ongoing research efforts, Amano and his colleagues have recently been looking into example-based techniques to try to improve the output of the system, possibly at the transfer stage. This approach requires a large bilingual corpus and they have been gathering materials from another division of Toshiba, a large-scale integrated chip producer. The latter has been supplying several hundred pages of documentation per month, the Japanese original and the (theoretically) corresponding English translation, for the MT researchers to work with. For Amano and his colleagues at Toshiba, not to mention other Japanese MT teams, this is new terrain; it is still a long way from being tried and proven. The way forward remains uncertain.

Toshiba Corp., R&D Center, Communications and Information Systems Laboratories, 1, Komukai Toshiba-cho, Saiwai Ku, Kawasaki, 210 Japan; Tel +81 44 549 22 39, Fax +81 44 549 22 63

CSK

Argo is a more recent arrival among Japanese MT systems.

CSK is a relative newcomer to the MT field in Japan and differs from its fellow MT suppliers in several ways. For one thing, CSK is not a hardware manufacturer, an obvious advantage in a world increasingly dominated by the open systems philosophy.
Rather, it is a large, international systems integration house (1992 revenues: ¥87 billion). For another, the company targets the area of internal correspondence with its MT system, Argo, instead of the traditional domain of technical manuals. CSK sees an enormous potential for the application of MT between Japanese companies and overseas relations, particularly as Japan has entered a phase where it is actively exporting factories. A typical situation as envisaged by CSK would be the communications between the Japanese headquarters of a company and its offshore subsidiary in a country like Malaysia. “For these operations, a lot of material goes back and forth,” explains Kazuhiko Nishioka, director of CSK’s Advanced Technology Division, “and although it’s rather technical, people usually know the subject. Moreover, it’s usually confidential. The output quality is not so important, unlike user manuals.” The output would be offered without post-editing. For Nishioka, these constraints suggest a suitable domain for an MT system like Argo. CSK’s first excursion into MT in the 1980s resulted in a system for producing English translations of the Nikkei stock market reports in real time for distribution via the Nikkei Telecom network to Reuters and its customers worldwide. CSK targeted as potential takers American and European companies interested in Japanese stock market activities. However, current events overtook the company; the system came online in September 1987, just a month before the notorious stock market crash of October 1987, and the expected interest in this service did not materialize. CSK launched its commercial package Argo, which runs on Sun Unix machines, in 1992, but once again CSK found itself swimming against the tide; this time it was due to Japan’s current recession. Nonetheless, Argo is currently being used at ten sites in Japan.
Nishioka readily acknowledges that the market for MT is still very undefined in Japan; development is not driven (yet) by user demand, a real problem. According to Tsuneo Noda, manager of the Machine Translation Department, Argo is based on a modified dependency model. It goes a step further than the pure dependency design at the heart of many Japanese MT systems with a close analysis of the relation of all the items in a sentence. For semantic analysis, CSK has settled on sixty semantic primitives, which offers a tradeoff between sophistication and ease of dictionary updating. CSK is obviously aware that the latter is a keen issue from the user’s perspective, and the user interface of Argo has become increasingly important to the system. CSK now offers two different Argo packages: Argo Pro, the original bare-bones system, which is easily extendable and designed for embedding in other systems but has an austere interface, and Argo-I, which works with Interleaf and, with its attractive OpenLook interface, provides a far friendlier environment in which to work. Interleaf is not yet widely used in Japan, but a Japanese edition of Interleaf 5 <SGML> was released this past August and there is great interest in it on account of its extensibility. The company also has FrameMaker and HyperCard interfaces under development. CSK clearly positions Argo not as a standalone technology but as an integral part of a larger document production system. Like many MT developers, CSK has developed a sophisticated interface for entering words in its lexicon, but CSK goes a step further by partly automating the procedure. During entry of the word, the system will prompt the user with such things as “Can you touch it?”, “Does it think?”, and “Can it read?” and it automatically switches on the corresponding semantic markers, thereby considerably lowering the level of linguistic knowledge required of potential users.
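As a rough illustration of how question-driven dictionary entry of this kind could work, the Python sketch below maps yes/no answers to semantic markers. The marker names (`CONCRETE`, `SENTIENT`, `HUMAN`), the question table, and the `make_entry` helper are hypothetical; the article does not list Argo’s sixty primitives or describe its internal dictionary format.

```python
from typing import Dict, List

# Hypothetical question-to-marker table. The three questions are quoted in
# the article; the marker names are invented for illustration only.
QUESTIONS = [
    ("Can you touch it?", "CONCRETE"),
    ("Does it think?", "SENTIENT"),
    ("Can it read?", "HUMAN"),
]


def markers_from_answers(answers: Dict[str, bool]) -> List[str]:
    """Switch on one marker for every question answered 'yes'."""
    return sorted(
        marker for question, marker in QUESTIONS if answers.get(question, False)
    )


def make_entry(word: str, answers: Dict[str, bool]) -> Dict[str, object]:
    """Build a (hypothetical) user-dictionary entry from the user's answers."""
    return {"word": word, "markers": markers_from_answers(answers)}


if __name__ == "__main__":
    answers = {
        "Can you touch it?": True,
        "Does it think?": False,
        "Can it read?": False,
    }
    print(make_entry("valve", answers))
```

The user answers plain questions and never sees the marker names, which is the point of the design described above: the linguistic coding is derived from the answers.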
An Argo license costs ¥5.6 million and includes three months of customization by CSK engineers, who analyze a sample corpus of 200 to 250 pages of customer documentation to elicit a user dictionary for the system. Among other things, CSK demonstrates to its new customers the advantages of writing MT-friendly texts. While CSK has yet to meet with substantial commercial success with Argo, it nonetheless gives the indication of being thoroughly committed to MT. The company counts fifty people in MT development, and Nishioka and Noda speak of other domains which they hope to target, in particular in-house business procedures. In targeting the domain of in-house documentation, CSK’s efforts are hampered by the overwhelming popularity of the MT-unfriendly fax machine, but Nishioka sees companies switching to email when people discover they need to use information in different forms. For CSK, the future lies in providing raw MT for materials which would not otherwise be translated manually.

CSK Corporation, 3F Nishi Shinjuku, Forest Building, 4-32-12 Nishi Shinjuku, Shinjuku-ku, Tokyo 160, Japan; Tel +81 3 3370 9600, Fax +81 3 3370 9691

MT Labs, Inc.

MicroPak lives on.

“The launch of Bravice’s MicroPak nearly ten years ago made a big splash in Japan,” recalls Akira Matsuo, president of MT Laboratories and ex-Bravice employee. “It was headline news in the Asahi Times.” The PC-based, Japanese-English package went on to become the best selling translation system in Japan, with upwards of five thousand packages in circulation. Bravice later went on to acquire Weidner Communications, Provo’s other MT company, in 1984, publishing its Microcat system as Euro-Pak. In 1991, Bravice rather suddenly went bankrupt and seemed to disappear without a trace. However, Matsuo, together with fellow Bravice ex-employees Naoyuki Akiyama and Kazunori Nishimura, resurfaced. They regrouped and established MT Laboratories in April of last year, buying the rights to the Bravice software.
While MT Labs focuses on development activities, another company, Pacific Eye, has been established by another ex-Bravice employee and is responsible for marketing and end-user support. Last October, MT Labs brought out an upgrade, Bravice J/E version 5, and Matsuo says sales are running at eight to ten packages a month for the ¥485,000 package. He adds that some four hundred users of earlier Bravice versions paid ¥50,000 to upgrade to version 5. As well as being an obvious source of much-needed income for the young company, this is also a useful indication of how much the package is actually being used. MT Labs offers thirty-one specialist lexicons for a variety of technical fields to enhance the J/E V’s basic 60,000 word lexicon, costing between ¥12,800 and ¥98,000. In addition to the familiar area of user manual production, MicroPak also found particular favor in universities and hospitals, where people used it for producing English versions of scientific papers. “Writing papers is a big problem in Japan,” explains Matsuo. Publishing in Japanese obviously severely limits the circulation of papers and would be futile in fields where English is the lingua franca, yet few Japanese enjoy preparing English versions themselves. Although it might seem like a doubtful proposition for Japanese people to produce texts in English for publication, Matsuo says that many of the users know English fairly well and are even able to provide valuable feedback to the developers; however, they find writing in English time-consuming and an MT program, even if it produces less than perfect output, simply saves time. This would match the perception foreign visitors inevitably get in Japan that Japanese people sometimes have a rather good passive knowledge of English. Nonetheless, Matsuo acknowledges that output quality remains the biggest problem.
In investigating ways of improving the quality, he is considering developing a version in which the grammar is tailored to the specific domain of scientific papers and offering it alongside a general version of J/E, the latter at a lower price. MT Labs also has an English-Japanese system under development, a considerably more ambitious product. Running under Unix (and possibly OS/2), it has a more sophisticated LFG architecture than its J-E counterpart and is being developed with the support of a new Hawaiian translation agency, Trans-Link, which hopes to begin using it sometime next year as part of its English-to-Asian language translation services. Trans-Link and MT Labs are collaborating on maintenance utilities and domain-specific dictionary modules for batch processing of translation, for which, according to Matsuo, they have received patents. As befits a young start-up venture, MT Labs is keeping its fixed costs low for the time being; the company is currently quartered in Matsuo’s house in Niiza, about an hour from Tokyo. Here, the three developers can take advantage of the peaceful surroundings of this quiet suburb to concentrate on the tasks at hand. As the only small, independent company completely dedicated to MT development, MT Laboratories is unique in Japan.

MT Laboratories, 4-19-43, Nobidome, Niiza, Saitama, 352 Japan; Tel +81 48 477 0639, Fax +81 48 482 0821

NHK

Do subtitling and MT mix?

One of the most intriguing applications of MT in Japan can be found at NHK, the Japan Broadcasting Corporation. As part of an ongoing search to improve the speed and efficiency of translating foreign video feeds for Japanese news programs, NHK has extensively adapted and extended Catena’s STAR English-Japanese system for its own purposes (the system is now simply called the NHK MT system) and, since 1989, has been using it on an experimental basis (52,000 words in 1992) to translate a small segment of English-language television news every day.
In the current configuration, a translator transcribes the spoken English, shortening and simplifying where necessary, runs it through a spellchecker, and feeds it, sentence by sentence, to the NHK system. The translator then post-edits the output where necessary and the output is re-synchronized with the video. While the original NHK system ran on a Sun 3, the NHK has recently developed a highly attractive Macintosh-based package for transcription and translation. Running on a powerful Mac Centris 650, this integrated system allows onscreen viewing of video, which facilitates the arduous synchronization process. According to NHK’s Hideki Tanaka, the system does not increase overall throughput substantially (transcription of the English remains a difficult, laborious process and good transcribers are rare and expensive in Japan), but the NHK remains committed to experimenting with the system. However, a major breakthrough may be on the horizon; US broadcasters may soon be offering “closed caption” services to foreign broadcasters like the NHK. If and when this occurs, it could dramatically increase the possibilities for automating the subtitling process and NHK’s long investment in MT may well pay off. The NHK also deploys its MT system for one of those classic domains for which, like weather reports and avalanche warnings, MT is so well suited. This is the translation in real time of the Associated Press Economic News for internal use within the organization. In contrast to news broadcasts, which are virtually unlimited in their subject matter, the Economic News is a narrow domain with a fairly limited range of expressions (“Other late dollar rates in Europe, compared to late Monday...”) and the NHK is able to exploit this fact successfully. To supplement the NHK’s basic lexicon of 120,000 entries, 400 fixed patterns have been coded into the AP system, and these cover nearly thirty percent of the incoming material, offering an accuracy rate of one hundred percent.
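A minimal Python sketch of a pattern-first design of this kind follows. It is not NHK’s implementation: the regular expression, the placeholder target template, and the function names are assumptions made for illustration, and any stages tried after the fixed patterns are simply passed in as ordinary functions.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical fixed pattern: a source-side regex with named slots and a
# placeholder target template (not real NHK output).
FIXED_PATTERNS = [
    (
        re.compile(
            r"Other late dollar rates in (?P<place>\w+), "
            r"compared to late (?P<day>\w+)"
        ),
        "[JA template: late dollar rates | place={place} | versus late {day}]",
    ),
]

# A stage is a label plus a function returning a translation or None.
Stage = Tuple[str, Callable[[str], Optional[str]]]


def translate_by_pattern(sentence: str) -> Optional[str]:
    """Fill the first matching template; return None if nothing matches."""
    for regex, template in FIXED_PATTERNS:
        match = regex.match(sentence)
        if match:
            return template.format(**match.groupdict())
    return None


def translate(sentence: str, later_stages: List[Stage]) -> Tuple[str, Optional[str]]:
    """Try fixed patterns first, then each later stage in the order given."""
    stages = [("fixed-pattern", translate_by_pattern)] + later_stages
    for name, stage in stages:
        output = stage(sentence)
        if output is not None:
            return name, output
    return "untranslated", None
```

A sentence that matches no pattern returns `None` from the first stage, so the caller can hand it on to whatever grammar-based stage comes next.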
If the system is not able to translate a piece of text on the basis of these patterns, it falls back to its domain-specific grammar, which has an average accuracy rate of seventy percent. If this fails, it in turn falls back on the general grammar, which averages only ten percent accuracy. The output is of high enough quality for internal use without post-editing.

NHK, 1-10-11 Kinuta, Setagaya-ku, Tokyo 157, Japan; Tel +81 3 5494 2314, Fax +81 3 5494 2309

COPYRIGHT © 1993 BY LANGUAGE INDUSTRY MONITOR