Boeing’s Simplified English Checker

This article originally appeared in the Jan/Feb 1993 issue of Language Industry Monitor.

Simplified English is a lot easier to read than write. So Boeing decided to give its hundreds of technical writers a helping hand.

Contemplating a visit to Boeing Computer Services (BCS) might make you imagine driving south from Seattle to Boeing Field, where a winding path through immense hangars strewn with jumbo jet parts would lead you to Boeing’s nerve center in some far-off outbuilding. In reality, however, jumbo jets are assembled at Boeing’s Everett facility, and a visit to BCS means traveling east from Seattle across Lake Washington to the monotonous sub-suburban sprawl of Bellevue, a long way from the smell of burning jet fuel and the sound of roaring engines. Here, within the large cluster of office buildings that house BCS, the most significant item on the landscape is probably the enormous air conditioning installation that keeps the Cray supercomputers Boeing uses for aerodynamic modeling from losing their cool.

“BCS dates from the mainframe era,” explains Boeing’s Rick Wojcik, a computational linguist in the natural language processing (NLP) group, itself part of BCS’s Research & Technology Division in Bellevue. “Periodically, the company asks itself whether they still need us. So far, we’ve managed to swim upstream,” he comments nonchalantly. Wojcik says that the group’s NLP efforts have proven to be solid, strategic assets for Boeing. By skillfully promoting their achievements and abilities, group leader Jim Hoard has piloted the NLP group through the recurring budget cuts endemic to the aerospace industry; as a result, the group is intact and flourishing.

Aerospace English

Wojcik briefly reviews the ongoing projects in which the NLP group is involved, apologizing in advance for the onslaught of acronyms.
The English Test Language (ETL), Process Planning English (PPE), and Airworthiness, Reliability, and Maintainability (ARMS) projects are all clearly related to the processes of building and maintaining aircraft and involve, broadly speaking, marrying natural language interfaces and plain-English programming to knowledge-based systems. Another, more military-oriented project also underway is the Real Time Information Management System (RTIMS), a message-processing system for battlefield command and control. RTIMS is partially funded from Defense monies received by Boeing’s Defense and Space Group.

The NLP group’s most important achievement to date, however, is unquestionably the Simplified English Checker. The Checker, in shop-floor jargon, has been “in production” since April 1990. It was developed by the group to help Boeing technical writers meet Boeing’s contractual commitment to European airlines to produce its documentation in Simplified English (SE), an international standard originally defined for the aerospace industry by the Association Européenne des Constructeurs de Matériel Aérospatial (AECMA).

English is the official language of the aerospace industry (Boeing’s manuals are not translated), and AECMA’s Simplified English is designed to make aircraft documentation easier to read, especially for non-native speakers of English. As Jim Hoard points out, this isn’t an abstract issue but a practical problem: the crash of an Airbus in France was thought to have been caused by the improper latching of a cargo hatch by a foreign employee who may not have properly understood a printed instruction. Although SE has the support of carriers in English-speaking countries as well, Wojcik acknowledges that native English speakers tend to be less enthusiastic about SE than non-native speakers.
Too much to remember

Pointing to a fat binder on his desk, Wojcik explains that SE consists of an extensive series of rules and restrictions ranging from permissible grammatical structures to sentence and paragraph length. Its most important feature is probably the prescribed base vocabulary of about a thousand carefully selected words. Generally, there is only one word for a given concept, and a word can be used in only one way. The word spring, for example, is acceptable only as a noun, not as a verb. Aerospace manufacturers are allowed to augment this general vocabulary with designated nouns and verbs for specific technical names and processes. Boeing has defined a company-specific technical vocabulary of about 2,700 words for SE.

“It’s nearly impossible for a technical writer to memorize all the rules of SE,” explains Wojcik. “But we’ve found that this is something that we can quite successfully automate with a computer.” The Boeing Checker, which was based on work done for a natural language understanding project and had a gestation period of just one year, is built around a syntactic analyzer containing a tokenizer, a lexicon, a parser, and a grammar with more than 350 syntactic rules for English. It is sensitive to the distinction made in the documentation between procedural text, descriptive text, and notes. In procedural text, for example, only the active voice is allowed and the maximum sentence length is twenty words. Descriptive text has slightly relaxed requirements in these respects, allowing sentences of up to twenty-five words and no more than one passive per six sentences.

Low noise

Even though it is limited to syntactic analysis, the Checker does quite well. “We can parse, on average, ninety percent of the sentences in a text,” says Wojcik.
“Of these analyses, only about ten percent contain inaccuracies.” Sensitive to users’ low threshold of annoyance, the developers adopted a minimalist approach to error reports, keeping “nuisance errors to a minimum.”

The Checker was developed in close cooperation with Boeing Customer Services, and Wojcik credits this as an essential factor in its success. “We’ve had to learn how technical writers work, especially the editorial cycle. We learned a lot about the pressure of schedules, the agony of trying to find the right phrase in SE, the diverse backgrounds of the writers, relationships between technical writers and engineers, and so on.”

As it is currently configured, the Checker runs on UNIX workstations and operates in batch mode. A writer uploads a text for analysis from a mainframe and receives an error report in return. Some three hundred Boeing technical writers now use the Checker to prepare many thousands of pages of documentation per year. The Checker is used only within Boeing; it is regarded as a strategic asset and is not offered to third parties. Boeing has run an estimated 2.5 million sentences through the Checker since it went into production.

Shoot the messenger

Paul Montague, of Boeing Commercial Airplane Group’s Airplane Maintenance and Repair Engineering, is the author of the Checker’s user manual and functions as the editorial liaison between users and developers. Asked about writers’ attitude to SE and their reception of the Checker, Montague replies, “Most writers don’t really like SE,” and adds, to a chorus of groans from the NLP group, “thus, they don’t really like the Checker.” Nonetheless, it appears that most Boeing users are satisfied with the output report produced by the Checker. “Some would like immediate feedback,” he says, “but that’s not possible for our group, even though the Checker has the ability.”

Montague estimates that a bare minimum of eight hours of training is required to bring a writer up to speed with SE.
“Users need to understand the intent of SE and receive some instruction in writing SE before they will be able to understand the reports the Checker returns.” At a certain point, however, the SE Checker becomes a useful teaching aid for writers unfamiliar with SE.

Before the Checker was developed, the Boeing technical writers had a long wish list of things they wanted to have in one. Now that they have been using it for more than two years, are there any major improvements they desire? “No,” says Montague. “Most of the experienced users don’t feel that major improvements are needed. The one enhancement we are looking forward to is a temporary file that allows users to ‘approve’ their own technical names to eliminate repetitive errors.” This capability is due to be installed in the production system sometime in 1993.

Measuring success

In explaining to an outsider the benefits Boeing reaps from the Checker, Wojcik and Hoard are careful not to stress specific savings in terms of time or money. “The Checker may not actually reduce the amount of time in the editing cycle, but it dramatically improves the accuracy and consistency of the output,” explains Wojcik. “Boeing is the only airline manufacturer with an independent check for SE compliance in its documentation. The Checker gives us a competitive advantage.”

Hoard sees the value of the Checker not in terms of savings but rather of cost avoidance. “If you wanted to produce SE-compliant texts manually with the same level of consistency, you would have to double, more or less, the amount of money you spend writing them,” he says. “In real life, that wouldn’t happen; no one would double the number of writers to get the job done. Thus, it’s better to speak of cost avoidance and quality improvement than of actual monetary savings.” Now that the system is up and running, maintenance costs are extremely low: less than a person-year is expended on keeping the Checker running and making periodic user-requested enhancements.
“At least one NLP effort is in production somewhere in the world which is a ‘regular, solid citizen’ software system,” says Hoard, “and that’s at Boeing.”

Real-world linguistics

Hoard, like Wojcik, seems to relish the pungent industrial atmosphere of Boeing, notwithstanding the lack of jumbo jets in the immediate vicinity. Hoard came to Boeing in 1986 after years in academia; he is the co-author of a widely used textbook, Introduction to Phonology (1978). Here at Boeing, he is now able to test linguistic knowledge with real data and to address genuine problems. “For years, much of linguistics has been by definition the province of ‘armchair linguists,’” he says. “However, now, with the advent of the computer, we are forced to verify our theories and find out whether our rules hold water.” Within ten years, he estimates, you will not be able to find a linguist who is not sitting behind a computer.

For Hoard, this implies studying large corpora of language but not simply collecting statistics. Rather, he is interested in developing a knowledge-based methodology which he can verify with corpora. He calls it the difference between an inductive and a deductive approach. “It wouldn’t interest me to build a device that produced something correct but not know why (the figurative black box),” Hoard explains. “To build a robust system, you need some level of self-awareness. You need repair and unknown-word strategies, for example. We don’t want systems which will simply give up and say: ‘no parse.’”

Acknowledging the tremendous efforts involved and the challenges which still lie ahead, Wojcik sums things up by saying, “All things considered, NLP is an expensive, high-technology method for solving a problem. It justifies its costs only if you compare it to older methods that produced inferior results. We in the NLP field must learn not to place so much emphasis on saving man-hours or cutting overall manufacturing costs.
We need to emphasize the effects of quality improvement on products such as maintenance manuals if we are to succeed in selling our work.”

Boeing Computer Services, PO Box 24346, Seattle, WA 98124-0346, USA
Tel +1 206 865 2964, Fax +1 206 865 3844

COPYRIGHT © 1993 BY LANGUAGE INDUSTRY MONITOR