Future Tense for Euro Systran?


This article was originally published in issue 9.3 of Language International.

With eleven official languages to contend with, the institutions at the helm of the European Union are together the largest producer and consumer of translation in the world today. Perhaps the most familiar of these is the European Commission, the enormous administrative wing of the EU divided between Luxembourg and Brussels, but there is also the European Parliament, the Council, the Court of Justice, the Court of Auditors, the Economic and Social Committee and the European Investment Bank.

Because the EU is committed to “linguistic equality” among its member states, all of the legislation of the Union needs to be translated into the eleven official languages. But it doesn’t stop there. Countless calls for proposals, internal reports, and intermediary drafts need to be translated on an ad hoc basis, not to mention the need for live interpreting of meetings. Moreover, as new countries join the Union and additional languages need to be reckoned with, the translation burden expands exponentially.

The major EU institutions all have their own internal translation departments. The largest of these is the Commission’s Translation Service, familiarly known as the Service de Traduction (SdT). With an army of some 1,500 full-time professional translators divided between Luxembourg (one-third) and Brussels (two-thirds), the SdT is the largest single translation organization in the world.

In the mid-1970s, at the instigation of a far-sighted EU official, Loll Rolling (now retired), the Commission made its first incursion into the world of machine translation (MT). The institution acquired a licence to exploit Systran, and commenced beefing up the system’s dictionaries with EU terminology and developing additional language pairs that the Commission required. (While sharing its origins, the Commission’s system, for practical purposes, has scarcely more than a name in common with the eponymous commercial PC-based software developed and marketed by California software company Systran.)

Since the beginning of this decade, use of MT at the Commission has soared, primarily due to the adoption within the organization of email, which simplifies submitting and retrieving texts, but partly thanks also to some judicious internal promotion. In 1996, some 220,000 pages were run through the system, making the Commission, volume-wise, far and away the most prolific user of MT in the world. The SdT accounted for slightly less than a third of this volume, with the remainder coming from non-linguists in the many administrative departments.

Up until now, funding for development and maintenance of Systran has been provided by DG XIII under the Multilingual Action Plans (MLAPs), but the success of the system within the organization placed the Commission in a quandary. DG XIII’s raison d’être is funding research in telecommunications and language engineering, and it could no longer justify subsidizing development of the system if it had truly passed from being an advanced research topic to a fully operational concern. Hence, in 1995, DG XIII announced its intention to phase out its support by the end of 1997, and over the past two years the SdT has been contemplating the way forward.

One option is for the SdT to allocate funds from its own operational budget to support the use of MT within the Commission. To determine whether this would be appropriate, an extensive feasibility study was undertaken last year, encompassing a user survey, practical experiments with in-house translators, an examination of legal issues, a market study, and a cost-benefit analysis.

More than fifteen hundred users, translators and non-linguists alike, responded to the survey, providing a very detailed glimpse of the use of machine translation within the Commission. Open to bad news as well as good, the SdT also surveyed a number of non-users, people who for one reason or another do not avail themselves of the system. The results of the survey provide a unique picture of the use of MT within this vast organization.

Among its users in the administrative departments, the vast majority turn to MT for urgent translations that they might otherwise have sent to the SdT, for browsing, and for preparing draft versions of documents.
Within the SdT, some translators do not find MT helps them in their work, or remain opposed to the use of MT in principle, but many value the system’s fast turnaround, finding its vast terminological resources and its preservation of formatting to be important benefits. While post-editing machine output can be a chore, some translators, as Dorothy Senez of the MT Help Desk wryly notes, find consolation in the system’s sense of humor.

Of course, the system is not monolithic, and the quality of its translations varies greatly among the language pairs. At the moment, the French-English, French-Spanish, French-Italian, and English-French language pairs are considered by the users to be the best.

Feedback from the practical experiments carried out in the SdT shows that an average time saving of thirty-five percent can be achieved, provided a number of conditions are met. These include suitable kinds of documents, experienced post-editors, and prior preparation of MT dictionaries, with, to be sure, the actual results depending on the language pair.

The study acknowledges that SdT users and administrative users have different requirements, but the general consensus is that the primary value of MT lies in its immediacy: MT is fast. Users also perceive the need for improved linguistic coverage as well as better promotion within the Commission.

So where does this leave the SdT with regard to the future of MT within the institution? While the exact details have yet to be hammered out, it appears that the SdT and DG XIII have reached a happy compromise.
The SdT’s cost-benefit analysis having demonstrated to its satisfaction that the MT system benefits, directly and indirectly, not only the Service itself but also the Commission as a whole, the SdT will henceforth support the mature, operational language pairs.

DG XIII meanwhile has agreed to continue funding development of other language pairs, with the important caveat that this will only be undertaken with co-financing by the relevant Member States. In other words, a Finnish-English language pair may be deemed a priority, but then the government of Finland would also have to be prepared to partly underwrite the effort.

The Commission will be issuing calls for tenders for the maintenance of the most promising language pairs as well as for systems or services for languages not covered by the Commission’s system or for language pairs which are of lesser quality. Keeping a close eye on developments in the commercial arena, the Commission could conceivably license a language pair not covered by Systran from a third-party developer, should such a product become available.

By virtue of both its substantial internal translation requirements and its commitment to linguistic diversity, the European Commission is in many ways an exemplary test bed for linguistic technology such as MT. As such, it is in the unique position of playing the role of both a user and a mover. What lessons can be drawn from the Commission’s experience by other, albeit smaller, organizations?

For one, the MT development team and the MT user base (the SdT in particular) have enjoyed close proximity; feedback from the latter to the former has ensured practical results. Since few organizations can justify development of their own MT system (the Pan American Health Organization being an exception that comes to mind), this is admittedly an exceptional albeit pertinent factor.

In addition, the Commission has striven to integrate MT within the document flow of the organization. That means a substantial investment in the software engineering side of things — such as document format filters and integration with email — admittedly prosaic matters which have all too often been given short shrift by language technologists in the past. This task has been neither easy nor trivial.

Not least of all, the Commission has also expended tremendous effort building up the Systran dictionaries. The four top-rated language pairs boasted nearly 700,000 entries, and that was before the Eurodicautom data was imported. Currently, Systran has more than four million entries distributed across the sixteen extant language pairs. This is a lesson that applies to small and large MT users alike.

As such, given its needs and its resources, it would indeed not have boded well had the Commission been unable to exploit this technology. As Dorothy Senez puts it, “if we can’t make it work, who can?”

Unfortunately, for interested parties outside the Union’s institutions, the Commission’s Systran, for the foreseeable future, is likely to remain intriguing but out of reach. In accordance with a deep-seated neo-liberal economic philosophy, access to the Commission’s MT system is restricted to users within the EU institutions, so as not to distort the competitiveness of the free market.

In any event, Systran, which originated from research at Georgetown University (Washington, DC) in the 1950s, looks poised to enjoy a rosy future well into the 21st century. If language is the soul of a culture, then the soul of a unified but multicultural Europe lies in its multilinguality. Systran is an imperfect yet admirable acknowledgement of this.
