Europe's Language Technology Push

This article was published in the summer of 1999 in the EU-sponsored e-zine.

Talking robots, thinking machines, computers that listen: such is the stuff from which many a science fiction tale is spun. In reality, language processing in one form or another has been a pursuit in the scientific world nearly as long as computers have existed. Spurred on by successes in decoding secret messages in WWII, researchers turned to “decoding” human language with the computer but found that it was far more difficult than they had anticipated. Nonetheless, their efforts laid the foundation for the field of language technology, also referred to as language engineering, whose applications in day-to-day life are becoming increasingly common.

Language technology encompasses a number of interrelated disciplines that share the primary goal of processing written and spoken human language. Language processing ranges from such familiar technologies as word processing and document production to such ambitious pursuits as automatic translation, speech recognition, and handwriting recognition. These technologies vary in complexity and are therefore at different stages of commercial exploitation.

Language technology is a field with tremendous potential, both economic and social, but substantial obstacles still prevent it from becoming more widely used. It may be decades before we have the listening and talking computers popularized in science fiction; moreover, it may never be possible to build a computer that can “understand” human language perfectly. Nevertheless, language technology, even in its inevitably imperfect state, is gradually being deployed in a growing number of applications.
However, as Brian Oakley, former director of Logica, notes, “it takes great patience to achieve commercial success with language technology, but not all companies are willing to take the long-term view.”

In Europe, with the emergence of the European Union and the resulting growth of intra-Community trade and commerce, both national governments and the European Commission have felt the need to develop language policies on various levels to cope with issues related to Europe’s multicultural, multilingual society. For reasons of social cohesiveness, Europe cannot allow one or two languages to prevail over the rest. This requires careful preservation of national, ethnic, or regional identity, of which language is probably the most crucial and complex element. As Edinburgh University’s John Laver points out, “language is the most central articulation of a culture.” In other words, discriminate against a language and you discriminate against the people who speak it. Europeans need look no further than the Balkan tragedy for a grim reminder of the dangers of ethnic isolationism and divisiveness.

The twelve official languages of the EU (the eleven working languages plus Gaelic) are just part of the equation. There are another ten or so indigenous languages in Europe, such as Welsh, Basque, Catalan, Galician, and Frisian, which deserve commensurate support for social reasons. Lastly, there are the languages spoken by Europe’s immigrant population, notably Arabic and Turkish. Moreover, the countries of Eastern Europe and the former Soviet Union are clamouring to join, and this will mean growth in Europe’s already substantial translation requirements that is not linear but much steeper, since each new language must be translated to and from every existing one. From the beginning of the next century, the already unwieldy number of eleven official languages of the European Community may rise to twelve, fifteen, and maybe even twenty over the course of the next few decades as the Commission admits new members.
Currently, the eleven working languages of the Community represent 110 language pairs. By the middle of the next century, there could be as many as 29 languages within the Community, representing a mind-boggling 812 language pairs.

The European Commission is itself probably the largest user of language technology in Europe. Many working documents within the Commission are routinely circulated in English, French, and German, while all official documents eventually have to be translated into all eleven working languages of the Commission. To service its substantial internal translation requirements, the Commission acquired rights to the Systran machine translation system in 1975, and it has continued to work on the system, developing a number of new language pairs. Systran is now widely used within the Commission.

In the late 1970s and 1980s, the Commission funded the well-known Eurotra program, an ambitious project in which researchers from all twelve member states built a prototype machine translation system for the (then) nine working languages of the Union. This project created an important foundation for further work in this young field. Within the ESPRIT research program, the Commission also funded a number of speech and written language projects; these have helped disseminate knowledge and expertise throughout the Union (see sidebar on ELSNET). More recently, the Commission funded the Linguistic Research and Engineering (LRE) program.

In short, Europeans attach great importance to the culture (and cultivation) of multilinguality, both in technology and in the social sphere. Language processing is seen as one of the cornerstones of a broad range of applications in the domain of what the Commission likes to refer to as “telematics,” the exciting confluence of telecommunications and information technology within the rapidly evolving global society.
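The language-pair figures quoted earlier in this section (110 pairs for eleven languages, 812 for 29) are consistent with counting each translation direction separately, so that n languages yield n × (n − 1) pairs. The short Python sketch below reproduces that arithmetic; the function name `translation_pairs` is purely illustrative, and the directed-pair interpretation is an assumption inferred from the numbers rather than something the article states.

```python
def translation_pairs(n_languages: int) -> int:
    """Count directed (source, target) pairs among n_languages languages.

    Assumption: translating A -> B and B -> A are counted as two pairs.
    """
    return n_languages * (n_languages - 1)


# The two language counts mentioned in the article.
for n in (11, 29):
    print(f"{n} languages -> {translation_pairs(n)} language pairs")
```

Under this counting, each newly admitted language adds two pairs for every language already in use, which is why the total climbs so much faster than the number of languages.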
ELSNET: Building Bridges in Europe

The European Language and Speech Network (ELSNET) is a good example of EU efforts on behalf of language technology. The network is composed of one hundred academic and industrial organizations from throughout Europe which are engaged in various facets of language technology, from basic research to commercial exploitation. ELSNET organizes workshops and tutorials and publishes an informative newsletter on the activities of its members.

ELSNET, which is funded within the framework of the EU ESPRIT program, encompasses both text and spoken language processing, and indeed one of ELSNET’s goals has been to serve as a bridge between these two somewhat distinct communities. Speech recognition is largely engineering-driven, while text processing straddles linguistics and other disciplines. As ELSNET’s director Steven Krauwer, of the University of Utrecht, observes, “the text community has tended to regard the speech community as ‘just’ a bunch of engineers, while the speech people tend to regard the text community as overly occupied with theoretical matters.”

Another goal of ELSNET has been to bring industry and academia closer together. Because language technology requires a relatively high initial investment and has a long payoff period, commercial take-up of language technology from the research centers of Europe has been slow. Where there is a large potential market, such as for English-language products and services, commercial viability is gradually emerging, and to a lesser degree the same is true for French and German. But for the other official languages of the European Union, not to mention indigenous languages like Gaelic and Catalan or languages like Turkish or Arabic spoken by many people living in Europe, the situation is more problematic.

A third sphere of ELSNET’s activities is the geographical dimension, that is, the dispersion of knowledge throughout the various member states of the European Union.
ELSNET has members, both academic and industrial, from all the countries of the EU. Thanks to ELSNET and other initiatives of the European Commission, language technology has gotten its foot in the door in countries, in particular the southern European ones, where there had hitherto been no activity in this field. It is a difficult achievement to measure directly, but it is nonetheless clear that a new generation of computational linguists and speech researchers has grown up accustomed to meeting and collaborating with fellow researchers in other Union countries.

More recently, as part of a separate project in the framework of the EU Copernicus program, ELSNET has also been extending its activities in the form of ELSNET Goes East, an effort to cultivate knowledge of and interaction with the still nascent text and speech communities in the countries of Eastern Europe and the former Soviet Union. The ultimate goal is a pan-European network of speech and language.

ELSNET has an extensive World Wide Web site with links to its many members: http://www.cogsci.ed.ac.uk/elsnet/home.html