This sidebar originally appeared in the Jul-Aug 1992 issue of Language Industry Monitor.
In honor of Spain’s quincentennial, scholars and computer scientists are using new technologies to extend the life of old Spanish texts. “Throughout her history, Spain has lost magnificent libraries, such as that of Ferdinand Columbus (the son of Christopher), which today would be the best library in the world for the study of the medieval and early modern literature of the Romance languages had it not been gradually destroyed by neglect,” write Professor Francisco Marcos-Marín and colleagues in the preface to a general description of ADMYTE. “Preservation, however, is not enough. We must also make available to the world at large that which has been preserved. Unfortunately, these two goals are incompatible: to make books available is to ensure their destruction. Modern technology, however, has given us the means to solve both these problems: to preserve the past, yet make it available.”
To achieve this noble aim, the ADMYTE project plans to publish the entire corpus of Medieval Spanish literature on CD-ROM, together with research tools for studying these works. ADMYTE is being coordinated by the Industrias de la Lengua section of the Sociedad Estatal Quinto Centenario, a commission charged with the dissemination of Hispanic culture throughout the world in this quincentennial year. The main participants in ADMYTE are the Biblioteca Nacional de España (Madrid), which has the finest collection of Spanish mss in the world, and the Spanish software house Micronet, which specializes in CD-ROM publishing and storage and retrieval techniques. Assisting them will be linguists and computer scientists from the Universidad Autónoma de Madrid, which houses the Spanish Eurotra group and the IBM Scientific Centre, the Universidad Complutense de Madrid, and the Universities of Berkeley, Wisconsin, and Toronto.
The first CD-ROM will include a bibliography of Old Spanish texts, a dictionary of lemmata and forms, texts-mad, a collection of transcribed Medieval texts, and diverse programs for processing and studying these texts. Subsequent CD-ROMs will contain transcribed and tagged (TEI-compatible) texts, such as Antonio de Nebrija’s Grammar and Dictionaries and Columbus’s navigation letters; facsimile images of documents; and Clarity-cd, Micronet’s retrieval program for texts and high-resolution images. Future plans include the digitization of the National Library’s complete collection of mss and incunabula, using advanced image-processing techniques to improve legibility where required, and automatic transcription using optical character recognition techniques.
Marcos-Marín et al. write: “The techniques we are developing for the digitization of mss and the automatic transcription of incunabula can be applied not only to the language and literature of the Medieval Kingdom of Castile and Leon and its modern descendants, but also to languages from any corner of the world. These techniques will be placed at the disposal of scholars of any language, country, or historical period.”
Under the auspices of the Sociedad Quinto Centenario, Marcos-Marín has also prepared a million-word corpus of spoken Spanish.
Industrias de la Lengua
Aravaca 22 bis, Madrid 28040, Spain
Fax +34 1 535 0129
(See article that this sidebar accompanied)