Seminar: Martin Volk, Stockholm University

Details

Title: Cross-Language Information Retrieval, Parallel Treebanks and Machine Translation
Speaker: Martin Volk, Stockholm University

Locations and times

This seminar will take place at the following locations and dates:
  • Sydney
    Date: 15 January 2007 at 11am
    Location: Building E6A, room 357, Macquarie University, North Ryde, Sydney
    Contact: Rolf Schwitter, rolfs@ics.mq.edu.au
  • Brisbane
    Date: 22 January 2007 at 2pm
    Location: Queensland University of Technology, Gardens Point Campus, Room S524
    Contact: James Hogan, j.hogan@qut.edu.au
  • Melbourne
    Date: 29 January 2007, 2pm-3:30pm
    Location: University of Melbourne, Alan Gilbert Theatre 1
    Contact: Tim Baldwin, tim@csse.unimelb.edu.au

Summary

This presentation deals with different aspects of multilingual language technology. We start by summarizing our work in a project on Cross-Language Information Retrieval in the Medical Domain. In this project we have evaluated different means of bridging the gap between German queries and English documents and vice versa. We worked with a parallel collection of medical abstracts in the two languages.

The combined research on such parallel corpora and on treebanks has recently led to parallel treebanks. A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated documents. In addition, the syntax trees of two corresponding sentences are aligned on a sub-sentential level (word and phrase level). Parallel treebanks can be used as training or evaluation corpora for word and phrase alignment, as input for example-based machine translation, as training corpora for transfer rules, or for translation studies.
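To make the structure concrete, the following is a minimal sketch of how a parallel treebank entry might be represented: two syntax trees plus a list of links between node identifiers. The class name `Node`, the function `aligned_pairs`, the node identifiers, the tag labels and the example sentence pair are all assumptions made for illustration; they are not the format or tools used in the project described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a syntax tree (hypothetical representation)."""
    node_id: str
    label: str                      # phrase category or part-of-speech tag
    word: Optional[str] = None      # set only on leaf (word) nodes
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        """Return the words covered by this node, in order."""
        if self.word is not None:
            return self.word
        return " ".join(child.text() for child in self.children)

    def index(self) -> Dict[str, "Node"]:
        """Map every node id in this subtree to its node."""
        nodes = {self.node_id: self}
        for child in self.children:
            nodes.update(child.index())
        return nodes


def aligned_pairs(source: Node, target: Node,
                  links: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Resolve (source_id, target_id) links to the text spans they connect."""
    src_nodes, tgt_nodes = source.index(), target.index()
    return [(src_nodes[s].text(), tgt_nodes[t].text()) for s, t in links]


# Invented English-German sentence pair with simplified annotation.
english = Node("e0", "S", children=[
    Node("e1", "NP", children=[Node("e2", "DT", "the"), Node("e3", "NN", "bank")]),
    Node("e4", "VP", children=[Node("e5", "VBD", "raised"), Node("e6", "NNS", "rates")]),
])
german = Node("g0", "S", children=[
    Node("g1", "NP", children=[Node("g2", "ART", "die"), Node("g3", "NN", "Bank")]),
    Node("g4", "VVFIN", "erhöhte"),
    Node("g5", "NP", children=[Node("g6", "ART", "die"), Node("g7", "NN", "Zinsen")]),
])

# Links may connect phrases (e1-g1), single words (e3-g3), or a word to a phrase (e6-g5).
links = [("e1", "g1"), ("e3", "g3"), ("e5", "g4"), ("e6", "g5")]
pairs = aligned_pairs(english, german, links)
```

Because the links refer to node identifiers rather than word positions, the same list can hold both word-level and phrase-level correspondences, which is what distinguishes such a resource from a sentence-aligned parallel corpus.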

We are developing a German-English-Swedish parallel treebank, with texts from financial documents and from a novel. We will report on our methods and tools for building the monolingual treebanks in the three languages and for aligning the corresponding units on the word and phrase level.

In a related project the Computational Linguistics Group at Stockholm University has joined forces with a leading subtitling company in building a system for the automatic translation of film subtitles from Swedish to Danish. The company has provided a wealth of already translated subtitles, and our group builds a translation system to re-use and re-assemble the previous translations at various levels of granularity.

A first prototype has been built and produces good results. The output will be checked by a professional translator, but it is expected that at least a third of the automatically translated subtitles need not be touched. We will report on experiences with handling the large parallel corpus and the current status of the project.

Bio

Martin Volk received his PhD from the University of Koblenz (Germany) in 1994. He subsequently worked in Switzerland at the University of Zurich, the Zurich University of Applied Sciences, and Eurospider Information Technology AG. Since 2003 he has been a professor of Computational Linguistics at Stockholm University (Sweden). His main research interests are multilingual corpus annotation, cross-language information retrieval and machine translation.

Materials