LFG Proceedings
CSLI Publications

Parallel LFG Grammars on Parallel Corpora: A Base for Practical Triangulation

Gerlof Bouma, Jonas Kuhn, Bettina Schrader, and Kathrin Spreyer

Abstract

This paper presents an approach to annotation projection in a multi-parallel corpus, that is, a collection of translated texts in more than two languages. Existing analysis tools, such as the LFG grammars from the ParGram project, are applied to two of the languages in the corpus, and the resulting annotation is projected to a third language, taking advantage of the largely parallel character of f-structures across languages. The third language can be a low-resource language. The technique can thus be particularly beneficial for corpus-based (cross-)linguistic research.
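To make the projection idea concrete, here is a minimal sketch of multi-source direct projection. It assumes one-to-one word alignments from each source language into the target and token-level grammatical function labels (e.g. `SUBJ`, `OBJ`) read off the source f-structures. The function names, the data layout, and the agreement filter (a target token keeps a label only if every source projects the same label onto it) are illustrative assumptions, not the method or code of the paper.

```python
from typing import Dict, List, Tuple

# Source annotation: source token index -> grammatical function label.
Annotation = Dict[int, str]
# Word alignment: source token index -> target token index (1-to-1 assumed).
Alignment = Dict[int, int]


def project(annotation: Annotation, alignment: Alignment) -> Dict[int, str]:
    """Direct projection: copy each source label across its alignment link."""
    projected: Dict[int, str] = {}
    for src_idx, label in annotation.items():
        tgt_idx = alignment.get(src_idx)
        if tgt_idx is not None:  # unaligned source tokens project nothing
            projected[tgt_idx] = label
    return projected


def multi_source_project(sources: List[Tuple[Annotation, Alignment]]) -> Dict[int, str]:
    """Keep only target labels on which all source languages agree."""
    projections = [project(ann, ali) for ann, ali in sources]
    if not projections:
        return {}
    first = projections[0]
    # Target tokens that received a label from every source.
    common = set(first)
    for p in projections[1:]:
        common &= set(p)
    return {i: first[i] for i in sorted(common)
            if all(p[i] == first[i] for p in projections)}


if __name__ == "__main__":
    # Hypothetical example, target Dutch "De hond zag de kat".
    # German "Der Hund sah die Katze": head nouns labelled, alignment monotone.
    de = ({1: "SUBJ", 4: "OBJ"}, {0: 0, 1: 1, 2: 2, 3: 3, 4: 4})
    # English "The dog saw the cat": the object noun is misaligned to "de".
    en = ({1: "SUBJ", 4: "OBJ"}, {0: 0, 1: 1, 2: 2, 3: 3, 4: 3})
    print(multi_source_project([de, en]))  # {1: 'SUBJ'}
```

Because the English link sends `OBJ` to the Dutch determiner, the two sources disagree on the object and only `SUBJ` survives; requiring agreement between sources trades recall for precision.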

We discuss a number of ways to realize automatic corpus annotation based on multi-source projection, including direct projection and approaches with an additional generalization step that employs machine learning techniques. We present a series of detailed experiments for a sample annotation task, verb argument identification, using the German and English ParGram grammars for projection to Dutch and maximum entropy models for the generalization step.
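The generalization step can be illustrated with a binary maximum entropy classifier, which for two classes is equivalent to logistic regression: it is trained on target tokens that received a label through projection and can then score tokens on which projection was silent. This is a minimal sketch assuming hypothetical binary features (`pos=NOUN`, `left_of_verb`, and so on), plain batch gradient ascent with an L2 penalty, and toy training data; it is not the feature set, training procedure, or toolkit used in the reported experiments.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One instance: the active binary features of a target-language token and its
# projected label (1 = verb argument, 0 = not an argument).
Example = Tuple[Set[str], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(weights: Dict[str, float], feats: Set[str]) -> float:
    """P(argument | features) under a binary maximum entropy model."""
    return sigmoid(sum(weights.get(f, 0.0) for f in feats))


def train_maxent(data: List[Example], epochs: int = 200, lr: float = 0.5,
                 l2: float = 0.01) -> Dict[str, float]:
    """Fit feature weights by batch gradient ascent on the
    L2-penalized conditional log-likelihood."""
    weights: Dict[str, float] = defaultdict(float)
    n = len(data)
    for _ in range(epochs):
        grad: Dict[str, float] = defaultdict(float)
        for feats, y in data:
            p = prob(weights, feats)
            for f in feats:
                # Observed minus expected feature value for this instance.
                grad[f] += y - p
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] / n - l2 * weights[f])
    return dict(weights)


# Hypothetical Dutch tokens that received a label via projection.
TRAINING: List[Example] = [
    ({"BIAS", "pos=NOUN", "left_of_verb"}, 1),
    ({"BIAS", "pos=NOUN", "right_of_verb"}, 1),
    ({"BIAS", "pos=DET", "left_of_verb"}, 0),
    ({"BIAS", "pos=DET", "right_of_verb"}, 0),
    ({"BIAS", "pos=VERB"}, 0),
]

if __name__ == "__main__":
    w = train_maxent(TRAINING)
    # A noun whose alignment link was missing can still be scored.
    print(round(prob(w, {"BIAS", "pos=NOUN", "right_of_verb"}), 2))
```

Training only on projected instances and then scoring every token lets the classifier assign labels where direct projection yields nothing, for instance because an alignment link is missing; the L2 penalty keeps weights finite even when the projected training data are perfectly separable.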

pubs @ csli.stanford.edu 
CSLI Publications
Stanford University
Cordura Hall
210 Panama Street
Stanford, CA 94305-4101
(650) 723-1839