LFG Proceedings
CSLI Publications

Robust PCFG-Based Generation using Automatically Acquired LFG Approximations

Aoife Cahill and Josef van Genabith

Abstract

Wide-coverage grammars automatically extracted from treebanks are a cornerstone technology in state-of-the-art probabilistic parsing. They achieve robustness and coverage at a fraction of the development cost of hand-crafted grammars. Surprisingly, to date such grammars rarely figure in the complementary operation to parsing: natural language surface realisation. Bangalore et al. (2001) investigate the effect of training size on performance when using grammars automatically extracted from the Penn-II Treebank for generation. Using an automatically extracted XTAG grammar, they achieve a string accuracy of 0.749 on their test set. Nakanishi et al. (2005) present probabilistic models for a chart generator using an HPSG grammar acquired from the Penn-II Treebank (the Enju HPSG). They investigate discriminative disambiguation models following Velldal and Oepen (2005); their best model achieves coverage of 90.56% and a BLEU score of 0.7723 on Penn-II WSJ Section 23 sentences of length ≤ 20. We present a new architecture for stochastic LFG surface realisation using the automatically annotated treebanks and extracted PCFG-based LFG approximations of Cahill et al. (2004). Our model maximises the probability of a tree given an f-structure, which supports a simple and efficient implementation that scales to wide-coverage treebank-based resources. For sentences of length ≤ 20, we achieve coverage of 95.26%, a BLEU score of 0.7227 and a string accuracy of 0.7476 against the raw Section 23 text. For sentences of all lengths, we achieve coverage of 89.49%, a BLEU score of 0.6979 and a string accuracy of 0.7012. Our method is robust: it can cope with noise in the f-structure input to generation and will attempt to produce partial output rather than fail.
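The core modelling idea, choosing the tree that maximises P(tree | f-structure), can be illustrated with a toy sketch. For illustration only, it assumes that each rule's probability is conditioned on its left-hand-side category and on the set of attribute names in the f-structure that node maps to, and that these probabilities are relative frequencies counted from an f-structure-annotated treebank; the abstract itself does not specify the factorisation. Every name here (`estimate`, `generate`, `RULES`, `LEXICON`, the toy counts) is hypothetical. Unlike the method described above, the sketch has no robustness mechanism: it returns no words and probability 0.0 when no rule or lexical entry applies.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical rule occurrences read off an f-structure-annotated treebank:
# (LHS category, attribute names of the LHS's f-structure, RHS).
# Each RHS daughter is (category, function): function None means the daughter
# realises the same f-structure as its mother (head, up = down); a string such
# as "SUBJ" means it realises the value of that attribute.
FEATS = frozenset({"PRED", "SUBJ", "TENSE"})
OBSERVED = (
    [("S", FEATS, (("NP", "SUBJ"), ("VP", None)))] * 9
    + [("S", FEATS, (("VP", None), ("NP", "SUBJ")))]
    + [("VP", FEATS, (("V", None),))] * 10
    + [("NP", frozenset({"PRED"}), (("N", None),))] * 10
)

# Hypothetical lexicon: (preterminal, PRED value, TENSE value or None) -> word.
LEXICON = {("V", "sleep", "past"): "slept", ("N", "John", None): "John"}
PRETERMINALS = {"V", "N"}


def estimate(observations):
    """Relative-frequency estimate of P(RHS | LHS, attribute set)."""
    joint = Counter(observations)
    context = Counter((lhs, feats) for lhs, feats, _ in observations)
    rules: Dict[tuple, List[tuple]] = defaultdict(list)
    for (lhs, feats, rhs), n in joint.items():
        rules[(lhs, feats)].append((rhs, n / context[(lhs, feats)]))
    return rules


RULES = estimate(OBSERVED)


def generate(cat: str, fs: dict) -> Tuple[List[str], float]:
    """Most probable word string for category `cat` realising f-structure `fs`."""
    if cat in PRETERMINALS:
        word = LEXICON.get((cat, fs["PRED"], fs.get("TENSE")))
        return ([word], 1.0) if word else ([], 0.0)
    best: Tuple[List[str], float] = ([], 0.0)
    # Condition on the category and the attribute names of the local f-structure.
    for rhs, p_rule in RULES.get((cat, frozenset(fs)), []):
        words: List[str] = []
        prob = p_rule
        for child_cat, function in rhs:
            child_fs = fs if function is None else fs[function]
            child_words, child_prob = generate(child_cat, child_fs)
            words += child_words
            prob *= child_prob
        if prob > best[1]:
            best = (words, prob)
    return best


words, prob = generate("S", {"PRED": "sleep", "TENSE": "past", "SUBJ": {"PRED": "John"}})
print(" ".join(words), prob)  # John slept 0.9
```

Because each daughter's best realisation depends only on its own category and f-structure under this toy model, maximising daughter by daughter yields the exact most probable tree; the f-structure fixes which substructure each daughter realises, so the search only chooses among rule expansions and word orders.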
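The abstract reports string accuracy without defining it. A common definition in generation work is simple string accuracy (Bangalore et al. 2000), 1 − (I + D + S)/R, where I, D and S are the token insertions, deletions and substitutions needed to turn the generated string into the reference and R is the reference length. The sketch below assumes that definition; the function names are hypothetical, and the abstract does not say which variant of the metric was used.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Token-level Levenshtein distance: minimum insertions + deletions + substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # drop hypothesis token h
                cur[j - 1] + 1,          # insert reference token r
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def simple_string_accuracy(hyp: List[str], ref: List[str]) -> float:
    """1 - (I + D + S) / R, with R the number of reference tokens."""
    return 1.0 - edit_distance(hyp, ref) / len(ref)


ref = "the cat sat on the mat".split()
print(simple_string_accuracy("the cat sat the mat".split(), ref))     # one missing token: 1 - 1/6
print(simple_string_accuracy("cat the sat on the mat".split(), ref))  # swapped pair: 1 - 2/6
```

A swapped word pair costs two substitutions under this metric, and the score can fall below zero when the output is much longer than the reference.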

pubs @ csli.stanford.edu 
CSLI Publications
Stanford University
Cordura Hall
210 Panama Street
Stanford, CA 94305-4101
(650) 723-1839