Automatic discovery of translation collocations from bilingual corpora

Sergio Barrachina, Juan Miguel Vilar

We describe a method to automatically discover translation collocations from a bilingual corpus and show how these collocations improve a machine translation system. Collocation inference is iterative: an alignment is used to derive an initial set of collocations; these are used in turn to improve the alignment, and the new alignment is used to generate new collocations. This process is repeated until no more collocations are found. The final alignment and the set of collocations are then used to train a translation model. We use a model that is based on finite-state transducers and word clusters and that has been modified to work with collocations in addition to single words. We present experiments showing that automatically discovered collocations improve translation quality without prior linguistic information.
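The iterative scheme summarized above can be read as a fixed-point loop. The following Python sketch is only an illustration of that control flow, not the authors' implementation: the function name `discover_collocations`, the callables `align` and `extract`, the `Collocation` type, and the `max_iters` safety cap are all assumptions introduced here, and the abstract does not say how alignment or extraction are actually performed.

```python
from typing import Any, Callable, List, Set, Tuple

# Hypothetical representation: a collocation pairs a source-language word
# sequence with a target-language word sequence.
Collocation = Tuple[Tuple[str, ...], Tuple[str, ...]]
SentencePair = Tuple[List[str], List[str]]


def discover_collocations(
    corpus: List[SentencePair],
    align: Callable[[List[SentencePair], Set[Collocation]], Any],
    extract: Callable[[List[SentencePair], Any], Set[Collocation]],
    max_iters: int = 50,  # assumed safety cap, not taken from the paper
) -> Tuple[Any, Set[Collocation]]:
    """Alternate between aligning and extracting until nothing new appears."""
    collocations: Set[Collocation] = set()
    # Initial alignment, computed before any collocation is known.
    alignment = align(corpus, collocations)
    for _ in range(max_iters):
        # Derive candidates from the current alignment; keep only unseen ones.
        new = extract(corpus, alignment) - collocations
        if not new:
            break  # stopping criterion: no more collocations are found
        collocations |= new
        # Re-align the corpus using the enlarged collocation set.
        alignment = align(corpus, collocations)
    return alignment, collocations
```

The returned alignment and collocation set correspond to the inputs that, according to the abstract, are then used to train the translation model. Any concrete `align` and `extract` would have to be supplied by the caller.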

Keywords: collocations, bilingual corpus, machine translation

Citation: Sergio Barrachina, Juan Miguel Vilar: Automatic discovery of translation collocations from bilingual corpora. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 571-575.

