A Spanish-Catalan Translator Using Statistical Methods

Jesús Tomás, Francisco Casacuberta

This paper describes the development of a Spanish-Catalan statistical machine translation system. The approach is purely inductive and uses no linguistic knowledge. To obtain the translator we follow these steps. First, we obtain a bilingual corpus from the Internet. Second, we segment the corpus into units (sentences and tokens). Third, we align the sentences of the two languages. Then, we use the aligned corpus to train statistical models. Finally, we use these models to translate: given a source sentence, we search for the most probable target sentence. We have compared our translator with the most widely used Spanish-Catalan translators and obtained translation results similar to those of the commercial systems. It is accessible at
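The steps above can be illustrated with a toy sketch. The abstract does not say which statistical models the system uses, so this sketch substitutes a simple word-based model (IBM Model 1 trained with EM, without a NULL word) purely for illustration. The four-sentence corpus, the function names (tokenize, train_model1, translate) and the word-by-word decoder are all assumptions, not the authors' implementation. Sentence alignment is assumed to be already done, one source sentence per target sentence.

```python
from collections import defaultdict

# Hypothetical, already sentence-aligned Spanish-Catalan toy corpus.
corpus = [
    ("el gato", "el gat"),
    ("el perro", "el gos"),
    ("un gato", "un gat"),
    ("un perro", "un gos"),
]

def tokenize(sentence):
    """Naive tokenizer: lowercase and split on whitespace."""
    return sentence.lower().split()

def train_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(tgt | src) with EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform initialisation over the target vocabulary.
    prob = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (tgt, src) pairs
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for t in tgt:
                # E-step: share each target word among the source words.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalise the expected counts.
        prob = {(t, s): c / total[s] for (t, s), c in count.items()}
    return prob

def translate(sentence, prob, tgt_vocab):
    """Word-by-word decoding: most probable target word per source word."""
    output = []
    for s in tokenize(sentence):
        best = max(tgt_vocab, key=lambda t: prob.get((t, s), 0.0))
        # Unknown source words are copied unchanged.
        output.append(best if prob.get((best, s), 0.0) > 0 else s)
    return " ".join(output)

pairs = [(tokenize(es), tokenize(ca)) for es, ca in corpus]
model = train_model1(pairs)
tgt_vocab = sorted({w for _, ca in pairs for w in ca})

print(translate("un perro", model, tgt_vocab))
```

A real system would search over whole target sentences rather than translate each word independently, and would combine the translation model with a target language model. The sketch only shows how an aligned corpus alone, with no linguistic rules, can yield a usable translation table.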

Keywords: machine translation, statistical pattern recognition, human language technology

Citation: Jesús Tomás, Francisco Casacuberta: A Spanish-Catalan Translator Using Statistical Methods. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 1099-1100.

