Amponsah-Kaakyire, Kwabena; Pylypenko, Daria; España-Bonet, Cristina; van Genabith, Josef

Do not Rely on Relay Translations: Multilingual Parallel Direct Europarl

Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21), International Committee on Computational Linguistics, pp. 1-7, Iceland (Online), 2021.

Translationese data is a scarce and valuable resource. Traditionally, the proceedings of the European Parliament have been used for studying translationese phenomena since their metadata allows to distinguish between original and translated texts. However, translations are not always direct and we hypothesise that a pivot (also called ”relay”) language might alter the conclusions on translationese effects. In this work, we (i) isolate translations that have been done without an intermediate language in the Europarl proceedings from those that might have used a pivot language, and (ii) build comparable and parallel corpora with data aligned across multiple languages that therefore can be used for both machine translation and translation studies.