Publications

Kunilovskaya, Maria; Przybyl, Heike; Teich, Elke; Lapshinova-Koltunski, Ekaterina

Simultaneous Interpreting as a Noisy Channel: How Much Information Gets Through Inproceedings

Proceedings of Recent Advances in Natural Language Processing, pp. 604–614, 2023.

We explore the relationship between information density/surprisal of source and target texts in translation and interpreting in the language pair English-German, looking at the specific properties of translation (“translationese”). Our data comes from two bidirectional English-German subcorpora representing written and spoken mediation modes collected from European Parliament proceedings. Within each language, we (a) compare original speeches to their translated or interpreted counterparts, and (b) explore the association between segment-aligned sources and targets in each translation direction. As additional variables, we consider source delivery mode (read-out, impromptu) and source speech rate in interpreting. We use language modelling to measure the information rendered by words in a segment and to characterise the cross-lingual transfer of information under various conditions. Our approach is based on statistical analyses of surprisal values, extracted from n-gram models of our dataset. The analysis reveals that while there is a considerable positive correlation between the average surprisal of source and target segments in both modes, information output in interpreting is lower than in translation, given the same amount of input. Significantly lower information density in spoken mediated production compared to nonmediated speech in the same language can indicate a possible simplification effect in interpreting.

@inproceedings{kunilovskaya-etal-2023,
title = {Simultaneous Interpreting as a Noisy Channel: How Much Information Gets Through},
author = {Maria Kunilovskaya and Heike Przybyl and Elke Teich and Ekaterina Lapshinova-Koltunski},
url = {https://acl-bg.org/proceedings/2023/RANLP%202023/RANLP2023-draft-proceedings.pdf},
doi = {10.26615/978-954-452-092-2_066},
year = {2023},
date = {2023},
booktitle = {Proceedings of Recent Advances in Natural Language Processing},
pages = {604--614},
abstract = {We explore the relationship between information density/surprisal of source and target texts in translation and interpreting in the language pair English-German, looking at the specific properties of translation (“translationese”). Our data comes from two bidirectional English-German subcorpora representing written and spoken mediation modes collected from European Parliament proceedings. Within each language, we (a) compare original speeches to their translated or interpreted counterparts, and (b) explore the association between segment-aligned sources and targets in each translation direction. As additional variables, we consider source delivery mode (read-out, impromptu) and source speech rate in interpreting. We use language modelling to measure the information rendered by words in a segment and to characterise the cross-lingual transfer of information under various conditions. Our approach is based on statistical analyses of surprisal values, extracted from n-gram models of our dataset. The analysis reveals that while there is a considerable positive correlation between the average surprisal of source and target segments in both modes, information output in interpreting is lower than in translation, given the same amount of input. Significantly lower information density in spoken mediated production compared to nonmediated speech in the same language can indicate a possible simplification effect in interpreting.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Yung, Frances Pik Yu; Scholman, Merel; Lapshinova-Koltunski, Ekaterina; Pollkläsener, Christina; Demberg, Vera

Investigating Explicitation of Discourse Connectives in Translation Using Automatic Annotations Inproceedings

Stoyanchev, Svetlana; Joty, Shafiq; Schlangen, David; Dusek, Ondrej; Kennington, Casey; Alikhani, Malihe (Ed.): Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), Association for Computational Linguistics, pp. 21-30, Prague, Czechia, 2023.

Discourse relations have different patterns of marking across different languages. As a result, discourse connectives are often added, omitted, or rephrased in translation. Prior work has shown a tendency for explicitation of discourse connectives, but such work was conducted using restricted sample sizes due to difficulty of connective identification and alignment. The current study exploits automatic methods to facilitate a large-scale study of connectives in English and German parallel texts. Our results based on over 300 types and 18000 instances of aligned connectives and an empirical approach to compare the cross-lingual specificity gap provide strong evidence of the Explicitation Hypothesis. We conclude that discourse relations are indeed more explicit in translation than texts written originally in the same language. Automatic annotations allow us to carry out translation studies of discourse relations on a large scale. Our methodology using relative entropy to study the specificity of connectives also provides more fine-grained insights into translation patterns.

@inproceedings{yung-etal-2023-investigating,
title = {Investigating Explicitation of Discourse Connectives in Translation Using Automatic Annotations},
author = {Frances Pik Yu Yung and Merel Scholman and Ekaterina Lapshinova-Koltunski and Christina Pollkl{\"a}sener and Vera Demberg},
editor = {Svetlana Stoyanchev and Shafiq Joty and David Schlangen and Ondrej Dusek and Casey Kennington and Malihe Alikhani},
url = {https://aclanthology.org/2023.sigdial-1.2},
doi = {10.18653/v1/2023.sigdial-1.2},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)},
pages = {21-30},
publisher = {Association for Computational Linguistics},
address = {Prague, Czechia},
abstract = {Discourse relations have different patterns of marking across different languages. As a result, discourse connectives are often added, omitted, or rephrased in translation. Prior work has shown a tendency for explicitation of discourse connectives, but such work was conducted using restricted sample sizes due to difficulty of connective identification and alignment. The current study exploits automatic methods to facilitate a large-scale study of connectives in English and German parallel texts. Our results based on over 300 types and 18000 instances of aligned connectives and an empirical approach to compare the cross-lingual specificity gap provide strong evidence of the Explicitation Hypothesis. We conclude that discourse relations are indeed more explicit in translation than texts written originally in the same language. Automatic annotations allow us to carry out translation studies of discourse relations on a large scale. Our methodology using relative entropy to study the specificity of connectives also provides more fine-grained insights into translation patterns.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B2 B7

Przybyl, Heike; Karakanta, Alina; Menzel, Katrin; Teich, Elke

Exploring linguistic variation in mediated discourse: translation vs. interpreting Book Chapter

Kajzer-Wietrzny, Marta; Bernardini, Silvia; Ferraresi, Adriano; Ivaska, Ilmari (Ed.): Mediated discourse at the European Parliament: Empirical investigations, Language Science Press, pp. 191–218, Berlin, 2022.

This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple word-based n-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.

@inbook{Przybyl2021exploring,
title = {Exploring linguistic variation in mediated discourse: translation vs. interpreting},
author = {Heike Przybyl and Alina Karakanta and Katrin Menzel and Elke Teich},
editor = {Marta Kajzer-Wietrzny and Silvia Bernardini and Adriano Ferraresi and Ilmari Ivaska},
url = {https://langsci-press.org/catalog/book/343},
doi = {10.5281/zenodo.6977050},
year = {2022},
date = {2022},
booktitle = {Mediated discourse at the European Parliament: Empirical investigations},
pages = {191--218},
publisher = {Language Science Press},
address = {Berlin},
abstract = {This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple word-based n-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Lapshinova-Koltunski, Ekaterina; Pollkläsener, Christina; Przybyl, Heike

Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora Journal Article

The Prague Bulletin of Mathematical Linguistics, 119, pp. 5-22, 2022, ISSN 0032-6585.

We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.

@article{lapshinova-koltunski-pollklaesener-przybyl:2022,
title = {Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora},
author = {Ekaterina Lapshinova-Koltunski and Christina Pollkl{\"a}sener and Heike Przybyl},
url = {https://ufal.mff.cuni.cz/pbml/119/art-lapshinova-koltunski-pollklaesener-przybyl.pdf},
doi = {10.14712/00326585.020},
year = {2022},
date = {2022},
journal = {The Prague Bulletin of Mathematical Linguistics},
pages = {5-22},
volume = {119},
abstract = {We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.},
pubstate = {published},
type = {article}
}

Project:   B7

Przybyl, Heike; Lapshinova-Koltunski, Ekaterina; Menzel, Katrin; Fischer, Stefan; Teich, Elke

EPIC UdS - Creation and applications of a simultaneous interpreting corpus Inproceedings

Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1193–1200, Marseille, France, 20-25 June 2022, 2022.

In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.

@inproceedings{Przybyl_interpreting_2022,
title = {EPIC UdS - Creation and applications of a simultaneous interpreting corpus},
author = {Heike Przybyl and Ekaterina Lapshinova-Koltunski and Katrin Menzel and Stefan Fischer and Elke Teich},
url = {https://aclanthology.org/2022.lrec-1.127/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)},
pages = {1193--1200},
address = {Marseille, France, 20-25 June 2022},
abstract = {In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age Proceeding

Bizzoni, Yuri; Teich, Elke; España-Bonet, Cristina; van Genabith, Josef (Ed.): Association for Computational Linguistics, online, 2021.

@proceedings{motra-2021-modelling,
title = {Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age},
editor = {Yuri Bizzoni and Elke Teich and Cristina Espa{\~n}a-Bonet and Josef van Genabith},
url = {https://aclanthology.org/2021.motra-1.0/},
year = {2021},
date = {2021},
publisher = {Association for Computational Linguistics},
address = {online},
pubstate = {published},
type = {proceeding}
}

Project:   B7

Karakanta, Alina; Przybyl, Heike; Teich, Elke

Exploring variation in translation with probabilistic language models Incollection

Lavid-López, Julia; Maíz-Arévalo, Carmen; Zamorano-Mansilla, Juan Rafael (Ed.): Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations, 158, Benjamins, pp. 308-323, Amsterdam, 2021.

While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.

@incollection{KarakantaEtAl2021,
title = {Exploring variation in translation with probabilistic language models},
author = {Alina Karakanta and Heike Przybyl and Elke Teich},
editor = {Julia Lavid-L{\'o}pez and Carmen Ma{\'i}z-Ar{\'e}valo and Juan Rafael Zamorano-Mansilla},
url = {https://doi.org/10.1075/btl.158.12kar},
doi = {10.1075/btl.158.12kar},
year = {2021},
date = {2021},
booktitle = {Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations},
pages = {308-323},
publisher = {Benjamins},
address = {Amsterdam},
abstract = {While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.},
pubstate = {published},
type = {incollection}
}

Project:   B7

Lapshinova-Koltunski, Ekaterina; Przybyl, Heike; Bizzoni, Yuri

Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces Inproceedings

Proceedings of the 2nd Workshop on Computational Approaches to Discourse (CODI), pp. 134-142, Punta Cana, Dominican Republic and Online, 2021.

In the present paper, we explore lexical contexts of discourse markers in translation and interpreting on the basis of word embeddings. Our special interest is on contextual variation of the same discourse markers in (written) translation vs. (simultaneous) interpreting. To explore this variation at the lexical level, we use a data-driven approach: we compare bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora. Our results show more variation of semantically related items in translation spaces vs. interpreting ones and a more consistent use of fewer connectives in interpreting. We also observe different trends with regard to the discourse relation types.

@inproceedings{LapshinovaEtAl2021codi,
title = {Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces},
author = {Ekaterina Lapshinova-Koltunski and Heike Przybyl and Yuri Bizzoni},
url = {https://aclanthology.org/2021.codi-main.13/},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 2nd Workshop on Computational Approaches to Discourse (CODI)},
pages = {134-142},
address = {Punta Cana, Dominican Republic and Online},
abstract = {In the present paper, we explore lexical contexts of discourse markers in translation and interpreting on the basis of word embeddings. Our special interest is on contextual variation of the same discourse markers in (written) translation vs. (simultaneous) interpreting. To explore this variation at the lexical level, we use a data-driven approach: we compare bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora. Our results show more variation of semantically related items in translation spaces vs. interpreting ones and a more consistent use of fewer connectives in interpreting. We also observe different trends with regard to the discourse relation types.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Bizzoni, Yuri; Lapshinova-Koltunski, Ekaterina

Measuring Translationese across Levels of Expertise: Are Professionals more Surprising than Students? Inproceedings

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), Linköping University Electronic Press, Sweden, pp. 53-63, 2021.

The present paper deals with a computational analysis of translationese in professional and student English-to-German translations belonging to different registers. Building upon an information-theoretical approach, we test translation conformity to source and target language in terms of a neural language model’s perplexity over Part of Speech (PoS) sequences. Our primary focus is on register diversification vs. convergence, reflected in the use of constructions eliciting a higher vs. lower perplexity score. Our results show that, against our expectations, professional translations elicit higher perplexity scores from a target language model than students’ translations. An analysis of the distribution of PoS patterns across registers shows that this apparent paradox is the effect of higher stylistic diversification and register sensitivity in professional translations. Our results contribute to the understanding of human translationese and shed light on the variation in texts generated by different translators, which is valuable for translation studies, multilingual language processing, and machine translation.

@inproceedings{Bizzoni2021,
title = {Measuring Translationese across Levels of Expertise: Are Professionals more Surprising than Students?},
author = {Yuri Bizzoni and Ekaterina Lapshinova-Koltunski},
url = {https://aclanthology.org/2021.nodalida-main.6},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
pages = {53-63},
publisher = {Link{\"o}ping University Electronic Press, Sweden},
abstract = {The present paper deals with a computational analysis of translationese in professional and student English-to-German translations belonging to different registers. Building upon an information-theoretical approach, we test translation conformity to source and target language in terms of a neural language model’s perplexity over Part of Speech (PoS) sequences. Our primary focus is on register diversification vs. convergence, reflected in the use of constructions eliciting a higher vs. lower perplexity score. Our results show that, against our expectations, professional translations elicit higher perplexity scores from a target language model than students’ translations. An analysis of the distribution of PoS patterns across registers shows that this apparent paradox is the effect of higher stylistic diversification and register sensitivity in professional translations. Our results contribute to the understanding of human translationese and shed light on the variation in texts generated by different translators, which is valuable for translation studies, multilingual language processing, and machine translation.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Menzel, Katrin; Przybyl, Heike; Lapshinova-Koltunski, Ekaterina

EPIC-UdS - ein mehrsprachiges Korpus als Grundlage für die korpusbasierte Dolmetsch- und Übersetzungswissenschaft Miscellaneous

TRANSLATA IV - 4. Internationale Konferenz zur Translationswissenschaft, Innsbruck, 2021.

@misc{Menzel2021epic,
title = {EPIC-UdS - ein mehrsprachiges Korpus als Grundlage f{\"u}r die korpusbasierte Dolmetsch- und {\"U}bersetzungswissenschaft},
author = {Katrin Menzel and Heike Przybyl and Ekaterina Lapshinova-Koltunski},
year = {2021},
date = {2021},
howpublished = {TRANSLATA IV - 4. Internationale Konferenz zur Translationswissenschaft},
address = {Innsbruck},
pubstate = {published},
type = {miscellaneous}
}

Project:   B7

Lapshinova-Koltunski, Ekaterina; Bizzoni, Yuri; Przybyl, Heike; Teich, Elke

Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication Inproceedings

Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21), Association for Computational Linguistics, pp. 82-90, online, 2021.

We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.

@inproceedings{LapshinovaEtAl2021interp,
title = {Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication},
author = {Ekaterina Lapshinova-Koltunski and Yuri Bizzoni and Heike Przybyl and Elke Teich},
url = {https://aclanthology.org/2021.motra-1.9/},
year = {2021},
date = {2021-05-31},
booktitle = {Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21)},
pages = {82-90},
publisher = {Association for Computational Linguistics},
address = {online},
abstract = {We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Lapshinova-Koltunski, Ekaterina

Analysing the Dimension of Mode in Translation Book Chapter

Bisiada, Mario (Ed.): Empirical Studies in Translation and Discourse. Translation and Multilingual Natural Language Processing, Language Science Press, pp. 223-243, Berlin, 2021, ISBN 978-3-96110-300-3, ISSN 2364-8899.

The present chapter applies text classification to test how well we can distinguish between texts along two dimensions: a text-production dimension that distinguishes between translations and non-translations (where translations also include interpreted texts); and a mode dimension that distinguishes between spoken and written texts. The chapter also aims to investigate the relationship between these two dimensions. Moreover, it investigates whether the same linguistic features that are derived from variational linguistics contribute to the prediction of mode in both translations and non-translations. The distributional information about these features was used to statistically model variation along the two dimensions. The results show that the same feature set can be used to automatically differentiate translations from non-translations, as well as spoken texts from the written texts. However, language variation along the dimension of mode is stronger than that along the dimension of text production, as classification into spoken and written texts delivers better results. Besides, linguistic features that contribute to the distinction between spoken and written mode are similar in both translated and non-translated language.

@inbook{Lapshinova2021dimension,
title = {Analysing the Dimension of Mode in Translation},
author = {Ekaterina Lapshinova-Koltunski},
editor = {Mario Bisiada},
url = {https://doi.org/10.5281/zenodo.4450014},
doi = {10.5281/zenodo.4450014},
year = {2021},
date = {2021},
booktitle = {Empirical Studies in Translation and Discourse},
series = {Translation and Multilingual Natural Language Processing},
isbn = {978-3-96110-300-3},
issn = {2364-8899},
pages = {223-243},
publisher = {Language Science Press},
address = {Berlin},
abstract = {The present chapter applies text classification to test how well we can distinguish between texts along two dimensions: a text-production dimension that distinguishes between translations and non-translations (where translations also include interpreted texts); and a mode dimension that distinguishes between spoken and written texts. The chapter also aims to investigate the relationship between these two dimensions. Moreover, it investigates whether the same linguistic features that are derived from variational linguistics contribute to the prediction of mode in both translations and non-translations. The distributional information about these features was used to statistically model variation along the two dimensions. The results show that the same feature set can be used to automatically differentiate translations from non-translations, as well as spoken texts from the written texts. However, language variation along the dimension of mode is stronger than that along the dimension of text production, as classification into spoken and written texts delivers better results. Besides, linguistic features that contribute to the distinction between spoken and written mode are similar in both translated and non-translated language.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Teich, Elke; Martínez Martínez, José; Karakanta, Alina

Translation, information theory and cognition Book Chapter

Alves, Fabio; Lykke Jakobsen, Arnt (Ed.): The Routledge Handbook of Translation and Cognition, Routledge, pp. 360-375, London, UK, 2020, ISBN 9781138037007.

The chapter sketches a formal basis for the probabilistic modelling of human translation on the basis of information theory. We provide a definition of Shannon information applied to linguistic communication and discuss its relevance for modelling translation. We further explain the concept of the noisy channel and provide the link to modelling human translational choice. We suggest that a number of translation-relevant variables, notably (dis)similarity between languages, level of expertise and translation mode (i.e., interpreting vs. translation), may be appropriately indexed by entropy, which in turn has been shown to indicate production effort.

@inbook{Teich-etal2020-handbook,
title = {Translation, information theory and cognition},
author = {Elke Teich and Jos{\'e} Mart{\'i}nez Mart{\'i}nez and Alina Karakanta},
editor = {Fabio Alves and Arnt Lykke Jakobsen},
url = {https://www.taylorfrancis.com/chapters/edit/10.4324/9781315178127-24/translation-information-theory-cognition-elke-teich-josé-martínez-martínez-alina-karakanta},
year = {2020},
date = {2020},
booktitle = {The Routledge Handbook of Translation and Cognition},
isbn = {9781138037007},
pages = {360--375},
publisher = {Routledge},
address = {London, UK},
abstract = {The chapter sketches a formal basis for the probabilistic modelling of human translation on the basis of information theory. We provide a definition of Shannon information applied to linguistic communication and discuss its relevance for modelling translation. We further explain the concept of the noisy channel and provide the link to modelling human translational choice. We suggest that a number of translation-relevant variables, notably (dis)similarity between languages, level of expertise and translation mode (i.e., interpreting vs. translation), may be appropriately indexed by entropy, which in turn has been shown to indicate production effort.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Bizzoni, Yuri; Juzek, Tom; España-Bonet, Cristina; Dutta Chowdhury, Koel; van Genabith, Josef; Teich, Elke

How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech Inproceedings

The 17th International Workshop on Spoken Language Translation, Seattle, WA, United States, 2020.

Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations from text and speech. We find that machine translation shows traces of translationese, but does not reproduce the patterns found in human translation, offering support to the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).

@inproceedings{Bizzoni2020,
title = {How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech},
author = {Yuri Bizzoni and Tom Juzek and Cristina Espa{\~n}a-Bonet and Koel Dutta Chowdhury and Josef van Genabith and Elke Teich},
url = {https://aclanthology.org/2020.iwslt-1.34/},
doi = {10.18653/v1/2020.iwslt-1.34},
year = {2020},
date = {2020},
booktitle = {The 17th International Workshop on Spoken Language Translation},
address = {Seattle, WA, United States},
abstract = {Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations from text and speech. We find that machine translation shows traces of translationese, but does not reproduce the patterns found in human translation, offering support to the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).},
pubstate = {published},
type = {inproceedings}
}

Projects:   B6 B7

Bizzoni, Yuri; Teich, Elke

Analyzing variation in translation through neural semantic spaces Inproceedings

Special topic: Neural Networks for Building and Using Comparable Corpora, Recent Advances in Natural Language Processing (RANLP), Varna, Bulgaria, 2019.

We present an approach for exploring the lexical choice patterns in translation on the basis of word embeddings. Specifically, we are interested in variation in translation according to translation mode, i.e. (written) translation vs. (simultaneous) interpreting. While it might seem obvious that the outputs of the two translation modes differ, there are hardly any accounts of the summative linguistic effects of one vs. the other. To explore such effects at the lexical level, we propose a data-driven approach: using neural word embeddings (Word2Vec), we compare the bilingual semantic spaces emanating from source-to-translation and source-to-interpreting.

@inproceedings{Bizzoni2019,
title = {Analyzing variation in translation through neural semantic spaces},
author = {Yuri Bizzoni and Elke Teich},
url = {https://comparable.limsi.fr/bucc2019/Bizzoni_BUCC2019_paper1.pdf},
year = {2019},
date = {2019-08-30},
booktitle = {Special topic: Neural Networks for Building and Using Comparable Corpora, Recent Advances in Natural Language Processing (RANLP), Varna, Bulgaria},
address = {Varna, Bulgaria},
abstract = {We present an approach for exploring the lexical choice patterns in translation on the basis of word embeddings. Specifically, we are interested in variation in translation according to translation mode, i.e. (written) translation vs. (simultaneous) interpreting. While it might seem obvious that the outputs of the two translation modes differ, there are hardly any accounts of the summative linguistic effects of one vs. the other. To explore such effects at the lexical level, we propose a data-driven approach: using neural word embeddings (Word2Vec), we compare the bilingual semantic spaces emanating from source-to-translation and source-to-interpreting.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Karakanta, Alina; Menzel, Katrin; Przybyl, Heike; Teich, Elke

Detecting linguistic variation in translated vs. interpreted texts using relative entropy Inproceedings

Empirical Investigations in the Forms of Mediated Discourse at the European Parliament, Thematic Session at the 49th Poznan Linguistic Meeting (PLM2019), Poznan, 2019.

Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they share that set them apart from originals (translationese features). Compared with research on translated language alone, empirical research on the features of interpreted language and cross-modal analyses have attracted wider interest only recently. Previous interpreting studies are typically based on relatively small datasets of naturally occurring or experimental data (e.g. Shlesinger/Ordan 2012; Chmiel et al. forthcoming; Dragsted/Hansen 2009) for specific language pairs. We propose a corpus-based, exploratory approach to detect typical linguistic features of interpreting vs. translation based on a well-structured multilingual European Parliament translation and interpreting corpus. We use the Europarl-UdS corpus (Karakanta et al. 2018), which contains originals and translations for English, German and Spanish, and selected material from existing interpreting/combined interpreting-translation corpora (EPIC: Sandrelli/Bendazzoli 2005; TIC: Kajzer-Wietrzny 2012; EPICG: Defrancq 2015), complemented with additional interpreting data (German). The data were transcribed or revised according to our transcription guidelines to ensure comparability across the different datasets. All data were enriched with relevant metadata. We aim to contribute to a more nuanced understanding of the characteristics of translated and interpreted texts and a more adequate empirical theory of mediated discourse.

@inproceedings{Karakanta2019,
title = {Detecting linguistic variation in translated vs. interpreted texts using relative entropy},
author = {Alina Karakanta and Katrin Menzel and Heike Przybyl and Elke Teich},
url = {https://www.researchgate.net/publication/336990114_Detecting_linguistic_variation_in_translated_vs_interpreted_texts_using_relative_entropy},
year = {2019},
date = {2019},
booktitle = {Empirical Investigations in the Forms of Mediated Discourse at the European Parliament, Thematic Session at the 49th Poznan Linguistic Meeting (PLM2019), Poznan},
abstract = {Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they share that set them apart from originals (translationese features). Compared with research on translated language alone, empirical research on the features of interpreted language and cross-modal analyses have attracted wider interest only recently. Previous interpreting studies are typically based on relatively small datasets of naturally occurring or experimental data (e.g. Shlesinger/Ordan 2012; Chmiel et al. forthcoming; Dragsted/Hansen 2009) for specific language pairs. We propose a corpus-based, exploratory approach to detect typical linguistic features of interpreting vs. translation based on a well-structured multilingual European Parliament translation and interpreting corpus. We use the Europarl-UdS corpus (Karakanta et al. 2018), which contains originals and translations for English, German and Spanish, and selected material from existing interpreting/combined interpreting-translation corpora (EPIC: Sandrelli/Bendazzoli 2005; TIC: Kajzer-Wietrzny 2012; EPICG: Defrancq 2015), complemented with additional interpreting data (German). The data were transcribed or revised according to our transcription guidelines to ensure comparability across the different datasets. All data were enriched with relevant metadata. We aim to contribute to a more nuanced understanding of the characteristics of translated and interpreted texts and a more adequate empirical theory of mediated discourse.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Karakanta, Alina; Przybyl, Heike; Teich, Elke

Exploring Variation in Translation with Relative Entropy Inproceedings

Lavid-López, Julia; Maíz-Arévalo, Carmen; Zamorano-Mansilla, Juan Rafael (Eds.): Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations, John Benjamins Publishing Company, pp. 307–323, 2018.

While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes at both lexical and grammatical levels.

@inproceedings{Karakanta2018b,
title = {Exploring Variation in Translation with Relative Entropy},
author = {Alina Karakanta and Heike Przybyl and Elke Teich},
editor = {Julia Lavid-L{\'o}pez and Carmen Ma{\'i}z-Ar{\'e}valo and Juan Rafael Zamorano-Mansilla},
url = {https://benjamins.com/catalog/btl.158.12kar},
doi = {10.1075/btl.158.12kar},
year = {2018},
date = {2018},
booktitle = {Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations},
pages = {307--323},
publisher = {John Benjamins Publishing Company},
abstract = {While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes at both lexical and grammatical levels.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Karakanta, Alina; Vela, Mihaela; Teich, Elke

EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates Inproceedings

ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018), Miyazaki, Japan, 2018.

Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies.

In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-)processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.

@inproceedings{Karakanta2018a,
title = {EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates},
author = {Alina Karakanta and Mihaela Vela and Elke Teich},
url = {http://lrec-conf.org/workshops/lrec2018/W2/pdf/10_W2.pdf},
year = {2018},
date = {2018},
booktitle = {ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018)},
address = {Miyazaki, Japan},
abstract = {Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-)processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Collard, Camille; Przybyl, Heike; Defrancq, Bart

Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech Journal Article

Kübler, Nathalie; Loock, Rudy; Pecman, Mojca (Eds.): Meta, 63, Les Presses de l’Université de Montréal, pp. 695–716, 2018.

In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure), thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers.

The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.

@article{Collard2018,
title = {Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech},
author = {Camille Collard and Heike Przybyl and Bart Defrancq},
editor = {Nathalie K{\"u}bler and Rudy Loock and Mojca Pecman},
url = {https://id.erudit.org/iderudit/1060169ar},
doi = {10.7202/1060169ar},
year = {2018},
date = {2018},
journal = {Meta},
pages = {695--716},
publisher = {Les Presses de l’Universit{\'e} de Montr{\'e}al},
volume = {63},
number = {3},
abstract = {In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure), thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers. The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.},
pubstate = {published},
type = {article}
}

Project:   B7
