Publications

Karakanta, Alina; Menzel, Katrin; Przybyl, Heike; Teich, Elke

Detecting linguistic variation in translated vs. interpreted texts using relative entropy Inproceedings

Empirical Investigations in the Forms of Mediated Discourse at the European Parliament, Thematic Session at the 49th Poznan Linguistic Meeting (PLM2019), Poznan, 2019.

Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they have in common which set them apart from originals (translationese features). Empirical research on the features of interpreted language and cross-modal analyses in contrast to research on translated language alone has attracted wider interest only recently. Previous interpreting studies are typically based on relatively small datasets of naturally occurring or experimental data (e.g. Shlesinger/Ordan 2012; Chmiel et al. forthcoming; Dragsted/Hansen 2009) for specific language pairs. We propose a corpus-based, exploratory approach to detect typical linguistic features of interpreting vs. translation based on a well-structured multilingual European Parliament translation and interpreting corpus. We use the Europarl-UdS corpus (Karakanta et al. 2018) containing originals and translations for English, German and Spanish, and selected material from existing interpreting/combined interpreting-translation corpora (EPIC: Sandrelli/Bendazzoli 2005; TIC: Kajzer-Wietrzny 2012; EPICG: Defrancq 2015), complemented with additional interpreting data (German). The data were transcribed or revised according to our transcription guidelines ensuring comparability across different datasets. All data were enriched with relevant metadata. We aim to contribute to a more nuanced understanding of the characteristics of translated and interpreted texts and a more adequate empirical theory of mediated discourse.

@inproceedings{Karakanta2019,
title = {Detecting linguistic variation in translated vs. interpreted texts using relative entropy},
author = {Alina Karakanta and Katrin Menzel and Heike Przybyl and Elke Teich},
url = {https://www.researchgate.net/publication/336990114_Detecting_linguistic_variation_in_translated_vs_interpreted_texts_using_relative_entropy},
year = {2019},
date = {2019},
booktitle = {Empirical Investigations in the Forms of Mediated Discourse at the European Parliament, Thematic Session at the 49th Poznan Linguistic Meeting (PLM2019), Poznan},
abstract = {Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they have in common which set them apart from originals (translationese features). Empirical research on the features of interpreted language and cross-modal analyses in contrast to research on translated language alone has attracted wider interest only recently. Previous interpreting studies are typically based on relatively small datasets of naturally occurring or experimental data (e.g. Shlesinger/Ordan 2012; Chmiel et al. forthcoming; Dragsted/Hansen 2009) for specific language pairs. We propose a corpus-based, exploratory approach to detect typical linguistic features of interpreting vs. translation based on a well-structured multilingual European Parliament translation and interpreting corpus. We use the Europarl-UdS corpus (Karakanta et al. 2018) containing originals and translations for English, German and Spanish, and selected material from existing interpreting/combined interpreting-translation corpora (EPIC: Sandrelli/Bendazzoli 2005; TIC: Kajzer-Wietrzny 2012; EPICG: Defrancq 2015), complemented with additional interpreting data (German). The data were transcribed or revised according to our transcription guidelines ensuring comparability across different datasets. All data were enriched with relevant metadata. We aim to contribute to a more nuanced understanding of the characteristics of translated and interpreted texts and a more adequate empirical theory of mediated discourse.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Karakanta, Alina; Przybyl, Heike; Teich, Elke

Exploring Variation in Translation with Relative Entropy Inproceedings

Lavid-López, Julia; Maíz-Arévalo, Carmen; Zamorano-Mansilla, Juan Rafael (Ed.): Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations, John Benjamins Publishing Company, pp. 307–323, 2018.

While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.

@inproceedings{Karakanta2018b,
title = {Exploring Variation in Translation with Relative Entropy},
author = {Alina Karakanta and Heike Przybyl and Elke Teich},
editor = {Julia Lavid-López and Carmen Ma{\'i}z-Ar{\'e}valo and Juan Rafael Zamorano-Mansilla},
url = {https://benjamins.com/catalog/btl.158.12kar},
doi = {10.1075/btl.158.12kar},
year = {2018},
date = {2018},
booktitle = {Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations},
pages = {307--323},
publisher = {John Benjamins Publishing Company},
abstract = {While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Karakanta, Alina; Vela, Mihaela; Teich, Elke

EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates Inproceedings

ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018), Miyazaki, Japan, 2018.

Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies.

In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-) processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.

@inproceedings{Karakanta2018a,
title = {EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates},
author = {Alina Karakanta and Mihaela Vela and Elke Teich},
url = {http://lrec-conf.org/workshops/lrec2018/W2/pdf/10_W2.pdf},
year = {2018},
date = {2018},
booktitle = {ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018)},
address = {Miyazaki, Japan},
abstract = {Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-) processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Collard, Camille; Przybyl, Heike; Defrancq, Bart

Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech Journal Article

Kübler, Nathalie; Loock, Rudy; Pecman, Mojca (Ed.): Meta, 63, Les Presses de l’Université de Montréal, pp. 695–716, 2018.

In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure) thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers.

The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.

@article{Collard2018,
title = {Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech},
author = {Camille Collard and Heike Przybyl and Bart Defrancq},
editor = {Nathalie K{\"u}bler and Rudy Loock and Mojca Pecman},
url = {https://id.erudit.org/iderudit/1060169ar},
doi = {10.7202/1060169ar},
year = {2018},
date = {2018},
journal = {Meta},
pages = {695--716},
publisher = {Les Presses de l’Universit{\'e} de Montr{\'e}al},
volume = {63},
number = {3},
abstract = {In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure) thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers. The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.},
pubstate = {published},
type = {article}
}

Project:   B7
