Publications

Lemke, Tyll Robin; Reich, Ingo; Schäfer, Lisa; Drenhaus, Heiner

Predictable words are more likely to be omitted in fragments – Evidence from production data Journal Article

Frontiers in Psychology, 12, pp. 662125, 2021.

Instead of a full sentence like Bring me to the university (uttered by the passenger to a taxi driver), speakers often use fragments like To the university to get their message across. So far there is no comprehensive and empirically supported account of why and under which circumstances speakers sometimes prefer a fragment over the corresponding full sentence. We propose an information-theoretic account to model this choice: A speaker chooses the encoding that distributes information most uniformly across the utterance in order to make the most efficient use of the hearer’s processing resources (Uniform Information Density, Levy and Jaeger, 2007). Since processing effort is related to the predictability of words (Hale, 2001), our account predicts two effects of word probability on omissions: First, omitting predictable words (which are more easily processed) avoids underutilizing processing resources. Second, inserting words before very unpredictable words distributes otherwise excessively high processing effort more uniformly. We test these predictions with a production study that supports both of them. Our study makes two main contributions: First, we develop an empirically motivated and supported account of fragment usage. Second, we extend previous evidence for information-theoretic processing constraints on language in two ways: We find predictability effects on omissions driven by extralinguistic context, whereas previous research mostly focused on effects of local linguistic context. Furthermore, we show that omissions of content words are also subject to information-theoretic well-formedness considerations. Previously, this has been shown mostly for the omission of function words.
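The account rests on word surprisal, -log2 P(word | context) (Hale, 2001), and on comparing how uniformly information is spread across alternative encodings. A minimal sketch of that comparison, with invented probabilities rather than the paper's data:

```python
import math

def surprisal(p):
    """Surprisal in bits of a word with in-context probability p (Hale, 2001)."""
    return -math.log2(p)

def uid_variance(word_probs):
    """Variance of per-word surprisal; lower values mean information is
    distributed more uniformly across the utterance (UID)."""
    s = [surprisal(p) for p in word_probs]
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Hypothetical in-context probabilities for two encodings of the same message:
full_sentence = [0.99, 0.30, 0.25]  # begins with a highly predictable word
fragment      = [0.30, 0.25]        # the predictable word is omitted

# Omitting the highly predictable word yields a flatter surprisal profile,
# so the account predicts a preference for the fragment here:
assert uid_variance(fragment) < uid_variance(full_sentence)
```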

@article{lemke.etal2021.frontiers,
title = {Predictable words are more likely to be omitted in fragments – Evidence from production data},
author = {Tyll Robin Lemke and Ingo Reich and Lisa Sch{\"a}fer and Heiner Drenhaus},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662125/full},
doi = {10.3389/fpsyg.2021.662125},
year = {2021},
date = {2021-07-22},
journal = {Frontiers in Psychology},
pages = {662125},
volume = {12},
abstract = {Instead of a full sentence like Bring me to the university (uttered by the passenger to a taxi driver) speakers often use fragments like To the university to get their message across. So far there is no comprehensive and empirically supported account of why and under which circumstances speakers sometimes prefer a fragment over the corresponding full sentence. We propose an information-theoretic account to model this choice: A speaker chooses the encoding that distributes information most uniformly across the utterance in order to make the most efficient use of the hearer's processing resources (Uniform Information Density, Levy and Jaeger, 2007). Since processing effort is related to the predictability of words (Hale, 2001) our account predicts two effects of word probability on omissions: First, omitting predictable words (which are more easily processed), avoids underutilizing processing resources. Second, inserting words before very unpredictable words distributes otherwise excessively high processing effort more uniformly. We test these predictions with a production study that supports both of these predictions. Our study makes two main contributions: First we develop an empirically motivated and supported account of fragment usage. Second, we extend previous evidence for information-theoretic processing constraints on language in two ways: We find predictability effects on omissions driven by extralinguistic context, whereas previous research mostly focused on effects of local linguistic context. Furthermore, we show that omissions of content words are also subject to information-theoretic well-formedness considerations. Previously, this has been shown mostly for the omission of function words.},
pubstate = {published},
type = {article}
}

Project:   B3

Abdullah, Badr M.; Mosbach, Marius; Zaitova, Iuliia; Möbius, Bernd; Klakow, Dietrich

Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study Inproceedings

Proceedings of Interspeech 2021, 2021.

Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity? To answer this question, we empirically investigate the performance of supervised approaches for AWEs with different neural architectures and learning objectives. We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity. Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity. Our findings highlight the necessity to rethink the current intrinsic evaluations for AWEs.
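The paper's phonological-similarity evaluation can be sketched as: correlate distances in the embedding space with edit distances over phone sequences. The following is a toy reconstruction with made-up two-dimensional "embeddings" and a plain Pearson correlation, not the authors' actual setup:

```python
import math

def levenshtein(a, b):
    """Edit distance between two phone sequences (proxy for phonological distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cosine_distance(u, v):
    """Distance between two acoustic word embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm

def pearson(xs, ys):
    """Linear correlation between embedding distances and edit distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A high correlation would indicate that the AWE space reflects phonological similarity; the paper finds the correlation to be at best moderate.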

@inproceedings{Abdullah2021DoAW,
title = {Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study},
author = {Badr M. Abdullah and Marius Mosbach and Iuliia Zaitova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://arxiv.org/abs/2106.08686},
year = {2021},
date = {2021},
booktitle = {Proceedings of Interspeech 2021},
abstract = {Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity? To answer this question, we empirically investigate the performance of supervised approaches for AWEs with different neural architectures and learning objectives. We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity. Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity. Our findings highlight the necessity to rethink the current intrinsic evaluations for AWEs.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C4 B4

Mayn, Alexandra; Abdullah, Badr M.; Klakow, Dietrich

Familiar words but strange voices: Modelling the influence of speech variability on word recognition Inproceedings

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, Association for Computational Linguistics, pp. 96-102, Online, 2021.

We present a deep neural model of spoken word recognition which is trained to retrieve the meaning of a word (in the form of a word embedding) given its spoken form, a task which resembles that faced by a human listener. Furthermore, we investigate the influence of variability in speech signals on the model’s performance. To this end, we conduct a set of controlled experiments using word-aligned read speech data in German. Our experiments show that (1) the model is more sensitive to dialectal variation than to gender variation, and (2) recognition performance of word cognates from related languages reflects the degree of relatedness between languages in our study. Our work highlights the feasibility of modeling human speech perception using deep neural networks.

@inproceedings{mayn-etal-2021-familiar,
title = {Familiar words but strange voices: Modelling the influence of speech variability on word recognition},
author = {Alexandra Mayn and Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2021.eacl-srw.14},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop},
pages = {96-102},
publisher = {Association for Computational Linguistics},
address = {Online},
abstract = {We present a deep neural model of spoken word recognition which is trained to retrieve the meaning of a word (in the form of a word embedding) given its spoken form, a task which resembles that faced by a human listener. Furthermore, we investigate the influence of variability in speech signals on the model’s performance. To this end, we conduct a set of controlled experiments using word-aligned read speech data in German. Our experiments show that (1) the model is more sensitive to dialectal variation than to gender variation, and (2) recognition performance of word cognates from related languages reflects the degree of relatedness between languages in our study. Our work highlights the feasibility of modeling human speech perception using deep neural networks.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Macher, Nicole; Abdullah, Badr M.; Brouwer, Harm; Klakow, Dietrich

Do we read what we hear? Modeling orthographic influences on spoken word recognition Inproceedings

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, Association for Computational Linguistics, pp. 16-22, Online, 2021.

Theories and models of spoken word recognition aim to explain the process of accessing lexical knowledge given an acoustic realization of a word form. There is consensus that phonological and semantic information is crucial for this process. However, there is accumulating evidence that orthographic information could also have an impact on auditory word recognition. This paper presents two models of spoken word recognition that instantiate different hypotheses regarding the influence of orthography on this process. We show that these models reproduce human-like behavior in different ways and provide testable hypotheses for future research on the source of orthographic effects in spoken word recognition.

@inproceedings{macher-etal-2021-read,
title = {Do we read what we hear? Modeling orthographic influences on spoken word recognition},
author = {Nicole Macher and Badr M. Abdullah and Harm Brouwer and Dietrich Klakow},
url = {https://aclanthology.org/2021.eacl-srw.3},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop},
pages = {16-22},
publisher = {Association for Computational Linguistics},
address = {Online},
abstract = {Theories and models of spoken word recognition aim to explain the process of accessing lexical knowledge given an acoustic realization of a word form. There is consensus that phonological and semantic information is crucial for this process. However, there is accumulating evidence that orthographic information could also have an impact on auditory word recognition. This paper presents two models of spoken word recognition that instantiate different hypotheses regarding the influence of orthography on this process. We show that these models reproduce human-like behavior in different ways and provide testable hypotheses for future research on the source of orthographic effects in spoken word recognition.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A1 C4

Chingacham, Anupama; Demberg, Vera; Klakow, Dietrich

Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors Inproceedings

Proceedings of Interspeech 2021, pp. 1713–1717, 2021.

Listening in noisy environments can be difficult even for individuals with normal hearing thresholds. The speech signal can be masked by noise, which may lead to word misperceptions on the side of the listener and overall difficulty in understanding the message. To mitigate hearing difficulties for listeners, a co-operative speaker utilizes voice modulation strategies like Lombard speech to generate noise-robust utterances, and similar solutions have been developed for speech synthesis systems. In this work, we propose an alternative solution of choosing noise-robust lexical paraphrases to represent an intended meaning. Our results show that lexical paraphrases differ in their intelligibility in noise. We evaluate the intelligibility of synonyms in context and find that choosing a lexical unit that is less likely to be misheard than its synonym introduced an average gain in comprehension of 37% at SNR -5 dB and 21% at SNR 0 dB for babble noise.
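The core idea, representing a meaning by whichever synonym is least likely to be misheard in noise, can be sketched as below. The words and intelligibility estimates are invented for illustration, not taken from the paper:

```python
def choose_paraphrase(intelligibility):
    """Given estimated recognition probabilities in noise for a set of
    synonyms, pick the most noise-robust one."""
    return max(intelligibility, key=intelligibility.get)

# Hypothetical per-word recognition probabilities at SNR -5 dB in babble noise:
synonyms = {"sofa": 0.42, "couch": 0.78}
print(choose_paraphrase(synonyms))  # couch
```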

@inproceedings{Chingacham2021,
title = {Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors},
author = {Anupama Chingacham and Vera Demberg and Dietrich Klakow},
url = {https://arxiv.org/abs/2107.08337},
year = {2021},
date = {2021},
booktitle = {Proceedings of Interspeech 2021},
pages = {1713–1717},
abstract = {Listening in noisy environments can be difficult even for individuals with normal hearing thresholds. The speech signal can be masked by noise, which may lead to word misperceptions on the side of the listener and overall difficulty in understanding the message. To mitigate hearing difficulties for listeners, a co-operative speaker utilizes voice modulation strategies like Lombard speech to generate noise-robust utterances, and similar solutions have been developed for speech synthesis systems. In this work, we propose an alternative solution of choosing noise-robust lexical paraphrases to represent an intended meaning. Our results show that lexical paraphrases differ in their intelligibility in noise. We evaluate the intelligibility of synonyms in context and find that choosing a lexical unit that is less likely to be misheard than its synonym introduced an average gain in comprehension of 37% at SNR -5 dB and 21% at SNR 0 dB for babble noise.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Voigtmann, Sophia; Speyer, Augustin

Information density as a factor for syntactic variation in Early New High German Inproceedings

Proceedings of Linguistic Evidence 2020, Tübingen, Germany, 2021.

In contrast to other languages like English, German has certain liberties in its word order. Different word orders do not influence the proposition of a sentence. The frame of the German clause is formed by the sentence brackets (the left (LSB) and the right (RSB) sentence bracket), over which the parts of the predicate are distributed in the main clause, whereas in subordinate clauses, the left one can host subordinate conjunctions. Apart from the sentence brackets, however, the order of constituents is fairly variable, though a default word order (subject, indirect object, direct object for nouns; subject, direct object, indirect object for pronouns) exists. A deviation from this order can be caused by factors like focus, given-/newness, topicality, definiteness and animacy (Zubin & Köpcke, 1985; Reis, 1987; Müller, 1999; Lenerz, 2001 among others).

@inproceedings{voigtmannspeyerinprint,
title = {Information density as a factor for syntactic variation in Early New High German},
author = {Sophia Voigtmann and Augustin Speyer},
url = {https://ub01.uni-tuebingen.de/xmlui/handle/10900/134561},
year = {2021},
date = {2021},
booktitle = {Proceedings of Linguistic Evidence 2020},
address = {T{\"u}bingen, Germany},
abstract = {In contrast to other languages like English, German has certain liberties in its word order. Different word orders do not influence the proposition of a sentence. The frame of the German clause are the sentence brackets (the left (LSB) and the right (RSB) sentence brackets) over which the parts of the predicate are distributed in the main clause, whereas in subordinate clauses, the left one can host subordinate conjunctions. But apart from the sentence brackets, the order of constituents is fairly variable, though a default word order (subject, indirect object, direct object for nouns; subject, direct object, indirect object for pronouns) exists. A deviation of this order can be caused by factors like focus, given-/newness, topicality, definiteness and animacy (Zubin & K{\"o}pcke, 1985; Reis, 1987; M{\"u}ller, 1999; Lenerz, 2001 among others).},
pubstate = {published},
type = {inproceedings}
}

Project:   C6

Speyer, Augustin; Voigtmann, Sophia

Informationelle Bedingungen für die Selbständigkeit kausaler Satzaussagen. Eine diachrone Sichtweise Book Chapter

Külpmann, Robert; Finkbeiner, Rita (Ed.): Neues zur Selbstständigkeit von Sätzen, Linguistische Berichte, Sonderheft, Buske, pp. 177-206, Hamburg, 2021.

German offers several ways of encoding a proposition that stands in a particular logical relation to another one. Relevant to the topic of the workshop is the variation between independent and dependent versions, as illustrated for a causal relation in (1).
(1) a. Uller kam früher nach Hause, weil Gwendolyn etwas mit ihm bereden wollte. (‘Uller came home earlier because Gwendolyn wanted to discuss something with him.’)
b. Uller kam früher nach Hause. (Denn) Gwendolyn wollte etwas mit ihm bereden. (‘Uller came home earlier. (For) Gwendolyn wanted to discuss something with him.’)
The variation in causal relations in particular has been the subject of much previous work.

@inbook{speyervoigtmann_Bedingungen,
title = {Informationelle Bedingungen f{\"u}r die Selbst{\"a}ndigkeit kausaler Satzaussagen. Eine diachrone Sichtweise},
author = {Augustin Speyer and Sophia Voigtmann},
editor = {Robert K{\"u}lpmann and Rita Finkbeiner},
url = {https://buske.de/neues-zur-selbststandigkeit-von-satzen-16620.html},
doi = {10.46771/978-3-96769-170-2},
year = {2021},
date = {2021},
booktitle = {Neues zur Selbstst{\"a}ndigkeit von S{\"a}tzen},
pages = {177-206},
publisher = {Buske},
address = {Hamburg},
abstract = {Das Deutsche bietet mehrere M{\"o}glichkeiten, eine Satzaussage, die in einer bestimmten logischen Beziehung zu einer anderen steht, zu kodieren. Relevant f{\"u}r das Thema des Workshops ist die Variation zwischen selbst{\"a}ndigen und unselbst{\"a}ndigen Versionen, wie es am Beispiel einer kausalen Beziehung in (1) demonstriert ist. (1) a. Uller kam fr{\"u}her nach Hause, weil Gwendolyn etwas mit ihm bereden wollte. b. Uller kam fr{\"u}her nach Hause. (Denn) Gwendolyn wollte etwas mit ihm bereden. Gerade zur Variation bei kausalen Verh{\"a}ltnissen ist in der Vergangenheit viel gearbeitet worden.},
pubstate = {published},
type = {inbook}
}

Project:   C6

Speyer, Augustin; Voigtmann, Sophia

Factors for the integration of causal clauses in the history of German Book Chapter

Jedrzejowski, Lukasz; Fleczoreck, Constanze (Ed.): Micro- and Macro-variation of Causal Clauses: Synchronic and Diachronic Insights, John Benjamins Publishing Company, pp. 311–345, 2021.

The variation between integrated (verb-final) and independent (verb-second) causal clauses in German could depend on the amount of information conveyed in that clause. A lower amount might lead to integration, a higher amount to independence, as processing constraints might forbid integration of highly informative clauses. We use two ways to measure information amount: 1. the average ratio of given referents within the clause, 2. the cumulative surprisal of all words in the clause. Focusing on historical stages of German, a significant correlation between amount of information and integration was visible, regardless of which method was used.
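Both measures are easy to state concretely. A minimal sketch with invented numbers (the paper derives probabilities and givenness from historical corpora):

```python
import math

def given_ratio(referents):
    """Measure 1: share of discourse-given referents in the clause."""
    return sum(1 for r in referents if r == "given") / len(referents)

def cumulative_surprisal(word_probs):
    """Measure 2: summed surprisal -log2 P(w | context) over all words."""
    return sum(-math.log2(p) for p in word_probs)

# A short, low-information causal clause vs. a longer, high-information one:
low  = cumulative_surprisal([0.50, 0.60, 0.40])
high = cumulative_surprisal([0.10, 0.05, 0.20, 0.08])
# The account predicts integration (verb-final) for the low-information clause
# and independence (verb-second) for the high-information one.
```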

@inbook{speyervoigtmanninprinta,
title = {Factors for the integration of causal clauses in the history of German},
author = {Augustin Speyer and Sophia Voigtmann},
editor = {Lukasz Jedrzejowski and Constanze Fleczoreck},
url = {https://benjamins.com/catalog/slcs.231.11spe},
doi = {10.1075/slcs.231.11spe},
year = {2021},
date = {2021},
booktitle = {Micro- and Macro-variation of Causal Clauses: Synchronic and Diachronic Insights},
pages = {311–345},
publisher = {John Benjamins Publishing Company},
abstract = {The variation between integrated (verb-final) and independent (verb-second) causal clauses in German could depend on the amount of information conveyed in that clause. A lower amount might lead to integration, a higher amount to independence, as processing constraints might forbid integration of highly informative clauses. We use two ways to measure information amount: 1. the average ratio of given referents within the clause, 2. the cumulative surprisal of all words in the clause. Focusing on historical stages of German, a significant correlation between amount of information and integration was visible, regardless which method was used.},
pubstate = {published},
type = {inbook}
}

Project:   C6

Sikos, Les; Venhuizen, Noortje; Drenhaus, Heiner; Crocker, Matthew W.

Speak before you listen: Pragmatic reasoning in multi-trial language games Inproceedings

Proceedings of the Annual Meeting of the Cognitive Science Society, 43, 2021.

Rational Speech Act theory (Frank & Goodman, 2012) has been successfully applied in numerous communicative settings, including studies using one-shot web-based language games. Several follow-up studies of the latter, however, suggest that listeners may not behave as pragmatically as originally suggested in those tasks. We investigate whether, in such reference games, listeners’ pragmatic reasoning about an informative speaker is improved by greater exposure to the task, and/or prior experience with being a speaker in this task. While we find limited evidence that increased exposure results in more pragmatic responses, listeners do show increased pragmatic reasoning after playing the role of the speaker. Moreover, we find that only in the Speaker-first condition, participants’ tendency to be an informative speaker predicts their degree of pragmatic behavior as a listener. These findings demonstrate that, in these settings, experience as a speaker enhances the ability of listeners to reason pragmatically, as modeled by RSA.

@inproceedings{sikos2021speak,
title = {Speak before you listen: Pragmatic reasoning in multi-trial language games},
author = {Les Sikos and Noortje Venhuizen and Heiner Drenhaus and Matthew W. Crocker},
url = {https://escholarship.org/uc/item/0xc7f7wc},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Annual Meeting of the Cognitive Science Society},
abstract = {Rational Speech Act theory (Frank & Goodman, 2012) has been successfully applied in numerous communicative settings, including studies using one-shot web-based language games. Several follow-up studies of the latter, however, suggest that listeners may not behave as pragmatically as originally suggested in those tasks. We investigate whether, in such reference games, listeners’ pragmatic reasoning about an informative speaker is improved by greater exposure to the task, and/or prior experience with being a speaker in this task. While we find limited evidence that increased exposure results in more pragmatic responses, listeners do show increased pragmatic reasoning after playing the role of the speaker. Moreover, we find that only in the Speaker-first condition, participant’s tendency to be an informative speaker predicts their degree of pragmatic behavior as a listener. These findings demonstrate that, in these settings, experience as a speaker enhances the ability of listeners to reason pragmatically, as modeled by RSA.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3

Venhuizen, Noortje; Hendriks, Petra; Crocker, Matthew W.; Brouwer, Harm

Distributional formal semantics Journal Article

Information and Computation, pp. 104763, 2021, ISSN 0890-5401.

Natural language semantics has recently sought to combine the complementary strengths of formal and distributional approaches to meaning. However, given the fundamentally different ‘representational currency’ underlying these approaches—models of the world versus linguistic co-occurrence—their unification has proven extremely difficult. Here, we define Distributional Formal Semantics, which integrates distributionality into a formal semantic system on the level of formal models. This approach offers probabilistic, distributed meaning representations that are inherently compositional, and that naturally capture fundamental semantic notions such as quantification and entailment. Furthermore, we show how the probabilistic nature of these representations allows for probabilistic inference, and how the information-theoretic notion of “information” (measured in Entropy and Surprisal) naturally follows from it. Finally, we illustrate how meaning representations can be derived incrementally from linguistic input using a recurrent neural network model, and how the resultant incremental semantic construction procedure intuitively captures key semantic phenomena, including negation, presupposition, and anaphoricity.
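The information-theoretic side of this can be illustrated with Shannon entropy over a distribution of candidate models, and its reduction as linguistic input rules models out. The four-model meaning space below is invented for illustration:

```python
import math

def entropy(dist):
    """Shannon entropy H(P) = -sum_m P(m) log2 P(m), in bits, over models m."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Toy meaning space of four candidate models, initially equiprobable:
before = [0.25, 0.25, 0.25, 0.25]
# A word of linguistic input rules out two of the models:
after = [0.5, 0.5, 0.0, 0.0]

print(entropy(before))  # 2.0 bits of uncertainty
print(entropy(after))   # 1.0 bit remaining
```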

@article{venhuizen2021distributional,
title = {Distributional formal semantics},
author = {Noortje Venhuizen and Petra Hendriks and Matthew W. Crocker and Harm Brouwer},
url = {https://www.sciencedirect.com/science/article/pii/S089054012100078X},
doi = {10.1016/j.ic.2021.104763},
year = {2021},
date = {2021},
journal = {Information and Computation},
pages = {104763},
abstract = {Natural language semantics has recently sought to combine the complementary strengths of formal and distributional approaches to meaning. However, given the fundamentally different ‘representational currency’ underlying these approaches—models of the world versus linguistic co-occurrence—their unification has proven extremely difficult. Here, we define Distributional Formal Semantics, which integrates distributionality into a formal semantic system on the level of formal models. This approach offers probabilistic, distributed meaning representations that are inherently compositional, and that naturally capture fundamental semantic notions such as quantification and entailment. Furthermore, we show how the probabilistic nature of these representations allows for probabilistic inference, and how the information-theoretic notion of “information” (measured in Entropy and Surprisal) naturally follows from it. Finally, we illustrate how meaning representations can be derived incrementally from linguistic input using a recurrent neural network model, and how the resultant incremental semantic construction procedure intuitively captures key semantic phenomena, including negation, presupposition, and anaphoricity.},
pubstate = {published},
type = {article}
}

Projects:   A1 C3

Ortmann, Katrin

Chunking Historical German Inproceedings

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), Linköping University Electronic Press, Sweden, pp. 190-199, Reykjavik, Iceland (Online), 2021.

Quantitative studies of historical syntax require large amounts of syntactically annotated data, which are rarely available. The application of NLP methods could reduce manual annotation effort, provided that they achieve sufficient levels of accuracy. The present study investigates the automatic identification of chunks in historical German texts. Because no training data exists for this task, chunks are extracted from modern and historical constituency treebanks and used to train a CRF-based neural sequence labeling tool. The evaluation shows that the neural chunker outperforms an unlexicalized baseline and achieves overall F-scores between 90% and 94% for different historical data sets when POS tags are used as feature. The conducted experiments demonstrate the usefulness of including historical training data while also highlighting the importance of reducing boundary errors to improve annotation precision.
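Chunking of this kind is conventionally scored with exact-span F1, where a single boundary error costs both a false positive and a false negative, which is why reducing boundary errors matters so much for precision. A minimal scorer (the example chunks are invented):

```python
def chunk_f1(gold, predicted):
    """Precision, recall and F1 over exact-match chunks (label, start, end)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("NP", 0, 2), ("PP", 3, 5), ("NP", 6, 7)}
pred = {("NP", 0, 2), ("PP", 3, 4), ("NP", 6, 7)}  # one chunk with a boundary error
p, r, f = chunk_f1(gold, pred)  # the mislabeled span hurts both P and R
```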

@inproceedings{Ortmann2021,
title = {Chunking Historical German},
author = {Katrin Ortmann},
url = {https://aclanthology.org/2021.nodalida-main.19},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
pages = {190-199},
publisher = {Link{\"o}ping University Electronic Press, Sweden},
address = {Reykjavik, Iceland (Online)},
abstract = {Quantitative studies of historical syntax require large amounts of syntactically annotated data, which are rarely available. The application of NLP methods could reduce manual annotation effort, provided that they achieve sufficient levels of accuracy. The present study investigates the automatic identification of chunks in historical German texts. Because no training data exists for this task, chunks are extracted from modern and historical constituency treebanks and used to train a CRF-based neural sequence labeling tool. The evaluation shows that the neural chunker outperforms an unlexicalized baseline and achieves overall F-scores between 90% and 94% for different historical data sets when POS tags are used as feature. The conducted experiments demonstrate the usefulness of including historical training data while also highlighting the importance of reducing boundary errors to improve annotation precision.},
pubstate = {published},
type = {inproceedings}
}

Project:   C6

Muhlack, Beeke; Elmers, Mikey; Drenhaus, Heiner; van Os, Marjolein; Werner, Raphael; Ryzhova, Margarita; Möbius, Bernd

Revisiting recall effects of filler particles in German and English Inproceedings

Proceedings of Interspeech 2021, Interspeech, pp. 3979-3983, Brno, Czechia, 2021.

This paper reports on two experiments that partially replicate an experiment by Fraundorf and Watson (2011, J. Mem. Lang.) on the recall effect of filler particles. Their subjects listened to three passages of a story, either with or without filler particles, which they had to retell afterwards. They analysed the subjects’ retellings in terms of whether important plot points were remembered or not. For their English data, they found that filler particles facilitate the recall of the plot points significantly compared to stories that did not include filler particles. As this seems to be a convincing experimental design, we aimed at evaluating this method as a web-based experiment which may, if found to be suitable, easily be applied to other languages. Furthermore, we investigated whether their results are found in German as well (Experiment 1), and evaluated whether filler duration has an effect on recall performance (Experiment 2). Our results could not replicate the findings of the original study: in fact, the opposite effect was found for German. In Experiment 1, participants performed better on recall in the fluent condition, while no significant results were found for English in Experiment 2.

@inproceedings{Muhlack2021,
title = {Revisiting recall effects of filler particles in German and English},
author = {Beeke Muhlack and Mikey Elmers and Heiner Drenhaus and Marjolein van Os and Raphael Werner and Margarita Ryzhova and Bernd M{\"o}bius},
url = {https://www.isca-speech.org/archive/interspeech_2021/muhlack21_interspeech.html},
doi = {10.21437/Interspeech.2021-1056},
year = {2021},
date = {2021},
booktitle = {Proceedings of Interspeech 2021},
pages = {3979-3983},
publisher = {Interspeech},
address = {Brno, Czechia},
abstract = {This paper reports on two experiments that partially replicate an experiment by Fraundorf and Watson (2011, J Mem. Lang.) on the recall effect of filler particles. Their subjects listened to three passages of a story, either with or without filler particles, which they had to retell afterwards. They analysed the subjects' retellings in terms of whether important plot points were remembered or not. For their English data, they found that filler particles significantly facilitate the recall of the plot points compared to stories that did not include filler particles. As this seems to be a convincing experimental design, we aimed at evaluating this method as a web-based experiment which may, if found to be suitable, easily be applied to other languages. Furthermore, we investigated whether their results are found in German as well (Experiment 1), and evaluated whether filler duration has an effect on recall performance (Experiment 2). Our results could not replicate the findings of the original study: in fact, the opposite effect was found for German. In Experiment 1, participants performed better on recall in the fluent condition, while no significant results were found for English in Experiment 2.},
pubstate = {published},
type = {inproceedings}
}


Project:   C3

Höller, Daniel; Behnke, Gregor

Loop Detection in the PANDA Planning System Inproceedings

Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS), 31, AAAI Press, pp. 168-173, 2021.

The International Planning Competition (IPC) in 2020 was the first one in a long time to host tracks on Hierarchical Task Network (HTN) planning. HYPERTENSION, the winner of the track on totally-ordered problems, comes with an interesting technique: it stores parts of the decomposition path in the state to mark expanded tasks and forces its depth-first search to leave recursive structures in the hierarchy. This can be seen as a form of loop detection (LD) – a technique that is not very common in HTN planning. This might be due to the spirit of encoding enough advice in the model to find plans (so that loop detection is simply not necessary), or because it becomes a computationally hard task in the general (i.e. partially-ordered) setting. We integrated several approximate and exact techniques for LD into the progression search of the HTN planner PANDA. We test our techniques on the benchmark set of the IPC 2020. In both the partially-ordered and the totally-ordered track, PANDA with LD performs better than the respective winner of the competition.

@inproceedings{hoeller-behnke-21-LD,
title = {Loop Detection in the PANDA Planning System},
author = {Daniel H{\"o}ller and Gregor Behnke},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/15959},
year = {2021},
date = {2021-07-21},
booktitle = {Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {168-173},
publisher = {AAAI Press},
abstract = {The International Planning Competition (IPC) in 2020 was the first one in a long time to host tracks on Hierarchical Task Network (HTN) planning. HYPERTENSION, the winner of the track on totally-ordered problems, comes with an interesting technique: it stores parts of the decomposition path in the state to mark expanded tasks and forces its depth-first search to leave recursive structures in the hierarchy. This can be seen as a form of loop detection (LD) – a technique that is not very common in HTN planning. This might be due to the spirit of encoding enough advice in the model to find plans (so that loop detection is simply not necessary), or because it becomes a computationally hard task in the general (i.e. partially-ordered) setting. We integrated several approximate and exact techniques for LD into the progression search of the HTN planner PANDA. We test our techniques on the benchmark set of the IPC 2020. In both the partially-ordered and the totally-ordered track, PANDA with LD performs better than the respective winner of the competition.},
pubstate = {published},
type = {inproceedings}
}


Project:   A7
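The core idea behind loop detection in progression search can be illustrated with a small self-contained sketch (a generic illustration under assumed data structures, not PANDA's actual implementation; all names are invented): a depth-first search that prunes any node whose (state, remaining task network) pair already occurs on the current search path.

```python
def dfs_progression(state, tasks, successors, is_goal, path=None):
    """Depth-first progression search with exact loop detection:
    prune any node whose (state, remaining tasks) pair already
    occurs on the current search branch."""
    if path is None:
        path = set()
    node = (state, tuple(tasks))
    if node in path:          # loop: same node seen earlier on this branch
        return None
    if is_goal(state, tasks):
        return []
    path.add(node)
    for action, nxt_state, nxt_tasks in successors(state, tasks):
        plan = dfs_progression(nxt_state, nxt_tasks, successors, is_goal, path)
        if plan is not None:
            path.discard(node)
            return [action] + plan
    path.discard(node)
    return None

# Toy domain: a recursive "noop" method makes no progress; without
# loop detection the DFS would descend forever.
def succ(s, tasks):
    yield ("noop", s, tasks)
    if s < 2:
        yield ("step", s + 1, tasks)

plan = dfs_progression(0, ("go",), succ, lambda s, t: s == 2)
```

With the pruning in place, the recursive branch is cut immediately and the search backtracks to the two-step plan `["step", "step"]`.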

Höller, Daniel

Translating Totally Ordered HTN Planning Problems to Classical Planning Problems Using Regular Approximation of Context-Free Languages Inproceedings

Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS), 31, AAAI Press, pp. 159-167, 2021.

There have been several approaches to use techniques from classical planning in HTN planning. While a direct translation is in general not possible due to the different expressiveness, there have been translations of bounded HTN problems and approaches to use classical heuristics in HTN search procedures. In this paper, we introduce a different approach. We exploit methods from the field of Computational Linguistics introduced to approximate Context-Free Languages by Finite Automata. We use them to approximate the decomposition structure of Totally Ordered (TO) HTN planning problems by classical problems. The resulting problem can then be solved using standard classical planning systems. A subset of TOHTN problems can be translated exactly, i.e., without changing the set of solutions. For problems where an approximation is necessary, we use an overapproximation, i.e., the set of solutions to the classical problem is a superset of that of the HTN problem. We then use plan verification to check whether a solution is valid and thus obtain a sound and complete overall approach. The resulting system outperforms the state of the art on the IPC 2020 benchmark set in terms of coverage.

@inproceedings{hoeller-21-toad,
title = {Translating Totally Ordered HTN Planning Problems to Classical Planning Problems Using Regular Approximation of Context-Free Languages},
author = {Daniel H{\"o}ller},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/15958},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {159-167},
publisher = {AAAI Press},
abstract = {There have been several approaches to use techniques from classical planning in HTN planning. While a direct translation is in general not possible due to the different expressiveness, there have been translations of bounded HTN problems and approaches to use classical heuristics in HTN search procedures. In this paper, we introduce a different approach. We exploit methods from the field of Computational Linguistics introduced to approximate Context-Free Languages by Finite Automata. We use them to approximate the decomposition structure of Totally Ordered (TO) HTN planning problems by classical problems. The resulting problem can then be solved using standard classical planning systems. A subset of TOHTN problems can be translated exactly, i.e., without changing the set of solutions. For problems where an approximation is necessary, we use an overapproximation, i.e., the set of solutions to the classical problem is a superset of that of the HTN problem. We then use plan verification to check whether a solution is valid and thus obtain a sound and complete overall approach. The resulting system outperforms the state of the art on the IPC 2020 benchmark set in terms of coverage.},
pubstate = {published},
type = {inproceedings}
}


Project:   A7

Lauer, Pascal; Torralba, Álvaro; Fišer, Daniel; Höller, Daniel; Wichlacz, Julia; Hoffmann, Jörg

Polynomial-Time in PDDL Input Size: Making the Delete Relaxation Feasible for Lifted Planning Inproceedings

Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), IJCAI organization, pp. 4119-4126, 2021.

Polynomial-time heuristic functions for planning have been commonplace for 20 years. But polynomial-time in which input? Almost all existing approaches are based on a grounded task representation, not on the actual PDDL input, which is exponentially smaller. This limits practical applicability to cases where the grounded representation is "small enough". Previous attempts to tackle this problem for the delete relaxation leveraged symmetries to reduce the blow-up. Here we take a more radical approach, applying an additional relaxation to obtain a heuristic function that runs in time polynomial in the size of the PDDL input. Our relaxation splits the predicates into smaller predicates of fixed arity K. We show that computing a relaxed plan is still NP-hard (in PDDL input size) for K>=2, but is polynomial-time for K=1. We implement a heuristic function for K=1 and show that it can improve the state of the art on benchmarks whose grounded representation is large.

@inproceedings{lauer-21,
title = {Polynomial-Time in PDDL Input Size: Making the Delete Relaxation Feasible for Lifted Planning},
author = {Pascal Lauer and {\'A}lvaro Torralba and Daniel Fišer and Daniel H{\"o}ller and Julia Wichlacz and J{\"o}rg Hoffmann},
url = {https://www.ijcai.org/proceedings/2021/567},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {4119-4126},
publisher = {IJCAI organization},
abstract = {Polynomial-time heuristic functions for planning have been commonplace for 20 years. But polynomial-time in which input? Almost all existing approaches are based on a grounded task representation, not on the actual PDDL input, which is exponentially smaller. This limits practical applicability to cases where the grounded representation is "small enough". Previous attempts to tackle this problem for the delete relaxation leveraged symmetries to reduce the blow-up. Here we take a more radical approach, applying an additional relaxation to obtain a heuristic function that runs in time polynomial in the size of the PDDL input. Our relaxation splits the predicates into smaller predicates of fixed arity K. We show that computing a relaxed plan is still NP-hard (in PDDL input size) for K>=2, but is polynomial-time for K=1. We implement a heuristic function for K=1 and show that it can improve the state of the art on benchmarks whose grounded representation is large.},
pubstate = {published},
type = {inproceedings}
}


Project:   A7
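The central relaxation of the Lauer et al. paper — splitting predicates into predicates of fixed arity K — can be sketched for K=1 (a toy illustration under an assumed atom representation, not the authors' code): each atom p(o1, …, on) is replaced by n unary atoms, one per argument position.

```python
def split_atoms(atoms):
    """Arity-1 relaxation: replace each atom (pred, (o1, ..., on))
    by unary atoms (pred@i, (oi,)), one per argument position.
    Nullary atoms are kept unchanged."""
    relaxed = set()
    for pred, args in atoms:
        if not args:
            relaxed.add((pred, ()))
        for i, obj in enumerate(args):
            relaxed.add((f"{pred}@{i}", (obj,)))
    return relaxed

state = {("at", ("truck1", "depot")), ("empty", ())}
relaxed = split_atoms(state)
# relaxed = {("at@0", ("truck1",)), ("at@1", ("depot",)), ("empty", ())}
```

Tracking argument positions independently keeps the representation polynomial in the size of the PDDL input, at the price of overapproximating which atoms are reachable (any object seen in a position is treated as compatible with any object seen in the other positions).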

Wichlacz, Julia; Höller, Daniel; Hoffmann, Jörg

Landmark Heuristics for Lifted Planning – Extended Abstract Inproceedings

Proceedings of the 13th International Symposium on Combinatorial Search (SoCS), AAAI Press, pp. 242-244, 2021.

Planning problems are usually modeled using lifted representations: they specify predicates and action schemas using variables over a finite universe of objects. However, current planning systems like Fast Downward need a grounded (propositional) input model. The process of grounding might result in an exponential blowup of the model size. This limits the use of grounded planning systems in practical applications. Recent work introduced an efficient planning system for lifted heuristic search, but the work on lifted heuristics is still limited. In this extended abstract, we introduce a novel lifted heuristic based on landmarks, which we extract from the lifted problem representation. Preliminary results on a benchmark set specialized to lifted planning show that there are domains where our approach finds enough landmarks to guide the search more effectively than the available heuristics.

@inproceedings{wichlacz-21,
title = {Landmark Heuristics for Lifted Planning – Extended Abstract},
author = {Julia Wichlacz and Daniel H{\"o}ller and J{\"o}rg Hoffmann},
url = {https://ojs.aaai.org/index.php/SOCS/article/view/18597},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 13th International Symposium on Combinatorial Search (SoCS)},
pages = {242-244},
publisher = {AAAI Press},
abstract = {Planning problems are usually modeled using lifted representations: they specify predicates and action schemas using variables over a finite universe of objects. However, current planning systems like Fast Downward need a grounded (propositional) input model. The process of grounding might result in an exponential blowup of the model size. This limits the use of grounded planning systems in practical applications. Recent work introduced an efficient planning system for lifted heuristic search, but the work on lifted heuristics is still limited. In this extended abstract, we introduce a novel lifted heuristic based on landmarks, which we extract from the lifted problem representation. Preliminary results on a benchmark set specialized to lifted planning show that there are domains where our approach finds enough landmarks to guide the search more effectively than the available heuristics.},
pubstate = {published},
type = {inproceedings}
}


Project:   A7

Tröger, Johannes; Lindsay, Hali; Mina, Mario; Linz, Nicklas; Klöppel, Stefan; Kray, Jutta; Peter, Jessica

Patients with amnestic MCI Fail to Adapt Executive Control When Repeatedly Tested with Semantic Verbal Fluency Tasks Journal Article

Journal of the International Neuropsychological Society, Cambridge University Press, pp. 1-8, 2021.

Semantic verbal fluency (SVF) tasks require individuals to name items from a specified category within a fixed time. An impaired SVF performance is well documented in patients with amnestic Mild Cognitive Impairment (aMCI). The two leading theoretical views suggest either loss of semantic knowledge or impaired executive control to be responsible. We assessed SVF 3 times on 2 consecutive days in 29 healthy controls (HC) and 29 patients with aMCI with the aim of determining which of the two views holds true. When doing the task for the first time, patients with aMCI produced fewer and more common words with a shorter mean response latency. When tested repeatedly, only healthy volunteers increased performance. Likewise, only the performance of HC indicated two distinct retrieval processes: a prompt retrieval of readily available items at the beginning of the task and an active search through semantic space towards the end. With repeated assessment, the pool of readily available items became larger in HC, but not in patients with aMCI. The production of fewer and more common words in aMCI points to a smaller search set and supports the loss of semantic knowledge view. The failure to improve performance as well as the lack of distinct retrieval processes point to an additional impairment in executive control. Our data did not clearly favour one theoretical view over the other, but rather indicate that the impairment of patients with aMCI in SVF is due to a combination of both.

@article{troger2021patients,
title = {Patients with amnestic MCI Fail to Adapt Executive Control When Repeatedly Tested with Semantic Verbal Fluency Tasks},
author = {Johannes Tr{\"o}ger and Hali Lindsay and Mario Mina and Nicklas Linz and Stefan Kl{\"o}ppel and Jutta Kray and Jessica Peter},
url = {https://www.cambridge.org/core/journals/journal-of-the-international-neuropsychological-society/article/abs/patients-with-amnestic-mci-fail-to-adapt-executive-control-when-repeatedly-tested-with-semantic-verbal-fluency-tasks/E09D9B7801DA02360B056E34E0BD96F7},
year = {2021},
date = {2021-06-30},
journal = {Journal of the International Neuropsychological Society},
pages = {1-8},
publisher = {Cambridge University Press},
abstract = {Semantic verbal fluency (SVF) tasks require individuals to name items from a specified category within a fixed time. An impaired SVF performance is well documented in patients with amnestic Mild Cognitive Impairment (aMCI). The two leading theoretical views suggest either loss of semantic knowledge or impaired executive control to be responsible. We assessed SVF 3 times on 2 consecutive days in 29 healthy controls (HC) and 29 patients with aMCI with the aim of determining which of the two views holds true. When doing the task for the first time, patients with aMCI produced fewer and more common words with a shorter mean response latency. When tested repeatedly, only healthy volunteers increased performance. Likewise, only the performance of HC indicated two distinct retrieval processes: a prompt retrieval of readily available items at the beginning of the task and an active search through semantic space towards the end. With repeated assessment, the pool of readily available items became larger in HC, but not in patients with aMCI. The production of fewer and more common words in aMCI points to a smaller search set and supports the loss of semantic knowledge view. The failure to improve performance as well as the lack of distinct retrieval processes point to an additional impairment in executive control. Our data did not clearly favour one theoretical view over the other, but rather indicate that the impairment of patients with aMCI in SVF is due to a combination of both.},
pubstate = {published},
type = {article}
}


Project:   A5

Brandt, Erika; Möbius, Bernd; Andreeva, Bistra

Dynamic Formant Trajectories in German Read Speech: Impact of Predictability and Prominence Journal Article

Frontiers in Communication, section Language Sciences, 6, pp. 1-15, 2021.

Phonetic structures expand temporally and spectrally when they are difficult to predict from their context. To some extent, effects of predictability are modulated by prosodic structure. So far, studies on the impact of contextual predictability and prosody on phonetic structures have neglected the dynamic nature of the speech signal. This study investigates the impact of predictability and prominence on the dynamic structure of the first and second formants of German vowels. We expect to find differences in the formant movements between vowels standing in different predictability contexts and a modulation of this effect by prominence. First and second formant values are extracted from a large German corpus. Formant trajectories of peripheral vowels are modeled using generalized additive mixed models, which estimate nonlinear regressions between a dependent variable and predictors. Contextual predictability is measured as biphone and triphone surprisal based on a statistical German language model. We test for the effects of the information-theoretic measures surprisal and word frequency, as well as prominence, on formant movement, while controlling for vowel phonemes and duration. Primary lexical stress and vowel phonemes are significant predictors of first and second formant trajectory shape. We replicate previous findings that vowels are more dispersed in stressed syllables than in unstressed syllables. The interaction of stress and surprisal explains formant movement: unstressed vowels show more variability in their formant trajectory shape at different surprisal levels than stressed vowels. This work shows that effects of contextual predictability on fine phonetic detail can be observed not only in pointwise measures but also in dynamic features of phonetic segments.

@article{Brandt/etal:2021,
title = {Dynamic Formant Trajectories in German Read Speech: Impact of Predictability and Prominence},
author = {Erika Brandt and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.frontiersin.org/articles/10.3389/fcomm.2021.643528/full},
doi = {10.3389/fcomm.2021.643528},
year = {2021},
date = {2021-06-21},
journal = {Frontiers in Communication, section Language Sciences},
pages = {1-15},
volume = {6},
number = {643528},
abstract = {Phonetic structures expand temporally and spectrally when they are difficult to predict from their context. To some extent, effects of predictability are modulated by prosodic structure. So far, studies on the impact of contextual predictability and prosody on phonetic structures have neglected the dynamic nature of the speech signal. This study investigates the impact of predictability and prominence on the dynamic structure of the first and second formants of German vowels. We expect to find differences in the formant movements between vowels standing in different predictability contexts and a modulation of this effect by prominence. First and second formant values are extracted from a large German corpus. Formant trajectories of peripheral vowels are modeled using generalized additive mixed models, which estimate nonlinear regressions between a dependent variable and predictors. Contextual predictability is measured as biphone and triphone surprisal based on a statistical German language model. We test for the effects of the information-theoretic measures surprisal and word frequency, as well as prominence, on formant movement, while controlling for vowel phonemes and duration. Primary lexical stress and vowel phonemes are significant predictors of first and second formant trajectory shape. We replicate previous findings that vowels are more dispersed in stressed syllables than in unstressed syllables. The interaction of stress and surprisal explains formant movement: unstressed vowels show more variability in their formant trajectory shape at different surprisal levels than stressed vowels. This work shows that effects of contextual predictability on fine phonetic detail can be observed not only in pointwise measures but also in dynamic features of phonetic segments.},
pubstate = {published},
type = {article}
}


Project:   C1

Jágrová, Klára; Hedderich, Michael; Mosbach, Marius; Avgustinova, Tania; Klakow, Dietrich

On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers Journal Article

Frontiers in Psychology, 12, pp. 2296, 2021, ISSN 1664-1078.

This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, was proved to correlate well with the mutual intelligibility of individual words. However, the role of context for the intelligibility of target words in sentences was subject to very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-gram language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can, theoretically, take infinitely long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.

@article{10.3389/fpsyg.2021.662277,
title = {On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Michael Hedderich and Marius Mosbach and Tania Avgustinova and Dietrich Klakow},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662277/full},
doi = {10.3389/fpsyg.2021.662277},
year = {2021},
date = {2021},
journal = {Frontiers in Psychology},
pages = {2296},
volume = {12},
abstract = {This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, was proved to correlate well with the mutual intelligibility of individual words. However, the role of context for the intelligibility of target words in sentences was subject to very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-gram language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can, theoretically, take infinitely long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.},
pubstate = {published},
type = {article}
}


Projects:   B4 C4
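The 3-gram baseline compared against the context-aware models can be sketched with a minimal add-alpha-smoothed trigram model (a toy illustration; the function names and mini-corpus are invented, not from the paper): the surprisal of a target word is its negative log-probability given the two preceding tokens, so more predictable sentence-final targets receive lower surprisal.

```python
from collections import Counter
import math

def train_trigram(sentences, alpha=0.1):
    """Add-alpha smoothed trigram model: returns P(w | u, v)."""
    tri, bi = Counter(), Counter()
    vocab = set()
    for s in sentences:
        toks = ["<s>", "<s>"] + s + ["</s>"]
        vocab.update(toks)
        for u, v, w in zip(toks, toks[1:], toks[2:]):
            tri[(u, v, w)] += 1
            bi[(u, v)] += 1
    V = len(vocab)
    return lambda u, v, w: (tri[(u, v, w)] + alpha) / (bi[(u, v)] + alpha * V)

def surprisal(prob, context, word):
    """Surprisal in bits: -log2 P(word | last two context tokens)."""
    u, v = (["<s>", "<s>"] + context)[-2:]
    return -math.log2(prob(u, v, word))

corpus = [["to", "the", "university"], ["to", "the", "university"],
          ["to", "the", "station"]]
prob = train_trigram(corpus)
# "university" follows "to the" more often, hence lower surprisal:
s_predictable = surprisal(prob, ["to", "the"], "university")
s_unpredictable = surprisal(prob, ["to", "the"], "station")
```

A context-aware model (LSTM or Transformer) would replace the fixed two-token window with the full preceding sentence, which is exactly the contrast the study evaluates.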

Lapshinova-Koltunski, Ekaterina; Bizzoni, Yuri; Przybyl, Heike; Teich, Elke

Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication Inproceedings

Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21), Association for Computational Linguistics, pp. 82-90, online, 2021.

We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.

@inproceedings{LapshinovaEtAl2021interp,
title = {Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication},
author = {Ekaterina Lapshinova-Koltunski and Yuri Bizzoni and Heike Przybyl and Elke Teich},
url = {https://aclanthology.org/2021.motra-1.9/},
year = {2021},
date = {2021-05-31},
booktitle = {Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21)},
pages = {82-90},
publisher = {Association for Computational Linguistics},
address = {online},
abstract = {We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.},
pubstate = {published},
type = {inproceedings}
}


Project:   B7
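The combination of n-gram language models with relative entropy described in the abstract above can be illustrated with a minimal sketch (a toy unigram version with invented names and data, not the authors' pipeline): estimate word distributions from two corpora and rank words by their pointwise contribution to the Kullback-Leibler divergence, so that the highest-scoring words are the most distinctive features of the first corpus.

```python
from collections import Counter
import math

def kl_contributions(corpus_a, corpus_b, alpha=1.0):
    """Pointwise Kullback-Leibler divergence D(A || B) per word,
    with add-alpha smoothing so unseen words get nonzero probability.
    The values sum to the total divergence in bits."""
    counts_a, counts_b = Counter(corpus_a), Counter(corpus_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    contrib = {}
    for w in vocab:
        p = (counts_a[w] + alpha) / total_a
        q = (counts_b[w] + alpha) / total_b
        contrib[w] = p * math.log2(p / q)
    return contrib

# Words overrepresented in the first corpus score highest:
translated = "the the of of course course course indeed".split()
original = "the of a a a and and but".split()
ranked = sorted(kl_contributions(translated, original).items(),
                key=lambda kv: -kv[1])
```

In the study this idea is applied to n-gram models over translated/interpreted versus original text, where the top-ranked features then serve as data-driven indicators of mediated communication.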
