Publications

Aurnhammer, Christoph; Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The P600 as a Continuous Index of Integration Effort Journal Article

Psychophysiology, 2023, ISSN 1469-8986.

The integration of word meaning into an unfolding utterance representation is a core operation of incremental language comprehension. There is considerable debate, however, as to which component of the ERP signal, the N400 or the P600, directly reflects integrative processes, with far-reaching consequences for the temporal organization and architecture of the comprehension system. Multi-stream models that treat the N400 as an index of integration crucially rely on the presence of a semantically attractive plausible alternative interpretation to account for the absence of an N400 effect in response to certain semantic anomalies, as reported in previous studies. The single-stream Retrieval–Integration account posits the P600 as an index of integration, further predicting that its amplitude varies continuously with integrative effort. Here, we directly test these competing hypotheses using a context manipulation design in which a semantically attractive alternative is either available or not, and target word plausibility is varied across three levels. An initial self-paced reading study revealed graded reading times for plausibility, suggesting differential integration effort. A subsequent ERP study showed no N400 differences across conditions, whereas P600 amplitude was graded for plausibility. These findings are inconsistent with the interpretation of the N400 as an index of integration, as no N400 effect emerged even in the absence of a semantically attractive alternative. By contrast, the link between plausibility, reading times, and P600 amplitude supports the view that the P600 is a continuous index of integration effort. More generally, our results support a single-stream architecture and obviate the need for multi-stream accounts.

@article{aurnhammer2023continuous,
title = {The P600 as a Continuous Index of Integration Effort},
author = {Christoph Aurnhammer and Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://onlinelibrary.wiley.com/doi/10.1111/psyp.14302},
doi = {10.1111/psyp.14302},
year = {2023},
date = {2023},
journal = {Psychophysiology},
abstract = {The integration of word meaning into an unfolding utterance representation is a core operation of incremental language comprehension. There is considerable debate, however, as to which component of the ERP signal, the N400 or the P600, directly reflects integrative processes, with far-reaching consequences for the temporal organization and architecture of the comprehension system. Multi-stream models that treat the N400 as an index of integration crucially rely on the presence of a semantically attractive plausible alternative interpretation to account for the absence of an N400 effect in response to certain semantic anomalies, as reported in previous studies. The single-stream Retrieval–Integration account posits the P600 as an index of integration, further predicting that its amplitude varies continuously with integrative effort. Here, we directly test these competing hypotheses using a context manipulation design in which a semantically attractive alternative is either available or not, and target word plausibility is varied across three levels. An initial self-paced reading study revealed graded reading times for plausibility, suggesting differential integration effort. A subsequent ERP study showed no N400 differences across conditions, whereas P600 amplitude was graded for plausibility. These findings are inconsistent with the interpretation of the N400 as an index of integration, as no N400 effect emerged even in the absence of a semantically attractive alternative. By contrast, the link between plausibility, reading times, and P600 amplitude supports the view that the P600 is a continuous index of integration effort. More generally, our results support a single-stream architecture and obviate the need for multi-stream accounts.},
pubstate = {published},
type = {article}
}

Project:   A1

Demberg, Vera; Kravtchenko, Ekaterina; Loy, Jia

A systematic evaluation of factors affecting referring expression choice in passage completion tasks Journal Article

Journal of Memory and Language, 130, 104413, 2023.

There is a long-standing controversy around the question of whether referent predictability affects pronominalization: while there are good theoretical reasons for this prediction (e.g., Arnold, 2008), the experimental evidence has been rather mixed. We report on three highly powered studies that manipulate a range of factors that have differed between previous studies, in order to determine more precisely under which conditions a predictability effect on pronominalization can be found. We use a constrained as well as a free reference task, and manipulate verb type, antecedent ambiguity, NP length, and whether the stimuli are presented within a story context. Our results identify the story context as the single important factor that makes it possible to elicit an effect of predictability on pronoun choice, in line with Rosa and Arnold (2017) and Weatherford and Arnold (2021). We also propose a parametrization of a rational speech act model that reconciles the findings of many of the experiments in the literature.

@article{Demberg.etal23,
title = {A systematic evaluation of factors affecting referring expression choice in passage completion tasks},
author = {Vera Demberg and Ekaterina Kravtchenko and Jia Loy},
url = {https://europepmc.org/article/MED/37265576},
year = {2023},
date = {2023},
journal = {Journal of Memory and Language},
volume = {130},
pages = {104413},
abstract = {There is a long-standing controversy around the question of whether referent predictability affects pronominalization: while there are good theoretical reasons for this prediction (e.g., Arnold, 2008), the experimental evidence has been rather mixed. We report on three highly powered studies that manipulate a range of factors that have differed between previous studies, in order to determine more precisely under which conditions a predictability effect on pronominalization can be found. We use a constrained as well as a free reference task, and manipulate verb type, antecedent ambiguity, NP length, and whether the stimuli are presented within a story context. Our results identify the story context as the single important factor that makes it possible to elicit an effect of predictability on pronoun choice, in line with Rosa and Arnold (2017) and Weatherford and Arnold (2021). We also propose a parametrization of a rational speech act model that reconciles the findings of many of the experiments in the literature.},
pubstate = {published},
type = {article}
}

Project:   A3

Ortmann, Katrin

Computational Methods for Investigating Syntactic Change: Automatic Identification of Extraposition in Modern and Historical German PhD Thesis

Bochumer Linguistische Arbeitsberichte (BLA) 25, 2023.

The linguistic analysis of historical German and diachronic syntactic change is traditionally based on small, manually annotated data sets. As a consequence, such studies lack the generalizability and statistical significance that quantitative approaches can offer. In this thesis, computational methods for the automatic syntactic analysis of modern and historical German are developed, which help to overcome the natural limits of manual annotation and enable the creation of large annotated data sets. The main goal of the thesis is to identify extraposition in modern and historical German, with extraposition being defined as the movement of constituents from their base position to the post-field of the sentence (Höhle 2019; Wöllstein 2018). For the automatic recognition of extraposition, two annotation steps are combined: (i) a topological field analysis for the identification of post-fields and (ii) a constituency analysis to recognize candidates for extraposition. The thesis describes experiments on topological field parsing (Ortmann 2020), chunking (Ortmann 2021a), and constituency parsing (Ortmann 2021b). The best results are achieved with statistical models trained on Part-of-Speech tags as input. Contrary to previous studies, all annotation steps are thoroughly evaluated with the newly developed FairEval method for the fine-grained error analysis and fair evaluation of labeled spans (Ortmann 2022). In an example analysis, the developed methods are applied to large collections of modern and historical text to explore different factors for the extraposition of relative clauses, demonstrating the practical value of computational approaches for linguistic studies. The developed methods are released as the CLASSIG pipeline (Computational Linguistic Analysis of Syntactic Structures In German) at https://github.com/rubcompling/classig-pipeline.
Data sets, models, and evaluation results are provided for download at https://github.com/rubcompling/classig-data and https://doi.org/10.5281/zenodo.7180973.

@phdthesis{ortmann23,
title = {Computational Methods for Investigating Syntactic Change: Automatic Identification of Extraposition in Modern and Historical German},
author = {Katrin Ortmann},
url = {https://www.linguistics.rub.de/forschung/arbeitsberichte/25.pdf},
year = {2023},
date = {2023},
publisher = {Bochumer Linguistische Arbeitsberichte (BLA) 25},
abstract = {The linguistic analysis of historical German and diachronic syntactic change is traditionally based on small, manually annotated data sets. As a consequence, such studies lack the generalizability and statistical significance that quantitative approaches can offer. In this thesis, computational methods for the automatic syntactic analysis of modern and historical German are developed, which help to overcome the natural limits of manual annotation and enable the creation of large annotated data sets. The main goal of the thesis is to identify extraposition in modern and historical German, with extraposition being defined as the movement of constituents from their base position to the post-field of the sentence (H{\"o}hle 2019; W{\"o}llstein 2018). For the automatic recognition of extraposition, two annotation steps are combined: (i) a topological field analysis for the identification of post-fields and (ii) a constituency analysis to recognize candidates for extraposition. The thesis describes experiments on topological field parsing (Ortmann 2020), chunking (Ortmann 2021a), and constituency parsing (Ortmann 2021b). The best results are achieved with statistical models trained on Part-of-Speech tags as input. Contrary to previous studies, all annotation steps are thoroughly evaluated with the newly developed FairEval method for the fine-grained error analysis and fair evaluation of labeled spans (Ortmann 2022). In an example analysis, the developed methods are applied to large collections of modern and historical text to explore different factors for the extraposition of relative clauses, demonstrating the practical value of computational approaches for linguistic studies. The developed methods are released as the CLASSIG pipeline (Computational Linguistic Analysis of Syntactic Structures In German) at https://github.com/rubcompling/classig-pipeline.
Data sets, models, and evaluation results are provided for download at https://github.com/rubcompling/classig-data and https://doi.org/10.5281/zenodo.7180973.},
pubstate = {published},
type = {phdthesis}
}

Project:   C6

Hug, Marius; Rau, Felix; Debbeler, Anke; Saleh, Sara; Mollenhauer, Elisabeth; Leinen, Peter; Genêt, Philippe; Trippel, Thorsten; Zinn, Claus; Dogaru, George; Witt, Andreas; Werthmann, Antonina; Draxler, Christoph; Schiel, Florian; Knappen, Jörg; Fischer, Stefan; Krielke, Marie-Pauline; Teich, Elke; Barth, Florian; Calvo Tello, José; Funk, Stefan E.; Göbel, Mathias; Kurzawe, Daniel; Veentjer, Ubbo; Weimer, Lukas; Blätte, Andreas; Lehmberg, Timm

Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren Miscellaneous

Text+, Zenodo, pp. 1-12, Potsdam, 2023.

Presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The presentation was given in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V.

@misc{HugRauDebbeleretal.2023,
title = {Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren},
author = {Marius Hug and Felix Rau and Anke Debbeler and Sara Saleh and Elisabeth Mollenhauer and Peter Leinen and Philippe Genêt and Thorsten Trippel and Claus Zinn and George Dogaru and Andreas Witt and Antonina Werthmann and Christoph Draxler and Florian Schiel and J{\"o}rg Knappen and Stefan Fischer and Marie-Pauline Krielke and Elke Teich and Florian Barth and Jos{\'e} Calvo Tello and Stefan E. Funk and Mathias G{\"o}bel and Daniel Kurzawe and Ubbo Veentjer and Lukas Weimer and Andreas Bl{\"a}tte and Timm Lehmberg},
url = {https://nbn-resolving.org/urn:nbn:de:bsz:mh39-121108},
doi = {10.5281/zenodo.8123896},
year = {2023},
date = {2023},
booktitle = {Text+},
pages = {1-12},
publisher = {Zenodo},
address = {Potsdam},
abstract = {Presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The presentation was given in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Fischer, Stefan; Fankhauser, Peter; Teich, Elke

Multi-word expressions and language efficiency: an information-theoretic account Miscellaneous

DGfS Computerlinguistik Postersession, Köln, 2023.

Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs are typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what MWEs contribute to the development of “scientific language” (here: English). We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion, using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time period (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: the more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.

@misc{Fischer_etal_2024,
title = {Multi-word expressions and language efficiency: an information-theoretic account},
author = {Stefan Fischer and Peter Fankhauser and Elke Teich},
url = {https://dgfs2023.uni-koeln.de/sites/dgfs2023/Booklet/AG_Beschreibungen-und-Abstracts/Description-Abstracts-CL.pdf},
year = {2023},
date = {2023},
booktitle = {DGfS Computerlinguistik Postersession},
address = {K{\"o}ln},
abstract = {Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs are typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what MWEs contribute to the development of “scientific language” (here: English). We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion, using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time period (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: the more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Chingacham, Anupama; Demberg, Vera; Klakow, Dietrich

A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification Inproceedings

2022 IEEE Spoken Language Technology Workshop (SLT 2022, 9th - 12th January 2023, Doha, Qatar), 2023.

In noisy environments, speech can be hard for humans to understand. Spoken dialog systems can enhance the intelligibility of their output either by modifying the speech synthesis (e.g., imitating Lombard speech) or by optimizing the language generation. We focus on the second type of approach, in which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at SNR -5 dB.

@inproceedings{Chingachametal23,
title = {A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification},
author = {Anupama Chingacham and Vera Demberg and Dietrich Klakow},
url = {https://arxiv.org/abs/2210.10252},
doi = {10.48550/arXiv.2210.10252},
year = {2023},
date = {2023},
booktitle = {2022 IEEE Spoken Language Technology Workshop (SLT 2022, 9th - 12th January 2023, Doha, Qatar)},
abstract = {In noisy environments, speech can be hard for humans to understand. Spoken dialog systems can enhance the intelligibility of their output either by modifying the speech synthesis (e.g., imitating Lombard speech) or by optimizing the language generation. We focus on the second type of approach, in which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at SNR -5 dB.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Przybyl, Heike; Karakanta, Alina; Menzel, Katrin; Teich, Elke

Exploring linguistic variation in mediated discourse: translation vs. interpreting Book Chapter

Kajzer-Wietrzny, Marta; Bernardini, Silvia; Ferraresi, Adriano; Ivaska, Ilmari (Eds.): Mediated discourse at the European Parliament: Empirical investigations, Language Science Press, pp. 191–218, Berlin, 2022.

This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple word-based n-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.

@inbook{Przybyl2021exploring,
title = {Exploring linguistic variation in mediated discourse: translation vs. interpreting},
author = {Heike Przybyl and Alina Karakanta and Katrin Menzel and Elke Teich},
editor = {Marta Kajzer-Wietrzny and Silvia Bernardini and Adriano Ferraresi and Ilmari Ivaska},
url = {https://langsci-press.org/catalog/book/343},
doi = {10.5281/zenodo.6977050},
year = {2022},
date = {2022},
booktitle = {Mediated discourse at the European Parliament: Empirical investigations},
pages = {191--218},
publisher = {Language Science Press},
address = {Berlin},
abstract = {This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple word-based n-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Bhandari, Pratik

Interaction of top-down and bottom-up processes in spoken language comprehension PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

It seems easy to listen to and understand someone speaking. However, our day-to-day conversations often occur under adverse listening conditions. For example, background noise comes from different sound sources, multiple people talk simultaneously (e.g., in a café), a poor signal connection distorts the voice of a person on the other end of a telephone call, and the list goes on. Despite these adversities, we communicate successfully most of the time. One of the significant contributors to our ability to understand language in adverse listening conditions is predictive language processing. Humans are not passive consumers of language: we use the information available to us from a context to predict upcoming linguistic events that we have not yet encountered. We do not wait for a speech signal to unfold completely before decoding its meaning. This feature of human language processing is critical for understanding speech in adverse listening conditions. The studies in this thesis are timely, as the discussion about the role of prediction in language processing is vibrant and, to some extent, heated. Some argue that prediction is a universal phenomenon, not only of language but of human cognition in general. The present thesis examined the boundary conditions of predictive language processing. We investigated whether linguistic predictions are automatic, or whether they are constrained by other factors such as top-down attention regulation and bottom-up processing of different speech rates in degraded speech comprehension. In this thesis, we examined how listeners use context information and form predictions while listening to speech at different levels of degradation. The central theme of the thesis is the investigation of the interactions between top-down semantic predictions and bottom-up auditory processing in adverse listening conditions, within the theoretical framework of predictive processing and the noisy channel model of communication.
We first introduce these concepts of top-down and bottom-up interactions in adverse listening conditions, then report the experiments that empirically investigated different aspects of degraded speech comprehension and the interactions between top-down and bottom-up processes. Our findings showed that to understand a speaker’s utterance in a noisy channel (e.g., due to degradation of the speech signal), a listener takes into account the noise in the signal as well as the context information to form lexical-semantic predictions. Studies have shown that lexical-semantic predictions facilitate language comprehension. We investigated whether such a facilitatory effect of linguistic predictions is observed at all levels of speech degradation. We also addressed the debate on the nature of the predictability effect (graded vs. all-or-nothing). The studies in this thesis concluded that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down predictive processes (lexical-semantic cues) and bottom-up auditory processes (acoustic-phonetic cues). When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Taken together, this thesis contributes to a better understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.


It seems quite easy to listen to someone speaking and to understand them. Our daily conversations, however, often take place under adverse conditions. For example, background noise comes from various sound sources, several people speak at the same time (e.g., in a café), a poor signal connection distorts the voice of the person at the other end of the telephone, and the list goes on. Despite these adversities, we communicate successfully in most cases. One of the most important factors that helps us understand language even under adverse conditions is predictive language processing. In this thesis, we investigated how listeners can use context information and make predictions while listening to speech with signal distortions of varying strength. The central theme of the thesis is the investigation of the interaction between semantic predictions based on the preceding context and the auditory processing of the speech signal under adverse listening conditions, within the theoretical framework of "predictive processing" and the "noisy channel model of communication". There are numerous methods by which context information and speech degradation (adverse listening conditions) can be created and manipulated in an experimental setup. We manipulated the context information by constructing short subject-verb-object sentences in German in which the verb of a sentence predicts the noun. In addition to context information, we investigated the effect of strategic attention allocation as a top-down process. The speech was degraded by "noise-vocoding" the clean speech. In addition to noise-vocoding, we investigated the effect of changes in speech rate as a further factor influencing bottom-up processes.
In Chapter 5, we first investigated the role of top-down attention regulation in listeners' ability to use context information. Our research question was whether attention to the context is strictly necessary for listeners to make predictions about an upcoming word in a sentence at different levels of degradation. We were able to show that the semantic predictability of a sentence contributes to better speech comprehension only when listeners attend to the context information. Moreover, such facilitation was absent at severe levels of degradation. We investigated these results further in Chapter 6 and found that the facilitatory effect of predictability is observed only at a moderate level of speech degradation. We examined the nature of the predictability effect and found that it is graded rather than all-or-nothing. In other words, we found that listeners' predictions about an upcoming word are not restricted to a highly constraining sentence context; instead, listeners predict the upcoming word depending on the probability of its occurrence in a given context (e.g., "cloze probability"). Finally, in Chapter 7 we investigated whether a change in speech rate, which alters processing time, strengthens or weakens the contextual facilitation observed in Chapter 6. The results showed that listening comprehension of moderately degraded speech is best at a normal speech rate: slowing speech down did not enhance contextual facilitation. When speech rate was increased, however, the processing of sentences with low, but not high, predictability was impaired.
Within the limited processing time, activating target words in a less constraining sentence context was more difficult than in a highly constraining sentence context. All of these experiments, conducted with German stimuli on young adult native speakers of German, showed that the comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down processes (lexical-semantic cues) and bottom-up auditory processes (acoustic-phonetic cues). When speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., on what they actually heard), regardless of the context information available to them. When speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. Moreover, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Overall, this thesis contributes to a nuanced understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.

@phdthesis{Bhandari_Diss_2022,
title = {Interaction of top-down and bottom-up processes in spoken language comprehension},
author = {Pratik Bhandari},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/34800},
doi = {10.22028/D291-38594},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {It seems pretty easy to listen to and understand someone speaking. However, our day-to-day conversations occur under adverse listening conditions. For example, background noise comes from different sound sources, multiple people talk simultaneously (e.g., in a caf{\'e}), a poor signal connection distorts the voice of a person talking on the other end of a telephone call, and the list goes on. Despite these adversities, most of the time, we communicate successfully. One of the significant contributors to our ability to understand language in adverse listening conditions is predictive language processing. Humans are not passive consumers of language: we use the information available to us from a context and predict the not-yet-encountered, upcoming linguistic events. We do not wait for a speech signal to unfold completely to decode its meaning. This feature of human language processing is critical in understanding speech in adverse listening conditions. The studies in this thesis are timely, as the discussion about the role of prediction in language processing is vibrant and, to some extent, heated. Some argue that prediction is a universal phenomenon, not only of language but of human cognition in general. The present thesis examined the boundary conditions of predictive language processing. We investigated whether linguistic predictions are automatic, or whether they are constrained by other factors like top-down attention regulation and bottom-up processing of different speech rates in degraded speech comprehension. In this thesis, we examined how listeners can use context information and form predictions while listening to speech at different levels of degradation. The central theme of the thesis is the investigation of the interactions between top-down semantic predictions and bottom-up auditory processing in adverse listening conditions under the theoretical framework of predictive processing and the noisy channel model of communication. 
We first introduce these concepts of top-down/bottom-up interactions in adverse listening conditions, then report the experiments that empirically investigated different aspects of degraded speech comprehension and the top-down/bottom-up interactions. Our findings showed that to understand a speaker's utterance in a noisy channel (e.g., due to the degradation of the speech signal), a listener takes into account the noise in the signal as well as the context information to form lexical-semantic predictions. Studies have shown that lexical-semantic predictions facilitate language comprehension. We investigated whether such a facilitatory effect of linguistic predictions is observed at all levels of speech degradation. We also addressed the debate on the nature of the predictability effect (graded vs. all-or-nothing). The studies in this thesis concluded that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down predictive processes (lexical-semantic cues) against bottom-up auditory processes (acoustic-phonetic cues). When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Taken together, this thesis contributes to a better understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.


It seems quite easy to listen to someone speaking and understand them. Our everyday conversations, however, take place under adverse conditions. For example, background noise comes from different sound sources, several people talk at the same time (e.g., in a caf{\'e}), a poor signal connection distorts the voice of the person at the other end of the telephone line, and the list goes on. Despite these adversities, we communicate successfully most of the time. One of the most important factors that helps us understand language even under adverse conditions is predictive language processing. In this thesis, we investigated how listeners can use context information and make predictions while listening to speech with signal distortions of varying strength. The central theme of the thesis is the investigation of the interaction between semantic predictions based on the preceding context and the auditory processing of the speech signal under adverse listening conditions, within the theoretical framework of predictive processing and the noisy channel model of communication. There are numerous methods for creating and manipulating context information and speech degradation (adverse listening conditions) in an experimental setup. We manipulated the context information by constructing short German subject-verb-object sentences in which the verb of a sentence predicts the noun. In addition to context information, we examined the effect of strategic attention allocation as a top-down process. Speech was degraded by noise-vocoding the clean speech. In addition to noise-vocoding, we examined the effect of changes in speech rate as a further factor influencing bottom-up processes. 
In Chapter 5, we first examined the role of top-down attention regulation in listeners' ability to use context information. Our research question was whether attention to the context is strictly necessary for listeners to make predictions about an upcoming word in a sentence at different levels of degradation. We showed that the semantic predictability of a sentence contributes to better speech comprehension only when listeners attend to the context information. Moreover, no such facilitation was present at severe levels of degradation. We investigated these results further in Chapter 6 and found that the facilitatory effect of predictability is observed only at a moderate level of speech degradation. We examined the nature of the predictability effect and found that it is graded rather than all-or-nothing. In other words, listeners' predictions about an upcoming word are not limited to highly constraining sentence contexts; instead, listeners predict the upcoming word as a function of its probability of occurrence in a given context (i.e., its "cloze probability"). Finally, in Chapter 7, we investigated whether a change in speech rate, which alters the available processing time, enhances or reduces the contextual facilitation observed in Chapter 6. The results showed that listening comprehension of moderately degraded speech is best at a normal speech rate: slowing speech down did not enhance contextual facilitation. Increasing the speech rate, however, impaired the processing of low-predictability sentences but not of high-predictability sentences. 
With limited processing time, activating target words was more difficult in a less constraining sentence context than in a highly constraining one. All of these experiments, conducted with German stimuli on young adult native speakers of German, showed that the comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down processes (lexical-semantic cues) against bottom-up auditory processes (acoustic-phonetic cues). When speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When speech is moderately degraded but still intelligible, they generate predictions about the upcoming word from the context information. In addition, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Overall, this thesis contributes to a nuanced understanding of the dynamic interaction between top-down and bottom-up processes in language comprehension.},
pubstate = {published},
type = {phdthesis}
}

Project:   A4

Hedderich, Michael

Weak supervision and label noise handling for Natural language processing in low-resource scenarios PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully. In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model's quality from a theoretical perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights into where classifiers fail.


The lack of annotated data is a major factor preventing many languages and domains with limited resources from keeping pace with recent advances in digital text processing. To reduce this dependency on labeled training data, weak supervision annotates unlabeled data (semi-)automatically. These annotations are faster and cheaper to obtain. However, they also contain more errors. Special handling of these noisy labels is often necessary to use the data successfully. In this dissertation, we study the entire weak supervision pipeline, with a focus on its use for named entity recognition. We develop a tool for automatic annotation and present a new approach for modeling noisy labels. We study the factors that influence the quality of this model from a theoretical perspective, and we validate the approach empirically on different tasks and languages. An important aspect of this work is the aim of a realistic analysis. We carry out the study on, among others, several African languages and show the performance benefits that can be achieved through weak supervision and label noise modeling. We also extend the analysis to learning from few examples. Regarding classification errors, we additionally present a new approach for gaining interpretable insights.

@phdthesis{Hedderich_Diss_2022,
title = {Weak supervision and label noise handling for Natural language processing in low-resource scenarios},
author = {Michael Hedderich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/35026},
doi = {10.22028/D291-38691},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully. In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model's quality from a theoretical perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights into where classifiers fail.


The lack of annotated data is a major factor preventing many languages and domains with limited resources from keeping pace with recent advances in digital text processing. To reduce this dependency on labeled training data, weak supervision annotates unlabeled data (semi-)automatically. These annotations are faster and cheaper to obtain. However, they also contain more errors. Special handling of these noisy labels is often necessary to use the data successfully. In this dissertation, we study the entire weak supervision pipeline, with a focus on its use for named entity recognition. We develop a tool for automatic annotation and present a new approach for modeling noisy labels. We study the factors that influence the quality of this model from a theoretical perspective, and we validate the approach empirically on different tasks and languages. An important aspect of this work is the aim of a realistic analysis. We carry out the study on, among others, several African languages and show the performance benefits that can be achieved through weak supervision and label noise modeling. We also extend the analysis to learning from few examples. Regarding classification errors, we additionally present a new approach for gaining interpretable insights.},
pubstate = {published},
type = {phdthesis}
}

Project:   B4

Ibrahim, Omnia

Speaker Adaptations as a Function of Message, Channel and Listener Variability PhD Thesis

University of Zürich, Zürich, Switzerland, 2022.

Speech is a highly dynamic process. Some variability is inherited directly from the language itself, while other variability stems from adapting to the surrounding environment or interlocutor. This Ph.D. thesis consists of seven studies investigating speech adaptation with respect to message, channel, and listener variability. It starts by investigating speakers’ adaptation to the linguistic message. Previous work has shown that duration is shortened in more predictable contexts and, conversely, lengthened in less predictable contexts. This pervasive predictability effect is well studied across multiple languages and linguistic levels. However, syllable-level predictability has so far been largely overlooked. This thesis aims to fill that gap. It focuses on the effect of information-theoretic factors at both the syllable and segmental levels. Furthermore, it found that the predictability effect is not uniform across all durational cues but is somewhat sensitive to the phonological relevance of a language-specific phonetic cue.
Speakers adapt not only to their message but also to the channel of transmission. For example, speakers are known to modulate the characteristics of their speech and produce clear speech in response to background noise: syllables in noise have a longer duration, higher average intensity, a larger intensity range, and higher F0. Hence, speakers choose redundant, multi-dimensional acoustic modifications to make their voices more salient and detectable in a noisy environment. This Ph.D. thesis provides new insights into how noise and predictability affect speakers’ acoustic realizations of syllables in German, showing that the speakers’ response to background noise is independent of syllable predictability.
Regarding speaker-to-listener adaptations, this thesis finds that speech variability is not necessarily a function of the interaction’s duration. Instead, speakers constantly position themselves with respect to the ongoing social interaction. Indeed, speakers’ cooperation during the discussion leads to greater convergence. Moreover, interpersonal power dynamics between interlocutors were found to predict accommodation behavior. This adaptation holds for both human-human and human-robot interaction. In an ecological validity study, speakers changed their voice depending on whether they were addressing a human or a robot. These findings align with previous studies on robot-directed speech and confirm that this difference also holds when conversations are more natural and spontaneous.
The results of this thesis provide compelling evidence that speech adaptation is socially motivated and, to some extent, consciously controlled by the speaker. These findings have implications for including environment-based and listener-based formulations in speech production models along with message-based formulations. Furthermore, this thesis aims to advance our understanding of verbal and non-verbal behavior mechanisms for social communication. Finally, it contributes to the broader literature on information-theoretical factors and accommodation effects on speakers’ acoustic realization.

@phdthesis{Ibrahim_Diss_2022,
title = {Speaker Adaptations as a Function of Message, Channel and Listener Variability},
author = {Omnia Ibrahim},
url = {https://www.zora.uzh.ch/id/eprint/233694/},
doi = {10.5167/uzh-233694},
year = {2022},
date = {2022},
school = {University of Z{\"u}rich},
address = {Z{\"u}rich, Switzerland},
abstract = {Speech is a highly dynamic process. Some variability is inherited directly from the language itself, while other variability stems from adapting to the surrounding environment or interlocutor. This Ph.D. thesis consists of seven studies investigating speech adaptation with respect to message, channel, and listener variability. It starts by investigating speakers’ adaptation to the linguistic message. Previous work has shown that duration is shortened in more predictable contexts and, conversely, lengthened in less predictable contexts. This pervasive predictability effect is well studied across multiple languages and linguistic levels. However, syllable-level predictability has so far been largely overlooked. This thesis aims to fill that gap. It focuses on the effect of information-theoretic factors at both the syllable and segmental levels. Furthermore, it found that the predictability effect is not uniform across all durational cues but is somewhat sensitive to the phonological relevance of a language-specific phonetic cue. Speakers adapt not only to their message but also to the channel of transmission. For example, speakers are known to modulate the characteristics of their speech and produce clear speech in response to background noise: syllables in noise have a longer duration, higher average intensity, a larger intensity range, and higher F0. Hence, speakers choose redundant, multi-dimensional acoustic modifications to make their voices more salient and detectable in a noisy environment. This Ph.D. thesis provides new insights into how noise and predictability affect speakers’ acoustic realizations of syllables in German, showing that the speakers’ response to background noise is independent of syllable predictability. Regarding speaker-to-listener adaptations, this thesis finds that speech variability is not necessarily a function of the interaction’s duration. 
Instead, speakers constantly position themselves with respect to the ongoing social interaction. Indeed, speakers’ cooperation during the discussion leads to greater convergence. Moreover, interpersonal power dynamics between interlocutors were found to predict accommodation behavior. This adaptation holds for both human-human and human-robot interaction. In an ecological validity study, speakers changed their voice depending on whether they were addressing a human or a robot. These findings align with previous studies on robot-directed speech and confirm that this difference also holds when conversations are more natural and spontaneous. The results of this thesis provide compelling evidence that speech adaptation is socially motivated and, to some extent, consciously controlled by the speaker. These findings have implications for including environment-based and listener-based formulations in speech production models along with message-based formulations. Furthermore, this thesis aims to advance our understanding of verbal and non-verbal behavior mechanisms for social communication. Finally, it contributes to the broader literature on information-theoretical factors and accommodation effects on speakers’ acoustic realization.},
pubstate = {published},
type = {phdthesis}
}

Project:   C1

Kudera, Jacek

Slavic receptive multilingualism: intercomprehension of speech PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus of acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to the typological divisions of the tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than by the phonetic resemblance between a perceived stimulus and its equivalent in the speakers' native lexicon.

@phdthesis{Kudera_Diss_2022,
title = {Slavic receptive multilingualism: intercomprehension of speech},
author = {Jacek Kudera},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33236},
doi = {10.22028/D291-36578},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus of acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to the typological divisions of the tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than by the phonetic resemblance between a perceived stimulus and its equivalent in the speakers' native lexicon.},
pubstate = {published},
type = {phdthesis}
}

Project:   C4

Talamo, Luigi

Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase Inproceedings

Vylomova, Ekaterina; Ponti, Edoardo; Cotterell, Ryan (Ed.): Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 36-41, Seattle, Washington, 2022.

We describe a methodology for extracting word order patterns with finer accuracy from texts automatically annotated by parsers trained on Universal Dependencies (UD). We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prose texts. Our results suggest that combining different UD annotation layers, such as UD relations, Universal Parts of Speech and lemmas, and introducing language-specific lists of closed-category lemmata have the two-fold effect of improving the quality of the analysis and unveiling hidden areas of variability in word order patterns.
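The abstract measures word order entropy without stating the formula. As a minimal sketch, assuming the common definition of word order entropy as the Shannon entropy (in bits) of the distribution of orders observed for one construction, the calculation could look like the Python below. The input format (one order label per extracted noun phrase, such as "det-noun" or "noun-det") and the function name `word_order_entropy` are hypothetical illustrations, not the paper's actual pipeline.

```python
from collections import Counter
from math import log2


def word_order_entropy(orders: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of observed word orders.

    `orders` holds one label per extracted noun phrase, e.g. "det-noun" when
    the determiner precedes the noun and "noun-det" when it follows it
    (a hypothetical labeling scheme for illustration).
    """
    counts = Counter(orders)
    total = sum(counts.values())
    # H = -sum_i p_i * log2(p_i), where p_i is the relative frequency of order i.
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical sample: three prenominal and one postnominal determiner.
sample = ["det-noun"] * 3 + ["noun-det"]
print(round(word_order_entropy(sample), 3))
```

An entropy of 0 means a fully fixed order; with two competing orders, the maximum is 1 bit (a 50/50 split), so the sample above, at about 0.81 bits, indicates a dominant but not rigid order.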

@inproceedings{Talamo_2022,
title = {Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase},
author = {Luigi Talamo},
editor = {Ekaterina Vylomova and Edoardo Ponti and Ryan Cotterell},
url = {https://aclanthology.org/2022.sigtyp-1.5/},
doi = {10.18653/v1/2022.sigtyp-1.5},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {36-41},
publisher = {Association for Computational Linguistics},
address = {Seattle, Washington},
abstract = {We describe a methodology for extracting word order patterns with finer accuracy from texts automatically annotated by parsers trained on Universal Dependencies (UD). We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prose texts. Our results suggest that combining different UD annotation layers, such as UD relations, Universal Parts of Speech and lemmas, and introducing language-specific lists of closed-category lemmata have the two-fold effect of improving the quality of the analysis and unveiling hidden areas of variability in word order patterns.},
pubstate = {published},
type = {inproceedings}
}

Project:   C7

Alabi, Jesujoba; Adelani, David; Mosbach, Marius; Klakow, Dietrich

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning Inproceedings

Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, pp. 4336-4349, Gyeongju, Republic of Korea, 2022.

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapting to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes up large amounts of disk space and limits the cross-lingual transfer abilities of the resulting models, because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.

@inproceedings{alabi-etal-2022-adapting,
title = {Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning},
author = {Jesujoba Alabi and David Adelani and Marius Mosbach and Dietrich Klakow},
url = {https://aclanthology.org/2022.coling-1.382},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
pages = {4336-4349},
publisher = {International Committee on Computational Linguistics},
address = {Gyeongju, Republic of Korea},
abstract = {Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapting to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes up large amounts of disk space and limits the cross-lingual transfer abilities of the resulting models, because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thus reducing the model size by around 50{\%}. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Lapshinova-Koltunski, Ekaterina; Pollkläsener, Christina; Przybyl, Heike

Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora Journal Article

The Prague Bulletin of Mathematical Linguistics, 119, pp. 5-22, 2022, ISSN 0032-6585.

We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation.
Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.

@article{lapshinova-koltunski-pollklaesener-przybyl:2022,
title = {Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora},
author = {Ekaterina Lapshinova-Koltunski and Christina Pollkl{\"a}sener and Heike Przybyl},
url = {https://ufal.mff.cuni.cz/pbml/119/art-lapshinova-koltunski-pollklaesener-przybyl.pdf},
doi = {10.14712/00326585.020},
year = {2022},
date = {2022},
journal = {The Prague Bulletin of Mathematical Linguistics},
pages = {5-22},
volume = {119},
abstract = {We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.},
pubstate = {published},
type = {article}
}

Project:   B7

Kudera, Jacek; Stenger, Irina; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

Phonetic cues in auditory identification of Bulgarian, Czech, Polish, and Russian language of origin Journal Article

Language and Speech, 2022.

This work presents the results of an auditory language of origin identification experiment. Disyllabic and trisyllabic logatomes were recorded by speakers of Bulgarian, Czech, Polish, and Russian, and presented to L1 speakers of the abovementioned Slavic languages. The goals of the test were to verify the ability of lay listeners to recognize the linguistic origin of speakers, based on spoken samples with limited segmental and suprasegmental information, and to correlate the signal features with the subjects’ performance. It was found that position of word stress is not an important predictor in language recognition. However, inherent vowel characteristics such as duration and vowel space computed by means of Pillai scores correlate with subjects’ performance. Both the linguistic profile and the familiarity with closely related languages also appear to be relevant predictors of listeners’ performance. Finally, the information-theoretic notion of surprisal applied to regular cross-linguistic sound correspondences was correlated with recognition scores; however, the correlations did not reach the threshold of statistical significance. We conclude that auditory identification of linguistic origin by lay persons, native speakers of closely related languages, is possible even when exposed to limited segmental information, which can serve as a cue in the identification of linguistic origin.

@article{kudera_etal2022_cues,
title = {Phonetic cues in auditory identification of Bulgarian, Czech, Polish, and Russian language of origin},
author = {Jacek Kudera and Irina Stenger and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://journals.sagepub.com/eprint/JJIKHP9RPEYZM2EQKFWZ/full},
doi = {10.1177/00238309221119098},
year = {2022},
date = {2022-09-01},
journal = {Language and Speech},
abstract = {This work presents the results of an auditory language of origin identification experiment. Disyllabic and trisyllabic logatomes were recorded by speakers of Bulgarian, Czech, Polish, and Russian, and presented to L1 speakers of the abovementioned Slavic languages. The goals of the test were to verify the ability of lay listeners to recognize the linguistic origin of speakers, based on spoken samples with limited segmental and suprasegmental information, and to correlate the signal features with the subjects’ performance. It was found that position of word stress is not an important predictor in language recognition. However, inherent vowel characteristics such as duration and vowel space computed by means of Pillai scores correlate with subjects’ performance. Both the linguistic profile and the familiarity with closely related languages also appear to be relevant predictors of listeners’ performance. Finally, the information-theoretic notion of surprisal applied to regular cross-linguistic sound correspondences was correlated with recognition scores; however, the correlations did not reach the threshold of statistical significance. We conclude that auditory identification of linguistic origin by lay persons, native speakers of closely related languages, is possible even when exposed to limited segmental information, which can serve as a cue in the identification of linguistic origin.},
pubstate = {published},
type = {article}
}

Project:   C4

Zhang, Miaoran; Mosbach, Marius; Adelani, David; Hedderich, Michael; Klakow, Dietrich

MCSE: Multimodal Contrastive Learning of Sentence Embeddings Inproceedings

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 5959-5969, Seattle, United States, 2022.

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.

@inproceedings{zhang-etal-2022-mcse,
title = {MCSE: Multimodal Contrastive Learning of Sentence Embeddings},
author = {Miaoran Zhang and Marius Mosbach and David Adelani and Michael Hedderich and Dietrich Klakow},
url = {https://aclanthology.org/2022.naacl-main.436},
doi = {10.18653/v1/2022.naacl-main.436},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {5959-5969},
publisher = {Association for Computational Linguistics},
address = {Seattle, United States},
abstract = {Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman{'}s correlation by 1.7{\%}. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Abdullah, Badr M.; Klakow, Dietrich

Analyzing the Representational Geometry of Acoustic Word Embeddings Inproceedings

Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, pp. 178-191, Abu Dhabi, United Arab Emirates (Hybrid), 2022.

Acoustic word embeddings (AWEs) are fixed-dimensionality vector representations in a vector space such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their use in speech technology applications such as spoken term discovery and keyword spotting, AWE models have been adopted as models of spoken-word processing in several cognitively motivated studies and have been shown to exhibit human-like performance in some auditory processing tasks. Nevertheless, the representational geometry of AWEs remains an under-explored topic that has not been studied in the literature. In this paper, we take a closer analytical look at AWEs and study how the choice of the learning objective and the architecture shapes their representational profile. Our main findings highlight that the learning objective plays a more prominent role than the architecture in shaping the representational geometry.

@inproceedings{abdullah-klakow-2022-analyzing,
title = {Analyzing the Representational Geometry of Acoustic Word Embeddings},
author = {Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2022.blackboxnlp-1.15},
doi = {10.18653/v1/2022.blackboxnlp-1.15},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
pages = {178-191},
publisher = {Association for Computational Linguistics},
address = {Abu Dhabi, United Arab Emirates (Hybrid)},
abstract = {Acoustic word embeddings (AWEs) are fixed-dimensionality vector representations in a vector space such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their use in speech technology applications such as spoken term discovery and keyword spotting, AWE models have been adopted as models of spoken-word processing in several cognitively motivated studies and have been shown to exhibit human-like performance in some auditory processing tasks. Nevertheless, the representational geometry of AWEs remains an under-explored topic that has not been studied in the literature. In this paper, we take a closer analytical look at AWEs and study how the choice of the learning objective and the architecture shapes their representational profile. Our main findings highlight that the learning objective plays a more prominent role than the architecture in shaping the representational geometry.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Ibrahim, Omnia; Yuen, Ivan; van Os, Marjolein; Andreeva, Bistra; Möbius, Bernd

The combined effects of contextual predictability and noise on the acoustic realisation of German syllables Journal Article

The Journal of the Acoustical Society of America, 152, 2022.

Speakers tend to speak clearly in noisy environments, while they tend to reserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and −10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low predictability syllables exhibited longer duration and larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.

@article{ibrahim_etal_jasa2022,
title = {The combined effects of contextual predictability and noise on the acoustic realisation of German syllables},
author = {Omnia Ibrahim and Ivan Yuen and Marjolein van Os and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://asa.scitation.org/doi/10.1121/10.0013413},
doi = {10.1121/10.0013413},
year = {2022},
date = {2022-08-10},
journal = {The Journal of the Acoustical Society of America},
volume = {152},
number = {2},
abstract = {Speakers tend to speak clearly in noisy environments, while they tend to reserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and −10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low predictability syllables exhibited longer duration and larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.},
pubstate = {published},
type = {article}
}

Projects:   C1 A4

Bhandari, Pratik; Demberg, Vera; Kray, Jutta

Predictability effects in degraded speech comprehension are reduced as a function of attention Journal Article

Language and Cognition, Cambridge University Press, pp. 1-18, 2022.

The aim of this study was to examine the role of attention in understanding linguistic information even in a noisy environment. To assess the role of attention, we varied task instructions in two experiments in which participants were instructed to listen to short sentences and thereafter to type in the last word they heard or to type in the whole sentence. We were interested in how these task instructions influence the interplay between top-down prediction and bottom-up perceptual processes during language comprehension. Therefore, we created sentences that varied in the degree of predictability (low, medium, and high) as well as in the degree of speech degradation (four, six, and eight noise-vocoding channels). Results indicated better word recognition for highly predictable sentences for moderate, though not for high, levels of speech degradation, but only when attention was directed to the whole sentence. This underlines the important role of attention in language comprehension.

@article{bhandari_demberg_kray_2022,
title = {Predictability effects in degraded speech comprehension are reduced as a function of attention},
author = {Pratik Bhandari and Vera Demberg and Jutta Kray},
url = {https://www.cambridge.org/core/journals/language-and-cognition/article/abs/predictability-effects-in-degraded-speech-comprehension-are-reduced-as-a-function-of-attention/98F4E3A4A3FC0B7E00C8E1536D986853},
doi = {10.1017/langcog.2022.16},
year = {2022},
date = {2022-07-22},
journal = {Language and Cognition},
pages = {1-18},
publisher = {Cambridge University Press},
abstract = {The aim of this study was to examine the role of attention in understanding linguistic information even in a noisy environment. To assess the role of attention, we varied task instructions in two experiments in which participants were instructed to listen to short sentences and thereafter to type in the last word they heard or to type in the whole sentence. We were interested in how these task instructions influence the interplay between top-down prediction and bottom-up perceptual processes during language comprehension. Therefore, we created sentences that varied in the degree of predictability (low, medium, and high) as well as in the degree of speech degradation (four, six, and eight noise-vocoding channels). Results indicated better word recognition for highly predictable sentences for moderate, though not for high, levels of speech degradation, but only when attention was directed to the whole sentence. This underlines the important role of attention in language comprehension.},
pubstate = {published},
type = {article}
}

Project:   A4

Przybyl, Heike; Lapshinova-Koltunski, Ekaterina; Menzel, Katrin; Fischer, Stefan; Teich, Elke

EPIC UdS - Creation and applications of a simultaneous interpreting corpus Inproceedings

Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1193–1200, Marseille, France, 20-25 June 2022, 2022.

In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.

@inproceedings{Przybyl_interpreting_2022,
title = {EPIC UdS - Creation and applications of a simultaneous interpreting corpus},
author = {Heike Przybyl and Ekaterina Lapshinova-Koltunski and Katrin Menzel and Stefan Fischer and Elke Teich},
url = {https://aclanthology.org/2022.lrec-1.127/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)},
pages = {1193-1200},
address = {Marseille, France, 20-25 June 2022},
abstract = {In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7