Publications

Bhandari, Pratik

Interaction of top-down and bottom-up processes in spoken language comprehension PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

It seems pretty easy to listen to and understand someone speaking. However, our day-to-day conversations occur under adverse listening conditions. For example, background noise comes from different sound sources, multiple people talk simultaneously (e.g., in a café), a poor signal connection distorts the voice of a person talking on the other end of a telephone call, and the list goes on. Despite these adversities, most of the time, we communicate successfully. One of the significant contributors to our ability to understand language in adverse listening conditions is predictive language processing. Humans are not passive consumers of language: we use the information available to us from a context and predict the not-yet-encountered, upcoming linguistic events. We do not wait for a speech signal to unfold completely to decode its meaning. This feature of human language processing is critical in understanding speech in adverse listening conditions. The studies in this thesis are timely, as the discussion about the role of prediction in language processing is vibrant and, to some extent, heated. Some argue that prediction is a universal phenomenon, not only of language but of human cognition in general. The present thesis examined the boundary conditions of predictive language processing. We investigated whether linguistic predictions are automatic or whether they are constrained by other factors, such as top-down attention regulation and bottom-up processing of different speech rates in degraded speech comprehension. In this thesis, we examined how listeners can use context information and form predictions while listening to speech at different levels of degradation. The central theme of the thesis is the investigation of the interactions between top-down semantic predictions and bottom-up auditory processing in adverse listening conditions under the theoretical framework of predictive processing and the noisy channel model of communication.
We first introduce these concepts of top-down and bottom-up interactions in adverse listening conditions, then report the experiments that empirically investigated different aspects of degraded speech comprehension and the top-down and bottom-up interactions. Our findings showed that to understand a speaker’s utterance in a noisy channel (e.g., due to the degradation of the speech signal), a listener takes into account the noise in the signal as well as the context information to form lexical-semantic predictions. Studies have shown that lexical-semantic predictions facilitate language comprehension. We investigated whether such a facilitatory effect of linguistic predictions is observed at all levels of speech degradation. We also addressed the debate on the nature of the predictability effect (graded vs. all-or-nothing). The studies in this thesis concluded that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down predictive (lexical-semantic cues) and bottom-up auditory (acoustic-phonetic cues) processes. When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighing of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Taken together, this thesis contributes to a better understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.


It seems quite easy to listen to someone speaking and to understand them. However, our everyday conversations take place under adverse conditions. For example, background noise comes from different sound sources, several people speak at the same time (e.g., in a café), a poor signal connection distorts the voice of the person at the other end of the telephone line, and the list goes on. Despite these adversities, we communicate successfully in most cases. One of the most important factors contributing to our ability to understand speech even under adverse conditions is predictive language processing. In this thesis, we investigated how listeners can use context information and form predictions while listening to speech with different degrees of signal degradation. The central theme of the thesis is the investigation of the interaction between semantic predictions based on the preceding context and auditory processing of the speech signal under adverse listening conditions, within the theoretical framework of “predictive processing” and the “noisy channel model of communication”. There are numerous methods by which context information and speech degradation (adverse listening conditions) can be created and manipulated in an experimental setup. We manipulated context information by creating short subject-verb-object sentences in German in which the verb of a sentence predicts the noun. In addition to context information, we examined the effect of strategic attention allocation as a top-down process. The speech was degraded by noise-vocoding the clean speech. In addition to noise-vocoding, we examined the effect of changes in speech rate as a further factor influencing bottom-up processes.
In Chapter 5, we first examined the role of top-down attention regulation in listeners’ ability to use context information. Our research question was whether attention to the context is strictly necessary, regardless of the listener, to form predictions about an upcoming word in a sentence at different levels of degradation. We showed that the semantic predictability of a sentence contributes to better speech comprehension only when listeners attend to the context information. Moreover, no such facilitation was present at severe levels of degradation. We examined these results further in Chapter 6 and found that the facilitatory effect of predictability is observed only at a moderate level of speech degradation. We examined the nature of the predictability effect and found that it is graded rather than all-or-nothing. In other words, we found that listeners’ predictions about an upcoming word are not restricted to a highly constraining sentence context; instead, listeners predict the upcoming word depending on the probability of its occurrence in a given context (e.g., “cloze probability”). Finally, in Chapter 7, we examined whether a change in speech rate, which alters the available processing time, increases or decreases the contextual facilitation observed in Chapter 6. The results showed that comprehension of moderately degraded speech is best at a normal speech rate: slowing down did not increase contextual facilitation. Increasing the speech rate, however, impaired the processing of sentences with low, but not high, predictability.
Within the limited processing time, activation of target words was more difficult in a less constraining sentence context than in a highly constraining one. All of these experiments, conducted with German stimuli on young adult native speakers of German, showed that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down processes (lexical-semantic cues) against bottom-up auditory processes (acoustic-phonetic cues). When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Overall, this thesis contributes to a nuanced understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.

@phdthesis{Bhandari_Diss_2022,
title = {Interaction of top-down and bottom-up processes in spoken language comprehension},
author = {Pratik Bhandari},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/34800},
doi = {https://doi.org/10.22028/D291-38594},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
pubstate = {published},
type = {phdthesis}
}


Project:   A4

Hedderich, Michael

Weak supervision and label noise handling for Natural language processing in low-resource scenarios PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully. In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model’s quality from a theoretical perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling, but we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights into where classifiers fail.


The lack of annotated data is a major factor preventing many low-resource languages and domains from keeping pace with recent advances in natural language processing. To reduce this dependency on labeled training data, weak supervision annotates unlabeled data (semi-)automatically. These annotations are faster and cheaper to obtain. However, they also contain more errors. Special handling of these noisy labels is often necessary to use the data successfully. In this dissertation, we study the entire weak supervision pipeline with a focus on its use for entity recognition. We develop a tool for automatic annotation and present a new approach to modeling noisy labels. We examine the factors that influence the quality of this model from a theoretical perspective, and we validate the approach empirically for various tasks and languages. An important aspect of this work is the aim of a realistic analysis. We carry out the study on, among others, several African languages and show the performance gains that can be achieved through weak supervision and label noise modeling. We also extend the analysis to learning with few examples. With regard to classification errors, we additionally present a new approach for obtaining interpretable insights.

@phdthesis{Hedderich_Diss_2022,
title = {Weak supervision and label noise handling for Natural language processing in low-resource scenarios},
author = {Michael Hedderich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/35026},
doi = {https://doi.org/10.22028/D291-38691},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
pubstate = {published},
type = {phdthesis}
}


Project:   B4

Ibrahim, Omnia

Speaker Adaptations as a Function of Message, Channel and Listener Variability PhD Thesis

University of Zürich, Zürich, Switzerland, 2022.

Speech is a highly dynamic process. Some variability is inherited directly from the language itself, while other variability stems from adapting to the surrounding environment or interlocutor. This Ph.D. thesis consists of seven studies investigating speech adaptation concerning the message, channel, and listener variability. It starts with investigating speakers’ adaptation to the linguistic message. Previous work has shown that duration is shortened in more predictable contexts, and conversely lengthened in less predictable contexts. This pervasive predictability effect is well studied in multiple languages and linguistic levels. However, syllable-level predictability has been generally overlooked so far. This thesis aims to fill that gap. It focuses on the effect of information-theoretic factors at both the syllable and segmental levels. Furthermore, it found that the predictability effect is not uniform across all durational cues but is somewhat sensitive to the phonological relevance of a language-specific phonetic cue.
Speakers adapt not only to their message but also to the channel of transfer. For example, it is known that speakers modulate the characteristics of their speech and produce clear speech in response to background noise – syllables in noise have a longer duration, with higher average intensity, larger intensity range, and higher F0. Hence, speakers choose redundant multi-dimensional acoustic modifications to make their voices more salient and detectable in a noisy environment. This Ph.D. thesis provides new insights into speakers’ adaptation to noise and predictability on the acoustic realizations of syllables in German; showing that the speakers’ response to background noise is independent of syllable predictability.
Regarding speaker-to-listener adaptations, this thesis finds that speech variability is not necessarily a function of the interaction’s duration. Instead, speakers constantly position themselves concerning the ongoing social interaction. Indeed, speakers’ cooperation during the discussion would lead to a higher convergence behavior. Moreover, interpersonal power dynamics between interlocutors were found to serve as a predictor for accommodation behavior. This adaptation holds for both human-human interaction and human-robot interaction. In an ecological validity study, speakers changed their voice depending on whether they were addressing a human or a robot. Those findings align with previous studies on robot-directed speech and confirm that this difference also holds when the conversations are more natural and spontaneous.
The results of this thesis provide compelling evidence that speech adaptation is socially motivated and, to some extent, consciously controlled by the speaker. These findings have implications for including environment-based and listener-based formulations in speech production models along with message-based formulations. Furthermore, this thesis aims to advance our understanding of verbal and non-verbal behavior mechanisms for social communication. Finally, it contributes to the broader literature on information-theoretical factors and accommodation effects on speakers’ acoustic realization.

@phdthesis{Ibrahim_Diss_2022,
title = {Speaker Adaptations as a Function of Message, Channel and Listener Variability},
author = {Omnia Ibrahim},
url = {https://www.zora.uzh.ch/id/eprint/233694/},
doi = {https://doi.org/10.5167/uzh-233694},
year = {2022},
date = {2022},
school = {University of Z{\"u}rich},
address = {Z{\"u}rich, Switzerland},
pubstate = {published},
type = {phdthesis}
}

Project:   C1

Kudera, Jacek

Slavic receptive multilingualism: intercomprehension of speech PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus from acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to typological divisions of tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than the phonetic resemblance of a perceived stimulus and its equivalence in the native lexicon of speakers.

@phdthesis{Kudera_Diss_2022,
title = {Slavic receptive multilingualism: intercomprehension of speech},
author = {Jacek Kudera},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33236},
doi = {https://doi.org/10.22028/D291-36578},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus from acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to typological divisions of tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than the phonetic resemblance of a perceived stimulus and its equivalence in the native lexicon of speakers.},
pubstate = {published},
type = {phdthesis}
}

Project:   C4

Talamo, Luigi

Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase Inproceedings

Vylomova, Ekaterina; Ponti, Edoardo; Cotterell, Ryan (Ed.): Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 36-41, Seattle, Washington, 2022.

We describe a methodology to extract with finer accuracy word order patterns from texts automatically annotated with Universal Dependency (UD) trained parsers. We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prosaic texts. Our results suggest that the combinations of different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemma, and the introduction of language-specific lists of closed-category lemmata has the two-fold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.

@inproceedings{Talamo_2022,
title = {Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase},
author = {Luigi Talamo},
editor = {Ekaterina Vylomova and Edoardo Ponti and Ryan Cotterell},
url = {https://aclanthology.org/2022.sigtyp-1.5/},
doi = {https://doi.org/10.18653/v1/2022.sigtyp-1.5},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {36-41},
publisher = {Association for Computational Linguistics},
address = {Seattle, Washington},
abstract = {We describe a methodology to extract with finer accuracy word order patterns from texts automatically annotated with Universal Dependency (UD) trained parsers. We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prosaic texts. Our results suggest that the combinations of different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemma, and the introduction of language-specific lists of closed-category lemmata has the two-fold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.},
pubstate = {published},
type = {inproceedings}
}

Project:   C7

Alabi, Jesujoba; Adelani, David; Mosbach, Marius; Klakow, Dietrich

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning Inproceedings

Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, pp. 4336-4349, Gyeongju, Republic of Korea, 2022.

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.

@inproceedings{alabi-etal-2022-adapting,
title = {Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning},
author = {Jesujoba Alabi and David Adelani and Marius Mosbach and Dietrich Klakow},
url = {https://aclanthology.org/2022.coling-1.382},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
pages = {4336-4349},
publisher = {International Committee on Computational Linguistics},
address = {Gyeongju, Republic of Korea},
abstract = {Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT) {---} fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT, thus reducing the model size by around 50{\%}. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Lapshinova-Koltunski, Ekaterina; Pollkläsener, Christina; Przybyl, Heike

Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora Journal Article

The Prague Bulletin of Mathematical Linguistics, 119, pp. 5-22, 2022, ISSN 0032-6585.

We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.

@article{lapshinova-koltunski-pollklaesener-przybyl:2022,
title = {Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora},
author = {Ekaterina Lapshinova-Koltunski and Christina Pollkl{\"a}sener and Heike Przybyl},
url = {https://ufal.mff.cuni.cz/pbml/119/art-lapshinova-koltunski-pollklaesener-przybyl.pdf},
doi = {https://doi.org/10.14712/00326585.020},
year = {2022},
date = {2022},
journal = {The Prague Bulletin of Mathematical Linguistics},
pages = {5-22},
volume = {119},
abstract = {We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.},
pubstate = {published},
type = {article}
}

Project:   B7

Kudera, Jacek; Stenger, Irina; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

Phonetic cues in auditory identification of Bulgarian, Czech, Polish, and Russian language of origin Journal Article

Language and Speech, 2022.

This work presents the results of an auditory language of origin identification experiment. Disyllabic and trisyllabic logatomes were recorded by speakers of Bulgarian, Czech, Polish, and Russian, and presented to L1 speakers of the abovementioned Slavic languages. The goals of the test were to verify the ability of lay listeners to recognize the linguistic origin of speakers, based on spoken samples with limited segmental and suprasegmental information, and to correlate the signal features with the subjects’ performance. It was found that position of word stress is not an important predictor in language recognition. However, inherent vowel characteristics such as duration and vowel space computed by the means of Pillai scores correlate with subjects’ performance. Both the linguistic profile and the familiarity with closely related languages also appear to be relevant predictors of listeners’ performance. Finally, the information-theoretic notion of surprisal applied on regular cross-linguistic sound correspondences was correlated with recognition scores; though, the correlations did not reach the threshold of statistical significance. We conclude that auditory identification of linguistic origin by lay persons, native speakers of closely related languages, is possible even when exposed to limited segmental information, which can serve as a cue in the identification of linguistic origin.

@article{kudera_etal2022_cues,
title = {Phonetic cues in auditory identification of Bulgarian, Czech, Polish, and Russian language of origin},
author = {Jacek Kudera and Irina Stenger and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://journals.sagepub.com/eprint/JJIKHP9RPEYZM2EQKFWZ/full},
doi = {https://doi.org/10.1177/00238309221119098},
year = {2022},
date = {2022-09-01},
journal = {Language and Speech},
abstract = {This work presents the results of an auditory language of origin identification experiment. Disyllabic and trisyllabic logatomes were recorded by speakers of Bulgarian, Czech, Polish, and Russian, and presented to L1 speakers of the abovementioned Slavic languages. The goals of the test were to verify the ability of lay listeners to recognize the linguistic origin of speakers, based on spoken samples with limited segmental and suprasegmental information, and to correlate the signal features with the subjects’ performance. It was found that position of word stress is not an important predictor in language recognition. However, inherent vowel characteristics such as duration and vowel space computed by the means of Pillai scores correlate with subjects’ performance. Both the linguistic profile and the familiarity with closely related languages also appear to be relevant predictors of listeners’ performance. Finally, the information-theoretic notion of surprisal applied on regular cross-linguistic sound correspondences was correlated with recognition scores; though, the correlations did not reach the threshold of statistical significance. We conclude that auditory identification of linguistic origin by lay persons, native speakers of closely related languages, is possible even when exposed to limited segmental information, which can serve as a cue in the identification of linguistic origin.},
pubstate = {published},
type = {article}
}

Project:   C4

Zhang, Miaoran; Mosbach, Marius; Adelani, David; Hedderich, Michael; Klakow, Dietrich

MCSE: Multimodal Contrastive Learning of Sentence Embeddings Inproceedings

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 5959-5969, Seattle, United States, 2022.

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.

@inproceedings{zhang-etal-2022-mcse,
title = {MCSE: Multimodal Contrastive Learning of Sentence Embeddings},
author = {Miaoran Zhang and Marius Mosbach and David Adelani and Michael Hedderich and Dietrich Klakow},
url = {https://aclanthology.org/2022.naacl-main.436},
doi = {https://doi.org/10.18653/v1/2022.naacl-main.436},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {5959-5969},
publisher = {Association for Computational Linguistics},
address = {Seattle, United States},
abstract = {Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman{'}s correlation by 1.7{\%}. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Abdullah, Badr M.; Klakow, Dietrich

Analyzing the Representational Geometry of Acoustic Word Embeddings Inproceedings

Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, pp. 178-191, Abu Dhabi, United Arab Emirates (Hybrid), 2022.

Acoustic word embeddings (AWEs) are fixed-dimensionality vector representations in a vector space such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their use in speech technology applications such as spoken term discovery and keyword spotting, AWE models have been adopted as models of spoken-word processing in several cognitively motivated studies and they have shown to exhibit a human-like performance in some auditory processing tasks. Nevertheless, the representation geometry of AWEs remains an under-explored topic that has not been studied in the literature. In this paper, we take a closer analytical look at AWEs and study how the choice of the learning objective and the architecture shapes their representational profile. Our main findings highlight the prominent role of the learning objective on the representational geometry over the architecture.

@inproceedings{abdullah-klakow-2022-analyzing,
title = {Analyzing the Representational Geometry of Acoustic Word Embeddings},
author = {Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2022.blackboxnlp-1.15},
doi = {https://doi.org/10.18653/v1/2022.blackboxnlp-1.15},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
pages = {178-191},
publisher = {Association for Computational Linguistics},
address = {Abu Dhabi, United Arab Emirates (Hybrid)},
abstract = {Acoustic word embeddings (AWEs) are fixed-dimensionality vector representations in a vector space such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their use in speech technology applications such as spoken term discovery and keyword spotting, AWE models have been adopted as models of spoken-word processing in several cognitively motivated studies and they have shown to exhibit a human-like performance in some auditory processing tasks. Nevertheless, the representation geometry of AWEs remains an under-explored topic that has not been studied in the literature. In this paper, we take a closer analytical look at AWEs and study how the choice of the learning objective and the architecture shapes their representational profile. Our main findings highlight the prominent role of the learning objective on the representational geometry over the architecture.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Ibrahim, Omnia; Yuen, Ivan; van Os, Marjolein; Andreeva, Bistra; Möbius, Bernd

The combined effects of contextual predictability and noise on the acoustic realisation of German syllables Journal Article

The Journal of the Acoustical Society of America, 152, 2022.

Speakers tend to speak clearly in noisy environments, while they tend to reserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and −10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low predictability syllables exhibited longer duration and larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.

@article{ibrahim_etal_jasa2022,
title = {The combined effects of contextual predictability and noise on the acoustic realisation of German syllables},
author = {Omnia Ibrahim and Ivan Yuen and Marjolein van Os and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://asa.scitation.org/doi/10.1121/10.0013413},
doi = {https://doi.org/10.1121/10.0013413},
year = {2022},
date = {2022-08-10},
journal = {The Journal of the Acoustical Society of America},
volume = {152},
number = {2},
abstract = {Speakers tend to speak clearly in noisy environments, while they tend to reserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and −10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low predictability syllables exhibited longer duration and larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.},
pubstate = {published},
type = {article}
}

Projects:   C1 A4

Bhandari, Pratik; Demberg, Vera; Kray, Jutta

Predictability effects in degraded speech comprehension are reduced as a function of attention Journal Article

Language and Cognition, Cambridge University Press, pp. 1-18, 2022.

The aim of this study was to examine the role of attention in understanding linguistic information even in a noisy environment. To assess the role of attention, we varied task instructions in two experiments in which participants were instructed to listen to short sentences and thereafter to type in the last word they heard or to type in the whole sentence. We were interested in how these task instructions influence the interplay between top-down prediction and bottom-up perceptual processes during language comprehension. Therefore, we created sentences that varied in the degree of predictability (low, medium, and high) as well as in the degree of speech degradation (four, six, and eight noise-vocoding channels). Results indicated better word recognition for highly predictable sentences for moderate, though not for high, levels of speech degradation, but only when attention was directed to the whole sentence. This underlines the important role of attention in language comprehension.

@article{bhandari_demberg_kray_2022,
title = {Predictability effects in degraded speech comprehension are reduced as a function of attention},
author = {Pratik Bhandari and Vera Demberg and Jutta Kray},
url = {https://www.cambridge.org/core/journals/language-and-cognition/article/abs/predictability-effects-in-degraded-speech-comprehension-are-reduced-as-a-function-of-attention/98F4E3A4A3FC0B7E00C8E1536D986853},
doi = {https://doi.org/10.1017/langcog.2022.16},
year = {2022},
date = {2022-07-22},
journal = {Language and Cognition},
pages = {1-18},
publisher = {Cambridge University Press},
abstract = {The aim of this study was to examine the role of attention in understanding linguistic information even in a noisy environment. To assess the role of attention, we varied task instructions in two experiments in which participants were instructed to listen to short sentences and thereafter to type in the last word they heard or to type in the whole sentence. We were interested in how these task instructions influence the interplay between top-down prediction and bottom-up perceptual processes during language comprehension. Therefore, we created sentences that varied in the degree of predictability (low, medium, and high) as well as in the degree of speech degradation (four, six, and eight noise-vocoding channels). Results indicated better word recognition for highly predictable sentences for moderate, though not for high, levels of speech degradation, but only when attention was directed to the whole sentence. This underlines the important role of attention in language comprehension.},
pubstate = {published},
type = {article}
}

Project:   A4

Przybyl, Heike; Lapshinova-Koltunski, Ekaterina; Menzel, Katrin; Fischer, Stefan; Teich, Elke

EPIC UdS - Creation and applications of a simultaneous interpreting corpus Inproceedings

Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1193–1200, Marseille, France, 20-25 June 2022, 2022.

In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.

@inproceedings{Przybyl_interpreting_2022,
title = {EPIC UdS - Creation and applications of a simultaneous interpreting corpus},
author = {Heike Przybyl and Ekaterina Lapshinova-Koltunski and Katrin Menzel and Stefan Fischer and Elke Teich},
url = {https://aclanthology.org/2022.lrec-1.127/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)},
pages = {1193–1200},
address = {Marseille, France, 20-25 June 2022},
abstract = {In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Stenger, Irina; Georgis, Philip; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension Inproceedings

Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, pp. 7368-7376, Marseille, France, 2022.

We focus on the syntactic variation and measure syntactic distances between nine Slavic languages (Belarusian, Bulgarian, Croatian, Czech, Polish, Slovak, Slovene, Russian, and Ukrainian) using symmetric measures of insertion, deletion and movement of syntactic units in the parallel sentences of the fable "The North Wind and the Sun". Additionally, we investigate phonetic and orthographic asymmetries between selected languages by means of the information theoretical notion of surprisal. Syntactic distance and surprisal are, thus, considered as potential predictors of mutual intelligibility between related languages. In spoken and written cloze test experiments for Slavic native speakers, the presented predictors will be validated as to whether variations in syntax lead to a slower or impeded intercomprehension of Slavic texts.

@inproceedings{stenger-EtAl:2022:LREC,
title = {Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension},
author = {Irina Stenger and Philip Georgis and Tania Avgustinova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://aclanthology.org/2022.lrec-1.802},
year = {2022},
date = {2022-06-21},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
pages = {7368-7376},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {We focus on the syntactic variation and measure syntactic distances between nine Slavic languages (Belarusian, Bulgarian, Croatian, Czech, Polish, Slovak, Slovene, Russian, and Ukrainian) using symmetric measures of insertion, deletion and movement of syntactic units in the parallel sentences of the fable "The North Wind and the Sun". Additionally, we investigate phonetic and orthographic asymmetries between selected languages by means of the information theoretical notion of surprisal. Syntactic distance and surprisal are, thus, considered as potential predictors of mutual intelligibility between related languages. In spoken and written cloze test experiments for Slavic native speakers, the presented predictors will be validated as to whether variations in syntax lead to a slower or impeded intercomprehension of Slavic texts.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Ortmann, Katrin

Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans Inproceedings

Proceedings of the Language Resources and Evaluation Conference (LREC), European Language Resources Association, pp. 1400-1407, Marseille, France, 2022.

The traditional evaluation of labeled spans with precision, recall, and F1-score has undesirable effects due to double penalties. Annotations with incorrect label or boundaries count as two errors instead of one, despite being closer to the target annotation than false positives or false negatives. In this paper, new error types are introduced, which more accurately reflect true annotation quality and ensure that every annotation counts only once. An algorithm for error identification in flat and multi-level annotations is presented and complemented with a proposal on how to calculate meaningful precision, recall, and F1-scores based on the more fine-grained error types. The exemplary application to three different annotation tasks (NER, chunking, parsing) shows that the suggested procedure not only prevents double penalties but also allows for a more detailed error analysis, thereby providing more insight into the actual weaknesses of a system.

@inproceedings{ortmann2022,
title = {Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans},
author = {Katrin Ortmann},
url = {https://aclanthology.org/2022.lrec-1.150},
year = {2022},
date = {2022-06-21},
booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
pages = {1400-1407},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {The traditional evaluation of labeled spans with precision, recall, and F1-score has undesirable effects due to double penalties. Annotations with incorrect label or boundaries count as two errors instead of one, despite being closer to the target annotation than false positives or false negatives. In this paper, new error types are introduced, which more accurately reflect true annotation quality and ensure that every annotation counts only once. An algorithm for error identification in flat and multi-level annotations is presented and complemented with a proposal on how to calculate meaningful precision, recall, and F1-scores based on the more fine-grained error types. The exemplary application to three different annotation tasks (NER, chunking, parsing) shows that the suggested procedure not only prevents double penalties but also allows for a more detailed error analysis, thereby providing more insight into the actual weaknesses of a system.},
pubstate = {published},
type = {inproceedings}
}

Project:   C6

Menzel, Katrin; Krielke, Marie-Pauline; Degaetano-Ortlieb, Stefania

Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective Journal Article

LEGE ARTIS: Language yesterday, today, tomorrow, VII, Trnava: University of SS Cyril and Methodius in Trnava, pp. 157-213, 2022, ISSN 2453-8035.

This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.

@article{menzel_2022_diachronicperspective,
title = {Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective},
author = {Katrin Menzel and Marie-Pauline Krielke and Stefania Degaetano-Ortlieb},
url = {https://www.researchgate.net/publication/361099180_Synthetic_and_analytic_adjective_negation_in_English_scientific_journal_articles_A_diachronic_perspective},
year = {2022},
date = {2022},
journal = {LEGE ARTIS: Language yesterday, today, tomorrow},
pages = {157-213},
publisher = {Trnava: University of SS Cyril and Methodius in Trnava},
volume = {VII},
number = {1},
abstract = {This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.},
pubstate = {published},
type = {article}
}

Project:   B1

Scholman, Merel; Blything, Liam; Cain, Kate; Evers-Vermeul, Jacqueline

Discourse Rules: The Effects of Clause Order Principles on the Reading Process Journal Article

Language, Cognition and Neuroscience, 37(10), pp. 1277-1291, 2022, ISSN 2327-3798.

In an eye-tracking-while-reading study, we investigated adult monolinguals’ (N=80) processing of two-clause sentences embedded in short narratives. Three principles theorized to guide comprehension of complex sentences were contrasted: one operating at the clause level, namely clause structure (main clause – subordinate clause or vice versa), and two operating at the discourse-level, namely givenness (given-new vs. new-given) and event order (chronological vs. reverse order). The results indicate that clause structure mainly affects early stages of processing, whereas the two principles operating at the discourse level are more important during later stages and for reading times of the entire sentence. Event order was found to operate relatively independently of the other principles. Givenness was found to overrule clause structure, a phenomenon that can be related to the grounding function of preposed subordinate clauses. We propose a new principle to reflect this interaction effect: the grounding principle.

@article{Merel_Rules_2022,
title = {Discourse Rules: The Effects of Clause Order Principles on the Reading Process},
author = {Merel Scholman and Liam Blything and Kate Cain and Jacqueline Evers-Vermeul},
url = {https://www.tandfonline.com/doi/full/10.1080/23273798.2022.2077971},
doi = {https://doi.org/10.1080/23273798.2022.2077971},
year = {2022},
date = {2022},
journal = {Language, Cognition and Neuroscience},
pages = {1277-1291},
volume = {37},
number = {10},
abstract = {In an eye-tracking-while-reading study, we investigated adult monolinguals’ (N=80) processing of two-clause sentences embedded in short narratives. Three principles theorized to guide comprehension of complex sentences were contrasted: one operating at the clause level, namely clause structure (main clause - subordinate clause or vice versa), and two operating at the discourse-level, namely givenness (given-new vs. new-given) and event order (chronological vs. reverse order). The results indicate that clause structure mainly affects early stages of processing, whereas the two principles operating at the discourse level are more important during later stages and for reading times of the entire sentence. Event order was found to operate relatively independently of the other principles. Givenness was found to overrule clause structure, a phenomenon that can be related to the grounding function of preposed subordinate clauses. We propose a new principle to reflect this interaction effect: the grounding principle.},
pubstate = {published},
type = {article}
}

Project:   B2

Kravtchenko, Ekaterina; Demberg, Vera

Informationally redundant utterances elicit pragmatic inferences Journal Article

Cognition, 225, pp. 105159, 2022, ISSN 0010-0277.

Most theories of pragmatics and language processing predict that speakers avoid excessive informational redundancy. Informationally redundant utterances are, however, quite common in natural dialogue. From a comprehension standpoint, it remains unclear how comprehenders interpret these utterances, and whether they make attempts to reconcile the ‘dips’ in informational utility with expectations of ‘appropriate’ or ‘rational’ speaker informativity. We show that informationally redundant (overinformative) utterances can trigger pragmatic inferences that increase utterance utility in line with comprehender expectations. In a series of three studies, we look at utterances which refer to stereotyped event sequences describing common activities (scripts). When comprehenders encounter utterances describing events that can be easily inferred from prior context, they interpret them as signifying that the event conveys new, unstated information (i.e. an event otherwise assumed to be habitual, such as paying the cashier when shopping, is reinterpreted as non-habitual). We call these inferences atypicality inferences. Further, we show that the degree to which these atypicality inferences are triggered depends on the framing of the utterance. In the absence of an exclamation mark or a discourse marker indicating the speaker’s specific intent to communicate the given information, such inferences are far less likely to arise. Overall, the results demonstrate that excessive conceptual redundancy leads to comprehenders revising the conversational common ground, in an effort to accommodate unexpected dips in informational utility.

@article{Kravtchenko_redundant_2022,
title = {Informationally redundant utterances elicit pragmatic inferences},
author = {Ekaterina Kravtchenko and Vera Demberg},
url = {https://www.sciencedirect.com/science/article/pii/S0010027722001470},
doi = {https://doi.org/10.1016/j.cognition.2022.105159},
year = {2022},
date = {2022},
journal = {Cognition},
pages = {105159},
volume = {225},
abstract = {Most theories of pragmatics and language processing predict that speakers avoid excessive informational redundancy. Informationally redundant utterances are, however, quite common in natural dialogue. From a comprehension standpoint, it remains unclear how comprehenders interpret these utterances, and whether they make attempts to reconcile the 'dips' in informational utility with expectations of 'appropriate' or 'rational' speaker informativity. We show that informationally redundant (overinformative) utterances can trigger pragmatic inferences that increase utterance utility in line with comprehender expectations. In a series of three studies, we look at utterances which refer to stereotyped event sequences describing common activities (scripts). When comprehenders encounter utterances describing events that can be easily inferred from prior context, they interpret them as signifying that the event conveys new, unstated information (i.e. an event otherwise assumed to be habitual, such as paying the cashier when shopping, is reinterpreted as non-habitual). We call these inferences atypicality inferences. Further, we show that the degree to which these atypicality inferences are triggered depends on the framing of the utterance. In the absence of an exclamation mark or a discourse marker indicating the speaker's specific intent to communicate the given information, such inferences are far less likely to arise. Overall, the results demonstrate that excessive conceptual redundancy leads to comprehenders revising the conversational common ground, in an effort to accommodate unexpected dips in informational utility.},
keywords = {Accommodation; Context-dependent implicatures; Experimental pragmatics; Psycholinguistics; Redundancy},
pubstate = {published},
type = {article}
}

Project:   A3

Sommerfeld, Linda; Staudte, Maria; Kray, Jutta

Ratings of name agreement and semantic categorization of 247 colored clipart pictures by young German children Journal Article

Acta Psychologica, 226, pp. 103558, 2022, ISSN 0001-6918.

Developmental and longitudinal studies with children increasingly use pictorial stimuli in cognitive, psychologic, and psycholinguistic research. To enhance validity and comparability within and across those studies, the use of normed pictures is recommended. Besides, creating picture sets and evaluating them in rating studies is very time consuming, in particular regarding samples of young children in which testing time is rather limited. As there is an increasing number of studies that investigate young German children’s semantic language processing with colored clipart stimuli, this work provides a first set of 247 colored cliparts with ratings of German native speaking children aged 4 to 6 years. We assessed two central rating aspects of pictures: Name agreement (Do pictures elicit the intended name of an object?) and semantic categorization (Are objects classified as members of the intended semantic category?). Our ratings indicate that children are proficient in naming and even better in semantic categorization of objects, whereas both seems to improve with increasing age of young childhood. Finally, this paper discusses some features of pictorial objects that might be important for children’s name agreement and semantic categorization and could be considered in future picture rating studies.

@article{Sommerfeld_of_2022,
title = {Ratings of name agreement and semantic categorization of 247 colored clipart pictures by young German children},
author = {Linda Sommerfeld and Maria Staudte and Jutta Kray},
url = {https://www.sciencedirect.com/science/article/pii/S0001691822000737},
doi = {https://doi.org/10.1016/j.actpsy.2022.103558},
year = {2022},
date = {2022},
journal = {Acta Psychologica},
pages = {103558},
volume = {226},
abstract = {Developmental and longitudinal studies with children increasingly use pictorial stimuli in cognitive, psychologic, and psycholinguistic research. To enhance validity and comparability within and across those studies, the use of normed pictures is recommended. Besides, creating picture sets and evaluating them in rating studies is very time consuming, in particular regarding samples of young children in which testing time is rather limited. As there is an increasing number of studies that investigate young German children's semantic language processing with colored clipart stimuli, this work provides a first set of 247 colored cliparts with ratings of German native speaking children aged 4 to 6 years. We assessed two central rating aspects of pictures: Name agreement (Do pictures elicit the intended name of an object?) and semantic categorization (Are objects classified as members of the intended semantic category?). Our ratings indicate that children are proficient in naming and even better in semantic categorization of objects, whereas both seems to improve with increasing age of young childhood. Finally, this paper discusses some features of pictorial objects that might be important for children's name agreement and semantic categorization and could be considered in future picture rating studies.},
keywords = {Name agreement, Semantic categorization, Picture naming, Picture ratings, Children, Age differences},
pubstate = {published},
type = {article}
}

Project:   A5

Höltje, Gerrit; Mecklinger, Axel

Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation Journal Article

Brain Research, pp. 147942, 2022, ISSN 0006-8993.

This study investigated how the strength of schema support provided by strongly (SC) and weakly constraining (WC) sentences affects the encoding of expected and unexpected words, and how this is reflected in event-related potentials (ERPs). In a surprise recognition memory test, words studied on the previous day were presented together with new words and lures that were expected but not presented in the study phase. ERPs recorded in the study phase were compared for subsequently remembered and forgotten words. Better memory performance for expected over unexpected words was electrophysiologically supported by a parietal subsequent memory effect (SME) reflecting enhanced item-specific encoding of contextually expected words. SC sentences not only facilitated the semantic integration of sentence-ending words, as reflected in reduced N400 amplitudes, but also enabled the rapid successful encoding of these words into memory, which is evidenced by an SC > WC pattern in memory performance and correlations between pre- and post-stimulus SMEs for SC sentences. In contrast, words processed in WC sentence contexts necessitated sustained elaborative encoding processes as reflected in a late frontal slow wave SME. Expected but not presented words were associated with high rates of false positive memory decisions, indicating that these words remained in a state of high accessibility in memory even one day after the study phase. These mnemonic costs of predictive processing were more pronounced for expected words from SC sentences than from WC sentences and could reflect the lingering of strong semantic predictions which were associated with the pre-updating of sentence representations.

@article{Höltje_and_2022,
title = {Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation},
author = {Gerrit H{\"o}ltje and Axel Mecklinger},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0006899322001664},
doi = {https://doi.org/10.1016/j.brainres.2022.147942},
year = {2022},
date = {2022},
journal = {Brain Research},
pages = {147942},
volume = {1788},
abstract = {This study investigated how the strength of schema support provided by strongly (SC) and weakly constraining (WC) sentences affects the encoding of expected and unexpected words, and how this is reflected in event-related potentials (ERPs). In a surprise recognition memory test, words studied on the previous day were presented together with new words and lures that were expected but not presented in the study phase. ERPs recorded in the study phase were compared for subsequently remembered and forgotten words. Better memory performance for expected over unexpected words was electrophysiologically supported by a parietal subsequent memory effect (SME) reflecting enhanced item-specific encoding of contextually expected words. SC sentences not only facilitated the semantic integration of sentence-ending words, as reflected in reduced N400 amplitudes, but also enabled the rapid successful encoding of these words into memory, which is evidenced by an SC > WC pattern in memory performance and correlations between pre- and post-stimulus SMEs for SC sentences. In contrast, words processed in WC sentence contexts necessitated sustained elaborative encoding processes as reflected in a late frontal slow wave SME. Expected but not presented words were associated with high rates of false positive memory decisions, indicating that these words remained in a state of high accessibility in memory even one day after the study phase. These mnemonic costs of predictive processing were more pronounced for expected words from SC sentences than from WC sentences and could reflect the lingering of strong semantic predictions which were associated with the pre-updating of sentence representations.},
pubstate = {published},
type = {article}
}

Project:   A6
