Publications

Yuen, Ivan; Andreeva, Bistra; Ibrahim, Omnia; Möbius, Bernd

Prosodic factors do not always suppress discourse or surprisal factors on word-final syllable duration in German polysyllabic words Incollection

Lemke, Robin; Schäfer, Lisa; Reich, Ingo (Ed.): Information Structure and Information Theory, Language Science Press, pp. 215-234, Berlin, 2024.

Predictability is known to influence acoustic duration (e.g., Ibrahim et al. 2022) and prosodic factors such as accenting and boundary-related lengthening have been postulated to account for this effect (e.g., Aylett & Turk 2004). However, it has also been shown that other factors such as information status or speech styles could contribute to acoustic duration (e.g. Baker & Bradlow 2009). This raises the question as to whether acoustic duration is primarily subject to the influence of prosody that reflects linguistic structure including predictability. The current study addressed this question by examining the acoustic duration of word-final syllables in polysyllabic words in DIRNDL, a German radio broadcast corpus (e.g. Eckart et al. 2012). We analysed polysyllabic words followed by an intermediate phrase or an intonational phrase boundary, with or without accenting, and with given or new information status. Our results indicate that the acoustic duration of the word-final syllable was subject to the effect of prosodic boundary for long host words, in line with Aylett & Turk (2004); however, we also observed additional effects of information status, log surprisal and accenting for short host words, in line with Baker & Bradlow (2009). These results suggest that acoustic duration is subject to the influence of prosodic (e.g., boundary and accenting) and linguistic factors (e.g., information status and surprisal), and that the primacy of prosodic factors impacting on acoustic duration is further constrained by some intrinsic durational constraints, for example word length.
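
A note on the surprisal measure referred to above: surprisal is conventionally defined as the negative log-probability of a word given its preceding context,

S(w_i) = -\log_2 P(w_i \mid w_1, \ldots, w_{i-1})

so that higher values correspond to less predictable words, which under the accounts discussed here are expected to be realized with longer acoustic duration. How the probabilities are estimated for the corpus data is a matter of the chapter's own modelling choices and is not repeated here.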

@incollection{Yuen/etal:2024b,
title = {Prosodic factors do not always suppress discourse or surprisal factors on word-final syllable duration in German polysyllabic words},
author = {Ivan Yuen and Bistra Andreeva and Omnia Ibrahim and Bernd M{\"o}bius},
editor = {Robin Lemke and Lisa Sch{\"a}fer and Ingo Reich},
url = {https://zenodo.org/records/13383799},
doi = {https://doi.org/10.5281/zenodo.13383799},
year = {2024},
date = {2024},
booktitle = {Information Structure and Information Theory},
pages = {215-234},
publisher = {Language Science Press},
address = {Berlin},
abstract = {Predictability is known to influence acoustic duration (e.g., Ibrahim et al. 2022) and prosodic factors such as accenting and boundary-related lengthening have been postulated to account for this effect (e.g., Aylett & Turk 2004). However, it has also been shown that other factors such as information status or speech styles could contribute to acoustic duration (e.g. Baker & Bradlow 2009). This raises the question as to whether acoustic duration is primarily subject to the influence of prosody that reflects linguistic structure including predictability. The current study addressed this question by examining the acoustic duration of word-final syllables in polysyllabic words in DIRNDL, a German radio broadcast corpus (e.g. Eckart et al. 2012). We analysed polysyllabic words followed by an intermediate phrase or an intonational phrase boundary, with or without accenting, and with given or new information status. Our results indicate that the acoustic duration of the word-final syllable was subject to the effect of prosodic boundary for long host words, in line with Aylett & Turk (2004); however, we also observed additional effects of information status, log surprisal and accenting for short host words, in line with Baker & Bradlow (2009). These results suggest that acoustic duration is subject to the influence of prosodic (e.g., boundary and accenting) and linguistic factors (e.g., information status and surprisal), and that the primacy of prosodic factors impacting on acoustic duration is further constrained by some intrinsic durational constraints, for example word length.},
pubstate = {published},
type = {incollection}
}


Project:   C1

Pellegrino, Elisa; Dellwo, Volker; Pardo, Jennifer; Möbius, Bernd

Forms, factors and functions of phonetic convergence: Editorial Journal Article

Speech Communication, 165, 2024.

This introductory article for the Special Issue on Forms, Factors and Functions of Phonetic Convergence offers a comprehensive overview of the dominant theoretical paradigms, elicitation methods, and computational approaches pertaining to phonetic convergence, and discusses the role of established factors shaping interspeakers’ acoustic adjustments. The nine papers in this collection offer new insights into the fundamental mechanisms, factors and functions behind accommodation in production and perception, and in the perception of accommodation. By integrating acoustic, articulatory and perceptual evaluations of convergence, and combining traditional experimental phonetic analysis with computational modeling, the nine papers (1) emphasize the roles of cognitive adaptability and phonetic variability as triggers for convergence, (2) reveal fundamental similarities between the mechanisms of convergence perception and speaker identification, and (3) shed light on the evolutionary link between adaptation in human and animal vocalizations.

@article{Pellegrino/etal:2024,
title = {Forms, factors and functions of phonetic convergence: Editorial},
author = {Elisa Pellegrino and Volker Dellwo and Jennifer Pardo and Bernd M{\"o}bius},
url = {https://www.sciencedirect.com/science/article/pii/S0167639324001134},
doi = {https://doi.org/10.1016/j.specom.2024.103142},
year = {2024},
date = {2024},
journal = {Speech Communication},
volume = {165},
abstract = {This introductory article for the Special Issue on Forms, Factors and Functions of Phonetic Convergence offers a comprehensive overview of the dominant theoretical paradigms, elicitation methods, and computational approaches pertaining to phonetic convergence, and discusses the role of established factors shaping interspeakers’ acoustic adjustments. The nine papers in this collection offer new insights into the fundamental mechanisms, factors and functions behind accommodation in production and perception, and in the perception of accommodation. By integrating acoustic, articulatory and perceptual evaluations of convergence, and combining traditional experimental phonetic analysis with computational modeling, the nine papers (1) emphasize the roles of cognitive adaptability and phonetic variability as triggers for convergence, (2) reveal fundamental similarities between the mechanisms of convergence perception and speaker identification, and (3) shed light on the evolutionary link between adaptation in human and animal vocalizations.},
pubstate = {published},
type = {article}
}


Project:   C1

Abdullah, Badr M.

The representation of speech variability and variation in deep neural networks PhD Thesis

Saarländische Universitäts- und Landesbibliothek, Saarland University, Saarbruecken, Germany, 2024.

The central aim of this thesis is to bridge between the study of human speech variability and representation learning, focusing on how modern deep neural networks (DNNs) process and encode speech variability and variation in their latent representations. Diverging from prior machine learning research which has primarily focused on improving model performance in the face of variability, this thesis seeks to provide better insights into how different dimensions of speech variability shape neural network representations. The first part of this thesis, concerned with neural models of spoken language identification, introduces two studies investigating the model’s adaptability to domain variability and the extent to which the model representations capture cross-linguistic variation. The second part of this thesis focuses on neural models of spoken-word representations, presenting three studies that explore various dimensions of variability including: the encoding of word-form variability in the model representational geometry, the variability of linguistic experience and its role in shaping non-native spoken-word representations, and the integration of high-level lexical knowledge into the model to abstract from variability in word acoustic realization. The third and final part of this thesis analyzes the latent discrete representations in transformer-based speech models trained with self-supervision and codebook learning, and demonstrates that information-theoretic metrics reflect acoustic-phonetic variability in segment realization. In summary, this thesis makes tangible contributions by uncovering how neural models encode domain, acoustic-phonetic, and cross-linguistic variation, exploring the role of L1/L2 similarity on non-native spoken-word processing, and characterizing the relationship between discrete speech representations and abstract phonetic categories such as phonemes. Throughout six diverse studies, this thesis takes an interdisciplinary perspective and demonstrates the utility of machine learning models as a potent scientific tool to answer novel and linguistically-informed research questions that are grounded in the fields of sociolinguistics, speech perception, and cognitive modeling research.
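
The information-theoretic analysis mentioned for the third part can be illustrated with a minimal sketch (not the thesis code; the alignment data below are invented): each phone label is treated as a distribution over the discrete codebook units assigned to its frames, and the entropy of that distribution serves as a variability measure.

import math
from collections import Counter, defaultdict

def phone_unit_entropy(pairs):
    # pairs: iterable of (phone_label, discrete_unit_id) tuples, e.g. obtained by
    # aligning phone annotations with the codebook indices of a quantized speech model
    counts = defaultdict(Counter)
    for phone, unit in pairs:
        counts[phone][unit] += 1
    entropies = {}
    for phone, units in counts.items():
        total = sum(units.values())
        entropies[phone] = -sum((c / total) * math.log2(c / total) for c in units.values())
    return entropies

# Toy example: a phone whose frames are spread over many units has higher entropy
# than one concentrated on few units.
toy_pairs = [("a", 3), ("a", 7), ("a", 12), ("a", 3), ("s", 5), ("s", 5), ("s", 5), ("s", 9)]
print(phone_unit_entropy(toy_pairs))  # {'a': 1.5, 's': 0.81} (bits, approximately)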


Das zentrale Ziel dieser Dissertation ist es, die Forschungslücke zwischen der Untersuchung von Variabilität und Variation in der menschlichen Sprache und der maschinellen Verarbeitung von Sprache auf der Grundlage von Repräsentationslernen zu schließen, um neue Erkenntnisse darüber zu gewinnen, wie moderne tiefe neuronale Netze (DNNs) verschiedene Dimensionen der Sprachvariabilität in ihren Repräsentationen verarbeiten und kodieren. Obwohl einige Aspekte der Variabilität in früheren Forschungsarbeiten zur computergestützten Sprachverarbeitung behandelt wurden, lag der Hauptschwerpunkt bei vorherigen Ansätzen des maschinellen Lernens stets auf der Entwicklung von Modellen, die robust gegenüber Variationen in den Aufnahme- und Akustikbedingungen sind, sowie auf der Generalisierungsfähigkeit gegenüber Unstimmigkeiten zwischen Trainings- und Testdaten aufgrund von Domänen-, Sprecher- und linguistischen Variationen. Daher konzentrierten sich die Forschungsbemühungen in der bisherigen Sprachrepräsentationsforschung in erster Linie auf die Verbesserung der Leistungsmetriken für eine bestimmte Aufgabe bei Vorhandensein einer Variabilitätsquelle. Anstelle dieses leistungsorientierten Ansatzes nimmt diese Dissertation eine andere Perspektive ein und zielt darauf ab, zu analysieren und zu verstehen, wie das Repräsentationsprofil von neuronalen Sprachnetzwerken durch verschiedene Dimensionen der Sprachvariabilität geformt wird, wie z.B. Domänenvariabilität, sprachübergreifende Variation, Variabilität innerhalb der Kategorie, Variabilität in der sprachlichen Erfahrung und akustische Variabilität abstrakter phonetischer Kategorien. In dieser Dissertation werden sechs Studien vorgestellt, die in drei verschiedene Teile gegliedert sind, wobei jeder Teil einer Sprachverarbeitungsaufgabe gewidmet ist. Im ersten Teil der Dissertation stelle ich zwei Studien vor, die sich mit neuronalen Modellen zur Identifikation gesprochener Sprache (SLID) befassen, um ihre Anpassungsfähigkeit an Domänenvariabilität zu untersuchen (Studie I) und zu analysieren, inwieweit sie sprachübergreifende Variationen darstellen (Studie II). In Studie I zeige ich, dass DNNs – wie erwartet – nicht robust gegen Domänenvariabilität sind, jedoch können bestimmte Trainingsstrategien (z.B. adversarial learning) effektiv sein, um zu verhindern, dass das Modell Abkürzungen in den Daten lernt, um seine domänenübergreifende Generalisierung zu verbessern. In Studie II zeige ich, dass die Repräsentationen neuronaler Netze sprachübergreifende Ähnlichkeit erfassen und in einer Weise geclustert sind, die Sprachverwandtschaft widerspiegelt. Im zweiten Teil der Dissertation stelle ich drei Studien vor, die sich mit neuronalen Modellen des Keyword-Spotting und der akustischen Worteinbettung befassen, um die Variabilität von gesprochenen Wortrealisierungen zu untersuchen. Zunächst gehe ich näher auf die Geometrie des Repräsentationsraums für gesprochene Wörter ein, um zu untersuchen, wie er die Variabilität von Beispielen innerhalb einer Kategorie kodiert und wie sich die Variabilität in den Anfangsbedingungen des Modells auf die Repräsentationen auswirkt, sobald sie konvergiert sind (Studie IV). Anschließend wird eine Studie vorgestellt, die darauf abzielt, die Variabilität der sprachlichen Erfahrung und ihre Rolle bei der Verarbeitung nicht-muttersprachlicher Sprache zu modellieren (Studie V).
Konkret wird in dieser Studie die sprachliche Erfahrung als die Muttersprache (L1) des Modells während des Trainings charakterisiert und die Verarbeitung nichtmuttersprachlicher gesprochener Wörter simuliert, indem das Ausmaß gemessen wird, in dem nicht-muttersprachliche Modelle muttersprachliche Repräsentationen von gesprochenen Wörtern erzeugen. Schließlich stelle ich ein Berechnungsmodell für die Repräsentation gesprochener Wörter vor, das von der menschlichen Sprachverarbeitung inspiriert ist und eine Zuordnung zwischen der akustischen Form und einer semantischen Repräsentation auf abstrakter Ebene erlernt, die lexikalisches Wissen kodiert (Studie V). Ich zeige, dass die Integration von lexikalischem Wissen in das Training gesprochener Wortrepräsentationen die Fähigkeit des Modells verbessert, zwischen lexikalischen Kategorien zu unterscheiden, und das Modell ermutigt, von der Variabilität des Sprechers und des lexikalischen Kontexts zu abstrahieren. Im dritten Teil konzentriere ich mich auf die diskreten Repräsentationen von Sprache, die beim Training von Transformer-Modellen durch selbstüberwachtes Lernen und Codebuchlernen entstehen. In diesem Teil wird ein Ansatz zur Charakterisierung der Beziehung zwischen diskreten Sprachrepräsentationen und abstrakten phonetischen Kategorien wie Phonemen vorgestellt. Konkret schlägt das Kapitel zunächst einen informationstheoretischen Rahmen vor, in dem jede phonetische Kategorie als eine Verteilung über diskrete Einheiten dargestellt wird. Die Studie zeigt, dass die Entropie phonetischer Verteilungen die akustisch-phonetische Variabilität der zugrunde liegenden Sprachlaute widerspiegelt, wobei Sonoranten im Durchschnitt entropischer sind als Obstruenten. Darüber hinaus zeigt sich, dass phonetisch ähnliche Laute auf niedriger Ebene ähnliche Verteilungen aufweisen, während eine Clusteranalyse zeigt, dass die höchste Ebene der Aufteilung Obstruenten und Sonoranten trennt. Insgesamt bietet diese Dissertation wertvolle Einblicke in die Art und Weise, wie DNNs Sprachvariabilität über mehrere Dimensionen hinweg verarbeiten und kodieren. Dies verbessert unser Verständnis von Sprachverarbeitung und trägt zur Entwicklung robusterer und linguistisch informierter Sprachtechnologieanwendungen bei.

@phdthesis{Abdullah_Diss,
title = {The representation of speech variability and variation in deep neural networks},
author = {Badr M. Abdullah},
url = {https://jahrbib.sulb.uni-saarland.de/handle/20.500.11880/38479},
doi = {https://doi.org/10.22028/D291-42719},
year = {2024},
date = {2024},
school = {Saarland University},
publisher = {Saarl{\"a}ndische Universit{\"a}ts- und Landesbibliothek},
address = {Saarbruecken, Germany},
abstract = {The central aim of this thesis is to bridge between the study of human speech variability and representation learning, focusing on how modern deep neural networks (DNNs) process and encode speech variability and variation in their latent representations. Diverging from prior machine learning research which has primarily focused on improving model performance in the face of variability, this thesis seeks to provide better insights into how different dimensions of speech variability shape neural network representations. The first part of this thesis, concerned with neural models of spoken language identification, introduces two studies investigating the model’s adaptability to domain variability and the extent to which the model representations capture cross-linguistic variation. The second part of this thesis focuses on neural models of spoken-word representations, presenting three studies that explore various dimensions of variability including: the encoding of word-form variability in the model representational geometry, the variability of linguistic experience and its role in shaping non-native spoken-word representations, and the integration of high-level lexical knowledge into the model to abstract from variability in word acoustic realization. The third and final part of this thesis analyzes the latent discrete representations in transformer-based speech models trained with self-supervision and codebook learning, and demonstrates that information-theoretic metrics reflect acoustic-phonetic variability in segment realization. In summary, this thesis makes tangible contributions by uncovering how neural models encode domain, acoustic-phonetic, and cross-linguistic variation, exploring the role of L1/L2 similarity on non-native spoken-word processing, and characterizing the relationship between discrete speech representations and abstract phonetic categories such as phonemes. Throughout six diverse studies, this thesis takes an interdisciplinary perspective and demonstrates the utility of machine learning models as a potent scientific tool to answer novel and linguistically-informed research questions that are grounded in the fields of sociolinguistics, speech perception, and cognitive modeling research.


Das zentrale Ziel dieser Dissertation ist es, die Forschungsl{\"u}cke zwischen der Untersuchung von Variabilit{\"a}t und Variation in der menschlichen Sprache und der maschinellen Verarbeitung von Sprache auf der Grundlage von Repr{\"a}sentationslernen zu schlie{\ss}en, um neue Erkenntnisse dar{\"u}ber zu gewinnen, wie moderne tiefe neuronale Netze (DNNs) verschiedene Dimensionen der Sprachvariabilit{\"a}t in ihren Repr{\"a}sentationen verarbeiten und kodieren. Obwohl einige Aspekte der Variabilit{\"a}t in fr{\"u}heren Forschungsarbeiten zur computergest{\"u}tzten Sprachverarbeitung behandelt wurden, lag der Hauptschwerpunkt bei vorherigen Ans{\"a}tzen des maschinellen Lernens stets auf der Entwicklung von Modellen, die robust gegen{\"u}ber Variationen in den Aufnahme- und Akustikbedingungen sind, sowie auf der Generalisierungsf{\"a}higkeit gegen{\"u}ber Unstimmigkeiten zwischen Trainingsund Testdaten aufgrund von Dom{\"a}nen-, Sprecher- und linguistischen Variationen. Daher konzentrierten sich die Forschungsbem{\"u}hungen in der bisherigen Sprachrepr {\"a}sentationsforschung in erster Linie auf die Verbesserung der Leistungsmetriken f{\"u}r eine bestimmte Aufgabe bei Vorhandensein einer Variabilit{\"a}tsquelle. Anstelle dieses leistungsorientierten Ansatzes nimmt diese Dissertation eine andere Perspektive ein und zielt darauf ab, zu analysieren und zu verstehen, wie das Repr{\"a}sentationsprofil von neuronalen Sprachnetzwerken durch verschiedene Dimensionen der Sprachvariabilit{\"a}t geformt wird, wie z.B. Dom{\"a}nenvariabilit{\"a}t, sprach{\"u}bergreifende Variation, Variabilit{\"a}t innerhalb der Kategorie, Variabilit{\"a}t in der sprachlichen Erfahrung und akustische Variabilit{\"a}t abstrakter phonetischer Kategorien In dieser Dissertation werden sechs Studien vorgestellt, die in drei verschiedene Teile gegliedert sind, wobei jeder Teil einer Sprachverarbeitungsaufgabe gewidmet ist. Im ersten Teil der Dissertation stelle ich zwei Studien vor, die sich mit neuronalen Modellen zur Identifikation gesprochener Sprache (SLID) befassen, um ihre Anpassungsf{\"a}higkeit an Dom{\"a}nenvariabilit{\"a}t zu untersuchen (Studie I) und zu analysieren, inwieweit sie sprach{\"u}bergreifende Variationen darstellen (Studie II). In Studie I zeige ich, dass DNNs - wie erwartet - nicht robust gegen Dom{\"a}nenvariabilit{\"a}t sind, jedoch k{\"o}nnen bestimmte Trainingsstrategien (z.B adversarial learning) effektiv sein, um zu verhindern, dass das Modell Abk{\"u}rzungen in den Daten lernt, um seine dom{\"a}nen{\"u}bergreifende Generalisierung zu verbessern. In Studie II zeige ich, dass die Repr{\"a}sentationen neuronaler Netze sprach{\"u}bergreifende {\"A}hnlichkeit erfassen und in einer Weise geclustert sind, die Sprachverwandtschaft widerspiegelt. Im zweiten Teil der Dissertation stelle ich drei Studien vor, die sich mit neuronalen Modellen des Keyword-Spotting und der akustischen Worteinbettung befassen, um die Variabilit{\"a}t von gesprochenen Wortrealisierungen zu untersuchen. Zun{\"a}chst gehe ich n{\"a}her auf die Geometrie des Repr{\"a}sentationsraums f{\"u}r gesprochene W{\"o}rter ein, um zu untersuchen, wie er die Variabilit{\"a}t von Beispielen innerhalb einer Kategorie kodiert und wie sich die Variabilit{\"a}t in den Anfangsbedingungen des Modells auf die Repr{\"a}sentationen auswirkt, sobald sie konvergiert sind (Studie IV). 
Anschlie{\ss}end wird eine Studie vorgestellt, die darauf abzielt, die Variabilit{\"a}t der sprachlichen Erfahrung und ihre Rolle bei der Verarbeitung nicht-muttersprachlicher Sprache zu modellieren (Studie V). Konkret wird in dieser Studie die sprachliche Erfahrung als die Muttersprache (L1) des Modells w{\"a}hrend des Trainings charakterisiert und die Verarbeitung nichtmuttersprachlicher gesprochener W{\"o}rter simuliert, indem das Ausma{\ss} gemessen wird, in dem nicht-muttersprachliche Modelle muttersprachliche Repr{\"a}sentationen von gesprochenen W{\"o}rtern erzeugen. Schlie{\ss}lich stelle ich ein Berechnungsmodell f{\"u}r die Repr{\"a}sentation gesprochener W{\"o}rter vor, das von der menschlichen Sprachverarbeitung inspiriert ist und eine Zuordnung zwischen der akustischen Form und einer semantischen Repr{\"a}sentation auf abstrakter Ebene erlernt, die lexikalisches Wissen kodiert (Studie V). Ich zeige, dass die Integration von lexikalischem Wissen in das Training gesprochener Wortrepr{\"a}sentationen die F{\"a}higkeit des Modells verbessert, zwischen lexikalischen Kategorien zu unterscheiden, und das Modell ermutigt, von der Variabilit{\"a}t des Sprechers und des lexikalischen Kontexts zu abstrahieren. Im dritten Teil konzentriere ich mich auf die diskreten Repr{\"a}sentationen von Sprache, die sich beim Training von Transformer-Modellen durch Selbst{\"u}berwachtesund Codebuchlernen entstehen. In diesem Teil wird ein Ansatz zur Charakterisierung der Beziehung zwischen diskreten Sprachrepr{\"a}sentationen und abstrakten phonetischen Kategorien wie Phonemen vorgestellt. Konkret schl{\"a}gt das Kapitel zun{\"a}chst einen informationstheoretischen Rahmen vor, in dem jede phonetische Kategorie als eine Verteilung {\"u}ber diskrete Einheiten dargestellt wird. Die Studie zeigt, dass die Entropie phonetischer Verteilungen die akustisch-phonetische Variabilit{\"a}t der zugrunde liegenden Sprachlaute widerspiegelt, wobei Sonoranten im Durchschnitt entropischer sind als Obstruenten. Dar{\"u}ber hinaus zeigt sich, dass phonetisch {\"a}hnliche Laute auf niedriger Ebene {\"a}hnliche Verteilungen aufweisen, w{\"a}hrend eine Clusteranalyse zeigt, dass die h{\"o}chste Ebene der Aufteilung Obstruenten und Sonoranten trennt. Insgesamt bietet diese Dissertation wertvolle Einblicke in die Art und Weise, wie DNNs Sprachvariabilit{\"a}t {\"u}ber mehrere Dimensionen hinweg verarbeiten und kodieren. Dies verbessert unser Verst{\"a}ndnis von Sprachverarbeitung und tr{\"a}gt zur Entwicklung robusterer und linguistisch informierter Sprachtechnologieanwendungen bei.},
pubstate = {published},
type = {phdthesis}
}


Project:   C4

Greenberg, Clayton

Evaluating humanness in language models PhD Thesis

Saarländische Universitäts- und Landesbibliothek, Saarland University, Saarbruecken, Germany, 2024.

Advances with language models, systems that predict upcoming words in context, have enabled an era in which people sometimes cannot distinguish between human-written and artificially created text. Perplexity, the simplest and most popular way to evaluate the quality of a language model, rewards any pattern captured by the system as long as it robustly constrains the upcoming possibilities. By capturing patterns that humans do not use, optimizing a language model for minimal perplexity could trigger a divergence between the most probable text and the most human-like text. In this thesis, I argue that this divergence has happened for state-of-the-art language models. Part I characterizes the kinds of knowledge captured by language models. First, I present three novel language model architectures whose neural connections were inspired by human behavior. Then, I discuss novel morphology- and sentiment-based paradigms that capture human knowledge quantitatively. Part II establishes several methods for evaluating language models by comparison against human behavior measures. I consider the suitability and potential confounds for offline ratings and two paradigms of online reading times: eye-tracking and G-Maze. Then, I use a novel dataset of G-Maze response times to show computational and linguistic evidence of the divergence.
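
Perplexity, the evaluation measure discussed above, is the exponentiated average negative log-likelihood a model assigns to a held-out word sequence w_1, ..., w_N:

\mathrm{PPL}(w_{1:N}) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{<i}) \right)

Lower perplexity means the model assigns higher probability to the text; the thesis argues that optimizing this quantity alone does not guarantee that the most probable continuations are also the most human-like ones.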


Fortschritte bei Sprachmodellen (LMs) – Systeme, die aus dem Kontext heraus nachfolgende Worte vorhersagen – haben dazu geführt, dass Menschen manchmal nicht mehr zwischen von Menschen geschriebenem und künstlich erzeugtem Text unterscheiden können. Perplexität (PPL), die einfachste und beliebteste Methode zur Bewertung der Qualität eines LM, belohnt jedes vom System erfasste Muster, solange es die kommenden Möglichkeiten stark einschränkt. Durch die Erfassung von Mustern, die Menschen nicht verwenden, könnte die Optimierung eines LM hinsichtlich minimaler PPL zu einer Divergenz zwischen dem wahrscheinlichsten Text und dem menschenähnlichsten Text führen. In dieser Arbeit wird argumentiert, dass diese Divergenz bei modernen LMs aufgetreten ist. Teil I charakterisiert die Arten von Wissen, die von LMs erfasst werden. Zuerst werden drei neue LM-Architekturen beschrieben, deren neuronale Verbindungen von menschlichem Verhalten inspiriert wurden. Danach werden neuartige morphologie- und sentiment-basierte Paradigmen diskutiert, die menschliches Verhalten quantitativ erfassen. In Teil II werden mehrere Methoden entwickelt, die LMs durch Vergleich mit menschlichen Verhaltensmaßen bewerten. Diskutiert werden die Eignung und mögliche Störfaktoren für Offline-Bewertungen und zwei Paradigmen von Online-Lesezeiten: Eye-Tracking und G-Maze. Ein neuartiger Datensatz der G-Maze-Antwortzeiten wird dazu verwendet, um rechnerische und sprachliche Beweise für die Divergenz zu liefern.

@phdthesis{Greenberg_Diss,
title = {Evaluating humanness in language models},
author = {Clayton Greenberg},
url = {https://jahrbib.sulb.uni-saarland.de/handle/20.500.11880/37534},
doi = {https://doi.org/10.22028/D291-41943},
year = {2024},
date = {2024},
school = {Saarland University},
publisher = {Saarl{\"a}ndische Universit{\"a}ts- und Landesbibliothek},
address = {Saarbruecken, Germany},
abstract = {Advances with language models, systems that predict upcoming words in context, have enabled an era in which people sometimes cannot distinguish between human-written and artificially created text. Perplexity, the simplest and most popular way to evaluate the quality of a language model, rewards any pattern captured by the system as long as it robustly constrains the upcoming possibilities. By capturing patterns that humans do not use, optimizing a language model for minimal perplexity could trigger a divergence between the most probable text and the most human-like text. In this thesis, I argue that this divergence has happened for state-of-the-art language models. Part I characterizes the kinds of knowledge captured by language models. First, I present three novel language model architectures whose neural connections were inspired by human behavior. Then, I discuss novel morphology- and sentiment-based paradigms that capture human knowledge quantitatively. Part II establishes several methods for evaluating language models by comparison against human behavior measures. I consider the suitability and potential confounds for offline ratings and two paradigms of online reading times: eye-tracking and G-Maze. Then, I use a novel dataset of G-Maze response times to show computational and linguistic evidence of the divergence.


Fortschritte bei Sprachmodellen (LMs) - Systeme, die aus dem Kontext heraus nachfolgende Worte vorhersagen - haben dazu gef{\"u}hrt, dass Menschen manchmal nicht mehr zwischen von Menschen geschriebenem und k{\"u}nstlich erzeugtem Text unterscheiden k{\"o}nnen. Perplexit{\"a}t (PPL), die einfachste und beliebteste Methode zur Bewertung der Qualit{\"a}t eines LM, belohnt jedes vom System erfasste Muster, solange es die kommenden M{\"o}glichkeiten stark einschr{\"a}nkt. Durch die Erfassung von Mustern, die Menschen nicht verwenden, k{\"o}nnte die Optimierung eines LM hinsichtlich minimaler PPL zu einer Divergenz zwischen dem wahrscheinlichsten Text und dem menschen{\"a}hnlichsten Text f{\"u}hren. In dieser Arbeit wird argumentiert, dass diese Divergenz bei modernen LMs aufgetreten ist. Teil I charakterisiert die Arten von Wissen, die von LMs erfasst werden. Zuerst werden drei neue LM-Architekturen beschreiben, deren neuronale Verbindungen von menschlichem Verhalten inspiriert wurden. Danach werden neuartige morphologie- und sentiment-basierte Paradigmen diskutiert, die menschliches Verhalten quantitativ erfassen. In Teil II werden mehrere Methoden entwickelt, die LMs durch Vergleich mit menschlichen Verhaltensma{\ss}en bewerten. Diskutiert werden die Eignung und m{\"o}gliche St{\"o}rfaktoren f{\"u}r Offline-Bewertungen und zwei Paradigmen von Online-Lesezeiten: Eye-Tracking und G-Maze. Ein neuartiger Datensatz der G-Maze-Antwortzeiten wird dazu verwendet, um rechnerische und sprachliche Beweise f{\"u}r die Divergenz zu liefern.},
pubstate = {published},
type = {phdthesis}
}


Project:   B4

Gessinger, Iona; Andreeva, Bistra; Cowan, Benjamin R.

The Use of Modifiers and f0 in Remote Referential Communication with Human and Computer Partners Inproceedings

Proc. Interspeech 2024, pp. 1575-1579, 2024, ISSN 2958-1796.

The present study investigates referring expressions in a remote interaction context with a human or computer partner (both simulated). Across these conditions, we compare the effect of competitor information being available to both partners (common ground) or only the speaker (privileged ground) on target item descriptions. We analyse the number of adjectival modifiers uttered and show that participants responded to the manipulation of information status in both partner conditions. In addition, we examine whether the information status also affects the prosodic realisation of the descriptions. No sufficient evidence was found for this. As expected, adjectives showed a slightly higher peak f0 when a competitor was present in the common ground than when there was no competitor. However, when analysing the overall f0 contour, there was no systematic difference between conditions.
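
As a purely illustrative sketch of the peak-f0 measure analysed above (not the authors' analysis script; the values are invented), the peak can be read off a frame-wise pitch track of the adjective and optionally converted to semitones:

import numpy as np

# f0 track of one adjective token (Hz per analysis frame); NaN marks unvoiced frames
f0_track = np.array([np.nan, 182.0, 190.5, 201.3, 197.8, np.nan])

peak_f0_hz = np.nanmax(f0_track)               # peak f0 of the token
peak_f0_st = 12 * np.log2(peak_f0_hz / 100.0)  # semitones relative to a 100 Hz reference
print(round(float(peak_f0_hz), 1), round(float(peak_f0_st), 2))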

@inproceedings{gessinger24_interspeech,
title = {The Use of Modifiers and f0 in Remote Referential Communication with Human and Computer Partners},
author = {Iona Gessinger and Bistra Andreeva and Benjamin R. Cowan},
url = {https://www.isca-archive.org/interspeech_2024/gessinger24_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2024-1169},
year = {2024},
date = {2024},
booktitle = {Proc. Interspeech 2024},
issn = {2958-1796},
pages = {1575-1579},
abstract = {The present study investigates referring expressions in a remote interaction context with a human or computer partner (both simulated). Across these conditions, we compare the effect of competitor information being available to both partners (common ground) or only the speaker (privileged ground) on target item descriptions. We analyse the number of adjectival modifiers uttered and show that participants responded to the manipulation of information status in both partner conditions. In addition, we examine whether the information status also affects the prosodic realisation of the descriptions. No sufficient evidence was found for this. As expected, adjectives showed a slightly higher peak f0 when a competitor was present in the common ground than when there was no competitor. However, when analysing the overall f0 contour, there was no systematic difference between conditions.},
pubstate = {published},
type = {inproceedings}
}


Project:   C1

Zhang, Miaoran; Wang, Mingyang; Alabi, Jesujoba; Klakow, Dietrich

AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness Inproceedings

Kr. Ojha, Atul; Seza Doğruöz, A.; Tayyar Madabushi, Harish; Da San Martino, Giovanni; Rosenthal, Sara; Rosá, Aiala (Ed.): Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), Association for Computational Linguistics, pp. 800-810, Mexico City, Mexico, 2024.

This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages. The shared task aims at measuring the semantic textual relatedness between pairs of sentences, with a focus on a range of under-represented languages. In this work, we propose using machine translation for data augmentation to address the low-resource challenge of limited training data. Moreover, we apply task-adaptive pre-training on unlabeled task data to bridge the gap between pre-training and task adaptation. For model training, we investigate both full fine-tuning and adapter-based tuning, and adopt the adapter framework for effective zero-shot cross-lingual transfer. We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
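
The task-adaptive pre-training step mentioned above can be sketched as follows; this is a generic illustration, not the paper's code, and the model name, data path and hyperparameters are placeholders rather than the settings used in the system.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "xlm-roberta-base"  # placeholder: any multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled task sentences (placeholder file), tokenized for masked-language modelling
raw = load_dataset("text", data_files={"train": "unlabeled_task_sentences.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tapt-checkpoint", num_train_epochs=3,
                           per_device_train_batch_size=16, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # the adapted encoder is subsequently fine-tuned on the labeled STR data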

@inproceedings{zhang2024aadamsemeval2024task1,
title = {AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness},
author = {Miaoran Zhang and Mingyang Wang and Jesujoba Alabi and Dietrich Klakow},
editor = {Atul Kr. Ojha and A. Seza Doğru{\"o}z and Harish Tayyar Madabushi and Giovanni Da San Martino and Sara Rosenthal and Aiala Ros{\'a}},
url = {https://aclanthology.org/2024.semeval-1.114},
doi = {https://doi.org/10.18653/v1/2024.semeval-1.114},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)},
pages = {800-810},
publisher = {Association for Computational Linguistics},
address = {Mexico City, Mexico},
abstract = {This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages. The shared task aims at measuring the semantic textual relatedness between pairs of sentences, with a focus on a range of under-represented languages. In this work, we propose using machine translation for data augmentation to address the low-resource challenge of limited training data. Moreover, we apply task-adaptive pre-training on unlabeled task data to bridge the gap between pre-training and task adaptation. For model training, we investigate both full fine-tuning and adapter-based tuning, and adopt the adapter framework for effective zero-shot cross-lingual transfer. We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).},
pubstate = {published},
type = {inproceedings}
}


Project:   B4

Jablotschkin, Sarah; Zinsmeister, Heike

Wie ist der Anfang von Sätzen in Leichter Sprache? Ergebnisse der Studie LeiKo -- Ein Vergleichs-Korpus für Leichte Sprache und Einfache Sprache Miscellaneous

Schiffler, Inga; Baier, Elke; Kölln, Marco; Schneider, Nadine; Haarkamp, Angelika (Ed.), 2024.

Dieser Text ist eine Zusammenfassung in Leichter Sprache

@miscellaneous{jablotschkin_wie_2024,
title = {Wie ist der Anfang von S{\"a}tzen in Leichter Sprache? Ergebnisse der Studie LeiKo -- Ein Vergleichs-Korpus f{\"u}r Leichte Sprache und Einfache Sprache},
author = {Sarah Jablotschkin and Heike Zinsmeister},
editor = {Inga Schiffler and Elke Baier and Marco K{\"o}lln and Nadine Schneider and Angelika Haarkamp},
url = {https://www.fdr.uni-hamburg.de/record/14827},
doi = {https://doi.org/10.25592/UHHFDM.14827},
year = {2024},
date = {2024},
abstract = {Dieser Text ist eine Zusammenfassung in Leichter Sprache},
pubstate = {published},
type = {miscellaneous}
}


Project:   T1

Lemke, Tyll Robin

Acceptability, predictability and processing of antecedent-target mismatches under verb phrase ellipsis Journal Article

Glossa Psycholinguistics, 3 (1), 2024.

Deletion-based accounts of verb phrase ellipsis (VPE) predict that this construction requires a syntactically identical antecedent, but previous research shows that some antecedent-target mismatches are perceived as relatively acceptable in experiments (see e.g. Arregui et al., 2006; Miller & Hemforth, 2014). So far, the acceptability of these mismatches has been explained mostly by licensing conditions on VPE or by ellipsis-specific processing mechanisms. This article explores to what extent the acceptability of mismatches follows from the more general principles of an information-theoretic account of language use, which has been independently evidenced for other omission phenomena: To avoid under- or overutilizing the hearer’s processing resources, predictable VPs are more likely to be omitted, whereas unpredictable ones are more likely to be realized. This hypothesis is tested with three experiments that investigate a gradual acceptability cline between VPE mismatches which has been reported by Arregui et al. (2006). First, an acceptability rating study replicates the overall pattern found by Arregui et al. (2006) and confirms that the effect is specific to ellipsis. Second, a production task shows that the acceptability differences are indeed related to a gradual decrease in the predictability of the target VP, which is also reflected in the likelihood of participants producing VPE. Finally, a self-paced reading experiment shows that VPE is more acceptable when it is easier to process. Overall, the experimental results support the information-theoretic account and suggest that no specific syntactic constraints or reconstruction mechanisms might be required to account for the acceptability cline observed for the mismatches investigated.
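
The predictability notion invoked above can be estimated from production data along the following lines (a simplified sketch with invented responses, not the article's actual procedure):

import math

# Continuations produced for one item in a completion/production task
responses = ["did", "did", "washed the car", "did", "did too"]
target_vp = "did"

p_target = responses.count(target_vp) / len(responses)
surprisal_bits = -math.log2(p_target)  # higher surprisal, less predictable target VP
print(p_target, round(surprisal_bits, 2))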

@article{Lemke_2024,
title = {Acceptability, predictability and processing of antecedent-target mismatches under verb phrase ellipsis},
author = {Tyll Robin Lemke},
url = {https://escholarship.org/uc/item/36q4c5mf},
doi = {https://doi.org/10.5070/G6011237},
year = {2024},
date = {2024},
journal = {Glossa Psycholinguistics},
volume = {3 (1)},
number = {1},
abstract = {Deletion-based accounts of verb phrase ellipsis (VPE) predict that this construction requires a syntactically identical antecedent, but previous research shows that some antecedent-target mismatches are perceived as relatively acceptable in experiments (see e.g. Arregui et al., 2006; Miller & Hemforth, 2014). So far, the acceptability of these mismatches has been explained mostly by licensing conditions on VPE or by ellipsis-specific processing mechanisms. This article explores to what extent the acceptability of mismatches follows from the more general principles of an information-theoretic account of language use, which has been independently evidenced for other omission phenomena: To avoid under- or overutilizing the hearer’s processing resources, predictable VPs are more likely to be omitted, whereas unpredictable ones are more likely to be realized. This hypothesis is tested with three experiments that investigate a gradual acceptability cline between VPE mismatches which has been reported by Arregui et al. (2006). First, an acceptability rating study replicates the overall pattern found by Arregui et al. (2006) and confirms that the effect is specific to ellipsis. Second, a production task shows that the acceptability differences are indeed related to a gradual decrease in the predictability of the target VP, which is also reflected in the likelihood of participants producing VPE. Finally, a self-paced reading experiment shows that VPE is more acceptable when it is easier to process. Overall, the experimental results support the information-theoretic account and suggest that no specific syntactic constraints or reconstruction mechanisms might be required to account for the acceptability cline observed for the mismatches investigated.},
pubstate = {published},
type = {article}
}


Project:   B3

Zaitova, Iuliia; Stenger, Irina; Xue, Wei; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

Cross-Linguistic Intelligibility of Non-Compositional Expressions in Spoken Context Inproceedings

Proceedings of Interspeech 2024, ISCA, pp. 4189-4193, Kos, Greece, 2024.

This study investigates intelligibility of non-compositional expressions in spoken context for five closely related Slavic languages (Belarusian, Bulgarian, Czech, Polish, and Ukrainian) by native Russian speakers. Our investigation employs a web-based experiment involving free-response and multiple-choice translation tasks. Drawing on prior research, two factors were examined: (1) linguistic similarities (orthographic and phonological distances), and (2) surprisal scores obtained from two multilingual speech representation (SR) models fine-tuned for Russian (Wav2Vec2-Large-Ru-Golos-With-LM and Whisper Medium Russian). According to the results of Pearson correlation and regression analyses, phonological distance appears to be a better predictor of intelligibility scores than SR surprisal.
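
The correlation analysis described above can be sketched as follows, with invented per-stimulus numbers in place of the experimental data:

import numpy as np
from scipy import stats

intelligibility = np.array([0.92, 0.75, 0.61, 0.40, 0.33])  # proportion correct per stimulus
phon_distance   = np.array([0.10, 0.25, 0.38, 0.55, 0.62])  # normalized phonological distance
sr_surprisal    = np.array([2.1, 3.0, 2.7, 4.2, 3.9])       # surprisal from a speech model

r_dist, p_dist = stats.pearsonr(intelligibility, phon_distance)
r_surp, p_surp = stats.pearsonr(intelligibility, sr_surprisal)
print(f"distance: r={r_dist:.2f} (p={p_dist:.3f}); surprisal: r={r_surp:.2f} (p={p_surp:.3f})")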

@inproceedings{Zaitova/etal:2024a,
title = {Cross-Linguistic Intelligibility of Non-Compositional Expressions in Spoken Context},
author = {Iuliia Zaitova and Irina Stenger and Wei Xue and Tania Avgustinova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://www.isca-archive.org/interspeech_2024/zaitova24_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2024-416},
year = {2024},
date = {2024},
booktitle = {Proceedings of Interspeech 2024},
pages = {4189-4193},
publisher = {ISCA},
address = {Kos, Greece},
abstract = {This study investigates intelligibility of non-compositional expressions in spoken context for five closely related Slavic languages (Belarusian, Bulgarian, Czech, Polish, and Ukrainian) by native Russian speakers. Our investigation employs a web-based experiment involving free-response and multiple-choice translation tasks. Drawing on prior research, two factors were examined: (1) linguistic similarities (orthographic and phonological distances), and (2) surprisal scores obtained from two multilingual speech representation (SR) models fine-tuned for Russian (Wav2Vec2-Large-Ru-Golos-With-LM and Whisper Medium Russian). According to the results of Pearson correlation and regression analyses, phonological distance appears to be a better predictor of intelligibility scores than SR surprisal.},
pubstate = {published},
type = {inproceedings}
}


Project:   C4

Zaitova, Iuliia; Stenger, Irina; Avgustinova, Tania

Cross-Linguistic Processing of Non-Compositional Expressions in Slavic Languages Inproceedings

Zock, Michael; Chersoni, Emmanuele; Hsu, Yu-Yin; de Deyne, Simon (Ed.): Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024, ELRA and ICCL, pp. 86-97, Torino, Italia, 2024.

This study focuses on evaluating and predicting the intelligibility of non-compositional expressions within the context of five closely related Slavic languages: Belarusian, Bulgarian, Czech, Polish, and Ukrainian, as perceived by native speakers of Russian. Our investigation employs a web-based experiment where native Russian respondents take part in free-response and multiple-choice translation tasks. Based on the previous studies in mutual intelligibility and non-compositionality, we propose two predictive factors for reading comprehension of unknown but closely related languages: 1) linguistic distances, which include orthographic and phonological distances; 2) surprisal scores obtained from monolingual Language Models (LMs). Our primary objective is to explore the relationship of these two factors with the intelligibility scores and response times of our web-based experiment. Our findings reveal that, while intelligibility scores from the experimental tasks exhibit a stronger correlation with phonological distances, LM surprisal scores appear to be better predictors of the time participants invest in completing the translation tasks.
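
One of the predictors above, orthographic distance, is commonly operationalized as a length-normalized Levenshtein distance; the sketch below illustrates that idea only (the distances in the paper may be computed differently, e.g. over phonetic transcriptions or with weighted operations).

def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance (insertions, deletions, substitutions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalized_distance(a: str, b: str) -> float:
    return levenshtein(a, b) / max(len(a), len(b), 1)

print(normalized_distance("mleko", "moloko"))  # Polish vs. Russian 'milk', transliterated: 0.33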

@inproceedings{zaitova-etal-2024-cross,
title = {Cross-Linguistic Processing of Non-Compositional Expressions in Slavic Languages},
author = {Iuliia Zaitova and Irina Stenger and Tania Avgustinova},
editor = {Michael Zock and Emmanuele Chersoni and Yu-Yin Hsu and Simon de Deyne},
url = {https://aclanthology.org/2024.cogalex-1.10/},
year = {2024},
date = {2024},
booktitle = {Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024},
pages = {86-97},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {This study focuses on evaluating and predicting the intelligibility of non-compositional expressions within the context of five closely related Slavic languages: Belarusian, Bulgarian, Czech, Polish, and Ukrainian, as perceived by native speakers of Russian. Our investigation employs a web-based experiment where native Russian respondents take part in free-response and multiple-choice translation tasks. Based on the previous studies in mutual intelligibility and non-compositionality, we propose two predictive factors for reading comprehension of unknown but closely related languages: 1) linguistic distances, which include orthographic and phonological distances; 2) surprisal scores obtained from monolingual Language Models (LMs). Our primary objective is to explore the relationship of these two factors with the intelligibility scores and response times of our web-based experiment. Our findings reveal that, while intelligibility scores from the experimental tasks exhibit a stronger correlation with phonological distances, LM surprisal scores appear to be better predictors of the time participants invest in completing the translation tasks.},
pubstate = {published},
type = {inproceedings}
}


Project:   C4

Xue, Wei; Yuen, Ivan; Möbius, Bernd

Towards a better understanding of receptive multilingualism: listening conditions and priming effects Inproceedings

Proceedings of Interspeech 2024, ISCA, pp. 12-16, Kos, Greece, 2024.

Receptive multilingualism is a form of communication where speakers can comprehend an utterance of a foreign language (Lx) using their native language (L1) when L1 and Lx share similarities in, e.g., vocabulary and pronunciation. The success of receptive multilingualism can be tested by examining accuracy and reaction time of auditory word recognition (AWR) of target words in lexical decision tasks. AWR in such tasks can be affected by adverse listening conditions due to environmental noises and by the presence of a preceding prime word. This study explores whether AWR of L1 in Lx-L1 pairs (Lx = Dutch; L1 = German or English) will be affected by different degrees of similarities in their phonology and semantics and whether such an influence will differ as a function of listening condition. We observed less accurate and slower responses without semantic similarity but a null effect on accuracy without phonological overlap. The interaction with listening conditions is language-dependent.

@inproceedings{Xue/etal:2024a,
title = {Towards a better understanding of receptive multilingualism: listening conditions and priming effects},
author = {Wei Xue and Ivan Yuen and Bernd M{\"o}bius},
url = {https://www.isca-archive.org/interspeech_2024/xue24_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2024-418},
year = {2024},
date = {2024},
booktitle = {Proceedings of Interspeech 2024},
pages = {12-16},
publisher = {ISCA},
address = {Kos, Greece},
abstract = {Receptive multilingualism is a form of communication where speakers can comprehend an utterance of a foreign language (Lx) using their native language (L1) when L1 and Lx share similarities in, e.g., vocabulary and pronunciation. The success of receptive multilingualism can be tested by examining accuracy and reaction time of auditory word recognition (AWR) of target words in lexical decision tasks. AWR in such tasks can be affected by adverse listening conditions due to environmental noises and by the presence of a preceding prime word. This study explores whether AWR of L1 in Lx-L1 pairs (Lx = Dutch; L1 = German or English) will be affected by different degrees of similarities in their phonology and semantics and whether such an influence will differ as a function of listening condition. We observed less accurate and slower responses without semantic similarity but a null effect on accuracy without phonological overlap. The interaction with listening conditions is language-dependent.},
pubstate = {published},
type = {inproceedings}
}


Projects:   C1 C4

Bourgonje, Peter; Demberg, Vera

Generalizing across Languages and Domains for Discourse Relation Classification Inproceedings

Kawahara, Tatsuya; Demberg, Vera; Ultes, Stefan; Inoue, Koji; Mehri, Shikib; Howcroft, David; Komatani, Kazunori (Ed.): Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Association for Computational Linguistics, pp. 554-565, Kyoto, Japan, 2024.

The availability of corpora annotated for discourse relations is limited and discourse relation classification performance varies greatly depending on both language and domain. This is a problem for downstream applications that are intended for a language (i.e., not English) or a domain (i.e., not financial news) with comparatively low coverage for discourse annotations. In this paper, we experiment with a state-of-the-art model for discourse relation classification, originally developed for English, extend it to a multi-lingual setting (testing on Italian, Portuguese and Turkish), and employ a simple, yet effective method to mark out-of-domain training instances. By doing so, we aim to contribute to better generalization and more robust discourse relation classification performance across both language and domain.
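
The marking of out-of-domain training instances can be illustrated with the toy sketch below; whether the paper uses exactly this kind of prefix token is an assumption made here for illustration, and the example arguments are invented.

IN_DOMAIN = "finance_news"  # placeholder name for the in-domain portion of the training data

def mark_instance(arg1: str, arg2: str, domain: str) -> str:
    # Prepend a marker to instances that do not come from the target domain,
    # so the classifier can learn to treat them differently.
    prefix = "" if domain == IN_DOMAIN else "[OOD] "
    return f"{prefix}{arg1} [SEP] {arg2}"

print(mark_instance("The market fell.", "Investors had expected rate cuts.", "finance_news"))
print(mark_instance("Mia prese il libro.", "Voleva leggerlo subito.", "novel_it"))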

@inproceedings{bourgonje-demberg-2024-generalizing,
title = {Generalizing across Languages and Domains for Discourse Relation Classification},
author = {Peter Bourgonje and Vera Demberg},
editor = {Tatsuya Kawahara and Vera Demberg and Stefan Ultes and Koji Inoue and Shikib Mehri and David Howcroft and Kazunori Komatani},
url = {https://aclanthology.org/2024.sigdial-1.47/},
doi = {https://doi.org/10.18653/v1/2024.sigdial-1.47},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue},
pages = {554-565},
publisher = {Association for Computational Linguistics},
address = {Kyoto, Japan},
abstract = {The availability of corpora annotated for discourse relations is limited and discourse relation classification performance varies greatly depending on both language and domain. This is a problem for downstream applications that are intended for a language (i.e., not English) or a domain (i.e., not financial news) with comparatively low coverage for discourse annotations. In this paper, we experiment with a state-of-the-art model for discourse relation classification, originally developed for English, extend it to a multi-lingual setting (testing on Italian, Portuguese and Turkish), and employ a simple, yet effective method to mark out-of-domain training instances. By doing so, we aim to contribute to better generalization and more robust discourse relation classification performance across both language and domain.},
pubstate = {published},
type = {inproceedings}
}


Project:   B2

Sommerfeld, Linda

Predictive language processing in the complex visual world in children and adults PhD Thesis

Saarländische Universitäts- und Landesbibliothek, Saarland University, Saarbruecken, Germany, 2024.

Given the sentence “On sunny days, Rose rides to work with her …”, it is likely that you predict the word “bicycle” before reading it. Notably, not only adults, but even children from an early age predict language, which is seen as one reason of why language comprehension is remarkably fast and accurate. In everyday-life language is usually received in visual contexts which can influence what a comprehender predicts. Imagine processing the above sentence while looking at the picture of a bicycle. This could make you even more likely to predict the noun “bicycle”. Thus, prediction research often applies the Visual World Paradigm. Here, participants listen to predictable sentences like the above while looking at visual scenes that show one visual prediction option that is (e.g., bicycle) and one distractor object that is not (e.g., cake) consistent with the predictive sentence context. When participants show an increase in fixations to the visual prediction option after the predictive cue was played (e.g., “ride”), but prior to the target noun, this indexes prediction. Cognitive models argue that visually situated prediction involves two mechanisms. Predictive linguistic cues (e.g., the semantically constraining verb “ride”) cause the pre-activation of the mental representations of prediction options such as “bicycle” in long-term memory. If a visual context allows to commit to a prediction option, this option is pre-updated (i.e., pre-processed) in working memory. Given this, individual differences in verbal and cognitive abilities could influence visually situated prediction. That is, language experience could determine which long-term memory representations can be pre-activated, while working memory capacity could affect the ability to pre-update prediction options. Since children have smaller language experience and working memory capacity than adults, we used a developmental approach and compared children and adults in their prediction behavior in the visual world to test the above model assumptions. First, we compared children and adults in their ability to make multiple predictions in parallel. With the Visual World Paradigm, adults have already been shown to rely on visual contexts to make multiple predictions: When hearing the sentence (“Rose rides to work with her …”) while looking at multiple “ridable” objects, adults have been shown to predict up to four sentence continuations in parallel. We examined whether also children can follow a multiple predictions pattern, or whether their limited language experience and cognitive capacity prevent them from doing so. Besides, since working memory engages more mental resources when more stimuli are processed, we examined whether children and adults show an increase in cognitive load to pre-update multiple versus only single prediction options in working memory. We examined whether this effect is more prominent in children given their smaller cognitive capacity. We finally investigated whether processing load of a predictable target word (e.g., “bicycle”) is smaller when that word was pre-updated alone or among multiple competitors. In Chapter 1 we outline the theoretical background of this work. This is followed by an empirical section that addresses the above questions. We conducted two studies in which children and adults were presented with sentences with semantically constraining verbs and predictable target nouns (e.g., “The father eats the waffle”) in visual scenes of four object pictures each. 
Across four conditions, the scenes varied in predictability: Either 0, 1, 3, or 4 visual objects were consistent with the verb constraints and thus viewed as visual prediction options. Chapter 2 shows a pretest of the sentences and the scenes with young children (4–6 years). Experiment 1 was an eye-tracking study in which children (5–6) and adults listened to the sentences while looking at the visual scenes. In Chapter 3, we used their anticipatory object fixations as an index of prediction behavior. Chapter 4 presents data collected in the same study. Here, the Index of Cognitive Activity (ICA) and pupil sizes were used as a measure of cognitive load engaged in sentence processing in the different visual conditions. Chapter 5 presents Experiment 2, where literate children (8–12 years) and adults were presented with the same sentences and scenes in a self-paced reading task. They read the sentences word-by-word while inspecting the scenes. We relied on word processing times as an index of cognitive load. Their anticipatory object fixations (Experiment 1) showed that children and adults followed a multiple predictions pattern. For children, this ability was positively related to their language experience, supporting the view that prediction involves the pre-activation of mental representations in long-term memory. We found no consistent evidence of whether children and adults engaged higher cognitive load to make multiple predictions. Both age groups’ ICA and pupil size values did not (Experiment 1) but their word processing times did (Experiment 2) suggest additional processing costs for multiple predictions. The latter result is in line with the view that prediction involves the pre-updating of input in the cognitive system. Finally, both studies found children and adults to engage less processing load for target nouns that could be pre-updated alone versus among multiple competitors. In sum, we provide indication that visual contexts can influence the ease of (predictive) language processing, which is discussed beyond cognitive perspectives of prediction in Chapter 6. Here, we also consider which questions about predictive language processing still remain open, in particular for children.


Stellen Sie sich folgenden Satz vor: „Um an sonnigen Tagen zur Arbeit zu kommen, fährt Rosa mit ihrem …“. Vermutlich haben Sie das Wort „Fahrrad“ antizipiert, ohne es gelesen zu haben. Dies wird prädiktive Sprachverarbeitung genannt und als ein Grund für die enorme Genauigkeit und Geschwindigkeit des Sprachverständnisses gesehen. Bemerkenswerterweise weisen nicht nur Erwachsene, sondern auch Kinder die Fähigkeit zur sprachlichen Vorhersage auf. Im Alltag wird Sprache oft in visuellen Kontexten rezipiert, welche die Vorhersage beeinflussen. Stellen Sie sich vor, Sie hören obigen Satz, während Sie das Bild eines Fahrrades betrachten. Dies könnte die Wahrscheinlichkeit erhöhen, dass Sie das Wort „Fahrrad“ vorhersagen. Empirische Studien zur sprachlichen Vorhersage nutzen daher häufig das Visual World Paradigma. Hier hören Versuchspersonen vorhersagbare Sätze, wie den obigen, während sie visuelle Szenen betrachten. Diese zeigen typischerweise eine visuelle Vorhersageoption (z.B. das Bild eines Fahrrades) und ein weiteres Objekt, das inkonsistent mit dem prädiktiven Satzkontext ist (z.B. das Bild eines Kuchens). Dieses Paradigma weist sprachliche Vorhersage nach, wenn Versuchspersonen bereits nach dem prädiktiven Hinweisreiz (z.B. „fahren“) und vor dem Zielwort (z.B. „Fahrrad“) einen Anstieg an Fixationen der visuellen Vorhersageoption im Vergleich zum inkonsistenten Objekt zeigen. Kognitive Modelle postulieren, dass zwei Mechanismen an der Vorhersage im visuellen Kontext beteiligt sind. Prädiktive sprachliche Hinweisreize (z.B. das Verb „fahren“) erwirken die Voraktivierung von Vorhersageoptionen (z.B. Fortbewegungsmitteln) im Langzeitgedächtnis. Wenn zudem eine visuelle Vorhersageoption verfügbar ist (z.B. das Bild eines Fahrrades), wird diese Option im Arbeitsgedächtnis vorverarbeitet. Infolgedessen könnten verbale und kognitive Fähigkeiten die sprachliche Vorhersage im visuellen Kontext beeinflussen. So könnte die Spracherfahrung bestimmen, welche Informationen im Langzeitgedächtnis voraktiviert werden können. Die Arbeitsgedächtniskapazität hingegen könnte die Fähigkeit zur Vorverarbeitung von Vorhersageoptionen beeinflussen. Da Kinder im Vergleich zu Erwachsenen über eine geringere Spracherfahrung sowie Kapazität des Arbeitsgedächtnisses verfügen, nutzte diese Arbeit einen entwicklungspsychologischen Ansatz, um obige Annahmen zur sprachlichen Vorhersage zu prüfen. Zunächst wurden Kinder und Erwachsene in ihrer Fähigkeit verglichen, mehrere Vorhersagen gleichzeitig zu treffen. Mit dem Visual World Paradigma wurde bereits gezeigt, dass Erwachsene visuelle Kontexte nutzen, um mehrere Vorhersagen zu treffen: Erwachsene, die obigen Beispielsatz hören und gleichzeitig mehrere „fahrbare“ Objekte betrachten, konnten nachweislich bis zu vier potentielle Zielwörter gleichzeitig vorhersagen. Diese Arbeit untersucht, ob auch Kinder mehrere Vorhersagen gleichzeitig treffen oder ob ihre geringe Spracherfahrung und kognitive Kapazität ein solches Muster der Vorhersage einschränken. Weiterhin wird geprüft, ob Kinder und Erwachsene eine höhere kognitive Belastung zeigen, wenn sie mehrere, statt nur einer Vorhersageoption, vorverarbeiten. Dies wäre plausibel, da das Arbeitsgedächtnis in der Regel mehr mentale Ressourcen beansprucht, wenn es mehr Informationen verarbeitet. Zudem wird untersucht, ob dieser Effekt bei Kindern aufgrund ihrer geringen kognitiven Kapazität stärker ausgeprägt ist als bei Erwachsenen.
Zuletzt wird ermittelt, ob mehr mentale Ressourcen zur Verarbeitung eines Zielwortes benötigt werden, wenn dieses Wort mit weiteren Vorhersageoptionen (statt als einzige Option) vorverarbeitet wurde. Kapitel 1 präsentiert den theoretischen Hintergrund dieser Arbeit. Es folgt ein empirischer Teil, in dem obige Fragen adressiert werden. Dieser umfasst zwei Studien, in denen Kindern und Erwachsenen Sätze mit prädiktiven Verben und Zielwörtern gezeigt wurden (z.B. „Der Vater isst die Waffel“). Die Sätze wurden zusammen mit visuellen Szenen präsentiert, die jeweils vier Bilder von Objekten zeigten. Die Szenen variierten in ihrer Vorhersagbarkeit: Basierend auf dem prädiktiven Verb stellten 0, 1, 3 oder 4 der Objekte eine visuelle Vorhersageoption dar. Kapitel 2 zeigt eine Studie, in der die Sätze und Szenen mit Kindern (4–6 Jahre) normiert wurden. Experiment 1 war eine Eye-Tracking Studie, in der Kinder (5–6 Jahre) und Erwachsene die Szenen betrachteten, während ihnen die Sätze vorgespielt wurden. In Kapitel 3 wurden die Objektfixationen der Versuchspersonen als Index für das Vorhersageverhalten verwendet. Kapitel 4 präsentiert Daten, die in derselben Studie erhoben wurden. Hier wurde die Pupillengröße sowie der Index of Cognitive Activity (ICA) als Maß für die kognitive Belastung der Satzverarbeitung in den verschiedenen visuellen Konditionen verwendet. Kapitel 5 präsentiert Experiment 2. Hier wurden Kindern (8–12 Jahre) und Erwachsenen dieselben Sätze und Szenen präsentiert, jedoch wurden die Sätze auf dem Bildschirm innerhalb der Szenen gezeigt und Wort für Wort gelesen. Die Wortverarbeitungszeit wurde als Maß für die kognitive Belastung gewertet. Anhand der Objektfixationen zeigte Experiment 1, dass beide Altersgruppen mehrere Vorhersagen gleichzeitig trafen. Bei Kindern stand diese Fähigkeit in positiver Relation zu ihrer Spracherfahrung. Wir fanden keine konsistente Evidenz, dass Kinder und Erwachsene eine höhere kognitive Belastung zeigen, wenn sie mehrere Vorhersagen gleichzeitig treffen. Dieser Effekt wurde durch die Wortverarbeitungszeiten beider Altersgruppen nachgewiesen (Experiment 2), nicht jedoch durch ihre Pupillengrößen und ICA-Daten (Experiment 1). In beiden Studien zeigten Kinder und Erwachsene eine höhere kognitive Belastung bei der Verarbeitung von Zielwörtern, die mit mehreren Vorhersageoptionen (statt als einzige Option) antizipiert wurden. Insgesamt zeigen die Ergebnisse dieser Arbeit, dass visuelle Kontexte einen Einfluss auf die prädiktive Sprachverarbeitung und ihre Leichtigkeit haben können. Dies wird in Kapitel 6 vor dem Hintergrund kognitiver Modelle der Vorhersage diskutiert. Hier werden zudem offene Fragen zur sprachlichen Vorhersage, insbesondere bei Kindern, thematisiert.
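The anticipatory-fixation index described in the abstract can be illustrated with a small aggregation sketch. Everything below is an assumption made for illustration only: the file name, column names, the AOI label "prediction_option", and the analysis window are hypothetical and do not reproduce the preprocessing used in the thesis.

# Hypothetical sketch of how anticipatory fixations could be aggregated;
# column names, AOI labels, and the time window are illustrative assumptions.
import pandas as pd

# fixations.csv: one row per eye-tracking sample with columns
#   participant, trial, condition, time_ms (relative to noun onset), aoi
samples = pd.read_csv("fixations.csv")

# Anticipatory window: after the verb has been heard, before the target noun starts.
window = samples[(samples["time_ms"] >= -800) & (samples["time_ms"] < 0)]

# Proportion of samples on any visual prediction option, per participant and condition.
window = window.assign(on_option=window["aoi"].eq("prediction_option").astype(int))
proportions = (window.groupby(["participant", "condition"])["on_option"]
                     .mean()
                     .rename("fixation_proportion")
                     .reset_index())
print(proportions.head())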

@phdthesis{Sommerfeld_Diss,
title = {Predictive language processing in the complex visual world in children and adults},
author = {Linda Sommerfeld},
url = {https://jahrbib.sulb.uni-saarland.de/handle/20.500.11880/37808},
doi = {https://doi.org/10.22028/D291-42078},
year = {2024},
date = {2024},
school = {Saarland University},
publisher = {Saarl{\"a}ndische Universit{\"a}ts- und Landesbibliothek},
address = {Saarbruecken, Germany},
abstract = {Given the sentence “On sunny days, Rose rides to work with her …”, it is likely that you predict the word “bicycle” before reading it. Notably, not only adults, but even children from an early age predict language, which is seen as one reason of why language comprehension is remarkably fast and accurate. In everyday-life language is usually received in visual contexts which can influence what a comprehender predicts. Imagine processing the above sentence while looking at the picture of a bicycle. This could make you even more likely to predict the noun “bicycle”. Thus, prediction research often applies the Visual World Paradigm. Here, participants listen to predictable sentences like the above while looking at visual scenes that show one visual prediction option that is (e.g., bicycle) and one distractor object that is not (e.g., cake) consistent with the predictive sentence context. When participants show an increase in fixations to the visual prediction option after the predictive cue was played (e.g., “ride”), but prior to the target noun, this indexes prediction. Cognitive models argue that visually situated prediction involves two mechanisms. Predictive linguistic cues (e.g., the semantically constraining verb “ride”) cause the pre-activation of the mental representations of prediction options such as “bicycle” in long-term memory. If a visual context allows to commit to a prediction option, this option is pre-updated (i.e., pre-processed) in working memory. Given this, individual differences in verbal and cognitive abilities could influence visually situated prediction. That is, language experience could determine which long-term memory representations can be pre-activated, while working memory capacity could affect the ability to pre-update prediction options. Since children have smaller language experience and working memory capacity than adults, we used a developmental approach and compared children and adults in their prediction behavior in the visual world to test the above model assumptions. First, we compared children and adults in their ability to make multiple predictions in parallel. With the Visual World Paradigm, adults have already been shown to rely on visual contexts to make multiple predictions: When hearing the sentence (“Rose rides to work with her …”) while looking at multiple “ridable” objects, adults have been shown to predict up to four sentence continuations in parallel. We examined whether also children can follow a multiple predictions pattern, or whether their limited language experience and cognitive capacity prevent them from doing so. Besides, since working memory engages more mental resources when more stimuli are processed, we examined whether children and adults show an increase in cognitive load to pre-update multiple versus only single prediction options in working memory. We examined whether this effect is more prominent in children given their smaller cognitive capacity. We finally investigated whether processing load of a predictable target word (e.g., “bicycle”) is smaller when that word was pre-updated alone or among multiple competitors. In Chapter 1 we outline the theoretical background of this work. This is followed by an empirical section that addresses the above questions. We conducted two studies in which children and adults were presented with sentences with semantically constraining verbs and predictable target nouns (e.g., “The father eats the waffle”) in visual scenes of four object pictures each. 
Across four conditions, the scenes varied in predictability: Either 0, 1, 3, or 4 visual objects were consistent with the verb constraints and thus viewed as visual prediction options. Chapter 2 shows a pretest of the sentences and the scenes with young children (4–6 years). Experiment 1 was an eye-tracking study in which children (5–6) and adults listened to the sentences while looking at the visual scenes. In Chapter 3, we used their anticipatory object fixations as an index of prediction behavior. Chapter 4 presents data collected in the same study. Here, the Index of Cognitive Activity (ICA) and pupil sizes were used as a measure of cognitive load engaged in sentence processing in the different visual conditions. Chapter 5 presents Experiment 2, where literate children (8–12 years) and adults were presented with the same sentences and scenes in a self-paced reading task. They read the sentences word-by-word while inspecting the scenes. We relied on word processing times as an index of cognitive load. Their anticipatory object fixations (Experiment 1) showed that children and adults followed a multiple predictions pattern. For children, this ability was positively related to their language experience, supporting the view that prediction involves the pre-activation of mental representations in long-term memory. We found no consistent evidence of whether children and adults engaged higher cognitive load to make multiple predictions. Both age groups’ ICA and pupil size values did not (Experiment 1) but their word processing times did (Experiment 2) suggest additional processing costs for multiple predictions. The latter result is in line with the view that prediction involves the pre-updating of input in the cognitive system. Finally, both studies found children and adults to engage less processing load for target nouns that could be pre-updated alone versus among multiple competitors. In sum, we provide indication that visual contexts can influence the ease of (predictive) language processing, which is discussed beyond cognitive perspectives of prediction in Chapter 6. Here, we also consider which questions about predictive language processing still remain open, in particular for children.


Stellen Sie sich folgenden Satz vor: „Um an sonnigen Tagen zur Arbeit zu kommen, f{\"a}hrt Rosa mit ihrem ...“. Vermutlich haben Sie das Wort „Fahrrad“ antizipiert, ohne es gelesen zu haben. Dies wird pr{\"a}diktive Sprachverarbeitung genannt und als ein Grund f{\"u}r die enorme Genauigkeit und Geschwindigkeit des Sprachverst{\"a}ndnisses gesehen. Bemerkenswerterweise weisen nicht nur Erwachsene, sondern auch Kinder, die F{\"a}higkeit zur sprachlichen Vorhersage auf. Im Alltag wird Sprache oft in visuellen Kontexten rezipiert, welche die Vorhersage beeinflussen. Stellen Sie sich vor, Sie h{\"o}ren obigen Satz, w{\"a}hrend Sie das Bild eines Fahrrades betrachten. Dies k{\"o}nnte die Wahrscheinlichkeit erh{\"o}hen, dass Sie das Wort „Fahrrad“ vorhersagen. Empirische Studien zur sprachlichen Vorhersage nutzen daher h{\"a}ufig das Visual World Paradigma. Hier h{\"o}ren Versuchspersonen vorhersagbare S{\"a}tze, wie den obigen, w{\"a}hrend sie visuelle Szenen betrachten. Diese zeigen typischerweise eine visuelle Vorhersageoption (z.B. das Bild eines Fahrrades) und ein weiteres Objekt, das inkonsistent mit dem pr{\"a}diktiven Satzkontext ist (z.B. das Bild eines Kuchens). Dieses Paradigma weist sprachliche Vorhersage nach, wenn Versuchspersonen bereits nach dem pr{\"a}diktive Hinweisreiz (z.B. „fahren“) und vor dem Zielwort (z.B. „Fahrrad“) einen Anstieg an Fixationen der visuellen Vorhersageoption im Vergleich zum inkonsistenten Objekt zeigen. Kognitive Modelle postulieren, dass zwei Mechanismen an der Vorhersage im visuellen Kontext beteiligt sind. Pr{\"a}diktive sprachliche Hinweisreize (z.B. das Verb „fahren“) erwirken die Voraktivierung von Vorhersageoptionen (z.B. Fortbewegungsmitteln) im Langzeitged{\"a}chtnis. Wenn zudem eine visuelle Vorhersageoption verf{\"u}gbar ist (z.B. das Bild eines Fahrrades), wird diese Option im Arbeitsged{\"a}chtnis vorverarbeitet. Infolgedessen k{\"o}nnten verbale und kognitive F{\"a}higkeiten die sprachliche Vorhersage im visuellen Kontext beeinflussen. So k{\"o}nnte die Spracherfahrung bestimmen, welche Informationen im Langzeitged{\"a}chtnis voraktiviert werden k{\"o}nnen. Die Arbeitsged{\"a}chtniskapazit{\"a}t hingegen k{\"o}nnte die F{\"a}higkeit zur Vorverarbeitung von Vorhersageoptionen beeinflussen. Da Kinder im Vergleich zu Erwachsenen {\"u}ber eine geringere Spracherfahrung sowie Kapazit{\"a}t des Arbeitsged{\"a}chtnisses verf{\"u}gen, nutzte diese Arbeit einen entwicklungspsychologischen Ansatz, um obige Annahmen zur sprachlichen Vorhersage zu pr{\"u}fen. Zun{\"a}chst wurden Kinder und Erwachsene in ihrer F{\"a}higkeit verglichen, mehrere Vorhersagen gleichzeitig zu treffen. Mit dem Visual World Paradigma wurde bereits gezeigt, dass Erwachsene visuelle Kontexte nutzen, um mehrere Vorhersagen zu treffen: Erwachsene, die obigen Beispielsatz h{\"o}ren und gleichzeitig mehrere „fahrbare“ Objekte betrachten, konnten nachweislich bis zu vier potentielle Zielw{\"o}rter gleichzeitig vorhersagen. Diese Arbeit untersucht, ob auch Kinder mehrere Vorhersagen gleichzeig treffen oder ob ihre geringe Spracherfahrung und kognitive Kapazit{\"a}t ein solches Muster der Vorhersage einschr{\"a}nken. Weiterhin wird gepr{\"u}ft, ob Kinder und Erwachsene eine h{\"o}here kognitive Belastung zeigen, wenn sie mehrere, statt nur einer Vorhersageoption, vorverarbeiten. Dies w{\"a}re plausibel, da das Arbeitsged{\"a}chtnis in der Regel mehr mentale Ressourcen beansprucht, wenn es mehr Informationen verarbeitet. 
Zudem wird untersucht, ob dieser Effekt bei Kindern aufgrund ihrer geringen kognitiven Kapazit{\"a}t st{\"a}rker ausgepr{\"a}gt ist als bei Erwachsenen. Zuletzt wird ermittelt, ob mehr mentale Ressourcen zur Verarbeitung eines Zielwortes ben{\"o}tigt werden, wenn dieses Wort mit weiteren Vorhersageoptionen (statt als einzige Option) vorverarbeitet wurde. Kapitel 1 pr{\"a}sentiert den theoretischen Hintergrund dieser Arbeit. Es folgt ein empirischer Teil, in dem obige Fragen adressiert werden. Dieser umfasst zwei Studien, in denen Kindern und Erwachsenen S{\"a}tze mit pr{\"a}diktiven Verben und Zielw{\"o}rtern gezeigt wurden (z.B. „Der Vater isst die Waffel“). Die S{\"a}tze wurden zusammen mit visuellen Szenen pr{\"a}sentiert, die jeweils vier Bilder von Objekten zeigten. Die Szenen variierten in ihrer Vorhersagbarkeit: Basierend auf dem pr{\"a}diktiven Verb stellten 0, 1, 3 oder 4 der Objekte eine visuelle Vorhersageoption dar. Kapitel 2 zeigt eine Studie, in der die S{\"a}tze und Szenen mit Kindern (4–6 Jahre) normiert wurden. Experiment 1 war eine Eye-Tracking Studie, in der Kinder (5–6 Jahre) und Erwachsene die Szenen betrachteten, w{\"a}hrend ihnen die S{\"a}tze vorgespielt wurden. In Kapitel 3 wurden die Objektfixationen der Versuchspersonen als Index f{\"u}r das Vorhersageverhalten verwendet. Kapitel 4 pr{\"a}sentiert Daten, die in derselben Studie erhoben wurden. Hier wurde die Pupillengr{\"o}{\ss}e sowie der Index of Cognitive Activity (ICA) als Ma{\ss} f{\"u}r die kognitive Belastung der Satzverarbeitung in den verschiedenen visuellen Konditionen verwendet. Kapitel 5 pr{\"a}sentiert Experiment 2. Hier wurden Kindern (8–12 Jahre) und Erwachsenen dieselben S{\"a}tze und Szenen pr{\"a}sentiert, jedoch wurden die S{\"a}tze auf dem Bildschirm innerhalb der Szenen gezeigt und Wort f{\"u}r Wort gelesen. Die Wortverarbeitungszeit wurde als Ma{\ss} f{\"u}r die kognitive Belastung gewertet. Anhand der Objektfixationen zeigte Experiment 1, dass beide Altersgruppen mehrere Vorhersagen gleichzeitig trafen. Bei Kindern stand diese F{\"a}higkeit in positiver Relation zu ihrer Spracherfahrung. Wir fanden keine konsistente Evidenz, dass Kinder und Erwachsene eine h{\"o}here kognitive Belastung zeigen, wenn sie mehrere Vorhersagen gleichzeitig treffen. Dieser Effekt wurde durch die Wortverarbeitungszeiten beider Altersgruppen nachgewiesen (Experiment 2), nicht jedoch durch ihre Pupillengr{\"o}{\ss}en und ICA-Daten (Experiment 1). In beiden Studien zeigten Kinder und Erwachsene eine h{\"o}here kognitive Belastung bei der Verarbeitung von Zielw{\"o}rtern, die mit mehreren Vorhersageoptionen (statt als einzige Option) antizipiert wurden. Insgesamt zeigen die Ergebnisse dieser Arbeit, dass visuelle Kontexte einen Einfluss auf die pr{\"a}diktive Sprachverarbeitung und ihre Leichtigkeit haben k{\"o}nnen. Dies wird in Kapitel 6 vor dem Hintergrund kognitiver Modelle der Vorhersage diskutiert. Hier werden zudem offene Fragen zur sprachlichen Vorhersage, insbesondere bei Kindern, thematisiert.},
pubstate = {published},
type = {phdthesis}
}


Project:   A5

Kunilovskaya, Maria; Dutta Chowdhury, Koel; Przybyl, Heike; España-Bonet, Cristina; van Genabith, Josef

Mitigating Translationese with GPT-4: Strategies and Performance Inproceedings

Proceedings of the 25th Annual Conference of the European Association for Machine Translation, 1, European Association for Machine Translation, pp. 411–430, 2024.

Translations differ in systematic ways from texts originally authored in the same language. These differences, collectively known as translationese, can pose challenges in cross-lingual natural language processing: models trained or tested on translated input might struggle when presented with non-translated language. Translationese mitigation can alleviate this problem. This study investigates the generative capacities of GPT-4 to reduce translationese in human-translated texts. The task is framed as a rewriting process aimed at producing modified translations indistinguishable from the original text in the target language. Our focus is on prompt engineering that tests the utility of linguistic knowledge as part of the instruction for GPT-4. Through a series of prompt design experiments, we show that GPT-4-generated revisions are more similar to originals in the target language when the prompts incorporate specific linguistic instructions instead of relying solely on the model’s internal knowledge. Furthermore, we release the segment-aligned bidirectional German–English data built from the Europarl corpus that underpins this study.
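To make the contrast between generic and linguistically informed prompting concrete, here is a minimal sketch using the standard OpenAI chat client. The prompt wordings, the example segment, and the listed translationese features are invented for illustration and are not the prompts used in the paper.

# Illustrative sketch of two prompting regimes: a generic rewrite instruction
# vs. one that names specific translationese features. Prompt text is a
# placeholder, not the authors' prompt.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

GENERIC = "Rewrite this German translation so that it reads like original German."
LINGUISTIC = (
    "Rewrite this German translation so that it reads like original German. "
    "In particular, reduce literal (shining-through) word order, remove "
    "over-explicit connectives, and prefer shorter, more frequent lexical items."
)

def rewrite(segment: str, instruction: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": segment},
        ],
    )
    return response.choices[0].message.content

# rewrite("Er hat den Bericht gestern fertiggestellt, und zwar vollständig.", LINGUISTIC)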

@inproceedings{kunilovskaya-etal-2024-mitigating,
title = {Mitigating Translationese with GPT-4: Strategies and Performance},
author = {Maria Kunilovskaya and Koel Dutta Chowdhury and Heike Przybyl and Cristina Espa{\~n}a-Bonet and Josef van Genabith},
url = {https://eamt2024.github.io/proceedings/vol1.pdf},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation},
pages = {411–430},
publisher = {European Association for Machine Translation},
abstract = {Translations differ in systematic ways from texts originally authored in the same language. These differences, collectively known as translationese, can pose challenges in cross-lingual natural language processing: models trained or tested on translated input might struggle when presented with non-translated language.Translationese mitigation can alleviate this problem. This study investigates the generative capacities of GPT-4 to reduce translationese in human-translated texts. The task is framed as a rewriting process aimed at modified translations indistinguishable from the original text in the target language. Our focus is on prompt engineering that tests the utility of linguistic knowledge as part of the instruction for GPT-4. Through a series of prompt design experiments, we show that GPT4-generated revisions are more similar to originals in the target language when the prompts incorporate specific linguistic instructions instead of relying solely on the model’s internal knowledge. Furthermore, we release the segment-aligned bidirectional German–English data built from the Europarl corpus that underpins this study.},
pubstate = {published},
type = {inproceedings}
}


Projects:   B6 B7

Bafna, Niyati; España-Bonet, Cristina; van Genabith, Josef; Sagot, Benoît; Bawden, Rachel

When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages Inproceedings

Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.): Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, pp. 17544-17556, Torino, Italia, 2024.

Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good quality static or contextual embeddings requiring large monolingual corpora for both languages. However, unsupervised BLI is most likely to be useful for low-resource languages (LRLs), where large datasets are not available. Often we are interested in building bilingual resources for LRLs against related high-resource languages (HRLs), resulting in severely imbalanced data settings for BLI. We first show that state-of-the-art BLI methods in the literature exhibit near-zero performance for severely data-imbalanced language pairs, indicating that these settings require more robust techniques. We then present a new method for unsupervised BLI between a related LRL and HRL that only requires inference on a masked language model of the HRL, and demonstrate its effectiveness on truly low-resource languages Bhojpuri and Magahi (with <5M monolingual tokens each), against Hindi. We further present experiments on (mid-resource) Marathi and Nepali to compare approach performances by resource range, and release our resulting lexicons for five low-resource Indic languages: Bhojpuri, Magahi, Awadhi, Braj, and Maithili, against Hindi.
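The core idea, querying a masked language model of the high-resource language with low-resource input, can be sketched with the Hugging Face fill-mask pipeline. This is only a rough illustration of the general mechanism, not the authors' pipeline: the model name is a multilingual stand-in, and the masked example sentence and helper function are hypothetical.

# Minimal sketch: ask an HRL (here: multilingual) masked LM which HRL tokens
# fit the slot of a masked LRL word. Model, sentence, and function are stand-ins.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

def hrl_candidates(lrl_sentence_with_mask: str, top_k: int = 5):
    """Return the HRL tokens the masked LM proposes for the masked LRL word."""
    predictions = unmasker(lrl_sentence_with_mask, top_k=top_k)
    return [(p["token_str"], round(p["score"], 4)) for p in predictions]

# A Bhojpuri-like sentence with the target word masked (illustrative only):
print(hrl_candidates("हम [MASK] जात बानी ।"))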

@inproceedings{bafna-etal-2024-cousin-right,
title = {When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages},
author = {Niyati Bafna and Cristina Espa{\~n}a-Bonet and Josef van Genabith and Benoît Sagot and Rachel Bawden},
editor = {Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue},
url = {https://aclanthology.org/2024.lrec-main.1526},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages = {17544-17556},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good quality static or contextual embeddings requiring large monolingual corpora for both languages. However, unsupervised BLI is most likely to be useful for low-resource languages (LRLs), where large datasets are not available. Often we are interested in building bilingual resources for LRLs against related high-resource languages (HRLs), resulting in severely imbalanced data settings for BLI. We first show that state-of-the-art BLI methods in the literature exhibit near-zero performance for severely data-imbalanced language pairs, indicating that these settings require more robust techniques. We then present a new method for unsupervised BLI between a related LRL and HRL that only requires inference on a masked language model of the HRL, and demonstrate its effectiveness on truly low-resource languages Bhojpuri and Magahi (with <5M monolingual tokens each), against Hindi. We further present experiments on (mid-resource) Marathi and Nepali to compare approach performances by resource range, and release our resulting lexicons for five low-resource Indic languages: Bhojpuri, Magahi, Awadhi, Braj, and Maithili, against Hindi.},
pubstate = {published},
type = {inproceedings}
}


Project:   B6

Manzoni-Luxenburger, Judith; Andreeva, Bistra; Zahner-Ritter, Katharina

Intonational Patterns under Time Pressure: Phonetic Strategies in Bulgarian Learners of German and English Inproceedings

Proc. Speech Prosody 2024, pp. 369-373, 2024.

Research on the second-language (L2) acquisition of intonation is a growing field, but only a few studies have so far focused on the fine phonetic detail of intonational patterns in the L2. The present study concentrates on the phonetic realization of nuclear intonation contours under time pressure, testing Bulgarian learners in their L2s German and English – two languages in which intonation contours are accommodated differently by native speakers (L1) when little sonorant material is available. In particular, nuclear falling contours (H* L-%) tend to be truncated in L1 German while they are compressed in L1 English. Here we recorded 14 Bulgarian learners in their L2s German and English (within subjects, language order counterbalanced) when producing utterances in a statement context. The target word, a surname placed at the end of the utterance, differed in the available sonorant material (disyllable vs. monosyllables with long and short vowels). Our findings showed that Bulgarian speakers primarily truncate nuclear falling movements ((L+)H* L-%) in both L2s, suggesting transfer irrespective of the target strategy. However, our data show substantial inter- and intra-individual variation, which we will discuss along with factors that might explain this variation.
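To make the truncation/compression distinction concrete: truncation shows up as a reduced f0 excursion at a similar fall rate, whereas compression preserves the excursion but realises it with a steeper fall. The sketch below computes these two descriptive measures over a nuclear region using the praat-parselmouth package; the file name and the region boundaries are placeholders, and this is not the authors' measurement script.

# Descriptive f0 measures over a hand-labelled nuclear fall (illustrative only).
import numpy as np
import parselmouth

snd = parselmouth.Sound("target_utterance.wav")
pitch = snd.to_pitch()

f0 = pitch.selected_array["frequency"]
times = pitch.xs()
voiced = f0 > 0  # parselmouth marks unvoiced frames with 0 Hz

# Restrict to the labelled nuclear fall, e.g. 1.20-1.45 s in this file.
region = voiced & (times >= 1.20) & (times <= 1.45)
f0_fall, t_fall = f0[region], times[region]

excursion_st = 12 * np.log2(f0_fall.max() / f0_fall.min())   # excursion in semitones
fall_rate = excursion_st / (t_fall[-1] - t_fall[0])           # semitones per second

print(f"excursion: {excursion_st:.1f} st, fall rate: {fall_rate:.1f} st/s")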

@inproceedings{manzoniluxenburger24_speechprosody,
title = {Intonational Patterns under Time Pressure: Phonetic Strategies in Bulgarian Learners of German and English},
author = {Judith Manzoni-Luxenburger and Bistra Andreeva and Katharina Zahner-Ritter},
url = {https://www.isca-archive.org/speechprosody_2024/manzoniluxenburger24_speechprosody.html},
doi = {https://doi.org/10.21437/SpeechProsody.2024-75},
year = {2024},
date = {2024},
booktitle = {Proc. Speech Prosody 2024},
pages = {369-373},
abstract = {Research on the second-language (L2) acquisition of intonation is a growing field but only few studies have (so far) focused on the fine phonetic detail of intonational patterns in the L2. The present study concentrates on the phonetic realization of nuclear intonation contours under time pressure, testing Bulgarian learners in their L2s German and English – two languages in which intonation contours are accommodated differently by native speakers (L1) when little sonorant material is available. In particular, nuclear falling contours (H* L-%) tend to be truncated in L1 German while they are compressed in L1 English. Here we recorded 14 Bulgarian learners in their L2s German and English (within subjects, language order counterbalanced) when producing utterances in a statement context. The target word, a surname placed at the end of the utterance, differed in the available sonorant material (disyllable vs. monosyllables with long and short vowels). Our findings showed that Bulgarian speakers primarily truncate nuclear falling movements ((L+)H* L-%) in both L2s, suggesting transfer irrespective of the target strategy. However, our data show substantial inter- and intra-individual variation which we will discuss, along with factors that might explain this variation.},
pubstate = {published},
type = {inproceedings}
}


Project:   C1

Yuen, Ivan; Andreeva, Bistra; Ibrahim, Omnia; Möbius, Bernd

Differential effects of word frequency and utterance position on the duration of tense and lax vowels in German Inproceedings

Proc. Speech Prosody 2024 (Leiden, The Netherlands), pp. 442-446, Leiden, The Netherlands, 2024.

Acoustic duration is subject to modification from multiple sources, for example, utterance position [13] and predictability such as occurrence frequency at word and syllable levels [e.g., 2, 3, 4]. A study of German radio corpus data showed that these two sources interact to modify syllable duration. On the one hand, the predictability effect can percolate downstream to the segmental level, and this downstream effect is sensitive to phonological contrasts [9]. On the other hand, [6] showed that utterance-final lengthening is uniformly applied to tense and lax vowels in German. This raises the question of whether the effects of the two sources of durational variation are uniformly applied or sensitive to phonological contrasts. The current study focused on the duration of tense and lax vowels in the stressed syllable of monosyllabic and disyllabic words in utterance-medial and utterance-final positions. Twenty German speakers participated in a question-answer elicitation task. A preliminary analysis of seven speakers showed effects of utterance position and word frequency, as well as interactions with vowel type, suggesting a non-uniform application of durational adjustments contingent on phonological vowel length. Interestingly, the frequency effect affects the duration of lax vowels, whereas utterance position affects the duration of tense vowels.
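The kind of analysis described above, testing position and frequency effects on vowel duration with interactions by vowel type, could be sketched as a mixed model. The file, column names, and model formula below are assumptions for illustration, not the authors' actual specification.

# Hypothetical mixed-model sketch: duration ~ position and frequency, each
# interacting with vowel type, with by-speaker random intercepts.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vowel_durations.csv")
# expected columns: duration (s), position (medial/final), vowel_type (tense/lax),
# log_word_freq, speaker

model = smf.mixedlm(
    "duration ~ position * vowel_type + log_word_freq * vowel_type",
    data=df,
    groups=df["speaker"],  # random intercepts by speaker
)
result = model.fit()
print(result.summary())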

@inproceedings{Yuen/etal:2024a,
title = {Differential effects of word frequency and utterance position on the duration of tense and lax vowels in German},
author = {Ivan Yuen and Bistra Andreeva and Omnia Ibrahim and Bernd M{\"o}bius},
url = {https://www.isca-archive.org/speechprosody_2024/yuen24_speechprosody.html},
doi = {https://doi.org/10.21437/SpeechProsody.2024-90},
year = {2024},
date = {2024},
booktitle = {Proc. Speech Prosody 2024 (Leiden, The Netherlands)},
pages = {442-446},
address = {Leiden, The Netherlands},
abstract = {Acoustic duration is subject to modification from multiple sources, for example, utterance position [13] and predictability such as occurrence frequency at word and syllable levels [e.g., 2, 3, 4]. A study of German radio corpus data showed that these two sources interact to modify syllable duration. On the one hand, the predictability effect can percolate downstream to the segmental level, and this downstream effect is sensitive to phonological contrasts [9]. On the other, [6] showed that utterance-final lengthening is uniformly applied to tense and lax vowels in German. This then raises some questions as to whether the effects of the two sources of durational variation are uniformly applied or sensitive to phonological contrasts. The current study focused on the duration of tense and lax vowels in the stressed syllable of monosyllabic and disyllabic words in utterance-medial and utterance-final positions. Twenty German speakers participated in a question-answer elicitation task. A preliminary analysis of seven speakers showed effects of utterance position and word frequency, as well as interactions with vowel type, suggesting a non-uniform application of durational adjustments contingent on phonological vowel length. Interestingly, the frequency effect affects the duration of lax vowels, but utterance position affects the duration of tense vowels.},
pubstate = {published},
type = {inproceedings}
}


Project:   C1

Chingacham, Anupama; Zhang, Miaoran; Demberg, Vera; Klakow, Dietrich

Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It? Inproceedings

Soni, Nikita; Flek, Lucie; Sharma, Ashish; Yang, Diyi; Hooker, Sara; Andrew Schwartz, H. (Ed.): Proceedings of the 1st Human-Centered Large Language Modeling Workshop, ACL, pp. 1-15, TBD, 2024.

Large Language Models (LLMs) can generate text by transferring style attributes like formality, resulting in formal or informal text. However, instructing LLMs to generate text that, when spoken, is more intelligible in an acoustically difficult environment is an under-explored topic. We conduct the first study to evaluate LLMs on a novel task of generating acoustically intelligible paraphrases for better human speech perception in noise. Our experiments in English demonstrated that with standard prompting, LLMs struggle to control the non-textual attribute, i.e., acoustic intelligibility, while efficiently capturing the desired textual attributes like semantic equivalence. To remedy this issue, we propose a simple prompting approach, prompt-and-select, which generates paraphrases by decoupling the desired textual and non-textual attributes in the text generation pipeline. Our approach resulted in a 40% relative improvement in human speech perception, by paraphrasing utterances that are highly distorted in a listening condition with babble noise at a signal-to-noise ratio (SNR) of -5 dB. This study reveals the limitation of LLMs in capturing non-textual attributes, and our proposed method showcases the potential of using LLMs for better human speech perception in noise.
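A sketch of the decoupled generate-then-select idea: the LLM is used only for the textual attribute (meaning-preserving paraphrases), while a separate scorer picks the candidate estimated to be most intelligible in noise. The scoring function below is a named placeholder, not the intelligibility measure used in the paper, and the prompt wording is illustrative.

# Prompt-and-select sketch under stated assumptions: generate candidates with
# an LLM, then select by a separate (placeholder) intelligibility estimate.
from openai import OpenAI

client = OpenAI()

def generate_paraphrases(sentence: str, n: int = 5) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4",
        n=n,
        temperature=1.0,
        messages=[
            {"role": "system", "content": "Paraphrase the sentence, preserving its meaning."},
            {"role": "user", "content": sentence},
        ],
    )
    return [choice.message.content for choice in response.choices]

def estimate_intelligibility_in_noise(sentence: str) -> float:
    """Placeholder: synthesize the sentence, mix with babble at SNR -5 dB,
    and return a predicted intelligibility score."""
    raise NotImplementedError

def prompt_and_select(sentence: str) -> str:
    candidates = generate_paraphrases(sentence)
    return max(candidates, key=estimate_intelligibility_in_noise)

Keeping the acoustic criterion outside the generation step is what makes the textual and non-textual attributes separable: the LLM never has to reason about noise at all.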

@inproceedings{chingacham-etal-2024-human,
title = {Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?},
author = {Anupama Chingacham and Miaoran Zhang and Vera Demberg and Dietrich Klakow},
editor = {Nikita Soni and Lucie Flek and Ashish Sharma and Diyi Yang and Sara Hooker and H. Andrew Schwartz},
url = {https://aclanthology.org/2024.hucllm-1.1},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 1st Human-Centered Large Language Modeling Workshop},
pages = {1-15},
publisher = {ACL},
address = {TBD},
abstract = {Large Language Models (LLMs) can generate text by transferring style attributes like formality resulting in formal or informal text. However, instructing LLMs to generate text that when spoken, is more intelligible in an acoustically difficult environment, is an under-explored topic. We conduct the first study to evaluate LLMs on a novel task of generating acoustically intelligible paraphrases for better human speech perception in noise. Our experiments in English demonstrated that with standard prompting, LLMs struggle to control the non-textual attribute, i.e., acoustic intelligibility, while efficiently capturing the desired textual attributes like semantic equivalence. To remedy this issue, we propose a simple prompting approach, prompt-and-select, which generates paraphrases by decoupling the desired textual and non-textual attributes in the text generation pipeline. Our approach resulted in a 40% relative improvement in human speech perception, by paraphrasing utterances that are highly distorted in a listening condition with babble noise at signal-to-noise ratio (SNR) -5 dB. This study reveals the limitation of LLMs in capturing non-textual attributes, and our proposed method showcases the potential of using LLMs for better human speech perception in noise.},
pubstate = {published},
type = {inproceedings}
}


Project:   A4

Verkerk, Annemarie; Talamo, Luigi

mini-CIEP+ : A Shareable Parallel Corpus of Prose Inproceedings

Zweigenbaum, Pierre; Rapp, Reinhard; Sharoff, Serge (Ed.): Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024, ELRA and ICCL, pp. 135-143, Torino, Italia, 2024.

In this paper we present mini-CIEP+, a shareable parallel corpus of prose. mini-CIEP+ consists of the first part of ten different works of prose across many different languages, allowing for the cross-linguistic investigation of larger discourse units. Subcorpora typically contain 5750 sentences and almost 125K tokens. Subcorpora have dependency grammar annotation based on the Universal Dependencies standard (de Marneffe et al., 2021). mini-CIEP+ version 1.0 is available in 35 languages, with the aim of increasing the sample to 50 languages. It is shareable due to recent developments in German law, which allow researchers to share up to 15% of copyrighted material with a select group of people for their own research. Hence, mini-CIEP+ is not publicly available, but is rather shareable in a modular fashion with select researchers. We additionally describe future plans for further annotation of mini-CIEP+ as well as its limitations.
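Since the subcorpora carry Universal Dependencies annotation, a shared subcorpus can be queried with any CoNLL-U reader. The sketch below uses the conllu package to count dependency relations; the file name is a placeholder, and nothing here is specific to the mini-CIEP+ release itself.

# Small sketch: tally dependency relations in a UD-annotated subcorpus file.
from collections import Counter
from conllu import parse_incr

deprel_counts = Counter()
with open("german_subcorpus.conllu", encoding="utf-8") as handle:
    for sentence in parse_incr(handle):
        for token in sentence:
            deprel_counts[token["deprel"]] += 1

print(deprel_counts.most_common(10))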

@inproceedings{verkerk-talamo-2024-mini,
title = {mini-CIEP+ : A Shareable Parallel Corpus of Prose},
author = {Annemarie Verkerk and Luigi Talamo},
editor = {Pierre Zweigenbaum and Reinhard Rapp and Serge Sharoff},
url = {https://aclanthology.org/2024.bucc-1.15},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024},
pages = {135-143},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {In this paper we present mini-CIEP+, a sharable parallel corpus of prose. mini-CIEP+ consists of the first part of ten different works of prose across many different languages, allowing for the cross-linguistic investigation of larger discourse units. Subcorpora typically contain 5750 sentences and almost 125K tokens. Subcorpora have dependency grammar annotation based on the Universal Dependencies standard (de Marneffe et al., 2021). mini-CIEP+ version 1.0 is available in 35 languages, with the aim of increasing the sample to 50 languages. It is shareable due to recent developments in German law, which allow researchers to share up to 15% of copy-righted material with a select group of people for their own research. Hence, mini-CIEP+ is not publically available, but is rather shareable in a modular fashion with select researchers. We additionally describe future plans for further annotation of mini-CIEP+ as well as its limitations.},
pubstate = {published},
type = {inproceedings}
}


Project:   C7

Jablotschkin, Sarah; Teich, Elke; Zinsmeister, Heike

DE-Lite - a New Corpus of Easy German: Compilation, Exploration, Analysis Inproceedings

Raya Chakravarthi, Bharathi; B, Bharathi; Buitelaar, Paul; Durairaj, Thenmozhi; Kovács, György; Ángel García Cumbreras, Miguel (Ed.): Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, Association for Computational Linguistics, pp. 106-117, St. Julians, Malta, 2024.

In this paper, we report on a new corpus of simplified German. Public agencies in Germany have recently been requested to provide information in easy language on their outlets (e.g. websites) so as to facilitate participation in society for people with low literacy levels related to learning difficulties or low language proficiency (e.g. L2 speakers). While various rule sets and guidelines for Easy German (a specific variant of simplified German) have emerged over time, it is unclear (a) to what extent authors and other content creators, including generative AI tools, consistently apply them, and (b) how adequate texts in authentic Easy German really are for the intended audiences. As a first step in gaining insights into these issues and to further LT development for simplified German, we compiled DE-Lite, a corpus of easy-to-read texts including Easy German and comparable Standard German texts, by integrating existing collections and gathering new data from the web. We built n-gram models for an Easy German subcorpus of DE-Lite and comparable Standard German texts in order to identify typical features of Easy German. To this end, we use relative entropy (Kullback-Leibler Divergence), a standard technique for evaluating language models, which we apply here for corpus comparison. Our analysis reveals that some rules of Easy German are fairly dominant (e.g. punctuation) and that text genre has a strong effect on the distinctiveness of the two language variants.
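Relative entropy between two language models can be illustrated with a minimal unigram example. This is only a toy sketch of the measure itself; the paper's actual n-gram models, tokenisation, and smoothing choices will differ, and the two tiny token lists below are invented.

# Minimal Kullback-Leibler divergence between two smoothed unigram models.
import math
from collections import Counter

def unigram_probs(tokens, vocab, alpha=1.0):
    # Add-alpha smoothing so every vocabulary item has non-zero probability.
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D(P || Q) in bits; assumes p and q share the same smoothed vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

easy = "die katze schläft . die katze isst .".split()
standard = "die katze schläft auf dem warmen sofa .".split()

vocab = set(easy) | set(standard)
p, q = unigram_probs(easy, vocab), unigram_probs(standard, vocab)
print(f"KLD(easy || standard) = {kl_divergence(p, q):.3f} bits")

Note that the measure is asymmetric, so the direction of comparison (Easy German against Standard German or vice versa) matters for interpreting which features are distinctive of which variant.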

@inproceedings{jablotschkin-etal-2024-de,
title = {DE-Lite - a New Corpus of Easy German: Compilation, Exploration, Analysis},
author = {Sarah Jablotschkin and Elke Teich and Heike Zinsmeister},
editor = {Bharathi Raya Chakravarthi and Bharathi B and Paul Buitelaar and Thenmozhi Durairaj and Gy{\"o}rgy Kov{\'a}cs and Miguel {\'A}ngel Garc{\'i}a Cumbreras},
url = {https://aclanthology.org/2024.ltedi-1.9},
year = {2024},
date = {2024},
booktitle = {Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion},
pages = {106-117},
publisher = {Association for Computational Linguistics},
address = {St. Julians, Malta},
abstract = {In this paper, we report on a new corpus of simplified German. It is recently requested from public agencies in Germany to provide information in easy language on their outlets (e.g. websites) so as to facilitate participation in society for people with low-literacy levels related to learning difficulties or low language proficiency (e.g. L2 speakers). While various rule sets and guidelines for Easy German (a specific variant of simplified German) have emerged over time, it is unclear (a) to what extent authors and other content creators, including generative AI tools consistently apply them, and (b) how adequate texts in authentic Easy German really are for the intended audiences. As a first step in gaining insights into these issues and to further LT development for simplified German, we compiled DE-Lite, a corpus of easy-to-read texts including Easy German and comparable Standard German texts, by integrating existing collections and gathering new data from the web. We built n-gram models for an Easy German subcorpus of DE-Lite and comparable Standard German texts in order to identify typical features of Easy German. To this end, we use relative entropy (Kullback-Leibler Divergence), a standard technique for evaluating language models, which we apply here for corpus comparison. Our analysis reveals that some rules of Easy German are fairly dominant (e.g. punctuation) and that text genre has a strong effect on the distinctivity of the two language variants.},
pubstate = {published},
type = {inproceedings}
}


Project:   T1
