Publications

Mosbach, Marius; Pimentel, Tiago; Ravfogel, Shauli; Klakow, Dietrich; Elazar, Yanai

Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation Inproceedings

Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, pp. 12284-12314, Toronto, Canada, 2023.

Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain generalization, and because extensive evidence shows that fine-tuned models pick up on spurious correlations. Unfortunately, previous comparisons of the two approaches were done using models of different sizes. This raises the question of whether the observed weaker out-of-domain generalization of fine-tuned models is an inherent property of fine-tuning or a limitation of the experimental setup. In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B. Our results show that fine-tuned language models can in fact generalize well out-of-domain. We find that both approaches generalize similarly; they exhibit large variation and depend on properties such as model size and the number of examples, highlighting that robust task adaptation remains a challenge.
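
A minimal Python sketch of the in-context learning side of such a comparison, under stated assumptions: the checkpoint name, prompt format, and label set below are illustrative placeholders (the abstract only specifies a 125M-30B size range), not the paper's exact setup. A k-shot prompt is built from demonstrations and the candidate label with the highest log-likelihood is chosen.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small checkpoint; the study covers models from 125M to 30B parameters.
tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval()

def label_logprob(prompt, label):
    # Log-probability of the label tokens given the prompt under the causal LM.
    # Assumes the label starts a fresh token (hence the leading space below).
    enc = tok(prompt + label, return_tensors="pt")
    n_prompt = len(tok(prompt)["input_ids"])
    with torch.no_grad():
        logprobs = model(**enc).logits.log_softmax(-1)
    ids = enc["input_ids"][0]
    return sum(logprobs[0, i - 1, ids[i]].item() for i in range(n_prompt, len(ids)))

# Hypothetical NLI-style prompt with one in-context demonstration.
demos = "Premise: A man plays guitar. Hypothesis: A person makes music. Answer: yes\n"
query = "Premise: A dog sleeps. Hypothesis: The dog is running. Answer:"
prediction = max([" yes", " no"], key=lambda lab: label_logprob(demos + query, lab))
print(prediction)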

@inproceedings{mosbach-etal-2023-shot,
title = {Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation},
author = {Marius Mosbach and Tiago Pimentel and Shauli Ravfogel and Dietrich Klakow and Yanai Elazar},
url = {https://aclanthology.org/2023.findings-acl.779},
doi = {https://doi.org/10.18653/v1/2023.findings-acl.779},
year = {2023},
date = {2023},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
pages = {12284-12314},
publisher = {Association for Computational Linguistics},
address = {Toronto, Canada},
abstract = {Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain generalization, and because extensive evidence shows that fine-tuned models pick up on spurious correlations. Unfortunately, previous comparisons of the two approaches were done using models of different sizes. This raises the question of whether the observed weaker out-of-domain generalization of fine-tuned models is an inherent property of fine-tuning or a limitation of the experimental setup. In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B. Our results show that fine-tuned language models can in fact generalize well out-of-domain. We find that both approaches generalize similarly; they exhibit large variation and depend on properties such as model size and the number of examples, highlighting that robust task adaptation remains a challenge.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Ryzhova, Margarita; Skrjanec, Iza; Quach, Nina; Chase, Alice Virginia; Ellsiepen, Emilia; Demberg, Vera

Word Familiarity Classification From a Single Trial Based on Eye-Movements. A Study in German and English Inproceedings

ETRA '23: Proceedings of the 2023 Symposium on Eye Tracking Research and Applications, 2023.

Identifying processing difficulty during reading due to unfamiliar words has promising applications in automatic text adaptation. We present a classification model that predicts whether a word is (un)known to the reader based on eye-movement measures. We examine German and English data and validate our model on unseen subjects and items achieving a high accuracy in both languages.
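
A hedged sketch of such a single-trial classifier, assuming a per-word table of standard eye-movement measures (all file and column names below are invented for illustration). GroupKFold holds out whole subjects per fold, mirroring the validation on unseen readers described above.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Hypothetical per-word eye-tracking feature table; column names are assumptions.
df = pd.read_csv("fixation_measures.csv")
features = ["first_fixation_dur", "gaze_dur", "total_reading_time",
            "n_fixations", "regression_in"]
X, y = df[features], df["word_known"]        # 1 = familiar, 0 = unfamiliar

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# Leave whole subjects out of each training fold, i.e., validate on unseen readers.
scores = cross_val_score(clf, X, y, groups=df["subject_id"], cv=GroupKFold(n_splits=5))
print(scores.mean())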

@inproceedings{ryzhova-etal-2023,
title = {Word Familiarity Classification From a Single Trial Based on Eye-Movements. A Study in German and English},
author = {Margarita Ryzhova and Iza Skrjanec and Nina Quach and Alice Virginia Chase and Emilia Ellsiepen and Vera Demberg},
url = {https://dl.acm.org/doi/abs/10.1145/3588015.3590118},
doi = {https://doi.org/10.1145/3588015.3590118},
year = {2023},
date = {2023},
booktitle = {ETRA '23: Proceedings of the 2023 Symposium on Eye Tracking Research and Applications},
abstract = {

Identifying processing difficulty during reading due to unfamiliar words has promising applications in automatic text adaptation. We present a classification model that predicts whether a word is (un)known to the reader based on eye-movement measures. We examine German and English data and validate our model on unseen subjects and items achieving a high accuracy in both languages.
},
pubstate = {published},
type = {inproceedings}
}

Project:   A8

Skrjanec, Iza; Broy, Frederik Yannick; Demberg, Vera

Expert-adapted language models improve the fit to reading times Inproceedings

Procedia Computer Science, PsyArXiv, 2023.

The concept of surprisal refers to the predictability of a word based on its context. Surprisal is known to be predictive of human processing difficulty and is usually estimated by language models. However, because humans differ in their linguistic experience, they also differ in the actual processing difficulty they experience with a given word or sentence. We investigate whether models that are similar to the linguistic experience and background knowledge of a specific group of humans are better at predicting their reading times than a generic language model. We analyze reading times from the PoTeC corpus (Jäger et al. 2021) of eye movements from biology and physics experts reading biology and physics texts. We find experts read in-domain texts faster than novices, especially domain-specific terms. Next, we train language models adapted to the biology and physics domains and show that surprisal obtained from these specialized models improves the fit to expert reading times above and beyond a generic language model.
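
The core quantity here is per-word surprisal, -log2 P(word | context), estimated from a language model. A minimal sketch under stated assumptions: the German GPT-2-style checkpoint named below is an assumption, not necessarily the model used in the study; the same function could be run with a generic and a domain-adapted model and the two surprisal series compared as reading-time predictors.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")        # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2").eval()

def token_surprisals(sentence):
    # Surprisal in bits of each token given its left context.
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        logprobs = model(**enc).logits.log_softmax(-1)
    ids = enc["input_ids"][0]
    return [(tok.decode(ids[i]), -logprobs[0, i - 1, ids[i]].item() / math.log(2))
            for i in range(1, len(ids))]

print(token_surprisals("Die Mitochondrien sind die Kraftwerke der Zelle."))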

 

@inproceedings{skrjanec_broy_demberg_2023,
title = {Expert-adapted language models improve the fit to reading times},
author = {Iza Skrjanec and Frederik Yannick Broy and Vera Demberg},
url = {https://psyarxiv.com/dc8y6},
doi = {https://doi.org/10.31234/osf.io/dc8y6},
year = {2023},
date = {2023},
booktitle = {Procedia Computer Science},
publisher = {PsyArXiv},
abstract = {

The concept of surprisal refers to the predictability of a word based on its context. Surprisal is known to be predictive of human processing difficulty and is usually estimated by language models. However, because humans differ in their linguistic experience, they also differ in the actual processing difficulty they experience with a given word or sentence. We investigate whether models that are similar to the linguistic experience and background knowledge of a specific group of humans are better at predicting their reading times than a generic language model. We analyze reading times from the PoTeC corpus (J{\"a}ger et al. 2021) of eye movements from biology and physics experts reading biology and physics texts. We find experts read in-domain texts faster than novices, especially domain-specific terms. Next, we train language models adapted to the biology and physics domains and show that surprisal obtained from these specialized models improves the fit to expert reading times above and beyond a generic language model.

},
pubstate = {published},
type = {inproceedings}
}

Project:   A8

Mecklinger, Axel; Kamp, Siri-Maria

Observing memory encoding while it unfolds: Functional interpretation and current debates regarding ERP subsequent memory effects Journal Article

Neuroscience & Biobehavioral Reviews, 153, 2023.

Our ability to remember the past depends on neural processes set in train in the moment an event is experienced. These processes can be studied by segregating brain activity according to whether an event is later remembered or forgotten. The present review integrates a large number of studies examining this differential brain activity, labeled subsequent memory effect (SME), with the ERP technique, into a functional organization and discusses routes for further research. Based on the reviewed literature, we suggest that memory encoding is implemented by multiple processes, typically reflected in three functionally different subcomponents of the ERP SME elicited by study stimuli, which presumably interact with preparatory SME activity preceding the to-be-encoded event. We argue that ERPs are a valuable method in the SME paradigm because they have a sufficiently high temporal resolution to disclose the subcomponents of encoding-related brain activity. Implications of the proposed functional organization for future studies using the SME procedure in basic and applied settings will be discussed.

@article{Mecklinger-etal-2023,
title = {Observing memory encoding while it unfolds: Functional interpretation and current debates regarding ERP subsequent memory effects},
author = {Axel Mecklinger and Siri-Maria Kamp},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0149763423003160},
year = {2023},
date = {2023},
journal = {Neuroscience \& Biobehavioral Reviews},
volume = {153},
abstract = {

Our ability to remember the past depends on neural processes set in train in the moment an event is experienced. These processes can be studied by segregating brain activity according to whether an event is later remembered or forgotten. The present review integrates a large number of studies examining this differential brain activity, labeled subsequent memory effect (SME), with the ERP technique, into a functional organization and discusses routes for further research. Based on the reviewed literature, we suggest that memory encoding is implemented by multiple processes, typically reflected in three functionally different subcomponents of the ERP SME elicited by study stimuli, which presumably interact with preparatory SME activity preceding the to be encoded event. We argue that ERPs are a valuable method in the SME paradigm because they have a sufficiently high temporal resolution to disclose the subcomponents of encoding-related brain activity. Implications of the proposed functional organization for future studies using the SME procedure in basic and applied settings will be discussed.

},
pubstate = {published},
type = {article}
}

Project:   A6

Bader, Regine; Tarantini, Luca; Mecklinger, Axel

Task context dissociates the FN400 and the N400 Journal Article

Psychophysiology, 60, 2023.

In event-related potential studies, familiarity-based recognition has been associated with the FN400, that is, more positive-going waveforms for old items than new items 300–500 ms post-stimulus onset, maximal at frontal electrodes. We tested the proposition that the FN400 reflects the attribution of unexpected processing fluency to familiarity. This implies that the FN400 is greater when fluency is less expected, that is, for less familiar stimuli. Moreover, the FN400 should be modulated by the goal of remembering and only elicited when fluency is correctly attributed to the past, that is, by correct old responses in recognition memory tests. In the absence of a retrieval task, enhanced fluency for repeated items should be associated with an N400 attenuation as no episodic attribution takes place. In an incidental study-test design with words of low and high life-time familiarity, participants made pleasantness judgments for half of the studied words. The other half re-appeared in a recognition test. Only in the latter task, participants had the goal of remembering. As both tasks included also new words, we could compare old/new effects under conditions in which both effects are driven by increased fluency for repeated words. We did not find the expected differences in the FN400 for low vs. high life-time familiarity items. However, as expected, we found a frontally distributed FN400 in the recognition test whereas the old/new effect in the pleasantness task resembled an N400 effect. This supports the view that the FN400 occurs when fluency is attributed to familiarity during a recognition decision.

@article{Bader_etal_2023,
title = {Task context dissociates the FN400 and the N400},
author = {Regine Bader and Luca Tarantini and Axel Mecklinger},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/psyp.14258},
doi = {https://doi.org/10.1111/psyp.14258},
year = {2023},
date = {2023},
journal = {Psychophysiology},
volume = {60},
number = {7},
abstract = {

In event-related potential studies, familiarity-based recognition has been associated with the FN400, that is, more positive-going waveforms for old items than new items 300–500 ms post-stimulus onset, maximal at frontal electrodes. We tested the proposition that the FN400 reflects the attribution of unexpected processing fluency to familiarity. This implies that the FN400 is greater when fluency is less expected, that is, for less familiar stimuli. Moreover, the FN400 should be modulated by the goal of remembering and only elicited when fluency is correctly attributed to the past, that is, by correct old responses in recognition memory tests. In the absence of a retrieval task, enhanced fluency for repeated items should be associated with an N400 attenuation as no episodic attribution takes place. In an incidental study-test design with words of low and high life-time familiarity, participants made pleasantness judgments for half of the studied words. The other half re-appeared in a recognition test. Only in the latter task, participants had the goal of remembering. As both tasks included also new words, we could compare old/new effects under conditions in which both effects are driven by increased fluency for repeated words. We did not find the expected differences in the FN400 for low vs. high life-time familiarity items. However, as expected, we found a frontally distributed FN400 in the recognition test whereas the old/new effect in the pleasantness task resembled an N400 effect. This supports the view that the FN400 occurs when fluency is attributed to familiarity during a recognition decision.
},
pubstate = {published},
type = {article}
}

Project:   A6

Li, Muqing; Venhuizen, Noortje; Jachmann, Torsten; Drenhaus, Heiner; Crocker, Matthew W.

Does informativity modulate linearization preferences in reference production? Inproceedings

Proceedings of the Annual Meeting of the Cognitive Science Society, 45, pp. 3028-3054, 2023.

During referential communication, speaker choices regarding the syntactic encoding of their expressions can modulate the linear ordering of the properties necessary to identify the referent. We investigated whether such syntactic choices are influenced by the informativity of these properties in a given visual context, as quantified by Referential Entropy Reduction (RER). In two experiments, a maze-based sentence completion task was used to examine whether informativity of a particular property (animal or action) influenced the decision to produce pre- versus post-nominal modifications when describing animal-performing-action referents in a visual scene. While many participants used a fixed strategy, informativity did significantly influence linearization for the remaining participants, consistent with a maximal informativity strategy in which the high RER property is encoded first. This suggests that speakers who vary their encodings are indeed sensitive to the informativity of properties in a visual scene, preferring syntactic linearization in which informative properties appear early.
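
Referential Entropy Reduction can be illustrated with a toy scene: the entropy over candidate referents before a property is heard, minus the entropy after the property has filtered the candidates. The sketch below uses an invented scene and property names; it simply shows one property carrying a higher RER than the other.

import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Toy scene with four equally likely candidate referents (animal-action pairs).
referents = ["rabbit-jumping", "rabbit-eating", "rabbit-sleeping", "frog-jumping"]
prior = [1 / len(referents)] * len(referents)

def posterior(keeps):
    # Uniform distribution over the referents compatible with the heard property.
    kept = [1.0 if keeps(r) else 0.0 for r in referents]
    z = sum(kept)
    return [k / z for k in kept]

rer_animal = entropy(prior) - entropy(posterior(lambda r: r.startswith("rabbit")))
rer_action = entropy(prior) - entropy(posterior(lambda r: r.endswith("jumping")))
print(rer_animal, rer_action)   # ~0.415 vs 1.0 bits: here the action is the more informative property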

@inproceedings{Muqing-etal-2023,
title = {Does informativity modulate linearization preferences in reference production? },
author = {Muqing Li and Noortje Venhuizen and Torsten Jachmann and Heiner Drenhaus and Matthew W. Crocker},
url = {https://escholarship.org/uc/item/95v6j0sx},
year = {2023},
date = {2023},
booktitle = {Proceedings of the Annual Meeting of the Cognitive Science Society},
pages = {3028-3054},
abstract = {During referential communication, speaker choices regarding the syntactic encoding of their expressions can modulate the linear ordering of the properties necessary to identify the referent. We investigated whether such syntactic choices are influenced by the informativity of these properties in a given visual context, as quantified by Referential Entropy Reduction (RER). In two experiments, a maze-based sentence completion task was used to examine whether informativity of a particular property (animal or action) influenced the decision to produce pre- versus post-nominal modifications when describing animal-performing-action referents in a visual scene. While many participants used a fixed strategy, informativity did significantly influence linearization for the remaining participants, consistent with a maximal informativity strategy in which the high RER property is encoded first. This suggests that speakers who vary their encodings are indeed sensitive to the informativity of properties in a visual scene, preferring syntactic linearization in which informative properties appear early.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3

Zaitova, Iuliia; Stenger, Irina; Avgustinova, Tania

Microsyntactic Unit Detection Using Word Embedding Models: Experiments on Slavic Languages Inproceedings

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2023), pp. 1251-1259, 2023.

@inproceedings{Zaitova/etal:2023a,
title = {Microsyntactic Unit Detection Using Word Embedding Models: Experiments on Slavic Languages},
author = {Iuliia Zaitova and Irina Stenger and Tania Avgustinova},
year = {2023},
date = {2023},
booktitle = {Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2023)},
pages = {1251-1259},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Gessinger, Iona; Cohn, Michelle; Cowan, Benjamin R.; Zellou, Georgia; Möbius, Bernd

Cross-linguistic emotion perception in human and TTS voices Inproceedings

Proceedings of Interspeech 2023, pp. 5222-5226, Dublin, Ireland, 2023.

This study investigates how German listeners perceive changes in the emotional expression of German and American English human voices and Amazon Alexa text-to-speech (TTS) voices, respectively. Participants rated sentences containing emotionally neutral lexico-semantic information that were resynthesized to vary in prosodic emotional expressiveness. Starting from an emotionally neutral production, three levels of increasing 'happiness' were created. Results show that 'happiness' manipulations lead to higher ratings of emotional valence (i.e., more positive) and arousal (i.e., more excited) for German and English voices, with stronger effects for the German voices. In particular, changes in valence were perceived more prominently in German TTS compared to English TTS. Additionally, both TTS voices were rated lower than the respective human voices on scales that reflect anthropomorphism (e.g., human-likeness). We discuss these findings in the context of cross-linguistic emotion accounts.

@inproceedings{Gessinger/etal:2023,
title = {Cross-linguistic emotion perception in human and TTS voices},
author = {Iona Gessinger and Michelle Cohn and Benjamin R. Cowan and Georgia Zellou and Bernd M{\"o}bius},
url = {https://www.isca-speech.org/archive/interspeech_2023/gessinger23_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2023-711},
year = {2023},
date = {2023},
booktitle = {Proceedings of Interspeech 2023},
pages = {5222-5226},
address = {Dublin, Ireland},
abstract = {This study investigates how German listeners perceive changes in the emotional expression of German and American English human voices and Amazon Alexa text-to-speech (TTS) voices, respectively. Participants rated sentences containing emotionally neutral lexico-semantic information that were resynthesized to vary in prosodic emotional expressiveness. Starting from an emotionally neutral production, three levels of increasing 'happiness' were created. Results show that 'happiness' manipulations lead to higher ratings of emotional valence (i.e., more positive) and arousal (i.e., more excited) for German and English voices, with stronger effects for the German voices. In particular, changes in valence were perceived more prominently in German TTS compared to English TTS. Additionally, both TTS voices were rated lower than the respective human voices on scales that reflect anthropomorphism (e.g., human-likeness). We discuss these findings in the context of cross-linguistic emotion accounts.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Kudera, Jacek; Stenger, Irina; Georgis, Philip; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

Cross-linguistic intelligibility of idiomatic phrases in Polish-Russian translation tasks Incollection

Phraseology, constructions and translation: Corpus-based, computational and cultural aspects, Presses universitaires de Louvain, pp. 237-249, 2023.

This paper presents the results of a translation task involving idiomatic phrases in closely related languages. The goal is to test auditory comprehension of idioms. The experiment was conducted with native speakers of either Polish or Russian, who were not professional translators. The translation equivalents were categorized according to three conditions: (1) semantic equivalent, found in a phraseological dictionary; (2) lemma-based referent, sharing a cognate component; and (3) literal translation of the source phrase. It is hypothesized that information-theoretic measures of surprisal in combination with lexical and syntactic distances between idioms can predict lay translators’ preferences. The results suggest that the proposed measures are valid predictors for the type of translation native speakers will select. The outcomes reveal an asymmetry in preference for equivalent selection across the groups of lay translators.
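
One of the predictors mentioned above, lexical distance between source and target forms, is commonly operationalized as a normalized Levenshtein distance. The sketch below is a generic illustration of that measure under stated assumptions; the word pair is an invented transliteration, not taken from the stimuli, and the study additionally combines such distances with surprisal and syntactic distance.

def levenshtein(a, b):
    # Classic single-row dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalized_lexical_distance(a, b):
    return levenshtein(a, b) / max(len(a), len(b))

print(normalized_lexical_distance("reka", "ruka"))   # 0.25 for this illustrative cognate pair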

@incollection{Kudera/etal:2023a,
title = {Cross-linguistic intelligibility of idiomatic phrases in Polish-Russian translation tasks},
author = {Jacek Kudera and Irina Stenger and Philip Georgis and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://pul.uclouvain.be/book/?GCOI=29303100163350&utm_source=rss&utm_medium=rss&utm_campaign=newreleases#h2tabFormats},
year = {2023},
date = {2023},
booktitle = {Phraseology, constructions and translation: Corpus-based, computational and cultural aspects},
pages = {237-249},
publisher = {Presses universitaires de Louvain},
abstract = {This paper presents the results of a translation task involving idiomatic phrases in closely related languages. The goal is to test auditory comprehension of idioms. The experiment was conducted with native speakers of either Polish or Russian, who were not professional translators. The translation equivalents were categorized according to three conditions: (1) semantic equivalent, found in a phraseological dictionary; (2) lemma-based referent, sharing a cognate component; and (3) literal translation of the source phrase. It is hypothesized that information-theoretic measures of surprisal in combination with lexical and syntactic distances between idioms can predict lay translators’ preferences. The results suggest that the proposed measures are valid predictors for the type of translation native speakers will select. The outcomes reveal an asymmetry in preference for equivalent selection across the groups of lay translators.},
pubstate = {published},
type = {incollection}
}

Project:   C4

Abdullah, Badr M.; Shaik, Mohammed Maqsood; Möbius, Bernd; Klakow, Dietrich

An information-theoretic analysis of self-supervised discrete representations of speech Inproceedings

Proceedings of Interspeech 2023, pp. 2883-2887, Dublin, Ireland, 2023.

Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework whereby we represent each phonetic category as a distribution over discrete units. We then apply our framework to two different self-supervised models (namely wav2vec 2.0 and XLSR) and use American English speech as a case study. Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds, with phonetically similar sounds exhibiting similar distributions. While our study confirms the lack of direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.
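
A minimal sketch of the central computation, assuming frame-level (phoneme label, discrete unit) pairs have already been obtained from a forced alignment and the model's quantizer; the toy pairs below are invented. Each phonetic category is represented as a distribution over units and its entropy is reported.

from collections import Counter, defaultdict
import math

# Invented frame-level alignments of phoneme labels with codebook unit IDs.
pairs = [("AE", 17), ("AE", 17), ("AE", 42), ("S", 5), ("S", 5), ("S", 5), ("Z", 5), ("Z", 9)]

unit_counts = defaultdict(Counter)
for phoneme, unit in pairs:
    unit_counts[phoneme][unit] += 1

def entropy(counter):
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

for phoneme, counts in unit_counts.items():
    # Higher entropy = the phoneme maps onto a more variable set of discrete units.
    print(phoneme, round(entropy(counts), 3))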

@inproceedings{Abdullah/etal:2023a,
title = {An information-theoretic analysis of self-supervised discrete representations of speech},
author = {Badr M. Abdullah and Mohammed Maqsood Shaik and Bernd M{\"o}bius and Dietrich Klakow},
doi = {https://doi.org/10.21437/Interspeech.2023-2131},
year = {2023},
date = {2023},
booktitle = {Proceedings of Interspeech 2023},
pages = {2883-2887},
address = {Dublin, Ireland},
abstract = {Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework whereby we represent each phonetic category as a distribution over discrete units. We then apply our framework to two different self-supervised models (namely wav2vec 2.0 and XLSR) and use American English speech as a case study. Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds, with phonetically similar sounds exhibiting similar distributions. While our study confirms the lack of direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Yuen, Ivan; Ibrahim, Omnia; Andreeva, Bistra; Möbius, Bernd

Non-uniform cue-trading: differential effects of surprisal on pause usage and pause duration in German Inproceedings

Proceedings of the 20th International Congress of Phonetic Sciences, ICPhS 2023 (Prague, Czech Rep.), pp. 619-623, 2023.

Pause occurrence is conditional on contextual (un)predictability (in terms of surprisal) [10, 11], and so is the acoustic implementation of duration at multiple linguistic levels. Although these cues (i.e., pause usage/pause duration and syllable duration) are subject to the influence of the same factor, it is not clear how they are related to one another. A recent study in [1] using pause duration to define prosodic boundary strength reported a more pronounced surprisal effect on syllable duration, hinting at a trading relationship. The current study aimed to directly test for trading relationships among pause usage, pause duration and syllable duration in different surprisal contexts, analysing German radio news in the DIRNDL corpus. No trading relationship was observed between pause usage and surprisal, or between pause usage and syllable duration. However, a trading relationship was found between the durations of a pause and a syllable for accented items.

@inproceedings{Yuen/etal:2023a,
title = {Non-uniform cue-trading: differential effects of surprisal on pause usage and pause duration in German},
author = {Ivan Yuen and Omnia Ibrahim and Bistra Andreeva and Bernd M{\"o}bius},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 20th International Congress of Phonetic Sciences, ICPhS 2023 (Prague, Czech Rep.)},
pages = {619-623},
abstract = {Pause occurrence is conditional on contextual (un)predictability (in terms of surprisal) [10, 11], and so is the acoustic implementation of duration at multiple linguistic levels. Although these cues (i.e., pause usage/pause duration and syllable duration) are subject to the influence of the same factor, it is not clear how they are related to one another. A recent study in [1] using pause duration to define prosodic boundary strength reported a more pronounced surprisal effect on syllable duration, hinting at a trading relationship. The current study aimed to directly test for trading relationships among pause usage, pause duration and syllable duration in different surprisal contexts, analysing German radio news in the DIRNDL corpus. No trading relationship was observed between pause usage and surprisal, or between pause usage and syllable duration. However, a trading relationship was found between the durations of a pause and a syllable for accented items.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Abdullah, Badr M.; Shaik, Mohammed Maqsood; Klakow, Dietrich

On the Nature of Discrete Speech Representations in Multilingual Self-supervised Models Inproceedings

Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 159-161, Dubrovnik, Croatia, 2023.

Self-supervision has emerged as an effective paradigm for learning representations of spoken language from raw audio without explicit labels or transcriptions. Self-supervised speech models, such as wav2vec 2.0 (Baevski et al., 2020) and HuBERT (Hsu et al., 2021), have shown significant promise in improving the performance across different speech processing tasks. One of the main advantages of self-supervised speech models is that they can be pre-trained on a large sample of languages (Conneau et al., 2020; Babu et al., 2022), which facilitates cross-lingual transfer for low-resource languages (San et al., 2021). State-of-the-art self-supervised speech models include a quantization module that transforms the continuous acoustic input into a sequence of discrete units. One of the key questions in this area is whether the discrete representations learned via self-supervision are language-specific or language-universal. In other words, we ask: do the discrete units learned by a multilingual speech model represent the same speech sounds across languages or do they differ based on the specific language being spoken? From the practical perspective, this question has important implications for the development of speech models that can generalize across languages, particularly for low-resource languages. Furthermore, examining the level of linguistic abstraction in speech models that lack symbolic supervision is also relevant to the field of human language acquisition (Dupoux, 2018).

@inproceedings{abdullah-etal-2023-nature,
title = {On the Nature of Discrete Speech Representations in Multilingual Self-supervised Models},
author = {Badr M. Abdullah and Mohammed Maqsood Shaik and Dietrich Klakow},
url = {https://aclanthology.org/2023.sigtyp-1.20},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {159-161},
publisher = {Association for Computational Linguistics},
address = {Dubrovnik, Croatia},
abstract = {Self-supervision has emerged as an effective paradigm for learning representations of spoken language from raw audio without explicit labels or transcriptions. Self-supervised speech models, such as wav2vec 2.0 (Baevski et al., 2020) and HuBERT (Hsu et al., 2021), have shown significant promise in improving the performance across different speech processing tasks. One of the main advantages of self-supervised speech models is that they can be pre-trained on a large sample of languages (Conneau et al., 2020; Babu et al.,2022), which facilitates cross-lingual transfer for low-resource languages (San et al., 2021). State-of-the-art self-supervised speech models include a quantization module that transforms the continuous acoustic input into a sequence of discrete units. One of the key questions in this area is whether the discrete representations learned via self-supervision are language-specific or language-universal. In other words, we ask: do the discrete units learned by a multilingual speech model represent the same speech sounds across languages or do they differ based on the specific language being spoken? From the practical perspective, this question has important implications for the development of speech models that can generalize across languages, particularly for low-resource languages. Furthermore, examining the level of linguistic abstraction in speech models that lack symbolic supervision is also relevant to the field of human language acquisition (Dupoux, 2018).},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Steuer, Julius; Abdullah, Badr M.; List, Johann-Mattis; Klakow, Dietrich

Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists Inproceedings

Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 96-109, Dubrovnik, Croatia, 2023.

We present a cross-linguistic study of vowel harmony that aims to quantify this phenomenon using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have heavily relied on inflected word-forms in the analysis of vowel harmony. In contrast, we train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists offering a maximum of 1000 entries per language. Despite the fact that the data we employ are substantially smaller than previously used corpora, our experiments demonstrate that the neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and offers new possibilities for future studies on low-resource, under-studied languages.
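
As a much-simplified stand-in for the neural phoneme-level LMs, vowel predictability can be illustrated with a bigram model over the vowel tier of a word list: the mean surprisal of a vowel given the preceding vowel is lower when vowels co-vary harmonically. The vowel inventory and toy word lists below are invented, not drawn from the study's data.

from collections import Counter, defaultdict
import math

VOWELS = set("aeiou")   # toy orthographic vowel inventory; the study works on phoneme transcriptions

def vowel_tier(word):
    return [ch for ch in word if ch in VOWELS]

def mean_vowel_surprisal(wordlist):
    # Mean bits needed to predict a vowel from the preceding vowel (lower = more harmonic).
    bigrams = defaultdict(Counter)
    for w in wordlist:
        tier = vowel_tier(w)
        for v1, v2 in zip(tier, tier[1:]):
            bigrams[v1][v2] += 1
    bits, n = 0.0, 0
    for followers in bigrams.values():
        total = sum(followers.values())
        for c in followers.values():
            bits += c * -math.log2(c / total)
            n += c
    return bits / n

print(mean_vowel_surprisal(["kalan", "kalas", "kilin", "kilim"]))   # harmonic-looking toy lexicon: 0.0
print(mean_vowel_surprisal(["kalen", "kalos", "kilun", "kiles"]))   # mixed toy lexicon: 1.0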

@inproceedings{steuer-etal-2023-information,
title = {Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists},
author = {Julius Steuer and Badr M. Abdullah and Johann-Mattis List and Dietrich Klakow},
url = {https://aclanthology.org/2023.sigtyp-1.10},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {96-109},
publisher = {Association for Computational Linguistics},
address = {Dubrovnik, Croatia},
abstract = {We present a cross-linguistic study of vowel harmony that aims to quantify this phenomenon using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have heavily relied on inflected word-forms in the analysis of vowel harmony. In contrast, we train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists offering a maximum of 1000 entries per language. Despite the fact that the data we employ are substantially smaller than previously used corpora, our experiments demonstrate that the neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and offers new possibilities for future studies on low-resource, under-studied languages.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B4 C4

Aurnhammer, Christoph; Crocker, Matthew W.; Brouwer, Harm

Single-trial neurodynamics reveal N400 and P600 coupling in language comprehension Journal Article

Cognitive Neurodynamics, 2023, ISSN 1871-4099.

Theories of the electrophysiology of language comprehension are mostly informed by event-related potential effects observed between condition averages. We here argue that a dissociation between competing effect-level explanations of event-related potentials can be achieved by turning to predictions and analyses at the single-trial level. Specifically, we examine the single-trial dynamics in event-related potential data that exhibited a biphasic N400–P600 effect pattern. A group of multi-stream models can explain biphasic effects by positing that each individual trial should induce either an N400 increase or a P600 increase, but not both. An alternative, single-stream account, Retrieval-Integration theory, explicitly predicts that N400 amplitude and P600 amplitude should be correlated at the single-trial level. In order to investigate the single-trial dynamics of the N400 and the P600, we apply a regression-based technique in which we quantify the extent to which N400 amplitudes are predictive of the electroencephalogram in the P600 time window. Our findings suggest that, indeed, N400 amplitudes and P600 amplitudes are inversely correlated within-trial and, hence, the N400 effect and the P600 effect in biphasic data are driven by the same trials. Critically, we demonstrate that this finding also extends to data which exhibited only monophasic effects between conditions. In sum, the observation that the N400 is inversely correlated with the P600 on a by-trial basis supports a single stream view, such as Retrieval-Integration theory, and is difficult to reconcile with the processing mechanisms proposed by multi-stream models.
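
The regression idea can be illustrated on simulated single-trial data: the mean amplitude in the N400 window is used to predict the mean amplitude in the P600 window, and an inverse (negative) slope indicates that the same trials drive both effects. This is only the core idea under invented data and window choices; the published analysis is a more elaborate regression-based ERP approach with by-subject structure.

import numpy as np
import statsmodels.api as sm

# Simulated per-trial mean amplitudes in two windows (e.g., 300-500 ms and 600-1000 ms).
rng = np.random.default_rng(0)
n400 = rng.normal(0.0, 5.0, size=500)
p600 = -0.6 * n400 + rng.normal(0.0, 5.0, size=500)   # built-in inverse coupling

fit = sm.OLS(p600, sm.add_constant(n400)).fit()
# A negative slope: more negative N400 amplitudes go with more positive P600 amplitudes.
print(fit.params[1], fit.pvalues[1])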

@article{aurnhammer2023singletrial,
title = {Single-trial neurodynamics reveal N400 and P600 coupling in language comprehension},
author = {Christoph Aurnhammer and Matthew W. Crocker and Harm Brouwer},
url = {https://link.springer.com/article/10.1007/s11571-023-09983-7},
doi = {https://doi.org/10.1007/s11571-023-09983-7},
year = {2023},
date = {2023},
journal = {Cognitive Neurodynamics},
abstract = {Theories of the electrophysiology of language comprehension are mostly informed by event-related potential effects observed between condition averages. We here argue that a dissociation between competing effect-level explanations of event-related potentials can be achieved by turning to predictions and analyses at the single-trial level. Specifically, we examine the single-trial dynamics in event-related potential data that exhibited a biphasic N400–P600 effect pattern. A group of multi-stream models can explain biphasic effects by positing that each individual trial should induce either an N400 increase or a P600 increase, but not both. An alternative, single-stream account, Retrieval-Integration theory, explicitly predicts that N400 amplitude and P600 amplitude should be correlated at the single-trial level. In order to investigate the single-trial dynamics of the N400 and the P600, we apply a regression-based technique in which we quantify the extent to which N400 amplitudes are predictive of the electroencephalogram in the P600 time window. Our findings suggest that, indeed, N400 amplitudes and P600 amplitudes are inversely correlated within-trial and, hence, the N400 effect and the P600 effect in biphasic data are driven by the same trials. Critically, we demonstrate that this finding also extends to data which exhibited only monophasic effects between conditions. In sum, the observation that the N400 is inversely correlated with the P600 on a by-trial basis supports a single stream view, such as Retrieval-Integration theory, and is difficult to reconcile with the processing mechanisms proposed by multi-stream models.},
pubstate = {published},
type = {article}
}

Project:   A1

Ibrahim, Omnia; Yuen, Ivan; Andreeva, Bistra; Möbius, Bernd

The interplay between syllable-based predictability and voicing during closure in intersonorant German stops Inproceedings

Conference: Phonetics and Phonology in Europe 2023 (PaPE 2023), Nijmegen, the Netherlands, 2023.

Contextual predictability has pervasive effects on the acoustic realization of speech. Generally, duration is shortened in more predictable contexts and conversely lengthened in less predictable contexts. There are several measures to quantify predictability in a message. One of them is surprisal, which is calculated as S(unit_i) = -log2 P(unit_i | context). In a recent work, Ibrahim et al. have found that the effect of syllable-based surprisal on the temporal dimension(s) of a syllable selectively extends to the segmental level, for example, consonant voicing in German. Closure duration was uniformly longer for both voiceless and voiced consonants, but voice onset time was not. The voice onset time pattern might be related to German being typically considered an 'aspirating' language, using [+spread glottis] for voiceless consonants and [-spread glottis] for their voiced counterparts. However, voicing has also been reported in an intervocalic context for both voiceless and voiced consonants to varying extents. To further test whether the previously reported surprisal-based effect on voice onset time is driven by the phonological feature [spread glottis], the current study re-examined the downstream effect of syllable-based predictability on segmental voicing in German stops by measuring the degree of residual (phonetic) voicing during stop closure in an inter-sonorant context.

Method: Data were based on a subset of stimuli (speech produced in a quiet acoustic condition) from Ibrahim et al. 38 German speakers recorded 60 sentences. Each sentence contained a target stressed CV syllable in a polysyllabic word. Each target syllable began with one of the stops /p, k, b, d/, combined with one of the vowels /a:, e:, i:, o:, u:/. The analyzed data contained voiceless vs. voiced initial stops in a low or high surprisal syllable. Closure duration (CD) and voicing during closure (VDC) were extracted using in-house Python and Praat scripts. A ratio measure VDC/CD was used to factor out any potential covariation between VDC and CD. Linear mixed-effects modeling was used to evaluate the effect(s) of surprisal and target stop voicing status on VDC/CD ratio using the lmer package in R. The final model was: VDC/CD ratio ~ Surprisal + Target stop voicing status + (1 | Speaker) + (1 | Syllable) + (1 | PrevManner) + (1 | Sentence).

Results: In an inter-sonorant context, we found a smaller VDC/CD ratio in voiceless stops than in voiced ones (p=2.04e-08***). As expected, residual voicing is shorter during a voiceless closure than during a voiced closure. This is consistent with the idea of preserving a phonological voicing distinction, as well as the physiological constraint of sustaining voicing for a long period during the closure of a voiceless stop. Moreover, the results yielded a significant effect of surprisal on VDC/CD ratio (p=.017*), with no interaction between the two factors (voicing and surprisal). The VDC/CD ratio is larger in a low than in a high surprisal syllable, irrespective of the voicing status of the target stops. That is, the syllable-based surprisal effect percolated down to German voicing, and the effect is uniform for a voiceless and voiced stop, when residual voicing was measured. Such a uniform effect on residual voicing is consistent with the previous result on closure duration.

These findings reveal that the syllable-based surprisal effect can spread downstream to the segmental level and the effect is uniform for acoustic cues that are not directly tied to a phonological feature in German voicing (i.e. [spread glottis]).
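
The reported lmer model can be approximated in Python with statsmodels; the sketch below is only an approximation under stated assumptions (variance components stand in for lme4's crossed random intercepts, and the file and column names are invented).

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("stop_closures.csv")          # hypothetical data file
df["vdc_cd_ratio"] = df["vdc"] / df["cd"]      # voicing-during-closure / closure duration

# Random intercepts for Speaker (grouping factor) plus variance components for
# Syllable, PrevManner and Sentence, approximating the lmer formula in the abstract.
vc = {"Syllable": "0 + C(Syllable)",
      "PrevManner": "0 + C(PrevManner)",
      "Sentence": "0 + C(Sentence)"}
model = smf.mixedlm("vdc_cd_ratio ~ surprisal + voicing",
                    data=df, groups="Speaker", vc_formula=vc)
print(model.fit().summary())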

@inproceedings{inproceedings,
title = {The interplay between syllable-based predictability and voicing during closure in intersonorant German stops},
author = {Omnia Ibrahim and Ivan Yuen and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/371138687_The_interplay_between_syllable-based_predictability_and_voicing_during_closure_in_intersonorant_German_stops},
year = {2023},
date = {2023},
booktitle = {Conference: Phonetics and Phonology in Europe 2023 (PaPE 2023)},
address = {Nijmegen, the Netherlands},
abstract = {

Contextual predictability has pervasive effects on the acoustic realization of speech. Generally, duration is shortened in more predictable contexts and conversely lengthened in less predictable contexts. There are several measures to quantify predictability in a message. One of them is surprisal, which is calculated as S(Uniti) = -log2 P (Uniti|Context). In a recent work, Ibrahim et al. have found that the effect of syllable-based surprisal on the temporal dimension(s) of a syllable selectively extends to the segmental level, for example, consonant voicing in German. Closure duration was uniformly longer for both voiceless and voiced consonants, but voice onset time was not. The voice onset time pattern might be related to German being typically considered an 'aspirating' language, using [+spread glottis] for voiceless consonants and [-spread glottis] for their voiced counterparts. However, voicing has also been reported in an intervocalic context for both voiceless and voiced consonants to varying extents. To further test whether the previously reported surprisal-based effect on voice onset time is driven by the phonological feature [spread glottis], the current study re-examined the downstream effect of syllable-based predictability on segmental voicing in German stops by measuring the degree of residual (phonetic) voicing during stop closure in an inter-sonorant context. Method: Data were based on a subset of stimuli (speech produced in a quiet acoustic condition) from Ibrahim et al. 38 German speakers recorded 60 sentences. Each sentence contained a target stressed CV syllable in a polysyllabic word. Each target syllable began with one of the stops /p, k, b, d/, combined with one of the vowels /a:, e:, i:, o:, u:/. The analyzed data contained voiceless vs. voiced initial stops in a low or high surprisal syllable. Closure duration (CD) and voicing during closure (VDC) were extracted using in-house Python and Praat scripts. A ratio measure VDC/CD was used to factor out any potential covariation between VDC and CD. Linear mixed-effects modeling was used to evaluate the effect(s) of surprisal and target stop voicing status on VDC/CD ratio using the lmer package in R. The final model was: VDC/CD ratio ∼ Surprisal + Target stop voicing status + (1 | Speaker) + (1 | Syllable ) + (1 | PrevManner ) + (1 | Sentence). Results: In an inter-sonorant context, we found a smaller VDC/CD ratio in voiceless stops than in voiced ones (p=2.04e-08***). As expected, residual voicing is shorter during a voiceless closure than during a voiced closure. This is consistent with the idea of preserving a phonological voicing distinction, as well as the physiological constraint of sustaining voicing for a long period during the closure of a voiceless stop. Moreover, the results yielded a significant effect of surprisal on VDC/CD ratio (p=.017*), with no interaction between the two factors (voicing and surprisal). The VDC/CD ratio is larger in a low than in a high surprisal syllable, irrespective of the voicing status of the target stops. That is, the syllable-based surprisal effect percolated down to German voicing, and the effect is uniform for a voiceless and voiced stop, when residual voicing was measured. Such a uniform effect on residual voicing is consistent with the previous result on closure duration. 
These findings reveal that the syllable-based surprisal effect can spread downstream to the segmental level and the effect is uniform for acoustic cues that are not directly tied to a phonological feature in German voicing (i.e. [spread glottis]).
},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Jablotschkin, Sarah; Zinsmeister, Heike

LeiKo. Ein Vergleichskorpus für Leichte Sprache und Einfache Sprache Incollection

Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beiträge zur IDS-Methodenmesse 2022 (Mark Kupietz and Thomas Schmidt, eds.), Narr, Tübingen, 2023.

The term “Easy-to-read” refers to subsystems of natural languages that arise through a systematic reduction at the levels of lexis and syntax and that guarantee access to written information for adults with low reading skills. German has “Leichte Sprache” (easy language), which follows specific linguistic and typographic rules, and the less restricted “einfache Sprache” (plain language). Both varieties are receiving growing attention in academic as well as non-academic discourse, not least owing to the UN Convention on the Rights of Persons with Disabilities (UN-CRPD), which Germany ratified in 2009.

@incollection{jablotschkin_zinsmeister_2023,
title = {LeiKo. Ein Vergleichskorpus f{\"u}r Leichte Sprache und Einfache Sprache},
author = {Sarah Jablotschkin and Heike Zinsmeister},
url = {https://www.ids-mannheim.de/fileadmin/aktuell/Jahrestagungen/2022/Methodenmesse/5_Jablotschkin_Zinsmeister_LeiKo.pdf},
year = {2023},
date = {2023},
booktitle = {Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beitr{\"a}ge zur IDS-Methodenmesse 2022},
editor = {Mark Kupietz and Thomas Schmidt},
publisher = {Narr},
address = {T{\"u}bingen},
abstract = {Mit dem Konzept “Easy-to-read” werden Teilsysteme nat{\"u}rlicher Sprachen bezeichnet, welche durch eine systematische Reduktion auf den Ebenen Lexik und Syntax entstehen und den Zugang zu geschriebenen Informationen f{\"u}r Erwachsene mit geringen Lesekompetenzen gew{\"a}hrleisten. Im Deutschen gibt es “Leichte Sprache”, welche sich nach spezifischen linguistischen und typografischen Regeln richtet, und die weniger restringierte “einfache Sprache”. Beide Varianten erhalten im akademischen sowie nicht-akademischen Diskurs vermehrt Aufmerksamkeit – nicht zuletzt dank der im Jahr 2009 in Deutschland ratifizierten UN-Behindertenrechtskonvention (UN-BRK).},
pubstate = {published},
type = {incollection}
}

Project:   T1

Jablotschkin, Sarah; Benz, Nele; Zinsmeister, Heike

Evaluation of neural coreference annotation of simplified German Conference

Poster presentation at the Computational Linguistics Poster Session at the 45th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS), Cologne, 2023.

This poster presents our evaluation of a neural coreference resolver (Schröder et al. 2021) on simplified German texts as well as the results of an annotation study that we conducted in order to analyse error sources.

The underlying corpus can be found on Zenodo: https://doi.org/10.5281/zenodo.3626763

@conference{jablotschkin_sarah_2023_12252,
title = {Evaluation of neural coreference annotation of simplified German},
author = {Sarah Jablotschkin and Nele Benz and Heike Zinsmeister},
url = {https://doi.org/10.25592/uhhfdm.12252},
doi = {https://doi.org/10.25592/uhhfdm.12252},
year = {2023},
date = {2023},
booktitle = {Posterpr{\"a}sentation auf der Computational Linguistics Poster Session im Rahmen der 45. Jahrestagung der Deutschen Gesellschaft f{\"u}r Sprachwissenschaft (DGfS) in K{\"o}ln},
abstract = {This poster presents our evaluation of a neural coreference resolver (Schr{\"o}der et al. 2021) on simplified German texts as well as the results of an annotation study that we conducted in order to analyse error sources. The underlying corpus can be found on Zenodo: https://doi.org/10.5281/zenodo.3626763},
pubstate = {published},
type = {conference}
}

Project:   T1

Dyer, Andrew

Revisiting dependency length and intervener complexity minimisation on a parallel corpus in 35 languages Inproceedings

Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 110-119, Dubrovnik, Croatia, 2023.

In this replication study of previous research into dependency length minimisation (DLM), we pilot a new parallel multilingual parsed corpus to examine whether previous findings are upheld when controlling for variation in domain and sentence content between languages. We follow the approach of previous research in comparing the dependency lengths of observed sentences in a multilingual corpus to a variety of baselines: permutations of the sentences, either random or according to some fixed schema. We go on to compare DLM with intervener complexity measure (ICM), an alternative measure of syntactic complexity. Our findings uphold both dependency length and intervener complexity minimisation in all languages under investigation. We also find a markedly lesser extent of dependency length minimisation in verb-final languages, and the same for intervener complexity measure. We conclude that dependency length and intervener complexity minimisation as universals are upheld when controlling for domain and content variation, but that further research is needed into the asymmetry between verb-final and other languages in this regard.
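
The basic comparison can be sketched in a few lines: the observed dependency length of a sentence versus a random-linearization baseline that shuffles word order while keeping the tree fixed. The toy tree below is invented; the study works on a parsed parallel corpus and several baseline schemas.

import random
from statistics import mean

# Toy dependency tree as (token position, head position) pairs, 1-indexed, head 0 = root.
tree = [(1, 2), (2, 3), (3, 0), (4, 5), (5, 3)]

def dependency_length(order, arcs):
    # Sum of linear distances between each dependent and its head under a given word order.
    pos = {tok: i for i, tok in enumerate(order)}
    return sum(abs(pos[dep] - pos[head]) for dep, head in arcs if head != 0)

tokens = [tok for tok, _ in tree]
observed = dependency_length(tokens, tree)

# Random-linearization baseline: shuffle the word order, keep the tree structure fixed.
random.seed(0)
baseline = []
for _ in range(1000):
    shuffled = tokens[:]
    random.shuffle(shuffled)
    baseline.append(dependency_length(shuffled, tree))

print(observed, mean(baseline))   # DLM predicts the observed length falls well below the random baseline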

@inproceedings{dyer-2023-revisiting,
title = {Revisiting dependency length and intervener complexity minimisation on a parallel corpus in 35 languages},
author = {Andrew Dyer},
url = {https://aclanthology.org/2023.sigtyp-1.11/},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {110-119},
publisher = {Association for Computational Linguistics},
address = {Dubrovnik, Croatia},
abstract = {

In this replication study of previous research into dependency length minimisation (DLM), we pilot a new parallel multilingual parsed corpus to examine whether previous findings are upheld when controlling for variation in domain and sentence content between languages. We follow the approach of previous research in comparing the dependency lengths of observed sentences in a multilingual corpus to a variety of baselines: permutations of the sentences, either random or according to some fixed schema. We go on to compare DLM with intervener complexity measure (ICM), an alternative measure of syntactic complexity. Our findings uphold both dependency length and intervener complexity minimisation in all languages under investigation. We also find a markedly lesser extent of dependency length minimisation in verbfinal languages, and the same for intervener complexity measure. We conclude that dependency length and intervener complexity minimisation as universals are upheld when controlling for domain and content variation, but that further research is needed into the asymmetry between verb-final and other languages in this regard.

},
pubstate = {published},
type = {inproceedings}
}

Project:   C7

Jachmann, Torsten; Drenhaus, Heiner; Staudte, Maria; Crocker, Matthew W.

When a look is enough: Neurophysiological correlates of referential speaker gaze in situated comprehension Journal Article

Cognition, 236, 105449, 2023, ISSN 0010-0277.

Behavioral studies have shown that speaker gaze to objects in a co-present scene can influence listeners’ expectations about how the utterance will unfold. These findings have recently been supported by ERP studies that linked the underlying mechanisms of the integration of speaker gaze with an utterance meaning representation to multiple ERP components. This leads to the question, however, as to whether speaker gaze should be considered part of the communicative signal itself, such that the referential information conveyed by gaze can help listeners not only form expectations but also to confirm referential expectations induced by the prior linguistic context. In the current study, we investigated this question by conducting an ERP experiment (N=24, Age:[19,31]), in which referential expectations were established by linguistic context together with several depicted objects in the scene. Those expectations then could be confirmed by subsequent speaker gaze that preceded the referential expression. Participants were presented with a centrally positioned face performing gaze actions aligned to utterances comparing two out of three displayed objects, with the task to judge whether the sentence was true given the provided scene. We manipulated the gaze cue to be either Present (toward the subsequently named object) or Absent preceding contextually Expected or Unexpected referring nouns. The results provided strong evidence for gaze as being treated as an integral part of the communicative signal: While in the absence of gaze, effects of phonological verification (PMN), word meaning retrieval (N400) and sentence meaning integration/evaluation (P600) were found on the unexpected noun, in the presence of gaze effects of retrieval (N400) and integration/evaluation (P300) were solely found in response to the pre-referent gaze cue when it was directed toward the unexpected referent with attenuated effects on the following referring noun.

@article{Jachmannetal-23,
title = {When a look is enough: Neurophysiological correlates of referential speaker gaze in situated comprehension},
author = {Torsten Jachmann and Heiner Drenhaus and Maria Staudte and Matthew W. Crocker},
url = {https://www.sciencedirect.com/science/article/pii/S0010027723000835?via%3Dihub},
doi = {https://doi.org/10.1016/j.cognition.2023.105449},
year = {2023},
date = {2023},
journal = {Cognition},
pages = {105449},
volume = {236},
abstract = {Behavioral studies have shown that speaker gaze to objects in a co-present scene can influence listeners’ expectations about how the utterance will unfold. These findings have recently been supported by ERP studies that linked the underlying mechanisms of the integration of speaker gaze with an utterance meaning representation to multiple ERP components. This leads to the question, however, as to whether speaker gaze should be considered part of the communicative signal itself, such that the referential information conveyed by gaze can help listeners not only form expectations but also to confirm referential expectations induced by the prior linguistic context. In the current study, we investigated this question by conducting an ERP experiment (N=24, Age:[19,31]), in which referential expectations were established by linguistic context together with several depicted objects in the scene. Those expectations then could be confirmed by subsequent speaker gaze that preceded the referential expression. Participants were presented with a centrally positioned face performing gaze actions aligned to utterances comparing two out of three displayed objects, with the task to judge whether the sentence was true given the provided scene. We manipulated the gaze cue to be either Present (toward the subsequently named object) or Absent preceding contextually Expected or Unexpected referring nouns. The results provided strong evidence for gaze as being treated as an integral part of the communicative signal: While in the absence of gaze, effects of phonological verification (PMN), word meaning retrieval (N400) and sentence meaning integration/evaluation (P600) were found on the unexpected noun, in the presence of gaze effects of retrieval (N400) and integration/evaluation (P300) were solely found in response to the pre-referent gaze cue when it was directed toward the unexpected referent with attenuated effects on the following referring noun.},
pubstate = {published},
type = {article}
}

Project:   C3

Pyatkin, Valentina; Yung, Frances Pik Yu; Scholman, Merel; Tsarfaty, Reut; Dagan, Ido; Demberg, Vera

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases introduced by Task Design Journal Article

Transactions of the Association for Computational Linguistics (TACL), 2023.

Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relations senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

@article{Pyatkinetal.,
title = {Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases introduced by Task Design},
author = {Valentina Pyatkin and Frances Pik Yu Yung and Merel Scholman and Reut Tsarfaty and Ido Dagan and Vera Demberg},
url = {https://arxiv.org/abs/2304.00815},
year = {2023},
date = {2023},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
abstract = {Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relations senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.},
pubstate = {published},
type = {article}
}

Project:   B2
