Publications

Ibrahim, Omnia; Yuen, Ivan; Andreeva, Bistra; Möbius, Bernd

The interplay between syllable-based predictability and voicing during closure in intersonorant German stops Inproceedings

Phonetics and Phonology in Europe 2023 (PaPE 2023), Nijmegen, the Netherlands, 2023.
Contextual predictability has pervasive effects on the acoustic realization of speech. Generally, duration is shortened in more predictable contexts and conversely lengthened in less predictable contexts. There are several measures to quantify predictability in a message. One of them is surprisal, which is calculated as S(Unit_i) = -log₂ P(Unit_i | Context). In recent work, Ibrahim et al. found that the effect of syllable-based surprisal on the temporal dimension(s) of a syllable selectively extends to the segmental level, for example, consonant voicing in German. Closure duration was uniformly longer for both voiceless and voiced consonants, but voice onset time was not. The voice onset time pattern might be related to German being typically considered an 'aspirating' language, using [+spread glottis] for voiceless consonants and [-spread glottis] for their voiced counterparts. However, voicing has also been reported in an intervocalic context for both voiceless and voiced consonants to varying extents. To further test whether the previously reported surprisal-based effect on voice onset time is driven by the phonological feature [spread glottis], the current study re-examined the downstream effect of syllable-based predictability on segmental voicing in German stops by measuring the degree of residual (phonetic) voicing during stop closure in an inter-sonorant context. Method: Data were based on a subset of stimuli (speech produced in a quiet acoustic condition) from Ibrahim et al., in which 38 German speakers recorded 60 sentences. Each sentence contained a target stressed CV syllable in a polysyllabic word. Each target syllable began with one of the stops /p, k, b, d/, combined with one of the vowels /a:, e:, i:, o:, u:/. The analyzed data contained voiceless vs. voiced initial stops in a low or high surprisal syllable. Closure duration (CD) and voicing during closure (VDC) were extracted using in-house Python and Praat scripts. A ratio measure VDC/CD was used to factor out any potential covariation between VDC and CD. Linear mixed-effects modeling was used to evaluate the effect(s) of surprisal and target stop voicing status on the VDC/CD ratio using the lmer package in R. The final model was: VDC/CD ratio ∼ Surprisal + Target stop voicing status + (1 | Speaker) + (1 | Syllable) + (1 | PrevManner) + (1 | Sentence). Results: In an inter-sonorant context, we found a smaller VDC/CD ratio in voiceless stops than in voiced ones (p=2.04e-08***). As expected, residual voicing is shorter during a voiceless closure than during a voiced closure. This is consistent with the idea of preserving a phonological voicing distinction, as well as the physiological constraint on sustaining voicing for a long period during the closure of a voiceless stop. Moreover, the results yielded a significant effect of surprisal on the VDC/CD ratio (p=.017*), with no interaction between the two factors (voicing and surprisal). The VDC/CD ratio is larger in a low than in a high surprisal syllable, irrespective of the voicing status of the target stops. That is, the syllable-based surprisal effect percolated down to German voicing, and the effect is uniform for voiceless and voiced stops when residual voicing is measured. Such a uniform effect on residual voicing is consistent with the previous result on closure duration.
These findings reveal that the syllable-based surprisal effect can spread downstream to the segmental level, and that the effect is uniform for acoustic cues that are not directly tied to the phonological feature in German voicing (i.e., [spread glottis]).
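
For readers unfamiliar with the measure, the following is a minimal Python sketch of the surprisal calculation quoted above; the syllable inventory and counts are invented for illustration and are not taken from the study's materials.

import math
from collections import Counter

# Hypothetical (context, syllable) counts; the study instead derives
# syllable probabilities from a corpus-based language model.
pair_counts = Counter({
    ("die", "ta"): 40,
    ("die", "to"): 8,
    ("die", "ku"): 2,
})

def surprisal(syllable, context):
    """S(Unit_i) = -log2 P(Unit_i | Context)."""
    context_total = sum(n for (c, _), n in pair_counts.items() if c == context)
    p = pair_counts[(context, syllable)] / context_total
    return -math.log2(p)

print(round(surprisal("ta", "die"), 2))  # frequent syllable -> low surprisal
print(round(surprisal("ku", "die"), 2))  # rare syllable -> high surprisal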

@inproceedings{ibrahim_etal_2023_pape,
title = {The interplay between syllable-based predictability and voicing during closure in intersonorant German stops},
author = {Omnia Ibrahim and Ivan Yuen and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/371138687_The_interplay_between_syllable-based_predictability_and_voicing_during_closure_in_intersonorant_German_stops},
year = {2023},
date = {2023},
booktitle = {Phonetics and Phonology in Europe 2023 (PaPE 2023)},
address = {Nijmegen, the Netherlands},
abstract = {

Contextual predictability has pervasive effects on the acoustic realization of speech. Generally, duration is shortened in more predictable contexts and conversely lengthened in less predictable contexts. There are several measures to quantify predictability in a message. One of them is surprisal, which is calculated as S(Uniti) = -log2 P (Uniti|Context). In a recent work, Ibrahim et al. have found that the effect of syllable-based surprisal on the temporal dimension(s) of a syllable selectively extends to the segmental level, for example, consonant voicing in German. Closure duration was uniformly longer for both voiceless and voiced consonants, but voice onset time was not. The voice onset time pattern might be related to German being typically considered an 'aspirating' language, using [+spread glottis] for voiceless consonants and [-spread glottis] for their voiced counterparts. However, voicing has also been reported in an intervocalic context for both voiceless and voiced consonants to varying extents. To further test whether the previously reported surprisal-based effect on voice onset time is driven by the phonological feature [spread glottis], the current study re-examined the downstream effect of syllable-based predictability on segmental voicing in German stops by measuring the degree of residual (phonetic) voicing during stop closure in an inter-sonorant context. Method: Data were based on a subset of stimuli (speech produced in a quiet acoustic condition) from Ibrahim et al. 38 German speakers recorded 60 sentences. Each sentence contained a target stressed CV syllable in a polysyllabic word. Each target syllable began with one of the stops /p, k, b, d/, combined with one of the vowels /a:, e:, i:, o:, u:/. The analyzed data contained voiceless vs. voiced initial stops in a low or high surprisal syllable. Closure duration (CD) and voicing during closure (VDC) were extracted using in-house Python and Praat scripts. A ratio measure VDC/CD was used to factor out any potential covariation between VDC and CD. Linear mixed-effects modeling was used to evaluate the effect(s) of surprisal and target stop voicing status on VDC/CD ratio using the lmer package in R. The final model was: VDC/CD ratio ∼ Surprisal + Target stop voicing status + (1 | Speaker) + (1 | Syllable ) + (1 | PrevManner ) + (1 | Sentence). Results: In an inter-sonorant context, we found a smaller VDC/CD ratio in voiceless stops than in voiced ones (p=2.04e-08***). As expected, residual voicing is shorter during a voiceless closure than during a voiced closure. This is consistent with the idea of preserving a phonological voicing distinction, as well as the physiological constraint of sustaining voicing for a long period during the closure of a voiceless stop. Moreover, the results yielded a significant effect of surprisal on VDC/CD ratio (p=.017*), with no interaction between the two factors (voicing and surprisal). The VDC/CD ratio is larger in a low than in a high surprisal syllable, irrespective of the voicing status of the target stops. That is, the syllable-based surprisal effect percolated down to German voicing, and the effect is uniform for a voiceless and voiced stop, when residual voicing was measured. Such a uniform effect on residual voicing is consistent with the previous result on closure duration. 
These findings reveal that the syllable-based surprisal effect can spread downstream to the segmental level and the effect is uniform for acoustic cues that are not directly tied to a phonological feature in German voicing (i.e. [spread glottis]).
},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Jablotschkin, Sarah; Zinsmeister, Heike

LeiKo. Ein Vergleichskorpus für Leichte Sprache und Einfache Sprache Incollection

Kupietz, Mark; Schmidt, Thomas (Ed.): Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beiträge zur IDS-Methodenmesse 2022, Narr, Tübingen, 2023.

Mit dem Konzept “Easy-to-read” werden Teilsysteme natürlicher Sprachen bezeichnet, welche durch eine systematische Reduktion auf den Ebenen Lexik und Syntax entstehen und den Zugang zu geschriebenen Informationen für Erwachsene mit geringen Lesekompetenzen gewährleisten. Im Deutschen gibt es “Leichte Sprache”, welche sich nach spezifischen linguistischen und typografischen Regeln richtet, und die weniger restringierte “einfache Sprache”. Beide Varianten erhalten im akademischen sowie nicht-akademischen Diskurs vermehrt Aufmerksamkeit – nicht zuletzt dank der im Jahr 2009 in Deutschland ratifizierten UN-Behindertenrechtskonvention (UN-BRK).

@incollection{jablotschkin_zinsmeister_2023,
title = {LeiKo. Ein Vergleichskorpus f{\"u}r Leichte Sprache und Einfache Sprache},
author = {Sarah Jablotschkin and Heike Zinsmeister},
url = {https://www.ids-mannheim.de/fileadmin/aktuell/Jahrestagungen/2022/Methodenmesse/5_Jablotschkin_Zinsmeister_LeiKo.pdf},
year = {2023},
date = {2023},
booktitle = {Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beitr{\"a}ge zur IDS-Methodenmesse 2022},
editor = {Mark Kupietz and Thomas Schmidt},
publisher = {Narr},
address = {T{\"u}bingen},
abstract = {Mit dem Konzept “Easy-to-read” werden Teilsysteme nat{\"u}rlicher Sprachen bezeichnet, welche durch eine systematische Reduktion auf den Ebenen Lexik und Syntax entstehen und den Zugang zu geschriebenen Informationen f{\"u}r Erwachsene mit geringen Lesekompetenzen gew{\"a}hrleisten. Im Deutschen gibt es “Leichte Sprache”, welche sich nach spezifischen linguistischen und typografischen Regeln richtet, und die weniger restringierte “einfache Sprache”. Beide Varianten erhalten im akademischen sowie nicht-akademischen Diskurs vermehrt Aufmerksamkeit – nicht zuletzt dank der im Jahr 2009 in Deutschland ratifizierten UN-Behindertenrechtskonvention (UN-BRK).},
pubstate = {published},
type = {incollection}
}

Project:   T1

Jablotschkin, Sarah; Benz, Nele; Zinsmeister, Heike

Evaluation of neural coreference annotation of simplified German Conference

Posterpräsentation auf der Computational Linguistics Poster Session im Rahmen der 45. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS) in Köln, 2023.

This poster presents our evaluation of a neural coreference resolver (Schröder et al. 2021) on simplified German texts as well as the results of an annotation study that we conducted in order to analyse error sources.

The underlying corpus can be found on Zenodo: https://doi.org/10.5281/zenodo.3626763

@conference{jablotschkin_sarah_2023_12252,
title = {Evaluation of neural coreference annotation of simplified German},
author = {Sarah Jablotschkin and Nele Benz and Heike Zinsmeister},
url = {https://doi.org/10.25592/uhhfdm.12252},
doi = {https://doi.org/10.25592/uhhfdm.12252},
year = {2023},
date = {2023},
booktitle = {Posterpr{\"a}sentation auf der Computational Linguistics Poster Session im Rahmen der 45. Jahrestagung der Deutschen Gesellschaft f{\"u}r Sprachwissenschaft (DGfS) in K{\"o}ln},
abstract = {This poster presents our evaluation of a neural coreference resolver (Schr{\"o}der et al. 2021) on simplified German texts as well as the results of an annotation study that we conducted in order to analyse error sources. The underlying corpus can be found on Zenodo: https://doi.org/10.5281/zenodo.3626763},
pubstate = {published},
type = {conference}
}

Project:   T1

Dyer, Andrew

Revisiting dependency length and intervener complexity minimisation on a parallel corpus in 35 languages Inproceedings

Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 110-119, Dubrovnik, Croatia, 2023.

In this replication study of previous research into dependency length minimisation (DLM), we pilot a new parallel multilingual parsed corpus to examine whether previous findings are upheld when controlling for variation in domain and sentence content between languages. We follow the approach of previous research in comparing the dependency lengths of observed sentences in a multilingual corpus to a variety of baselines: permutations of the sentences, either random or according to some fixed schema. We go on to compare DLM with the intervener complexity measure (ICM), an alternative measure of syntactic complexity. Our findings uphold both dependency length and intervener complexity minimisation in all languages under investigation. We also find a markedly lesser extent of dependency length minimisation in verb-final languages, and the same for the intervener complexity measure. We conclude that dependency length and intervener complexity minimisation as universals are upheld when controlling for domain and content variation, but that further research is needed into the asymmetry between verb-final and other languages in this regard.
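
To make the baseline comparison concrete, here is a toy Python sketch of comparing a sentence's total dependency length against a random-permutation baseline; the dependency tree and permutation scheme are invented and do not reproduce the paper's corpus or its exact baseline schemas.

import random

# Toy dependency tree: dependent token id -> head token id (0 = root).
# Invented sentence with five tokens, indexed 1..5.
heads = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}

def total_dependency_length(order, heads):
    """Sum of linear distances between each dependent and its head."""
    position = {tok: i for i, tok in enumerate(order)}
    return sum(abs(position[dep] - position[head])
               for dep, head in heads.items() if head != 0)

observed = [1, 2, 3, 4, 5]
observed_length = total_dependency_length(observed, heads)

# Random baseline: mean dependency length over shuffled word orders.
random.seed(0)
samples = []
for _ in range(1000):
    order = observed[:]
    random.shuffle(order)
    samples.append(total_dependency_length(order, heads))

print(observed_length, sum(samples) / len(samples))  # observed vs. random baseline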

@inproceedings{dyer-2023-revisiting,
title = {Revisiting dependency length and intervener complexity minimisation on a parallel corpus in 35 languages},
author = {Andrew Dyer},
url = {https://aclanthology.org/2023.sigtyp-1.11/},
year = {2023},
date = {2023},
booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {110-119},
publisher = {Association for Computational Linguistics},
address = {Dubrovnik, Croatia},
abstract = {

In this replication study of previous research into dependency length minimisation (DLM), we pilot a new parallel multilingual parsed corpus to examine whether previous findings are upheld when controlling for variation in domain and sentence content between languages. We follow the approach of previous research in comparing the dependency lengths of observed sentences in a multilingual corpus to a variety of baselines: permutations of the sentences, either random or according to some fixed schema. We go on to compare DLM with intervener complexity measure (ICM), an alternative measure of syntactic complexity. Our findings uphold both dependency length and intervener complexity minimisation in all languages under investigation. We also find a markedly lesser extent of dependency length minimisation in verbfinal languages, and the same for intervener complexity measure. We conclude that dependency length and intervener complexity minimisation as universals are upheld when controlling for domain and content variation, but that further research is needed into the asymmetry between verb-final and other languages in this regard.

},
pubstate = {published},
type = {inproceedings}
}

Project:   C7

Jachmann, Torsten; Drenhaus, Heiner; Staudte, Maria; Crocker, Matthew W.

When a look is enough: Neurophysiological correlates of referential speaker gaze in situated comprehension Journal Article

Cognition, 236, pp. 105449, 2023, ISSN 0010-0277.

Behavioral studies have shown that speaker gaze to objects in a co-present scene can influence listeners’ expectations about how the utterance will unfold. These findings have recently been supported by ERP studies that linked the underlying mechanisms of the integration of speaker gaze with an utterance meaning representation to multiple ERP components. This leads to the question, however, as to whether speaker gaze should be considered part of the communicative signal itself, such that the referential information conveyed by gaze can help listeners not only to form expectations but also to confirm referential expectations induced by the prior linguistic context. In the current study, we investigated this question by conducting an ERP experiment (N=24, Age:[19,31]), in which referential expectations were established by linguistic context together with several depicted objects in the scene. Those expectations could then be confirmed by subsequent speaker gaze that preceded the referential expression. Participants were presented with a centrally positioned face performing gaze actions aligned to utterances comparing two out of three displayed objects, with the task of judging whether the sentence was true given the provided scene. We manipulated the gaze cue to be either Present (toward the subsequently named object) or Absent preceding contextually Expected or Unexpected referring nouns. The results provided strong evidence for gaze as being treated as an integral part of the communicative signal: While in the absence of gaze, effects of phonological verification (PMN), word meaning retrieval (N400) and sentence meaning integration/evaluation (P600) were found on the unexpected noun, in the presence of gaze, effects of retrieval (N400) and integration/evaluation (P300) were found solely in response to the pre-referent gaze cue when it was directed toward the unexpected referent, with attenuated effects on the following referring noun.

@article{Jachmannetal-23,
title = {When a look is enough: Neurophysiological correlates of referential speaker gaze in situated comprehension},
author = {Torsten Jachmann and Heiner Drenhaus and Maria Staudte and Matthew W. Crocker},
url = {https://www.sciencedirect.com/science/article/pii/S0010027723000835?via%3Dihub},
doi = {https://doi.org/10.1016/j.cognition.2023.105449},
year = {2023},
date = {2023},
journal = {Cognition},
pages = {105449},
volume = {236},
abstract = {Behavioral studies have shown that speaker gaze to objects in a co-present scene can influence listeners’ expectations about how the utterance will unfold. These findings have recently been supported by ERP studies that linked the underlying mechanisms of the integration of speaker gaze with an utterance meaning representation to multiple ERP components. This leads to the question, however, as to whether speaker gaze should be considered part of the communicative signal itself, such that the referential information conveyed by gaze can help listeners not only form expectations but also to confirm referential expectations induced by the prior linguistic context. In the current study, we investigated this question by conducting an ERP experiment (N=24, Age:[19,31]), in which referential expectations were established by linguistic context together with several depicted objects in the scene. Those expectations then could be confirmed by subsequent speaker gaze that preceded the referential expression. Participants were presented with a centrally positioned face performing gaze actions aligned to utterances comparing two out of three displayed objects, with the task to judge whether the sentence was true given the provided scene. We manipulated the gaze cue to be either Present (toward the subsequently named object) or Absent preceding contextually Expected or Unexpected referring nouns. The results provided strong evidence for gaze as being treated as an integral part of the communicative signal: While in the absence of gaze, effects of phonological verification (PMN), word meaning retrieval (N400) and sentence meaning integration/evaluation (P600) were found on the unexpected noun, in the presence of gaze effects of retrieval (N400) and integration/evaluation (P300) were solely found in response to the pre-referent gaze cue when it was directed toward the unexpected referent with attenuated effects on the following referring noun.},
pubstate = {published},
type = {article}
}

Project:   C3

Pyatkin, Valentina; Yung, Frances Pik Yu; Scholman, Merel; Tsarfaty, Reut; Dagan, Ido; Demberg, Vera

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases introduced by Task Design Journal Article

Transactions of the Association for Computational Linguistics (TACL), 2023.

Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

@article{Pyatkinetal.,
title = {Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases introduced by Task Design},
author = {Valentina Pyatkin and Frances Pik Yu Yung and Merel Scholman and Reut Tsarfaty and Ido Dagan and Vera Demberg},
url = {https://arxiv.org/abs/2304.00815},
year = {2023},
date = {2023},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
abstract = {Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relations senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.},
pubstate = {published},
type = {article}
}

Project:   B2

Aurnhammer, Christoph; Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The P600 as a Continuous Index of Integration Effort Journal Article

Psychophysiology, 2023, ISSN 1469-8986.

The integration of word meaning into an unfolding utterance representation is a core operation of incremental language comprehension. There is considerable debate, however, as to which component of the ERP signal—the N400 or the P600—directly reflects integrative processes, with far reaching consequences for the temporal organization and architecture of the comprehension system. Multi-stream models maintaining the N400 as integration crucially rely on the presence of a semantically attractive plausible alternative interpretation to account for the absence of an N400 effect in response to certain semantic anomalies, as reported in previous studies. The single-stream Retrieval–Integration account posits the P600 as an index of integration, further predicting that its amplitude varies continuously with integrative effort. Here, we directly test these competing hypotheses using a context manipulation design in which a semantically attractive alternative is either available or not, and target word plausibility is varied across three levels. An initial self-paced reading study revealed graded reading times for plausibility, suggesting differential integration effort. A subsequent ERP study showed no N400 differences across conditions, and that P600 amplitude is graded for plausibility. These findings are inconsistent with the interpretation of the N400 as an index of integration, as no N400 effect emerged even in the absence of a semantically attractive alternative. By contrast, the link between plausibility, reading times, and P600 amplitude supports the view that the P600 is a continuous index of integration effort. More generally, our results support a single-stream architecture and eschew the need for multi-stream accounts.

@article{aurnhammer2023continuous,
title = {The P600 as a Continuous Index of Integration Effort},
author = {Christoph Aurnhammer and Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://onlinelibrary.wiley.com/doi/10.1111/psyp.14302},
doi = {https://doi.org/10.1111/psyp.14302},
year = {2023},
date = {2023},
journal = {Psychophysiology},
abstract = {The integration of word meaning into an unfolding utterance representation is a core operation of incremental language comprehension. There is considerable debate, however, as to which component of the ERP signal—the N400 or the P600—directly reflects integrative processes, with far reaching consequences for the temporal organization and architecture of the comprehension system. Multi-stream models maintaining the N400 as integration crucially rely on the presence of a semantically attractive plausible alternative interpretation to account for the absence of an N400 effect in response to certain semantic anomalies, as reported in previous studies. The single-stream Retrieval–Integration account posits the P600 as an index of integration, further predicting that its amplitude varies continuously with integrative effort. Here, we directly test these competing hypotheses using a context manipulation design in which a semantically attractive alternative is either available or not, and target word plausibility is varied across three levels. An initial self-paced reading study revealed graded reading times for plausibility, suggesting differential integration effort. A subsequent ERP study showed no N400 differences across conditions, and that P600 amplitude is graded for plausibility. These findings are inconsistent with the interpretation of the N400 as an index of integration, as no N400 effect emerged even in the absence of a semantically attractive alternative. By contrast, the link between plausibility, reading times, and P600 amplitude supports the view that the P600 is a continuous index of integration effort. More generally, our results support a single-stream architecture and eschew the need for multi-stream accounts.},
pubstate = {published},
type = {article}
}

Project:   A1

Demberg, Vera; Kravtchenko, Ekaterina; Loy, Jia

A systematic evaluation of factors affecting referring expression choice in passage completion tasks Journal Article

Journal of Memory and Language, 130, 104413, 2023.

There is a long-standing controversy around the question of whether referent predictability affects pronominalization: while there are good theoretical reasons for this prediction (e.g., Arnold, 2008), the experimental evidence has been rather mixed. We here report on three highly powered studies that manipulate a range of factors that have differed between previous studies, in order to determine more exactly under which conditions a predictability effect on pronominalization can be found. We use a constrained as well as a free reference task, and manipulate verb type, antecedent ambiguity, length of NP, and whether the stimuli are presented within a story context or not. Our results find the story context to be the single important factor that allows an effect of predictability on pronoun choice to be elicited, in line with Rosa and Arnold (2017) and Weatherford and Arnold (2021). We also propose a parametrization of a rational speech act model that reconciles the findings of many of the experiments in the literature.
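
The paper's own parametrization is not reproduced here; as general background, this is a minimal rational speech act (RSA) computation in Python with an invented lexicon, referent prior, utterance costs, and rationality parameter.

import numpy as np

# Rows: utterances ("he", "the boy"); columns: referents (R1, R2).
# 1 means the utterance can truthfully refer to the referent (invented lexicon).
lexicon = np.array([[1.0, 1.0],   # "he" is compatible with both referents
                    [1.0, 0.0]])  # "the boy" picks out R1 only
prior = np.array([0.5, 0.5])      # prior over referents
cost = np.array([0.0, 0.5])       # the longer expression is costlier
alpha = 4.0                       # speaker rationality (assumed value)

# Literal listener: P_L0(r | u) proportional to lexicon[u, r] * prior[r]
L0 = lexicon * prior
L0 = L0 / L0.sum(axis=1, keepdims=True)

# Pragmatic speaker: P_S1(u | r) proportional to exp(alpha * (log P_L0(r | u) - cost[u]))
with np.errstate(divide="ignore"):
    utility = alpha * (np.log(L0) - cost[:, None])
S1 = np.exp(utility)
S1 = S1 / S1.sum(axis=0, keepdims=True)

print(S1)  # per referent (column): probability of choosing each referring expression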

@article{Demberg.etal23,
title = {A systematic evaluation of factors affecting referring expression choice in passage completion tasks},
author = {Vera Demberg and Ekaterina Kravtchenko and Jia Loy},
url = {https://europepmc.org/article/MED/37265576},
year = {2023},
date = {2023},
journal = {Journal of Memory and Language},
volume = {130},
pages = {104413},
abstract = {There is a long-standing controversy around the question of whether referent predictability affects pronominalization: while there are good theoretical reasons for this prediction (e.g., Arnold, 2008), the experimental evidence has been rather mixed. We here report on three highly powered studies that manipulate a range of factors that have differed between previous studies, in order to determine more exactly under which conditions a predictability effect on pronominalization can be found. We use a constrained as well as a free reference task, and manipulate verb type, antecedent ambiguity, length of NP and whether the stimuli are presented within a story context or not. Our results find the story context to be the single important factor that allows to elicit an effect of predictability on pronoun choice, in line with (Rosa and Arnold, 2017; Weatherford and Arnold, 2021). We also propose a parametrization for a rational speech act model, that reconciles the findings between many of the experiments in the literature.},
pubstate = {published},
type = {article}
}

Project:   A3

Ortmann, Katrin

Computational Methods for Investigating Syntactic Change: Automatic Identification of Extraposition in Modern and Historical German PhD Thesis

Bochumer Linguistische Arbeitsberichte (BLA) 25, 2023.

The linguistic analysis of historical German and diachronic syntactic change is traditionally based on small, manually annotated data sets. As a consequence, such studies lack the generalizability and statistical significance that quantitative approaches can offer. In this thesis, computational methods for the automatic syntactic analysis of modern and historical German are developed, which help to overcome the natural limits of manual annotation and enable the creation of large annotated data sets. The main goal of the thesis is to identify extraposition in modern and historical German, with extraposition being defined as the movement of constituents from their base position to the post-field of the sentence (Höhle 2019; Wöllstein 2018). For the automatic recognition of extraposition, two annotation steps are combined: (i) a topological field analysis for the identification of post-fields and (ii) a constituency analysis to recognize candidates for extraposition. The thesis describes experiments on topological field parsing (Ortmann 2020), chunking (Ortmann 2021a), and constituency parsing (Ortmann 2021b). The best results are achieved with statistical models trained on Part-of-Speech tags as input. Contrary to previous studies, all annotation steps are thoroughly evaluated with the newly developed FairEval method for the fine-grained error analysis and fair evaluation of labeled spans (Ortmann 2022). In an example analysis, the created methods are applied to large collections of modern and historical text to explore different factors for the extraposition of relative clauses, demonstrating the practical value of computational approaches for linguistic studies. The developed methods are released as the CLASSIG pipeline (Computational Linguistic Analysis of Syntactic Structures In German) at https://github.com/rubcompling/classig-pipeline. Data sets, models, and evaluation results are provided for download at https://github.com/rubcompling/classig-data and https://doi.org/10.5281/zenodo.7180973.
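
As a schematic illustration of the two-step idea (identify the post-field, then check which constituents fall inside it), here is a toy Python sketch; the spans and labels are invented and do not reflect the CLASSIG pipeline's actual data structures.

# Toy token spans for the invented sentence
# "Ich habe den Mann gesehen , der lachte" (token indices 0-7).
post_field = (6, 7)  # tokens after the right sentence bracket "gesehen"

constituents = [
    {"label": "NP", "span": (2, 3)},         # "den Mann" (middle field)
    {"label": "RelClause", "span": (6, 7)},  # "der lachte"
]

def inside(span, field):
    """A constituent lies in a field if its span is contained in the field span."""
    return field[0] <= span[0] and span[1] <= field[1]

extraposition_candidates = [c for c in constituents if inside(c["span"], post_field)]
print(extraposition_candidates)  # -> the relative clause in the post-field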

@phdthesis{ortmann23,
title = {Computational Methods for Investigating Syntactic Change: Automatic Identification of Extraposition in Modern and Historical German},
author = {Katrin Ortmann},
url = {https://www.linguistics.rub.de/forschung/arbeitsberichte/25.pdf},
year = {2023},
date = {2023},
publisher = {Bochumer Linguistische Arbeitsberichte (BLA) 25},
abstract = {The linguistic analysis of historical German and diachronic syntactic change is traditionally based on small, manually annotated data sets. As a consequence, such studies lack the generalizability and statistical significance that quantitative approaches can offer. In this thesis, computational methods for the automatic syntactic analysis of modern and historical German are developed, which help to overcome the natural limits of manual annotation and enable the creation of large annotated data sets. The main goal of the thesis is to identify extraposition in modern and historical German, with extraposition being defined as the movement of constituents from their base position to the post-field of the sentence (H{\"o}hle 2019; W{\"o}llstein 2018). For the automatic recognition of extraposition, two annotation steps are combined: (i) a topological field analysis for the identification of post-fields and (ii) a constituency analysis to recognize candidates for extraposition. The thesis describes experiments on topological field parsing (Ortmann 2020), chunking (Ortmann 2021a), and constituency parsing (Ortmann 2021b). The best results are achieved with statistical models trained on Part-of-Speech tags as input. Contrary to previous studies, all annotation steps are thoroughly evaluated with the newly developed FairEval method for the fine-grained error analysis and fair evaluation of labeled spans (Ortmann 2022). In an example analysis, the created methods are applied to large collections of modern and historical text to explore different factors for the extraposition of relative clauses, demonstrating the practical value of computational approaches for linguistic studies. The developed methods are released as the CLASSIG pipeline (Computational Linguistic Analysis of Syntactic Structures In German) at https://github.com/rubcompling/classig- pipeline. Data sets, models, and evaluation results are provided for download at https://github.com/rubcompling/classig-data and https://doi.org/10.5281/zenodo.7180973.},
pubstate = {published},
type = {phdthesis}
}

Project:   C6

Hug, Marius; Rau, Felix; Debbeler, Anke; Saleh, Sara; Mollenhauer, Elisabeth; Leinen, Peter; Genêt, Philippe; Trippel, Thorsten; Zinn, Claus; Dogaru, George; Witt, Andreas; Werthmann, Antonina; Draxler, Christoph; Schiel, Florian; Knappen, Jörg; Fischer, Stefan; Krielke, Marie-Pauline; Teich, Elke; Barth, Florian; Calvo Tello, José; Funk, Stefan E.; Göbel, Mathias; Kurzawe, Daniel; Veentjer, Ubbo; Weimer, Lukas; Blätte, Andreas; Lehmberg, Timm

Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren Miscellaneous

Text+, Zenodo, pp. 1-12, Potsdam, 2023.

Präsentiert beim Workshop „Wohin damit? Storing and reusing my language data“ am 22. Juni 2023 in Mannheim. Die Präsentation wurde im Kontext der Arbeit des Vereins Nationale Forschungsdateninfrastruktur (NFDI) e.V. gehalten.

@miscellaneous{HugRauDebbeleretal.2023,
title = {Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren},
author = {Marius Hug and Felix Rau and Anke Debbeler and Sara Saleh and Elisabeth Mollenhauer and Peter Leinen and Philippe Genêt and Thorsten Trippel and Claus Zinn and George Dogaru and Andreas Witt and Antonina Werthmann and Christoph Draxler and Florian Schiel and J{\"o}rg Knappen and Stefan Fischer and Marie-Pauline Krielke and Elke Teich and Florian Barth and Jos{\'e} Calvo Tello and Stefan E. Funk and Mathias G{\"o}bel and Daniel Kurzawe and Ubbo Veentjer and Lukas Weimer and Andreas Bl{\"a}tte and Timm Lehmberg},
url = {https://nbn-resolving.org/urn:nbn:de:bsz:mh39-121108},
doi = {https://doi.org/10.5281/zenodo.8123896},
year = {2023},
date = {2023},
booktitle = {Text+},
pages = {1-12},
publisher = {Zenodo},
address = {Potsdam},
abstract = {Pr{\"a}sentiert beim Workshop "Wohin damit? Storing and reusing my language data" am 22. Juni 2023 in Mannheim. Die Pr{\"a}sentation wurde im Kontext der Arbeit des Vereins Nationale Forschungsdateninfrastruktur (NFDI) e.V. gehalten.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Fischer, Stefan; Fankhauser, Peter; Teich, Elke

Multi-word expressions and language efficiency: an information-theoretic account Miscellaneous

DGfS Computerlinguistik Postersession, Köln, 2023.

Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs is typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what is the contribution of MWEs in the development of “scientific language” (here: English)? We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time periods (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: The more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.
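
As background on the word association measures mentioned above, here is a minimal Python sketch of pointwise mutual information (PMI) over an invented toy corpus; the study additionally uses other measures (e.g., log-likelihood) on a much larger diachronic corpus.

import math
from collections import Counter

# Invented toy "corpus" of tokens.
tokens = "in terms of the results in terms of the data of the results the terms".split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(w1, w2):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) )."""
    p_joint = bigrams[(w1, w2)] / n_bi
    p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
    return math.log2(p_joint / (p1 * p2))

print(round(pmi("in", "terms"), 2))  # candidate MWE: strongly associated pair
print(round(pmi("of", "the"), 2))    # frequent but less specific pair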

@miscellaneous{Fischer_etal_2023,
title = {Multi-word expressions and language efficiency: an information-theoretic account},
author = {Stefan Fischer and Peter Fankhauser and Elke Teich},
url = {https://dgfs2023.uni-koeln.de/sites/dgfs2023/Booklet/AG_Beschreibungen-und-Abstracts/Description-Abstracts-CL.pdf},
year = {2023},
date = {2023},
booktitle = {DGfS Computerlinguistik Postersession},
address = {K{\"o}ln},
abstract = {Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs is typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what is the contribution of MWEs in the development of “scientific language” (here: English)? We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time periods (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: The more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Chingacham, Anupama; Demberg, Vera; Klakow, Dietrich

A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification Inproceedings

2022 IEEE Spoken Language Technology Workshop (SLT 2022, 9th - 12th January 2023, Doha, Qatar), 2023.

In noisy environments, speech can be hard to understand for humans. Spoken dialog systems can help to enhance the intelligibility of their output, either by modifying the speech synthesis (e.g., imitate Lombard speech) or by optimizing the language generation. We here focus on the second type of approach, by which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at SNR -5 dB.
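
As a generic illustration of the listening condition, here is a Python sketch of mixing a noise signal into speech at a target signal-to-noise ratio such as -5 dB; the signals below are synthetic placeholders, not the study's babble-noise stimuli or stimulus-creation code.

import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) = snr_db, then add it."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_noise_power / p_noise)

# Placeholder signals: a tone standing in for speech, Gaussian noise for babble.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixed = mix_at_snr(speech, noise, snr_db=-5.0)  # noise power is ~3.2x speech power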

@inproceedings{Chingachametal23,
title = {A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification},
author = {Anupama Chingacham and Vera Demberg and Dietrich Klakow},
url = {https://arxiv.org/abs/2210.10252},
doi = {https://doi.org/10.48550/arXiv.2210.10252},
year = {2023},
date = {2023},
booktitle = {2022 IEEE Spoken Language Technology Workshop (SLT 2022, 9th - 12th January 2023, Doha, Qatar)},
abstract = {In noisy environments, speech can be hard to understand for humans. Spoken dialog systems can help to enhance the intelligibility of their output, either by modifying the speech synthesis (e.g., imitate Lombard speech) or by optimizing the language generation. We here focus on the second type of approach, by which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at SNR -5 dB.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Przybyl, Heike; Karakanta, Alina; Menzel, Katrin; Teich, Elke

Exploring linguistic variation in mediated discourse: translation vs. interpreting Book Chapter

Kajzer-Wietrzny, Marta; Bernardini, Silvia; Ferraresi, Adriano; Ivaska, Ilmari (Ed.): Mediated discourse at the European Parliament: Empirical investigations, Language Science Press, pp. 191–218, Berlin, 2022.

This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple word-based n-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.
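
As background on the corpus-comparison measure, here is a minimal Python sketch of relative entropy (Kullback-Leibler divergence) between smoothed unigram models of two invented mini-"corpora"; the study itself works with word-based n-gram models over much larger translation and interpreting corpora.

import math
from collections import Counter

translated = "the committee adopted the report on the proposal".split()
interpreted = "so the committee adopted the report yes the proposal".split()

def unigram_model(tokens, vocab, alpha=0.5):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

vocab = set(translated) | set(interpreted)
p = unigram_model(translated, vocab)
q = unigram_model(interpreted, vocab)

# Relative entropy D(p || q) = sum over w of p(w) * log2(p(w) / q(w))
kl = sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)
print(round(kl, 3))  # larger values indicate more dissimilar corpora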

@inbook{Przybyl2021exploring,
title = {Exploring linguistic variation in mediated discourse: translation vs. interpreting},
author = {Heike Przybyl and Alina Karakanta and Katrin Menzel and Elke Teich},
editor = {Marta Kajzer-Wietrzny and Silvia Bernardini and Adriano Ferraresi and Ilmari Ivaska},
url = {https://langsci-press.org/catalog/book/343},
doi = {https://doi.org/10.5281/zenodo.6977050},
year = {2022},
date = {2022},
booktitle = {Mediated discourse at the European Parliament: Empirical investigations},
pages = {191–218},
publisher = {Language Science Press},
address = {Berlin},
abstract = {This paper focuses on the distinctive features of translated and interpreted texts in specific language combinations as forms of mediated discourse at the European Parliament. We aim to contribute to the long line of research on the specific properties of translation/interpreting. Specifically, we are interested in mediation effects (translation vs. interpreting) vs. effects of discourse mode (written vs. spoken). We propose a data-driven, exploratory approach to detecting and evaluating linguistic features as typical of translation/interpreting. Our approach utilizes simple wordbased 𝑛-gram language models combined with the information-theoretic measure of relative entropy, a standard measure of similarity/difference between probability distributions, applied here as a method of corpus comparison. Comparing translation and interpreting (including the relation to their originals), we confirm the previously observed overall trend of written vs. spoken mode being strongly reflected in the translation and interpreting output. In addition, we detect some new features, such as a tendency towards more general lexemes in the verbal domain in interpreting or features of nominal style in translation.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Bhandari, Pratik

Interaction of top-down and bottom-up processes in spoken language comprehension PhD Thesis

Saarland University, Saarbrücken, Germany, 2022.

It seems pretty easy to listen to and understand someone speaking. However, our day-to-day conversations occur under adverse listening conditions. For example, background noise comes from different sound sources, multiple people talk simultaneously (e.g., in a café), a poor signal connection distorts the voice of a person talking on the other end of a telephone call, and the list goes on. Despite these adversities, most of the time, we communicate successfully. One of the significant contributors to our ability to understand language in adverse listening conditions is predictive language processing. Humans are not passive consumers of language: we use the information available to us from a context and predict the not-yet-encountered, upcoming linguistic events. We do not wait for a speech signal to unfold completely to decode its meaning. This feature of human language processing is critical in understanding speech in adverse listening conditions. The studies in this thesis are timely in the field when the discussion about the role of prediction in language processing is vibrant and to some extent—heated. Some argue that prediction is a universal phenomenon, not only of language, but of human cognition, in general. The present thesis examined the boundary conditions of predictive language processing. We investigated if linguistic predictions are automatic, or if they are constrained by other factors like top-down attention regulation and bottom-up processing of different speech rates in degraded speech comprehension. In this thesis, we examined how listeners can use context information and form predictions while listening to speech at different levels of degradation. The central theme of the thesis is the investigation of the interactions between top-down semantic predictions and bottom-up auditory processing in adverse listening conditions under the theoretical framework of predictive processing and the noisy channel model of communication. We first introduce these concepts of top-down–bottom-up interactions in adverse listening conditions, then report the experiments that empirically investigated different aspects of degraded speech comprehension and the top-down–bottom-up interactions. Our findings showed that to understand a speaker’s utterance in a noisy channel (e.g., due to the degradation of the speech signal), a listener takes into account the noise in the signal as well as the context information to form lexical-semantic predictions. Studies have shown that lexical-semantic predictions facilitate language comprehension. We investigated if such a facilitatory effect of linguistic predictions is observed at all levels of speech degradation. We also addressed the debate on the nature of the predictability effect (graded vs all-or-nothing). The studies in this thesis concluded that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down predictive (lexical-semantic cues) and bottom-up auditory (acoustic-phonetic cues) processes. When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighing of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate.
Taken together, this thesis contributes to a better understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.


Es scheint ziemlich einfach zu sein, jemandem beim Sprechen zuzuhören und ihn zu verstehen. Unsere täglichen Gespräche finden jedoch unter ungünstigen Bedingungen statt. Zum Beispiel kommen Hintergrundgeräusche von verschiedenen Schallquellen, mehrere Personen sprechen gleichzeitig (z. B. in einem Café), eine schlechte Signalverbindung verzerrt die Stimme des Gesprächspartners am anderen Ende des Telefons, und die Liste geht weiter. Trotz dieser Widrigkeiten kommunizieren wir in den meisten Fällen erfolgreich. Einer der wichtigsten Faktoren, der dazu beiträgt, dass wir Sprache auch unter ungünstigen Bedingungen verstehen können, ist die predictive language processing. In dieser Arbeit haben wir untersucht, wie Hörer Kontextinformationen nutzen und Vorhersagen treffen können, während sie Sprache mit unterschiedliche starken Signalstörungen hören. Das zentrale Thema der Arbeit ist die Untersuchung der Wechselwirkung zwischen semantischen Vorhersagen basierend auf dem vorigen Kontext und auditiver Verarbeitung des Sprachsignals unter ungünstigen Hörbedingungen im theoretischen Rahmen der “predictive processing” und des “noisy channel model of communication”. Es gibt zahlreiche Methoden, mit denen Kontextinformationen und Sprachverschlechterung (ungünstige Hörbedingungen) in einem Versuchsaufbau erzeugt und manipuliert werden können. Wir haben die Kontextinformationen manipuliert, indem wir kurze Subjekt-Verb-Objekt-Sätze auf Deutsch erstellt haben, in denen das Verb eines Satzes das Substantiv vorhersagt. Zusätzlich zur Kontextinformation untersuchten wir den Effekt der strategischen Aufmerksamkeitszuweisung als Top-down-Prozess. Die Sprache wurde durch “noisevocoding” der reinen Sprache degradiert. Zusätzlich zur noise-vocoding untersuchten wir die Wirkung von Änderungen der Sprechgeschwindigkeit als weiteren Faktor, der die Bottom-up-Prozesse beeinflusst. In Kapitel 5 untersuchten wir zunächst die Rolle der Top-down- Aufmerksamkeitsregulation für die Fähigkeit der Hörer, die Kontextinformationen zu nutzen. Unsere Forschungsfrage lautete, ob die Aufmerksamkeit auf den Kontext unabhängig von den Hörer, unbedingt erforderlich ist, um Vorhersagen über ein kommendes Wort in einem Satz auf verschiedenen Degradationsstufen zu treffen. Wir konnten zeigen, dass die semantische Vorhersagbarkeit eines Satzes nur dann zu einem besseren Sprachverständnis beiträgt, wenn die Hörer auf die Kontextinformationen achten. Darüber hinaus war eine solche Erleichterung bei schweren Degradationsstufen nicht vorhanden. Wir haben diese Ergebnisse in Kapitel 6 weiter untersucht und festgestellt, dass der erleichternde Effekt der Vorhersagbarkeit nur bei einem moderaten Grad der Sprachverschlechterung zu beobachten ist. Wir untersuchten die Art des Vorhersageeffekts und fanden heraus, dass er abgestuft ist und nicht alles oder nichts beinhaltet. Mit anderen Worten, wir fanden heraus, dass die Vorhersage der Hörer über ein kommendes Wort nicht nur auf einen stark einschränkenden Satzkontext beschränkt ist; stattdessen sagen die Hörer das kommende Wort in Abhängigkeit von der Wahrscheinlichkeit seines Auftretens in einem bestimmten Kontext voraus (z. B. “cloze probability”). Schließlich untersuchten wir in Kapitel 7, ob eine Änderung der Sprechgeschwindigkeit – die die Verarbeitungszeit verändert – die in Kapitel 6 beobachtete kontextuelle Erleichterung verstärkt oder verringert. 
Die Ergebnisse zeigten, dass das Hörverstehen der mäßig verschlechterten Sprache bei normaler Sprechgeschwindigkeit am besten ist: Eine Verlangsamung verstärkte die kontextuelle Erleichterung nicht. Bei Erhöhung der Sprechgeschwindigkeit wurde jedoch die Verarbeitung von Sätzen mit geringer, aber nicht mit hoher Vorhersagbarkeit beeinträchtigt. In der begrenzten Verarbeitungszeit war die Aktivierung von Zielwörtern in einem weniger einschränkenden Satzkontext schwieriger als in einem stark einschränkenden Satzkontext. All diese Experimente, die mit deutschen Stimuli an jungen Erwachsenen mit deutscher Muttersprache durchgeführt wurden, haben gezeigt, dass das Verstehen verschlechterter Sprache prädiktiver Natur ist: Die Sprachverarbeitung in einem verrauschten Kanal ist probabilistisch und rational. Die Hörer wägen Top-Down- Prozesse (lexikalisch-semantische Hinweise) und Bottom-Up-Hörprozesse (akustischphonetische Hinweise) ab. Wenn die Sprachverschlechterung nicht schwerwiegend ist, können sie sich auf den Bottom-up-Input eines kommenden Wortes verlassen (d. h. auf das, was sie tatsächlich gehört haben), unabhängig von den ihnen zur Verfügung stehenden Kontextinformationen. Wenn die Sprache mäßig verschlechtert, aber verständlich genug ist, erstellen sie aus den Kontextinformationen Vorhersagen über das kommende Wort. Darüber hinaus wird die Gewichtung von lexikalisch-semantischen und akustisch-phonetischen Hinweisen auch durch die Aufmerksamkeitssteuerung und die Sprechgeschwindigkeit moduliert. Insgesamt trägt diese Arbeit zu einem differenzierten Verständnis der dynamischen Interaktion zwischen Top-down- und Bottom-up-Prozessen beim Sprachverstehen bei.

@phdthesis{Bhandari_Diss_2022,
title = {Interaction of top-down and bottom-up processes in spoken language comprehension},
author = {Pratik Bhandari},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/34800},
doi = {https://doi.org/10.22028/D291-38594},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {It seems pretty easy to listen to and understand someone speaking. However, our day-to-day conversations occur under adverse listening conditions. For example, background noise comes from different sound sources, multiple people talk simul- taneously (e.g., in a caf{\'e}), a poor signal connection distorts the voice of a person talking on the other end of a telephone call, and the list goes on. Despite these adversities, most of the time, we communicate successfully. One of the significant contributors to our ability to understand language in adverse listening conditions is predictive language processing. Humans are not passive consumers of language: we use the information available to us from a context and predict the not-yet-encountered, upcoming linguistic events. We do not wait for a speech signal to unfold completely to decode its meaning. This feature of human language processing is critical in understanding speech in adverse listening conditions. The studies in this thesis are timely in the field when the discussion about the role of prediction in language processing is vibrant and to some extent—heated. Some argue that prediction is a universal phenomenon, not only of language, but of human cognition, in general. The present thesis examined the boundary conditions of predictive language processing. We investigated if linguistic predictions are automatic, or if they are constrained by other factors like top-down attention regulation and bottom-up processing of different speech rates in degraded speech comprehension. In this thesis, we examined how listeners can use context information and form predictions while listening to speech at different levels of degradation. The central theme of the thesis is the investigation of the interactions between top- down semantic predictions and bottom-up auditory processing in adverse listening conditions under the theoretical framework of predictive processing and the noisy channel model of communication. We first introduce these concepts of top-down– bottom-up interactions in adverse listening conditions, then report the experiments that empirically investigated different aspects of degraded speech comprehension and the top-down – bottom-up interactions. Our findings showed that to understand a speaker’s utterance in a noisy channel (e.g., due to the degradation of speech signal), a listener takes into account the noise in the signal as well as the context information to form lexical-semantic predictions. Studies have shown that lexical-semantic predictions facilitate language com- prehension. We investigated if such a facilitatory effect of linguistic predictions is observed at all levels of speech degradation. We also addressed the debate on the nature of predictability effect (graded vs all-or-nothing). The studies in this thesis concluded that comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down predictive (lexical-semantic cues) and bottom- up auditory (acoustic-phonetic cues) processes. When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but intelligible enough, they generate predictions about the upcoming word from the context information. 
In addition, the weighing of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Taken together, this thesis contributes to a better understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.


It seems quite easy to listen to and understand someone speaking. However, our everyday conversations take place under adverse conditions. For example, background noise comes from various sound sources, several people speak at the same time (e.g., in a café), a poor signal connection distorts the voice of the person on the other end of a telephone call, and the list goes on. Despite these adversities, we communicate successfully most of the time. One of the most important factors contributing to our ability to understand speech even under adverse conditions is predictive language processing. In this thesis, we investigated how listeners can use context information and make predictions while listening to speech with varying degrees of signal degradation. The central theme of the thesis is the investigation of the interaction between semantic predictions based on the preceding context and auditory processing of the speech signal under adverse listening conditions, within the theoretical framework of predictive processing and the noisy channel model of communication. There are numerous methods by which context information and speech degradation (adverse listening conditions) can be created and manipulated in an experimental setup. We manipulated context information by constructing short subject-verb-object sentences in German in which the verb of a sentence predicts the noun. In addition to context information, we examined the effect of strategic attention allocation as a top-down process. The speech was degraded by noise-vocoding the clean speech. In addition to noise-vocoding, we examined the effect of changes in speech rate as a further factor influencing bottom-up processes. In Chapter 5, we first examined the role of top-down attention regulation in listeners' ability to use context information. Our research question was whether attention to the context is strictly required for making predictions about an upcoming word in a sentence at different levels of degradation. We were able to show that the semantic predictability of a sentence contributes to better speech comprehension only when listeners attend to the context information. Moreover, such facilitation was absent at severe levels of degradation. We investigated these findings further in Chapter 6 and found that the facilitatory effect of predictability is observed only at a moderate degree of speech degradation. We examined the nature of the predictability effect and found that it is graded rather than all-or-nothing. In other words, we found that listeners' predictions about an upcoming word are not restricted to highly constraining sentence contexts; instead, listeners predict the upcoming word as a function of its probability of occurrence in a given context (e.g., cloze probability).
Finally, in Chapter 7, we examined whether a change in speech rate, which alters the available processing time, increases or decreases the contextual facilitation observed in Chapter 6. The results showed that comprehension of moderately degraded speech is best at a normal speech rate: slowing the speech down did not increase contextual facilitation. Increasing the speech rate, however, impaired the processing of sentences with low, but not with high, predictability. Within the limited processing time, activating target words was more difficult in a less constraining sentence context than in a highly constraining one. All of these experiments, conducted with German stimuli on young adult native speakers of German, showed that the comprehension of degraded speech is predictive in nature: language processing in a noisy channel is probabilistic and rational. Listeners weigh top-down processes (lexical-semantic cues) against bottom-up auditory processes (acoustic-phonetic cues). When the speech degradation is not severe, they can rely on the bottom-up input of an upcoming word (i.e., what they actually heard), regardless of the context information available to them. When the speech is moderately degraded but still intelligible enough, they generate predictions about the upcoming word from the context information. In addition, the weighting of lexical-semantic and acoustic-phonetic cues is also modulated by attention regulation and speech rate. Overall, this thesis contributes to a more nuanced understanding of the dynamic interaction between top-down and bottom-up processes in speech comprehension.},
pubstate = {published},
type = {phdthesis}
}
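
The thesis above frames degraded-speech comprehension as rational inference in a noisy channel, with listeners weighing a context-based prediction against the acoustic evidence. The following Python snippet is only a toy sketch of that intuition; the candidate words and probabilities are invented, not taken from the thesis.

def posterior(prior, likelihood):
    # Combine a context-based prior with an acoustic likelihood (Bayes' rule,
    # renormalised over the candidate words).
    joint = {w: prior[w] * likelihood[w] for w in prior}
    z = sum(joint.values())
    return {w: p / z for w, p in joint.items()}

# Invented example: a moderately degraded signal is acoustically ambiguous,
# but a constraining context shifts the decision.
context_prior = {"cheese": 0.7, "keys": 0.3}
acoustic_likelihood = {"cheese": 0.4, "keys": 0.6}
print(posterior(context_prior, acoustic_likelihood))  # context pulls the posterior toward "cheese" (~0.61 vs ~0.39)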


Project:   A4

Hedderich, Michael

Weak supervision and label noise handling for Natural language processing in low-resource scenarios PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully. In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model’s quality from a theoretic perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights of where classifiers fail.
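
The abstract above mentions modeling label noise when a small amount of clean data is available. As a rough, hypothetical illustration of that general idea (not the specific architecture developed in the thesis), one can estimate a confusion matrix P(weak label | true label) on the clean subset and use it to map a classifier's clean-label probabilities onto the expected noisy-label distribution:

import numpy as np

def estimate_noise_matrix(true_labels, weak_labels, n_classes):
    # Count how often each true class i receives weak label j,
    # then row-normalise to get P(weak = j | true = i).
    m = np.zeros((n_classes, n_classes))
    for t, w in zip(true_labels, weak_labels):
        m[t, w] += 1
    m += 1e-8  # avoid dividing by zero for classes unseen in the clean subset
    return m / m.sum(axis=1, keepdims=True)

def noisy_distribution(clean_probs, noise_matrix):
    # p(weak = j) = sum_i p(true = i) * P(weak = j | true = i)
    return clean_probs @ noise_matrix

# Toy example: five clean instances with their gold and weak labels.
noise = estimate_noise_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3)
print(noisy_distribution(np.array([0.8, 0.1, 0.1]), noise))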


The lack of annotated data is a major factor preventing many low-resource languages and domains from keeping pace with recent advances in natural language processing. To reduce this dependency on labeled training data, weak supervision annotates unlabeled data (semi-)automatically. These annotations can be obtained more quickly and cheaply. However, they also contain more errors. Special handling of these noisy labels is often necessary to use the data successfully. In this dissertation, we study the entire weak supervision pipeline with a focus on its use for named entity recognition. We develop a tool for automatic annotation and present a new approach to modeling noisy labels. We examine the factors that influence the quality of this model from a theoretical perspective, and we validate the approach empirically for different tasks and languages. An important aspect of this work is the goal of a realistic analysis. We carry out the investigation on, among others, several African languages and show the performance gains that can be achieved through weak supervision and label noise modeling. We also extend the analysis to few-shot learning. With regard to classification errors, we additionally present a new approach for obtaining interpretable insights.

@phdthesis{Hedderich_Diss_2022,
title = {Weak supervision and label noise handling for Natural language processing in low-resource scenarios},
author = {Michael Hedderich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/35026},
doi = {https://doi.org/10.22028/D291-38691},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully. In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model's quality from a theoretic perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights of where classifiers fail.


The lack of annotated data is a major factor preventing many low-resource languages and domains from keeping pace with recent advances in natural language processing. To reduce this dependency on labeled training data, weak supervision annotates unlabeled data (semi-)automatically. These annotations can be obtained more quickly and cheaply. However, they also contain more errors. Special handling of these noisy labels is often necessary to use the data successfully. In this dissertation, we study the entire weak supervision pipeline with a focus on its use for named entity recognition. We develop a tool for automatic annotation and present a new approach to modeling noisy labels. We examine the factors that influence the quality of this model from a theoretical perspective, and we validate the approach empirically for different tasks and languages. An important aspect of this work is the goal of a realistic analysis. We carry out the investigation on, among others, several African languages and show the performance gains that can be achieved through weak supervision and label noise modeling. We also extend the analysis to few-shot learning. With regard to classification errors, we additionally present a new approach for obtaining interpretable insights.},
pubstate = {published},
type = {phdthesis}
}


Project:   B4

Ibrahim, Omnia

Speaker Adaptations as a Function of Message, Channel and Listener Variability PhD Thesis

University of Zürich, Zürich, Switzerland, 2022.

Speech is a highly dynamic process. Some variability is inherited directly from the language itself, while other variability stems from adapting to the surrounding environment or interlocutor. This Ph.D. thesis consists of seven studies investigating speech adaptation concerning the message, channel, and listener variability. It starts with investigating speakers’ adaptation to the linguistic message. Previous work has shown that duration is shortened in more predictable contexts, and conversely lengthened in less predictable contexts. This pervasive predictability effect is well studied in multiple languages and linguistic levels. However, syllable level predictability has been generally overlooked so far. This thesis aims to fill that gap. It focuses on the effect of information-theoretic factors at both the syllable and segmental levels. Furthermore, it found that the predictability effect is not uniform across all durational cues but is somewhat sensitive to the phonological relevance of a language-specific phonetic cue.
Speakers adapt not only to their message but also to the channel of transfer. For example, it is known that speakers modulate the characteristics of their speech and produce clear speech in response to background noise – syllables in noise have a longer duration, with higher average intensity, larger intensity range, and higher F0. Hence, speakers choose redundant multi-dimensional acoustic modifications to make their voices more salient and detectable in a noisy environment. This Ph.D. thesis provides new insights into speakers’ adaptation to noise and predictability on the acoustic realizations of syllables in German; showing that the speakers’ response to background noise is independent of syllable predictability.
Regarding speaker-to-listener adaptations, this thesis finds that speech variability is not necessarily a function of the interaction’s duration. Instead, speakers constantly position themselves concerning the ongoing social interaction. Indeed, speakers’ cooperation during the discussion would lead to a higher convergence behavior. Moreover, interpersonal power dynamics between interlocutors were found to serve as a predictor for accommodation behavior. This adaptation holds for both human-human interaction and human-robot interaction. In an ecological validity study, speakers changed their voice depending on whether they were addressing a human or a robot. Those findings align with previous studies on robot-directed speech and confirm that this difference also holds when the conversations are more natural and spontaneous.
The results of this thesis provide compelling evidence that speech adaptation is socially motivated and, to some extent, consciously controlled by the speaker. These findings have implications for including environment-based and listener-based formulations in speech production models along with message-based formulations. Furthermore, this thesis aims to advance our understanding of verbal and non-verbal behavior mechanisms for social communication. Finally, it contributes to the broader literature on information-theoretical factors and accommodation effects on speakers’ acoustic realization.
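
To make the predictability effect on duration described in the first paragraph of this abstract concrete, here is a toy Python sketch of the kind of analysis commonly used for such questions: a linear mixed-effects model of duration as a function of surprisal with a random intercept per speaker. The data, column names, and effect sizes are invented and are not the thesis's materials or results.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "speaker": rng.integers(0, 15, n).astype(str),
    "surprisal": rng.uniform(2.0, 12.0, n),  # syllable surprisal in bits (simulated)
})
# Simulated durations: a positive surprisal effect plus a per-speaker offset.
speaker_offset = df["speaker"].astype(int) * 1.5
df["duration_ms"] = 150 + 4.0 * df["surprisal"] + speaker_offset + rng.normal(0, 12, n)

fit = smf.mixedlm("duration_ms ~ surprisal", data=df, groups="speaker").fit()
print(fit.summary())  # the surprisal coefficient should come out close to 4 ms/bit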

@phdthesis{Ibrahim_Diss_2022,
title = {Speaker Adaptations as a Function of Message, Channel and Listener Variability},
author = {Omnia Ibrahim},
url = {https://www.zora.uzh.ch/id/eprint/233694/},
doi = {https://doi.org/10.5167/uzh-233694},
year = {2022},
date = {2022},
school = {University of Z{\"u}rich},
address = {Z{\"u}rich, Switzerland},
abstract = {Speech is a highly dynamic process. Some variability is inherited directly from the language itself, while other variability stems from adapting to the surrounding environment or interlocutor. This Ph.D. thesis consists of seven studies investigating speech adaptation concerning the message, channel, and listener variability. It starts with investigating speakers’ adaptation to the linguistic message. Previous work has shown that duration is shortened in more predictable contexts, and conversely lengthened in less predictable contexts. This pervasive predictability effect is well studied in multiple languages and linguistic levels. However, syllable level predictability has been generally overlooked so far. This thesis aims to fill that gap. It focuses on the effect of information-theoretic factors at both the syllable and segmental levels. Furthermore, it found that the predictability effect is not uniform across all durational cues but is somewhat sensitive to the phonological relevance of a language-specific phonetic cue. Speakers adapt not only to their message but also to the channel of transfer. For example, it is known that speakers modulate the characteristics of their speech and produce clear speech in response to background noise – syllables in noise have a longer duration, with higher average intensity, larger intensity range, and higher F0. Hence, speakers choose redundant multi-dimensional acoustic modifications to make their voices more salient and detectable in a noisy environment. This Ph.D. thesis provides new insights into speakers’ adaptation to noise and predictability on the acoustic realizations of syllables in German; showing that the speakers’ response to background noise is independent of syllable predictability. Regarding speaker-to-listener adaptations, this thesis finds that speech variability is not necessarily a function of the interaction’s duration. Instead, speakers constantly position themselves concerning the ongoing social interaction. Indeed, speakers’ cooperation during the discussion would lead to a higher convergence behavior. Moreover, interpersonal power dynamics between interlocutors were found to serve as a predictor for accommodation behavior. This adaptation holds for both human-human interaction and human-robot interaction. In an ecological validity study, speakers changed their voice depending on whether they were addressing a human or a robot. Those findings align with previous studies on robot-directed speech and confirm that this difference also holds when the conversations are more natural and spontaneous. The results of this thesis provide compelling evidence that speech adaptation is socially motivated and, to some extent, consciously controlled by the speaker. These findings have implications for including environment-based and listener-based formulations in speech production models along with message-based formulations. Furthermore, this thesis aims to advance our understanding of verbal and non-verbal behavior mechanisms for social communication. Finally, it contributes to the broader literature on information-theoretical factors and accommodation effects on speakers’ acoustic realization.},
pubstate = {published},
type = {phdthesis}
}


Project:   C1

Kudera, Jacek

Slavic receptive multilingualism: intercomprehension of speech PhD Thesis

Saarland University, Saarbruecken, Germany, 2022.

Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus from acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to typological divisions of tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than the phonetic resemblance of a perceived stimulus and its equivalence in the native lexicon of speakers.

@phdthesis{Kudera_Diss_2022,
title = {Slavic receptive multilingualism: intercomprehension of speech},
author = {Jacek Kudera},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33236},
doi = {https://doi.org/10.22028/D291-36578},
year = {2022},
date = {2022},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {Intercomprehension refers to a communication practice in which speakers use closely related languages. We know that the degree of mutual intelligibility differs according to the stimulus modality. This work aims to define the linguistic features which contribute to and impede cross-lingual understanding of speech via production and perception studies involving speakers of four Slavic languages. The current study combines the methodological apparatus from acoustic phonetics and information theory to provide evidence for mutual intelligibility on various levels of language processing. It concludes that the degree of mutual understanding does not always correspond to typological divisions of tested languages. The results presented here suggest that intercomprehension is often driven by unit (un)expectedness rather than the phonetic resemblance of a perceived stimulus and its equivalence in the native lexicon of speakers.},
pubstate = {published},
type = {phdthesis}
}


Project:   C4

Talamo, Luigi

Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase Inproceedings

Vylomova, Ekaterina; Ponti, Edoardo; Cotterell, Ryan (Ed.): Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Association for Computational Linguistics, pp. 36-41, Seattle, Washington, 2022.

We describe a methodology to extract with finer accuracy word order patterns from texts automatically annotated with Universal Dependency (UD) trained parsers. We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prosaic texts. Our results suggest that the combinations of different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemma, and the introduction of language-specific lists of closed-category lemmata has the two-fold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.
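
As a rough sketch of the kind of measurement described above, the following Python snippet reads a CoNLL-U file and computes the order entropy of a chosen relation (e.g. det) relative to its head, i.e. how evenly the dependent is split between pre- and post-head position. It is a simplified stand-in, not the paper's actual pipeline, and the file name is a placeholder.

import math
from collections import Counter

def order_entropy(conllu_path, deprel="det"):
    # Count whether the dependent precedes or follows its head for one relation.
    counts = Counter()
    with open(conllu_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) != 10 or not cols[0].isdigit():
                continue  # skip multiword-token (1-2) and empty-node (1.1) lines
            tok_id, head, rel = int(cols[0]), cols[6], cols[7]
            if rel == deprel and head.isdigit() and head != "0":
                counts["before" if tok_id < int(head) else "after"] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Entropy in bits: 0 = fixed order, 1 = completely free order.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(order_entropy("corpus.conllu", deprel="det"))  # placeholder file name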

@inproceedings{Talamo_2022,
title = {Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase},
author = {Luigi Talamo},
editor = {Ekaterina Vylomova and Edoardo Ponti and Ryan Cotterell},
url = {https://aclanthology.org/2022.sigtyp-1.5/},
doi = {https://doi.org/10.18653/v1/2022.sigtyp-1.5},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
pages = {36-41},
publisher = {Association for Computational Linguistics},
address = {Seattle, Washington},
abstract = {We describe a methodology to extract with finer accuracy word order patterns from texts automatically annotated with Universal Dependency (UD) trained parsers. We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prosaic texts. Our results suggest that the combinations of different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemma, and the introduction of language-specific lists of closed-category lemmata has the two-fold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.},
pubstate = {published},
type = {inproceedings}
}


Project:   C7

Alabi, Jesujoba; Adelani, David; Mosbach, Marius; Klakow, Dietrich

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning Inproceedings

Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, pp. 4336-4349, Gyeongju, Republic of Korea, 2022.

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT) — fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.
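
The core of LAFT/MAFT as summarised above is continued masked-language-model training of a multilingual PLM on (multilingual) monolingual text. Below is a minimal, hypothetical sketch of that step with Hugging Face transformers; the corpus file, model choice, and hyperparameters are placeholders rather than the paper's actual setup, and the vocabulary-reduction step is omitted.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

# Placeholder corpus: one sentence per line, concatenated over the target languages.
raw = load_dataset("text", data_files={"train": "adaptation_corpus.txt"})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
                    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="adapted-plm", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()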

@inproceedings{alabi-etal-2022-adapting,
title = {Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning},
author = {Jesujoba Alabi and David Adelani and Marius Mosbach and Dietrich Klakow},
url = {https://aclanthology.org/2022.coling-1.382},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
pages = {4336-4349},
publisher = {International Committee on Computational Linguistics},
address = {Gyeongju, Republic of Korea},
abstract = {Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT) {---} fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT, thus reducing the model size by around 50{\%}. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.},
pubstate = {published},
type = {inproceedings}
}


Project:   B4

Lapshinova-Koltunski, Ekaterina; Pollkläsener, Christina; Przybyl, Heike

Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora Journal Article

The Prague Bulletin of Mathematical Linguistics, 119, pp. 5-22, 2022, ISSN 0032-6585.

We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.
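
Surprisal here is the negative log probability of a unit given its context. As a toy illustration only (a bigram approximation over tokenised text, not the corpus models used in the article), the surprisal of a connective given the preceding token can be computed like this:

import math
from collections import Counter

def bigram_surprisal(tokens, prev_word, word):
    # S(word | prev_word) = -log2 P(word | prev_word), with add-one smoothing.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    p = (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)
    return -math.log2(p)

# Toy example: how surprising is the connective "however" after a full stop?
tokens = "he was late . however , the talk went well . however , nobody minded .".split()
print(bigram_surprisal(tokens, ".", "however"))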

@article{lapshinova-koltunski-pollklaesener-przybyl:2022,
title = {Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora},
author = {Ekaterina Lapshinova-Koltunski and Christina Pollkl{\"a}sener and Heike Przybyl},
url = {https://ufal.mff.cuni.cz/pbml/119/art-lapshinova-koltunski-pollklaesener-przybyl.pdf},
doi = {https://doi.org/10.14712/00326585.020},
year = {2022},
date = {2022},
journal = {The Prague Bulletin of Mathematical Linguistics},
pages = {5-22},
volume = {119},
abstract = {We present a study of discourse connectives in English-German and German-English translation and interpreting where we focus on the phenomena of explicitation and implicitation. Apart from distributional analysis of translation patterns in parallel data, we also look into surprisal, i.e. an information-theoretic measure of cognitive effort, which helps us to interpret the observed tendencies.},
pubstate = {published},
type = {article}
}


Project:   B7
