Publications

Tröger, Johannes; Lindsay, Hali; Mina, Mario; Linz, Nicklas; Klöppel, Stefan; Kray, Jutta; Peter, Jessica

Patients with amnestic MCI Fail to Adapt Executive Control When Repeatedly Tested with Semantic Verbal Fluency Tasks Journal Article

Journal of the International Neuropsychological Society, Cambridge University Press, pp. 1-8, 2021.

Semantic verbal fluency (SVF) tasks require individuals to name items from a specified category within a fixed time. An impaired SVF performance is well documented in patients with amnestic Mild Cognitive Impairment (aMCI). The two leading theoretical views suggest either loss of semantic knowledge or impaired executive control to be responsible. We assessed SVF 3 times on 2 consecutive days in 29 healthy controls (HC) and 29 patients with aMCI with the aim to answer the question which of the two views holds true. When doing the task for the first time, patients with aMCI produced fewer and more common words with a shorter mean response latency. When tested repeatedly, only healthy volunteers increased performance. Likewise, only the performance of HC indicated two distinct retrieval processes: a prompt retrieval of readily available items at the beginning of the task and an active search through semantic space towards the end. With repeated assessment, the pool of readily available items became larger in HC, but not patients with aMCI. The production of fewer and more common words in aMCI points to a smaller search set and supports the loss of semantic knowledge view. The failure to improve performance as well as the lack of distinct retrieval processes point to an additional impairment in executive control. Our data did not clearly favour one theoretical view over the other, but rather indicates that the impairment of patients with aMCI in SVF is due to a combination of both.

@article{troger2021patients,
title = {Patients with amnestic MCI Fail to Adapt Executive Control When Repeatedly Tested with Semantic Verbal Fluency Tasks},
author = {Johannes Tr{\"o}ger and Hali Lindsay and Mario Mina and Nicklas Linz and Stefan Kl{\"o}ppel and Jutta Kray and Jessica Peter},
url = {https://www.cambridge.org/core/journals/journal-of-the-international-neuropsychological-society/article/abs/patients-with-amnestic-mci-fail-to-adapt-executive-control-when-repeatedly-tested-with-semantic-verbal-fluency-tasks/E09D9B7801DA02360B056E34E0BD96F7},
year = {2021},
date = {2021-06-30},
journal = {Journal of the International Neuropsychological Society},
pages = {1-8},
publisher = {Cambridge University Press},
abstract = {
Semantic verbal fluency (SVF) tasks require individuals to name items from a specified category within a fixed time. An impaired SVF performance is well documented in patients with amnestic Mild Cognitive Impairment (aMCI). The two leading theoretical views suggest either loss of semantic knowledge or impaired executive control to be responsible. We assessed SVF 3 times on 2 consecutive days in 29 healthy controls (HC) and 29 patients with aMCI with the aim to answer the question which of the two views holds true. When doing the task for the first time, patients with aMCI produced fewer and more common words with a shorter mean response latency. When tested repeatedly, only healthy volunteers increased performance. Likewise, only the performance of HC indicated two distinct retrieval processes: a prompt retrieval of readily available items at the beginning of the task and an active search through semantic space towards the end. With repeated assessment, the pool of readily available items became larger in HC, but not patients with aMCI. The production of fewer and more common words in aMCI points to a smaller search set and supports the loss of semantic knowledge view. The failure to improve performance as well as the lack of distinct retrieval processes point to an additional impairment in executive control. Our data did not clearly favour one theoretical view over the other, but rather indicates that the impairment of patients with aMCI in SVF is due to a combination of both.
},
pubstate = {published},
type = {article}
}

Project:   A5

Brandt, Erika; Möbius, Bernd; Andreeva, Bistra

Dynamic Formant Trajectories in German Read Speech: Impact of Predictability and Prominence Journal Article

Frontiers in Communication, section Language Sciences, 6, pp. 1-15, 2021.

Phonetic structures expand temporally and spectrally when they are difficult to predict from their context. To some extent, effects of predictability are modulated by prosodic structure. So far, studies on the impact of contextual predictability and prosody on phonetic structures have neglected the dynamic nature of the speech signal. This study investigates the impact of predictability and prominence on the dynamic structure of the first and second formants of German vowels. We expect to find differences in the formant movements between vowels standing in different predictability contexts and a modulation of this effect by prominence. First and second formant values are extracted from a large German corpus. Formant trajectories of peripheral vowels are modeled using generalized additive mixed models, which estimate nonlinear regressions between a dependent variable and predictors. Contextual predictability is measured as biphone and triphone surprisal based on a statistical German language model. We test for the effects of the information-theoretic measures surprisal and word frequency, as well as prominence, on formant movement, while controlling for vowel phonemes and duration. Primary lexical stress and vowel phonemes are significant predictors of first and second formant trajectory shape. We replicate previous findings that vowels are more dispersed in stressed syllables than in unstressed syllables. The interaction of stress and surprisal explains formant movement: unstressed vowels show more variability in their formant trajectory shape at different surprisal levels than stressed vowels. This work shows that effects of contextual predictability on fine phonetic detail can be observed not only in pointwise measures but also in dynamic features of phonetic segments.

@article{Brandt/etal:2021,
title = {Dynamic Formant Trajectories in German Read Speech: Impact of Predictability and Prominence},
author = {Erika Brandt and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.frontiersin.org/articles/10.3389/fcomm.2021.643528/full},
doi = {10.3389/fcomm.2021.643528},
year = {2021},
date = {2021-06-21},
journal = {Frontiers in Communication, section Language Sciences},
pages = {1-15},
volume = {6},
number = {643528},
abstract = {Phonetic structures expand temporally and spectrally when they are difficult to predict from their context. To some extent, effects of predictability are modulated by prosodic structure. So far, studies on the impact of contextual predictability and prosody on phonetic structures have neglected the dynamic nature of the speech signal. This study investigates the impact of predictability and prominence on the dynamic structure of the first and second formants of German vowels. We expect to find differences in the formant movements between vowels standing in different predictability contexts and a modulation of this effect by prominence. First and second formant values are extracted from a large German corpus. Formant trajectories of peripheral vowels are modeled using generalized additive mixed models, which estimate nonlinear regressions between a dependent variable and predictors. Contextual predictability is measured as biphone and triphone surprisal based on a statistical German language model. We test for the effects of the information-theoretic measures surprisal and word frequency, as well as prominence, on formant movement, while controlling for vowel phonemes and duration. Primary lexical stress and vowel phonemes are significant predictors of first and second formant trajectory shape. We replicate previous findings that vowels are more dispersed in stressed syllables than in unstressed syllables. The interaction of stress and surprisal explains formant movement: unstressed vowels show more variability in their formant trajectory shape at different surprisal levels than stressed vowels. This work shows that effects of contextual predictability on fine phonetic detail can be observed not only in pointwise measures but also in dynamic features of phonetic segments.},
pubstate = {published},
type = {article}
}

Project:   C1

Jágrová, Klára; Hedderich, Michael; Mosbach, Marius; Avgustinova, Tania; Klakow, Dietrich

On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers Journal Article

Frontiers in Psychology, 12, pp. 2296, 2021, ISSN 1664-1078.

This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, was proved to correlate well with the mutual intelligibility of individual words. However, the role of context for the intelligibility of target words in sentences was subject to very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-gram language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory (LSTMs) that can, theoretically, take infinitely long-distance dependencies into account and Transformer-based LMs which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.

@article{10.3389/fpsyg.2021.662277,
title = {On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Michael Hedderich and Marius Mosbach and Tania Avgustinova and Dietrich Klakow},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662277/full},
doi = {10.3389/fpsyg.2021.662277},
year = {2021},
date = {2021},
journal = {Frontiers in Psychology},
pages = {2296},
volume = {12},
abstract = {This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, was proved to correlate well with the mutual intelligibility of individual words. However, the role of context for the intelligibility of target words in sentences was subject to very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-gram language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory (LSTMs) that can, theoretically, take infinitely long-distance dependencies into account and Transformer-based LMs which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.},
pubstate = {published},
type = {article}
}

Projects:   B4 C4

Lapshinova-Koltunski, Ekaterina; Bizzoni, Yuri; Przybyl, Heike; Teich, Elke

Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication Inproceedings

Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21), Association for Computational Linguistics, pp. 82-90, online, 2021.

We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.

@inproceedings{LapshinovaEtAl2021interp,
title = {Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication},
author = {Ekaterina Lapshinova-Koltunski and Yuri Bizzoni and Heike Przybyl and Elke Teich},
url = {https://aclanthology.org/2021.motra-1.9/},
year = {2021},
date = {2021-05-31},
booktitle = {Proceedings of the Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21)},
pages = {82-90},
publisher = {Association for Computational Linguistics},
address = {online},
abstract = {We report on a study of the specific linguistic properties of cross-linguistically mediated communication, comparing written and spoken translation (simultaneous interpreting) in the domain of European Parliament discourse. Specifically, we compare translations and interpreting with target language original texts/speeches in terms of (a) predefined features commonly used for translationese detection, and (b) features derived in a data-driven fashion from translation and interpreting corpora. For the latter, we use n-gram language models combined with relative entropy (Kullback-Leibler Divergence). We set up a number of classification tasks comparing translations with comparable texts originally written in the target language and interpreted speeches with target language comparable speeches to assess the contributions of predefined and data-driven features to the distinction between translation, interpreting and originals. Our analysis reveals that interpreting is more distinct from comparable originals than translation and that its most distinctive features signal an overemphasis of oral, online production more than showing traces of cross-linguistically mediated communication.},
pubstate = {published},
type = {inproceedings}
}

Project:   B7

Sikos, Les; Staudte, Maria

A rose by any other verb: The effect of expectations and word category on processing effort in situated sentence comprehension Journal Article

Frontiers in Psychology, 2021.

Recent work has shown that linguistic and visual contexts jointly modulate linguistic expectancy and, thus, the processing effort for a (more or less) expected critical word (Ankener et al., 2018; Tourtouri et al., 2019; Staudte et al., 2020). According to these findings, uncertainty about the upcoming referent in a visually-situated sentence can be reduced by exploiting the selectional restrictions of a preceding word (e.g., a verb or an adjective), which then reduces processing effort on the critical word (e.g., a referential noun). Interestingly, however, no such modulation was observed in these studies on the expectation-generating word itself. The goal of the current study is to investigate whether the reduction of uncertainty (i.e., the generation of expectations) simply does not modulate processing effort, or whether the particular subject-verb-object (SVO) sentence structure used in these studies (which emphasizes the referential nature of the noun as direct pointer to visually co-present objects) accounts for the observed pattern. To test these questions, the current design reverses the functional roles of nouns and verbs by using sentence constructions in which the noun reduces uncertainty about upcoming verbs, and the verb provides the disambiguating and reference-resolving piece of information. Experiment 1 (a Visual World Paradigm study) and Experiment 2 (a Grammaticality Maze study) both replicate the effect found in Ankener et al. (2018) of visually-situated context on the word which uniquely identifies the referent, albeit on the verb in the current study. Results on the noun, where uncertainty is reduced and expectations are generated in the current design, were mixed and were most likely influenced by design decisions specific to each experiment. These results show that processing of the reference-resolving word, whether it be a noun or a verb, reliably benefits from the prior linguistic and visual information that leads to the generation of concrete expectations.

@article{Sikos2021b,
title = {A rose by any other verb: The effect of expectations and word category on processing effort in situated sentence comprehension},
author = {Les Sikos and Maria Staudte},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661898/full},
doi = {10.3389/fpsyg.2021.661898},
year = {2021},
date = {2021},
journal = {Frontiers in Psychology},
abstract = {Recent work has shown that linguistic and visual contexts jointly modulate linguistic expectancy and, thus, the processing effort for a (more or less) expected critical word (Ankener et al., 2018; Tourtouri et al., 2019; Staudte et al., 2020). According to these findings, uncertainty about the upcoming referent in a visually-situated sentence can be reduced by exploiting the selectional restrictions of a preceding word (e.g., a verb or an adjective), which then reduces processing effort on the critical word (e.g., a referential noun). Interestingly, however, no such modulation was observed in these studies on the expectation-generating word itself. The goal of the current study is to investigate whether the reduction of uncertainty (i.e., the generation of expectations) simply does not modulate processing effort --- or whether the particular subject-verb-object (SVO) sentence structure used in these studies (which emphasizes the referential nature of the noun as direct pointer to visually co-present objects) accounts for the observed pattern. To test these questions, the current design reverses the functional roles of nouns and verbs by using sentence constructions in which the noun reduces uncertainty about upcoming verbs, and the verb provides the disambiguating and reference-resolving piece of information. Experiment~1 (a Visual World Paradigm study) and Experiment~2 (a Grammaticality Maze study) both replicate the effect found in Ankener et al. (2018) of visually-situated context on the word which uniquely identifies the referent, albeit on the verb in the current study. Results on the noun, where uncertainty is reduced and expectations are generated in the current design, were mixed and were most likely influenced by design decisions specific to each experiment. 
These results show that processing of the reference-resolving word --- whether it be a noun or a verb --- reliably benefits from the prior linguistic and visual information that lead to the generation of concrete expectations.},
pubstate = {published},
type = {article}
}

Project:   C3

Yung, Frances Pik Yu; Jungbluth, Jana; Demberg, Vera

Limits to the Rational Production of Discourse Connectives Journal Article

Frontiers in Psychology, 12, pp. 1729, 2021.

Rational accounts of language use such as the uniform information density hypothesis, which asserts that speakers distribute information uniformly across their utterances, and the rational speech act (RSA) model, which suggests that speakers optimize the formulation of their message by reasoning about what the comprehender would understand, have been hypothesized to account for a wide range of language use phenomena. We here specifically focus on the production of discourse connectives. While there is some prior work indicating that discourse connective production may be governed by RSA, that work uses a strongly gamified experimental setting. In this study, we aim to explore whether speakers reason about the interpretation of their conversational partner also in more realistic settings. We thereby systematically vary the task setup to tease apart effects of task instructions and effects of the speaker explicitly seeing the interpretation alternatives for the listener. Our results show that the RSA-predicted effect of connective choice based on reasoning about the listener is only found in the original setting where explicit interpretation alternatives of the listener are available for the speaker. The effect disappears when the speaker has to reason about listener interpretations. We furthermore find that rational effects are amplified by the gamified task setting, indicating that meta-reasoning about the specific task may play an important role and potentially limit the generalizability of the found effects to more naturalistic every-day language use.

@article{yungJungbluthDemberg2021,
title = {Limits to the Rational Production of Discourse Connectives},
author = {Frances Pik Yu Yung and Jana Jungbluth and Vera Demberg},
url = {https://www.frontiersin.org/article/10.3389/fpsyg.2021.660730},
doi = {10.3389/fpsyg.2021.660730},
year = {2021},
date = {2021-05-28},
journal = {Frontiers in Psychology},
pages = {1729},
volume = {12},
abstract = {Rational accounts of language use such as the uniform information density hypothesis, which asserts that speakers distribute information uniformly across their utterances, and the rational speech act (RSA) model, which suggests that speakers optimize the formulation of their message by reasoning about what the comprehender would understand, have been hypothesized to account for a wide range of language use phenomena. We here specifically focus on the production of discourse connectives. While there is some prior work indicating that discourse connective production may be governed by RSA, that work uses a strongly gamified experimental setting. In this study, we aim to explore whether speakers reason about the interpretation of their conversational partner also in more realistic settings. We thereby systematically vary the task setup to tease apart effects of task instructions and effects of the speaker explicitly seeing the interpretation alternatives for the listener. Our results show that the RSA-predicted effect of connective choice based on reasoning about the listener is only found in the original setting where explicit interpretation alternatives of the listener are available for the speaker. The effect disappears when the speaker has to reason about listener interpretations. We furthermore find that rational effects are amplified by the gamified task setting, indicating that meta-reasoning about the specific task may play an important role and potentially limit the generalizability of the found effects to more naturalistic every-day language use.},
pubstate = {published},
type = {article}
}

Project:   B2

Menzel, Katrin; Knappen, Jörg; Teich, Elke

Generating linguistically relevant metadata for the Royal Society Corpus Journal Article

Säily, Tanja; Tyrkkö, Jukka (Ed.): Research in Corpus Linguistics, Challenges in combining structured and unstructured data in corpus development (special issue), 9, pp. 1-18, 2021, ISSN 2243-4712.

This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its composition and present the types of metadata it contains. Specifically, we tackle two challenges: first, integration of original metadata from the data providers (JSTOR and the Royal Society); second, derivation of additional linguistically relevant metadata regarding text structure and situational context (register).

@article{Menzel2021,
title = {Generating linguistically relevant metadata for the Royal Society Corpus},
author = {Katrin Menzel and J{\"o}rg Knappen and Elke Teich},
editor = {Tanja S{\"a}ily and Jukka Tyrkk{\"o}},
url = {https://ricl.aelinco.es/index.php/ricl/article/view/158},
doi = {10.32714/ricl.09.01.02},
year = {2021},
date = {2021},
journal = {Research in Corpus Linguistics, Challenges in combining structured and unstructured data in corpus development (special issue)},
pages = {1-18},
volume = {9},
number = {1},
abstract = {This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its composition and present the types of metadata it contains. Specifically, we tackle two challenges: first, integration of original metadata from the data providers (JSTOR and the Royal Society); second, derivation of additional linguistically relevant metadata regarding text structure and situational context (register).},
pubstate = {published},
type = {article}
}

Project:   B1

Schäfer, Lisa; Lemke, Tyll Robin; Drenhaus, Heiner; Reich, Ingo

The Role of UID for the Usage of Verb Phrase Ellipsis: Psycholinguistic Evidence From Length and Context Effects Journal Article

Frontiers in Psychology, 12, pp. 1672, 2021, ISSN 1664-1078.

We investigate the underexplored question of when speakers make use of the omission phenomenon verb phrase ellipsis (VPE) in English given that the full form is also available to them. We base the interpretation of our results on the well-established information-theoretic Uniform Information Density (UID) hypothesis: Speakers tend to distribute processing effort uniformly across utterances and avoid regions of low information by omitting redundant material through, e.g., VPE. We investigate the length of the omittable VP and its predictability in context as sources of redundancy which lead to larger or deeper regions of low information and an increased pressure to use ellipsis. We use both naturalness rating and self-paced reading studies in order to link naturalness patterns to potential processing difficulties. For the length effects our rating and reading results support a UID account. Surprisingly, we do not find an effect of the context on the naturalness and the processing of VPE. We suggest that our manipulation might have been too weak or not effective to evidence such an effect.

@article{schaeferetal_2021b,
title = {The Role of UID for the Usage of Verb Phrase Ellipsis: Psycholinguistic Evidence From Length and Context Effects},
author = {Lisa Sch{\"a}fer and Tyll Robin Lemke and Heiner Drenhaus and Ingo Reich},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661087/full},
doi = {10.3389/fpsyg.2021.661087},
year = {2021},
date = {2021-05-26},
journal = {Frontiers in Psychology},
pages = {1672},
volume = {12},
abstract = {We investigate the underexplored question of when speakers make use of the omission phenomenon verb phrase ellipsis (VPE) in English given that the full form is also available to them. We base the interpretation of our results on the well-established information-theoretic Uniform Information Density (UID) hypothesis: Speakers tend to distribute processing effort uniformly across utterances and avoid regions of low information by omitting redundant material through, e.g., VPE. We investigate the length of the omittable VP and its predictability in context as sources of redundancy which lead to larger or deeper regions of low information and an increased pressure to use ellipsis. We use both naturalness rating and self-paced reading studies in order to link naturalness patterns to potential processing difficulties. For the length effects our rating and reading results support a UID account. Surprisingly, we do not find an effect of the context on the naturalness and the processing of VPE. We suggest that our manipulation might have been too weak or not effective to evidence such an effect.},
pubstate = {published},
type = {article}
}

Project:   B3

Schäfer, Lisa

Topic drop in German: Empirical support for an information-theoretic account to a long-known omission phenomenon Journal Article

Zeitschrift für Sprachwissenschaft, 40, pp. 161-197, 2021, ISSN 1613-3706, 0721-9067.

German allows for topic drop (Fries 1988), the omission of a preverbal constituent from a V2 sentence. I address the underexplored question of why speakers use topic drop with a corpus study and two acceptability rating studies. I propose an information-theoretic explanation based on the Uniform Information Density hypothesis (Levy and Jaeger 2007) that accounts for the full picture of data. The information-theoretic approach predicts that topic drop is more felicitous when the omitted constituent is predictable in context and easy to recover. This leads to a more optimal use of the hearer’s processing capacities. The corpus study on the FraC corpus (Horch and Reich 2017) shows that grammatical person, verb probability and verbal inflection impact the frequency of topic drop. The two rating experiments indicate that these differences in frequency are also reflected in acceptability and additionally evidence an impact of topicality on topic drop. Taken together my studies constitute the first systematic empirical investigation of previously only sparsely researched observations from the literature. My information-theoretic account provides a unifying explanation of these isolated observations and is also able to account for the effect of verb probability that I find in my corpus study.

@article{schaefer_2021a,
title = {Topic drop in German: Empirical support for an information-theoretic account to a long-known omission phenomenon},
author = {Lisa Sch{\"a}fer},
url = {https://www.degruyter.com/document/doi/10.1515/zfs-2021-2024/html},
doi = {10.1515/zfs-2021-2024},
year = {2021},
date = {2021-05-19},
journal = {Zeitschrift f{\"u}r Sprachwissenschaft},
pages = {161-197},
volume = {40},
number = {2},
abstract = {German allows for topic drop (Fries 1988), the omission of a preverbal constituent from a V2 sentence. I address the underexplored question of why speakers use topic drop with a corpus study and two acceptability rating studies. I propose an information-theoretic explanation based on the Uniform Information Density hypothesis (Levy and Jaeger 2007) that accounts for the full picture of data. The information-theoretic approach predicts that topic drop is more felicitous when the omitted constituent is predictable in context and easy to recover. This leads to a more optimal use of the hearer’s processing capacities. The corpus study on the FraC corpus (Horch and Reich 2017) shows that grammatical person, verb probability and verbal inflection impact the frequency of topic drop. The two rating experiments indicate that these differences in frequency are also reflected in acceptability and additionally evidence an impact of topicality on topic drop. Taken together my studies constitute the first systematic empirical investigation of previously only sparsely researched observations from the literature. My information-theoretic account provides a unifying explanation of these isolated observations and is also able to account for the effect of verb probability that I find in my corpus study.},
pubstate = {published},
type = {article}
}

Project:   B3

Häuser, Katja; Kray, Jutta

Effects of prediction error on episodic memory retrieval: evidence from sentence reading and word recognition Journal Article

Language, Cognition and Neuroscience, Taylor & Francis, pp. 1-17, 2021.

Prediction facilitates word processing in the moment, but the longer-term consequences of prediction remain unclear. We investigated whether prediction error during language encoding enhances memory for words later on. German-speaking participants read sentences in which the gender marking of the pre-nominal article was consistent or inconsistent with the predictable noun. During subsequent word recognition, we probed participants’ recognition memory for predictable and unpredictable nouns. Our results indicate that individuals who demonstrated early prediction error during sentence reading showed enhanced recognition memory for nouns overall. Results from an exploratory step-wise regression showed that prenominal prediction error and general reading speed were the best proxies for recognition memory. Hence, prediction error may facilitate recognition by furnishing memory traces built during initial reading of the sentences. Results are discussed in the light of hypotheses positing that predictable words show a memory disadvantage because they are processed less thoroughly.

@article{haeuser2021effects,
title = {Effects of prediction error on episodic memory retrieval: evidence from sentence reading and word recognition},
author = {Katja H{\"a}user and Jutta Kray},
url = {https://www.tandfonline.com/doi/full/10.1080/23273798.2021.1924387},
doi = {https://doi.org/10.1080/23273798.2021.1924387},
year = {2021},
date = {2021},
journal = {Language, Cognition and Neuroscience},
pages = {1-17},
publisher = {Taylor \& Francis},
abstract = {Prediction facilitates word processing in the moment, but the longer-term consequences of prediction remain unclear. We investigated whether prediction error during language encoding enhances memory for words later on. German-speaking participants read sentences in which the gender marking of the pre-nominal article was consistent or inconsistent with the predictable noun. During subsequent word recognition, we probed participants’ recognition memory for predictable and unpredictable nouns. Our results indicate that individuals who demonstrated early prediction error during sentence reading showed enhanced recognition memory for nouns overall. Results from an exploratory step-wise regression showed that prenominal prediction error and general reading speed were the best proxies for recognition memory. Hence, prediction error may facilitate recognition by furnishing memory traces built during initial reading of the sentences. Results are discussed in the light of hypotheses positing that predictable words show a memory disadvantage because they are processed less thoroughly.},
pubstate = {published},
type = {article}
}

Project:   A5

van Os, Marjolein; Kray, Jutta; Demberg, Vera

Recognition of minimal pairs in (un)predictive sentence contexts in two types of noise Inproceedings

Proceedings of the 43rd Annual Meeting of the Cognitive Science Society (CogSci), pp. 2943-2949, 2021.

Top-down predictive processes and bottom-up auditory processes interact in speech comprehension. In background noise, the acoustic signal is degraded. This study investigated the interaction of these processes in a word recognition paradigm using high and low predictability sentences in two types of background noise and using phonetically controlled contrasts. Previous studies have reported false hearing, but have not provided insight into what phonetic features are most prone to false hearing. We here systematically explore this issue and find that plosives lead to increased false hearing compared to vowels. Furthermore, this study on German for the first time replicates the overall false hearing effect in young adults for a language other than English.

@inproceedings{vanOs2021,
title = {Recognition of minimal pairs in (un)predictive sentence contexts in two types of noise},
author = {Marjolein van Os and Jutta Kray and Vera Demberg},
url = {https://escholarship.org/uc/item/70z995v4},
year = {2021},
date = {2021},
booktitle = {Proceedings of the 43rd Annual Meeting of the Cognitive Science Society (CogSci)},
pages = {2943-2949},
abstract = {Top-down predictive processes and bottom-up auditory processes interact in speech comprehension. In background noise, the acoustic signal is degraded. This study investigated the interaction of these processes in a word recognition paradigm using high and low predictability sentences in two types of background noise and using phonetically controlled contrasts. Previous studies have reported false hearing, but have not provided insight into what phonetic features are most prone to false hearing. We here systematically explore this issue and find that plosives lead to increased false hearing compared to vowels. Furthermore, this study on German for the first time replicates the overall false hearing effect in young adults for a language other than English.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Zarcone, Alessandra; Demberg, Vera

A bathtub by any other name: the reduction of German compounds in predictive contexts Inproceedings

Proceedings of the Annual Meeting of the Cognitive Science Society, 43, 2021.

The Uniform Information Density hypothesis (UID) predicts that lexical choice between long and short word forms depends on the predictability of the referent in context, and recent studies have shown such an effect of predictability on lexical choice during online production. We here set out to test whether the UID predictions hold up in a related setting, but different language (German) and different phenomenon, namely the choice between compounds (e.g. Badewanne / bathtub) or their base forms (Wanne / tub). Our study is consistent with the UID: we find that participants choose the shorter base form more often in predictive contexts, showing an active tendency to be information-theoretically efficient.

@inproceedings{Zarcone2021,
title = {A bathtub by any other name: the reduction of German compounds in predictive contexts},
author = {Alessandra Zarcone and Vera Demberg},
url = {https://escholarship.org/uc/item/3w6451rz},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Annual Meeting of the Cognitive Science Society},
abstract = {The Uniform Information Density hypothesis (UID) predicts that lexical choice between long and short word forms depends on the predictability of the referent in context, and recent studies have shown such an effect of predictability on lexical choice during online production. We here set out to test whether the UID predictions hold up in a related setting, but different language (German) and different phenomenon, namely the choice between compounds (e.g. Badewanne / bathtub) or their base forms (Wanne / tub). Our study is consistent with the UID: we find that participants choose the shorter base form more often in predictive contexts, showing an active tendency to be information-theoretically efficient.},
pubstate = {published},
type = {inproceedings}
}

Project:   A3

Lapshinova-Koltunski, Ekaterina

Analysing the Dimension of Mode in Translation Book Chapter

Bisiada, Mario (Ed.): Empirical Studies in Translation and Discourse. Translation and Multilingual Natural Language Processing, Language Science Press, pp. 223-243, Berlin, 2021, ISBN 978-3-96110-300-3, ISSN 2364-8899.

The present chapter applies text classification to test how well we can distinguish between texts along two dimensions: a text-production dimension that distinguishes between translations and non-translations (where translations also include interpreted texts); and a mode dimension that distinguishes between spoken and written texts. The chapter also aims to investigate the relationship between these two dimensions. Moreover, it investigates whether the same linguistic features that are derived from variational linguistics contribute to the prediction of mode in both translations and non-translations. The distributional information about these features was used to statistically model variation along the two dimensions. The results show that the same feature set can be used to automatically differentiate translations from non-translations, as well as spoken texts from the written texts. However, language variation along the dimension of mode is stronger than that along the dimension of text production, as classification into spoken and written texts delivers better results. Besides, linguistic features that contribute to the distinction between spoken and written mode are similar in both translated and non-translated language.

@inbook{Lapshinova2021dimension,
title = {Analysing the Dimension of Mode in Translation},
author = {Ekaterina Lapshinova-Koltunski},
editor = {Mario Bisiada},
url = {https://doi.org/10.5281/zenodo.4450014},
doi = {https://doi.org/10.5281/zenodo.4450014},
year = {2021},
date = {2021},
booktitle = {Empirical Studies in Translation and Discourse. Translation and Multilingual Natural Language Processing},
isbn = {978-3-96110-300-3},
issn = {2364-8899},
pages = {223-243},
publisher = {Language Science Press},
address = {Berlin},
abstract = {The present chapter applies text classification to test how well we can distinguish between texts along two dimensions: a text-production dimension that distinguishes between translations and non-translations (where translations also include interpreted texts); and a mode dimension that distinguishes between spoken and written texts. The chapter also aims to investigate the relationship between these two dimensions. Moreover, it investigates whether the same linguistic features that are derived from variational linguistics contribute to the prediction of mode in both translations and non-translations. The distributional information about these features was used to statistically model variation along the two dimensions. The results show that the same feature set can be used to automatically differentiate translations from non-translations, as well as spoken texts from the written texts. However, language variation along the dimension of mode is stronger than that along the dimension of text production, as classification into spoken and written texts delivers better results. Besides, linguistic features that contribute to the distinction between spoken and written mode are similar in both translated and non-translated language.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Sikos, Les; Venhuizen, Noortje; Drenhaus, Heiner; Crocker, Matthew W.

Reevaluating pragmatic reasoning in language games Journal Article

PLOS ONE, 2021.

The results of a highly influential study that tested the predictions of the Rational Speech Act (RSA) model suggest that (a) listeners use pragmatic reasoning in one-shot web-based referential communication games despite the artificial, highly constrained, and minimally interactive nature of the task, and (b) that RSA accurately captures this behavior. In this work, we reevaluate the contribution of the pragmatic reasoning formalized by RSA in explaining listener behavior by comparing RSA to a baseline literal listener model that is only driven by literal word meaning and the prior probability of referring to an object. Across three experiments we observe only modest evidence of pragmatic behavior in one-shot web-based language games, and only under very limited circumstances. We find that although RSA provides a strong fit to listener responses, it does not perform better than the baseline literal listener model. Our results suggest that while participants playing the role of the Speaker are informative in these one-shot web-based reference games, participants playing the role of the Listener only rarely take this Speaker behavior into account to reason about the intended referent. In addition, we show that RSA’s fit is primarily due to a combination of non-pragmatic factors, perhaps the most surprising of which is that in the majority of conditions that are amenable to pragmatic reasoning, RSA (accurately) predicts that listeners will behave non-pragmatically. This leads us to conclude that RSA’s strong overall correlation with human behavior in one-shot web-based language games does not reflect listener’s pragmatic reasoning about informative speakers.

@article{Sikos2021,
title = {Reevaluating pragmatic reasoning in language games},
author = {Les Sikos and Noortje Venhuizen and Heiner Drenhaus and Matthew W. Crocker},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0248388},
doi = {https://doi.org/10.1371/journal.pone.0248388},
year = {2021},
date = {2021-03-17},
journal = {PLOS ONE},
abstract = {The results of a highly influential study that tested the predictions of the Rational Speech Act (RSA) model suggest that (a) listeners use pragmatic reasoning in one-shot web-based referential communication games despite the artificial, highly constrained, and minimally interactive nature of the task, and (b) that RSA accurately captures this behavior. In this work, we reevaluate the contribution of the pragmatic reasoning formalized by RSA in explaining listener behavior by comparing RSA to a baseline literal listener model that is only driven by literal word meaning and the prior probability of referring to an object. Across three experiments we observe only modest evidence of pragmatic behavior in one-shot web-based language games, and only under very limited circumstances. We find that although RSA provides a strong fit to listener responses, it does not perform better than the baseline literal listener model. Our results suggest that while participants playing the role of the Speaker are informative in these one-shot web-based reference games, participants playing the role of the Listener only rarely take this Speaker behavior into account to reason about the intended referent. In addition, we show that RSA’s fit is primarily due to a combination of non-pragmatic factors, perhaps the most surprising of which is that in the majority of conditions that are amenable to pragmatic reasoning, RSA (accurately) predicts that listeners will behave non-pragmatically. This leads us to conclude that RSA’s strong overall correlation with human behavior in one-shot web-based language games does not reflect listener’s pragmatic reasoning about informative speakers.},
pubstate = {published},
type = {article}
}

Project:   C3

Köhne-Fuetterer, Judith; Drenhaus, Heiner; Delogu, Francesca; Demberg, Vera

The online processing of causal and concessive discourse connectives Journal Article

Linguistics, 59, pp. 417-448, 2021.

While there is a substantial amount of evidence for language processing being a highly incremental and predictive process, we still know relatively little about how top-down discourse based expectations are combined with bottom-up information such as discourse connectives. The present article reports on three experiments investigating this question using different methodologies (visual world paradigm and ERPs) in two languages (German and English). We find support for highly incremental processing of causal and concessive discourse connectives, causing anticipation of upcoming material. Our visual world study shows that anticipatory looks depend on the discourse connective; furthermore, the German ERP study revealed an N400 effect on a gender-marked adjective preceding the target noun, when the target noun was inconsistent with the expectations elicited by the combination of context and discourse connective. Moreover, our experiments reveal that the facilitation of downstream material based on earlier connectives comes at the cost of reversing original expectations, as evidenced by a P600 effect on the concessive relative to the causal connective.

@article{koehne2021online,
title = {The online processing of causal and concessive discourse connectives},
author = {Judith K{\"o}hne-Fuetterer and Heiner Drenhaus and Francesca Delogu and Vera Demberg},
url = {https://doi.org/10.1515/ling-2021-0011},
doi = {https://doi.org/10.1515/ling-2021-0011},
year = {2021},
date = {2021-03-04},
journal = {Linguistics},
pages = {417-448},
volume = {59},
number = {2},
abstract = {While there is a substantial amount of evidence for language processing being a highly incremental and predictive process, we still know relatively little about how top-down discourse based expectations are combined with bottom-up information such as discourse connectives. The present article reports on three experiments investigating this question using different methodologies (visual world paradigm and ERPs) in two languages (German and English). We find support for highly incremental processing of causal and concessive discourse connectives, causing anticipation of upcoming material. Our visual world study shows that anticipatory looks depend on the discourse connective; furthermore, the German ERP study revealed an N400 effect on a gender-marked adjective preceding the target noun, when the target noun was inconsistent with the expectations elicited by the combination of context and discourse connective. Moreover, our experiments reveal that the facilitation of downstream material based on earlier connectives comes at the cost of reversing original expectations, as evidenced by a P600 effect on the concessive relative to the causal connective.},
pubstate = {published},
type = {article}
}

Projects:   A1 B2 B3

Lemke, Tyll Robin; Schäfer, Lisa; Reich, Ingo

Modeling the predictive potential of extralinguistic context with script knowledge: The case of fragments Journal Article

PLOS ONE, 16, pp. e0246255, 2021.

We describe a novel approach to estimating the predictability of utterances given extralinguistic context in psycholinguistic research. Predictability effects on language production and comprehension are widely attested, but so far predictability has mostly been manipulated through local linguistic context, which is captured with n-gram language models. However, this method does not allow to investigate predictability effects driven by extralinguistic context. Modeling effects of extralinguistic context is particularly relevant to discourse-initial expressions, which can be predictable even if they lack linguistic context at all. We propose to use script knowledge as an approximation to extralinguistic context. Since the application of script knowledge involves the generation of prediction about upcoming events, we expect that scripts can be used to manipulate the likelihood of linguistic expressions referring to these events. Previous research has shown that script-based discourse expectations modulate the likelihood of linguistic expressions, but script knowledge has often been operationalized with stimuli which were based on researchers’ intuitions and/or expensive production and norming studies. We propose to quantify the likelihood of an utterance based on the probability of the event to which it refers. This probability is calculated with event language models trained on a script knowledge corpus and modulated with probabilistic event chains extracted from the corpus. We use the DeScript corpus of script knowledge to obtain empirically founded estimates of the likelihood of an event to occur in context without having to resort to expensive pre-tests of the stimuli. We exemplify our method at a case study on the usage of nonsentential expressions (fragments), which shows that utterances that are predictable given script-based extralinguistic context are more likely to be reduced.

@article{Lemke2021,
title = {Modeling the predictive potential of extralinguistic context with script knowledge: The case of fragments},
author = {Tyll Robin Lemke and Lisa Sch{\"a}fer and Ingo Reich},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0246255},
doi = {https://doi.org/10.1371/journal.pone.0246255},
year = {2021},
date = {2021-02-11},
journal = {PLOS ONE},
pages = {e0246255},
volume = {16},
number = {2},
abstract = {We describe a novel approach to estimating the predictability of utterances given extralinguistic context in psycholinguistic research. Predictability effects on language production and comprehension are widely attested, but so far predictability has mostly been manipulated through local linguistic context, which is captured with n-gram language models. However, this method does not allow to investigate predictability effects driven by extralinguistic context. Modeling effects of extralinguistic context is particularly relevant to discourse-initial expressions, which can be predictable even if they lack linguistic context at all. We propose to use script knowledge as an approximation to extralinguistic context. Since the application of script knowledge involves the generation of prediction about upcoming events, we expect that scripts can be used to manipulate the likelihood of linguistic expressions referring to these events. Previous research has shown that script-based discourse expectations modulate the likelihood of linguistic expressions, but script knowledge has often been operationalized with stimuli which were based on researchers’ intuitions and/or expensive production and norming studies. We propose to quantify the likelihood of an utterance based on the probability of the event to which it refers. This probability is calculated with event language models trained on a script knowledge corpus and modulated with probabilistic event chains extracted from the corpus. We use the DeScript corpus of script knowledge to obtain empirically founded estimates of the likelihood of an event to occur in context without having to resort to expensive pre-tests of the stimuli. We exemplify our method at a case study on the usage of nonsentential expressions (fragments), which shows that utterances that are predictable given script-based extralinguistic context are more likely to be reduced.},
pubstate = {published},
type = {article}
}

Project:   B3

Brouwer, Harm; Delogu, Francesca; Venhuizen, Noortje; Crocker, Matthew W.

Neurobehavioral Correlates of Surprisal in Language Comprehension: A Neurocomputational Model Journal Article

Frontiers in Psychology, 2021.

Expectation-based theories of language comprehension, in particular Surprisal Theory, go a long way in accounting for the behavioral correlates of word-by-word processing difficulty, such as reading times. An open question, however, is in which component(s) of the Event-Related brain Potential (ERP) signal Surprisal is reflected, and how these electrophysiological correlates relate to behavioral processing indices. Here, we address this question by instantiating an explicit neurocomputational model of incremental, word-by-word language comprehension that produces estimates of the N400 and the P600 – the two most salient ERP components for language processing – as well as estimates of ‘comprehension-centric’ Surprisal for each word in a sentence. We derive model predictions for a recent experimental design that directly investigates ‘world-knowledge’-induced Surprisal. By relating these predictions to both empirical electrophysiological and behavioral results, we establish a close link between Surprisal, as indexed by reading times, and the P600 component of the ERP signal. The resultant model thus offers an integrated neurobehavioral account of processing difficulty in language comprehension.

@article{Brouwer2021,
title = {Neurobehavioral Correlates of Surprisal in Language Comprehension: A Neurocomputational Model},
author = {Harm Brouwer and Francesca Delogu and Noortje Venhuizen and Matthew W. Crocker},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.615538/full},
doi = {https://doi.org/10.3389/fpsyg.2021.615538},
year = {2021},
date = {2021-02-11},
journal = {Frontiers in Psychology},
abstract = {Expectation-based theories of language comprehension, in particular Surprisal Theory, go a long way in accounting for the behavioral correlates of word-by-word processing difficulty, such as reading times. An open question, however, is in which component(s) of the Event-Related brain Potential (ERP) signal Surprisal is reflected, and how these electrophysiological correlates relate to behavioral processing indices. Here, we address this question by instantiating an explicit neurocomputational model of incremental, word-by-word language comprehension that produces estimates of the N400 and the P600 - the two most salient ERP components for language processing - as well as estimates of `comprehension-centric' Surprisal for each word in a sentence. We derive model predictions for a recent experimental design that directly investigates `world-knowledge'-induced Surprisal. By relating these predictions to both empirical electrophysiological and behavioral results, we establish a close link between Surprisal, as indexed by reading times, and the P600 component of the ERP signal. The resultant model thus offers an integrated neurobehavioral account of processing difficulty in language comprehension.},
pubstate = {published},
type = {article}
}

Project:   A1

Teich, Elke; Fankhauser, Peter; Degaetano-Ortlieb, Stefania; Bizzoni, Yuri

Less is More/More Diverse: On The Communicative Utility of Linguistic Conventionalization Journal Article

Benítez-Burraco, Antonio (Ed.): Frontiers in Communication, section Language Sciences, 2021.

We present empirical evidence of the communicative utility of CONVENTIONALIZATION, i.e., convergence in linguistic usage over time, and DIVERSIFICATION, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e. reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information.
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.

@article{Teich2021,
title = {Less is More/More Diverse: On The Communicative Utility of Linguistic Conventionalization},
author = {Elke Teich and Peter Fankhauser and Stefania Degaetano-Ortlieb and Yuri Bizzoni},
editor = {Antonio Ben{\'i}tez-Burraco},
url = {https://www.frontiersin.org/articles/10.3389/fcomm.2020.620275/full},
doi = {https://doi.org/10.3389/fcomm.2020.620275},
year = {2021},
date = {2021-01-26},
journal = {Frontiers in Communication, section Language Sciences},
abstract = {We present empirical evidence of the communicative utility of CONVENTIONALIZATION, i.e., convergence in linguistic usage over time, and DIVERSIFICATION, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e. reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information.
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.},
pubstate = {published},
type = {article}
}

Project:   B1

Kudera, Jacek; Tavi, Lauri; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

The effect of surprisal on articulatory gestures in Polish consonant-to-vowel transitions: A pilot EMA study Inproceedings

14. ITG-Konferenz, ITG-Fachbericht 298: Speech Communication, pp. 179-183, Kiel, Germany, 2021, ISBN 978-3-8007-5627-8.

This study is concerned with the relation between the information-theoretic notion of surprisal and articulatory gesture in Polish consonant-to-vowel transitions. It addresses the question of the influence of diphone predictability on spectral trajectories and articulatory gestures by relating the effect of surprisal with motor fluency. The study combines the computation of locus equations (LE) with kinematic data obtained from electromagnetic articulograph (EMA). The kinematic and acoustic data showed that a small coarticulation effect was present in the high- and low-surprisal clusters. Regardless of some small discrepancies across the measures, a high degree of overlap of adjacent segments is reported for the mid-surprisal group in both domains. Two explanations of the observed effect are proposed. The first refers to low-surprisal coarticulation resistance and suggests the need to disambiguate predictable sequences. The second, observed in high surprisal clusters, refers to the prominence given to emphasize the unexpected concatenation.

@inproceedings{Kudera/etal:2021c,
title = {The effect of surprisal on articulatory gestures in Polish consonant-to-vowel transitions: A pilot EMA study},
author = {Jacek Kudera and Lauri Tavi and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://ieeexplore.ieee.org/document/9657527},
year = {2021},
date = {2021},
booktitle = {14. ITG-Konferenz, ITG-Fachbericht 298: Speech Communication},
isbn = {978-3-8007-5627-8},
pages = {179-183},
address = {Kiel, Germany},
abstract = {This study is concerned with the relation between the information-theoretic notion of surprisal and articulatory gesture in Polish consonant-to-vowel transitions. It addresses the question of the influence of diphone predictability on spectral trajectories and articulatory gestures by relating the effect of surprisal with motor fluency. The study combines the computation of locus equations (LE) with kinematic data obtained from electromagnetic articulograph (EMA). The kinematic and acoustic data showed that a small coarticulation effect was present in the high- and low-surprisal clusters. Regardless of some small discrepancies across the measures, a high degree of overlap of adjacent segments is reported for the mid-surprisal group in both domains. Two explanations of the observed effect are proposed. The first refers to low-surprisal coarticulation resistance and suggests the need to disambiguate predictable sequences. The second, observed in high surprisal clusters, refers to the prominence given to emphasize the unexpected concatenation.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Kudera, Jacek; Georgis, Philip; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

Phonetic Distance and Surprisal in Multilingual Priming: Evidence from Slavic Inproceedings

Proc. Interspeech, pp. 3944-3948, 2021.

This study reveals the relation between surprisal, phonetic distance, and latency based on a multilingual, short-term priming framework. Four Slavic languages (Bulgarian, Czech, Polish, and Russian) are investigated across two priming conditions: associative and phonetic priming, involving true cognates and near-homophones, respectively. This research is grounded in the methodology of information theory and proposes new methods for quantifying differences between meaningful lexical primes and targets for closely related languages. It also outlines the influence of phonetic distance between cognate and noncognate pairs of primes and targets on response times in a cross-lingual lexical decision task. The experimental results show that phonetic distance moderates response times only in Polish and Czech, whereas the surprisal-based correspondence effect is an accurate predictor of latency for all tested languages. The information-theoretic approach of quantifying feature-based alternations between Slavic cognates and near-homophones appears to be a valid method for latency moderation in the auditory modality. The outcomes of this study suggest that the surprisal-based (un)expectedness of spoken stimuli is an accurate predictor of human performance in multilingual lexical decision tasks.

@inproceedings{kudera21_interspeech,
title = {Phonetic Distance and Surprisal in Multilingual Priming: Evidence from Slavic},
author = {Jacek Kudera and Philip Georgis and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://www.isca-speech.org/archive/interspeech_2021/kudera21_interspeech.html},
doi = {10.21437/Interspeech.2021-1003},
year = {2021},
date = {2021},
booktitle = {Proc. Interspeech},
pages = {3944-3948},
abstract = {This study reveals the relation between surprisal, phonetic distance, and latency based on a multilingual, short-term priming framework. Four Slavic languages (Bulgarian, Czech, Polish, and Russian) are investigated across two priming conditions: associative and phonetic priming, involving true cognates and near-homophones, respectively. This research is grounded in the methodology of information theory and proposes new methods for quantifying differences between meaningful lexical primes and targets for closely related languages. It also outlines the influence of phonetic distance between cognate and noncognate pairs of primes and targets on response times in a cross-lingual lexical decision task. The experimental results show that phonetic distance moderates response times only in Polish and Czech, whereas the surprisal-based correspondence effect is an accurate predictor of latency for all tested languages. The information-theoretic approach of quantifying feature-based alternations between Slavic cognates and near-homophones appears to be a valid method for latency moderation in the auditory modality. The outcomes of this study suggest that the surprisal-based (un)expectedness of spoken stimuli is an accurate predictor of human performance in multilingual lexical decision tasks.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4
