Publications

Lemke, Tyll Robin; Schäfer, Lisa; Reich, Ingo

Can identity conditions on ellipsis be explained by processing principles? Inproceedings

Hörnig, Robin; von Wietersheim, Sophie; Konietzko, Andreas; Featherston, Sam (Eds.): Proceedings of Linguistic Evidence 2020: Linguistic Theory Enriched by Experimental Data, University of Tübingen, pp. 541-561, Tübingen, Germany, 2022.

In the theoretical literature, the unacceptability of (some) structural mismatches between the antecedent and the target of ellipsis has been taken to indicate that ellipsis is subject to syntactic identity conditions. Such constraints have been defended for verb phrase ellipsis (VPE) (Arregui et al., 2006; Merchant, 2013) and sluicing (Chung, 2006, 2013). The assumption of syntactic identity conditions increases the complexity of the grammar, because conditions which are specific to particular ellipses must be added to a system of more general rules. If the data that apparently support syntactic identity conditions could be explained by independently motivated principles, this would consequently reduce the complexity of the syntactic system. In this article we investigate syntactic identity conditions proposed by Chung (2006, 2013) for sluicing, i.e. the ellipsis of the TP in a wh-question, which is survived only by the wh-phrase (1a) (Ross, 1969). Our study shows that apparent grammaticality contrasts can be accounted for by a probabilistic processing account, which is supported by an acceptability rating, a production and a self-paced reading experiment. In contrast, Chung’s constraints lead to predictions which are not supported by our data.

@inproceedings{lemke.etalidentity,
title = {Can identity conditions on ellipsis be explained by processing principles?},
author = {Tyll Robin Lemke and Lisa Sch{\"a}fer and Ingo Reich},
editor = {Robin H{\"o}rnig and Sophie von Wietersheim and Andreas Konietzko and Sam Featherston},
url = {https://publikationen.uni-tuebingen.de/xmlui/handle/10900/119301},
year = {2022},
date = {2022},
booktitle = {Proceedings of Linguistic Evidence 2020: Linguistic Theory Enriched by Experimental Data},
pages = {541--561},
publisher = {University of T{\"u}bingen},
address = {T{\"u}bingen, Germany},
abstract = {In the theoretical literature, the unacceptability of (some) structural mismatches between the antecedent and the target of ellipsis has been taken to indicate that ellipsis is subject to syntactic identity conditions. Such constraints have been defended for verb phrase ellipsis (VPE) (Arregui et al., 2006; Merchant, 2013) and sluicing (Chung, 2006, 2013). The assumption of syntactic identity conditions increases the complexity of the grammar, because conditions which are specific to particular ellipses must be added to a system of more general rules. If the data that apparently support syntactic identity conditions could be explained by independently motivated principles, this would consequently reduce the complexity of the syntactic system. In this article we investigate syntactic identity conditions proposed by Chung (2006, 2013) for sluicing, i.e. the ellipsis of the TP in a wh-question, which is survived only by the wh-phrase (1a) (Ross, 1969). Our study shows that apparent grammaticality contrasts can be accounted for by a probabilistic processing account, which is supported by an acceptability rating, a production and a self-paced reading experiment. In contrast, Chung’s constraints lead to predictions which are not supported by our data.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Lemke, Tyll Robin; Reich, Ingo; Schäfer, Lisa

Questions under discussion, salience and the acceptability of fragments Incollection Forthcoming

Konietzko, Andreas; Winkler, Susanne (Eds.): Information Structure and Discourse in Generative Grammar: Mechanisms and Processes, De Gruyter Mouton, Berlin; Boston, 2022.

@incollection{lemke.etalquestions,
title = {Questions under discussion, salience and the acceptability of fragments},
author = {Tyll Robin Lemke and Ingo Reich and Lisa Sch{\"a}fer},
editor = {Andreas Konietzko and Susanne Winkler},
year = {2022},
date = {2022},
booktitle = {Information Structure and Discourse in Generative Grammar: Mechanisms and Processes},
publisher = {De Gruyter Mouton},
address = {Berlin; Boston},
pubstate = {forthcoming},
type = {incollection}
}

Project:   B3

Lemke, Tyll Robin; Reich, Ingo; Schäfer, Lisa; Drenhaus, Heiner

Predictable words are more likely to be omitted in fragments – Evidence from production data Journal Article

Frontiers in Psychology, 12, pp. 662125, 2021.

Instead of a full sentence like "Bring me to the university" (uttered by the passenger to a taxi driver), speakers often use fragments like "To the university" to get their message across. So far there is no comprehensive and empirically supported account of why and under which circumstances speakers sometimes prefer a fragment over the corresponding full sentence. We propose an information-theoretic account to model this choice: A speaker chooses the encoding that distributes information most uniformly across the utterance in order to make the most efficient use of the hearer’s processing resources (Uniform Information Density, Levy and Jaeger, 2007). Since processing effort is related to the predictability of words (Hale, 2001) our account predicts two effects of word probability on omissions: First, omitting predictable words (which are more easily processed) avoids underutilizing processing resources. Second, inserting words before very unpredictable words distributes otherwise excessively high processing effort more uniformly. We test these predictions with a production study that supports both of these predictions. Our study makes two main contributions: First we develop an empirically motivated and supported account of fragment usage. Second, we extend previous evidence for information-theoretic processing constraints on language in two ways: We find predictability effects on omissions driven by extralinguistic context, whereas previous research mostly focused on effects of local linguistic context. Furthermore, we show that omissions of content words are also subject to information-theoretic well-formedness considerations. Previously, this has been shown mostly for the omission of function words.
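
The first prediction above (omitting predictable words evens out processing effort) can be illustrated with a toy sketch. Everything quantitative here is an assumption, not taken from the paper: surprisal is computed as -log2 of a hypothetical bigram probability with made-up values, and "uniformity" is scored as the mean squared deviation of per-word surprisal from a hypothetical target rate (`TARGET_BITS`). The names `BIGRAM`, `surprisal_profile` and `uid_cost` are illustrative only.

```python
import math

# Hypothetical bigram probabilities P(word | previous word), invented for
# illustration only; they are NOT estimates from the study. "<s>" marks the
# start of the utterance.
BIGRAM = {
    ("<s>", "bring"): 0.5,
    ("bring", "me"): 0.9,
    ("me", "to"): 0.8,
    ("to", "the"): 0.7,
    ("the", "university"): 0.05,
    ("<s>", "to"): 0.2,
}

# Hypothetical target information rate in bits per word (an assumption that
# stands in for the hearer's processing capacity).
TARGET_BITS = 2.5


def surprisal_profile(words):
    """Return the surprisal -log2 P(w_i | w_{i-1}) of each word in bits."""
    profile = []
    prev = "<s>"
    for word in words:
        profile.append(-math.log2(BIGRAM[(prev, word)]))
        prev = word
    return profile


def uid_cost(words, target=TARGET_BITS):
    """Mean squared deviation of per-word surprisal from the target rate.

    Lower values mean information is spread more evenly around the target.
    """
    profile = surprisal_profile(words)
    return sum((s - target) ** 2 for s in profile) / len(profile)


full = ["bring", "me", "to", "the", "university"]
fragment = ["to", "the", "university"]

for label, words in (("full", full), ("fragment", fragment)):
    bits = surprisal_profile(words)
    print(label, [round(b, 2) for b in bits], round(uid_cost(words), 2))
```

With these invented numbers the full sentence contains a run of low-surprisal words ("me", "to", "the"), and dropping the predictable "bring me" yields a lower cost for the fragment. The second prediction (inserting material before very unpredictable words) is not modeled in this sketch.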

@article{lemke.etal2021.frontiers,
title = {Predictable words are more likely to be omitted in fragments – Evidence from production data},
author = {Tyll Robin Lemke and Ingo Reich and Lisa Sch{\"a}fer and Heiner Drenhaus},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662125/full},
doi = {10.3389/fpsyg.2021.662125},
year = {2021},
date = {2021-07-22},
journal = {Frontiers in Psychology},
pages = {662125},
volume = {12},
abstract = {Instead of a full sentence like "Bring me to the university" (uttered by the passenger to a taxi driver), speakers often use fragments like "To the university" to get their message across. So far there is no comprehensive and empirically supported account of why and under which circumstances speakers sometimes prefer a fragment over the corresponding full sentence. We propose an information-theoretic account to model this choice: A speaker chooses the encoding that distributes information most uniformly across the utterance in order to make the most efficient use of the hearer's processing resources (Uniform Information Density, Levy and Jaeger, 2007). Since processing effort is related to the predictability of words (Hale, 2001) our account predicts two effects of word probability on omissions: First, omitting predictable words (which are more easily processed) avoids underutilizing processing resources. Second, inserting words before very unpredictable words distributes otherwise excessively high processing effort more uniformly. We test these predictions with a production study that supports both of these predictions. Our study makes two main contributions: First we develop an empirically motivated and supported account of fragment usage. Second, we extend previous evidence for information-theoretic processing constraints on language in two ways: We find predictability effects on omissions driven by extralinguistic context, whereas previous research mostly focused on effects of local linguistic context. Furthermore, we show that omissions of content words are also subject to information-theoretic well-formedness considerations. Previously, this has been shown mostly for the omission of function words.},
pubstate = {published},
type = {article}
}

Project:   B3

Schäfer, Lisa; Lemke, Tyll Robin; Drenhaus, Heiner; Reich, Ingo

The Role of UID for the Usage of Verb Phrase Ellipsis: Psycholinguistic Evidence From Length and Context Effects Journal Article

Frontiers in Psychology, 12, pp. 1672, 2021, ISSN 1664-1078.

We investigate the underexplored question of when speakers make use of the omission phenomenon verb phrase ellipsis (VPE) in English given that the full form is also available to them. We base the interpretation of our results on the well-established information-theoretic Uniform Information Density (UID) hypothesis: Speakers tend to distribute processing effort uniformly across utterances and avoid regions of low information by omitting redundant material through, e.g., VPE. We investigate the length of the omittable VP and its predictability in context as sources of redundancy which lead to larger or deeper regions of low information and an increased pressure to use ellipsis. We use both naturalness rating and self-paced reading studies in order to link naturalness patterns to potential processing difficulties. For the length effects our rating and reading results support a UID account. Surprisingly, we do not find an effect of the context on the naturalness and the processing of VPE. We suggest that our manipulation might have been too weak or not effective to evidence such an effect.

@article{schaeferetal_2021b,
title = {The Role of UID for the Usage of Verb Phrase Ellipsis: Psycholinguistic Evidence From Length and Context Effects},
author = {Lisa Sch{\"a}fer and Tyll Robin Lemke and Heiner Drenhaus and Ingo Reich},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661087/full},
doi = {10.3389/fpsyg.2021.661087},
year = {2021},
date = {2021-05-26},
journal = {Frontiers in Psychology},
pages = {1672},
volume = {12},
abstract = {We investigate the underexplored question of when speakers make use of the omission phenomenon verb phrase ellipsis (VPE) in English given that the full form is also available to them. We base the interpretation of our results on the well-established information-theoretic Uniform Information Density (UID) hypothesis: Speakers tend to distribute processing effort uniformly across utterances and avoid regions of low information by omitting redundant material through, e.g., VPE. We investigate the length of the omittable VP and its predictability in context as sources of redundancy which lead to larger or deeper regions of low information and an increased pressure to use ellipsis. We use both naturalness rating and self-paced reading studies in order to link naturalness patterns to potential processing difficulties. For the length effects our rating and reading results support a UID account. Surprisingly, we do not find an effect of the context on the naturalness and the processing of VPE. We suggest that our manipulation might have been too weak or not effective to evidence such an effect.},
pubstate = {published},
type = {article}
}

Project:   B3

Schäfer, Lisa

Topic drop in German: Empirical support for an information-theoretic account to a long-known omission phenomenon Journal Article

Zeitschrift für Sprachwissenschaft, 40, pp. 161-197, 2021, ISSN 1613-3706, 0721-9067.

German allows for topic drop (Fries 1988), the omission of a preverbal constituent from a V2 sentence. I address the underexplored question of why speakers use topic drop with a corpus study and two acceptability rating studies. I propose an information-theoretic explanation based on the Uniform Information Density hypothesis (Levy and Jaeger 2007) that accounts for the full picture of data. The information-theoretic approach predicts that topic drop is more felicitous when the omitted constituent is predictable in context and easy to recover. This leads to a more optimal use of the hearer’s processing capacities. The corpus study on the FraC corpus (Horch and Reich 2017) shows that grammatical person, verb probability and verbal inflection impact the frequency of topic drop. The two rating experiments indicate that these differences in frequency are also reflected in acceptability and additionally evidence an impact of topicality on topic drop. Taken together my studies constitute the first systematic empirical investigation of previously only sparsely researched observations from the literature. My information-theoretic account provides a unifying explanation of these isolated observations and is also able to account for the effect of verb probability that I find in my corpus study.

@article{schaefer_2021a,
title = {Topic drop in German: Empirical support for an information-theoretic account to a long-known omission phenomenon},
author = {Lisa Sch{\"a}fer},
url = {https://www.degruyter.com/document/doi/10.1515/zfs-2021-2024/html},
doi = {10.1515/zfs-2021-2024},
year = {2021},
date = {2021-05-19},
journal = {Zeitschrift f{\"u}r Sprachwissenschaft},
pages = {161--197},
volume = {40},
number = {2},
abstract = {German allows for topic drop (Fries 1988), the omission of a preverbal constituent from a V2 sentence. I address the underexplored question of why speakers use topic drop with a corpus study and two acceptability rating studies. I propose an information-theoretic explanation based on the Uniform Information Density hypothesis (Levy and Jaeger 2007) that accounts for the full picture of data. The information-theoretic approach predicts that topic drop is more felicitous when the omitted constituent is predictable in context and easy to recover. This leads to a more optimal use of the hearer’s processing capacities. The corpus study on the FraC corpus (Horch and Reich 2017) shows that grammatical person, verb probability and verbal inflection impact the frequency of topic drop. The two rating experiments indicate that these differences in frequency are also reflected in acceptability and additionally evidence an impact of topicality on topic drop. Taken together my studies constitute the first systematic empirical investigation of previously only sparsely researched observations from the literature. My information-theoretic account provides a unifying explanation of these isolated observations and is also able to account for the effect of verb probability that I find in my corpus study.},
pubstate = {published},
type = {article}
}

Project:   B3

Köhne-Fuetterer, Judith; Drenhaus, Heiner; Delogu, Francesca; Demberg, Vera

The online processing of causal and concessive discourse connectives Journal Article

Linguistics, 59, pp. 417-448, 2021.

While there is a substantial amount of evidence for language processing being a highly incremental and predictive process, we still know relatively little about how top-down discourse based expectations are combined with bottom-up information such as discourse connectives. The present article reports on three experiments investigating this question using different methodologies (visual world paradigm and ERPs) in two languages (German and English). We find support for highly incremental processing of causal and concessive discourse connectives, causing anticipation of upcoming material. Our visual world study shows that anticipatory looks depend on the discourse connective; furthermore, the German ERP study revealed an N400 effect on a gender-marked adjective preceding the target noun, when the target noun was inconsistent with the expectations elicited by the combination of context and discourse connective. Moreover, our experiments reveal that the facilitation of downstream material based on earlier connectives comes at the cost of reversing original expectations, as evidenced by a P600 effect on the concessive relative to the causal connective.

@article{koehne2021online,
title = {The online processing of causal and concessive discourse connectives},
author = {Judith K{\"o}hne-Fuetterer and Heiner Drenhaus and Francesca Delogu and Vera Demberg},
url = {https://doi.org/10.1515/ling-2021-0011},
doi = {10.1515/ling-2021-0011},
year = {2021},
date = {2021-03-04},
journal = {Linguistics},
pages = {417--448},
volume = {59},
number = {2},
abstract = {While there is a substantial amount of evidence for language processing being a highly incremental and predictive process, we still know relatively little about how top-down discourse based expectations are combined with bottom-up information such as discourse connectives. The present article reports on three experiments investigating this question using different methodologies (visual world paradigm and ERPs) in two languages (German and English). We find support for highly incremental processing of causal and concessive discourse connectives, causing anticipation of upcoming material. Our visual world study shows that anticipatory looks depend on the discourse connective; furthermore, the German ERP study revealed an N400 effect on a gender-marked adjective preceding the target noun, when the target noun was inconsistent with the expectations elicited by the combination of context and discourse connective. Moreover, our experiments reveal that the facilitation of downstream material based on earlier connectives comes at the cost of reversing original expectations, as evidenced by a P600 effect on the concessive relative to the causal connective.},
pubstate = {published},
type = {article}
}

Projects:   A1 B2 B3

Lemke, Tyll Robin; Schäfer, Lisa; Reich, Ingo

Modeling the predictive potential of extralinguistic context with script knowledge: The case of fragments Journal Article

PLOS ONE, 16, pp. e0246255, 2021.

We describe a novel approach to estimating the predictability of utterances given extralinguistic context in psycholinguistic research. Predictability effects on language production and comprehension are widely attested, but so far predictability has mostly been manipulated through local linguistic context, which is captured with n-gram language models. However, this method does not allow to investigate predictability effects driven by extralinguistic context. Modeling effects of extralinguistic context is particularly relevant to discourse-initial expressions, which can be predictable even if they lack linguistic context at all. We propose to use script knowledge as an approximation to extralinguistic context. Since the application of script knowledge involves the generation of predictions about upcoming events, we expect that scripts can be used to manipulate the likelihood of linguistic expressions referring to these events. Previous research has shown that script-based discourse expectations modulate the likelihood of linguistic expressions, but script knowledge has often been operationalized with stimuli which were based on researchers’ intuitions and/or expensive production and norming studies. We propose to quantify the likelihood of an utterance based on the probability of the event to which it refers. This probability is calculated with event language models trained on a script knowledge corpus and modulated with probabilistic event chains extracted from the corpus. We use the DeScript corpus of script knowledge to obtain empirically founded estimates of the likelihood of an event to occur in context without having to resort to expensive pre-tests of the stimuli. We exemplify our method with a case study on the usage of nonsentential expressions (fragments), which shows that utterances that are predictable given script-based extralinguistic context are more likely to be reduced.
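
The abstract estimates an utterance's likelihood from the probability of the event it refers to, using event language models trained on a script corpus. As a rough illustration only, the sketch below assumes a plain add-one-smoothed bigram model over event labels and a handful of invented event sequences; the paper's actual models (trained on DeScript and modulated with probabilistic event chains) are not reproduced here, and `SCRIPTS`, `train_event_bigrams` and `event_prob` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical event sequences for a single script scenario ("eating in a
# restaurant"). These are invented placeholders, NOT data from DeScript.
SCRIPTS = [
    ["enter", "sit_down", "order", "eat", "pay", "leave"],
    ["enter", "order", "sit_down", "eat", "pay", "leave"],
    ["enter", "sit_down", "order", "eat", "leave"],
]


def train_event_bigrams(sequences):
    """Count transitions between consecutive events, with a start symbol."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        padded = ["<start>"] + seq
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def event_prob(counts, vocab, prev, nxt):
    """Add-one smoothed estimate of P(next event | previous event)."""
    following = counts.get(prev, Counter())
    return (following[nxt] + 1) / (sum(following.values()) + len(vocab))


counts, vocab = train_event_bigrams(SCRIPTS)
print(event_prob(counts, vocab, "sit_down", "order"))  # frequent continuation
print(event_prob(counts, vocab, "sit_down", "pay"))    # unseen continuation
```

Under these toy counts, `order` is three times as likely as `pay` right after `sit_down`. On the account described in the abstract, an utterance referring to the more predictable event would be expected to be reduced more often; add-one smoothing is used here only so that unseen transitions keep a nonzero probability.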

@article{Lemke2021,
title = {Modeling the predictive potential of extralinguistic context with script knowledge: The case of fragments},
author = {Tyll Robin Lemke and Lisa Sch{\"a}fer and Ingo Reich},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0246255},
doi = {10.1371/journal.pone.0246255},
year = {2021},
date = {2021-02-11},
journal = {PLOS ONE},
pages = {e0246255},
volume = {16},
number = {2},
abstract = {We describe a novel approach to estimating the predictability of utterances given extralinguistic context in psycholinguistic research. Predictability effects on language production and comprehension are widely attested, but so far predictability has mostly been manipulated through local linguistic context, which is captured with n-gram language models. However, this method does not allow to investigate predictability effects driven by extralinguistic context. Modeling effects of extralinguistic context is particularly relevant to discourse-initial expressions, which can be predictable even if they lack linguistic context at all. We propose to use script knowledge as an approximation to extralinguistic context. Since the application of script knowledge involves the generation of predictions about upcoming events, we expect that scripts can be used to manipulate the likelihood of linguistic expressions referring to these events. Previous research has shown that script-based discourse expectations modulate the likelihood of linguistic expressions, but script knowledge has often been operationalized with stimuli which were based on researchers’ intuitions and/or expensive production and norming studies. We propose to quantify the likelihood of an utterance based on the probability of the event to which it refers. This probability is calculated with event language models trained on a script knowledge corpus and modulated with probabilistic event chains extracted from the corpus. We use the DeScript corpus of script knowledge to obtain empirically founded estimates of the likelihood of an event to occur in context without having to resort to expensive pre-tests of the stimuli. We exemplify our method with a case study on the usage of nonsentential expressions (fragments), which shows that utterances that are predictable given script-based extralinguistic context are more likely to be reduced.},
pubstate = {published},
type = {article}
}

Project:   B3

Lemke, Tyll Robin

Satzäquivalente — Syntax oder Pragmatik? Incollection

Külpmann, Robert; Finkbeiner, Rita (Eds.): Neues zur Selbstständigkeit von Sätzen, Linguistische Berichte, Sonderheft, Buske, pp. 81-104, Hamburg, 2021, ISBN 978-3-96769-170-2.

"Sentence equivalents" ("Satzäquivalente") seem to present a contradiction between syntax and pragmatics, since they fulfill the same functions as sentences despite their nonsentential form. We present two experiments that test predictions of two theoretical perspectives on these expressions. On the one hand, ellipsis accounts (Morgan, 1973; Merchant, 2004; Reich, 2007) derive sentence equivalents from complete sentences by means of ellipsis; on the other hand, nonsentential accounts (Barton & Progovac, 2005; Stainton, 2006) propose that syntax can generate subsentential expressions.

@incollection{Lemke2021a,
title = {Satz{\"a}quivalente — Syntax oder Pragmatik?},
author = {Tyll Robin Lemke},
editor = {Robert K{\"u}lpmann and Rita Finkbeiner},
url = {https://buske.de/zeitschriften-bei-sonderhefte/linguistische-berichte-sonderhefte/neues-zur-selbststandigkeit-von-satzen-16620.html},
doi = {10.46771/978-3-96769-170-2},
year = {2021},
date = {2021},
booktitle = {Neues zur Selbstst{\"a}ndigkeit von S{\"a}tzen},
series = {Linguistische Berichte, Sonderheft},
isbn = {978-3-96769-170-2},
pages = {81--104},
publisher = {Buske},
address = {Hamburg},
abstract = {"Satz{\"a}quivalente" scheinen einen Widerspruch zwischen Syntax und Pragmatik darzustellen, da sie trotz nichtsententialer Form dieselben Funktionen wie S{\"a}tze erf{\"u}llen. Wir stellen zwei Experimente vor, die Vorhersagen zweier theoretischer Perspektiven auf diese Ausdr{\"u}cke untersuchen. Einerseits generieren elliptische Ans{\"a}tze (Morgan, 1973; Merchant, 2004; Reich, 2007) Satz{\"a}quivalente mittels Ellipse aus vollst{\"a}ndigen S{\"a}tzen, andererseits schlagen nichtsententiale Ans{\"a}tze (Barton \& Progovac, 2005; Stainton, 2006) vor, dass die Syntax subsententiale Ausdr{\"u}cke generieren kann.},
pubstate = {published},
type = {incollection}
}

Project:   B3

Lemke, Tyll Robin

Experimental investigations on the syntax and usage of fragments Miscellaneous

Experimental investigations on the syntax and usage of fragments, Open Germanic Linguistics, Language Science Press, Berlin, 2021.

This book investigates the syntax and usage of fragments (Morgan 1973), apparently subsentential utterances like "A coffee, please!" which fulfill the same communicative function as the corresponding full sentence "I’d like to have a coffee, please!". Even though such utterances are frequently used, they challenge the central role that has been attributed to the notion of sentence in linguistic theory, particularly from a semantic perspective.

The first part of the book is dedicated to the syntactic analysis of fragments, which is investigated with experimental methods. Currently there are several competing theoretical analyses of fragments, which rely almost only on introspective data. The experiments presented in this book constitute a first systematic evaluation of some of their crucial predictions and, taken together, support an in situ ellipsis account of fragments, as has been suggested by Reich (2007).

The second part of the book addresses the questions of why fragments are used at all, and under which circumstances they are preferred over complete sentences. Syntactic accounts impose licensing conditions on fragments, but they do not explain why fragments are sometimes (dis)preferred provided that their usage is licensed. This book proposes an information-theoretic account of fragments, which predicts that the usage of fragments is constrained by a general tendency to distribute processing effort uniformly across the utterance. With respect to fragments, this leads to two predictions, which are empirically confirmed: Speakers tend towards omitting predictable words and they insert additional redundancy before unpredictable words.

@book{Lemke2021b,
title = {Experimental investigations on the syntax and usage of fragments},
author = {Tyll Robin Lemke},
url = {https://langsci-press.org/catalog/book/321},
doi = {10.5281/zenodo.5596236},
year = {2021},
date = {2021},
series = {Open Germanic Linguistics},
publisher = {Language Science Press},
address = {Berlin},
abstract = {This book investigates the syntax and usage of fragments (Morgan 1973), apparently subsentential utterances like "A coffee, please!" which fulfill the same communicative function as the corresponding full sentence "I'd like to have a coffee, please!". Even though such utterances are frequently used, they challenge the central role that has been attributed to the notion of sentence in linguistic theory, particularly from a semantic perspective. The first part of the book is dedicated to the syntactic analysis of fragments, which is investigated with experimental methods. Currently there are several competing theoretical analyses of fragments, which rely almost only on introspective data. The experiments presented in this book constitute a first systematic evaluation of some of their crucial predictions and, taken together, support an in situ ellipsis account of fragments, as has been suggested by Reich (2007). The second part of the book addresses the questions of why fragments are used at all, and under which circumstances they are preferred over complete sentences. Syntactic accounts impose licensing conditions on fragments, but they do not explain why fragments are sometimes (dis)preferred provided that their usage is licensed. This book proposes an information-theoretic account of fragments, which predicts that the usage of fragments is constrained by a general tendency to distribute processing effort uniformly across the utterance. With respect to fragments, this leads to two predictions, which are empirically confirmed: Speakers tend towards omitting predictable words and they insert additional redundancy before unpredictable words.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B3

Lemke, Tyll Robin; Schäfer, Lisa; Drenhaus, Heiner; Reich, Ingo

Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling Inproceedings

Proceedings of the Society for Computation in Linguistics, 3, 2020.

We investigate the effect of script-based (Schank and Abelson 1977) extralinguistic context on the omission of words in fragments. Our data elicited with a production task show that predictable words are more often omitted than unpredictable ones, as predicted by the Uniform Information Density (UID) hypothesis (Levy and Jaeger, 2007).

We take into account effects of linguistic and extralinguistic context on predictability and propose a method for estimating the surprisal of words in presence of ellipsis. Our study extends previous evidence for UID in two ways: First, we show that not only local linguistic context, but also extralinguistic context determines the likelihood of omissions. Second, we find UID effects on the omission of content words.

@inproceedings{Lemke2020,
title = {Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling},
author = {Tyll Robin Lemke and Lisa Sch{\"a}fer and Heiner Drenhaus and Ingo Reich},
url = {https://scholarworks.umass.edu/scil/vol3/iss1/45},
doi = {10.7275/mpby-zr74},
year = {2020},
date = {2020},
booktitle = {Proceedings of the Society for Computation in Linguistics},
volume = {3},
abstract = {We investigate the effect of script-based (Schank and Abelson 1977) extralinguistic context on the omission of words in fragments. Our data elicited with a production task show that predictable words are more often omitted than unpredictable ones, as predicted by the Uniform Information Density (UID) hypothesis (Levy and Jaeger, 2007). We take into account effects of linguistic and extralinguistic context on predictability and propose a method for estimating the surprisal of words in presence of ellipsis. Our study extends previous evidence for UID in two ways: First, we show that not only local linguistic context, but also extralinguistic context determines the likelihood of omissions. Second, we find UID effects on the omission of content words.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Reich, Ingo

Saulecker und supergemütlich! Pilotstudien zur fragmentarischen Verwendung expressiver Adjektive. Incollection

d'Avis, Franz; Finkbeiner, Rita (Eds.): Expressivität im Deutschen, De Gruyter, pp. 109-128, Berlin, Boston, 2019.

Watching "Jungs-WG" or "Durch die Wildnis" on Kika, it feels as if every third utterance is an isolated use of an expressive adjective such as "Mega!". Starting from this initial impressionistic observation, this article uses both corpus-linguistic and experimental methods to investigate the hypothesis that expressive adjectives are significantly more acceptable in fragmentary use than descriptive adjectives. While the corpus data initially largely confirm this hypothesis, the experiments show that expressive utterances are generally rated better than descriptive utterances, but the original hypothesis itself cannot be confirmed. The discrepancy between the corpus and the experimental results is subsequently traced back to a distinction between individual-related and utterance-related (expressive) adjectives: the corpus results reflect the distribution of utterance-related expressive adjectives, whereas the experiments concern only individual-related (expressive) adjectives. The original hypothesis should therefore be qualified in the sense that it only makes claims about the isolated use of utterance-related adjectives.

@incollection{Reich2019,
title = {Saulecker und supergem{\"u}tlich! Pilotstudien zur fragmentarischen Verwendung expressiver Adjektive},
author = {Ingo Reich},
editor = {Franz d'Avis and Rita Finkbeiner},
url = {https://www.degruyter.com/document/doi/10.1515/9783110630190-005/html},
doi = {10.1515/9783110630190-005},
year = {2019},
date = {2019},
booktitle = {Expressivit{\"a}t im Deutschen},
pages = {109-128},
publisher = {De Gruyter},
address = {Berlin, Boston},
abstract = {Schaut man auf dem Kika die „Jungs-WG“ oder „Durch die Wildnis“, dann ist gef{\"u}hlt jede dritte {\"A}u{\ss}erung eine isolierte Verwendung eines expressiven Adjektivs der Art „Mega!“. Ausgehend von dieser ersten impressionistischen Beobachtung wird in diesem Artikel sowohl korpuslinguistisch wie auch experimentell der Hypothese nachgegangen, dass expressive Adjektive in fragmentarischer Verwendung signifikant akzeptabler sind als deskriptive Adjektive. W{\"a}hrend sich diese Hypothese im Korpus zun{\"a}chst weitgehend best{\"a}tigt, zeigen die experimentellen Untersuchungen zwar, dass expressive {\"A}u{\ss}erungen generell besser bewertet werden als deskriptive {\"A}u{\ss}erungen, die urspr{\"u}ngliche Hypothese l{\"a}sst sich aber nicht best{\"a}tigen. Die Diskrepanz zwischen den korpuslinguistischen und den experimentellen Ergebnissen wird in der Folge auf eine Unterscheidung zwischen individuenbezogenen und {\"a}u{\ss}erungsbezogenen (expressiven) Adjektiven zur{\"u}ckgef{\"u}hrt und festgestellt, dass die Korpusergebnisse die Verteilung {\"a}u{\ss}erungsbezogener expressiver Adjektive nachzeichnen, w{\"a}hrend sich die Experimente alleine auf individuenbezogene (expressive) Adjektive beziehen. Die urspr{\"u}ngliche Hypothese w{\"a}re daher in dem Sinne zu qualifizieren, dass sie nur Aussagen {\"u}ber die isolierte Verwendung {\"a}u{\ss}erungsbezogener Adjektive macht.},
pubstate = {published},
type = {incollection}
}


Project:   B3

Reich, Ingo

Ellipsen Book Chapter

Liedtke, Frank; Tuchen, Astrid (Ed.): Handbuch Pragmatik, J.B. Metzler, pp. 240-251, Stuttgart, 2018, ISBN 978-3-476-04624-6.

The term ›ellipsis‹ is not used uniformly in the literature and, given the heterogeneity of the phenomena it covers, is not easy to define. As a first approximation, ellipses can be understood as linguistic utterances that are incomplete in a sense yet to be made precise, or that competent speakers (of German) perceive as incomplete.

@inbook{Reich2018,
title = {Ellipsen},
author = {Ingo Reich},
editor = {Frank Liedtke and Astrid Tuchen},
url = {https://doi.org/10.1007/978-3-476-04624-6_24},
doi = {10.1007/978-3-476-04624-6_24},
year = {2018},
date = {2018},
booktitle = {Handbuch Pragmatik},
isbn = {978-3-476-04624-6},
pages = {240-251},
publisher = {J.B. Metzler},
address = {Stuttgart},
abstract = {Der Begriff ›Ellipse‹ wird in der Literatur nicht einheitlich verwendet und ist aufgrund der Heterogenit{\"a}t des Ph{\"a}nomenbereichs auch nicht ganz einfach zu definieren. In erster Ann{\"a}herung kann man unter Ellipsen sprachliche {\"A}u{\ss}erungen verstehen, die in einem zu pr{\"a}zisierenden Sinne unvollst{\"a}ndig sind oder von kompetenten Sprecher/innen (des Deutschen) als unvollst{\"a}ndig aufgefasst werden.},
pubstate = {published},
type = {inbook}
}


Project:   B3

Lemke, Tyll Robin; Horch, Eva; Reich, Ingo

Optimal encoding! - Information Theory constrains article omission in newspaper headlines Inproceedings

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics, pp. 131-135, Valencia, Spain, 2017.

In this paper we pursue the hypothesis that the distribution of article omission specifically is constrained by principles of Information Theory (Shannon 1948). In particular, Information Theory predicts a stronger preference for article omission before nouns which are relatively unpredictable in context of the preceding words. We investigated article omission in German newspaper headlines with a corpus and acceptability rating study. Both support our hypothesis: Articles are inserted more often before unpredictable nouns and subjects perceive article omission before predictable nouns as more well-formed than before unpredictable ones. This suggests that information theoretic principles constrain the distribution of article omission in headlines.
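As a rough illustration of the predictability notion in this abstract, the following Python sketch estimates the surprisal of a headline noun directly after the verb (i.e. in the article-less variant) under an add-one smoothed bigram model. The toy headline corpus, the bigram model and the function names `train_bigram` and `surprisal` are hypothetical and chosen for illustration only; they are not the corpus, language model or estimation procedure used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus of lower-cased German headlines (illustration only).
CORPUS = [
    "kanzler besucht schule",
    "minister besucht schule",
    "kanzler besucht die schule",
    "polizei findet leiche",
    "polizei findet die leiche",
    "kanzler besucht den zoo",
]


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def surprisal(prev, word, unigrams, bigrams):
    """Add-one smoothed bigram surprisal -log2 P(word | prev), in bits."""
    vocab_size = len(unigrams)
    prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(prob)


if __name__ == "__main__":
    uni, bi = train_bigram(CORPUS)
    # Noun directly after the verb, i.e. the headline variant without an article.
    for noun in ("schule", "zoo"):
        print(noun, round(surprisal("besucht", noun, uni, bi), 2))
```

With this toy data the script prints `schule 2.32` and `zoo 3.91`: "schule" is the more predictable noun after "besucht". On the pattern the study reports (omission rated better before predictable nouns, articles inserted more often before unpredictable ones), omission would be expected to sound more natural before "schule" than before "zoo".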

@inproceedings{LemkeHorchReich:17,
title = {Optimal encoding! - Information Theory constrains article omission in newspaper headlines},
author = {Tyll Robin Lemke and Eva Horch and Ingo Reich},
url = {https://www.aclweb.org/anthology/E17-2021},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
pages = {131-135},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {In this paper we pursue the hypothesis that the distribution of article omission specifically is constrained by principles of Information Theory (Shannon 1948). In particular, Information Theory predicts a stronger preference for article omission before nouns which are relatively unpredictable in context of the preceding words. We investigated article omission in German newspaper headlines with a corpus and acceptability rating study. Both support our hypothesis: Articles are inserted more often before unpredictable nouns and subjects perceive article omission before predictable nouns as more well-formed than before unpredictable ones. This suggests that information theoretic principles constrain the distribution of article omission in headlines.},
pubstate = {published},
type = {inproceedings}
}


Project:   B3

Lemke, Tyll Robin

Sentential or not? - An experimental investigation on the syntax of fragments Inproceedings

Proceedings of Linguistic Evidence 2016, Tübingen, 2017.

This paper presents four experiments on the syntactic structure of fragments, i.e. nonsentential utterances with propositional meaning and illocutionary force (Morgan, 1973). The experiments evaluate the predictions of two competing theories of fragments: Merchant’s (2004) movement and deletion account and Barton & Progovac’s (2005) nonsentential account. Experiment 1 provides evidence for case connectivity effects, which suggests that fragments do contain unarticulated linguistic structure (contrary to Barton & Progovac 2005). Experiments 2-4 address a central prediction of the movement and deletion account: only constituents that can appear in the left periphery are possible fragments. Merchant et al. (2013) present two studies on preposition stranding and complement clause topicalization that support this prediction. My experiments 2-4 replicate and extend these studies in German and English. Taken together, the acceptability pattern predicted by Merchant (2004) holds only for the preposition stranding data (exp. 2), but not for complement clauses (exp. 3) or German multiple prefield constituents (exp. 4).

@inproceedings{Lemke-toappear,
title = {Sentential or not? - An experimental investigation on the syntax of fragments},
author = {Tyll Robin Lemke},
url = {https://publikationen.uni-tuebingen.de/xmlui/handle/10900/77657},
doi = {10.15496/publikation-19058},
year = {2017},
date = {2017},
booktitle = {Proceedings of Linguistic Evidence 2016},
address = {T{\"u}bingen},
abstract = {This paper presents four experiments on the syntactic structure of fragments, i.e. nonsentential utterances with propositional meaning and illocutionary force (Morgan, 1973). The experiments evaluate the predictions of two competing theories of fragments: Merchant's (2004) movement and deletion account and Barton \& Progovac's (2005) nonsentential account. Experiment 1 provides evidence for case connectivity effects, which suggests that fragments do contain unarticulated linguistic structure (contrary to Barton \& Progovac 2005). Experiments 2-4 address a central prediction of the movement and deletion account: only constituents that can appear in the left periphery are possible fragments. Merchant et al. (2013) present two studies on preposition stranding and complement clause topicalization that support this prediction. My experiments 2-4 replicate and extend these studies in German and English. Taken together, the acceptability pattern predicted by Merchant (2004) holds only for the preposition stranding data (exp. 2), but not for complement clauses (exp. 3) or German multiple prefield constituents (exp. 4).},
pubstate = {published},
type = {inproceedings}
}


Project:   B3

Reich, Ingo

On the omission of articles and copulae in German newspaper headlines Journal Article

Linguistic Variation, 17, pp. 186-204, 2017.

Based on a corpus-linguistic study, this paper argues that omitted articles and omitted copulae in German headlines are both to be treated as null elements NA and NC. Both items need to be licensed by a specific (parsing) strategy known as discourse orientation (Huang, 1984), which is also applicable in the special register of headlines. It is shown that distinguishing between discourse and sentence orientation, and correlating these two strategies with λ-binding and existential quantification respectively, naturally accounts for an asymmetry in article omission observed by Stowell (1991).

@article{Reich-inpress,
title = {On the omission of articles and copulae in German newspaper headlines},
author = {Ingo Reich},
url = {https://benjamins.com/catalog/lv.14017.rei},
doi = {10.1075/lv.14017.rei},
year = {2017},
date = {2017},
journal = {Linguistic Variation},
pages = {186-204},
volume = {17},
number = {2},
abstract = {Based on a corpus-linguistic study, this paper argues that omitted articles and omitted copulae in German headlines are both to be treated as null elements NA and NC. Both items need to be licensed by a specific (parsing) strategy known as discourse orientation (Huang, 1984), which is also applicable in the special register of headlines. It is shown that distinguishing between discourse and sentence orientation, and correlating these two strategies with λ-binding and existential quantification respectively, naturally accounts for an asymmetry in article omission observed by Stowell (1991).},
pubstate = {published},
type = {article}
}


Project:   B3

Horch, Eva; Reich, Ingo

The Fragment Corpus Inproceedings

Proceedings of the 9th International Corpus Linguistics Conference, pp. 392-393, Birmingham, UK, 2017.

We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. FraC is a mixed-register corpus of 17 different text types, including written texts (newspaper texts, legal texts, blogs, etc.), spoken texts (dialogues, interviews, radio hosting, etc.) and social media texts (tweets, SMS, chats). Each subcorpus comprises approx. 2,000 utterance units (including fragments; units follow the orthographic notion of a sentence), amounting to a total corpus size of 380K tokens. The data were taken from electronically available sources.
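The figures in this abstract allow a rough back-of-the-envelope check of the corpus composition. Since each subcorpus size is only approximate, the derived average length of an utterance unit is an estimate, not a figure reported by the authors.

```latex
\begin{align*}
  17 \times 2000 &\approx 34\,000 \text{ utterance units}\\
  \frac{380\,000 \text{ tokens}}{34\,000 \text{ units}} &\approx 11.2 \text{ tokens per unit}
\end{align*}
```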

@inproceedings{HorchReich:17,
title = {The Fragment Corpus},
author = {Eva Horch and Ingo Reich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30290},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 9th International Corpus Linguistics Conference},
pages = {392-393},
address = {Birmingham, UK},
abstract = {We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. FraC is a mixed-register corpus of 17 different text types, including written texts (newspaper texts, legal texts, blogs, etc.), spoken texts (dialogues, interviews, radio hosting, etc.) and social media texts (tweets, SMS, chats). Each subcorpus comprises approx. 2,000 utterance units (including fragments; units follow the orthographic notion of a sentence), amounting to a total corpus size of 380K tokens. The data were taken from electronically available sources.},
pubstate = {published},
type = {inproceedings}
}


Project:   B3

Reich, Ingo; Horch, Eva

On “Article Omission” in German and the “Uniform Information Density Hypothesis” Inproceedings

Dipper, Stefanie; Neubarth, Friedrich; Zinsmeister, Heike (Ed.): Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), 16, pp. 125-127, Bochum, 2016.

This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948), and in particular the “Uniform Information Density Hypothesis” (UID, see Jaeger 2010), might contribute to our understanding of a phenomenon referred to in the literature as “article omission” (AO). To this end, we trained language models on a corpus of 17 different text types (from prototypically written text types like legal texts to prototypically spoken text types like dialogue) with about 2,000 sentences each and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as expected). However, they also show that omitting the article results in a non-uniform distribution (thus contradicting the UID). Since empirically AO does not seem to depend on specific lexical items, we also trained our language models on a more abstract level (part of speech). At this level of analysis, we again found that an overtly realized article significantly reduces the surprisal on the following head noun, but that AO at the same time results in a more uniform distribution of information. In the case of AO, the UID thus seems to operate on the level of POS rather than on the lexical level.
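The comparison of density profiles for minimal pairs can be sketched roughly as follows: a per-token surprisal profile under an add-one smoothed bigram model, trained here on part-of-speech tags (the same code works on word tokens), with the population variance of the profile as one possible measure of (non-)uniformity. The toy STTS-style tag sequences, the bigram model, the variance measure and the function names are all assumptions for illustration; the abstract does not specify the paper's language models or uniformity metric, and the toy data are not meant to reproduce its results.

```python
import math
from collections import Counter
from statistics import pvariance

# Hypothetical toy training data at the POS level (STTS-style tags).
POS_CORPUS = [
    ["NN", "VVFIN", "ART", "NN"],
    ["NN", "VVFIN", "ART", "NN"],
    ["NN", "VVFIN", "NN"],
    ["NN", "VVFIN", "ADV"],
]


def train_bigram(sequences):
    """Count unigrams and bigrams over token sequences (lists of strings)."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        tokens = ["<s>"] + list(seq)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def density_profile(seq, unigrams, bigrams):
    """Per-token add-one smoothed bigram surprisal in bits."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + list(seq)
    return [
        -math.log2((bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab_size))
        for prev, tok in zip(tokens, tokens[1:])
    ]


def uid_deviation(seq, unigrams, bigrams):
    """Population variance of the profile; lower means more uniform."""
    return pvariance(density_profile(seq, unigrams, bigrams))


if __name__ == "__main__":
    uni, bi = train_bigram(POS_CORPUS)
    minimal_pair = {
        "with article": ["NN", "VVFIN", "ART", "NN"],
        "article omitted": ["NN", "VVFIN", "NN"],
    }
    for label, seq in minimal_pair.items():
        profile = density_profile(seq, uni, bi)
        print(label, [round(s, 2) for s in profile],
              round(uid_deviation(seq, uni, bi), 3))
```

In this toy setting the overt article lowers the surprisal of the following NN (about 1.22 vs. 2.17 bits), mirroring the first finding reported above. Whether omission makes the whole profile more or less uniform depends entirely on the training data, so the toy variances say nothing about the paper's results.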

@inproceedings{HorchReich2016,
title = {On “Article Omission” in German and the “Uniform Information Density Hypothesis”},
author = {Ingo Reich and Eva Horch},
editor = {Stefanie Dipper and Friedrich Neubarth and Heike Zinsmeister},
url = {https://www.linguistics.rub.de/konvens16/pub/16_konvensproc.pdf},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016)},
pages = {125-127},
address = {Bochum},
abstract = {This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948), and in particular the “Uniform Information Density Hypothesis” (UID, see Jaeger 2010), might contribute to our understanding of a phenomenon referred to in the literature as “article omission” (AO). To this end, we trained language models on a corpus of 17 different text types (from prototypically written text types like legal texts to prototypically spoken text types like dialogue) with about 2,000 sentences each and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as expected). However, they also show that omitting the article results in a non-uniform distribution (thus contradicting the UID). Since empirically AO does not seem to depend on specific lexical items, we also trained our language models on a more abstract level (part of speech). At this level of analysis, we again found that an overtly realized article significantly reduces the surprisal on the following head noun, but that AO at the same time results in a more uniform distribution of information. In the case of AO, the UID thus seems to operate on the level of POS rather than on the lexical level.},
pubstate = {published},
type = {inproceedings}
}


Project:   B3
