Publications

Fischer, Stefan; Teich, Elke

More complex or just more diverse? Capturing diachronic linguistic variation Inproceedings

41. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS), Bremen, Germany, 2019.

We present a diachronic comparison of general (register-mixed) and scientific English in the late modern period (1700–1900). For our analysis we use two corpora which are comparable in size and time-span: the Corpus of Late Modern English (CLMET; De Smet et al. 2015) and the Royal Society Corpus (RSC; Kermes et al. 2016). Previous studies of scientific English found a diachronic tendency from a verbal, involved to a more nominal, abstract style compared to other discourse types (cf. Halliday 1988; Biber & Gray 2011). The features reported include type-token ratio, lexical density, number of words per sentence and relative frequency of nominal vs. verbal categories—all potential indicators of linguistic complexity at a shallow level. We present results for these common measures on our data set as well as for selected information-theoretic measures, notably relative entropy (Kullback–Leibler divergence: KLD) and surprisal. For instance, using KLD, we observe a continuous divergence between general and scientific language based on word unigrams as well as part-of-speech trigrams. Lexical density increases over time for both scientific language and general language. In both corpora, sentence length decreases by roughly 25%, with scientific sentences being longer on average. On the other hand, mean sentence surprisal remains stable over time. The poster will give an overview of our results using the selected measures and discuss possible interpretations. Moreover, we will assess their utility for capturing linguistic diversification, showing that the information-theoretic measures are fairly fine-tuned, robust and link up well to explanations in terms of linguistic complexity and rational communication (cf. Hale 2016; Crocker, Demberg, & Teich 2016).
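
As a concrete illustration of one of the measures named above, the following Python sketch (a minimal example, not the authors' implementation; the token lists and the smoothing constant are invented) computes the Kullback–Leibler divergence between the unigram word distributions of two tiny text samples, with additive smoothing so that the divergence stays finite when a word occurs in only one sample. Applied to part-of-speech trigrams instead of word unigrams, the same computation gives the grammatical-level comparison mentioned in the abstract.

import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=0.1):
    # additive smoothing keeps every vocabulary item at non-zero probability
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kld(p, q):
    # D(P || Q) = sum_w p(w) * log2(p(w) / q(w)), in bits
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

# toy stand-ins for a scientific and a register-mixed sample
scientific = "the results of the experiment are shown in the following table".split()
general = "he walked into the town and spoke to the people he met there".split()
vocab = set(scientific) | set(general)

p, q = unigram_dist(scientific, vocab), unigram_dist(general, vocab)
print(f"KLD(scientific || general) = {kld(p, q):.3f} bits")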

@inproceedings{Fischer2019,
title = {More complex or just more diverse? Capturing diachronic linguistic variation},
author = {Stefan Fischer and Elke Teich},
url = {http://www.dgfs2019.uni-bremen.de/abstracts/poster/Fischer_Teich.pdf},
year = {2019},
date = {2019-03-06},
publisher = {41. Jahrestagung der Deutschen Gesellschaft f{\"u}r Sprachwissenschaft (DGfS)},
address = {Bremen, Germany},
abstract = {We present a diachronic comparison of general (register-mixed) and scientific English in the late modern period (1700–1900). For our analysis we use two corpora which are comparable in size and time-span: the Corpus of Late Modern English (CLMET; De Smet et al. 2015) and the Royal Society Corpus (RSC; Kermes et al. 2016). Previous studies of scientific English found a diachronic tendency from a verbal, involved to a more nominal, abstract style compared to other discourse types (cf. Halliday 1988; Biber & Gray 2011). The features reported include type-token ratio, lexical density, number of words per sentence and relative frequency of nominal vs. verbal categories—all potential indicators of linguistic complexity at a shallow level. We present results for these common measures on our data set as well as for selected information-theoretic measures, notably relative entropy (Kullback–Leibler divergence: KLD) and surprisal. For instance, using KLD, we observe a continuous divergence between general and scientific language based on word unigrams as well as part-of-speech trigrams. Lexical density increases over time for both scientific language and general language. In both corpora, sentence length decreases by roughly 25%, with scientific sentences being longer on average. On the other hand, mean sentence surprisal remains stable over time. The poster will give an overview of our results using the selected measures and discuss possible interpretations. Moreover, we will assess their utility for capturing linguistic diversification, showing that the information-theoretic measures are fairly fine-tuned, robust and link up well to explanations in terms of linguistic complexity and rational communication (cf. Hale 2016; Crocker, Demberg, & Teich 2016).},
pubstate = {published},
type = {inproceedings}
}


Project:   B1

Grosse, Kathrin; Trost, Thomas; Mosbach, Marius; Backes, Michael; Klakow, Dietrich

On the security relevance of weights in deep learning Journal Article

CoRR, 2019.

Recently, a weight-based attack on stochastic gradient descent inducing overfitting has been proposed. We show that the threat is broader: A task-independent permutation on the initial weights suffices to limit the achieved accuracy to for example 50% on the Fashion MNIST dataset from initially more than 90%. These findings are confirmed on MNIST and CIFAR. We formally confirm that the attack succeeds with high likelihood and does not depend on the data. Empirically, weight statistics and loss appear unsuspicious, making it hard to detect the attack if the user is not aware. Our paper is thus a call for action to acknowledge the importance of the initial weights in deep learning.
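
The sketch below (NumPy, with toy shapes and seeds; an illustration of the attack vector described in the abstract, not the paper's code) applies a fixed, data-independent permutation to a freshly initialised weight matrix and then compares simple summary statistics, which remain identical because the permutation only reorders the same values. This is consistent with the abstract's observation that weight statistics look unsuspicious.

import numpy as np

init_rng = np.random.default_rng(0)     # ordinary initialisation seed
attack_rng = np.random.default_rng(42)  # fixed "attack" seed, chosen without seeing any data

def init_weights(shape):
    # toy uniform (Glorot-style) initialisation
    limit = np.sqrt(6.0 / sum(shape))
    return init_rng.uniform(-limit, limit, size=shape)

def permute_weights(w):
    # flatten, apply a task-independent permutation, restore the original shape
    perm = attack_rng.permutation(w.size)
    return w.ravel()[perm].reshape(w.shape)

w0 = init_weights((784, 256))           # e.g. a first layer for 28x28 pixel inputs
w0_attacked = permute_weights(w0)

print(f"original: mean={w0.mean():+.5f} std={w0.std():.5f}")
print(f"permuted: mean={w0_attacked.mean():+.5f} std={w0_attacked.std():.5f}")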

@article{Grosse2019,
title = {On the security relevance of weights in deep learning},
author = {Kathrin Grosse and Thomas Trost and Marius Mosbach and Michael Backes and Dietrich Klakow},
url = {https://arxiv.org/abs/1902.03020},
year = {2019},
date = {2019},
journal = {CoRR},
abstract = {Recently, a weight-based attack on stochastic gradient descent inducing overfitting has been proposed. We show that the threat is broader: A task-independent permutation on the initial weights suffices to limit the achieved accuracy to for example 50% on the Fashion MNIST dataset from initially more than 90%. These findings are confirmed on MNIST and CIFAR. We formally confirm that the attack succeeds with high likelihood and does not depend on the data. Empirically, weight statistics and loss appear unsuspicious, making it hard to detect the attack if the user is not aware. Our paper is thus a call for action to acknowledge the importance of the initial weights in deep learning.},
pubstate = {published},
type = {article}
}


Project:   B4

Engonopoulos, Nikos; Teichmann, Christoph; Koller, Alexander

Discovering user groups for natural language generation Inproceedings

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018.

We present a model which predicts how individual users of a dialog system understand and produce utterances based on user groups. In contrast to previous work, these user groups are not specified beforehand, but learned in training. We evaluate on two referring expression (RE) generation tasks; our experiments show that our model can identify user groups and learn how to most effectively talk to them, and can dynamically assign unseen users to the correct groups as they interact with the system.

@inproceedings{Engonopoulos2018discovering,
title = {Discovering user groups for natural language generation},
author = {Nikos Engonopoulos and Christoph Teichmann and Alexander Koller},
url = {https://arxiv.org/abs/1806.05947},
year = {2018},
date = {2018},
booktitle = {Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue},
abstract = {We present a model which predicts how individual users of a dialog system understand and produce utterances based on user groups. In contrast to previous work, these user groups are not specified beforehand, but learned in training. We evaluate on two referring expression (RE) generation tasks; our experiments show that our model can identify user groups and learn how to most effectively talk to them, and can dynamically assign unseen users to the correct groups as they interact with the system.},
pubstate = {published},
type = {inproceedings}
}


Project:   A7

Jágrová, Klára; Avgustinova, Tania; Stenger, Irina; Fischer, Andrea

Language models, surprisal and fantasy in Slavic intercomprehension Journal Article

Computer Speech & Language, 2018.

In monolingual human language processing, the predictability of a word given its surrounding sentential context is crucial. With regard to receptive multilingualism, it is unclear to what extent predictability in context interplays with other linguistic factors in understanding a related but unknown language – a process called intercomprehension. We distinguish two dimensions influencing processing effort during intercomprehension: surprisal in sentential context and linguistic distance.

Based on this hypothesis, we formulate expectations regarding the difficulty of designed experimental stimuli and compare them to the results from think-aloud protocols of experiments in which Czech native speakers decode Polish sentences by agreeing on an appropriate translation. On the one hand, orthographic and lexical distances are reliable predictors of linguistic similarity. On the other hand, we obtain the predictability of words in a sentence with the help of trigram language models.

We find that linguistic distance (encoding similarity) and in-context surprisal (predictability in context) appear to be complementary, with neither factor outweighing the other, and that our distinguishing of these two measurable dimensions is helpful in understanding certain unexpected effects in human behaviour.
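
To make the notion of predictability in context concrete, here is a minimal Python sketch of trigram surprisal (add-one smoothing, a toy training corpus; the study itself used proper trigram language models trained on far larger data). Higher values mark words that are harder to predict from the two preceding words.

import math
from collections import Counter

train = "<s> <s> the cat sat on the mat </s>".split()
trigrams = Counter(zip(train, train[1:], train[2:]))
bigrams = Counter(zip(train, train[1:]))
vocab = set(train)

def surprisal(w, h1, h2):
    # -log2 P(w | h1 h2), maximum likelihood with add-one smoothing
    p = (trigrams[(h1, h2, w)] + 1) / (bigrams[(h1, h2)] + len(vocab))
    return -math.log2(p)

sentence = "<s> <s> the cat sat on the mat </s>".split()
for h1, h2, w in zip(sentence, sentence[1:], sentence[2:]):
    print(f"{w:>6}  {surprisal(w, h1, h2):.2f} bits")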

@article{Jágrová2018b,
title = {Language models, surprisal and fantasy in Slavic intercomprehension},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Tania Avgustinova and Irina Stenger and Andrea Fischer},
url = {https://www.sciencedirect.com/science/article/pii/S0885230817300451},
year = {2018},
date = {2018},
journal = {Computer Speech & Language},
abstract = {In monolingual human language processing, the predictability of a word given its surrounding sentential context is crucial. With regard to receptive multilingualism, it is unclear to what extent predictability in context interplays with other linguistic factors in understanding a related but unknown language – a process called intercomprehension. We distinguish two dimensions influencing processing effort during intercomprehension: surprisal in sentential context and linguistic distance. Based on this hypothesis, we formulate expectations regarding the difficulty of designed experimental stimuli and compare them to the results from think-aloud protocols of experiments in which Czech native speakers decode Polish sentences by agreeing on an appropriate translation. On the one hand, orthographic and lexical distances are reliable predictors of linguistic similarity. On the other hand, we obtain the predictability of words in a sentence with the help of trigram language models. We find that linguistic distance (encoding similarity) and in-context surprisal (predictability in context) appear to be complementary, with neither factor outweighing the other, and that our distinguishing of these two measurable dimensions is helpful in understanding certain unexpected effects in human behaviour.},
pubstate = {published},
type = {article}
}


Project:   C4

Jágrová, Klára; Stenger, Irina; Avgustinova, Tania

Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen Journal Article

Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkräfte, 5/2017, pp. 20-37, 2018.

@article{Jágrová2018,
title = {Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova},
year = {2018},
date = {2018},
journal = {Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkr{\"a}fte},
pages = {20-37},
volume = {5/2017},
pubstate = {published},
type = {article}
}


Project:   C4

Tourtouri, Elli; Sikos, Les; Crocker, Matthew W.

Referential Entropy influences Overspecification: Evidence from Production Miscellaneous

31st Annual CUNY Sentence Processing Conference, UC Davis, Davis CA, USA, 2018.

Specificity in referential communication

  • Grice’s Maxim of Quantity [1]: Speakers should produce only information that is strictly necessary for identifying the target
  • However, it is possible to establish reference with either minimally-specified (MS; precise) or over-specified (OS; redundant) expressions
  • Moreover, speakers overspecify frequently and systematically [e.g., 2-6]

Q: Why do people overspecify?
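
A minimal numerical reading of "referential entropy" (an assumed formulation for illustration, not the experimental materials): the entropy of a probability distribution over the candidate referents in a scene, which is high when many objects remain plausible targets and low when one object dominates.

import math

def referential_entropy(candidate_probs):
    # H = -sum_i p_i * log2(p_i), in bits, over the possible referents
    return -sum(p * math.log2(p) for p in candidate_probs if p > 0)

# four equally plausible candidate objects vs. one clearly dominant candidate
print(referential_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits
print(referential_entropy([0.85, 0.05, 0.05, 0.05]))   # about 0.85 bits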

 

@miscellaneous{Tourtourietal2018a,
title = {Referential Entropy influences Overspecification: Evidence from Production},
author = {Elli Tourtouri and Les Sikos and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/323809271_Referential_entropy_influences_overspecification_Evidence_from_production},
year = {2018},
date = {2018},
booktitle = {31st Annual CUNY Sentence Processing Conference},
publisher = {UC Davis},
address = {Davis CA, USA},
abstract = {Specificity in referential communication

  • Grice’s Maxim of Quantity [1]: Speakers should produce only information that is strictly necessary for identifying the target
  • However, it is possible to establish reference with either minimally-specified (MS; precise) or over-specified (OS; redundant) expressions
  • Moreover, speakers overspecify frequently and systematically [e.g., 2-6]
Q: Why do people overspecify?},
pubstate = {published},
type = {miscellaneous}
}


Project:   C3

Karakanta, Alina; Przybyl, Heike; Teich, Elke

Exploring Variation in Translation with Relative Entropy Inproceedings

Lavid-López, Julia; Maíz-Arévalo, Carmen; Zamorano-Mansilla, Juan Rafael (Ed.): Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations, John Benjamins Publishing Company, pp. 307–323, 2018.

While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.
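
As a sketch of how relative entropy can feed a word-cloud visualisation (assumed details, not the authors' pipeline; the token lists and smoothing constant are toy examples), each word's contribution to D(translated || original) can be computed separately and the highest-contributing words rendered largest.

import math
from collections import Counter

def dist(tokens, vocab, alpha=0.5):
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

translated = "however the committee shall also take note of the report".split()
original = "we looked at the report and talked about it over coffee".split()
vocab = set(translated) | set(original)

p, q = dist(translated, vocab), dist(original, vocab)

# per-word contribution to D(translated || original); large positive values mark
# words that are especially typical of the translated sample
contrib = {w: p[w] * math.log2(p[w] / q[w]) for w in vocab}
for w, value in sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{w:>10}  {value:+.4f}")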

@inproceedings{Karakanta2018b,
title = {Exploring Variation in Translation with Relative Entropy},
author = {Alina Karakanta and Heike Przybyl and Elke Teich},
editor = {Julia Lavid-L{\'o}pez and Carmen Ma{\'i}z-Ar{\'e}valo and Juan Rafael Zamorano-Mansilla},
url = {https://benjamins.com/catalog/btl.158.12kar},
doi = {https://doi.org/10.1075/btl.158.12kar},
year = {2018},
date = {2018},
booktitle = {Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations},
pages = {307–323},
publisher = {John Benjamins Publishing Company},
abstract = {While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech as well as from each other at different linguistic levels. We use relative entropy (Kullback-Leibler Divergence) and visualization with word clouds. Our analysis shows differences in typical words between originals vs. non-originals as well as between translation modes both at lexical and grammatical levels.},
pubstate = {published},
type = {inproceedings}
}


Project:   B7

Karakanta, Alina; Vela, Mihaela; Teich, Elke

EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates Inproceedings

ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018), Miyazaki, Japan, 2018.

Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies.

In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-)processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.

@inproceedings{Karakanta2018a,
title = {EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates},
author = {Alina Karakanta and Mihaela Vela and Elke Teich},
url = {http://lrec-conf.org/workshops/lrec2018/W2/pdf/10_W2.pdf},
year = {2018},
date = {2018},
booktitle = {ParlaCLARIN workshop, 11th Language Resources and Evaluation Conference (LREC2018)},
address = {Miyazaki, Japan},
abstract = {Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish, where original language and native speaker information is available as metadata. The paper documents all necessary (pre- and post-)processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.},
pubstate = {published},
type = {inproceedings}
}


Project:   B7

Collard, Camille; Przybyl, Heike; Defrancq, Bart

Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech Journal Article

Kübler, Nathalie; Loock, Rudy; Pecman, Mojca (Ed.): Meta, 63(3), Les Presses de l’Université de Montréal, pp. 695-716, 2018.

In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure) thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers.

The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.

@article{Collard2018,
title = {Interpreting into an SOV Language: Memory and the Position of the Verb. A Corpus-Based Comparative Study of Interpreted and Non-mediated Speech},
author = {Camille Collard and Heike Przybyl and Bart Defrancq},
editor = {Nathalie K{\"u}bler and Rudy Loock and Mojca Pecman},
url = {https://id.erudit.org/iderudit/1060169ar},
doi = {https://doi.org/10.7202/1060169ar},
year = {2018},
date = {2018},
journal = {Meta},
pages = {695-716},
publisher = {Les Presses de l’Universit{\'e} de Montr{\'e}al},
volume = {63},
number = {3},
abstract = {In Dutch and German subordinate clauses, the verb is generally placed after the clausal constituents (Subject-Object-Verb structure) thereby creating a middle field (or verbal brace). This makes interpreting from SOV into SVO languages particularly challenging as it requires further processing and feats of memory. It often requires interpreters to use specific strategies (for example, anticipation) (Lederer 1981; Liontou 2011). However, few studies have tackled this issue from the point of view of interpreting into SOV languages. Producing SOV structures requires some specific cognitive effort as, for instance, subject properties need to be kept in mind in order to ensure the correct subject-verb agreement across a span of 10 or 20 words. Speakers therefore often opt for a strategy called extraposition, placing specific elements after the verb in order to shorten the brace (Hawkins 1994; Bevilacqua 2009). Dutch speakers use this strategy more often than German speakers (Haeseryn 1990). Given the additional cognitive load generated by the interpreting process (Gile 1999), it may be assumed that interpreters will shorten the verbal brace to a larger extent than native speakers. The present study is based on a corpus of interpreted and non-mediated speeches at the European Parliament and compares middle field lengths as well as extraposition in Dutch and German subordinate clauses. Results from 3460 subordinate clauses confirm that interpreters of both languages shorten the middle field more than native speakers. The study also shows that German interpreters use extraposition more often than native speakers, but this is not the case for Dutch interpreters. Dutch and German interpreters appear to use extraposition partly because they imitate the clause word order of the source speech, showing that, in this case, extraposition can be considered an effort-saving tool.},
pubstate = {published},
type = {article}
}


Project:   B7

Reich, Ingo

Ellipsen Book Chapter

Liedtke, Frank; Tuchen, Astrid (Ed.): Handbuch Pragmatik, J.B. Metzler, pp. 240-251, Stuttgart, 2018, ISBN 978-3-476-04624-6.

The term ›ellipsis‹ is not used uniformly in the literature and, owing to the heterogeneity of the phenomena it covers, is not easy to define. As a first approximation, ellipses can be understood as linguistic utterances that are incomplete in a sense still to be made precise, or that are perceived as incomplete by competent speakers (of German).

@inbook{Reich2018,
title = {Ellipsen},
author = {Ingo Reich},
editor = {Frank Liedtke and Astrid Tuchen},
url = {https://doi.org/10.1007/978-3-476-04624-6_24},
doi = {https://doi.org/10.1007/978-3-476-04624-6_24},
year = {2018},
date = {2018},
booktitle = {Handbuch Pragmatik},
isbn = {978-3-476-04624-6},
pages = {240-251},
publisher = {J.B. Metzler},
address = {Stuttgart},
abstract = {Der Begriff ›Ellipse‹ wird in der Literatur nicht einheitlich verwendet und ist aufgrund der Heterogenit{\"a}t des Ph{\"a}nomenbereichs auch nicht ganz einfach zu definieren. In erster Ann{\"a}herung kann man unter Ellipsen sprachliche {\"A}u{\ss}erungen verstehen, die in einem zu pr{\"a}zisierenden Sinne unvollst{\"a}ndig sind oder von kompetenten Sprecher/innen (des Deutschen) als unvollst{\"a}ndig aufgefasst werden.},
pubstate = {published},
type = {inbook}
}


Project:   B3

Crible, Ludivine; Demberg, Vera

The effect of genre variation on the production and acceptability of underspecified discourse markers in English Inproceedings

20th DiscourseNet, Budapest, Hungary, 2018.

@inproceedings{Crible2018,
title = {The effect of genre variation on the production and acceptability of underspecified discourse markers in English},
author = {Ludivine Crible and Vera Demberg},
url = {https://dial.uclouvain.be/pr/boreal/object/boreal:192393},
year = {2018},
date = {2018},
publisher = {20th DiscourseNet},
address = {Budapest, Hungary},
pubstate = {published},
type = {inproceedings}
}


Project:   B2

Degaetano-Ortlieb, Stefania; Teich, Elke

Using relative entropy for detection and analysis of periods of diachronic linguistic change Inproceedings

Proceedings of the 2nd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature at COLING2018, Association for Computational Linguistics, pp. 22-33, Santa Fe, New Mexico, 2018.

We present a data-driven approach to detect periods of linguistic change and the lexical and grammatical features contributing to change. We focus on the development of scientific English in the late modern period. Our approach is based on relative entropy (Kullback-Leibler Divergence) comparing temporally adjacent periods and sliding over the time line from past to present. Using a diachronic corpus of scientific publications of the Royal Society of London, we show how periods of change reflect the interplay between lexis and grammar, where periods of lexical expansion are typically followed by periods of grammatical consolidation resulting in a balance between expressivity and communicative efficiency. Our method is generic and can be applied to other data sets, languages and time ranges.
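
A minimal Python sketch of the sliding comparison described above (toy period slices rather than the Royal Society Corpus, and not the authors' pipeline): relative entropy is computed between each pair of temporally adjacent slices, and peaks in the resulting series would flag candidate periods of change.

import math
from collections import Counter

def dist(tokens, vocab, alpha=0.1):
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kld(p, q):
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

# one toy token list per period; in practice these would be corpus slices
slices = {
    1740: "the nature of bodies doth appear from divers experiments".split(),
    1750: "the nature of bodies is shown by several experiments".split(),
    1760: "the results of the experiments are presented in the table".split(),
    1770: "the results of the measurements are presented in the table".split(),
}

periods = sorted(slices)
for earlier, later in zip(periods, periods[1:]):
    vocab = set(slices[earlier]) | set(slices[later])
    d = kld(dist(slices[later], vocab), dist(slices[earlier], vocab))
    print(f"{earlier} -> {later}: {d:.3f} bits")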

@inproceedings{Degaetano-Ortlieb2018b,
title = {Using relative entropy for detection and analysis of periods of diachronic linguistic change},
author = {Stefania Degaetano-Ortlieb and Elke Teich},
url = {https://aclanthology.org/W18-4503},
year = {2018},
date = {2018},
booktitle = {Proceedings of the 2nd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature at COLING2018},
pages = {22-33},
publisher = {Association for Computational Linguistics},
address = {Santa Fe, New Mexico},
abstract = {We present a data-driven approach to detect periods of linguistic change and the lexical and grammatical features contributing to change. We focus on the development of scientific English in the late modern period. Our approach is based on relative entropy (Kullback-Leibler Divergence) comparing temporally adjacent periods and sliding over the time line from past to present. Using a diachronic corpus of scientific publications of the Royal Society of London, we show how periods of change reflect the interplay between lexis and grammar, where periods of lexical expansion are typically followed by periods of grammatical consolidation resulting in a balance between expressivity and communicative efficiency. Our method is generic and can be applied to other data sets, languages and time ranges.},
pubstate = {published},
type = {inproceedings}
}


Project:   B1

Teich, Elke; Fankhauser, Peter

Aspects of Linguistic and Computational Modeling in Language Science Book Chapter

Flanders, Julia; Jannidis, Fotis (Ed.): The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources (Digital Research in the Arts and Humanities), Routledge, Taylor & Francis, pp. 236-249, New York, 2018.

Linguistics is concerned with modeling language from the cognitive, social, and historical perspectives. When practiced as a science, linguistics is characterized by the tension between the two methodological dispositions of rationalism and empiricism. At any point in time in the history of linguistics, one is more dominant than the other. In the last two decades, we have been experiencing a new wave of empiricism in linguistic fields as diverse as psycholinguistics (e.g., Chater et al., 2015), language typology (e.g., Piantadosi and Gibson, 2014), language change (e.g., Bybee, 2010) and language variation (e.g., Bresnan and Ford, 2010). Consequently, the practices of modeling are being renegotiated in different linguistic communities, readdressing some fundamental methodological questions such as: How to cast a research question into an appropriate study design? How to obtain evidence (data) for a hypothesis (e.g., experiment vs. corpus)? How to process the data? How to evaluate a hypothesis in the light of the data obtained? This new empiricism is characterized by an interest in language use in context accompanied by a commitment to computational modeling, which is probably most developed in psycholinguistics, giving rise to the field of “computational psycholinguistics” (cf. Crocker, 2010), but recently getting stronger also in corpus linguistics.

@inbook{Teich2018,
title = {Aspects of Linguistic and Computational Modeling in Language Science},
author = {Elke Teich and Peter Fankhauser},
editor = {Julia Flanders and Fotis Jannidis},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/34320},
year = {2018},
date = {2018},
booktitle = {The Shape of Data in Digital Humanities. Modeling Texts and Text-based Resources. (Digital Research in the Arts and Humanities).},
pages = {236-249},
publisher = {Routledge, Taylor & Francis},
address = {New York},
abstract = {Linguistics is concerned with modeling language from the cognitive, social, and historical perspectives. When practiced as a science, linguistics is characterized by the tension between the two methodological dispositions of rationalism and empiricism. At any point in time in the history of linguistics, one is more dominant than the other. In the last two decades, we have been experiencing a new wave of empiricism in linguistic fields as diverse as psycholinguistics (e.g., Chater et al., 2015), language typology (e.g., Piantidosi and Gibson, 2014), language change (e.g., Bybee, 2010) and language variation (e.g., Bresnan and Ford, 2010). Consequently, the practices of modeling are being renegotiated in different linguistic communities, readdressing some fundamental methodological questions such as: How to cast a research question into an appropriate study design? How to obtain evidence (data) for a hypothesis (e.g., experiment vs. corpus)? How to process the data? How to evaluate a hypothesis in the light of the data obtained? This new empiricism is characterized by an interest in language use in context accompanied by a commitment to computational modeling, which is probably most developed in psycholinguistics, giving rise to the field of “computational psycholinguistics” (cf. Crocker, 2010), but recently getting stronger also in corpus linguistics.},
pubstate = {published},
type = {inbook}
}


Project:   B1

Degaetano-Ortlieb, Stefania; Strötgen, Jannik

Diachronic variation of temporal expressions in scientific writing through the lens of relative entropy Inproceedings

Rehm, Georg; Declerck, Thierry (Ed.): Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, September 13-14, Proceedings. Lecture Notes in Computer Science, 10713, Springer International Publishing, pp. 250-275, Berlin, Germany, 2018.

The abundance of temporal information in documents has led to an increased interest in processing such information in the NLP community by considering temporal expressions. Besides domain-adaptation, acquiring knowledge on variation of temporal expressions according to time is relevant for improvement in automatic processing. So far, frequency-based accounts dominate in the investigation of specific temporal expressions. We present an approach to investigate diachronic changes of temporal expressions based on relative entropy – with the advantage of using conditioned probabilities rather than mere frequency. While we focus on scientific writing, our approach is generalizable to other domains and interesting not only in the field of NLP, but also in the humanities.

@inproceedings{Degaetano-Ortlieb2018a,
title = {Diachronic variation of temporal expressions in scientific writing through the lens of relative entropy},
author = {Stefania Degaetano-Ortlieb and Jannik Str{\"o}tgen},
editor = {Georg Rehm and Thierry Declerck},
url = {https://link.springer.com/chapter/10.1007/978-3-319-73706-5_22},
year = {2018},
date = {2018},
booktitle = {Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, September 13-14, Proceedings. Lecture Notes in Computer Science},
pages = {250-275},
publisher = {Springer International Publishing},
address = {Berlin, Germany},
abstract = {The abundance of temporal information in documents has led to an increased interest in processing such information in the NLP community by considering temporal expressions. Besides domain-adaptation, acquiring knowledge on variation of temporal expressions according to time is relevant for improvement in automatic processing. So far, frequency-based accounts dominate in the investigation of specific temporal expressions. We present an approach to investigate diachronic changes of temporal expressions based on relative entropy – with the advantage of using conditioned probabilities rather than mere frequency. While we focus on scientific writing, our approach is generalizable to other domains and interesting not only in the field of NLP, but also in the humanities.},
pubstate = {published},
type = {inproceedings}
}


Project:   B1

Degaetano-Ortlieb, Stefania

Stylistic Variation over 200 Years of Court Proceedings According to Gender and Social Class Inproceedings

Proceedings of the 2nd Workshop on Stylistic Variation, co-located with NAACL HLT 2018 (June 1-6), Association for Computational Linguistics, pp. 1-10, New Orleans, 2018.

We present an approach to detect stylistic variation across social variables (here: gender and social class), considering also diachronic change in language use. For detection of stylistic variation, we use relative entropy, measuring the difference between probability distributions at different linguistic levels (here: lexis and grammar). In addition, by relative entropy, we can determine which linguistic units are related to stylistic variation.

@inproceedings{Degaetano-Ortlieb2018,
title = {Stylistic Variation over 200 Years of Court Proceedings According to Gender and Social Class},
author = {Stefania Degaetano-Ortlieb},
url = {https://aclanthology.org/W18-1601},
doi = {https://doi.org/10.18653/v1/W18-1601},
year = {2018},
date = {2018},
booktitle = {Proceedings of the 2nd Workshop on Stylistic Variation, co-located with NAACL HLT 2018 (June 1-6)},
pages = {1-10},
publisher = {Association for Computational Linguistics},
address = {New Orleans},
abstract = {We present an approach to detect stylistic variation across social variables (here: gender and social class), considering also diachronic change in language use. For detection of stylistic variation, we use relative entropy, measuring the difference between probability distributions at different linguistic levels (here: lexis and grammar). In addition, by relative entropy, we can determine which linguistic units are related to stylistic variation.},
pubstate = {published},
type = {inproceedings}
}


Project:   B1

Fischer, Stefan; Knappen, Jörg; Teich, Elke

Using Topic Modelling to Explore Authors’ Research Fields in a Corpus of Historical Scientific English Inproceedings

Proceedings of DH 2018, Mexico City, Mexico, 2018.

In the digital humanities, topic models are a widely applied text mining method (Meeks and Weingart, 2012). While their use for mining literary texts is not entirely straightforward (Schmidt, 2012), there is ample evidence for their use on factual text (e.g. Au Yeung and Jatowt, 2011; Thompson et al., 2016). We present an approach for exploring the research fields of selected authors in a corpus of late modern scientific English by topic modelling, looking at the topics assigned to an author’s texts over the author’s lifetime. Areas of application we target are history of science, where we may be interested in the evolution of scientific disciplines over time (Thompson et al., 2016; Fankhauser et al., 2016), or diachronic linguistics, where we may be interested in the formation of languages for specific purposes (LSP) or specific scientific “styles” (cf. Bazerman, 1988; Degaetano-Ortlieb and Teich, 2016). We use the Royal Society Corpus (RSC, Kermes et al., 2016), which is based on the first two centuries (1665–1869) of the Philosophical Transactions and the Proceedings of the Royal Society of London. The corpus contains 9,779 texts (32 million tokens) and is available at https://fedora.clarin-d.uni-saarland.de/rsc/. As we are interested in the development of individual authors, we focus on the single-author texts (81%) of the corpus. In total, 2,752 names are annotated in the single-author papers, but the activity of authors varies. Figure 1 shows that a small group of authors wrote a large portion of the texts. In fact, the twelve authors used for our analysis wrote 11% of the single-author articles.
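
For readers who want to see the general workflow, here is a minimal topic-modelling sketch (assuming scikit-learn and four invented toy documents rather than the Royal Society Corpus): it fits a small LDA model and lists the top words per topic, the kind of topic profile that can then be tracked across an author's texts over time.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the comet was observed with the telescope near the horizon",
    "the orbit of the comet was computed from repeated observations",
    "the acid was mixed with the salt and the solution was heated",
    "the solution of the salt was distilled over a gentle heat",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:4]]
    print(f"topic {i}: {', '.join(top)}")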

@inproceedings{fischer-etal2018,
title = {Using Topic Modelling to Explore Authors’ Research Fields in a Corpus of Historical Scientific English},
author = {Stefan Fischer and J{\"o}rg Knappen and Elke Teich},
url = {https://dh2018.adho.org/en/using-topic-modelling-to-explore-authors-research-fields-in-a-corpus-of-historical-scientific-english/},
year = {2018},
date = {2018},
booktitle = {Proceedings of DH 2018},
address = {Mexico City, Mexico},
abstract = {In the digital humanities, topic models are a widely applied text mining method (Meeks and Weingart, 2012). While their use for mining literary texts is not entirely straightforward (Schmidt, 2012), there is ample evidence for their use on factual text (e.g. Au Yeung and Jatowt, 2011; Thompson et al., 2016). We present an approach for exploring the research fields of selected authors in a corpus of late modern scientific English by topic modelling, looking at the topics assigned to an author’s texts over the author’s lifetime. Areas of applications we target are history of science, where we may be interested in the evolution of scientific disciplines over time (Thompson et al., 2016; Fankhauser et al., 2016), or diachronic linguistics, where we may be interested in the formation of languages for specific purposes (LSP) or specific scientific “styles” (cf. Bazerman, 1988; Degaetano-Ortlieb and Teich, 2016). We use the Royal Society Corpus (RSC, Kermes et al., 2016), which is based on the first two centuries (1665–1869) of the Philosophical Transactions and the Proceedings of the Royal Society of London. The corpus contains 9,779 texts (32 million tokens) and is available at https://fedora.clarin-d.uni-saarland.de/rsc/. As we are interested in the development of individual authors, we focus on the single-author texts (81%) of the corpus. In total, 2,752 names are annotated in the single-author papers, but the activity of authors varies. Figure 1 shows that a small group of authors wrote a large portion of the texts. In fact, the twelve authors used for our analysis wrote 11% of the single-author articles.},
pubstate = {published},
type = {inproceedings}
}


Project:   B1

Sekicki, Mirjana; Staudte, Maria

Eye'll help you out! How the gaze cue reduces the cognitive load required for reference processing Journal Article

Cognitive Science, 42, pp. 1-40, 2018.

Referential gaze has been shown to benefit language processing in situated communication in terms of shifting visual attention and leading to shorter reaction times on subsequent tasks. The present study simultaneously assessed both visual attention and, importantly, the immediate cognitive load induced at different stages of sentence processing. We aimed to examine the dynamics of combining visual and linguistic information in creating anticipation for a specific object and the effect this has on language processing. We report evidence from three visual-world eye-tracking experiments, showing that referential gaze leads to a shift in visual attention toward the cued object, which consequently lowers the effort required for processing the linguistic reference. Importantly, perceiving and following the gaze cue did not prove costly in terms of cognitive effort, unless the cued object did not fit the verb selectional preferences.

@article{Sekicki2018,
title = {Eye'll help you out! How the gaze cue reduces the cognitive load required for reference processing},
author = {Mirjana Sekicki and Maria Staudte},
url = {https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.12682},
doi = {https://doi.org/10.1111/cogs.12682},
year = {2018},
date = {2018},
journal = {Cognitive Science},
pages = {1-40},
volume = {42},
abstract = {Referential gaze has been shown to benefit language processing in situated communication in terms of shifting visual attention and leading to shorter reaction times on subsequent tasks. The present study simultaneously assessed both visual attention and, importantly, the immediate cognitive load induced at different stages of sentence processing. We aimed to examine the dynamics of combining visual and linguistic information in creating anticipation for a specific object and the effect this has on language processing. We report evidence from three visual-world eye-tracking experiments, showing that referential gaze leads to a shift in visual attention toward the cued object, which consequently lowers the effort required for processing the linguistic reference. Importantly, perceiving and following the gaze cue did not prove costly in terms of cognitive effort, unless the cued object did not fit the verb selectional preferences.},
pubstate = {published},
type = {article}
}


Project:   A5

Staudte, Maria; Ankener, Christine

Visually informed prediction: How combining lexical and visual information affects surprisal Miscellaneous

31st Annual Conference on Sentence Processing (CUNY), UC Davis, USA, 2018.

@miscellaneous{Ankener2018b,
title = {Visually informed prediction: How combining lexical and visual information affects surprisal},
author = {Maria Staudte and Christine Ankener},
year = {2018},
date = {2018-10-17},
booktitle = {31st Annual Conference on Sentence Processing (CUNY)},
address = {UC Davis, USA},
pubstate = {published},
type = {miscellaneous}
}


Project:   A5

Staudte, Maria; Sekicki, Mirjana

Reference resolution and the integration of referential visual cues Inproceedings

SSLP (pre-AMLaP) workshop 2018, Berlin, Germany, 2018.

@inproceedings{Sekicki2018c,
title = {Reference resolution and the integration of referential visual cues},
author = {Maria Staudte and Mirjana Sekicki},
year = {2018},
date = {2018-10-17},
booktitle = {SSLP (pre-AMLaP) workshop 2018},
address = {Berlin, Germany},
pubstate = {published},
type = {inproceedings}
}


Project:   A5

Jachmann, Torsten; Drenhaus, Heiner; Staudte, Maria; Crocker, Matthew W.

(Dis-)confirmation of linguistic prediction by non-linguistic cues Miscellaneous

24th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP), Berlin, 2018.

Gaze Cues in face-to-face interactions

  • Speakers direct their gaze toward an object approximately 800 ms before mentioning it (Griffin & Bock, 2000).
  • Previous studies showed that listeners utilize speakers’ gaze to form predictions about the unfolding sentence (Jachmann et al., 2017).
  • Do listeners utilize this external cue to validate expectations about the unfolding sentence? And, if so, how does this affect the comprehension of the noun?

@miscellaneous{Jachmann2018,
title = {(Dis-)confirmation of linguistic prediction by non-linguistic cues},
author = {Torsten Jachmann and Heiner Drenhaus and Maria Staudte and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/327623334_DISCONFIRMATION_OF_LINGUISTIC_PREDICTION_BY_NON-LINGUISTIC_CUES},
year = {2018},
date = {2018},
booktitle = {24th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP)},
address = {Berlin},
abstract = {Gaze Cues in face-to-face interactions

  • Speakers direct their gaze toward an object approximately 800 ms before mentioning it (Griffin & Bock, 2000).
  • Previous studies showed that listeners utilize speakers’ gaze to form predictions about the unfolding sentence (Jachmann et al., 2017).
  • Do listeners utilize this external cue to validate expectations about the unfolding sentence? And, if so, how does this affect the comprehension of the noun?
},
pubstate = {published},
type = {miscellaneous}
}


Projects:   A5 C3
