Publications

Alves, Diego; Degaetano-Ortlieb, Stefania; Schmidt, Elena; Teich, Elke

Diachronic Analysis of Multi-word Expression Functional Categories in Scientific English Inproceedings

Bhatia, Archna; Bouma, Gosse; Seza Dogruoz, A.; Evang, Kilian; Garcia, Marcos; Giouli, Voula; Han, Lifeng; Nivre, Joakim; Rademaker, Alexandre (Ed.): Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, ELRA and ICCL, pp. 81-87, Torino, Italia, 2024.

We present a diachronic analysis of multi-word expressions (MWEs) in English based on the Royal Society Corpus, a dataset containing 300+ years of the scientific publications of the Royal Society of London. Specifically, we investigate the functions of MWEs, such as stance markers (“it is interesting”) or discourse organizers (“in this section”), and their development over time. Our approach is multi-disciplinary: to detect MWEs we use Universal Dependencies, to classify them functionally we use an approach from register linguistics, and to assess their role in diachronic development we use an information-theoretic measure, relative entropy.

@inproceedings{alves-etal-2024-diachronic,
title = {Diachronic Analysis of Multi-word Expression Functional Categories in Scientific English},
author = {Diego Alves and Stefania Degaetano-Ortlieb and Elena Schmidt and Elke Teich},
editor = {Archna Bhatia and Gosse Bouma and A. Seza Dogruoz and Kilian Evang and Marcos Garcia and Voula Giouli and Lifeng Han and Joakim Nivre and Alexandre Rademaker},
url = {https://aclanthology.org/2024.mwe-1.12},
year = {2024},
date = {2024},
booktitle = {Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024},
pages = {81-87},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {We present a diachronic analysis of multi-word expressions (MWEs) in English based on the Royal Society Corpus, a dataset containing 300+ years of the scientific publications of the Royal Society of London. Specifically, we investigate the functions of MWEs, such as stance markers (“it is interesting”) or discourse organizers (“in this section”), and their development over time. Our approach is multi-disciplinary: to detect MWEs we use Universal Dependencies, to classify them functionally we use an approach from register linguistics, and to assess their role in diachronic development we use an information-theoretic measure, relative entropy.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Bagdasarov, Sergei; Degaetano-Ortlieb, Stefania

Applying Information-theoretic Notions to Measure Effects of the Plain English Movement on English Law Reports and Scientific Articles Inproceedings

Bizzoni, Yuri; Degaetano-Ortlieb, Stefania; Kazantseva, Anna; Szpakowicz, Stan (Ed.): Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), Association for Computational Linguistics, pp. 101-110, St. Julians, Malta, 2024.

We investigate the impact of the Plain English Movement (PEM) on the complexity of legal language in UK law reports from the 1950s-2010s, contrasting it with the evolution of scientific language. The PEM, emerging in the late 20th century, advocated for clear and understandable legal language. We define complexity through the concept of surprisal – an information-theoretic measure correlating with cognitive processing difficulty. Our research contrasts surprisal with traditional readability measures, which often overlook content. We hypothesize that, if the PEM has influenced legal language, there would be a reduction in complexity over time and a shift from a nominal to a more verbal style. We analyze text complexity and lexico-grammatical changes in line with PEM recommendations. Results indicate minimal impact of the PEM on both legal and scientific domains. This finding suggests future research should consider processing effort when advocating for linguistic norms to enhance accessibility.

@inproceedings{bagdasarov-degaetano-ortlieb-2024-applying,
title = {Applying Information-theoretic Notions to Measure Effects of the Plain English Movement on English Law Reports and Scientific Articles},
author = {Sergei Bagdasarov and Stefania Degaetano-Ortlieb},
editor = {Yuri Bizzoni and Stefania Degaetano-Ortlieb and Anna Kazantseva and Stan Szpakowicz},
url = {https://aclanthology.org/2024.latechclfl-1.11},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)},
pages = {101-110},
publisher = {Association for Computational Linguistics},
address = {St. Julians, Malta},
abstract = {We investigate the impact of the Plain English Movement (PEM) on the complexity of legal language in UK law reports from the 1950s-2010s, contrasting it with the evolution of scientific language. The PEM, emerging in the late 20th century, advocated for clear and understandable legal language. We define complexity through the concept of surprisal - an information-theoretic measure correlating with cognitive processing difficulty. Our research contrasts surprisal with traditional readability measures, which often overlook content. We hypothesize that, if the PEM has influenced legal language, there would be a reduction in complexity over time and a shift from a nominal to a more verbal style. We analyze text complexity and lexico-grammatical changes in line with PEM recommendations. Results indicate minimal impact of the PEM on both legal and scientific domains. This finding suggests future research should consider processing effort when advocating for linguistic norms to enhance accessibility.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Alves, Diego; Fischer, Stefan; Degaetano-Ortlieb, Stefania; Teich, Elke

Multi-word Expressions in English Scientific Writing Inproceedings

Bizzoni, Yuri; Degaetano-Ortlieb, Stefania; Kazantseva, Anna; Szpakowicz, Stan (Ed.): Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), Association for Computational Linguistics, pp. 67-76, St. Julians, Malta, 2024.

Multi-Word Expressions (MWEs) play a pivotal role in language use overall and in register formation more specifically, e.g. encoding field-specific terminology. Our study focuses on the identification and categorization of MWEs used in scientific writing, considering their formal characteristics as well as their developmental trajectory over time from the mid-17th century to the present. For this, we develop an approach combining three different types of methods to identify MWEs (Universal Dependency annotation, Partitioner and the Academic Formulas List) and selected measures to characterize MWE properties (e.g., dispersion by Kullback-Leibler Divergence and several association measures). This allows us to inspect MWE types in a novel data-driven way regarding their functions and change over time in specialized discourse.

@inproceedings{alves-etal-2024-multi,
title = {Multi-word Expressions in English Scientific Writing},
author = {Diego Alves and Stefan Fischer and Stefania Degaetano-Ortlieb and Elke Teich},
editor = {Yuri Bizzoni and Stefania Degaetano-Ortlieb and Anna Kazantseva and Stan Szpakowicz},
url = {https://aclanthology.org/2024.latechclfl-1.8},
year = {2024},
date = {2024},
booktitle = {Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)},
pages = {67-76},
publisher = {Association for Computational Linguistics},
address = {St. Julians, Malta},
abstract = {Multi-Word Expressions (MWEs) play a pivotal role in language use overall and in register formation more specifically, e.g. encoding field-specific terminology. Our study focuses on the identification and categorization of MWEs used in scientific writing, considering their formal characteristics as well as their developmental trajectory over time from the mid-17th century to the present. For this, we develop an approach combining three different types of methods to identify MWEs (Universal Dependency annotation, Partitioner and the Academic Formulas List) and selected measures to characterize MWE properties (e.g., dispersion by Kullback-Leibler Divergence and several association measures). This allows us to inspect MWE types in a novel data-driven way regarding their functions and change over time in specialized discourse.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Krielke, Marie-Pauline

Cross-linguistic Dependency Length Minimization in scientific language: Syntactic complexity reduction in English and German in the Late Modern period Journal Article

Languages in Contrast, 24, pp. 133-163, 2024, ISSN 1387-6759.

We use Universal Dependencies (UD) for the study of cross-linguistic diachronic syntactic complexity reduction. Specifically, we look at whether and how scientific English and German minimize the length of syntactic dependency relations in the Late Modern period (ca. 1650–1900). Our linguistic analysis follows the assumption that over time, scientific discourse cross-linguistically develops towards an increasingly efficient syntactic code by minimizing Dependency Length (DL) as a factor of syntactic complexity. For each language, we analyse a large UD-annotated scientific and general language corpus for comparison. While on a macro level, our analysis suggests that there is an overall diachronic cross-linguistic and cross-register reduction in Average Dependency Length (ADL), on the micro level we find that only scientific language shows a sentence length independent reduction of ADL, while general language shows an overall decrease of ADL due to sentence length reduction. We further analyse the syntactic constructions responsible for this reduction in both languages, showing that both scientific English and German increasingly make use of short, intra-phrasal dependency relations while long dependency relations such as clausal embeddings become rather disfavoured over time.

@article{Krielke-2024,
title = {Cross-linguistic Dependency Length Minimization in scientific language: Syntactic complexity reduction in English and German in the Late Modern period},
author = {Marie-Pauline Krielke},
url = {https://www.jbe-platform.com/content/journals/10.1075/lic.00038.kri},
doi = {10.1075/lic.00038.kri},
year = {2024},
date = {2024},
journal = {Languages in Contrast},
pages = {133-163},
volume = {24},
number = {1},
abstract = {We use Universal Dependencies (UD) for the study of cross-linguistic diachronic syntactic complexity reduction. Specifically, we look at whether and how scientific English and German minimize the length of syntactic dependency relations in the Late Modern period (ca. 1650–1900). Our linguistic analysis follows the assumption that over time, scientific discourse cross-linguistically develops towards an increasingly efficient syntactic code by minimizing Dependency Length (DL) as a factor of syntactic complexity. For each language, we analyse a large UD-annotated scientific and general language corpus for comparison. While on a macro level, our analysis suggests that there is an overall diachronic cross-linguistic and cross-register reduction in Average Dependency Length (ADL), on the micro level we find that only scientific language shows a sentence length independent reduction of ADL, while general language shows an overall decrease of ADL due to sentence length reduction. We further analyse the syntactic constructions responsible for this reduction in both languages, showing that both scientific English and German increasingly make use of short, intra-phrasal dependency relations while long dependency relations such as clausal embeddings become rather disfavoured over time.},
pubstate = {published},
type = {article}
}

Project:   B1

Krielke, Marie-Pauline

Optimizing scientific communication: the role of relative clauses as markers of complexity in English and German scientific writing between 1650 and 1900 PhD Thesis

Saarland University, Saarbruecken, Germany, 2023.

The aim of this thesis is to show that both scientific English and German have become increasingly optimized for scientific communication from 1650 to 1900 by adapting the usage of relative clauses as markers of grammatical complexity. While the lexico-grammatical changes in terms of features and their frequency distribution in scientific writing during this period are well documented, in the present work we are interested in the underlying factors driving these changes and how they affect efficient scientific communication. As the scientific register emerges and evolves, it continuously adapts to the changing communicative needs posed by extra-linguistic pressures arising from the scientific community and its achievements. We assume that, over time, scientific language maintains communicative efficiency by balancing lexico-semantic expansion with a reduction in (lexico-)grammatical complexity on different linguistic levels. This is based on the idea that linguistic complexity affects processing difficulty and, in turn, communicative efficiency. To achieve optimization, complexity is adjusted on the level of lexico-grammar, which is related to expectation-based processing cost, and syntax, which is linked to working memory-based processing cost. We conduct five corpus-based studies comparing English and German scientific writing to general language. The first two investigate the development of relative clauses in terms of lexico-grammar, measuring the paradigmatic richness and syntagmatic predictability of relativizers as indicators of expectation-based processing cost. The results confirm that both levels undergo a reduction in complexity over time. The other three studies focus on the syntactic complexity of relative clauses, investigating syntactic intricacy, locality, and accessibility. Results show that intricacy and locality decrease, leading to lower grammatical complexity and thus mitigating memory-based processing cost. 
However, accessibility is not a factor of complexity reduction over time. Our studies reveal a register-specific diachronic complexity reduction in scientific language both in lexico-grammar and syntax. The cross-linguistic comparison shows that English is more advanced in its register-specific development while German lags behind due to a later establishment of the vernacular as a language of scientific communication.

@phdthesis{Krielke_Diss_2023,
title = {Optimizing scientific communication: the role of relative clauses as markers of complexity in English and German scientific writing between 1650 and 1900},
author = {Marie-Pauline Krielke},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/36825},
doi = {10.22028/D291-40997},
year = {2023},
date = {2023},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {The aim of this thesis is to show that both scientific English and German have become increasingly optimized for scientific communication from 1650 to 1900 by adapting the usage of relative clauses as markers of grammatical complexity. While the lexico-grammatical changes in terms of features and their frequency distribution in scientific writing during this period are well documented, in the present work we are interested in the underlying factors driving these changes and how they affect efficient scientific communication. As the scientific register emerges and evolves, it continuously adapts to the changing communicative needs posed by extra-linguistic pressures arising from the scientific community and its achievements. We assume that, over time, scientific language maintains communicative efficiency by balancing lexico-semantic expansion with a reduction in (lexico-)grammatical complexity on different linguistic levels. This is based on the idea that linguistic complexity affects processing difficulty and, in turn, communicative efficiency. To achieve optimization, complexity is adjusted on the level of lexico-grammar, which is related to expectation-based processing cost, and syntax, which is linked to working memory-based processing cost. We conduct five corpus-based studies comparing English and German scientific writing to general language. The first two investigate the development of relative clauses in terms of lexico-grammar, measuring the paradigmatic richness and syntagmatic predictability of relativizers as indicators of expectation-based processing cost. The results confirm that both levels undergo a reduction in complexity over time. The other three studies focus on the syntactic complexity of relative clauses, investigating syntactic intricacy, locality, and accessibility. Results show that intricacy and locality decrease, leading to lower grammatical complexity and thus mitigating memory-based processing cost. 
However, accessibility is not a factor of complexity reduction over time. Our studies reveal a register-specific diachronic complexity reduction in scientific language both in lexico-grammar and syntax. The cross-linguistic comparison shows that English is more advanced in its register-specific development while German lags behind due to a later establishment of the vernacular as a language of scientific communication.},
pubstate = {published},
type = {phdthesis}
}

Project:   B1

Hug, Marius; Rau, Felix; Debbeler, Anke; Saleh, Sara; Mollenhauer, Elisabeth; Leinen, Peter; Genêt, Philippe; Trippel, Thorsten; Zinn, Claus; Dogaru, George; Witt, Andreas; Werthmann, Antonina; Draxler, Christoph; Schiel, Florian; Knappen, Jörg; Fischer, Stefan; Krielke, Marie-Pauline; Teich, Elke; Barth, Florian; Calvo Tello, José; Funk, Stefan E.; Göbel, Mathias; Kurzawe, Daniel; Veentjer, Ubbo; Weimer, Lukas; Blätte, Andreas; Lehmberg, Timm

Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren Miscellaneous

Text+, Zenodo, pp. 1-12, Potsdam, 2023.

Presented at the workshop “Wohin damit? Storing and reusing my language data” on 22 June 2023 in Mannheim. The presentation was given in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V.

@misc{HugRauDebbeleretal.2023,
title = {Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren},
author = {Marius Hug and Felix Rau and Anke Debbeler and Sara Saleh and Elisabeth Mollenhauer and Peter Leinen and Philippe Genêt and Thorsten Trippel and Claus Zinn and George Dogaru and Andreas Witt and Antonina Werthmann and Christoph Draxler and Florian Schiel and J{\"o}rg Knappen and Stefan Fischer and Marie-Pauline Krielke and Elke Teich and Florian Barth and Jos{\'e} Calvo Tello and Stefan E. Funk and Mathias G{\"o}bel and Daniel Kurzawe and Ubbo Veentjer and Lukas Weimer and Andreas Bl{\"a}tte and Timm Lehmberg},
url = {https://nbn-resolving.org/urn:nbn:de:bsz:mh39-121108},
doi = {10.5281/zenodo.8123896},
year = {2023},
date = {2023},
booktitle = {Text+},
pages = {1-12},
publisher = {Zenodo},
address = {Potsdam},
abstract = {Presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The presentation was given in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Fischer, Stefan; Fankhauser, Peter; Teich, Elke

Multi-word expressions and language efficiency: an information-theoretic account Miscellaneous

DGfS Computerlinguistik Postersession, Köln, 2023.

Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs is typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what is the contribution of MWEs in the development of “scientific language” (here: English)? We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time periods (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: The more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.

@misc{Fischer_etal_2024,
title = {Multi-word expressions and language efficiency: an information-theoretic account},
author = {Stefan Fischer and Peter Fankhauser and Elke Teich},
url = {https://dgfs2023.uni-koeln.de/sites/dgfs2023/Booklet/AG_Beschreibungen-und-Abstracts/Description-Abstracts-CL.pdf},
year = {2023},
date = {2023},
booktitle = {DGfS Computerlinguistik Postersession},
address = {K{\"o}ln},
abstract = {Multi-word expressions (MWEs) are a cornerstone in conventionalized language use and vital for the perceived fluency of a message (Fillmore 1979). From a processing perspective, MWEs seem to have an advantage over arbitrary word sequences due to highly predictable transitions from one word to the next, or they may be perceived as wholes (see e.g. Siyanova-Chanturia et al. 2017). The emergence and use of specific MWEs is typically context-dependent and register-specific. In our work, we investigate MWEs in the scientific domain from a diachronic perspective, asking what is the contribution of MWEs in the development of “scientific language” (here: English)? We assume that over time scientific English develops an optimal code for scientific expert communication characterized by high information density (Halliday 2004; Teich et al. 2021). Using a large diachronic corpus of English scientific texts (Fischer et al. 2020), we work in a data-driven fashion using various established word association measures (e.g. log-likelihood, PMI) to identify and classify MWEs by time periods (e.g. 50-year periods). In a complementary step, we account for the environments of words using selected computational language models (statistical models, embeddings; cf. Fankhauser & Kupietz 2022). On this basis, we then analyse the informational characteristics of MWEs diachronically: The more conventionalized an MWE becomes, the lower its surprisal (higher predictability of the MWE) and the lower the uncertainty about an upcoming word within the MWE (entropy). We expect to see that while specific MWEs come and go over time, during their life cycles they will exhibit surprisal/entropy reduction, thus contributing to language efficiency.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B1

Menzel, Katrin; Krielke, Marie-Pauline; Degaetano-Ortlieb, Stefania

Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective Journal Article

LEGE ARTIS: Language yesterday, today, tomorrow, VII, Trnava: University of SS Cyril and Methodius in Trnava, pp. 157-213, 2022, ISSN 2453-8035.

This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.

@article{menzel_2022_diachronicperspective,
title = {Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective},
author = {Katrin Menzel and Marie-Pauline Krielke and Stefania Degaetano-Ortlieb},
url = {https://www.researchgate.net/publication/361099180_Synthetic_and_analytic_adjective_negation_in_English_scientific_journal_articles_A_diachronic_perspective},
year = {2022},
date = {2022},
journal = {LEGE ARTIS: Language yesterday, today, tomorrow},
pages = {157-213},
publisher = {Trnava: University of SS Cyril and Methodius in Trnava},
volume = {VII},
number = {1},
abstract = {This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.},
pubstate = {published},
type = {article}
}

Project:   B1

Krielke, Marie-Pauline; Talamo, Luigi; Fawzi, M.; Knappen, J.

Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German Inproceedings

LREC 2022, Marseille, France, 2022.

We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.

@inproceedings{krielke-etal-2022-tracing,
title = {Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German},
author = {Marie-Pauline Krielke and Luigi Talamo and M. Fawzi and J. Knappen},
url = {https://aclanthology.org/2022.lrec-1.514/},
year = {2022},
date = {2022},
booktitle = {LREC 2022},
address = {Marseille, France},
abstract = {We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Menzel, Katrin

Medical discourse in Late Modern English: Insights from the Royal Society Corpus. Book Chapter

Hiltunen, Turo; Taavitsainen, Irma (Ed.): Corpus pragmatic studies on the history of medical discourse (Pragmatics & Beyond New Series; Vol. 330), John Benjamins, pp. 79-104, Amsterdam, 2022.

This chapter demonstrates how the Royal Society Corpus, a richly annotated corpus of around 48,000 English scientific journal articles covering more than 330 years, can be used for lexico-grammatical and pragmatic studies that contribute to a broader understanding of the development of medical research articles. The Late Modern English period together with several decades before and after this time frame was a productive period in the medical output of the Royal Society. This chapter addresses typical linguistic features of scientific journal articles from medical and related sciences from this period demonstrating their special status in the context of other traditional and emerging disciplines in the corpus data. Additionally, language usage and text-type conventions of historical medical research articles will be compared to the features of corpus texts on medical topics from Present-day English.

@inbook{MedicalDiscourse22,
title = {Medical discourse in Late Modern English: Insights from the Royal Society Corpus.},
author = {Katrin Menzel},
editor = {Turo Hiltunen and Irma Taavitsainen},
url = {https://benjamins.com/catalog/pbns.330},
year = {2022},
date = {2022},
booktitle = {Corpus pragmatic studies on the history of medical discourse (Pragmatics \& Beyond New Series; Vol. 330)},
pages = {79-104},
publisher = {John Benjamins},
address = {Amsterdam},
abstract = {This chapter demonstrates how the Royal Society Corpus, a richly annotated corpus of around 48,000 English scientific journal articles covering more than 330 years, can be used for lexico-grammatical and pragmatic studies that contribute to a broader understanding of the development of medical research articles. The Late Modern English period together with several decades before and after this time frame was a productive period in the medical output of the Royal Society. This chapter addresses typical linguistic features of scientific journal articles from medical and related sciences from this period demonstrating their special status in the context of other traditional and emerging disciplines in the corpus data. Additionally, language usage and text-type conventions of historical medical research articles will be compared to the features of corpus texts on medical topics from Present-day English.},
pubstate = {published},
type = {inbook}
}

Project:   B1

Degaetano-Ortlieb, Stefania

Measuring informativity: The rise of compounds as informationally dense structures in 20th century Scientific English Book Chapter

Soave, Elena; Biber, Douglas (Ed.): Corpus Approaches to Register Variation, Studies in Corpus Linguistics, 103, John Benjamins Publishing Company, pp. 291-312, 2021.

By applying data-driven methods based on information theory, this study adds to previous work on the development of the scientific register by measuring the informativity of alternative phrasal structures shown to be involved in change in language use in 20th-century Scientific English. The analysis based on data-driven periodization shows compounds to be distinctive grammatical structures from the 1920s onwards in Proceedings A of the Royal Society of London. Compounds not only increase in frequency, but also show higher informativity than their less dense prepositional counterparts. Results also show that the lower the informativity of particular items, the more alternative, more informationally dense options might be favoured (e.g., of-phrases vs. compounds) – striving for communicative efficiency thus being one force shaping the scientific register.

@inbook{Degaetano-Ortlieb2021,
title = {Measuring informativity: The rise of compounds as informationally dense structures in 20th century Scientific English},
author = {Stefania Degaetano-Ortlieb},
editor = {Elena Soave and Douglas Biber},
url = {https://benjamins.com/catalog/scl.103.11deg},
doi = {https://doi.org/10.1075/scl.103.11deg},
year = {2021},
date = {2021},
booktitle = {Corpus Approaches to Register Variation},
pages = {291-312},
publisher = {John Benjamins Publishing Company},
abstract = {By applying data-driven methods based on information theory, this study adds to previous work on the development of the scientific register by measuring the informativity of alternative phrasal structures shown to be involved in change in language use in 20th-century Scientific English. The analysis based on data-driven periodization shows compounds to be distinctive grammatical structures from the 1920s onwards in Proceedings A of the Royal Society of London. Compounds not only increase in frequency, but also show higher informativity than their less dense prepositional counterparts. Results also show that the lower the informativity of particular items, the more alternative, more informationally dense options might be favoured (e.g., of-phrases vs. compounds) – striving for communicative efficiency thus being one force shaping the scientific register.},
pubstate = {published},
type = {inbook}
}

Project:   B1

Bizzoni, Yuri; Degaetano-Ortlieb, Stefania; Menzel, Katrin; Teich, Elke

The diffusion of scientific terms - tracing individuals' influence in the history of science for English Inproceedings

Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Association for Computational Linguistics, pp. 120-127, Punta Cana, Dominican Republic (online), 2021.

Tracing the influence of individuals or groups in social networks is an increasingly popular task in sociolinguistic studies. While methods to determine someone’s influence in short-term contexts (e.g., social media, on-line political debates) are widespread, influence in long-term contexts is less investigated and may be harder to capture. We study the diffusion of scientific terms in an English diachronic scientific corpus, applying Hawkes Processes to capture the role of individual scientists as “influencers” or “influencees” in the diffusion of new concepts. Our findings on two major scientific discoveries in chemistry and astronomy of the 18th century reveal that modelling both the introduction and diffusion of scientific terms in a historical corpus as Hawkes Processes allows detecting patterns of influence between authors on a long-term scale.

@inproceedings{bizzoni-etal-2021-diffusion,
title = {The diffusion of scientific terms - tracing individuals' influence in the history of science for English},
author = {Yuri Bizzoni and Stefania Degaetano-Ortlieb and Katrin Menzel and Elke Teich},
url = {https://aclanthology.org/2021.latechclfl-1.14},
doi = {https://doi.org/10.18653/v1/2021.latechclfl-1.14},
year = {2021},
date = {2021-11-30},
booktitle = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature},
pages = {120-127},
publisher = {Association for Computational Linguistics},
address = {Punta Cana, Dominican Republic (online)},
abstract = {Tracing the influence of individuals or groups in social networks is an increasingly popular task in sociolinguistic studies. While methods to determine someone's influence in short-term contexts (e.g., social media, on-line political debates) are widespread, influence in long-term contexts is less investigated and may be harder to capture. We study the diffusion of scientific terms in an English diachronic scientific corpus, applying Hawkes Processes to capture the role of individual scientists as "influencers" or "influencees" in the diffusion of new concepts. Our findings on two major scientific discoveries in chemistry and astronomy of the 18th century reveal that modelling both the introduction and diffusion of scientific terms in a historical corpus as Hawkes Processes allows detecting patterns of influence between authors on a long-term scale.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Menzel, Katrin; Krielke, Marie-Pauline; Degaetano-Ortlieb, Stefania

Structural complexity in scientific journal articles across time - from negative clausal expressions towards adjectival negative prefixes Inproceedings

Workshop on Complexity and Register (CAR21), Berlin, Germany, CRC1412 Register, 2021.

@inproceedings{Menzel-etal2021,
title = {Structural complexity in scientific journal articles across time - from negative clausal expressions towards adjectival negative prefixes},
author = {Katrin Menzel and Marie-Pauline Krielke and Stefania Degaetano-Ortlieb},
year = {2021},
date = {2021-11-19},
booktitle = {Workshop on Complexity and Register (CAR21)},
address = {Berlin, Germany, CRC1412 Register},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Menzel, Katrin

Scientific Eponyms throughout the History of English Scholarly Journal Articles Book Chapter

Van de Velde, Hans; Dolezal, Fredric T. (Ed.): Broadening Perspectives in the History of Dictionaries and Word Studies, Cambridge Scholars Publishing, pp. 159-193, Newcastle upon Tyne, 2021, ISBN 1-5275-7432-6.

@inbook{Menzel2021_eponyms,
title = {Scientific Eponyms throughout the History of English Scholarly Journal Articles},
author = {Katrin Menzel},
editor = {Hans Van de Velde and Fredric T. Dolezal},
url = {https://www.cambridgescholars.com/product/978-1-5275-7432-8},
year = {2021},
date = {2021-11-08},
booktitle = {Broadening Perspectives in the History of Dictionaries and Word Studies},
isbn = {1-5275-7432-6},
pages = {159-193},
publisher = {Cambridge Scholars Publishing},
address = {Newcastle upon Tyne},
pubstate = {published},
type = {inbook}
}

Project:   B1

Degaetano-Ortlieb, Stefania; Säily, Tanja; Bizzoni, Yuri

Registerial Adaptation vs. Innovation Across Situational Contexts: 18th Century Women in Transition Journal Article

Frontiers in Artificial Intelligence, section Language and Computation, 4, 2021.

Endeavors to computationally model language variation and change are ever increasing. While analyses of recent diachronic trends are frequently conducted, long-term trends accounting for sociolinguistic variation are less well-studied. Our work sheds light on the temporal dynamics of language use of British 18th century women as a group in transition across two situational contexts. Our findings reveal that in formal contexts women adapt to register conventions, while in informal contexts they act as innovators of change in language use influencing others. While adopted from other disciplines, our methods inform (historical) sociolinguistic work in novel ways. These methods include diachronic periodization by Kullback-Leibler divergence to determine periods of change and relevant features of variation, and event cascades as influencer models.

@article{Degaetano-Ortlieb2021registerial,
title = {Registerial Adaptation vs. Innovation Across Situational Contexts: 18th Century Women in Transition},
author = {Stefania Degaetano-Ortlieb and Tanja S{\"a}ily and Yuri Bizzoni},
url = {https://www.frontiersin.org/article/10.3389/frai.2021.609970},
doi = {https://doi.org/10.3389/frai.2021.609970},
year = {2021},
date = {2021},
journal = {Frontiers in Artificial Intelligence, section Language and Computation},
volume = {4},
abstract = {Endeavors to computationally model language variation and change are ever increasing. While analyses of recent diachronic trends are frequently conducted, long-term trends accounting for sociolinguistic variation are less well-studied. Our work sheds light on the temporal dynamics of language use of British 18th century women as a group in transition across two situational contexts. Our findings reveal that in formal contexts women adapt to register conventions, while in informal contexts they act as innovators of change in language use influencing others. While adopted from other disciplines, our methods inform (historical) sociolinguistic work in novel ways. These methods include diachronic periodization by Kullback-Leibler divergence to determine periods of change and relevant features of variation, and event cascades as influencer models.},
pubstate = {published},
type = {article}
}

Project:   B1

Krielke, Marie-Pauline

Relativizers as markers of grammatical complexity: A diachronic, cross-register study of English and German Journal Article

Bergen Language and Linguistics Studies, 11, pp. 91-120, 2021.

In this paper, we investigate grammatical complexity as a register feature of scientific English and German. Specifically, we carry out a diachronic comparison between general and scientific discourse in the two languages from the 17th to the 19th century, using relativizers as proxies for grammatical complexity. We ground our study in register theory (Halliday and Hasan, 1985), assuming that language use reflects contextual factors, which contribute to the formation of registers (Quirk et al., 1985; Biber et al., 1999; Teich et al., 2016). Our findings show a clear tendency towards grammatical simplification in scientific discourse in both languages with English spearheading the trend early on and German following later.

@article{Krielke2021relativizers,
title = {Relativizers as markers of grammatical complexity: A diachronic, cross-register study of English and German},
author = {Marie-Pauline Krielke},
url = {https://doi.org/10.15845/bells.v11i1.3440},
doi = {https://doi.org/10.15845/bells.v11i1.3440},
year = {2021},
date = {2021-09-15},
journal = {Bergen Language and Linguistics Studies},
pages = {91-120},
volume = {11},
number = {1},
abstract = {In this paper, we investigate grammatical complexity as a register feature of scientific English and German. Specifically, we carry out a diachronic comparison between general and scientific discourse in the two languages from the 17th to the 19th century, using relativizers as proxies for grammatical complexity. We ground our study in register theory (Halliday and Hasan, 1985), assuming that language use reflects contextual factors, which contribute to the formation of registers (Quirk et al., 1985; Biber et al., 1999; Teich et al., 2016). Our findings show a clear tendency towards grammatical simplification in scientific discourse in both languages with English spearheading the trend early on and German following later.},
pubstate = {published},
type = {article}
}

Project:   B1

Menzel, Katrin; Knappen, Jörg; Teich, Elke

Generating linguistically relevant metadata for the Royal Society Corpus Journal Article

Säily, Tanja; Tyrkkö, Jukka (Ed.): Research in Corpus Linguistics, Challenges in combining structured and unstructured data in corpus development (special issue), 9, pp. 1-18, 2021, ISSN 2243-4712.

This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its composition and present the types of metadata it contains. Specifically, we tackle two challenges: first, integration of original metadata from the data providers (JSTOR and the Royal Society); second, derivation of additional linguistically relevant metadata regarding text structure and situational context (register).

@article{Menzel2021,
title = {Generating linguistically relevant metadata for the Royal Society Corpus},
author = {Katrin Menzel and J{\"o}rg Knappen and Elke Teich},
editor = {Tanja S{\"a}ily and Jukka Tyrkk{\"o}},
url = {https://ricl.aelinco.es/index.php/ricl/article/view/158},
doi = {https://doi.org/10.32714/ricl.09.01.02},
year = {2021},
date = {2021},
journal = {Research in Corpus Linguistics, Challenges in combining structured and unstructured data in corpus development (special issue)},
pages = {1-18},
volume = {9},
number = {1},
abstract = {This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its composition and present the types of metadata it contains. Specifically, we tackle two challenges: first, integration of original metadata from the data providers (JSTOR and the Royal Society); second, derivation of additional linguistically relevant metadata regarding text structure and situational context (register).},
pubstate = {published},
type = {article}
}

Project:   B1

Teich, Elke; Fankhauser, Peter; Degaetano-Ortlieb, Stefania; Bizzoni, Yuri

Less is More/More Diverse: On The Communicative Utility of Linguistic Conventionalization Journal Article

Benítez-Burraco, Antonio (Ed.): Frontiers in Communication, section Language Sciences, 2021.

We present empirical evidence of the communicative utility of CONVENTIONALIZATION, i.e., convergence in linguistic usage over time, and DIVERSIFICATION, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e., reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information.
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.

@article{Teich2021,
title = {Less is More/More Diverse: On The Communicative Utility of Linguistic Conventionalization},
author = {Elke Teich and Peter Fankhauser and Stefania Degaetano-Ortlieb and Yuri Bizzoni},
editor = {Antonio Ben{\'i}tez-Burraco},
url = {https://www.frontiersin.org/articles/10.3389/fcomm.2020.620275/full?&utm_source=Email_to_authors_&utm_medium=Email&utm_content=T1_11.5e1_author&utm_campaign=Email_publication&field=&journalName=Frontiers_in_Communication&id=620275},
doi = {https://doi.org/10.3389/fcomm.2020.620275},
year = {2021},
date = {2021-01-26},
journal = {Frontiers in Communication, section Language Sciences},
abstract = {We present empirical evidence of the communicative utility of CONVENTIONALIZATION, i.e., convergence in linguistic usage over time, and DIVERSIFICATION, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e., reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information.
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.},
pubstate = {published},
type = {article}
}

Project:   B1

Mosbach, Marius; Degaetano-Ortlieb, Stefania; Krielke, Marie-Pauline; Abdullah, Badr M.; Klakow, Dietrich

A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English Inproceedings

Proceedings of the 28th International Conference on Computational Linguistics, pp. 771-787, 2020.

Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses especially on semantic knowledge, strongly impacting models’ performance. Our results highlight the importance of (a) model comparison in evaluation task and (b) building up claims of model performance and the linguistic knowledge they capture beyond purely probing-based evaluations.

@inproceedings{Mosbach2020,
title = {A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English},
author = {Marius Mosbach and Stefania Degaetano-Ortlieb and Marie-Pauline Krielke and Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2020.coling-main.67/},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
pages = {771-787},
abstract = {Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses especially on semantic knowledge, strongly impacting models’ performance. Our results highlight the importance of (a) model comparison in evaluation task and (b) building up claims of model performance and the linguistic knowledge they capture beyond purely probing-based evaluations.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B1 B4 C4

Juzek, Tom; Krielke, Marie-Pauline; Teich, Elke

Exploring diachronic syntactic shifts with dependency length: the case of scientific English Inproceedings

Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), Association for Computational Linguistics, pp. 109-119, Barcelona, Spain (Online), 2020.

We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 1665 and 1996. Our starting assumption is that over time, Scientific English develops specific syntactic choice preferences that increase efficiency in (expert-to-expert) communication. The specific hypothesis we pursue in this paper is that changing syntactic choice preferences lead to greater dependency locality/dependency length minimization, which is associated with positive effects for the efficiency of human as well as computational linguistic processing. As a basis for our measurements, we parsed the RSC using Stanford CoreNLP. Overall, we observe a decrease in dependency length, with long dependency structures becoming less frequent and short dependency structures becoming more frequent over time, notably pertaining to the nominal phrase, thus marking an overall push towards greater communicative efficiency.

@inproceedings{juzek-etal-2020-exploring,
title = {Exploring diachronic syntactic shifts with dependency length: the case of scientific English},
author = {Tom Juzek and Marie-Pauline Krielke and Elke Teich},
url = {https://www.aclweb.org/anthology/2020.udw-1.13},
year = {2020},
date = {2020},
booktitle = {Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)},
pages = {109-119},
publisher = {Association for Computational Linguistics},
address = {Barcelona, Spain (Online)},
abstract = {We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 1665 and 1996. Our starting assumption is that over time, Scientific English develops specific syntactic choice preferences that increase efficiency in (expert-to-expert) communication. The specific hypothesis we pursue in this paper is that changing syntactic choice preferences lead to greater dependency locality/dependency length minimization, which is associated with positive effects for the efficiency of human as well as computational linguistic processing. As a basis for our measurements, we parsed the RSC using Stanford CoreNLP. Overall, we observe a decrease in dependency length, with long dependency structures becoming less frequent and short dependency structures becoming more frequent over time, notably pertaining to the nominal phrase, thus marking an overall push towards greater communicative efficiency.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1
