Alves, Diego; Fischer, Stefan; Degaetano-Ortlieb, Stefania; Teich, Elke

Multi-word Expressions in English Scientific Writing

Bizzoni, Yuri; Degaetano-Ortlieb, Stefania; Kazantseva, Anna; Szpakowicz, Stan (Eds.): Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), Association for Computational Linguistics, pp. 67-76, St. Julians, Malta, 2024.

Multi-Word Expressions (MWEs) play a pivotal role in language use overall and in register formation more specifically, e.g. by encoding field-specific terminology. Our study focuses on the identification and categorization of MWEs used in scientific writing, considering their formal characteristics as well as their developmental trajectory from the mid-17th century to the present. For this, we develop an approach that combines three different methods for identifying MWEs (Universal Dependency annotation, Partitioner and the Academic Formulas List) with selected measures for characterizing MWE properties (e.g., dispersion by Kullback-Leibler Divergence and several association measures). This allows us to inspect MWE types in a novel data-driven way regarding their functions and change over time in specialized discourse.
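
The two kinds of measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulations, so the sketch assumes one common reading, in which dispersion is the Kullback-Leibler Divergence between the distribution of an expression's occurrences across corpus parts and the distribution of the parts' sizes, and it uses pointwise mutual information as a stand-in for the unspecified association measures. The function names `kld_dispersion` and `pmi` and all counts are hypothetical.

```python
import math


def kld_dispersion(counts, part_sizes):
    """KL divergence in bits between where an expression occurs and how big
    each corpus part is. 0.0 means the expression is spread in proportion
    to part size; larger values mean it is concentrated in few parts.
    (Assumed formulation, not taken from the paper.)"""
    if len(counts) != len(part_sizes):
        raise ValueError("counts and part_sizes must have the same length")
    if any(size <= 0 for size in part_sizes):
        raise ValueError("every corpus part must have a positive size")
    total_count = sum(counts)
    if total_count == 0:
        raise ValueError("the expression does not occur in the corpus")
    total_size = sum(part_sizes)

    kld = 0.0
    for count, size in zip(counts, part_sizes):
        p = count / total_count  # share of the expression's occurrences
        q = size / total_size    # share of the corpus
        if p > 0:                # terms with p == 0 contribute nothing
            kld += p * math.log2(p / q)
    return kld


def pmi(pair_count, w1_count, w2_count, n_tokens):
    """Pointwise mutual information in bits for a two-word expression,
    used here as one example of an association measure."""
    return math.log2((pair_count * n_tokens) / (w1_count * w2_count))
```

With two equally sized corpus parts, `kld_dispersion([10, 10], [100, 100])` gives 0.0 (evenly dispersed), while `kld_dispersion([20, 0], [100, 100])` gives 1.0 (all occurrences in one part). Computing such values per time period would be one way to trace change over time, though how the study itself slices the corpus is not stated in the abstract.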