Cross-linguistic Information-Theoretic Modelling of Communicative Efficiency

Project C7

Information-theoretic modelling of cross-linguistic data has uncovered general principles of language optimization, including dependency length minimization and dependency locality (Futrell et al. 2015, 2020) and their link to efficient memory use (Ferrer i Cancho 2015; Hahn et al. 2021). While existing studies usually make statements about overall differences between languages, they do not inspect in detail which language-specific structures underlie these differences. In this project, we aim to replicate some of this work using a more comprehensive dataset and to build models that explain the original findings further, from an explicitly cross-linguistic perspective, sampling 33 Indo-European languages and 10 diverse non-Indo-European languages. We believe that, as tentatively suggested by the accounts cited above and by earlier work in linguistic typology (Givón 1988; Gundel 1988; Herring 1990; Payne 1990; Skopeteas and Fanselow 2010), the major factor missing in these accounts is the impact of information structure on word order variability: most languages of the world allow word order changes to encode specific changes in the given/new/contrastive status of constituents (Gundel 1988). While word order is partly taken into account in a general sense in cross-linguistic information-theoretic studies, very little attention has been paid to the impact of word order variability on the fit of these models. Languages differ widely in the amount of word order variability that they allow, in which word orders are ‘rigid’ and which are ‘free’, and in how much of that variability depends on information structure. Hence, there are large cross-linguistic differences in the word order–information structure interface that have not explicitly informed information-theoretic modelling thus far. This project aims to remedy that.
Our main aim is to incorporate information status into information-theoretic modelling of language use in an explicit and cross-linguistic fashion, in order to investigate communicative efficiency in terms of locality and the memory-surprisal tradeoff, building on Futrell et al. (2015) and Hahn et al. (2021). Secondly, given the known cross-linguistic variation in the word order–information structure interface, we also determine how much of the fit of these information-theoretic models depends on word order variability. We consider which word orders (amongst others, the order of nominal heads and different types of modifiers; of arguments and their verbal head; and of clauses and their heads) are variable across the sampled languages, and how word order variability interacts with the minimization of dependencies, from both a cross-linguistic and a language-internal perspective. Our third goal is to study the relation between information-theoretic concepts (surprisal/information density) and information status; these are conceptually related, but the relation is empirically under-studied. Lastly, we aim to contribute to and enrich the long-standing discussion of communicative efficiency in typology with a conceptual framework that combines information status and information theory, focusing on the interaction between (overt) morphological marking of syntactic arguments and word order (variability), and additionally including a diachronic perspective.

Keywords: language typology