Liang, Yiming; Amsili, Pascal; Burnett, Heather; Demberg, Vera
Uniform information density explains subject doubling in French
Proceedings of the Annual Meeting of the Cognitive Science Society, 46, pp. 780-788, 2024.
In this paper, we investigate whether subject doubling in French is affected by the Uniform Information Density (UID) principle, which states that speakers prefer linguistic encodings that minimize fluctuations in information density. We show that, with other factors controlled, speakers are more likely to double the NP subject when it has high surprisal. This provides further empirical evidence for the UID principle, which predicts a trade-off between surprisal and redundancy as a property of natural languages. We also argue that GPT-2 is well suited to investigating complex linguistic phenomena such as subject doubling: it allows subject surprisal to be estimated over a fairly large conversational context, which is feasible because such language models acquire linguistic knowledge through pre-training on extensive datasets.
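
As an illustration of the surprisal measure the abstract relies on, the following minimal sketch computes surprisal in bits from conditional probabilities, using the standard definition: the negative base-2 logarithm of a token's probability given its context. The function names and the probability values are hypothetical and chosen only for illustration. In the setting described above, the probabilities would come from a language model such as GPT-2 conditioned on the preceding conversation; the abstract does not specify the paper's exact procedure, so this is not a reproduction of it.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal in bits of a token whose conditional probability is `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def phrase_surprisal_bits(token_probs) -> float:
    """Surprisal of a multi-token phrase.

    By the chain rule, the probability of the phrase is the product of the
    conditional token probabilities, so its surprisal is the sum of the
    token surprisals.
    """
    return sum(surprisal_bits(p) for p in token_probs)


# Hypothetical conditional probabilities for the tokens of two subject NPs.
# These numbers are invented for illustration, not taken from the paper.
predictable_subject = [0.5, 0.5]
surprising_subject = [0.125, 0.25]

print(phrase_surprisal_bits(predictable_subject))  # 2.0
print(phrase_surprisal_bits(surprising_subject))   # 5.0
```

Under the UID account summarized above, the second subject, which carries more surprisal, would be the more likely candidate for doubling, since the redundant clitic spreads the information over more material.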