Cognitive Modelling of Information Density for Discourse Relations

Project B2

A central goal of the third phase of project B2 is to investigate the marking of coherence relations cross-linguistically and to extend the coverage of discourse relational resources to a larger variety of languages. This will include languages that have already received attention in discourse studies as well as more under-researched languages such as Nigerian Pidgin, which is in an interesting stage of language evolution. The project thus directly contributes to the Focus Area ‘Language typology, multilinguality and language change’ of the third phase of the CRC.
Specifically, we plan to focus on (i) cross-linguistic differences in the processing of discourse connectives, and the extent to which these differences may be driven by information-theoretic principles; (ii) how differences in linearization between languages (placing discourse coherence devices in different positions within the relational arguments) affect the distribution of information across the relational arguments; and (iii) differences in the degree of specificity and function of discourse markers, which may affect the amount of information conveyed by these markers and, in turn, their usage distributions.
To address these goals, we aim to annotate a cross-lingual corpus with discourse relation information (WP 1) using crowd-sourcing methods developed in earlier project phases. The project will combine corpus-based investigations with psycholinguistic experiments designed to test for processing differences between speakers of different languages (WPs 3, 4 and 5). Furthermore, the project will include a computational work package (WP 2) that provides automatic tools for mapping between annotations of different languages and for transferring information across languages, and that will develop discourse connective identifiers and relation classifiers for under-resourced languages.

Keywords: psycholinguistics, computational modelling, discourse relations