Modelling and Measuring Information Density

Project B4

In its third phase, B4 continues our long-term research effort of exploring and shifting the limits of what neural language models are capable of. In phase I, our focus was on modelling long-range dependencies. Phase II concentrated on improving language models by leveraging additional modalities and on providing a better understanding of their inner workings. Now, in the third phase, the main goal of B4 is to study the adaptation of language models, as well as how to use them for dynamically changing tasks and data distributions. This goal is tightly related to the notion of information in flux. In addition, we will explore the resource usage of neural language models to study the relationship between memory-efficient models and adaptation. We will also investigate novel ways to represent knowledge in language models, which we believe is tightly connected to making language models more adaptable and more robust to distribution changes.
WP1 will research new language modelling techniques that can cope with temporally changing or drifting data. In WP1, models will be evaluated purely in terms of perplexity. This changes in WP2, where we will build on the models developed in WP1 to explore what happens to language models fine-tuned on downstream tasks in a temporally changing setting. In WP3, we turn to the second main theme of B4 in phase III: memory efficiency. Here, memory refers to the compute resources used by the language model. We will explore more parameter-efficient language models and study the relationship between efficiency, adaptation, and generalization. WP4 will serve as the conceptual backbone for all work packages, as we expect models that combine parametric and non-parametric representations to be more adaptable and to generalize better. Finally, WP5 studies memory in language models from a different perspective: which parts of the history (that is, the words preceding the word whose surprisal is calculated) are most relevant for calculating surprisal. WP5 will provide the modelling relevant to A5.

Keywords: language modelling, long-range dependencies, memory