Information Management as a Factor for Syntactic Variation in the History of German

Project C6

The overarching goal of project C6 is an information-theoretic account of specific types of syntactic variation throughout the history of German. In the first project phase, we investigated the role that surprisal plays in the extraposition of constituents. In the next phase, we want to focus on the serialization of constituents relative to each other. In German, word order is only partially determined by grammatical factors such as syntactic function or case: it has been amply demonstrated in the literature that information structure is an important factor for serialization in German (e.g., Lenerz, 1977; Musan, 2002; Frey, 2004; Speyer, 2011, 2015a; Rauth, 2020). In information-structural research, constraints such as “given before new information” (Lenerz, 1977; Musan, 2002; Frey, 2004) have been identified as highly relevant for German word order. In particular, this concerns serialization in the so-called middle field, i.e., the part of the clause between the finite verb (in main clauses) or the conjunction (in subordinate clauses) and the remaining verbal elements. Such constraints also had an impact on word order in earlier stages of German (e.g., Speyer, 2011, 2015a; Rauth, 2020), although it is not clear whether their impact decreased or increased over time.
We assume that information-structural notions such as information status are correlated with surprisal, or at least concomitant with it (see, e.g., Speyer and Lemke, 2017; Speyer and Voigtmann, 2021). For example, an explanation for the observed constraint “given > new” could be that the new information conveyed later in the clause is in some way made more predictable by the given information conveyed earlier in the clause, in that the given information sets up expectations as to what the new information could be. Thus, the surprisal of the new information would be lowered, which could be a strategy to smooth out the information profile of clauses in accordance with the Uniform Information Density hypothesis (UID; Levy and Jaeger, 2007).
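To make the reasoning explicit, surprisal can be written down in its standard form. The notation below is ours and purely illustrative: $w_i$ stands for the unit in question (a word or constituent), and $c_{\text{given}}$ and $c_{\text{other}}$ are hypothetical labels for the preceding context with and without the given information placed first.

```latex
% Surprisal of a unit w_i in its preceding context (in bits):
S(w_i) = -\log_2 P(w_i \mid w_1, \dots, w_{i-1})

% Reading of "given > new" sketched above: if the given material
% precedes the new material w_new and raises its probability,
%   P(w_new | c_given) > P(w_new | c_other),
% then the surprisal of the new material is lower in that order:
S(w_{\text{new}} \mid c_{\text{given}}) < S(w_{\text{new}} \mid c_{\text{other}})
```

Whether this inequality actually holds for a given clause is an empirical question that depends on the language model used to estimate the probabilities.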
We will test this assumption with modern and historical corpus data to determine whether changes in word and constituent order are related to, and can be explained by, information-theoretic concepts like UID. For this, we first transfer existing automatic methods for annotating syntactic categories and information structure to historical data to create a database of authentic text samples. Next, we generate information profiles and calculate surprisal curves for these samples and compare them to the profiles and curves of a variant corpus. A variant corpus (as proposed in the current phase) is an artificial “parallel” corpus that we generate from the authentic samples by changing specific linguistic parameters according to our hypothesis while other parameters are kept constant. This allows us to quantify the effects of the information profile and other information-theoretic measures on the syntactic changes in question.
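The comparison between an authentic sample and its variant can be illustrated with a minimal sketch. It is not the project's pipeline: the add-one-smoothed bigram model, the function names (`train_bigram`, `surprisal_profile`, `profile_variance`), the toy sentences, and the use of variance as a proxy for the smoothness of an information profile are all assumptions made for illustration only.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences.

    Each sentence is padded with a start symbol "<s>" so that the first
    token also has a context.
    """
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens)
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def surprisal_profile(tokens, model):
    """Return per-token surprisal in bits under add-one smoothing.

    Smoothing keeps unseen bigrams at a non-zero probability; tokens that
    never occurred in training are handled only approximately.
    """
    context_counts, bigram_counts, vocab = model
    vocab_size = len(vocab)
    padded = ["<s>"] + list(tokens)
    profile = []
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        profile.append(-math.log2(prob))
    return profile


def profile_variance(profile):
    """Population variance of a surprisal profile (lower = more uniform)."""
    mean = sum(profile) / len(profile)
    return sum((s - mean) ** 2 for s in profile) / len(profile)


if __name__ == "__main__":
    # Toy training data; a real study would use a large corpus.
    training = [
        ["er", "gab", "dem", "kind", "ein", "buch"],
        ["sie", "gab", "dem", "mann", "ein", "buch"],
    ]
    model = train_bigram(training)

    authentic = ["er", "gab", "dem", "mann", "ein", "buch"]
    # Variant: same tokens, the two objects swapped, everything else constant.
    variant = ["er", "gab", "ein", "buch", "dem", "mann"]

    for label, sample in (("authentic", authentic), ("variant", variant)):
        profile = surprisal_profile(sample, model)
        print(label, [round(s, 2) for s in profile],
              "variance:", round(profile_variance(profile), 3))
```

In this setup, a lower variance for the authentic order than for the variant would be consistent with the UID-based explanation, although a toy bigram model on a handful of sentences cannot support any actual conclusion.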
Besides the corpus analysis, we will validate our corpus findings with experiments on ‘living’ German by exploring whether the influence of information density on constituent order that we expect to find can be correlated with processing difficulties. In that respect, we will cooperate with projects that take an experimental approach (as detailed in the work packages).

Keywords: syntactic variation