Fischer, Stefan; Teich, Elke

More complex or just more diverse? Capturing diachronic linguistic variation

41. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS), Bremen, Germany, 2019.

We present a diachronic comparison of general (register-mixed) and scientific English in the late modern period (1700–1900). For our analysis we use two corpora that are comparable in size and time span: the Corpus of Late Modern English (CLMET; De Smet et al. 2015) and the Royal Society Corpus (RSC; Kermes et al. 2016). Previous studies of scientific English found a diachronic shift from a verbal, involved style to a more nominal, abstract style compared to other discourse types (cf. Halliday 1988; Biber & Gray 2011). The features reported include type-token ratio, lexical density, number of words per sentence and relative frequency of nominal vs. verbal categories, all of which are potential indicators of linguistic complexity at a shallow level. We present results for these common measures on our data set as well as for selected information-theoretic measures, notably relative entropy (Kullback–Leibler divergence: KLD) and surprisal. For instance, using KLD, we observe a continuous divergence between general and scientific language based on word unigrams as well as part-of-speech trigrams. Lexical density increases over time for both scientific and general language. In both corpora, sentence length decreases by roughly 25%, with scientific sentences being longer on average. Mean sentence surprisal, by contrast, remains stable over time. The poster will give an overview of our results using the selected measures and discuss possible interpretations. Moreover, we will assess the utility of these measures for capturing linguistic diversification, showing that the information-theoretic measures are fairly fine-tuned and robust, and that they link up well to explanations in terms of linguistic complexity and rational communication (cf. Hale 2016; Crocker, Demberg, & Teich 2016).
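As an illustration of the two information-theoretic measures named above, the following is a minimal sketch, not the implementation used for the poster. It assumes word-unigram models with additive smoothing over a shared vocabulary and a context-free (unigram) estimate of surprisal; the function names, the smoothing parameter `alpha` and the toy sentences are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a shared vocabulary.

    Smoothing (alpha) is an assumption of this sketch; it keeps every
    probability non-zero so that the divergence below is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kld(p, q):
    """Relative entropy D(P || Q) in bits for two models over the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def mean_surprisal(sentence, q):
    """Average unigram surprisal, -log2 q(w), per token of one sentence."""
    return sum(-math.log2(q[w]) for w in sentence) / len(sentence)


# Invented token lists standing in for one time slice of each corpus.
scientific = "the experiment shows the pressure of the air".split()
general = "the lady said the weather was fine".split()

vocab = set(scientific) | set(general)
p_sci = unigram_probs(scientific, vocab)
p_gen = unigram_probs(general, vocab)

# How many extra bits are needed to encode scientific text with the general model.
divergence = kld(p_sci, p_gen)

# Mean surprisal of the scientific sentence under the general model.
surprisal = mean_surprisal(scientific, p_gen)

print(f"KLD(scientific || general) = {divergence:.3f} bits")
print(f"mean surprisal = {surprisal:.3f} bits per token")
```

KLD is asymmetric, so the direction of comparison (scientific against general or the reverse) has to be chosen deliberately. The same functions would apply to part-of-speech trigrams if each trigram were treated as a single token, and a surprisal estimate conditioned on preceding words would replace the unigram model `q` with an n-gram model.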