Kermes, Hannah; Knappen, Jörg; Khamis, Ashraf; Degaetano-Ortlieb, Stefania; Teich, Elke

The Royal Society Corpus. Towards a high-quality resource for studying diachronic variation in scientific writing

Proceedings of Digital Humanities (DH'16), Krakow, Poland, 2016.

We introduce the Royal Society Corpus (RSC), a diachronic corpus of English scientific writing that takes a middle ground between big but ‘poor’ data and small but ‘rich’ data. The corpus has been built from an electronic version of the Transactions and Proceedings of the Royal Society of London and comprises c. 35 million tokens from the period 1665–1869 (see Table 1). Our motivation for building a corpus from this material is to investigate the diachronic development of written scientific English.