Fischer, Stefan; Knappen, Jörg; Menzel, Katrin; Teich, Elke

The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study

Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, pp. 794-802, Marseille, France, 2020.

We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings.

The corpus has been built on the basis of the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We describe how the corpus can be found, the file formats available for download, and how it can be accessed via a web-based corpus query platform. We present a number of analytic tools that we have implemented for better usability, and provide an example of using the corpus for linguistic analysis as well as examples of subsequent, external uses of earlier releases.

We place the RSC against the background of existing English diachronic/scientific corpora, elaborating on its value for linguistic and humanistic study.