Knappen, Jörg; Fischer, Stefan; Kermes, Hannah; Teich, Elke; Fankhauser, Peter

The making of the Royal Society Corpus

21st Nordic Conference on Computational Linguistics (NoDaLiDa) Workshop on Processing Historical language, Workshop on Processing Historical language, pp. 7-11, Gothenburg, Sweden, 2017.

The Royal Society Corpus is a corpus of Early and Late modern English built in an agile process covering publications of the Royal Society of London from 1665 to 1869 (Kermes et al., 2016) with a size of approximately 30 million words. In this paper we will provide details on two aspects of the building process namely the mining of patterns for OCR correction and the improvement and evaluation of part-of-speech tagging.

Back