Ortmann, Katrin; Roussel, Adam; Dipper, Stefanie

Evaluating Off-the-Shelf NLP Tools for German

Proceedings of the Conference on Natural Language Processing (KONVENS), pp. 212-222, Erlangen, Germany, 2019.

It is not always easy to keep track of what tools are currently available for a particular annotation task, nor is it obvious how the provided models will perform on a given dataset. In this contribution, we provide an overview of the tools available for the automatic annotation of German-language text. We evaluate fifteen free and open source NLP tools for the linguistic annotation of German, looking at the fundamental NLP tasks of sentence segmentation, tokenization, POS tagging, morphological analysis, lemmatization, and dependency parsing. To get an idea of how the systems’ performance will generalize to various domains, we compiled our test corpus from various non-standard domains. All of the systems in our study are evaluated not only with respect to accuracy, but also the computational resources required.