Ortmann, Katrin; Dipper, Stefanie

Variation between Different Discourse Types: Literate vs. Oral

In Proceedings of the NAACL-Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), Association for Computational Linguistics, pp. 64-79, Ann Arbor, Michigan, 2019.

This paper deals with the automatic identification of literate and oral discourse in German texts. A range of linguistic features is selected, and their role in distinguishing between literate- and oral-oriented registers is investigated using a decision-tree classifier. It turns out that all of the investigated features are related in some way to oral conceptuality. In particular, simple measures of complexity (average sentence and word length) are prominent indicators of oral and literate discourse. In addition, features of reference and deixis (realized by different types of pronouns) prove to be very useful in determining the degree of orality of different registers.
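
To make the surface features named in the abstract concrete, the following is a minimal sketch of how average sentence length, average word length, and a pronoun ratio might be computed for a German text. It is not the authors' implementation: the function name `complexity_features`, the naive sentence splitter, and the short pronoun list `DEICTIC_PRONOUNS` are all assumptions made for illustration, and the paper's actual feature definitions may differ.

```python
import re

# Hypothetical, deliberately small list of first/second person pronouns;
# the paper's actual pronoun categories are not reproduced here.
DEICTIC_PRONOUNS = {"ich", "du", "wir", "ihr", "mich", "dich", "uns", "euch"}


def complexity_features(text):
    """Return simple surface features for one text (illustrative only)."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Tokens are runs of letters, including German umlauts and ß.
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    if not sentences or not words:
        return {
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
            "pronoun_ratio": 0.0,
        }
    pronouns = sum(1 for w in words if w.lower() in DEICTIC_PRONOUNS)
    return {
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(words) / len(sentences),
        # Average number of characters per word token.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Share of tokens that appear in the assumed pronoun list.
        "pronoun_ratio": pronouns / len(words),
    }


features = complexity_features("Ich gehe heim. Du auch?")
```

Feature dictionaries of this kind, one per text, could then serve as input to a decision-tree classifier such as the one the abstract mentions; that step is left out here because the abstract does not specify the classifier's configuration.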