Marchal, Marian; Scholman, Merel; Demberg, Vera
Semi-automatic discourse annotation in a low-resource language: Developing a connective lexicon for Nigerian Pidgin
Proceedings of the Second Workshop on Computational Approaches to Discourse (CODI 2021), Association for Computational Linguistics, pp. 84-94, Punta Cana, Dominican Republic and Online, 2021.
Cross-linguistic research on discourse structure and coherence marking requires discourse-annotated corpora and connective lexicons in a large number of languages. However, the availability of such resources is limited, especially for languages for which linguistic resources are scarce in general, such as Nigerian Pidgin. In this study, we demonstrate how a semi-automatic approach can be used to source connectives and their relation senses and develop a discourse-annotated corpus in a low-resource language. Connectives and their relation senses were extracted from a parallel corpus combining automatic (PDTB end-to-end parser) and manual annotations. This resulted in Naija-Lex, a lexicon of discourse connectives in Nigerian Pidgin with English translations. The lexicon shows that the majority of Nigerian Pidgin connectives are borrowed from its English lexifier, but that there are also some connectives that are unique to Nigerian Pidgin.