Bourgonje, Peter; Demberg, Vera
Generalizing across Languages and Domains for Discourse Relation Classification
Kawahara, Tatsuya; Demberg, Vera; Ultes, Stefan; Inoue, Koji; Mehri, Shikib; Howcroft, David; Komatani, Kazunori (Eds.): Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Association for Computational Linguistics, pp. 554-565, Kyoto, Japan, 2024.
The availability of corpora annotated for discourse relations is limited, and discourse relation classification performance varies greatly depending on both language and domain. This is a problem for downstream applications intended for a language other than English or a domain other than financial news, for which coverage of discourse annotations is comparatively low. In this paper, we experiment with a state-of-the-art model for discourse relation classification, originally developed for English, extend it to a multi-lingual setting (testing on Italian, Portuguese, and Turkish), and employ a simple yet effective method to mark out-of-domain training instances. In doing so, we aim to contribute to better generalization and more robust discourse relation classification performance across both language and domain.
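The abstract does not spell out how out-of-domain training instances are marked. A minimal sketch of one common way such marking can be realized, prefixing out-of-domain inputs with a special token before they reach the classifier, is given below; the marker string, the data layout, and the toy examples are assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical illustration: prefix training instances that come from a
# domain other than the target domain with a special marker token, so a
# downstream classifier can learn to treat them differently.

OOD_MARKER = "[OOD]"  # hypothetical special token, not from the paper


def mark_out_of_domain(instances, target_domain):
    """Return copies of (text, label, domain) triples, with texts from
    any domain other than `target_domain` prefixed by the marker token."""
    marked = []
    for text, label, domain in instances:
        if domain != target_domain:
            text = f"{OOD_MARKER} {text}"
        marked.append((text, label, domain))
    return marked


# Toy examples (invented for illustration only).
train = [
    ("The company reported losses, so shares fell.",
     "Contingency.Cause", "financial_news"),
    ("I missed the train because I woke up late.",
     "Contingency.Cause", "everyday_speech"),
]

# Instances outside the target domain receive the [OOD] prefix.
print(mark_out_of_domain(train, target_domain="everyday_speech"))
```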