Publications

Zogaj, Doruntinë; Bader, Regine; Mecklinger, Axel

How the brain segments experience: ERP evidence of event boundaries enhancing memory formation in narratives (Journal Article)

Cognitive, Affective, & Behavioral Neuroscience, pp. 1-18, 2026, ISSN 1531-135X.

Event boundaries are known to have an impact on how discrete events are remembered; however, the neural mechanisms supporting memory for boundaries themselves remain poorly understood. This study investigated how both event boundaries and preceding information are processed and remembered while listening to naturalistically spoken narratives. We recorded participants’ neural responses (event-related potentials) while they listened to stories where a critical word signaled either a predictable (no-event boundary) or an unpredictable (event boundary) action. Critical words in the boundary condition were better remembered than those in the no-event boundary condition and elicited a larger N400 amplitude. Crucially, a subsequent memory effect was observed only in the boundary condition, with remembered critical words eliciting more negative N400s than forgotten ones, highlighting the role of increased demands in conceptual semantic processing in episodic memory encoding. Furthermore, a retrograde subsequent memory effect also emerged exclusively in the boundary condition, with more negative amplitudes to critical words when preceding information was later remembered, consistent with the notion that boundaries trigger rapid reinstatement of a recently experienced event. These findings provide compelling evidence that event boundaries act as “cognitive anchor points” that enhance the encoding of new information and also contribute to the strengthening of recently encoded events.

@article{Zogaj_etal_ERP_2026,
title = {How the brain segments experience: ERP evidence of event boundaries enhancing memory formation in narratives},
author = {Doruntinë Zogaj and Regine Bader and Axel Mecklinger},
url = {https://doi.org/10.3758/s13415-026-01399-0},
doi = {10.3758/s13415-026-01399-0},
year = {2026},
date = {2026},
journal = {Cognitive, Affective, & Behavioral Neuroscience},
pages = {1-18},
abstract = {Event boundaries are known to have an impact on how discrete events are remembered; however, the neural mechanisms supporting memory for boundaries themselves remain poorly understood. This study investigated how both event boundaries and preceding information are processed and remembered while listening to naturalistically spoken narratives. We recorded participants’ neural responses (event-related potentials) while they listened to stories where a critical word signaled either a predictable (no-event boundary) or an unpredictable (event boundary) action. Critical words in the boundary condition were better remembered than those in the no-event boundary condition and elicited a larger N400 amplitude. Crucially, a subsequent memory effect was observed only in the boundary condition, with remembered critical words eliciting more negative N400s than forgotten ones, highlighting the role of increased demands in conceptual semantic processing in episodic memory encoding. Furthermore, a retrograde subsequent memory effect also emerged exclusively in the boundary condition, with more negative amplitudes to critical words when preceding information was later remembered, consistent with the notion that boundaries trigger rapid reinstatement of a recently experienced event. These findings provide compelling evidence that event boundaries act as “cognitive anchor points” that enhance the encoding of new information and also contribute to the strengthening of recently encoded events.},
pubstate = {published},
type = {article}
}

Project:   A6

Jablotschkin, Sarah; Lapshinova-Koltunski, Ekaterina; Zinsmeister, Heike

Coreference in simplified German: Linguistic features and challenges of automatic annotation (Inproceedings)

Ogrodniczuk, Maciej; Novak, Michal; Poesio, Massimo; Pradhan, Sameer; Ng, Vincent (Ed.): Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference, Association for Computational Linguistics, pp. 12-23, Suzhou, China, 2025.

In this paper, we analyse coreference annotation of the German language, focussing on the phenomenon of simplification, that is, the tendency to use words and constructions that are assumed to be more easily perceived, understood, or produced. Simplification is one of the tools used by language users in order to optimise communication effectively. We are interested in how simplification is reflected in coreference in two different language products exposed to the phenomenon of simplification: simultaneous interpreting and Easy German. For this, we automatically annotate simplified texts with coreference. We then evaluate the outputs of automatic annotation. In addition, we also look into quantitative distributions of some coreference features. Our findings show that although the language products under analysis diverge in terms of simplification driving factors, they share some specific coreference features. We also show that this specificity may cause annotation errors in simplified language, e.g. in non-nominal or split antecedents.

@inproceedings{jablotschkin-etal-2025-coreference,
title = {Coreference in simplified German: Linguistic features and challenges of automatic annotation},
author = {Sarah Jablotschkin and Ekaterina Lapshinova-Koltunski and Heike Zinsmeister},
editor = {Maciej Ogrodniczuk and Michal Novak and Massimo Poesio and Sameer Pradhan and Vincent Ng},
url = {https://aclanthology.org/2025.crac-1.2/},
doi = {10.18653/v1/2025.crac-1.2},
year = {2025},
date = {2025},
booktitle = {Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference},
pages = {12-23},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {In this paper, we analyse coreference annotation of the German language, focussing on the phenomenon of simplification, that is, the tendency to use words and constructions that are assumed to be more easily perceived, understood, or produced. Simplification is one of the tools used by language users in order to optimise communication effectively. We are interested in how simplification is reflected in coreference in two different language products exposed to the phenomenon of simplification: simultaneous interpreting and Easy German. For this, we automatically annotate simplified texts with coreference. We then evaluate the outputs of automatic annotation. In addition, we also look into quantitative distributions of some coreference features. Our findings show that although the language products under analysis diverge in terms of simplification driving factors, they share some specific coreference features. We also show that this specificity may cause annotation errors in simplified language, e.g. in non-nominal or split antecedents.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B7, T1

Menzel, Katrin

Eine korpusbasierte diachrone Untersuchung zu übersetzten Wissenschaftsartikeln aus den Zeitschriften der Royal Society of London (Journal Article)

chronotopos – A Journal of Translation History, 6 (2), pp. 31-57, 2025.

This article describes a corpus study of the English translations of scientific texts published in the journals of the Royal Society of London since the 17th century. The data basis is the Royal Society Corpus (RSC), which primarily contains English research articles originally written in English, but also a considerable number of English translated contributions from journals such as the Philosophical Transactions and the Proceedings of the Royal Society. In a first step, the translated research articles are identified in the data and analysed in a summary overview with regard to their exact dates of origin, topics, source languages and translators. Among other things, this reveals that most translations in the RSC date from the 18th century. In a next step, these texts in particular are therefore examined with respect to selected linguistic features that are suitable for testing a universal hypothesis of translation studies, namely the ‘normalisation hypothesis’, in a historical context. The aim is to clarify whether the translated texts are characterised by linguistically less innovative features than non-translated English comparison texts from the RSC. Overall, the results show that normalisation and a stronger use of more conventional linguistic structures were not translation practices that applied to the historical scientific translations. Finally, an outlook is given on the construction of a multilingual parallel corpus containing the translated research articles and their respective source texts, in order to enable further investigations of prototypical translation properties in which the influence of the source texts can also be taken into account.

@article{Menzel_2025,
title = {Eine korpusbasierte diachrone Untersuchung zu {\"u}bersetzten Wissenschaftsartikeln aus den Zeitschriften der Royal Society of London},
author = {Katrin Menzel},
url = {https://chronotopos.eu/cts/article/view/130},
doi = {10.70596/cts130},
year = {2025},
date = {2025},
journal = {chronotopos – A Journal of Translation History},
pages = {31--57},
volume = {6},
number = {2},
abstract = {Dieser Beitrag beschreibt eine Korpusstudie zu den englischen {\"U}bersetzungen von naturwissenschaftlichen Texten, die in Zeitschriften der Royal Society of London seit dem 17. Jhd. ver{\"o}ffentlicht wurden. Als Datengrundlage dient das Royal Society Corpus (RSC), welches vor allem originalsprachliche englische Fachartikel, aber auch eine beachtliche Anzahl von {\"u}bersetzten englischen Beitr{\"a}gen aus Zeitschriften wie den Philosophical Transactions und den Proceedings der Royal Society beinhaltet. In einem ersten Schritt werden die {\"u}bersetzten Fachartikel in den Daten identifiziert und in einem zusammenfassenden {\"U}berblick im Hinblick auf ihre genauen Entstehungszeiten, Themen, Ausgangssprachen und {\"U}bersetzer analysiert. Dabei stellt sich u. a. heraus, dass die meisten {\"U}bersetzungen im RSC aus dem 18. Jhd. stammen. Daher werden in einem n{\"a}chsten Schritt speziell diese Texte in Bezug auf ausgew{\"a}hlte linguistische Merkmale untersucht, welche geeignet sind, um eine {\"u}bersetzungswissenschaftliche Universalienhypothese, und zwar die ‚Normalisierungshypothese‘, in einem historischen Kontext zu {\"u}berpr{\"u}fen. Hierbei soll gekl{\"a}rt werden, ob die {\"u}bersetzten Texte durch sprachlich weniger innovative Merkmale gepr{\"a}gt sind als nicht-{\"u}bersetzte englische Vergleichstexte aus dem RSC. Insgesamt zeigen die Ergebnisse, dass Normalisierung und eine st{\"a}rkere Nutzung von konventionelleren sprachlichen Strukturen keine auf die historischen Wissenschafts{\"u}bersetzungen zutreffende {\"U}bersetzungspraktiken waren. Anschlie{\ss}end wird ein Ausblick auf den Aufbau eines multilingualen Parallelkorpus mit den {\"u}bersetzten Fachartikeln und ihren jeweiligen Ausgangstexten gegeben, um weitere Untersuchungen zu prototypischen {\"U}bersetzungseigenschaften zu erm{\"o}glichen, bei denen auch der Einfluss der Ausgangstexte ber{\"u}cksichtigt werden kann.},
pubstate = {published},
type = {article}
}

Project:   B1

Dyer, Andrew; O’Brien, Colleen Alena

Towards better annotation practices for symmetrical voice in Universal Dependencies (Inproceedings)

Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), Association for Computational Linguistics, pp. 137-142, Ljubljana, Slovenia, 2025, ISBN 979-8-89176-292-3.

Austronesian languages exhibit features that are challenging for Universal Dependencies: most notably, the symmetric voice system, whereby agent, patient, and instrumental arguments (among others) can be the pivot of a transitive structure – complicating the usual assumption that subjects of transitive sentences are semantic agents, and objects semantic patients. To showcase our ideas of how to address the representation of such systems in Universal Dependencies, we introduce a small treebank of sentences from texts and elicitation sessions in Gorontalo, an Austronesian language of Sulawesi (Indonesia), which exhibits a Philippine-type voice system. We discuss the annotation guidelines for this language, and the extensions of the Universal Dependencies guidelines that are needed to accommodate this and other Austronesian languages.

@inproceedings{dyer-obrien-2025-towards,
title = {Towards better annotation practices for symmetrical voice in Universal Dependencies},
author = {Andrew Dyer and Colleen Alena O’Brien},
url = {https://aclanthology.org/2025.udw-1.15/},
year = {2025},
date = {2025},
booktitle = {Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)},
isbn = {979-8-89176-292-3},
pages = {137-142},
publisher = {Association for Computational Linguistics},
address = {Ljubljana, Slovenia},
abstract = {Austronesian languages exhibit features that are challenging for Universal Dependencies: most notably, the symmetric voice system, whereby agent, patient, and instrumental arguments (among others) can be the pivot of a transitive structure – complicating the usual assumption that subjects of transitive sentences are semantic agents, and objects semantic patients. To showcase our ideas of how to address the representation of such systems in Universal Dependencies, we introduce a small treebank of sentences from texts and elicitation sessions in Gorontalo, an Austronesian language of Sulawesi (Indonesia), which exhibits a Philippine-type voice system. We discuss the annotation guidelines for this language, and the extensions of the Universal Dependencies guidelines that are needed to accommodate this and other Austronesian languages.},
pubstate = {published},
type = {inproceedings}
}

Project:   C7

Abdullah, Badr M.; Al Ghussin, Yusser; Al Khalili, Zena; Özyilmaz, Ömer Tarik; Valdenegro-Toro, Matias; Ostermann, Simon; Klakow, Dietrich

Saarland-Groningen at NADI 2025 Shared Task: Effective Dialectal Arabic Speech Processing under Data Constraints (Inproceedings)

Darwish, Kareem; Ali, Ahmed; Abu Farha, Ibrahim; Touileb, Samia; Zitouni, Imed; Abdelali, Ahmed; Al-Ghamdi, Sharefah; Alkhereyf, Sakhar; Zaghouani, Wajdi; Khalifa, Salam; AlKhamissi, Badr; Almatham, Rawan; Hamed, Injy; Alyafeai, Zaid; Alowisheq, Areeb; Inoue, Go; Mrini, Khalil; Alshammari, Waad (Ed.): Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, Association for Computational Linguistics, pp. 745-751, Suzhou, China, 2025, ISBN 979-8-89176-356-2.

We present our systems for the NADI 2025 shared task on multidialectal Arabic speech processing, participating in both spoken dialect identification (ADI) and automatic speech recognition (ASR) subtasks. Working under data constraints by using only the provided shared task resources for dialect adaptation, we explore effective model adaptation strategies for dialectal Arabic speech. For ADI, we fine-tune w2v-BERT 2.0 and employ voice conversion as data augmentation, improving accuracy from 68.71% to 76.40% on a blind cross-domain test set. For ASR, we develop two complementary approaches: (1) a CTC-based model pre-trained on public Arabic speech data, and (2) Whisper-based models using two-stage fine-tuning. Our experiments show that while dialect-centric CTC models exhibit better zero-shot dialectal performance (58.89 vs 93.90 WER), Whisper achieves better performance after dialect-specific adaptation, which reduces WER from 93.89 to 39.78. We also demonstrate that using character error rate (CER) as a validation criterion provides practical benefits with minimal performance tradeoffs. Despite using no external resources for dialect adaptation beyond the shared task data, our systems ranked second in ADI and third in ASR, demonstrating that careful adaptation strategies can overcome data constraints in dialectal speech processing.

@inproceedings{m-abdullah-etal-2025-saarland,
title = {Saarland-Groningen at NADI 2025 Shared Task: Effective Dialectal Arabic Speech Processing under Data Constraints},
author = {Badr M. Abdullah and Yusser Al Ghussin and Zena Al Khalili and {\"O}mer Tarik {\"O}zyilmaz and Matias Valdenegro-Toro and Simon Ostermann and Dietrich Klakow},
editor = {Kareem Darwish and Ahmed Ali and Ibrahim Abu Farha and Samia Touileb and Imed Zitouni and Ahmed Abdelali and Sharefah Al-Ghamdi and Sakhar Alkhereyf and Wajdi Zaghouani and Salam Khalifa and Badr AlKhamissi and Rawan Almatham and Injy Hamed and Zaid Alyafeai and Areeb Alowisheq and Go Inoue and Khalil Mrini and Waad Alshammari},
url = {https://aclanthology.org/2025.arabicnlp-sharedtasks.102/},
doi = {10.18653/v1/2025.arabicnlp-sharedtasks.102},
year = {2025},
date = {2025},
booktitle = {Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
isbn = {979-8-89176-356-2},
pages = {745-751},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {We present our systems for the NADI 2025 shared task on multidialectal Arabic speech processing, participating in both spoken dialect identification (ADI) and automatic speech recognition (ASR) subtasks. Working under data constraints by using only the provided shared task resources for dialect adaptation, we explore effective model adaptation strategies for dialectal Arabic speech. For ADI, we fine-tune w2v-BERT 2.0 and employ voice conversion as data augmentation, improving accuracy from 68.71% to 76.40% on a blind cross-domain test set. For ASR, we develop two complementary approaches: (1) a CTC-based model pre-trained on public Arabic speech data, and (2) Whisper-based models using two-stage fine-tuning. Our experiments show that while dialect-centric CTC models exhibit better zero-shot dialectal performance (58.89 vs 93.90 WER), Whisper achieves better performance after dialect-specific adaptation, which reduces WER from 93.89 to 39.78. We also demonstrate that using character error rate (CER) as a validation criterion provides practical benefits with minimal performance tradeoffs. Despite using no external resources for dialect adaptation beyond the shared task data, our systems ranked second in ADI and third in ASR, demonstrating that careful adaptation strategies can overcome data constraints in dialectal speech processing.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Yung, Frances Pik Yu; Suresh, Varsha; Reza, Zaynab; Ahmad, Mansoor; Demberg, Vera

Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition (Inproceedings)

Béchet, Frédéric; Lefèvre, Fabrice; Asher, Nicholas; Kim, Seokhwan; Merlin, Teva (Ed.): Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Association for Computational Linguistics, pp. 172-182, Avignon, France, 2025.

Implicit discourse relation recognition (IDRR) – the task of identifying the implicit coherence relation between two text spans – requires deep semantic understanding. Recent studies have shown that zero-/few-shot approaches significantly lag behind supervised models. However, LLMs may be useful for synthetic data augmentation, where LLMs generate a second argument following a specified coherence relation. We applied this approach in a cross-domain setting, generating discourse continuations using unlabelled target-domain data to adapt a base model which was trained on source-domain labelled data. Evaluations conducted on a large-scale test set revealed that different variations of the approach did not result in any significant improvements. We conclude that LLMs often fail to generate useful samples for IDRR, and emphasize the importance of considering both statistical significance and comparability when evaluating IDRR models.

@inproceedings{yung-etal-2025-synthetic,
title = {Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition},
author = {Frances Pik Yu Yung and Varsha Suresh and Zaynab Reza and Mansoor Ahmad and Vera Demberg},
editor = {Fr{\'e}d{\'e}ric B{\'e}chet and Fabrice Lef{\`e}vre and Nicholas Asher and Seokhwan Kim and Teva Merlin},
url = {https://aclanthology.org/2025.sigdial-1.13/},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue},
pages = {172-182},
publisher = {Association for Computational Linguistics},
address = {Avignon, France},
abstract = {Implicit discourse relation recognition (IDRR) – the task of identifying the implicit coherence relation between two text spans – requires deep semantic understanding. Recent studies have shown that zero-/few-shot approaches significantly lag behind supervised models. However, LLMs may be useful for synthetic data augmentation, where LLMs generate a second argument following a specified coherence relation. We applied this approach in a cross-domain setting, generating discourse continuations using unlabelled target-domain data to adapt a base model which was trained on source-domain labelled data. Evaluations conducted on a large-scale test set revealed that different variations of the approach did not result in any significant improvements. We conclude that LLMs often fail to generate useful samples for IDRR, and emphasize the importance of considering both statistical significance and comparability when evaluating IDRR models.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Saeed, Muhammed; Bourgonje, Peter; Demberg, Vera

Implicit Discourse Relation Classification For Nigerian Pidgin (Inproceedings)

Rambow, Owen; Wanner, Leo; Apidianaki, Marianna; Al-Khalifa, Hend; Di Eugenio, Barbara; Schockaert, Steven (Ed.): Proceedings of the 31st International Conference on Computational Linguistics, Association for Computational Linguistics, pp. 2561-2574, Abu Dhabi, UAE, 2025.

Nigerian Pidgin (NP) is an English-based creole language spoken by nearly 100 million people across Nigeria, and is still low-resource in NLP. In particular, there are currently no available discourse parsing tools, which, if available, would have the potential to improve various downstream tasks. Our research focuses on implicit discourse relation classification (IDRC) for NP, a task which, even in English, is not easily solved by prompting LLMs, but requires supervised training. With this in mind, we have developed a framework for the task, which could also be used by researchers for other English-lexified languages. We systematically compare different approaches to the low-resource IDRC task: in one approach, we use English IDRC tools directly on the NP text as well as on their English translations (followed by a back-projection of labels). In another approach, we create a synthetic discourse corpus for NP, in which we automatically translate the English discourse-annotated corpus PDTB to NP, project PDTB labels, and then train an NP IDR classifier. The latter approach of training a “native” NP classifier outperforms our baseline by 13.27% and 33.98% in F1 score for 4-way and 11-way classification, respectively.

@inproceedings{saeed-etal-2025-implicit,
title = {Implicit Discourse Relation Classification For Nigerian Pidgin},
author = {Muhammed Saeed and Peter Bourgonje and Vera Demberg},
editor = {Owen Rambow and Leo Wanner and Marianna Apidianaki and Hend Al-Khalifa and Barbara Di Eugenio and Steven Schockaert},
url = {https://aclanthology.org/2025.coling-main.174/},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
pages = {2561-2574},
publisher = {Association for Computational Linguistics},
address = {Abu Dhabi, UAE},
abstract = {Nigerian Pidgin (NP) is an English-based creole language spoken by nearly 100 million people across Nigeria, and is still low-resource in NLP. In particular, there are currently no available discourse parsing tools, which, if available, would have the potential to improve various downstream tasks. Our research focuses on implicit discourse relation classification (IDRC) for NP, a task which, even in English, is not easily solved by prompting LLMs, but requires supervised training. With this in mind, we have developed a framework for the task, which could also be used by researchers for other English-lexified languages. We systematically compare different approaches to the low-resource IDRC task: in one approach, we use English IDRC tools directly on the NP text as well as on their English translations (followed by a back-projection of labels). In another approach, we create a synthetic discourse corpus for NP, in which we automatically translate the English discourse-annotated corpus PDTB to NP, project PDTB labels, and then train an NP IDR classifier. The latter approach of training a “native” NP classifier outperforms our baseline by 13.27% and 33.98% in F1 score for 4-way and 11-way classification, respectively.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Suresh, Varsha; Mughal, Muhammad Hamza; Theobalt, Christian; Demberg, Vera

Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues (Inproceedings)

Che, Wanxiang; Nabende, Joyce; Shutova, Ekaterina; Pilehvar, Mohammad Taher (Ed.): Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, pp. 18109-18123, Vienna, Austria, 2025, ISBN 979-8-89176-251-0.

Research in linguistics shows that non-verbal cues, such as gestures, play a crucial role in spoken discourse. For example, speakers perform hand gestures to indicate topic shifts, helping listeners identify transitions in discourse. In this work, we investigate whether the joint modeling of gestures using human motion sequences and language can improve spoken discourse modeling in language models. To integrate gestures into language models, we first encode 3D human motion sequences into discrete gesture tokens using a VQ-VAE. These gesture token embeddings are then aligned with text embeddings through feature alignment, mapping them into the text embedding space. To evaluate the gesture-aligned language model on spoken discourse, we construct text infilling tasks targeting three key discourse cues grounded in linguistic research: discourse connectives, stance markers, and quantifiers. Results show that incorporating gestures enhances marker prediction accuracy across the three tasks, highlighting the complementary information that gestures can offer in modeling spoken discourse. We view this work as an initial step toward leveraging non-verbal cues to advance spoken language modeling in language models.

@inproceedings{suresh-etal-2025-enhancing,
title = {Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues},
author = {Varsha Suresh and Muhammad Hamza Mughal and Christian Theobalt and Vera Demberg},
editor = {Wanxiang Che and Joyce Nabende and Ekaterina Shutova and Mohammad Taher Pilehvar},
url = {https://aclanthology.org/2025.acl-long.886/},
doi = {10.18653/v1/2025.acl-long.886},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
isbn = {979-8-89176-251-0},
pages = {18109-18123},
publisher = {Association for Computational Linguistics},
address = {Vienna, Austria},
abstract = {Research in linguistics shows that non-verbal cues, such as gestures, play a crucial role in spoken discourse. For example, speakers perform hand gestures to indicate topic shifts, helping listeners identify transitions in discourse. In this work, we investigate whether the joint modeling of gestures using human motion sequences and language can improve spoken discourse modeling in language models. To integrate gestures into language models, we first encode 3D human motion sequences into discrete gesture tokens using a VQ-VAE. These gesture token embeddings are then aligned with text embeddings through feature alignment, mapping them into the text embedding space. To evaluate the gesture-aligned language model on spoken discourse, we construct text infilling tasks targeting three key discourse cues grounded in linguistic research: discourse connectives, stance markers, and quantifiers. Results show that incorporating gestures enhances marker prediction accuracy across the three tasks, highlighting the complementary information that gestures can offer in modeling spoken discourse. We view this work as an initial step toward leveraging non-verbal cues to advance spoken language modeling in language models.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Marchal, Marian; Hewett, Freya; Scholman, Merel; Shahmohammadi, Sara; Stede, Manfred; Demberg, Vera

The facilitating effect of connectives across relations and languages (Journal Article)

Frontiers in Language Sciences, 4, 2025, ISSN 2813-4605.

The facilitating effect of connectives on discourse processing has been found to be smaller in result relations, compared to other relations (e.g., concession). In addition, connectives are hypothesized to facilitate more in some languages than in others due to typological differences between languages. Speakers of analytic languages (such as English) are assumed to rely more on contextual cues and therefore be less affected by the presence of a connective than speakers of synthetic languages (such as German), who are presumed to rely more on lexical information. We present two self-paced reading studies examining how the effect of a connective depends on the relation type and the language. We find that the presence of a connective facilitates reading more in concession relations than in result relations. This interaction between relation type and relation marking was only found in German.

@article{Marchal_etal_2025:Connectives,
title = {The facilitating effect of connectives across relations and languages},
author = {Marian Marchal and Freya Hewett and Merel Scholman and Sara Shahmohammadi and Manfred Stede and Vera Demberg},
url = {https://www.frontiersin.org/journals/language-sciences/articles/10.3389/flang.2025.1721510},
doi = {10.3389/flang.2025.1721510},
year = {2025},
date = {2025},
journal = {Frontiers in Language Sciences},
volume = {4},
abstract = {The facilitating effect of connectives on discourse processing has been found to be smaller in result relations, compared to other relations (e.g., concession). In addition, connectives are hypothesized to facilitate more in some languages than in others due to typological differences between languages. Speakers of analytic languages (such as English) are assumed to rely more on contextual cues and therefore be less affected by the presence of a connective than speakers of synthetic languages (such as German), who are presumed to rely more on lexical information. We present two self-paced reading studies examining how the effect of a connective depends on the relation type and the language. We find that the presence of a connective facilitates reading more in concession relations than in result relations. This interaction between relation type and relation marking was only found in German.},
pubstate = {published},
type = {article}
}

Project:   B2

Alves, Diego; Bagdasarov, Sergei; Teich, Elke

Surprisal Dynamics for the Detection of Multi-Word Expressions in English (Inproceedings)

Inui, Kentaro; Sakti, Sakriani; Wang, Haofen; Wong, Derek F.; Bhattacharyya, Pushpak; Banerjee, Biplab; Ekbal, Asif; Chakraborty, Tanmoy; Singh, Dhirendra Pratap (Ed.): Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, pp. 1185-1194, Mumbai, India, 2025, ISBN 979-8-89176-303-6.

This work examines the potential of surprisal slope as a feature for identifying multi-word expressions (MWEs) in English, leveraging token-level surprisal estimates from the GPT-2 language model. Evaluations on the DiMSUM and SemEval-2022 datasets reveal that surprisal slope provides moderate yet meaningful discriminative power with a trade-off between specificity and coverage: while high recall indicates that surprisal slope captures many true MWEs, the slightly lower precision reflects false positives, particularly for non-MWEs that follow formulaic patterns (e.g., adjective-noun or verb-pronoun structures). The method performs particularly well for conventionalized expressions, such as idiomatic bigrams in the SemEval-2022 corpus. Both idiomatic and literal usages of these bigrams exhibit negative slopes, with idiomatic instances generally showing a more pronounced decrease. Overall, surprisal slope offers a cognitively motivated and interpretable signal that complements existing MWE identification methods, particularly for conventionalized expressions.
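A minimal sketch of the surprisal-slope idea from the abstract above, assuming that "slope" means the least-squares slope of per-token surprisal (in bits) against token position within a candidate span. The function names, the zero threshold, and the example probabilities are hypothetical; the paper's GPT-2 pipeline and its exact feature definition are not reproduced here.

```python
import math
from typing import Sequence


def surprisal(prob: float) -> float:
    """Surprisal in bits of a token with conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def surprisal_slope(surprisals: Sequence[float]) -> float:
    """Least-squares slope of surprisal against token position 0..n-1."""
    n = len(surprisals)
    if n < 2:
        raise ValueError("need at least two tokens to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(surprisals) / n
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(surprisals))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def is_mwe_candidate(surprisals: Sequence[float], threshold: float = 0.0) -> bool:
    """Hypothetical decision rule: flag spans whose surprisal falls across tokens."""
    return surprisal_slope(surprisals) < threshold


# Hypothetical conditional probabilities for the tokens of "kick the bucket"
probs = [0.01, 0.2, 0.9]
token_surprisals = [surprisal(p) for p in probs]
slope = surprisal_slope(token_surprisals)
print(f"slope = {slope:.2f} bits/token, flagged = {is_mwe_candidate(token_surprisals)}")
```

Since the abstract reports negative slopes for both idiomatic and literal uses of the SemEval-2022 bigrams, a sign-only threshold like the one above would not separate them; in practice the magnitude of the slope would have to be taken into account.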

@inproceedings{alves-etal-2025-surprisal,
title = {Surprisal Dynamics for the Detection of Multi-Word Expressions in English},
author = {Diego Alves and Sergei Bagdasarov and Elke Teich},
editor = {Kentaro Inui and Sakriani Sakti and Haofen Wang and Derek F. Wong and Pushpak Bhattacharyya and Biplab Banerjee and Asif Ekbal and Tanmoy Chakraborty and Dhirendra Pratap Singh},
url = {https://aclanthology.org/2025.findings-ijcnlp.72/},
doi = {https://doi.org/10.18653/v1/2025.findings-ijcnlp.72},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
isbn = {979-8-89176-303-6},
pages = {1185-1194},
publisher = {The Asian Federation of Natural Language Processing and The Association for Computational Linguistics},
address = {Mumbai, India},
abstract = {This work examines the potential of surprisal slope as a feature for identifying multi-word expressions (MWEs) in English, leveraging token-level surprisal estimates from the GPT-2 language model. Evaluations on the DiMSUM and SemEval-2022 datasets reveal that surprisal slope provides moderate yet meaningful discriminative power with a trade-off between specificity and coverage: while high recall indicates that surprisal slope captures many true MWEs, the slightly lower precision reflects false positives, particularly for non-MWEs that follow formulaic patterns (e.g., adjective-noun or verb-pronoun structures). The method performs particularly well for conventionalized expressions, such as idiomatic bigrams in the SemEval-2022 corpus. Both idiomatic and literal usages of these bigrams exhibit negative slopes, with idiomatic instances generally showing a more pronounced decrease. Overall, surprisal slope offers a cognitively motivated and interpretable signal that complements existing MWE identification methods, particularly for conventionalized expressions.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Stein, Katharina; Fišer, Daniel; Hoffmann, Jörg; Koller, Alexander

Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning Inproceedings

Proceedings of the International Conference on Automated Planning and Scheduling, 35, pp. 250-259, 2025.

Large language models (LLMs) have revolutionized a large variety of NLP tasks. An active debate concerns the extent to which they can reason and plan. Prior work has assessed the latter in the specific context of PDDL planning, based on manually converting three PDDL domains into natural language (NL) prompts. Here we automate this conversion step, showing how to leverage an LLM to automatically generate NL prompts from PDDL input. Our automatically generated NL prompts result in LLM-planning performance similar to that of the previous manually generated ones. Beyond this, the automation enables us to run much larger experiments, providing for the first time a broad evaluation of LLM planning performance in PDDL. Our NL prompts yield better performance than PDDL prompts and simple template-based NL prompts. Compared to symbolic planners, LLM planning lags far behind; but in some domains, our best LLM configuration scales up further than A* using LM-cut.

@inproceedings{Stein_2025_etal:PDDLplanning,
title = {Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning},
author = {Katharina Stein and Daniel Fišer and J{\"o}rg Hoffmann and Alexander Koller},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/36126},
doi = {https://doi.org/10.1609/icaps.v35i1.36126},
year = {2025},
date = {2025-12-09},
booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling},
pages = {250-259},
volume = {35},
abstract = {Large language models (LLMs) have revolutionized a large variety of NLP tasks. An active debate concerns the extent to which they can reason and plan. Prior work has assessed the latter in the specific context of PDDL planning, based on manually converting three PDDL domains into natural language (NL) prompts. Here we automate this conversion step, showing how to leverage an LLM to automatically generate NL prompts from PDDL input. Our automatically generated NL prompts result in LLM-planning performance similar to that of the previous manually generated ones. Beyond this, the automation enables us to run much larger experiments, providing for the first time a broad evaluation of LLM planning performance in PDDL. Our NL prompts yield better performance than PDDL prompts and simple template-based NL prompts. Compared to symbolic planners, LLM planning lags far behind; but in some domains, our best LLM configuration scales up further than A* using LM-cut.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Lauer, Pascal; Torralba, Álvaro; Höller, Daniel; Hoffmann, Jörg

Continuing the Quest for Polynomial Time Heuristics in PDDL Input Size: Tractable Cases for Lifted hadd Inproceedings

Proceedings of the International Conference on Automated Planning and Scheduling, 35, pp. 74-83, 2025.

Recent interest in solving planning tasks where full grounding is infeasible has highlighted the need to compute heuristics at a lifted level. We turn our attention to the evaluation of the hᵃᵈᵈ heuristic, which is an important cornerstone in many classical planning approaches, including the best-performing lifted planning approach. We show that hᵃᵈᵈ’s grounded efficiency does not extend to lifted tasks, where the computation is EXPTIME-complete. This prompts us to identify tractability islands matching practical use cases. We identify two such islands, in which a lifted computation is feasible while grounding may fail. The first is restricted to acyclic action schemata and bounded predicate arity. For the second case, we introduce a novel computation that operates without grounding. Assuming the extraction encounters only acyclic conditions and hᵃᵈᵈ values per subgoal are bounded, it remains tractable, even with unbounded predicate and action arity. In an empirical evaluation of the new technique, we observe complementary behavior to the existing lifted forward hᵃᵈᵈ evaluation. Combining both sets a new state of the art in pure-heuristic performance on the hard-to-ground benchmarks.
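For orientation, here is a minimal sketch of the classical grounded hᵃᵈᵈ computation that the abstract takes as its starting point; it is not the paper's lifted algorithm. Each fact costs 0 if it holds in the state, otherwise the cost of the cheapest achieving action plus the summed costs of that action's preconditions, and hᵃᵈᵈ is the sum of the goal facts' costs. The `Action` type, unit action costs, and the toy task are hypothetical, and the simple fixpoint loop is chosen for clarity rather than speed.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Action:
    """Hypothetical grounded STRIPS-style action (delete effects are ignored)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    cost: float = 1.0


def h_add(state: Iterable[str], goal: Iterable[str], actions: Iterable[Action]) -> float:
    """Grounded additive heuristic; returns math.inf if some goal is unreachable."""
    actions = list(actions)
    cost = {f: 0.0 for f in state}  # facts true in the state cost nothing
    changed = True
    while changed:  # repeat until no fact cost improves (fixpoint)
        changed = False
        for a in actions:
            if not all(p in cost for p in a.pre):
                continue  # some precondition is not (yet) reachable
            # additive assumption: preconditions are achieved independently
            c = a.cost + sum(cost[p] for p in a.pre)
            for f in a.add:
                if c < cost.get(f, math.inf):
                    cost[f] = c
                    changed = True
    return sum(cost.get(g, math.inf) for g in goal)


# Toy task (hypothetical): a -> b -> c, and {a, b} -> d
TOY_ACTIONS = [
    Action("make_b", frozenset({"a"}), frozenset({"b"})),
    Action("make_c", frozenset({"b"}), frozenset({"c"})),
    Action("make_d", frozenset({"a", "b"}), frozenset({"d"})),
]
print(h_add({"a"}, {"c", "d"}, TOY_ACTIONS))
```

On the toy task an optimal plan needs three actions, but the estimate is 4 because the cost of fact `b` is counted once for each goal that needs it; this double counting is why hᵃᵈᵈ is informative but not admissible.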

@inproceedings{Lauer_etal_2025:hadd,
title = {Continuing the Quest for Polynomial Time Heuristics in PDDL Input Size: Tractable Cases for Lifted hadd},
author = {Pascal Lauer and {\'A}lvaro Torralba and Daniel H{\"o}ller and J{\"o}rg Hoffmann},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/36103},
doi = {https://doi.org/10.1609/icaps.v35i1.36103},
year = {2025},
date = {2025},
booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling},
pages = {74-83},
volume = {35},
abstract = {Recent interest in solving planning tasks where full grounding is infeasible has highlighted the need to compute heuristics at a lifted level. We turn our attention to the evaluation of the hᵃᵈᵈ heuristic, which is an important cornerstone in many classical planning approaches, including the best-performing lifted planning approach. We show that hᵃᵈᵈ’s grounded efficiency does not extend to lifted tasks, where the computation is EXPTIME-complete. This prompts us to identify tractability islands matching practical use cases. We identify two such islands, in which a lifted computation is feasible while grounding may fail. The first is restricted to acyclic action schemata and bounded predicate arity. For the second case, we introduce a novel computation that operates without grounding. Assuming the extraction encounters only acyclic conditions and hᵃᵈᵈ values per subgoal are bounded, it remains tractable, even with unbounded predicate and action arity. In an empirical evaluation of the new technique, we observe complementary behavior to the existing lifted forward hᵃᵈᵈ evaluation. Combining both sets a new state of the art in pure-heuristic performance on the hard-to-ground benchmarks.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Xue, Wei; Steuer, Julius; Klakow, Dietrich; Möbius, Bernd

The role of surprisal and entropy in spoken intercomprehension: An experiment on translation of cognates with varied predictability Inproceedings

ITG Conference on Speech Communication, pp. 21-25, Berlin, Germany, 2025.

Word predictability affects comprehension and speech perception, yet its role in intercomprehension – understanding foreign languages without prior learning – remains understudied. While surprisal and entropy, derived from language models (LMs), capture different aspects of predictability, this study aims to explore how these estimates explain intercomprehension success. We asked German and English native speakers to translate Dutch cognates from spoken utterances with varied predictability. We extracted the estimates with a cascaded automatic speech recognition (ASR) plus LM approach and with an LM-only approach. Results revealed that both approaches explained translation accuracy to a similar extent, but only LM-only estimates were significant. Also, German speakers seem to leverage contextual information as in native comprehension, while English speakers do not. These findings highlight that beyond LM-based estimates, typological proximity shapes intercomprehension in varied predictability contexts.
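Surprisal and entropy have standard information-theoretic definitions: the surprisal of a word is −log₂ P(word | context), and entropy is the expected surprisal over the model's next-word distribution at that position. The sketch below computes both in bits from a hypothetical next-word distribution over Dutch words. It assumes entropy is taken over the distribution at the target position, and it does not reproduce the study's cascaded ASR plus LM or LM-only pipelines.

```python
import math
from typing import Mapping


def word_surprisal(dist: Mapping[str, float], word: str) -> float:
    """Surprisal in bits: -log2 P(word | context); infinite if P is 0."""
    p = dist.get(word, 0.0)
    return math.inf if p <= 0.0 else -math.log2(p)


def next_word_entropy(dist: Mapping[str, float]) -> float:
    """Shannon entropy in bits, i.e. the expected surprisal of the next word."""
    if not math.isclose(sum(dist.values()), 1.0, rel_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)


# Hypothetical next-word distribution after some Dutch context
DIST = {"hond": 0.5, "kat": 0.25, "vis": 0.25}
print(word_surprisal(DIST, "kat"))  # 2.0
print(next_word_entropy(DIST))      # 1.5
```

The two measures can diverge: a word can be highly surprising even in a low-entropy context, which is one reason they are treated as distinct predictors.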

@inproceedings{Xue/etal:2025b,
title = {The role of surprisal and entropy in spoken intercomprehension: An experiment on translation of cognates with varied predictability},
author = {Wei Xue and Julius Steuer and Dietrich Klakow and Bernd M{\"o}bius},
doi = {https://doi.org/10.30420/456617005},
year = {2025},
date = {2025},
booktitle = {ITG Conference on Speech Communication},
pages = {21-25},
address = {Berlin, Germany},
abstract = {Word predictability affects comprehension and speech perception, yet its role in intercomprehension – understanding foreign languages without prior learning – remains understudied. While surprisal and entropy, derived from language models (LMs), capture different aspects of predictability, this study aims to explore how these estimates explain intercomprehension success. We asked German and English native speakers to translate Dutch cognates from spoken utterances with varied predictability. We extracted the estimates with a cascaded automatic speech recognition (ASR) plus LM approach and with an LM-only approach. Results revealed that both approaches explained translation accuracy to a similar extent, but only LM-only estimates were significant. Also, German speakers seem to leverage contextual information as in native comprehension, while English speakers do not. These findings highlight that beyond LM-based estimates, typological proximity shapes intercomprehension in varied predictability contexts.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Häuser, Katja; Grüter, Theres

Reduced false memory effects for predictable words in L2 speakers of German: evidence from self-paced reading and recognition memory Journal Article

Applied Psycholinguistics, 46, pp. e26, 2025, ISSN 0142-7164, 1469-1817.

Previous research has demonstrated that predictable words that are not presented linger in memory and lead to false recognition in subsequent memory tests. However, little is known about these effects among second language learners, a population that is known for engaging less in prediction. Here, we used a self-paced reading task and a word recognition memory test to examine encoding differences and subsequent memory effects in groups of L1 and L2 speakers of German. For initial reading, results showed no group differences in the size of the predictability effect, possibly because group differences in attention allocation during reading masked predictability effects. For recognition memory, L2 learners showed reduced rates of false remembering for predictable words (after correcting for response bias), and they were also less likely to false-alarm to predictable words with high subjective memory confidence, similar to L1 speakers. In addition, L2 learners showed reduced recognition memory for previously presented words. Taken together, these results are consistent with models arguing that lexical-semantic entries are less firmly represented in the L2 lexicon, which in turn lowers pre-activation of predictable referents during L2 sentence processing and leads to the formation of less distinct memory representations for previously encoded information.

@article{haeuser_reduced_2025,
title = {Reduced false memory effects for predictable words in L2 speakers of German: evidence from self-paced reading and recognition memory},
author = {Katja H{\"a}user and Theres Gr{\"u}ter},
url = {https://www.cambridge.org/core/product/identifier/S0142716425100155/type/journal_article},
doi = {https://doi.org/10.1017/S0142716425100155},
year = {2025},
date = {2025},
journal = {Applied Psycholinguistics},
pages = {e26},
volume = {46},
abstract = {Previous research has demonstrated that predictable words that are not presented linger in memory and lead to false recognition in subsequent memory tests. However, little is known about these effects among second language learners, a population that is known for engaging less in prediction. Here, we used a self-paced reading task and a word recognition memory test to examine encoding differences and subsequent memory effects in groups of L1 and L2 speakers of German. For initial reading, results showed no group differences in the size of the predictability effect, possibly because group differences in attention allocation during reading masked predictability effects. For recognition memory, L2 learners showed reduced rates of false remembering for predictable words (after correcting for response bias), and they were also less likely to false-alarm to predictable words with high subjective memory confidence, similar to L1 speakers. In addition, L2 learners showed reduced recognition memory for previously presented words. Taken together, these results are consistent with models arguing that lexical-semantic entries are less firmly represented in the L2 lexicon, which in turn lowers pre-activation of predictable referents during L2 sentence processing and leads to the formation of less distinct memory representations for previously encoded information.},
pubstate = {published},
type = {article}
}

Project:   A5

Alabi, Jesujoba; Azime, Israel Abebe; Zhang, Miaoran; España-Bonet, Cristina; Bawden, Rachel; Zhu, Dawei; Adelani, David; Odoje, Clement Oyeleke; Akinade, Idris; Maab, Iffat; David, Davis; Muhammad, Shamsuddeen Hassan; Putini, Neo; Ademuyiwa, David O.; Caines, Andrew; Klakow, Dietrich

AFRIDOC-MT: Document-level MT Corpus for African Languages Inproceedings

Christodoulopoulos, Christos; Chakraborty, Tanmoy; Rose, Carolyn; Peng, Violet (Ed.): Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 27770-27806, Suzhou, China, 2025, ISBN 979-8-89176-332-6.

This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating the ability of neural machine translation (NMT) models and large language models (LLMs) to translate between English and these languages, at both the sentence and pseudo-document levels, the outputs being realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieves the best average performance among the standard NMT models, while GPT-4o outperforms general-purpose LLMs. Fine-tuning selected models leads to substantial performance gains, but models trained on sentences struggle to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, over-generation, repetition of words and phrases, and off-target translations, specifically for translation into African languages.

@inproceedings{alabi-etal-2025-afridoc,
title = {AFRIDOC-MT: Document-level MT Corpus for African Languages},
author = {Jesujoba Alabi and Israel Abebe Azime and Miaoran Zhang and Cristina Espa{\~n}a-Bonet and Rachel Bawden and Dawei Zhu and David Adelani and Clement Oyeleke Odoje and Idris Akinade and Iffat Maab and Davis David and Shamsuddeen Hassan Muhammad and Neo Putini and David O. Ademuyiwa and Andrew Caines and Dietrich Klakow},
editor = {Christos Christodoulopoulos and Tanmoy Chakraborty and Carolyn Rose and Violet Peng},
url = {https://aclanthology.org/2025.emnlp-main.1413/},
doi = {https://doi.org/10.18653/v1/2025.emnlp-main.1413},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
isbn = {979-8-89176-332-6},
pages = {27770-27806},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùb{\'a}, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating the ability of neural machine translation (NMT) models and large language models (LLMs) to translate between English and these languages, at both the sentence and pseudo-document levels, the outputs being realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieves the best average performance among the standard NMT models, while GPT-4o outperforms general-purpose LLMs. Fine-tuning selected models leads to substantial performance gains, but models trained on sentences struggle to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, over-generation, repetition of words and phrases, and off-target translations, specifically for translation into African languages.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B4 B6

Alabi, Jesujoba; Hedderich, Michael; Adelani, David; Klakow, Dietrich

Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead Inproceedings

Christodoulopoulos, Christos; Chakraborty, Tanmoy; Rose, Carolyn; Peng, Violet (Ed.): Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 27807-27841, Suzhou, China, 2025, ISBN 979-8-89176-332-6.

With over 2,000 languages and potentially millions of speakers, Africa represents one of the richest linguistic regions in the world. Yet, this diversity is scarcely reflected in state-of-the-art natural language processing (NLP) systems and large language models (LLMs), which predominantly support a narrow set of high-resource languages. This exclusion not only limits the reach and utility of modern NLP technologies but also risks widening the digital divide across linguistic communities. Nevertheless, NLP research on African languages is active and growing. In recent years, there has been a surge of interest in this area, driven by several factors, including the creation of multilingual language resources, the rise of community-led initiatives, and increased support through funding programs. In this survey, we analyze 884 research papers on NLP for African languages published over the past five years, offering a comprehensive overview of recent progress across core tasks. We identify key trends shaping the field and conclude by outlining promising directions to foster more inclusive and sustainable NLP research for African languages.

@inproceedings{alabi-etal-2025-charting,
title = {Charting the Landscape of {A}frican {NLP}: Mapping Progress and Shaping the Road Ahead},
author = {Jesujoba Alabi and Michael Hedderich and David Adelani and Dietrich Klakow},
editor = {Christos Christodoulopoulos and Tanmoy Chakraborty and Carolyn Rose and Violet Peng},
url = {https://aclanthology.org/2025.emnlp-main.1414/},
doi = {https://doi.org/10.18653/v1/2025.emnlp-main.1414},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
isbn = {979-8-89176-332-6},
pages = {27807-27841},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {With over 2,000 languages and potentially millions of speakers, Africa represents one of the richest linguistic regions in the world. Yet, this diversity is scarcely reflected in state-of-the-art natural language processing (NLP) systems and large language models (LLMs), which predominantly support a narrow set of high-resource languages. This exclusion not only limits the reach and utility of modern NLP technologies but also risks widening the digital divide across linguistic communities. Nevertheless, NLP research on African languages is active and growing. In recent years, there has been a surge of interest in this area, driven by several factors{---}including the creation of multilingual language resources, the rise of community-led initiatives, and increased support through funding programs. In this survey, we analyze 884 research papers on NLP for African languages published over the past five years, offering a comprehensive overview of recent progress across core tasks. We identify key trends shaping the field and conclude by outlining promising directions to foster more inclusive and sustainable NLP research for African languages.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Verkerk, Annemarie; Shcherbakova, Olena; Haynie, Hannah J.; Skirgård, Hedvig; Rzymski, Christoph; Atkinson, Quentin D.; Greenhill, Simon J.; Gray, Russell D.

Enduring constraints on grammar revealed by Bayesian spatiophylogenetic analyses Journal Article

Nature Human Behaviour, 2025, ISSN 2397-3374.

Human languages show astonishing variety, yet their diversity is constrained by recurring patterns. Linguists have long argued over the extent and causes of these grammatical ‘universals’. Using Grambank—a comprehensive database of grammatical features across the world’s languages—we tested 191 proposed universals with Bayesian analyses that account for both genealogical descent and geographical proximity. We find statistical support for about a third of the proposed linguistic universals. The majority of these concern word order and hierarchical universals: two types that have featured prominently in earlier work. Evolutionary analyses show that languages tend to change in ways that converge on these preferred patterns. This suggests that, despite the vast design space of possible grammars, languages do not evolve entirely at random. Shared cognitive and communicative pressures repeatedly push languages towards similar solutions.

@article{Verkerk-etal-2025-Bayesian,
title = {Enduring constraints on grammar revealed by Bayesian spatiophylogenetic analyses},
author = {Annemarie Verkerk and Olena Shcherbakova and Hannah J. Haynie and Hedvig Skirgård and Christoph Rzymski and Quentin D. Atkinson and Simon J. Greenhill and Russell D. Gray},
url = {https://doi.org/10.1038/s41562-025-02325-z},
doi = {https://doi.org/10.1038/s41562-025-02325-z},
year = {2025},
date = {2025},
journal = {Nature Human Behaviour},
abstract = {Human languages show astonishing variety, yet their diversity is constrained by recurring patterns. Linguists have long argued over the extent and causes of these grammatical ‘universals’. Using Grambank—a comprehensive database of grammatical features across the world’s languages—we tested 191 proposed universals with Bayesian analyses that account for both genealogical descent and geographical proximity. We find statistical support for about a third of the proposed linguistic universals. The majority of these concern word order and hierarchical universals: two types that have featured prominently in earlier work. Evolutionary analyses show that languages tend to change in ways that converge on these preferred patterns. This suggests that, despite the vast design space of possible grammars, languages do not evolve entirely at random. Shared cognitive and communicative pressures repeatedly push languages towards similar solutions.},
pubstate = {published},
type = {article}
}

Project:   C7

Höltje, Gerrit; Bader, Regine; Meßmer, Julia; Zogaj, Doruntinë; Mecklinger, Axel

Unexpected words that become your best memories: How sentential constraint and word expectedness affect memory retrieval Journal Article

Frontiers in Human Neuroscience, 19, 2025, ISSN 1662-5161.

Much is known about how the strength of contextual support from strongly constraining (SC) and weakly constraining (WC) sentences influences the online processing of expected (EXP) and unexpected (UNEXP) sentence-ending words. In the present study, we investigated the long-term mnemonic consequences associated with the processing of contextually constrained words and used event-related potentials (ERPs) to explore the memory retrieval mechanisms at work. Furthermore, we investigated false memories for expected but unpresented words. If these unpresented words remained highly accessible in memory, their false recognition as familiar would manifest in a larger early frontal old/new effect, the putative ERP correlate of episodic familiarity. Behavioral results indicated that strongly expected and highly unexpected words were more likely to be recognized, whereas memory for moderately expected words was attenuated. However, the anticipated early frontal old/new effects in these conditions did not materialize. Instead, the retrieval of highly unexpected (SC-UNEXP) words was characterized by a late parietal old/new effect, reflecting a reliance on recollection-based processes. Unexpectedly, during retrieval, SC-UNEXP words also evoked a late frontal positivity, a pattern usually associated with the inhibition of unpresented expected words during encoding. This suggests that the retrieval of these words reactivated inhibitory mechanisms akin to those activated during encoding. Additionally, expected lures that were correctly identified as new elicited a broadly distributed positive slow wave, indicative of recollective processing in support of a recall-to-reject strategy. This latter effect was observed irrespective of the predictive strength of the contextual support.

@article{hoeltje-etal-unexpected-2025,
title = {Unexpected words that become your best memories: How sentential constraint and word expectedness affect memory retrieval},
author = {Gerrit H{\"o}ltje and Regine Bader and Julia Me{\ss}mer and Doruntin{\"e} Zogaj and Axel Mecklinger},
url = {https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2025.1645907},
doi = {10.3389/fnhum.2025.1645907},
year = {2025},
date = {2025},
journal = {Frontiers in Human Neuroscience},
volume = {19},
abstract = {Much is known about how the strength of contextual support from strongly constraining (SC) and weakly constraining (WC) sentences influences the online processing of expected (EXP) and unexpected (UNEXP) sentence-ending words. In the present study, we investigated the long-term mnemonic consequences associated with the processing of contextually constrained words and used event-related potentials (ERPs) to explore the memory retrieval mechanisms at work. Furthermore, we investigated false memories for expected but unpresented words. If these unpresented words remained highly accessible in memory, their false recognition as familiar would manifest in a larger early frontal old/new effect, the putative ERP correlate of episodic familiarity. Behavioral results indicated that strongly expected and highly unexpected words were more likely to be recognized, whereas memory for moderately expected words was attenuated. However, the anticipated early frontal old/new effects in these conditions did not materialize. Instead, the retrieval of highly unexpected (SC-UNEXP) words was characterized by a late parietal old/new effect, reflecting a reliance on recollection-based processes. Unexpectedly, during retrieval, SC-UNEXP words also evoked a late frontal positivity, a pattern usually associated with the inhibition of unpresented expected words during encoding. This suggests that the retrieval of these words reactivated inhibitory mechanisms akin to those activated during encoding. Additionally, expected lures that were correctly identified as new elicited a broadly distributed positive slow wave, indicative of recollective processing in support of a recall-to-reject strategy. This latter effect was observed irrespective of the predictive strength of the contextual support.},
pubstate = {published},
type = {article}
}


Project:   A6

Tumurchuluun, Ariun-Erdene; Al Ghussin, Yusser; Mareček, David; van Genabith, Josef; Dutta Chowdhury, Koel

TenseLoC: Tense Localization and Control in a Multilingual LLM Inproceedings

Adelani, David Ifeoluwa; Arnett, Catherine; Ataman, Duygu; Chang, Tyler A.; Gonen, Hila; Raja, Rahul; Schmidt, Fabian; Stap, David; Wang, Jiayi (Ed.): Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), Association for Computational Linguistics, pp. 243-264, Suzhou, China, 2025, ISBN 979-8-89176-345-6.

Multilingual language models excel across languages, yet how they internally encode grammatical tense remains largely unclear. We investigate how decoder-only transformers represent, transfer, and control tense across eight typologically diverse languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. We construct a synthetic tense-annotated dataset and apply probing, causal analysis, feature disentanglement, and model steering to LLaMA-3.1 8B. We show that tense emerges as a distinct signal from early layers and transfers most strongly within the same language family. Causal tracing reveals that attention outputs around layer 16 consistently carry cross-lingually transferable tense information. Leveraging sparse autoencoders in this subspace, we isolate and steer English tense-related features, improving target-tense prediction accuracy by up to 11% in a downstream cloze task.

@inproceedings{tumurchuluun-etal-2025-tenseloc,
title = {{TenseLoC}: Tense Localization and Control in a Multilingual {LLM}},
author = {Ariun-Erdene Tumurchuluun and Yusser Al Ghussin and David Mare{\v{c}}ek and Josef van Genabith and Koel Dutta Chowdhury},
editor = {David Ifeoluwa Adelani and Catherine Arnett and Duygu Ataman and Tyler A. Chang and Hila Gonen and Rahul Raja and Fabian Schmidt and David Stap and Jiayi Wang},
url = {https://aclanthology.org/2025.mrl-main.17/},
doi = {10.18653/v1/2025.mrl-main.17},
year = {2025},
date = {2025},
booktitle = {Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)},
isbn = {979-8-89176-345-6},
pages = {243--264},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {Multilingual language models excel across languages, yet how they internally encode grammatical tense remains largely unclear. We investigate how decoder-only transformers represent, transfer, and control tense across eight typologically diverse languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. We construct a synthetic tense-annotated dataset and apply probing, causal analysis, feature disentanglement, and model steering to LLaMA-3.1 8B. We show that tense emerges as a distinct signal from early layers and transfers most strongly within the same language family. Causal tracing reveals that attention outputs around layer 16 consistently carry cross-lingually transferable tense information. Leveraging sparse autoencoders in this subspace, we isolate and steer English tense-related features, improving target-tense prediction accuracy by up to 11\% in a downstream cloze task.},
pubstate = {published},
type = {inproceedings}
}


Project:   B6
