Publications

Scholman, Merel; Dong, Tianai; Yung, Frances Pik Yu; Demberg, Vera

DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations Journal Article

Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 22), Marseille, France, pp. 3281-3290, 2022.

We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations.

@article{Scholman_et-al22.2,
title = {DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations},
author = {Merel Scholman and Tianai Dong and Frances Pik Yu Yung and Vera Demberg},
url = {https://aclanthology.org/2022.lrec-1.351/},
year = {2022},
date = {2022},
journal = {Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 22), Marseille, France},
pages = {3281-3290},
abstract = {We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations.},
pubstate = {published},
type = {article}
}

Project:   B2

Scholman, Merel; Demberg, Vera; Sanders, Ted J. M.

Descriptively adequate and cognitively plausible? Validating distinctions between types of coherence relations Journal Article

Discours, 30, pp. 1-30a, 2022.

A central issue in linguistics concerns the relationship between theories and evidence in data. We investigate this issue in the field of discourse coherence, and particularly the study of coherence relations such as causal and contrastive. Proposed inventories of coherence relations differ greatly in the type and number of proposed relations. Such proposals are often validated by focusing on either the descriptive adequacy (researcher’s intuitions on textual interpretations) or the cognitive plausibility of distinctions (empirical research on cognition). We argue that both are important, and note that the concept of cognitive plausibility is in need of a concrete definition and quantifiable operationalization. This contribution focuses on how the criterion of cognitive plausibility can be operationalized and presents a systematic validation approach to evaluate discourse frameworks. This is done by detailing how various sources of evidence can be used to support or falsify distinctions between coherence relational labels. Finally, we present methodological issues regarding verification and falsification that are of importance to all discourse researchers studying the relationship between theory and data.

@article{Scholman_etal22,
title = {Descriptively adequate and cognitively plausible? Validating distinctions between types of coherence relations},
author = {Merel Scholman and Vera Demberg and Ted J. M. Sanders},
url = {https://journals.openedition.org/discours/12075},
year = {2022},
date = {2022},
journal = {Discours},
pages = {1-30a},
volume = {30},
abstract = {A central issue in linguistics concerns the relationship between theories and evidence in data. We investigate this issue in the field of discourse coherence, and particularly the study of coherence relations such as causal and contrastive. Proposed inventories of coherence relations differ greatly in the type and number of proposed relations. Such proposals are often validated by focusing on either the descriptive adequacy (researcher’s intuitions on textual interpretations) or the cognitive plausibility of distinctions (empirical research on cognition). We argue that both are important, and note that the concept of cognitive plausibility is in need of a concrete definition and quantifiable operationalization. This contribution focuses on how the criterion of cognitive plausibility can be operationalized and presents a systematic validation approach to evaluate discourse frameworks. This is done by detailing how various sources of evidence can be used to support or falsify distinctions between coherence relational labels. Finally, we present methodological issues regarding verification and falsification that are of importance to all discourse researchers studying the relationship between theory and data.},
pubstate = {published},
type = {article}
}

Project:   B2

Marchal, Marian; Scholman, Merel; Yung, Frances Pik Yu; Demberg, Vera

Establishing annotation quality in multi-label annotations Inproceedings

Proceedings of the 29th International Conference on Computational Linguistics (COLING), pp. 3659–3668, 2022.

In many linguistic fields requiring annotated data, multiple interpretations of a single item are possible. Multi-label annotations more accurately reflect this possibility. However, allowing for multi-label annotations also affects the chance that two coders agree with each other. Calculating inter-coder agreement for multi-label datasets is therefore not trivial. In the current contribution, we evaluate different metrics for calculating agreement on multi-label annotations: agreement on the intersection of annotated labels, an augmented version of Cohen’s Kappa, and precision, recall and F1. We propose a bootstrapping method to obtain chance agreement for each measure, which allows us to obtain an adjusted agreement coefficient that is more interpretable. We demonstrate how various measures affect estimates of agreement on simulated datasets and present a case study of discourse relation annotations. We also show how the proportion of double labels, and the entropy of the label distribution, influences the measures outlined above and how a bootstrapped adjusted agreement can make agreement measures more comparable across datasets in multi-label scenarios.

@inproceedings{Marchaletal22-2,
title = {Establishing annotation quality in multi-label annotations},
author = {Marian Marchal and Merel Scholman and Frances Pik Yu Yung and Vera Demberg},
url = {https://aclanthology.org/2022.coling-1.322/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics (COLING)},
pages = {3659--3668},
abstract = {In many linguistic fields requiring annotated data, multiple interpretations of a single item are possible. Multi-label annotations more accurately reflect this possibility. However, allowing for multi-label annotations also affects the chance that two coders agree with each other. Calculating inter-coder agreement for multi-label datasets is therefore not trivial. In the current contribution, we evaluate different metrics for calculating agreement on multi-label annotations: agreement on the intersection of annotated labels, an augmented version of Cohen’s Kappa, and precision, recall and F1. We propose a bootstrapping method to obtain chance agreement for each measure, which allows us to obtain an adjusted agreement coefficient that is more interpretable. We demonstrate how various measures affect estimates of agreement on simulated datasets and present a case study of discourse relation annotations. We also show how the proportion of double labels, and the entropy of the label distribution, influences the measures outlined above and how a bootstrapped adjusted agreement can make agreement measures more comparable across datasets in multi-label scenarios.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Marchal, Marian; Scholman, Merel; Demberg, Vera

The effect of domain knowledge on discourse relation inferences: Relation marking and interpretation strategies Journal Article

Dialogue & Discourse, 13, pp. 49-78, 2022.

It is generally assumed that readers draw on their background knowledge to make inferences about information that is left implicit in the text. However, readers may differ in how much background knowledge they have, which may impact their text understanding. The present study investigates the role of domain knowledge in discourse relation interpretation, in order to examine how readers with high vs. low domain knowledge differ in their discourse relation inferences. We compare interpretations of experts from the field of economics and biomedical sciences in scientific biomedical texts as well as more easily accessible economic texts. The results show that high-knowledge readers from the biomedical domain are better at inferring the correct relation interpretation in biomedical texts compared to low-knowledge readers, but such an effect was not found for the economic domain. The results also suggest that, in the absence of domain knowledge, readers exploit linguistic signals other than connectives to infer the discourse relation, but domain knowledge is sometimes required to exploit these cues. The study provides insight into the impact of domain knowledge on discourse relation inferencing and how readers interpret discourse relations when they lack the required domain knowledge.

@article{Marchaletal22,
title = {The effect of domain knowledge on discourse relation inferences: Relation marking and interpretation strategies},
author = {Marian Marchal and Merel Scholman and Vera Demberg},
url = {https://journals.uic.edu/ojs/index.php/dad/article/view/12343/10711},
year = {2022},
date = {2022},
journal = {Dialogue \& Discourse},
pages = {49-78},
volume = {13},
number = {2},
abstract = {It is generally assumed that readers draw on their background knowledge to make inferences about information that is left implicit in the text. However, readers may differ in how much background knowledge they have, which may impact their text understanding. The present study investigates the role of domain knowledge in discourse relation interpretation, in order to examine how readers with high vs. low domain knowledge differ in their discourse relation inferences. We compare interpretations of experts from the field of economics and biomedical sciences in scientific biomedical texts as well as more easily accessible economic texts. The results show that high-knowledge readers from the biomedical domain are better at inferring the correct relation interpretation in biomedical texts compared to low-knowledge readers, but such an effect was not found for the economic domain. The results also suggest that, in the absence of domain knowledge, readers exploit linguistic signals other than connectives to infer the discourse relation, but domain knowledge is sometimes required to exploit these cues. The study provides insight into the impact of domain knowledge on discourse relation inferencing and how readers interpret discourse relations when they lack the required domain knowledge.},
pubstate = {published},
type = {article}
}

Project:   B2

Andreeva, Bistra; Dimitrova, Snezhina

The influence of L1 prosody on Bulgarian-accented German and English Inproceedings

Proc. Speech Prosody 2022, pp. 764-768, Lisbon, 2022.

The present study investigates L2 prosodic realizations in the readings of two groups of Bulgarian informants: (a) with L2 German, and (b) with L2 English. Each group consisted of ten female learners, who read the fable “The North Wind and the Sun” in their L1 and in the respective L2. We also recorded two groups of female native speakers of the target languages as controls. The following durational parameters were obtained: mean accented syllable duration, accented/unaccented duration ratio, speaking rate. With respect to F0 parameters, mean, median, minimum, maximum, span in semitones, and standard deviations per IP were measured. Additionally, we calculated the number of accented and unaccented syllables, IPs and pauses in each reading. Statistical analyses show that the two groups differ in their use of F0. Both groups use higher standard deviation and level in their L2, whereas the ‘German group’ use higher pitch span as well. The number of accented syllables, IPs and pauses is also higher in L2. Regarding duration, both groups use slower articulation rate. The accented/unaccented syllable duration ratio is lower in L2 for the ‘English group’. We also provide original data on speaking rate in Bulgarian from an information theoretical perspective.

@inproceedings{andreeva_2022_speechprosody,
title = {The influence of L1 prosody on Bulgarian-accented German and English},
author = {Bistra Andreeva and Snezhina Dimitrova},
url = {https://www.isca-speech.org/archive/speechprosody_2022/andreeva22_speechprosody.html},
doi = {10.21437/SpeechProsody.2022-155},
year = {2022},
date = {2022},
booktitle = {Proc. Speech Prosody 2022},
pages = {764-768},
address = {Lisbon},
abstract = {The present study investigates L2 prosodic realizations in the readings of two groups of Bulgarian informants: (a) with L2 German, and (b) with L2 English. Each group consisted of ten female learners, who read the fable “The North Wind and the Sun” in their L1 and in the respective L2. We also recorded two groups of female native speakers of the target languages as controls. The following durational parameters were obtained: mean accented syllable duration, accented/unaccented duration ratio, speaking rate. With respect to F0 parameters, mean, median, minimum, maximum, span in semitones, and standard deviations per IP were measured. Additionally, we calculated the number of accented and unaccented syllables, IPs and pauses in each reading. Statistical analyses show that the two groups differ in their use of F0. Both groups use higher standard deviation and level in their L2, whereas the ‘German group’ use higher pitch span as well. The number of accented syllables, IPs and pauses is also higher in L2. Regarding duration, both groups use slower articulation rate. The accented/unaccented syllable duration ratio is lower in L2 for the ‘English group’. We also provide original data on speaking rate in Bulgarian from an information theoretical perspective.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Ibrahim, Omnia; Yuen, Ivan; Andreeva, Bistra; Möbius, Bernd

The effect of predictability on German stop voicing is phonologically selective Inproceedings

Proc. Speech Prosody 2022, pp. 669-673, Lisbon, 2022.

Cross-linguistic evidence suggests that syllables in predictable contexts have shorter duration than in unpredictable contexts. However, it is not clear if predictability uniformly affects phonetic cues of a phonological feature in a segment. The current study explored the effect of syllable-based predictability on the durational correlates of the phonological stop voicing contrast in German, viz. voice onset time (VOT) and closure duration (CD), using data in Ibrahim et al. [1]. The target stop consonants /b, p, d, k/ occurred in stressed CV syllables in polysyllabic words embedded in a sentence, with either voiced or voiceless preceding contexts. The syllable occurred in either a low or a high predictable condition, which was based on a syllable-level trigram language model. We measured VOT and CD of the target consonants (voiced vs. voiceless). Our results showed an interaction effect of predictability and the voicing status of the target consonants on VOT, but a uniform effect on closure duration. This interaction effect on a primary cue like VOT indicates a selective effect of predictability on VOT, but not on CD. This suggests that the effect of predictability is sensitive to the phonological relevance of a language-specific phonetic cue.

@inproceedings{ibrahim_2022_speechprosody,
title = {The effect of predictability on German stop voicing is phonologically selective},
author = {Omnia Ibrahim and Ivan Yuen and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.isca-speech.org/archive/pdfs/speechprosody_2022/ibrahim22_speechprosody.pdf},
doi = {10.21437/SpeechProsody.2022-136},
year = {2022},
date = {2022},
booktitle = {Proc. Speech Prosody 2022},
pages = {669-673},
address = {Lisbon},
abstract = {Cross-linguistic evidence suggests that syllables in predictable contexts have shorter duration than in unpredictable contexts. However, it is not clear if predictability uniformly affects phonetic cues of a phonological feature in a segment. The current study explored the effect of syllable-based predictability on the durational correlates of the phonological stop voicing contrast in German, viz. voice onset time (VOT) and closure duration (CD), using data in Ibrahim et al. [1]. The target stop consonants /b, p, d, k/ occurred in stressed CV syllables in polysyllabic words embedded in a sentence, with either voiced or voiceless preceding contexts. The syllable occurred in either a low or a high predictable condition, which was based on a syllable-level trigram language model. We measured VOT and CD of the target consonants (voiced vs. voiceless). Our results showed an interaction effect of predictability and the voicing status of the target consonants on VOT, but a uniform effect on closure duration. This interaction effect on a primary cue like VOT indicates a selective effect of predictability on VOT, but not on CD. This suggests that the effect of predictability is sensitive to the phonological relevance of a language-specific phonetic cue.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Talamo, Luigi; Verkerk, Annemarie

A new methodology for an old problem: A corpus-based typology of adnominal word order in European languages Journal Article

Italian Journal of Linguistics, 34, pp. 171-226, 2022.

Linguistic typology is generally characterized by strong data reduction, stemming from the use of binary or categorical classifications. An example are the categories commonly used in describing word order: adjective-noun vs noun-adjective; genitive-noun vs noun-genitive; etc. Token-based typology is part of an answer towards more fine-grained and appropriate measurement in typology. We discuss an implementation of this methodology and provide a case-study involving adnominal word order in a sample of eleven European languages, using a parallel corpus automatically parsed with models from the Universal Dependencies project. By quantifying adnominal word order variability in terms of Shannon’s entropy, we find that the placement of certain nominal modifiers in relation to their head noun is more variable than reported by typological databases, both within and across language genera. Whereas the low variability of placement of articles, adpositions and relative clauses is generally confirmed by our findings, the adnominal ordering of demonstratives and adjectives is more variable than previously reported.

@article{article,
title = {A new methodology for an old problem: A corpus-based typology of adnominal word order in European languages},
author = {Luigi Talamo and Annemarie Verkerk},
url = {https://www.italian-journal-linguistics.com/app/uploads/2023/01/8-Talamo.pdf},
doi = {10.26346/1120-2726-197},
year = {2022},
date = {2022},
journal = {Italian Journal of Linguistics},
pages = {171-226},
volume = {34},
abstract = {Linguistic typology is generally characterized by strong data reduction, stemming from the use of binary or categorical classifications. An example are the categories commonly used in describing word order: adjective-noun vs noun-adjective; genitive-noun vs noun-genitive; etc. Token-based typology is part of an answer towards more fine-grained and appropriate measurement in typology. We discuss an implementation of this methodology and provide a case-study involving adnominal word order in a sample of eleven European languages, using a parallel corpus automatically parsed with models from the Universal Dependencies project. By quantifying adnominal word order variability in terms of Shannon's entropy, we find that the placement of certain nominal modifiers in relation to their head noun is more variable than reported by typological databases, both within and across language genera. Whereas the low variability of placement of articles, adpositions and relative clauses is generally confirmed by our findings, the adnominal ordering of demonstratives and adjectives is more variable than previously reported.},
pubstate = {published},
type = {article}
}

Project:   C7

España-Bonet, Cristina; Barrón-Cedeño, Alberto

The (Undesired) Attenuation of Human Biases by Multilinguality Inproceedings

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 2056–2077, Online and Abu Dhabi, UAE, Dec 2022, 2022.

Some human preferences are universal. The odor of vanilla is perceived as pleasant all around the world. We expect neural models trained on human texts to exhibit these kinds of preferences, i.e. biases, but we show that this is not always the case. We explore 16 static and contextual embedding models in 9 languages and, when possible, compare them under similar training conditions. We introduce and release CA-WEAT, multilingual cultural aware tests to quantify biases, and compare them to previous English-centric tests. Our experiments confirm that monolingual static embeddings do exhibit human biases, but values differ across languages, being far from universal. Biases are less evident in contextual models, to the point that the original human association might be reversed. Multilinguality proves to be another variable that attenuates and even reverses the effect of the bias, especially in contextual multilingual models. In order to explain this variance among models and languages, we examine the effect of asymmetries in the training corpus, departures from isomorphism in multilingual embedding spaces and discrepancies in the testing measures between languages.

@inproceedings{espana-bonet-barron-cedeno-2022-undesired,
title = {The (Undesired) Attenuation of Human Biases by Multilinguality},
author = {Cristina Espa{\~n}a-Bonet and Alberto Barrón-Cede{\~n}o},
url = {https://aclanthology.org/2022.emnlp-main.133},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
pages = {2056--2077},
publisher = {Association for Computational Linguistics},
address = {Online and Abu Dhabi, UAE, Dec 2022},
abstract = {Some human preferences are universal. The odor of vanilla is perceived as pleasant all around the world. We expect neural models trained on human texts to exhibit these kinds of preferences, i.e. biases, but we show that this is not always the case. We explore 16 static and contextual embedding models in 9 languages and, when possible, compare them under similar training conditions. We introduce and release CA-WEAT, multilingual cultural aware tests to quantify biases, and compare them to previous English-centric tests. Our experiments confirm that monolingual static embeddings do exhibit human biases, but values differ across languages, being far from universal. Biases are less evident in contextual models, to the point that the original human association might be reversed. Multilinguality proves to be another variable that attenuates and even reverses the effect of the bias, especially in contextual multilingual models. In order to explain this variance among models and languages, we examine the effect of asymmetries in the training corpus, departures from isomorphism in multilingual embedding spaces and discrepancies in the testing measures between languages.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Bafna, Niyati; van Genabith, Josef; España-Bonet, Cristina; Žabokrtský, Zdeněk

Combining Noisy Semantic Signals with Orthographic Cues: Cognate Induction for the Indic Dialect Continuum Inproceedings

Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), Association for Computational Linguistics, pp. 110-131, Abu Dhabi, UAE, Dec 2022, 2022.

We present a novel method for unsupervised cognate/borrowing identification from monolingual corpora designed for low and extremely low resource scenarios, based on combining noisy semantic signals from joint bilingual spaces with orthographic cues modelling sound change. We apply our method to the North Indian dialect continuum, containing several dozens of dialects and languages spoken by more than 100 million people. Many of these languages are zero-resource and therefore natural language processing for them is non-existent. We first collect monolingual data for 26 Indic languages, 16 of which were previously zero-resource, and perform exploratory character, lexical and subword cross-lingual alignment experiments for the first time at this scale on this dialect continuum. We create bilingual evaluation lexicons against Hindi for 20 of the languages. We then apply our cognate identification method on the data, and show that our method outperforms both traditional orthography baselines as well as EM-style learnt edit distance matrices. To the best of our knowledge, this is the first work to combine traditional orthographic cues with noisy bilingual embeddings to tackle unsupervised cognate detection in a (truly) low-resource setup, showing that even noisy bilingual embeddings can act as good guides for this task. We release our multilingual dialect corpus, called HinDialect, as well as our scripts for evaluation data collection and cognate induction.

@inproceedings{bafna-etal-2022-combining,
title = {Combining Noisy Semantic Signals with Orthographic Cues: Cognate Induction for the Indic Dialect Continuum},
author = {Niyati Bafna and Josef van Genabith and Cristina Espa{\~n}a-Bonet and Zdeněk Žabokrtský},
url = {https://aclanthology.org/2022.conll-1.9},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)},
pages = {110-131},
publisher = {Association for Computational Linguistics},
address = {Abu Dhabi, UAE, Dec 2022},
abstract = {We present a novel method for unsupervised cognate/borrowing identification from monolingual corpora designed for low and extremely low resource scenarios, based on combining noisy semantic signals from joint bilingual spaces with orthographic cues modelling sound change. We apply our method to the North Indian dialect continuum, containing several dozens of dialects and languages spoken by more than 100 million people. Many of these languages are zero-resource and therefore natural language processing for them is non-existent. We first collect monolingual data for 26 Indic languages, 16 of which were previously zero-resource, and perform exploratory character, lexical and subword cross-lingual alignment experiments for the first time at this scale on this dialect continuum. We create bilingual evaluation lexicons against Hindi for 20 of the languages. We then apply our cognate identification method on the data, and show that our method outperforms both traditional orthography baselines as well as EM-style learnt edit distance matrices. To the best of our knowledge, this is the first work to combine traditional orthographic cues with noisy bilingual embeddings to tackle unsupervised cognate detection in a (truly) low-resource setup, showing that even noisy bilingual embeddings can act as good guides for this task. We release our multilingual dialect corpus, called HinDialect, as well as our scripts for evaluation data collection and cognate induction.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Amponsah-Kaakyire, Kwabena; Pylypenko, Daria; van Genabith, Josef; España-Bonet, Cristina

Explaining Translationese: why are Neural Classifiers Better and what do they Learn? Inproceedings

Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, pp. 281-296, Abu Dhabi, United Arab Emirates (Hybrid), Dec 2022, 2022.

Recent work has shown that neural feature- and representation-learning, e.g. BERT, achieves superior performance over traditional manual feature engineering based approaches, with e.g. SVMs, in translationese classification tasks. Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn. To address (i), we carefully design experiments that swap features between BERT- and SVM-based classifiers. We show that an SVM fed with BERT representations performs at the level of the best BERT classifiers, while BERT learning and using handcrafted features performs at the level of an SVM using handcrafted features. This shows that the performance differences are due to the features. To address (ii) we use integrated gradients and find that (a) there is indication that information captured by hand-crafted features is only a subset of what BERT learns, and (b) part of BERT’s top performance results are due to BERT learning topic differences and spurious correlations with translationese.

@inproceedings{amponsah-kaakyire-etal-2022-explaining,
title = {Explaining Translationese: why are Neural Classifiers Better and what do they Learn?},
author = {Kwabena Amponsah-Kaakyire and Daria Pylypenko and Josef van Genabith and Cristina Espa{\~n}a-Bonet},
url = {https://aclanthology.org/2022.blackboxnlp-1.23},
doi = {10.48550/ARXIV.2210.13391},
year = {2022},
date = {2022-01-19},
booktitle = {Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
pages = {281-296},
publisher = {Association for Computational Linguistics},
address = {Abu Dhabi, United Arab Emirates (Hybrid), Dec 2022},
abstract = {Recent work has shown that neural feature- and representation-learning, e.g. BERT, achieves superior performance over traditional manual feature engineering based approaches, with e.g. SVMs, in translationese classification tasks. Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn. To address (i), we carefully design experiments that swap features between BERT- and SVM-based classifiers. We show that an SVM fed with BERT representations performs at the level of the best BERT classifiers, while BERT learning and using handcrafted features performs at the level of an SVM using handcrafted features. This shows that the performance differences are due to the features. To address (ii) we use integrated gradients and find that (a) there is indication that information captured by hand-crafted features is only a subset of what BERT learns, and (b) part of BERT's top performance results are due to BERT learning topic differences and spurious correlations with translationese.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Rabs, Elisabeth; Delogu, Francesca; Drenhaus, Heiner; Crocker, Matthew W.

Situational expectancy or association? The influence of event knowledge on the N400 Journal Article

Language, Cognition and Neuroscience, Routledge, pp. 1-19, 2022.

Electrophysiological studies suggest that situational event knowledge plays an important role in language processing, but often fail to distinguish whether observed effects are driven by combinatorial expectations, or simple association with the context. In two ERP experiments, participants read short discourses describing ongoing events. We manipulated the situational expectancy of the target word continuing the event as well as the presence of an associated, but inactive event in the context. In both experiments we find an N400 effect for unexpected compared to expected target words, but this effect is significantly attenuated when the unexpected target is nonetheless associated with non-occurring context events. Our findings demonstrate that the N400 is simultaneously influenced by both simple association with – and combinatorial expectations derived from – situational event knowledge. Thus, experimental investigations and comprehension models of the use of event knowledge must accommodate the role of both expectancy and association in electrophysiological measures.

@article{doi:10.1080/23273798.2021.2022171,
title = {Situational expectancy or association? The influence of event knowledge on the N400},
author = {Elisabeth Rabs and Francesca Delogu and Heiner Drenhaus and Matthew W. Crocker},
url = {https://www.tandfonline.com/doi/full/10.1080/23273798.2021.2022171?src=},
doi = {https://doi.org/10.1080/23273798.2021.2022171},
year = {2022},
date = {2022-01-16},
journal = {Language, Cognition and Neuroscience},
pages = {1-19},
publisher = {Routledge},
abstract = {Electrophysiological studies suggest that situational event knowledge plays an important role in language processing, but often fail to distinguish whether observed effects are driven by combinatorial expectations, or simple association with the context. In two ERP experiments, participants read short discourses describing ongoing events. We manipulated the situational expectancy of the target word continuing the event as well as the presence of an associated, but inactive event in the context. In both experiments we find an N400 effect for unexpected compared to expected target words, but this effect is significantly attenuated when the unexpected target is nonetheless associated with non-occurring context events. Our findings demonstrate that the N400 is simultaneously influenced by both simple association with – and combinatorial expectations derived from – situational event knowledge. Thus, experimental investigations and comprehension models of the use of event knowledge must accommodate the role of both expectancy and association in electrophysiological measures.},
pubstate = {published},
type = {article}
}

Project:   A1

Zaitova, Iuliia; Abdullah, Badr M.; Klakow, Dietrich

Mapping Phonology to Semantics: A Computational Model of Cross-Lingual Spoken-Word Recognition Inproceedings

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (October 2022, Gyeongju, Republic of Korea), Association for Computational Linguistics, pp. 54-63, 2022.

Closely related languages are often mutually intelligible to various degrees. Therefore, speakers of closely related languages are usually capable of (partially) comprehending each other’s speech without explicitly learning the target, second language. The cross-linguistic intelligibility among closely related languages is mainly driven by linguistic factors such as lexical similarities. This paper presents a computational model of spoken-word recognition and investigates its ability to recognize word forms from different languages than its native, training language. Our model is based on a recurrent neural network that learns to map a word’s phonological sequence onto a semantic representation of the word. Furthermore, we present a case study on the related Slavic languages and demonstrate that the cross-lingual performance of our model not only predicts mutual intelligibility to a large extent but also reflects the genetic classification of the languages in our study.

@inproceedings{zaitova-etal-2022-mapping,
title = {Mapping Phonology to Semantics: A Computational Model of Cross-Lingual Spoken-Word Recognition},
author = {Iuliia Zaitova and Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2022.vardial-1.6/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (October 2022, Gyeongju, Republic of Korea)},
pages = {54-63},
publisher = {Association for Computational Linguistics},
abstract = {Closely related languages are often mutually intelligible to various degrees. Therefore, speakers of closely related languages are usually capable of (partially) comprehending each other’s speech without explicitly learning the target, second language. The cross-linguistic intelligibility among closely related languages is mainly driven by linguistic factors such as lexical similarities. This paper presents a computational model of spoken-word recognition and investigates its ability to recognize word forms from different languages than its native, training language. Our model is based on a recurrent neural network that learns to map a word’s phonological sequence onto a semantic representation of the word. Furthermore, we present a case study on the related Slavic languages and demonstrate that the cross-lingual performance of our model not only predicts mutual intelligibility to a large extent but also reflects the genetic classification of the languages in our study.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Yung, Frances Pik Yu; Anuranjana, Kaveri; Scholman, Merel; Demberg, Vera

Label distributions help implicit discourse relation classification Inproceedings

Proceedings of the 3rd Workshop on Computational Approaches to Discourse (October 2022, Gyeongju, Republic of Korea and Online), International Conference on Computational Linguistics, pp. 48–53, 2022.

Implicit discourse relations can convey more than one relation sense, but much of the research on discourse relations has focused on single relation senses. Recently, DiscoGeM, a novel multi-domain corpus, which contains 10 crowd-sourced labels per relational instance, has become available. In this paper, we analyse the co-occurrences of relations in DiscoGeM and show that they are systematic and characteristic of text genre. We then test whether information on multi-label distributions in the data can help implicit relation classifiers. Our results show that incorporating multiple labels in parser training can improve its performance, and yield label distributions which are more similar to human label distributions, compared to a parser that is trained on just a single most frequent label per instance.

@inproceedings{Yungetal2022,
title = {Label distributions help implicit discourse relation classification},
author = {Frances Pik Yu Yung and Kaveri Anuranjana and Merel Scholman and Vera Demberg},
url = {https://aclanthology.org/2022.codi-1.7},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 3rd Workshop on Computational Approaches to Discourse (October 2022, Gyeongju, Republic of Korea and Online)},
pages = {48-53},
publisher = {International Conference on Computational Linguistics},
abstract = {Implicit discourse relations can convey more than one relation sense, but much of the research on discourse relations has focused on single relation senses. Recently, DiscoGeM, a novel multi-domain corpus, which contains 10 crowd-sourced labels per relational instance, has become available. In this paper, we analyse the co-occurrences of relations in DiscoGeM and show that they are systematic and characteristic of text genre. We then test whether information on multi-label distributions in the data can help implicit relation classifiers. Our results show that incorporating multiple labels in parser training can improve its performance, and yield label distributions which are more similar to human label distributions, compared to a parser that is trained on just a single most frequent label per instance.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Häuser, Katja; Kray, Jutta

Uninvited and unwanted: False memories for words predicted but not seen Inproceedings

Culbertson, Jennifer; Perfors, Andrew; Rabagliati, Hugh; Ramenzoni, Veronica (Eds.): Proceedings of the 44th Annual Conference of the Cognitive Science Society, Toronto, Canada (27 Jul 2022 - 30 Jul 2022), 44, pp. 2401-2408, 2022.

Semantic extension plays a key role in language change and grammaticalisation. Here we use a dyadic interaction paradigm to study semantic extension of novel labels in controlled circumstances. We ask whether participants will be able to (i) use highly accessible associations in the perceptual environment (colour-shape associations) to converge on a meaning for the novel labels, and (ii) extend these meanings to apply to both concrete targets (objects) and abstract targets (emotions). Further, given the argument that both metonymy and metaphor are important drivers of language change, we investigate whether participants will be able to draw on relations of contiguity (‘metonymic’ associations, e.g. colour-shape or object-colour) and relations of similarity (‘metaphorical’ associations, e.g. emotion-colour) to extend the meaning of labels.

@inproceedings{HaeuserKray2022,
title = {Uninvited and unwanted: False memories for words predicted but not seen},
author = {Katja H{\"a}user and Jutta Kray},
editor = {Jennifer Culbertson and Andrew Perfors and Hugh Rabagliati and Veronica Ramenzoni},
url = {https://escholarship.org/uc/item/7w22b8gm},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 44th Annual Conference of the Cognitive Science Society, Toronto, Canada (27 Jul 2022 - 30 Jul 2022)},
pages = {2401-2408},
abstract = {Semantic extension plays a key role in language change and grammaticalisation. Here we use a dyadic interaction paradigm to study semantic extension of novel labels in controlled circumstances. We ask whether participants will be able to (i) use highly accessible associations in the perceptual environment (colour-shape associations) to converge on a meaning for the novel labels, and (ii) extend these meanings to apply to both concrete targets (objects) and abstract targets (emotions). Further, given the argument that both metonymy and metaphor are important drivers of language change, we investigate whether participants will be able to draw on relations of contiguity (‘metonymic’ associations, e.g. colour-shape or object-colour) and relations of similarity (‘metaphorical’ associations, e.g. emotion-colour) to extend the meaning of labels.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A4 A5

Häuser, Katja; Kray, Jutta; Borovsky, Arielle

Hedging Bets in Linguistic Prediction: Younger and Older Adults Vary in the Breadth of Predictive Processing Journal Article

Collabra: Psychology, 8(1):36945, 2022.

Language processing is predictive in nature, but it is unknown whether language users generate multiple predictions about upcoming content simultaneously or whether spreading activation from one pre-activated word facilitates other words downstream. Simultaneously, developmental accounts of predictive processing simultaneously highlight potential tension among spreading activation vs. multiple activation accounts. We used self-paced reading to investigate if younger and older readers of German generate (multiple) graded predictions about the grammatical gender of nouns. Gradedness in predictions was operationalized as the difference in cloze probability between the most likely and second-most likely continuation that could complete a sentence. Sentences with a greater probabilistic difference were considered as imbalanced and more biased towards one gender. Sentences with lower probabilistic differences were considered to be more balanced towards multiple genders. Both young and older adults engaged in predictive processing. However, only younger adults activated multiple predictions, with slower reading times (RTs) when gender representations were balanced, but facilitation when one gender was more likely than others. In contrast, older adults’ RTs did not pattern with imbalance but merely with predictability, showing that, while able to generate predictions based on context, older adults did not predict multiple gender continuations. Hence, our findings suggest that (younger) language users generate graded predictions about upcoming content, by weighing possible sentence continuations according to their difference in cloze probability. Compared to younger adults, older adults’ predictions are reduced in scope. The results provide novel theoretical insights into the developmental mechanisms involved in predictive processing.

@article{Haeuseretal22,
title = {Hedging Bets in Linguistic Prediction: Younger and Older Adults Vary in the Breadth of Predictive Processing},
author = {Katja H{\"a}user and Jutta Kray and Arielle Borovsky},
url = {https://online.ucpress.edu/collabra/article/8/1/36945/187814/Hedging-Bets-in-Linguistic-Prediction-Younger-and},
doi = {https://doi.org/10.1525/collabra.36945},
year = {2022},
date = {2022},
journal = {Collabra: Psychology},
volume = {8},
number = {1},
pages = {36945},
abstract = {Language processing is predictive in nature, but it is unknown whether language users generate multiple predictions about upcoming content simultaneously or whether spreading activation from one pre-activated word facilitates other words downstream. Simultaneously, developmental accounts of predictive processing simultaneously highlight potential tension among spreading activation vs. multiple activation accounts. We used self-paced reading to investigate if younger and older readers of German generate (multiple) graded predictions about the grammatical gender of nouns. Gradedness in predictions was operationalized as the difference in cloze probability between the most likely and second-most likely continuation that could complete a sentence. Sentences with a greater probabilistic difference were considered as imbalanced and more biased towards one gender. Sentences with lower probabilistic differences were considered to be more balanced towards multiple genders. Both young and older adults engaged in predictive processing. However, only younger adults activated multiple predictions, with slower reading times (RTs) when gender representations were balanced, but facilitation when one gender was more likely than others. In contrast, older adults’ RTs did not pattern with imbalance but merely with predictability, showing that, while able to generate predictions based on context, older adults did not predict multiple gender continuations. Hence, our findings suggest that (younger) language users generate graded predictions about upcoming content, by weighing possible sentence continuations according to their difference in cloze probability. Compared to younger adults, older adults’ predictions are reduced in scope. The results provide novel theoretical insights into the developmental mechanisms involved in predictive processing.},
pubstate = {published},
type = {article}
}

Projects:   A4 A5

Häuser, Katja; Kray, Jutta

How odd: Diverging effects of predictability and plausibility violations on sentence reading and word memory Journal Article

Applied Psycholinguistics, 43(5), pp. 1193-1220, 2022.

How do violations of predictability and plausibility affect online language processing? How does it affect longer-term memory and learning when predictions are disconfirmed by plausible or implausible words? We investigated these questions using a self-paced sentence reading and noun recognition task. Critical sentences violated predictability or plausibility or both, for example, “Since Anne is afraid of spiders, she doesn’t like going down into the … basement (predictable, plausible), garden (unpredictable, somewhat plausible), moon (unpredictable, deeply implausible).” Results from sentence reading showed earlier-emerging effects of predictability violations on the critical noun, but later-emerging effects of plausibility violations after the noun. Recognition memory was exclusively enhanced for deeply implausible nouns. The earlier-emerging predictability effect indicates that having word form predictions disconfirmed is registered very early in the processing stream, irrespective of semantics. The later-emerging plausibility effect supports models that argue for a staged architecture of reading comprehension, where plausibility only affects a post-lexical integration stage. Our memory results suggest that, in order to facilitate memory and learning, a certain magnitude of prediction error is required.

@article{HaeuserKray22,
title = {How odd: Diverging effects of predictability and plausibility violations on sentence reading and word memory},
author = {Katja H{\"a}user and Jutta Kray},
url = {https://www.cambridge.org/core/journals/applied-psycholinguistics/article/how-odd-diverging-effects-of-predictability-and-plausibility-violations-on-sentence-reading-and-word-memory/D8E12864E47CE24E62297ABF5BA2BED0},
doi = {https://doi.org/10.1017/S0142716422000364},
year = {2022},
date = {2022},
journal = {Applied Psycholinguistics},
pages = {1193-1220},
volume = {43},
number = {5},
abstract = {How do violations of predictability and plausibility affect online language processing? How does it affect longer-term memory and learning when predictions are disconfirmed by plausible or implausible words? We investigated these questions using a self-paced sentence reading and noun recognition task. Critical sentences violated predictability or plausibility or both, for example, “Since Anne is afraid of spiders, she doesn’t like going down into the … basement (predictable, plausible), garden (unpredictable, somewhat plausible), moon (unpredictable, deeply implausible).” Results from sentence reading showed earlier-emerging effects of predictability violations on the critical noun, but later-emerging effects of plausibility violations after the noun. Recognition memory was exclusively enhanced for deeply implausible nouns. The earlier-emerging predictability effect indicates that having word form predictions disconfirmed is registered very early in the processing stream, irrespective of semantics. The later-emerging plausibility effect supports models that argue for a staged architecture of reading comprehension, where plausibility only affects a post-lexical integration stage. Our memory results suggest that, in order to facilitate memory and learning, a certain magnitude of prediction error is required.},
pubstate = {published},
type = {article}
}

Projects:   A4 A5

van Os, Marjolein; Kray, Jutta; Demberg, Vera

Rational speech comprehension: Interaction between predictability, acoustic signal, and noise Journal Article

Frontiers in Psychology (Sec. Language Sciences), 13:914239, 2022.

During speech comprehension, multiple sources of information are available to listeners, which are combined to guide the recognition process. Models of speech comprehension posit that when the acoustic speech signal is obscured, listeners rely more on information from other sources. However, these models take into account only word frequency information and local contexts (surrounding syllables), but not sentence-level information. To date, empirical studies investigating predictability effects in noise did not carefully control the tested speech sounds, while the literature investigating the effect of background noise on the recognition of speech sounds does not manipulate sentence predictability. Additionally, studies on the effect of background noise show conflicting results regarding which noise type affects speech comprehension most. We address this in the present experiment. We investigate how listeners combine information from different sources when listening to sentences embedded in background noise. We manipulate top-down predictability, type of noise, and characteristics of the acoustic signal, thus creating conditions which differ in the extent to which a specific speech sound is masked in a way that is grounded in prior work on the confusability of speech sounds in noise. Participants complete an online word recognition experiment. The results show that participants rely more on the provided sentence context when the acoustic signal is harder to process. This is the case even when interactions of the background noise and speech sounds lead to small differences in intelligibility. Listeners probabilistically combine top-down predictions based on context with noisy bottom-up information from the acoustic signal, leading to a trade-off between the different types of information that is dependent on the combination of a specific type of background noise and speech sound.

@article{VanOsetal22,
title = {Rational speech comprehension: Interaction between predictability, acoustic signal, and noise},
author = {Marjolein van Os and Jutta Kray and Vera Demberg},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2022.914239/full},
doi = {https://doi.org/10.3389/fpsyg.2022.914239},
year = {2022},
date = {2022},
journal = {Frontiers in Psychology (Sec. Language Sciences)},
volume = {13},
pages = {914239},
abstract = {During speech comprehension, multiple sources of information are available to listeners, which are combined to guide the recognition process. Models of speech comprehension posit that when the acoustic speech signal is obscured, listeners rely more on information from other sources. However, these models take into account only word frequency information and local contexts (surrounding syllables), but not sentence-level information. To date, empirical studies investigating predictability effects in noise did not carefully control the tested speech sounds, while the literature investigating the effect of background noise on the recognition of speech sounds does not manipulate sentence predictability. Additionally, studies on the effect of background noise show conflicting results regarding which noise type affects speech comprehension most. We address this in the present experiment. We investigate how listeners combine information from different sources when listening to sentences embedded in background noise. We manipulate top-down predictability, type of noise, and characteristics of the acoustic signal, thus creating conditions which differ in the extent to which a specific speech sound is masked in a way that is grounded in prior work on the confusability of speech sounds in noise. Participants complete an online word recognition experiment. The results show that participants rely more on the provided sentence context when the acoustic signal is harder to process. This is the case even when interactions of the background noise and speech sounds lead to small differences in intelligibility. Listeners probabilistically combine top-down predictions based on context with noisy bottom-up information from the acoustic signal, leading to a trade-off between the different types of information that is dependent on the combination of a specific type of background noise and speech sound.},
pubstate = {published},
type = {article}
}

Project:   A4

Menzel, Katrin

Medical discourse in Late Modern English: Insights from the Royal Society Corpus. Book Chapter

Hiltunen, Turo; Taavitsainen, Irma (Eds.): Corpus pragmatic studies on the history of medical discourse (Pragmatics & Beyond New Series; Vol. 330), John Benjamins, pp. 79-104, Amsterdam, 2022.

This chapter demonstrates how the Royal Society Corpus, a richly annotated corpus of around 48,000 English scientific journal articles covering more than 330 years, can be used for lexico-grammatical and pragmatic studies that contribute to a broader understanding of the development of medical research articles. The Late Modern English period together with several decades before and after this time frame was a productive period in the medical output of the Royal Society. This chapter addresses typical linguistic features of scientific journal articles from medical and related sciences from this period demonstrating their special status in the context of other traditional and emerging disciplines in the corpus data. Additionally, language usage and text-type conventions of historical medical research articles will be compared to the features of corpus texts on medical topics from Present-day English.

@inbook{MedicalDiscourse22,
title = {Medical discourse in Late Modern English: Insights from the Royal Society Corpus.},
author = {Katrin Menzel},
editor = {Turo Hiltunen and Irma Taavitsainen},
url = {https://benjamins.com/catalog/pbns.330},
year = {2022},
date = {2022},
booktitle = {Corpus pragmatic studies on the history of medical discourse (Pragmatics \& Beyond New Series; Vol. 330)},
pages = {79-104},
publisher = {John Benjamins},
address = {Amsterdam},
abstract = {This chapter demonstrates how the Royal Society Corpus, a richly annotated corpus of around 48,000 English scientific journal articles covering more than 330 years, can be used for lexico-grammatical and pragmatic studies that contribute to a broader understanding of the development of medical research articles. The Late Modern English period together with several decades before and after this time frame was a productive period in the medical output of the Royal Society. This chapter addresses typical linguistic features of scientific journal articles from medical and related sciences from this period demonstrating their special status in the context of other traditional and emerging disciplines in the corpus data. Additionally, language usage and text-type conventions of historical medical research articles will be compared to the features of corpus texts on medical topics from Present-day English.},
pubstate = {published},
type = {inbook}
}

Project:   B1

Höller, Daniel; Wichlacz, Julia; Bercher, Pascal; Behnke, Gregor

Compiling HTN Plan Verification Problems into HTN Planning Problems Inproceedings

Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling (ICAPS2022), 32, pp. 145-150, 2022.

Plan Verification is the task of deciding whether a sequence of actions is a solution for a given planning problem. In HTN planning, the task is computationally expensive and may be up to NP-hard. However, there are situations where it needs to be solved, e.g. when a solution is post-processed, in systems using approximation, or just to validate whether a planning system works correctly (e.g. for debugging or in a competition). There are verification systems based on translations to propositional logic and on techniques from parsing. Here we present a third approach and translate HTN plan verification problems into HTN planning problems. These can be solved using any HTN planning system. We collected a new benchmark set based on models and results of the 2020 International Planning Competition. Our evaluation shows that our compilation outperforms the approaches from the literature.

@inproceedings{Höller_Wichlacz_Bercher_Behnke_2022,
title = {Compiling HTN Plan Verification Problems into HTN Planning Problems},
author = {Daniel H{\"o}ller and Julia Wichlacz and Pascal Bercher and Gregor Behnke},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/19795/19554},
doi = {https://doi.org/10.1609/icaps.v32i1.19795},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling (ICAPS2022)},
pages = {145-150},
abstract = {Plan Verification is the task of deciding whether a sequence of actions is a solution for a given planning problem. In HTN planning, the task is computationally expensive and may be up to NP-hard. However, there are situations where it needs to be solved, e.g. when a solution is post-processed, in systems using approximation, or just to validate whether a planning system works correctly (e.g. for debugging or in a competition). There are verification systems based on translations to propositional logic and on techniques from parsing. Here we present a third approach and translate HTN plan verification problems into HTN planning problems. These can be solved using any HTN planning system. We collected a new benchmark set based on models and results of the 2020 International Planning Competition. Our evaluation shows that our compilation outperforms the approaches from the literature.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Wichlacz, Julia; Höller, Daniel; Hoffmann, Jörg

Landmark Heuristics for Lifted Classical Planning Inproceedings

De Raedt, Luc (Ed.): Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, 23-29 July 2022, International Joint Conferences on Artificial Intelligence Organization, pp. 4665-4671, 2022.

While state-of-the-art planning systems need a grounded (propositional) task representation, the input model is provided “lifted”, specifying predicates and action schemas with variables over a finite object universe. The size of the grounded model is exponential in predicate/action-schema arity, limiting applicability to cases where it is small enough. Recent work has taken up this challenge, devising an effective lifted forward search planner as basis for lifted heuristic search, as well as a variety of lifted heuristic functions based on the delete relaxation. Here we add a novel family of lifted heuristic functions, based on landmarks. We design two methods for landmark extraction in the lifted setting. The resulting heuristics exhibit performance advantages over previous heuristics in several benchmark domains. Especially the combination with lifted delete relaxation heuristics to a LAMA-style planner yields good results, beating the previous state of the art in lifted planning.

@inproceedings{ijcai2022p647,
title = {Landmark Heuristics for Lifted Classical Planning},
author = {Julia Wichlacz and Daniel H{\"o}ller and J{\"o}rg Hoffmann},
editor = {Luc De Raedt},
url = {https://doi.org/10.24963/ijcai.2022/647},
doi = {https://doi.org/10.24963/ijcai.2022/647},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, 23-29 July 2022},
pages = {4665-4671},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
abstract = {While state-of-the-art planning systems need a grounded (propositional) task representation, the input model is provided “lifted”, specifying predicates and action schemas with variables over a finite object universe. The size of the grounded model is exponential in predicate/action-schema arity, limiting applicability to cases where it is small enough. Recent work has taken up this challenge, devising an effective lifted forward search planner as basis for lifted heuristic search, as well as a variety of lifted heuristic functions based on the delete relaxation. Here we add a novel family of lifted heuristic functions, based on landmarks. We design two methods for landmark extraction in the lifted setting. The resulting heuristics exhibit performance advantages over previous heuristics in several benchmark domains. Especially the combination with lifted delete relaxation heuristics to a LAMA-style planner yields good results, beating the previous state of the art in lifted planning.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7
