Publications

Stenger, Irina; Georgis, Philip; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension Inproceedings

Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, pp. 7368-7376, Marseille, France, 2022.

We focus on the syntactic variation and measure syntactic distances between nine Slavic languages (Belarusian, Bulgarian, Croatian, Czech, Polish, Slovak, Slovene, Russian, and Ukrainian) using symmetric measures of insertion, deletion and movement of syntactic units in the parallel sentences of the fable “The North Wind and the Sun”. Additionally, we investigate phonetic and orthographic asymmetries between selected languages by means of the information theoretical notion of surprisal. Syntactic distance and surprisal are, thus, considered as potential predictors of mutual intelligibility between related languages. In spoken and written cloze test experiments for Slavic native speakers, the presented predictors will be validated as to whether variations in syntax lead to a slower or impeded intercomprehension of Slavic texts.

@inproceedings{stenger-EtAl:2022:LREC,
title = {Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension},
author = {Irina Stenger and Philip Georgis and Tania Avgustinova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://aclanthology.org/2022.lrec-1.802},
year = {2022},
date = {2022-06-21},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
pages = {7368-7376},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {We focus on the syntactic variation and measure syntactic distances between nine Slavic languages (Belarusian, Bulgarian, Croatian, Czech, Polish, Slovak, Slovene, Russian, and Ukrainian) using symmetric measures of insertion, deletion and movement of syntactic units in the parallel sentences of the fable "The North Wind and the Sun". Additionally, we investigate phonetic and orthographic asymmetries between selected languages by means of the information theoretical notion of surprisal. Syntactic distance and surprisal are, thus, considered as potential predictors of mutual intelligibility between related languages. In spoken and written cloze test experiments for Slavic native speakers, the presented predictors will be validated as to whether variations in syntax lead to a slower or impeded intercomprehension of Slavic texts.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Ortmann, Katrin

Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans Inproceedings

Proceedings of the Language Resources and Evaluation Conference (LREC), European Language Resources Association, pp. 1400-1407, Marseille, France, 2022.

The traditional evaluation of labeled spans with precision, recall, and F1-score has undesirable effects due to double penalties. Annotations with incorrect label or boundaries count as two errors instead of one, despite being closer to the target annotation than false positives or false negatives. In this paper, new error types are introduced, which more accurately reflect true annotation quality and ensure that every annotation counts only once. An algorithm for error identification in flat and multi-level annotations is presented and complemented with a proposal on how to calculate meaningful precision, recall, and F1-scores based on the more fine-grained error types. The exemplary application to three different annotation tasks (NER, chunking, parsing) shows that the suggested procedure not only prevents double penalties but also allows for a more detailed error analysis, thereby providing more insight into the actual weaknesses of a system.

@inproceedings{ortmann2022,
title = {Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans},
author = {Katrin Ortmann},
url = {https://aclanthology.org/2022.lrec-1.150},
year = {2022},
date = {2022-06-21},
booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
pages = {1400-1407},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {The traditional evaluation of labeled spans with precision, recall, and F1-score has undesirable effects due to double penalties. Annotations with incorrect label or boundaries count as two errors instead of one, despite being closer to the target annotation than false positives or false negatives. In this paper, new error types are introduced, which more accurately reflect true annotation quality and ensure that every annotation counts only once. An algorithm for error identification in flat and multi-level annotations is presented and complemented with a proposal on how to calculate meaningful precision, recall, and F1-scores based on the more fine-grained error types. The exemplary application to three different annotation tasks (NER, chunking, parsing) shows that the suggested procedure not only prevents double penalties but also allows for a more detailed error analysis, thereby providing more insight into the actual weaknesses of a system.},
pubstate = {published},
type = {inproceedings}
}

Project:   C6

Menzel, Katrin; Krielke, Marie-Pauline; Degaetano-Ortlieb, Stefania

Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective Journal Article

LEGE ARTIS: Language yesterday, today, tomorrow, VII, Trnava: University of SS Cyril and Methodius in Trnava, pp. 157-213, 2022, ISSN 2453-8035.

This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.

@article{menzel_2022_diachronicperspective,
title = {Synthetic and analytic adjective negation in English scientific journal articles: A diachronic perspective},
author = {Katrin Menzel and Marie-Pauline Krielke and Stefania Degaetano-Ortlieb},
url = {https://www.researchgate.net/publication/361099180_Synthetic_and_analytic_adjective_negation_in_English_scientific_journal_articles_A_diachronic_perspective},
year = {2022},
date = {2022},
journal = {LEGE ARTIS: Language yesterday, today, tomorrow},
pages = {157-213},
publisher = {Trnava: University of SS Cyril and Methodius in Trnava},
volume = {VII},
number = {1},
abstract = {This paper addresses the development of synthetic and analytic adjective negation in a corpus of English scientific articles from the mid-17th century towards the end of the 20th century. Analytic patterns of adjective negation are found to become less frequent in the language of scientific articles, but more conventionalised in their textual contexts. Conversely, prefixed negated adjectives are identified as more frequent and more diverse with regard to their contexts.},
pubstate = {published},
type = {article}
}

Project:   B1

Scholman, Merel; Blything, Liam; Cain, Kate; Evers-Vermeul, Jacqueline

Discourse Rules: The Effects of Clause Order Principles on the Reading Process Journal Article

Language, Cognition and Neuroscience, 37(10), pp. 1277-1291, 2022, ISSN 2327-3798.

In an eye-tracking-while-reading study, we investigated adult monolinguals’ (N=80) processing of two-clause sentences embedded in short narratives. Three principles theorized to guide comprehension of complex sentences were contrasted: one operating at the clause level, namely clause structure (main clause – subordinate clause or vice versa), and two operating at the discourse-level, namely givenness (given-new vs. new-given) and event order (chronological vs. reverse order). The results indicate that clause structure mainly affects early stages of processing, whereas the two principles operating at the discourse level are more important during later stages and for reading times of the entire sentence. Event order was found to operate relatively independently of the other principles. Givenness was found to overrule clause structure, a phenomenon that can be related to the grounding function of preposed subordinate clauses. We propose a new principle to reflect this interaction effect: the grounding principle.

@article{Merel_Rules_2022,
title = {Discourse Rules: The Effects of Clause Order Principles on the Reading Process},
author = {Merel Scholman and Liam Blything and Kate Cain and Jacqueline Evers-Vermeul},
url = {https://www.tandfonline.com/doi/full/10.1080/23273798.2022.2077971},
doi = {https://doi.org/10.1080/23273798.2022.2077971},
year = {2022},
date = {2022},
journal = {Language, Cognition and Neuroscience},
pages = {1277-1291},
volume = {37},
number = {10},
abstract = {In an eye-tracking-while-reading study, we investigated adult monolinguals’ (N=80) processing of two-clause sentences embedded in short narratives. Three principles theorized to guide comprehension of complex sentences were contrasted: one operating at the clause level, namely clause structure (main clause - subordinate clause or vice versa), and two operating at the discourse-level, namely givenness (given-new vs. new-given) and event order (chronological vs. reverse order). The results indicate that clause structure mainly affects early stages of processing, whereas the two principles operating at the discourse level are more important during later stages and for reading times of the entire sentence. Event order was found to operate relatively independently of the other principles. Givenness was found to overrule clause structure, a phenomenon that can be related to the grounding function of preposed subordinate clauses. We propose a new principle to reflect this interaction effect: the grounding principle.},
pubstate = {published},
type = {article}
}

Project:   B2

Kravtchenko, Ekaterina; Demberg, Vera

Informationally redundant utterances elicit pragmatic inferences Journal Article

Cognition, 225, pp. 105159, 2022, ISSN 0010-0277.

Most theories of pragmatics and language processing predict that speakers avoid excessive informational redundancy. Informationally redundant utterances are, however, quite common in natural dialogue. From a comprehension standpoint, it remains unclear how comprehenders interpret these utterances, and whether they make attempts to reconcile the ‘dips’ in informational utility with expectations of ‘appropriate’ or ‘rational’ speaker informativity. We show that informationally redundant (overinformative) utterances can trigger pragmatic inferences that increase utterance utility in line with comprehender expectations. In a series of three studies, we look at utterances which refer to stereotyped event sequences describing common activities (scripts). When comprehenders encounter utterances describing events that can be easily inferred from prior context, they interpret them as signifying that the event conveys new, unstated information (i.e. an event otherwise assumed to be habitual, such as paying the cashier when shopping, is reinterpreted as non-habitual). We call these inferences atypicality inferences. Further, we show that the degree to which these atypicality inferences are triggered depends on the framing of the utterance. In the absence of an exclamation mark or a discourse marker indicating the speaker’s specific intent to communicate the given information, such inferences are far less likely to arise. Overall, the results demonstrate that excessive conceptual redundancy leads to comprehenders revising the conversational common ground, in an effort to accommodate unexpected dips in informational utility.

@article{Kravtchenko_redundant_2022,
title = {Informationally redundant utterances elicit pragmatic inferences},
author = {Ekaterina Kravtchenko and Vera Demberg},
url = {https://www.sciencedirect.com/science/article/pii/S0010027722001470},
doi = {https://doi.org/10.1016/j.cognition.2022.105159},
year = {2022},
date = {2022},
journal = {Cognition},
pages = {105159},
volume = {225},
abstract = {Most theories of pragmatics and language processing predict that speakers avoid excessive informational redundancy. Informationally redundant utterances are, however, quite common in natural dialogue. From a comprehension standpoint, it remains unclear how comprehenders interpret these utterances, and whether they make attempts to reconcile the 'dips' in informational utility with expectations of 'appropriate' or 'rational' speaker informativity. We show that informationally redundant (overinformative) utterances can trigger pragmatic inferences that increase utterance utility in line with comprehender expectations. In a series of three studies, we look at utterances which refer to stereotyped event sequences describing common activities (scripts). When comprehenders encounter utterances describing events that can be easily inferred from prior context, they interpret them as signifying that the event conveys new, unstated information (i.e. an event otherwise assumed to be habitual, such as paying the cashier when shopping, is reinterpreted as non-habitual). We call these inferences atypicality inferences. Further, we show that the degree to which these atypicality inferences are triggered depends on the framing of the utterance. In the absence of an exclamation mark or a discourse marker indicating the speaker's specific intent to communicate the given information, such inferences are far less likely to arise. Overall, the results demonstrate that excessive conceptual redundancy leads to comprehenders revising the conversational common ground, in an effort to accommodate unexpected dips in informational utility.},
keywords = {Accommodation; Context-dependent implicatures; Experimental pragmatics; Psycholinguistics; Redundancy},
pubstate = {published},
type = {article}
}

Project:   A3

Sommerfeld, Linda; Staudte, Maria; Kray, Jutta

Ratings of name agreement and semantic categorization of 247 colored clipart pictures by young German children Journal Article

Acta Psychologica, 226, pp. 103558, 2022, ISSN 0001-6918.

Developmental and longitudinal studies with children increasingly use pictorial stimuli in cognitive, psychologic, and psycholinguistic research. To enhance validity and comparability within and across those studies, the use of normed pictures is recommended. Besides, creating picture sets and evaluating them in rating studies is very time consuming, in particular regarding samples of young children in which testing time is rather limited. As there is an increasing number of studies that investigate young German children’s semantic language processing with colored clipart stimuli, this work provides a first set of 247 colored cliparts with ratings of German native speaking children aged 4 to 6 years. We assessed two central rating aspects of pictures: Name agreement (Do pictures elicit the intended name of an object?) and semantic categorization (Are objects classified as members of the intended semantic category?). Our ratings indicate that children are proficient in naming and even better in semantic categorization of objects, whereas both seems to improve with increasing age of young childhood. Finally, this paper discusses some features of pictorial objects that might be important for children’s name agreement and semantic categorization and could be considered in future picture rating studies.

@article{Sommerfeld_of_2022,
title = {Ratings of name agreement and semantic categorization of 247 colored clipart pictures by young German children},
author = {Linda Sommerfeld and Maria Staudte and Jutta Kray},
url = {https://www.sciencedirect.com/science/article/pii/S0001691822000737},
doi = {https://doi.org/10.1016/j.actpsy.2022.103558},
year = {2022},
date = {2022},
journal = {Acta Psychologica},
pages = {103558},
volume = {226},
abstract = {Developmental and longitudinal studies with children increasingly use pictorial stimuli in cognitive, psychologic, and psycholinguistic research. To enhance validity and comparability within and across those studies, the use of normed pictures is recommended. Besides, creating picture sets and evaluating them in rating studies is very time consuming, in particular regarding samples of young children in which testing time is rather limited. As there is an increasing number of studies that investigate young German children's semantic language processing with colored clipart stimuli, this work provides a first set of 247 colored cliparts with ratings of German native speaking children aged 4 to 6 years. We assessed two central rating aspects of pictures: Name agreement (Do pictures elicit the intended name of an object?) and semantic categorization (Are objects classified as members of the intended semantic category?). Our ratings indicate that children are proficient in naming and even better in semantic categorization of objects, whereas both seems to improve with increasing age of young childhood. Finally, this paper discusses some features of pictorial objects that might be important for children's name agreement and semantic categorization and could be considered in future picture rating studies.},
keywords = {Name agreement, Semantic categorization, Picture naming, Picture ratings, Children, Age differences},
pubstate = {published},
type = {article}
}

Project:   A5

Höltje, Gerrit; Mecklinger, Axel

Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation Journal Article

Brain Research, pp. 147942, 2022, ISSN 0006-8993.

This study investigated how the strength of schema support provided by strongly (SC) and weakly constraining (WC) sentences affects the encoding of expected and unexpected words, and how this is reflected in event-related potentials (ERPs). In a surprise recognition memory test, words studied on the previous day were presented together with new words and lures that were expected but not presented in the study phase. ERPs recorded in the study phase were compared for subsequently remembered and forgotten words. Better memory performance for expected over unexpected words was electrophysiologically supported by a parietal subsequent memory effect (SME) reflecting enhanced item-specific encoding of contextually expected words. SC sentences not only facilitated the semantic integration of sentence-ending words, as reflected in reduced N400 amplitudes, but also enabled the rapid successful encoding of these words into memory, which is evidenced by an SC > WC pattern in memory performance and correlations between pre- and post-stimulus SMEs for SC sentences. In contrast, words processed in WC sentence contexts necessitated sustained elaborative encoding processes as reflected in a late frontal slow wave SME. Expected but not presented words were associated with high rates of false positive memory decisions, indicating that these words remained in a state of high accessibility in memory even one day after the study phase. These mnemonic costs of predictive processing were more pronounced for expected words from SC sentences than from WC sentences and could reflect the lingering of strong semantic predictions which were associated with the pre-updating of sentence representations.

@article{Höltje_and_2022,
title = {Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation},
author = {Gerrit H{\"o}ltje and Axel Mecklinger},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0006899322001664},
doi = {https://doi.org/10.1016/j.brainres.2022.147942},
year = {2022},
date = {2022},
journal = {Brain Research},
pages = {147942},
volume = {1788},
abstract = {This study investigated how the strength of schema support provided by strongly (SC) and weakly constraining (WC) sentences affects the encoding of expected and unexpected words, and how this is reflected in event-related potentials (ERPs). In a surprise recognition memory test, words studied on the previous day were presented together with new words and lures that were expected but not presented in the study phase. ERPs recorded in the study phase were compared for subsequently remembered and forgotten words. Better memory performance for expected over unexpected words was electrophysiologically supported by a parietal subsequent memory effect (SME) reflecting enhanced item-specific encoding of contextually expected words. SC sentences not only facilitated the semantic integration of sentence-ending words, as reflected in reduced N400 amplitudes, but also enabled the rapid successful encoding of these words into memory, which is evidenced by an SC > WC pattern in memory performance and correlations between pre- and post-stimulus SMEs for SC sentences. In contrast, words processed in WC sentence contexts necessitated sustained elaborative encoding processes as reflected in a late frontal slow wave SME. Expected but not presented words were associated with high rates of false positive memory decisions, indicating that these words remained in a state of high accessibility in memory even one day after the study phase. These mnemonic costs of predictive processing were more pronounced for expected words from SC sentences than from WC sentences and could reflect the lingering of strong semantic predictions which were associated with the pre-updating of sentence representations.},
pubstate = {published},
type = {article}
}

Project:   A6

Zouhar, Vilém; Mosbach, Marius; Zhang, Miaoran; Klakow, Dietrich

Knowledge Base Index Compression via Dimensionality and Precision Reduction Inproceedings

Spa-NLP workshop at ACL 2022, 22nd-27th May 2022 Dublin, Ireland, 2022.

Recently neural network based approaches to knowledge-intensive NLP tasks, such as question answering, started to rely heavily on the combination of neural retrievers and readers. Retrieval is typically performed over a large textual knowledge base (KB) which requires significant memory and compute resources, especially when scaled up. On HotpotQA we systematically investigate reducing the size of the KB index by means of dimensionality (sparse random projections, PCA, autoencoders) and numerical precision reduction.
Our results show that PCA is an easy solution that requires very little data and is only slightly worse than autoencoders, which are less stable. All methods are sensitive to pre- and post-processing and data should always be centered and normalized both before and after dimension reduction. Finally, we show that it is possible to combine PCA with using 1bit per dimension. Overall we achieve (1) 100× compression with 75%, and (2) 24× compression with 92% original retrieval performance.

@inproceedings{Zouhar_2022_Base,
title = {Knowledge Base Index Compression via Dimensionality and Precision Reduction},
author = {Vil{\'e}m Zouhar and Marius Mosbach and Miaoran Zhang and Dietrich Klakow},
url = {https://arxiv.org/abs/2204.02906},
year = {2022},
date = {2022},
booktitle = {Spa-NLP workshop at ACL 2022},
address = {22nd-27th May 2022 Dublin, Ireland},
abstract = {Recently neural network based approaches to knowledge-intensive NLP tasks, such as question answering, started to rely heavily on the combination of neural retrievers and readers. Retrieval is typically performed over a large textual knowledge base (KB) which requires significant memory and compute resources, especially when scaled up. On HotpotQA we systematically investigate reducing the size of the KB index by means of dimensionality (sparse random projections, PCA, autoencoders) and numerical precision reduction. Our results show that PCA is an easy solution that requires very little data and is only slightly worse than autoencoders, which are less stable. All methods are sensitive to pre- and post-processing and data should always be centered and normalized both before and after dimension reduction. Finally, we show that it is possible to combine PCA with using 1bit per dimension. Overall we achieve (1) 100× compression with 75%, and (2) 24× compression with 92% original retrieval performance.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Dutta Chowdhury, Koel; Jalota, Rricha; van Genabith, Josef; España-Bonet, Cristina

Towards Debiasing Translation Artifacts Inproceedings

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 3983-3991, Seattle, United States, July 2022, 2022.

Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets. However, compared to original texts in the same language, translations possess distinct qualities referred to as translationese. Previous research has shown that these translation artifacts influence the performance of a variety of cross-lingual tasks. In this work, we propose a novel approach to reducing translationese by extending an established bias-removal technique. We use the Iterative Null-space Projection (INLP) algorithm, and show by measuring classification accuracy before and after debiasing, that translationese is reduced at both sentence and word level. We evaluate the utility of debiasing translationese on a natural language inference (NLI) task, and show that by reducing this bias, NLI accuracy improves. To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.

@inproceedings{Chowdhury_2022_Debiasing,
title = {Towards Debiasing Translation Artifacts},
author = {Koel Dutta Chowdhury and Rricha Jalota and Josef van Genabith and Cristina Espa{\~n}a-Bonet},
url = {https://aclanthology.org/2022.naacl-main.292/},
doi = {https://doi.org/10.18653/v1/2022.naacl-main.292},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {3983-3991},
publisher = {Association for Computational Linguistics},
address = {Seattle, United States, July 2022},
abstract = {Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets. However, compared to original texts in the same language, translations possess distinct qualities referred to as translationese. Previous research has shown that these translation artifacts influence the performance of a variety of cross-lingual tasks. In this work, we propose a novel approach to reducing translationese by extending an established bias-removal technique. We use the Iterative Null-space Projection (INLP) algorithm, and show by measuring classification accuracy before and after debiasing, that translationese is reduced at both sentence and word level. We evaluate the utility of debiasing translationese on a natural language inference (NLI) task, and show that by reducing this bias, NLI accuracy improves. To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Mayn, Alexandra; Demberg, Vera

Pragmatics of Metaphor Revisited: Modeling the Role of Degree and Salience in Metaphor Understanding Inproceedings

Proceedings of the Annual Meeting of the Cognitive Science Society, 43(43), CogSci2022, pp. 3178ff., 2022.

Experimental pragmatics tells us that a metaphor conveys salient features of a vehicle and that highly typical features tend to be salient. But can highly atypical features also be salient? When asking if John is loyal and hearing “John is a fox”, will the hearer conclude that John is disloyal because loyalty is saliently atypical for a fox? This prediction follows from our RSA-based model of metaphor understanding which relies on gradient salience. Our behavioral experiments corroborate the model’s predictions, providing evidence that high and low typicality are salient and result in high interpretation confidence and agreement, while average typicality is not salient and makes a metaphor confusing. Our model implements the idea that other features of a vehicle, along with possible alternative vehicles, influence metaphor interpretation. It produces a significantly better fit compared to an existing RSA model of metaphor understanding, supporting our predictions about the factors at play.

@inproceedings{Mayn_2022_of,
title = {Pragmatics of Metaphor Revisited: Modeling the Role of Degree and Salience in Metaphor Understanding},
author = {Alexandra Mayn and Vera Demberg},
url = {https://escholarship.org/uc/item/7kq207zs},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Annual Meeting of the Cognitive Science Society, 43(43)},
pages = {3178ff.},
publisher = {CogSci2022},
abstract = {Experimental pragmatics tells us that a metaphor conveys salient features of a vehicle and that highly typical features tend to be salient. But can highly atypical features also be salient? When asking if John is loyal and hearing “John is a fox”, will the hearer conclude that John is disloyal because loyalty is saliently atypical for a fox? This prediction follows from our RSA-based model of metaphor understanding which relies on gradient salience. Our behavioral experiments corroborate the model’s predictions, providing evidence that high and low typicality are salient and result in high interpretation confidence and agreement, while average typicality is not salient and makes a metaphor confusing. Our model implements the idea that other features of a vehicle, along with possible alternative vehicles, influence metaphor interpretation. It produces a significantly better fit compared to an existing RSA model of metaphor understanding, supporting our predictions about the factors at play.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Kravtchenko, Ekaterina; Demberg, Vera

Modeling atypicality inferences in pragmatic reasoning Journal Article

Proceedings of the Annual Meeting of the Cognitive Science Society, 44, CogSci 2022, pp. 1918-1924, Toronto, Canada, 2022.

Empirical studies have demonstrated that when comprehenders are faced with informationally redundant utterances, they may make pragmatic inferences (Kravtchenko & Demberg, 2015). Previous work has also shown that the strength of these inferences depends on prominence of the redundant utterance – if it is stressed prosodically, marked with an exclamation mark, or introduced with a discourse marker such as “Oh yeah”, atypicality inferences are stronger (Kravtchenko & Demberg, 2015, 2022; Ryzhova & Demberg, 2020). The goal of the present paper is to demonstrate how both the atypicality inference and the effect of prominence can be modelled using the rational speech act (RSA) framework. We show that atypicality inferences can be captured by introducing joint reasoning about the habituality of events, following Degen, Tessler, and Goodman (2015); Goodman and Frank (2016). However, we find that joint reasoning models principally cannot account for the effect of differences in utterance prominence. This is because prominence markers do not contribute to the truth-conditional meaning. We then proceed to demonstrate that leveraging a noisy channel model, which has previously been used to model low-level acoustic perception (Bergen & Goodman, 2015), can successfully account for the empirically observed patterns of utterance prominence.

@article{Kravtchenko_2022_atypicality,
title = {Modeling atypicality inferences in pragmatic reasoning},
author = {Ekaterina Kravtchenko and Vera Demberg},
url = {https://escholarship.org/uc/item/7630p08b},
year = {2022},
date = {2022},
journal = {Proceedings of the Annual Meeting of the Cognitive Science Society},
pages = {1918-1924},
publisher = {CogSci 2022},
address = {Toronto, Canada},
volume = {44},
number = {44},
abstract = {Empirical studies have demonstrated that when comprehenders are faced with informationally redundant utterances, they may make pragmatic inferences (Kravtchenko & Demberg, 2015). Previous work has also shown that the strength of these inferences depends on prominence of the redundant utterance – if it is stressed prosodically, marked with an exclamation mark, or introduced with a discourse marker such as “Oh yeah”, atypicality inferences are stronger (Kravtchenko & Demberg, 2015, 2022; Ryzhova & Demberg, 2020). The goal of the present paper is to demonstrate how both the atypicality inference and the effect of prominence can be modelled using the rational speech act (RSA) framework. We show that atypicality inferences can be captured by introducing joint reasoning about the habituality of events, following Degen, Tessler, and Goodman (2015); Goodman and Frank (2016). However, we find that joint reasoning models principally cannot account for the effect of differences in utterance prominence. This is because prominence markers do not contribute to the truth-conditional meaning. We then proceed to demonstrate that leveraging a noisy channel model, which has previously been used to model low-level acoustic perception (Bergen & Goodman, 2015), can successfully account for the empirically observed patterns of utterance prominence.},
pubstate = {published},
type = {article}
}

Project:   A3

Krielke, Marie-Pauline; Talamo, Luigi; Fawzi, M.; Knappen, J.

Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German Inproceedings

LREC 2022, Marseille, France, 2022.

We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.

@inproceedings{krielke-etal-2022-tracing,
title = {Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German},
author = {Marie-Pauline Krielke and Luigi Talamo and M. Fawzi and J. Knappen},
url = {https://aclanthology.org/2022.lrec-1.514/},
year = {2022},
date = {2022},
publisher = {LREC 2022},
address = {Marseille, France},
abstract = {We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Yuen, Ivan; Demuth, Katherine; Shattuck-Hufnagel, Stefanie

Planning of prosodic clitics in Australian English Journal Article

Language, Cognition and Neuroscience, Routledge, pp. 1-6, 2022.

The prosodic word (PW) has been proposed as a planning unit in speech production (Levelt et al. [1999. A theory of lexical access in speech production. Behavioral and Brain Sciences22, 1–75]), supported by evidence that speech initiation time (RT) is faster for Dutch utterances with fewer PWs due to cliticisation (with the number of lexical words and syllables kept constant) (Wheeldon & Lahiri [1997. Prosodic units in speech production. Journal of Memory and Language37(3), 356–381. https://doi.org/10.1006/jmla.1997.2517], W&L). The present study examined prosodic cliticisation (and resulting RT) for a different set of potential clitics (articles, direct-object pronouns), in English, using a different response task (immediate reading aloud). W&L’s result of shorter RTs for fewer PWs was replicated for articles, but not for pronouns, suggesting a difference in cliticisation for these two function word types. However, a post-hoc analysis of the duration of the verb preceding the clitic suggests that both are cliticised. These findings highlight the importance of supplementing production latency measures with phonetic duration measures to understand different stages of language production during utterance planning.

@article{Yuen_of_2022,
title = {Planning of prosodic clitics in Australian English},
author = {Ivan Yuen and Katherine Demuth and Stefanie Shattuck-Hufnagel},
url = {https://www.tandfonline.com/eprint/4K7DVYQIWRKITU3JCACY/full?target=10.1080/23273798.2022.2060517},
doi = {https://doi.org/10.1080/23273798.2022.2060517},
year = {2022},
date = {2022-04-05},
journal = {Language, Cognition and Neuroscience},
pages = {1-6},
publisher = {Routledge},
abstract = {The prosodic word (PW) has been proposed as a planning unit in speech production (Levelt et al. [1999. A theory of lexical access in speech production. Behavioral and Brain Sciences22, 1–75]), supported by evidence that speech initiation time (RT) is faster for Dutch utterances with fewer PWs due to cliticisation (with the number of lexical words and syllables kept constant) (Wheeldon & Lahiri [1997. Prosodic units in speech production. Journal of Memory and Language37(3), 356–381. https://doi.org/10.1006/jmla.1997.2517], W&L). The present study examined prosodic cliticisation (and resulting RT) for a different set of potential clitics (articles, direct-object pronouns), in English, using a different response task (immediate reading aloud). W&L’s result of shorter RTs for fewer PWs was replicated for articles, but not for pronouns, suggesting a difference in cliticisation for these two function word types. However, a post-hoc analysis of the duration of the verb preceding the clitic suggests that both are cliticised. These findings highlight the importance of supplementing production latency measures with phonetic duration measures to understand different stages of language production during utterance planning.},
pubstate = {published},
type = {article}
}

Project:   C1

Kudera, Jacek; Georgis, Philip; Alam, Hasan Md Tusfiqur; Möbius, Bernd; Avgustinova, Tania; Klakow, Dietrich

Comprehension of closely related languages: A visual world eye tracking study Inproceedings

Elektronische Sprachsignalverarbeitung 2022, Tagungsband der 33. Konferenz (Sønderborg), pp. 212-219, 2022.

We present results of an eye tracking experiment which aimed at testing sentence comprehension in closely related Slavic languages. Since none of the participants were trained in translation studies or Slavic linguistics, the study illustrates effects of intercomprehension. The participants were exposed to auditory stimuli in Bulgarian, Czech, Polish, and Russian accompanied by a visual scene. The analysis of anticipatory eye movements has shown that native speakers of one Slavic language listening to sentences in another Slavic language, turn their attention to and begin fixating on the referent objects as soon as they identify a predicate. This experiment provides evidence for surprisal-based effects in intercomprehension.

@inproceedings{Kudera/etal:2022a,
title = {Comprehension of closely related languages: A visual world eye tracking study},
author = {Jacek Kudera and Philip Georgis and Hasan Md Tusfiqur Alam and Bernd M{\"o}bius and Tania Avgustinova and Dietrich Klakow},
url = {https://www.essv.de/pdf/2022_212_219.pdf?id=1161},
year = {2022},
date = {2022},
booktitle = {Elektronische Sprachsignalverarbeitung 2022, Tagungsband der 33. Konferenz (Sønderborg)},
pages = {212-219},
abstract = {We present results of an eye tracking experiment which aimed at testing sentence comprehension in closely related Slavic languages. Since none of the participants were trained in translation studies or Slavic linguistics, the study illustrates effects of intercomprehension. The participants were exposed to auditory stimuli in Bulgarian, Czech, Polish, and Russian accompanied by a visual scene. The analysis of anticipatory eye movements has shown that native speakers of one Slavic language listening to sentences in another Slavic language, turn their attention to and begin fixating on the referent objects as soon as they identify a predicate. This experiment provides evidence for surprisal-based effects in intercomprehension.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Abdullah, Badr M.; Möbius, Bernd; Klakow, Dietrich

Integrating form and meaning: A multi-task learning model for acoustic word embeddings Inproceedings

Proceedings of Interspeech 2022, pp. 1876-1880, 2022.

Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the embedding space discriminability and encourages the model to better separate lexical categories.

@inproceedings{Abdullah/etal:2022a,
title = {Integrating form and meaning: A multi-task learning model for acoustic word embeddings},
author = {Badr M. Abdullah and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://www.isca-speech.org/archive/interspeech_2022/abdullah22_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2022-626},
year = {2022},
date = {2022},
booktitle = {Proceedings of Interspeech 2022},
pages = {1876-1880},
abstract = {Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the embedding space discriminability and encourages the model to better separate lexical categories.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Gessinger, Iona; Cohn, Michelle; Zellou, Georgia; Möbius, Bernd

Cross-cultural comparison of gradient emotion perception: Human vs. Alexa TTS voices Inproceedings

Proceedings of Interspeech 2022, pp. 4970-4974, 2022.

This study compares how American (US) and German (DE) listeners perceive emotional expressiveness from Amazon Alexa text-to-speech (TTS) and human voices. Participants heard identical stimuli, manipulated from an emotionally ‘neutral’ production to three levels of increased happiness generated by resynthesis. Results show that, for both groups, ‘happiness’ manipulations lead to higher ratings of emotional valence (i.e., more positive) for the human voice. Moreover, there was a difference across the groups in their perception of arousal (i.e., excitement): US listeners show higher ratings for human voices with manipulations, while DE listeners perceive the Alexa voice as sounding less ‘excited’ overall. We discuss these findings in terms of theories of cross-cultural emotion perception and human-computer interaction.

@inproceedings{Gessinger/etal:2022a,
title = {Cross-cultural comparison of gradient emotion perception: Human vs. Alexa TTS voices},
author = {Iona Gessinger and Michelle Cohn and Georgia Zellou and Bernd M{\"o}bius},
url = {https://www.isca-speech.org/archive/interspeech_2022/gessinger22_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2022-146},
year = {2022},
date = {2022},
booktitle = {Proceedings of Interspeech 2022},
pages = {4970-4974},
abstract = {This study compares how American (US) and German (DE) listeners perceive emotional expressiveness from Amazon Alexa text-to-speech (TTS) and human voices. Participants heard identical stimuli, manipulated from an emotionally ‘neutral’ production to three levels of increased happiness generated by resynthesis. Results show that, for both groups, ‘happiness’ manipulations lead to higher ratings of emotional valence (i.e., more positive) for the human voice. Moreover, there was a difference across the groups in their perception of arousal (i.e., excitement): US listeners show higher ratings for human voices with manipulations, while DE listeners perceive the Alexa voice as sounding less ‘excited’ overall. We discuss these findings in terms of theories of cross-cultural emotion perception and human-computer interaction.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Pardo, Jennifer; Pellegrino, Elisa; Dellwo, Volker; Möbius, Bernd

Special issue: Vocal accommodation in speech communication Journal Article

Journal of Phonetics, 95, article 101196, pp. 1-9, 2022.

This introductory article for the Special Issue on Vocal Accommodation in Speech Communication provides an overview of prevailing theories of vocal accommodation and summarizes the ten papers in the collection. Communication Accommodation Theory focusses on social factors evoking accent convergence or divergence, while the Interactive Alignment Model proposes cognitive integration of perception and production as an automatic priming mechanism driving convergence language production. Recent research including most of the papers in this Special Issue indicates that a hybrid or interactive synergy model provides a more comprehensive account of observed patterns of phonetic convergence than purely automatic mechanisms. Some of the fundamental questions that this special collection aimed to cover concerned (1) the nature of vocal accommodation in terms of underlying mechanisms and social functions in human–human and human–computer interaction; (2) the effect of task-specific and talker-specific characteristics (gender, age, personality, linguistic and cultural background, role in interaction) on degree and direction of convergence towards human and computer interlocutors; (3) integration of articulatory, perceptual, neurocognitive, and/or multimodal data to the analysis of acoustic accommodation in interactive and non-interactive speech tasks; and (4) the contribution of short/long-term accommodation in human–human and human–computer interactions to the diffusion of linguistic innovation and ultimately language variation and change.

@article{Pardo_etal22,
title = {Special issue: Vocal accommodation in speech communication},
author = {Jennifer Pardo and Elisa Pellegrino and Volker Dellwo and Bernd M{\"o}bius},
url = {https://www.coli.uni-saarland.de/~moebius/documents/pardo_etal_jphon-si2022.pdf},
year = {2022},
date = {2022},
journal = {Journal of Phonetics},
pages = {1-9},
volume = {95},
note = {Article 101196},
abstract = {This introductory article for the Special Issue on Vocal Accommodation in Speech Communication provides an overview of prevailing theories of vocal accommodation and summarizes the ten papers in the collection. Communication Accommodation Theory focusses on social factors evoking accent convergence or divergence, while the Interactive Alignment Model proposes cognitive integration of perception and production as an automatic priming mechanism driving convergence language production. Recent research including most of the papers in this Special Issue indicates that a hybrid or interactive synergy model provides a more comprehensive account of observed patterns of phonetic convergence than purely automatic mechanisms. Some of the fundamental questions that this special collection aimed to cover concerned (1) the nature of vocal accommodation in terms of underlying mechanisms and social functions in human–human and human–computer interaction; (2) the effect of task-specific and talker-specific characteristics (gender, age, personality, linguistic and cultural background, role in interaction) on degree and direction of convergence towards human and computer interlocutors; (3) integration of articulatory, perceptual, neurocognitive, and/or multimodal data to the analysis of acoustic accommodation in interactive and non-interactive speech tasks; and (4) the contribution of short/long-term accommodation in human–human and human–computer interactions to the diffusion of linguistic innovation and ultimately language variation and change.},
pubstate = {published},
type = {article}
}

Project:   C1

Höller, Daniel; Behnke, Gregor

Encoding Lifted Classical Planning in Propositional Logic Inproceedings

Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, pp. 134-144, 2022.

Planning models are usually defined in lifted, i.e. first order formalisms, while most solvers need (variable-free) grounded representations. Though techniques for grounding prune unnecessary parts of the model, grounding might – nevertheless – be prohibitively expensive in terms of runtime. To overcome this issue, there has been renewed interest in solving planning problems based on the lifted representation in the last years. While these approaches are based on (heuristic) search, we present an encoding of lifted classical planning in propositional logic and use SAT solvers to solve it. Our evaluation shows that our approach is competitive with the heuristic search-based approaches in satisficing planning and outperforms them in a (length-)optimal setting.

@inproceedings{HoellerB22,
title = {Encoding Lifted Classical Planning in Propositional Logic},
author = {Daniel H{\"o}ller and Gregor Behnke},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/19794},
year = {2022},
date = {2022},
booktitle = {Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {134-144},
publisher = {AAAI Press},
abstract = {Planning models are usually defined in lifted, i.e. first order formalisms, while most solvers need (variable-free) grounded representations. Though techniques for grounding prune unnecessary parts of the model, grounding might – nevertheless – be prohibitively expensive in terms of runtime. To overcome this issue, there has been renewed interest in solving planning problems based on the lifted representation in the last years. While these approaches are based on (heuristic) search, we present an encoding of lifted classical planning in propositional logic and use SAT solvers to solve it. Our evaluation shows that our approach is competitive with the heuristic search-based approaches in satisficing planning and outperforms them in a (length-)optimal setting.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Scholman, Merel; Pyatkin, Valentina; Yung, Frances Pik Yu; Dagan, Ido ; Tsarfaty, Reut; Demberg, Vera

Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training Inproceedings

Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, European Language Resources Association, pp. 2148–2156, 2022.

Obtaining linguistic annotation from novice crowdworkers is far from trivial. A case in point is the annotation of discourse relations, which is a complicated task. Recent methods have obtained promising results by extracting relation labels from either discourse connectives (DCs) or question-answer (QA) pairs that participants provide. The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method. In Study 1, workers were not specifically selected or trained, and the results show that there is much room for improvement. Study 2 shows that a combination of selection and training does lead to improved results, but the method is cost- and time-intensive. Study 3 shows that a selection-only approach is a viable alternative; it results in annotations of comparable quality compared to annotations from trained participants. The results generalized over both the DC and QA method and therefore indicate that a selection-only approach could also be effective for other crowdsourced discourse annotation tasks.

@inproceedings{Scholmanet-al22-3,
title = {Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training},
author = {Merel Scholman and Valentina Pyatkin and Frances Pik Yu Yung and Ido Dagan and Reut Tsarfaty and Vera Demberg},
url = {https://aclanthology.org/2022.lrec-1.231/},
year = {2022},
date = {2022},
booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France},
pages = {2148-2156},
publisher = {European Language Resources Association},
abstract = {Obtaining linguistic annotation from novice crowdworkers is far from trivial. A case in point is the annotation of discourse relations, which is a complicated task. Recent methods have obtained promising results by extracting relation labels from either discourse connectives (DCs) or question-answer (QA) pairs that participants provide. The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method. In Study 1, workers were not specifically selected or trained, and the results show that there is much room for improvement. Study 2 shows that a combination of selection and training does lead to improved results, but the method is cost- and time-intensive. Study 3 shows that a selection-only approach is a viable alternative; it results in annotations of comparable quality compared to annotations from trained participants. The results generalized over both the DC and QA method and therefore indicate that a selection-only approach could also be effective for other crowdsourced discourse annotation tasks.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Scholman, Merel; Dong, Tianai; Yung, Frances Pik Yu; Demberg, Vera

DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations Journal Article

Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 22), Marseille, France, pp. 3281-3290, 2022.

We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations.

@article{Scholman_et-al22.2,
title = {DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations},
author = {Merel Scholman and Tianai Dong and Frances Pik Yu Yung and Vera Demberg},
url = {https://aclanthology.org/2022.lrec-1.351/},
year = {2022},
date = {2022},
journal = {Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 22), Marseille, France},
pages = {3281-3290},
abstract = {We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations.},
pubstate = {published},
type = {article}
}

Project:   B2
