Publications

Abdullah, Badr M.; Zaitova, Iuliia; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings Inproceedings

Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, pp. 407-419, 2021.

How do neural networks “perceive” speech sounds from unknown languages? Does the typological similarity between the model’s training language (L1) and an unknown language (L2) have an impact on the model representations of L2 speech signals? To answer these questions, we present a novel experimental design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs)—vector representations of variable-duration spoken-word segments. First, we train monolingual AWE models on seven Indo-European languages with various degrees of typological similarity. We then employ RSA to quantify the cross-lingual similarity by simulating native and non-native spoken-word processing using AWEs. Our experiments show that typological similarity indeed affects the representational similarity of the models in our study. We further discuss the implications of our work on modeling speech processing and language similarity with neural networks.
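
A minimal sketch of the RSA procedure described above, assuming two NumPy arrays of acoustic word embeddings (one per monolingual model) over the same list of spoken-word stimuli; the array names and toy data are hypothetical stand-ins, not the paper's actual setup:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(awes_l1, awes_l2):
    # Representational dissimilarity matrices, condensed to their upper triangles.
    rdm_l1 = pdist(awes_l1, metric="cosine")
    rdm_l2 = pdist(awes_l2, metric="cosine")
    # Second-order similarity: correlate the two dissimilarity structures.
    rho, _ = spearmanr(rdm_l1, rdm_l2)
    return rho

# Toy usage: random embeddings stand in for real AWEs of 50 words.
rng = np.random.default_rng(0)
print(rsa_similarity(rng.normal(size=(50, 128)), rng.normal(size=(50, 128))))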

@inproceedings{abdullah-etal-2021-familiar,
title = {How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings},
author = {Badr M. Abdullah and Iuliia Zaitova and Tania Avgustinova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://aclanthology.org/2021.blackboxnlp-1.32/},
doi = {https://doi.org/10.18653/v1/2021.blackboxnlp-1.32},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
pages = {407-419},
publisher = {Association for Computational Linguistics},
abstract = {How do neural networks “perceive” speech sounds from unknown languages? Does the typological similarity between the model’s training language (L1) and an unknown language (L2) have an impact on the model representations of L2 speech signals? To answer these questions, we present a novel experimental design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs)—vector representations of variable-duration spoken-word segments. First, we train monolingual AWE models on seven Indo-European languages with various degrees of typological similarity. We then employ RSA to quantify the cross-lingual similarity by simulating native and non-native spoken-word processing using AWEs. Our experiments show that typological similarity indeed affects the representational similarity of the models in our study. We further discuss the implications of our work on modeling speech processing and language similarity with neural networks.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Zouhar, Vilém; Mosbach, Marius; Biswas, Debanjali; Klakow, Dietrich

Artefact Retrieval: Overview of NLP Models with Knowledge Base Access Inproceedings

Workshop on Commonsense Reasoning and Knowledge Bases, 2021.

Many NLP models gain performance by having access to a knowledge base. A lot of research has been devoted to devising and improving the way the knowledge base is accessed and incorporated into the model, resulting in a number of mechanisms and pipelines. Despite the diversity of proposed mechanisms, there are patterns in the designs of such systems. In this paper, we systematically describe the typology of *artefacts* (items retrieved from a knowledge base), retrieval mechanisms and the way these artefacts are *fused* into the model. This further allows us to uncover combinations of design decisions that have not yet been tried. Most of the focus is given to language models, though we also show how question answering, fact-checking and knowledgeable dialogue models fit into this system. Having an abstract model which can describe the architecture of specific models also helps with transferring these architectures between multiple NLP tasks.
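
The retrieve-then-fuse pattern the typology describes can be sketched schematically; the toy knowledge base, lexical-overlap retriever, and input-level fusion below are illustrative placeholders under those assumptions, not the paper's implementation:

from dataclasses import dataclass

@dataclass
class Artefact:
    text: str      # item retrieved from the knowledge base
    score: float   # retrieval score

def retrieve(query, kb, k=2):
    # Toy lexical-overlap retriever; real systems use sparse or dense retrieval.
    def overlap(doc):
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)
    ranked = sorted(kb, key=overlap, reverse=True)[:k]
    return [Artefact(doc, overlap(doc)) for doc in ranked]

def fuse_into_input(query, artefacts):
    # Input-level ("early") fusion: prepend artefacts to the model input.
    # Other designs fuse at intermediate layers or into the output distribution.
    context = " ".join(a.text for a in artefacts)
    return f"{context} [SEP] {query}"

kb = ["Saarbruecken is a city in Germany", "BERT is a masked language model"]
print(fuse_into_input("Where is Saarbruecken ?", retrieve("Where is Saarbruecken ?", kb)))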

@inproceedings{zouhar2021artefact,
title = {Artefact Retrieval: Overview of NLP Models with Knowledge Base Access},
author = {Vil{\'e}m Zouhar and Marius Mosbach and Debanjali Biswas and Dietrich Klakow},
url = {https://arxiv.org/abs/2201.09651},
year = {2021},
date = {2021},
booktitle = {Workshop on Commonsense Reasoning and Knowledge Bases},
abstract = {Many NLP models gain performance by having access to a knowledge base. A lot of research has been devoted to devising and improving the way the knowledge base is accessed and incorporated into the model, resulting in a number of mechanisms and pipelines. Despite the diversity of proposed mechanisms, there are patterns in the designs of such systems. In this paper, we systematically describe the typology of *artefacts* (items retrieved from a knowledge base), retrieval mechanisms and the way these artefacts are *fused* into the model. This further allows us to uncover combinations of design decisions that had not yet been tried. Most of the focus is given to language models, though we also show how question answering, fact-checking and knowledgable dialogue models fit into this system as well. Having an abstract model which can describe the architecture of specific models also helps with transferring these architectures between multiple NLP tasks.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Hoek, Jet; Scholman, Merel; Sanders, Ted J. M.

Is there less agreement when the discourse is underspecified? Inproceedings

Proceedings of the Integrating Perspectives on Discourse Annotation (DiscAnn) Workshop, University of Tübingen, Germany, 2021.

When annotating coherence relations, inter-annotator agreement tends to be lower on implicit relations than on relations that are explicitly marked by means of a connective or a cue phrase. This paper explores one possible explanation for this: the additional inferencing involved in interpreting implicit relations compared to explicit relations. If this is the main source of disagreements, agreement should be highly related to the specificity of the connective. Using the CCR framework, we annotated relations from TED talks that were marked by a very specific marker, marked by a highly ambiguous connective, or not marked by means of a connective at all. We indeed reached higher inter-annotator agreement on explicit than on implicit relations. However, agreement on underspecified relations was not necessarily in between, which is what would be expected if agreement on implicit relations mainly suffers because annotators have less specific instructions for inferring the relation.
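
As a rough illustration of the agreement comparison, one can compute Cohen's kappa per marking condition; the label lists below are invented toy data, not the study's annotations:

from sklearn.metrics import cohen_kappa_score

annotations = {
    # condition -> (annotator 1 labels, annotator 2 labels), CCR-style classes
    "explicit":       (["causal", "temporal", "causal"], ["causal", "temporal", "causal"]),
    "underspecified": (["causal", "additive", "causal"], ["causal", "causal", "causal"]),
    "implicit":       (["additive", "causal", "temporal"], ["causal", "causal", "temporal"]),
}

for condition, (a1, a2) in annotations.items():
    # Chance-corrected agreement between the two annotators in this condition.
    print(condition, round(cohen_kappa_score(a1, a2), 2))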

@inproceedings{hoek-etal-2021-discann,
title = {Is there less agreement when the discourse is underspecified?},
author = {Jet Hoek and Merel Scholman and Ted J. M. Sanders},
url = {https://aclanthology.org/2021.discann-1.1/},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Integrating Perspectives on Discourse Annotation (DiscAnn) Workshop},
address = {University of T{\"u}bingen, Germany},
abstract = {When annotating coherence relations, interannotator agreement tends to be lower on implicit relations than on relations that are explicitly marked by means of a connective or a cue phrase. This paper explores one possible explanation for this: the additional inferencing involved in interpreting implicit relations compared to explicit relations. If this is the main source of disagreements, agreement should be highly related to the specificity of the connective. Using the CCR framework, we annotated relations from TED talks that were marked by a very specific marker, marked by a highly ambiguous connective, or not marked by means of a connective at all. We indeed reached higher inter-annotator agreement on explicit than on implicit relations. However, agreement on underspecified relations was not necessarily in between, which is what would be expected if agreement on implicit relations mainly suffers because annotators have less specific instructions for inferring the relation.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Yung, Frances Pik Yu; Scholman, Merel; Demberg, Vera

A practical perspective on connective generation Inproceedings

Proceedings of the Second Workshop on Computational Approaches to Discourse (CODI), Association for Computational Linguistics, pp. 72-83, Punta Cana, Dominican Republic and Online, 2021.

In data-driven natural language generation, we typically know what relation should be expressed and need to select a connective to lexicalize it. In the current contribution, we analyse whether a sophisticated connective generation module is necessary to select a connective, or whether this can be solved with simple methods (such as random choice between connectives that are known to express a given relation, or usage of a generic language model). Comparing these methods to the distributions of connective choices from a human connective insertion task, we find mixed results: for some relations, it is acceptable to lexicalize them using any of the connectives that mark this relation. However, for other relations (temporals, concessives) either a more detailed relation distinction needs to be introduced, or a more sophisticated connective choice module would be necessary.
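
The "simple methods" the paper tests can be sketched directly; the connective inventory and frequencies below are invented for illustration, not the paper's data:

import random

CONNECTIVES = {
    "concession": {"although": 0.5, "even though": 0.3, "but": 0.2},
    "temporal":   {"then": 0.4, "afterwards": 0.2, "when": 0.4},
}

def random_choice(relation):
    # Baseline 1: any connective known to mark the relation, uniformly at random.
    return random.choice(list(CONNECTIVES[relation]))

def frequency_weighted_choice(relation):
    # Baseline 2: sample connectives in proportion to their corpus frequency.
    options = CONNECTIVES[relation]
    return random.choices(list(options), weights=list(options.values()), k=1)[0]

random.seed(1)
print(random_choice("concession"), "|", frequency_weighted_choice("temporal"))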

@inproceedings{yung-etal-2021-practical,
title = {A practical perspective on connective generation},
author = {Frances Pik Yu Yung and Merel Scholman and Vera Demberg},
url = {https://aclanthology.org/2021.codi-main.7},
doi = {https://doi.org/10.18653/v1/2021.codi-main.7},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Second Workshop on Computational Approaches to Discourse (CODI)},
pages = {72-83},
publisher = {Association for Computational Linguistics},
address = {Punta Cana, Dominican Republic and Online},
abstract = {In data-driven natural language generation, we typically know what relation should be expressed and need to select a connective to lexicalize it. In the current contribution, we analyse whether a sophisticated connective generation module is necessary to select a connective, or whether this can be solved with simple methods (such as random choice between connectives that are known to express a given relation, or usage of a generic language model). Comparing these methods to the distributions of connective choices from a human connective insertion task, we find mixed results: for some relations, it is acceptable to lexicalize them using any of the connectives that mark this relation. However, for other relations (temporals, concessives) either a more detailed relation distinction needs to be introduced, or a more sophisticated connective choice module would be necessary.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Scholman, Merel; Dong, Tianai; Yung, Frances Pik Yu; Demberg, Vera

Comparison of methods for explicit discourse connective identification across various domains Inproceedings

Proceedings of the Second Workshop on Computational Approaches to Discourse (CODI), Association for Computational Linguistics, pp. 95-106, Punta Cana, Dominican Republic and Online, 2021.

Existing parse methods use varying approaches to identify explicit discourse connectives, but their performance has not been consistently evaluated in comparison to each other, nor have they been evaluated consistently on text other than newspaper articles. We here assess the performance on explicit connective identification of three parse methods (PDTB e2e, Lin et al., 2014; the winner of CONLL2015, Wang et al., 2015; and DisSent, Nie et al., 2019), along with a simple heuristic. We also examine how well these systems generalize to different datasets, namely written newspaper text (PDTB), written scientific text (BioDRB), prepared spoken text (TED-MDB) and spontaneous spoken text (Disco-SPICE). The results show that the e2e parser outperforms the other parse methods in all datasets. However, performance drops significantly from the PDTB to all other datasets. We provide a more fine-grained analysis of domain differences and connectives that prove difficult to parse, in order to highlight the areas where gains can be made.
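
The simple heuristic can be approximated by lexicon lookup; the miniature lexicon below is illustrative (PDTB-style lexicons list on the order of a hundred connectives):

LEXICON = {"however", "because", "but", "as a result", "for example"}
MAX_LEN = max(len(c.split()) for c in LEXICON)

def find_connectives(sentence):
    # Flag every token span that matches a lexicon entry, preferring longer matches.
    tokens = sentence.lower().replace(",", " ").split()
    hits = []
    for i in range(len(tokens)):
        for n in range(MAX_LEN, 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in LEXICON:
                hits.append(span)
                break
    return hits

print(find_connectives("He left early because, as a result, traffic was bad."))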

@inproceedings{scholman-etal-2021-comparison,
title = {Comparison of methods for explicit discourse connective identification across various domains},
author = {Merel Scholman and Tianai Dong and Frances Pik Yu Yung and Vera Demberg},
url = {https://aclanthology.org/2021.codi-main.9},
doi = {https://doi.org/10.18653/v1/2021.codi-main.9},
year = {2021},
date = {2021},
booktitle = {Proceedings of the Second Workshop on Computational Approaches to Discourse (CODI)},
pages = {95-106},
publisher = {Association for Computational Linguistics},
address = {Punta Cana, Dominican Republic and Online},
abstract = {Existing parse methods use varying approaches to identify explicit discourse connectives, but their performance has not been consistently evaluated in comparison to each other, nor have they been evaluated consistently on text other than newspaper articles. We here assess the performance on explicit connective identification of three parse methods (PDTB e2e, Lin et al., 2014; the winner of CONLL2015, Wang et al., 2015; and DisSent, Nie et al., 2019), along with a simple heuristic. We also examine how well these systems generalize to different datasets, namely written newspaper text (PDTB), written scientific text (BioDRB), prepared spoken text (TED-MDB) and spontaneous spoken text (Disco-SPICE). The results show that the e2e parser outperforms the other parse methods in all datasets. However, performance drops significantly from the PDTB to all other datasets. We provide a more fine-grained analysis of domain differences and connectives that prove difficult to parse, in order to highlight the areas where gains can be made.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Lemke, Tyll Robin

Satzäquivalente — Syntax oder Pragmatik? Incollection

Külpmann, Robert; Finkbeiner, Rita (Eds.): Neues zur Selbstständigkeit von Sätzen, Linguistische Berichte Sonderheft, Buske, pp. 81-104, Hamburg, 2021, ISBN 978-3-96769-170-2.

"Sentence equivalents" appear to present a contradiction between syntax and pragmatics, since despite their non-sentential form they fulfil the same functions as sentences. We present two experiments that test predictions of two theoretical perspectives on these expressions. On the one hand, ellipsis accounts (Morgan, 1973; Merchant, 2004; Reich, 2007) derive sentence equivalents from complete sentences via ellipsis; on the other, non-sentential accounts (Barton & Progovac, 2005; Stainton, 2006) propose that the syntax can generate subsentential expressions directly.

@incollection{Lemke2021a,
title = {Satz{\"a}quivalente — Syntax oder Pragmatik?},
author = {Tyll Robin Lemke},
editor = {Robert K{\"u}lpmann and Rita Finkbeiner},
url = {https://buske.de/zeitschriften-bei-sonderhefte/linguistische-berichte-sonderhefte/neues-zur-selbststandigkeit-von-satzen-16620.html},
doi = {https://doi.org/10.46771/978-3-96769-170-2},
year = {2021},
date = {2021},
booktitle = {Neues zur Selbstst{\"a}ndigkeit von S{\"a}tzen},
isbn = {978-3-96769-170-2},
pages = {81-104},
publisher = {Buske},
address = {Hamburg},
abstract = {"Satz{\"a}quivalente" scheinen einen Widerspruch zwischen Syntax und Pragmatik darzustellen, da sie trotz nichtsententialer Form die selben Funktionen wie S{\"a}tze erf{\"u}llen. Wir stellen zwei Experimente vor, die Vorhersagen zweier theoretischer Perspektiven auf diese Ausdr{\"u}cke untersuchen. Einerseits generieren elliptische Ans{\"a}tze (Morgan, 1973; Merchant, 2004; Reich, 2007) Satz{\"a}quivalente mittels Ellipse aus vollst{\"a}ndigen S{\"a}tzen, andererseits schlagen nichtsententiale Ans{\"a}tze (Barton & Progovac, 2005; Stainton, 2006) vor, dass die Syntax subsententiale Ausdr{\"u}cke generieren kann.},
pubstate = {published},
type = {incollection}
}

Project:   B3

Lemke, Tyll Robin

Experimental investigations on the syntax and usage of fragments Miscellaneous

Open Germanic Linguistics, Language Science Press, Berlin, 2021.

This book investigates the syntax and usage of fragments (Morgan 1973), apparently subsentential utterances like „A coffee, please!“ which fulfill the same communicative function as the corresponding full sentence „I’d like to have a coffee, please!“. Even though such utterances are frequently used, they challenge the central role that has been attributed to the notion of sentence in linguistic theory, particularly from a semantic perspective.

The first part of the book is dedicated to the syntactic analysis of fragments, which is investigated with experimental methods. Currently, there are several competing theoretical analyses of fragments, which rely almost exclusively on introspective data. The experiments presented in this book constitute a first systematic evaluation of some of their crucial predictions and, taken together, support an in situ ellipsis account of fragments, as has been suggested by Reich (2007).

The second part of the book addresses the questions of why fragments are used at all, and under which circumstances they are preferred over complete sentences. Syntactic accounts impose licensing conditions on fragments, but they do not explain why fragments are sometimes (dis)preferred provided that their usage is licensed. This book proposes an information-theoretic account of fragments, which predicts that the usage of fragments is constrained by a general tendency to distribute processing effort uniformly across the utterance. With respect to fragments, this leads to two predictions, which are empirically confirmed: speakers tend to omit predictable words, and they insert additional redundancy before unpredictable words.
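
The information-theoretic quantity behind this account is surprisal, -log2 P(word | context); the toy bigram model below stands in for a real language model and uses an invented mini-corpus:

import math
from collections import Counter

corpus = "i would like a coffee please . a coffee please . i would like tea please .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word):
    # Add-one smoothing so unseen bigrams still get a finite surprisal.
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))
    return -math.log2(p)

utterance = "i would like a coffee please".split()
for prev, word in zip(utterance, utterance[1:]):
    print(f"{word:>7}: {surprisal(prev, word):.2f} bits")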

@miscellaneous{Lemke2021,
title = {Experimental investigations on the syntax and usage of fragments},
author = {Tyll Robin Lemke},
url = {https://langsci-press.org/catalog/book/321},
doi = {https://doi.org/10.5281/zenodo.5596236},
year = {2021},
date = {2021},
booktitle = {Experimental investigations on the syntax and usage of fragments},
publisher = {Language Science Press},
address = {Berlin},
abstract = {This book investigates the syntax and usage of fragments (Morgan 1973), apparently subsentential utterances like "A coffee, please!" which fulfill the same communicative function as the corresponding full sentence "I'd like to have a coffee, please!". Even though such utterances are frequently used, they challenge the central role that has been attributed to the notion of sentence in linguistic theory, particularly from a semantic perspective. The first part of the book is dedicated to the syntactic analysis of fragments, which is investigated with experimental methods. Currently there are several competing theoretical analyses of fragments, which rely almost only on introspective data. The experiments presented in this book constitute a first systematic evaluation of some of their crucial predictions and, taken together, support an in situ ellipsis account of fragments, as has been suggested by Reich (2007). The second part of the book addresses the questions of why fragments are used at all, and under which circumstances they are preferred over complete sentences. Syntactic accounts impose licensing conditions on fragments, but they do not explain, why fragments are sometimes (dis)preferred provided that their usage is licensed. This book proposes an information-theoretic account of fragments, which predicts that the usage of fragments in constrained by a general tendency to distribute processing effort uniformly across the utterance. With respect to fragments, this leads to two predictions, which are empirically confirmed: Speakers tend towards omitting predictable words and they insert additional redundancy before unpredictable words.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B3

Kalimuthu, Marimuthu; Mogadala, Aditya; Mosbach, Marius; Klakow, Dietrich

Fusion Models for Improved Image Captioning Inproceedings

Pattern Recognition. ICPR International Workshops and Challenges, pp. 381-395, Cham, 2020.

Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models while also rendering them liable to making mistakes. Language models can, however, be trained on vast amounts of freely available unlabelled data and have recently emerged as successful language encoders and coherent text generators. Meanwhile, several unimodal and multimodal fusion techniques have been proven to work well for natural language generation and automatic speech recognition. Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work in this paper is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation where we utilize different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) within the traditional encoder-decoder visual captioning frameworks. Next, we employ the same fusion strategies to integrate a pretrained Masked Language Model (MLM), namely BERT, with a visual captioning model, viz. Show, Attend, and Tell, for emending both syntactic and semantic errors in captions. Our caption emendation experiments on three benchmark image captioning datasets, viz. Flickr8k, Flickr30k, and MSCOCO, show improvements over the baseline, indicating the usefulness of our proposed multimodal fusion strategies. Further, we perform a preliminary qualitative analysis on the emended captions and identify error categories based on the type of corrections.
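
One member of the family of fusion strategies, output-level (late) fusion, can be sketched as a convex combination of next-token distributions; the random logits below are stand-ins for real decoder and AuxLM outputs, and the mixing weight is hypothetical:

import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def late_fusion(decoder_logits, auxlm_logits, alpha=0.7):
    # Convex combination of the captioning decoder's and the auxiliary
    # language model's next-token distributions.
    return alpha * softmax(decoder_logits) + (1 - alpha) * softmax(auxlm_logits)

rng = np.random.default_rng(0)
vocab = 10
p = late_fusion(rng.normal(size=vocab), rng.normal(size=vocab))
print(p.round(3), p.sum())  # still a valid distribution over the vocabulary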

@inproceedings{Kalimuthu2021fusion,
title = {Fusion Models for Improved Image Captioning},
author = {Marimuthu Kalimuthu and Aditya Mogadala and Marius Mosbach and Dietrich Klakow},
url = {https://arxiv.org/abs/2010.15251},
doi = {https://doi.org/10.1007/978-3-030-68780-9_32},
year = {2020},
date = {2020},
booktitle = {Pattern Recognition. ICPR International Workshops and Challenges},
pages = {381-395},
address = {Cham},
abstract = {Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models while also rendering them liable to making mistakes. Language models can, however, be trained on vast amounts of freely available unlabelled data and have recently emerged as successful language encoders and coherent text generators. Meanwhile, several unimodal and multimodal fusion techniques have been proven to work well for natural language generation and automatic speech recognition. Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work in this paper is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation where we utilize different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) within the traditional encoder-decoder visual captioning frameworks. Next, we employ the same fusion strategies to integrate a pretrained Masked Language Model (MLM), namely BERT, with a visual captioning model, viz. Show, Attend, and Tell, for emending both syntactic and semantic errors in captions. Our caption emendation experiments on three benchmark image captioning datasets, viz. Flickr8k, Flickr30k, and MSCOCO, show improvements over the baseline, indicating the usefulness of our proposed multimodal fusion strategies. Further, we perform a preliminary qualitative analysis on the emended captions and identify error categories based on the type of corrections.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Mogadala, Aditya; Mosbach, Marius; Klakow, Dietrich

Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation Inproceedings

Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond, Workshop at ICML, 2020.

Generating longer textual sequences conditioned on visual information is an interesting problem to explore. The challenge goes beyond standard vision-conditioned sentence-level generation (e.g., image or video captioning), as it requires producing a brief and coherent story describing the visual content. In this paper, we cast this Vision-to-Sequence task as a Graph-to-Sequence learning problem and approach it with the Transformer architecture. To be specific, we introduce the Sparse Graph-to-Sequence Transformer (SGST) for encoding the graph and decoding a sequence. The encoder aims to directly encode graph-level semantics, while the decoder is used to generate longer sequences. Experiments conducted with the benchmark image paragraph dataset show that our proposed approach achieves a 13.3% improvement on the CIDEr evaluation measure compared to the previous state-of-the-art approach.
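
The core of a sparse graph encoder can be sketched as self-attention masked by the graph's adjacency, so each node attends only to its neighbours and itself; the shapes and the toy path graph below are illustrative, not the SGST architecture itself:

import numpy as np

def masked_attention(X, adj):
    # Single-head attention where scores outside graph edges are set to -inf.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # (n, n) attention scores
    scores = np.where(adj > 0, scores, -np.inf)   # keep only graph edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X                            # neighbourhood-aware node states

n, d = 4, 8
# Path graph with self-loops: each node sees itself and its two neighbours.
adj = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
rng = np.random.default_rng(0)
print(masked_attention(rng.normal(size=(n, d)), adj).shape)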

@inproceedings{mogadala2020sparse,
title = {Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation},
author = {Aditya Mogadala and Marius Mosbach and Dietrich Klakow},
url = {https://arxiv.org/abs/2007.06077},
year = {2020},
date = {2020},
booktitle = {Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond, Workshop at ICML},
abstract = {Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore. The challenge here proliferate over the standard vision conditioned sentence-level generation (e.g., image or video captioning) as it requires to produce a brief and coherent story describing the visual content. In this paper, we mask this Vision-to-Sequence as Graph-to-Sequence learning problem and approach it with the Transformer architecture. To be specific, we introduce Sparse Graph-to-Sequence Transformer (SGST) for encoding the graph and decoding a sequence. The encoder aims to directly encode graph-level semantics, while the decoder is used to generate longer sequences. Experiments conducted with the benchmark image paragraph dataset show that our proposed achieve 13.3% improvement on the CIDEr evaluation measure when comparing to the previous state-of-the-art approach.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Ferber, Patrick; Hoffmann, Jörg; Helmert, Malte

Neural network heuristics for classical planning: A study of hyperparameter space Inproceedings

24th European Conference on Artificial Intelligence (ECAI’20), 2020.

Neural networks (NN) have been shown to be powerful state-value predictors in several complex games. Can similar successes be achieved in classical planning? Towards a systematic exploration of that question, we contribute a study of hyperparameter space in the most canonical setup: input = state, feed-forward NN, supervised learning, generalization only over initial state. We investigate a broad range of hyperparameters pertaining to NN design and training. We evaluate these techniques through their use as heuristic functions in Fast Downward. The results on IPC benchmarks show that highly competitive heuristics can be learned, yielding substantially smaller search spaces than standard techniques on some domains. But the heuristic functions are costly to evaluate, and the range of domains where useful heuristics are learned is limited. Our study provides the basis for further research improving on current weaknesses.
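
The canonical setup the study describes (state in, cost-to-go out) can be sketched with a small feed-forward regressor; the binary state encodings and synthetic costs below are invented stand-ins for planning states and their true goal distances:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=(500, 32)).astype(float)     # propositional state encodings
costs = states.sum(axis=1) + rng.normal(scale=0.5, size=500)  # stand-in for cost-to-go h*(s)

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
net.fit(states, costs)

def heuristic(state):
    # Would be plugged into greedy best-first search or A* in place of a
    # classical heuristic; each call requires a (costly) network evaluation.
    return float(net.predict(state.reshape(1, -1))[0])

print(heuristic(states[0]), costs[0])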

@inproceedings{Ferber2020network,
title = {Neural network heuristics for classical planning: A study of hyperparameter space},
author = {Patrick Ferber and J{\"o}rg Hoffmann and Malte Helmert},
url = {https://ecai2020.eu/papers/433_paper.pdf},
year = {2020},
date = {2020},
booktitle = {24th European Conference on Artificial Intelligence (ECAI’20)},
abstract = {Neural networks (NN) have been shown to be powerful state-value predictors in several complex games. Can similar successes be achieved in classical planning? Towards a systematic exploration of that question, we contribute a study of hyperparameter space in the most canonical setup: input = state, feed-forward NN, supervised learning, generalization only over initial state. We investigate a broad range of hyperparameters pertaining to NN design and training. We evaluate these techniques through their use as heuristic functions in Fast Downward. The results on IPC benchmarks show that highly competitive heuristics can be learned, yielding substantially smaller search spaces than standard techniques on some domains. But the heuristic functions are costly to evaluate, and the range of domains where useful heuristics are learned is limited. Our study provides the basis for further research improving on current weaknesses.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Mecklinger, Axel; Bader, Regine

From fluency to recognition decisions: A broader view of familiarity-based remembering Journal Article

Neuropsychologia, 146, Article 107527, 2020.

The goal of this article is to critically examine current claims and assumptions about the FN400, an event-related potential (ERP) component which has been related to familiarity memory, though some uncertainty exists regarding the cognitive processes captured by the FN400. It is proposed that familiarity can be multiply determined and that an important distinction has to be made between a recent-exposure, relative familiarity mechanism indexed by the FN400 and an absolute/baseline familiarity mechanism reflected by a coincidental but topographically distinct ERP effect. We suggest a broader conceptualization of the memory processes reflected by the FN400 and propose an unexpected fluency-attribution account of familiarity, according to which familiarity results from a fast assessment of ongoing processing fluency relative to previous events or current expectations. The computations underlying fluency attribution may be closely related to those characterizing the relative familiarity mechanism underlying the FN400. We also argue that concerted activation of the perirhinal cortex (PrC) and the lateral prefrontal cortex (PFC) plays a pivotal role for fluency attributions and the generation of the FN400.

@article{MecklingerBader2020,
title = {From fluency to recognition decisions: A broader view of familiarity-based remembering},
author = {Axel Mecklinger and Regine Bader},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0028393220302001},
doi = {https://doi.org/10.1016/j.neuropsychologia.2020.107527},
year = {2020},
date = {2020},
journal = {Neuropsychologia},
pages = {107527},
volume = {146},
abstract = {The goal of this article is to critically examine current claims and assumptions about the FN400, an event-related potential (ERP) component which has been related to familiarity memory though some uncertainty exists regarding the cognitive processes captured by the FN400. It is proposed that familiarity can be multiply determined and that an important distinction has to be made between a recent-exposure, relative familiarity mechanism indexed by the FN400 and an absolute/baseline familiarity mechanism being reflected by a coincidental but topographically distinct ERP effect. We suggest a broader conceptualization of the memory processes reflected by the FN400 and propose an unexpected fluency-attribution account of familiarity according to which familiarity results from a fast assessment of ongoing processing fluency relative to previous events or current expectations. The computations underlying fluency attribution may be closely related to those characterizing the relative familiarity mechanism underlying the FN400. We also argue that concerted activation of the perirhinal cortex (PrC) and the lateral prefrontal cortex (PFC) plays a pivotal role for fluency attributions and the generation of the FN400.},
pubstate = {published},
type = {article}
}

Project:   A6

Höltje, Gerrit; Mecklinger, Axel

Feedback timing modulates interactions between reward learning and memory encoding: Evidence from event-related potentials Journal Article

Cognitive, Affective, & Behavioral Neuroscience, 20(2), pp. 250-264, 2020.

Feedback-based learning relies on a procedural learning system driven by reward prediction errors (RPEs). The processing of temporally delayed feedback is supported by brain structures associated with declarative memory processes, but it is still unknown how delayed feedback processing and memory encoding interact. In this study, a subsequent memory paradigm was employed to investigate how the incidental encoding of feedback pictures presented with a short (SD, 500 ms) or long (LD, 6500 ms) delay in a probabilistic learning task affects the event-related potential (ERP) correlate of RPEs (i.e., the feedback-related negativity; FRN). In an ensuing test phase, a surprise recognition memory test for the feedback pictures was conducted. FRN amplitudes measured in the feedback-locked ERPs recorded during the learning phase (FRNpeak) and in the negative minus positive feedback difference wave (FRNdiff) were compared for subsequently remembered and forgotten feedback pictures. Feedback processing as reflected in the FRNpeak was diminished for remembered LD feedback pictures, indicating that delayed feedback processing and memory encoding competed for similar neural processing resources. As evidenced by large FRNdiff amplitudes in the SD condition, the evaluation of shortly delayed feedback strongly relied on the procedural learning system. A complementary model-based single trial analysis was conducted to validate models of the functional significance of the FRN. Consistent with previous studies, feedback-locked N170 and P300 amplitudes were sensitive to feedback delay. In the test phase, memory for LD feedback pictures was better than for SD pictures and accompanied by a late old-new effect, presumably reflecting extended recollective processing.
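
The two FRN measures can be sketched on synthetic trial data: the feedback-locked peak (FRNpeak) and the negative-minus-positive difference wave (FRNdiff); the time window and arrays below are illustrative, not the study's recordings:

import numpy as np

rng = np.random.default_rng(0)
times = np.arange(-100, 500, 2)                  # ms relative to feedback onset
neg_trials = rng.normal(size=(40, times.size))         # (trials, samples), negative feedback
pos_trials = rng.normal(size=(40, times.size)) + 0.3   # positive feedback, shifted for illustration

erp_neg, erp_pos = neg_trials.mean(axis=0), pos_trials.mean(axis=0)
frn_diff = erp_neg - erp_pos                     # negative minus positive difference wave

window = (times >= 230) & (times <= 330)         # an illustrative FRN time window
frn_peak = erp_neg[window].min()                 # most negative point in the window
print(round(frn_peak, 2), round(frn_diff[window].mean(), 2))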

@article{hoeltje2020feedback,
title = {Feedback timing modulates interactions between reward learning and memory encoding: Evidence from event-related potentials},
author = {Gerrit H{\"o}ltje and Axel Mecklinger},
url = {https://pubmed.ncbi.nlm.nih.gov/31900874/},
doi = {https://doi.org/10.3758/s13415-019-00765-5},
year = {2020},
date = {2020},
journal = {Cognitive, Affective and Behavioral Neuroscience},
pages = {250-264},
volume = {20},
number = {2},
abstract = {Feedback-based learning relies on a procedural learning system driven by reward prediction errors (RPEs). The processing of temporally delayed feedback is supported by brain structures associated with declarative memory processes, but it is still unknown how delayed feedback processing and memory encoding interact. In this study, a subsequent memory paradigm was employed to investigate how the incidental encoding of feedback pictures presented with a short (SD, 500 ms) or long (LD, 6500 ms) delay in a probabilistic learning task affects the event-related potential (ERP) correlate of RPEs (i.e., the feedback-related negativity; FRN). In an ensuing test phase, a surprise recognition memory test for the feedback pictures was conducted. FRN amplitudes measured in the feedback-locked ERPs recorded during the learning phase (FRNpeak) and in the negative minus positive feedback difference wave (FRNdiff) were compared for subsequently remembered and forgotten feedback pictures. Feedback processing as reflected in the FRNpeak was diminished for remembered LD feedback pictures, indicating that delayed feedback processing and memory encoding competed for similar neural processing resources. As evidenced by large FRNdiff amplitudes in the SD condition, the evaluation of shortly delayed feedback strongly relied on the procedural learning system. A complementary model-based single trial analysis was conducted to validate models of the functional significance of the FRN. Consistent with previous studies, feedback-locked N170 and P300 amplitudes were sensitive to feedback delay. In the test phase, memory for LD feedback pictures was better than for SD pictures and accompanied by a late old-new effect, presumably reflecting extended recollective processing.},
pubstate = {published},
type = {article}
}

Project:   A6

Ortmann, Katrin

Automatic Topological Field Identification in (Historical) German Texts Inproceedings

Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 10-18, Barcelona, Spain (online), 2020.

For the study of certain linguistic phenomena and their development over time, large amounts of textual data must be enriched with relevant annotations. Since the manual creation of such annotations requires a lot of effort, automating the process with NLP methods would be convenient. But the required amounts of training data are usually not available for non-standard or historical language. The present study investigates whether models trained on modern newspaper text can be used to automatically identify topological fields, i.e. syntactic structures, in different modern and historical German texts. The evaluation shows that, in general, it is possible to transfer a parser model to other registers or time periods with overall F1-scores >92%. However, an error analysis makes clear that additional rules and domain-specific training data would be beneficial if sentence structures differ significantly from the training data, e.g. in the case of Early New High German.
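
The evaluation rests on span-level F1 over predicted field annotations; a minimal sketch with invented gold and predicted spans (labels such as VF = Vorfeld, LK = linke Klammer, MF = Mittelfeld):

def span_f1(gold, pred):
    # Exact-match F1 over labelled spans.
    tp = len(gold & pred)
    if not gold or not pred or not tp:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Spans as (label, start_token, end_token) triples for one sentence.
gold = {("VF", 0, 1), ("LK", 1, 2), ("MF", 2, 5)}
pred = {("VF", 0, 1), ("LK", 1, 2), ("MF", 2, 4)}
print(round(span_f1(gold, pred), 2))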

@inproceedings{Ortmann2020b,
title = {Automatic Topological Field Identification in (Historical) German Texts},
author = {Katrin Ortmann},
url = {https://www.aclweb.org/anthology/2020.latechclfl-1.2},
year = {2020},
date = {2020-12-12},
booktitle = {Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature},
pages = {10-18},
address = {Barcelona, Spain (online)},
abstract = {For the study of certain linguistic phenomena and their development over time, large amounts of textual data must be enriched with relevant annotations. Since the manual creation of such annotations requires a lot of effort, automating the process with NLP methods would be convenient. But the required amounts of training data are usually not available for non-standard or historical language. The present study investigates whether models trained on modern newspaper text can be used to automatically identify topological fields, i.e. syntactic structures, in different modern and historical German texts. The evaluation shows that, in general, it is possible to transfer a parser model to other registers or time periods with overall F1-scores >92%. However, an error analysis makes clear that additional rules and domain-specific training data would be beneficial if sentence structures differ significantly from the training data, e.g. in the case of Early New High German.},
pubstate = {published},
type = {inproceedings}
}

Project:   C6

Höltje, Gerrit

Interactions between immediate and delayed feedback processing and memory encoding: an investigation using event-related potentials PhD Thesis

Saarland University, Saarbruecken, Germany, 2020.

Feedback-based learning relies on a procedural learning system mediated by dopaminergic reward prediction error (RPE) signals. Recent neuroimaging research indicates that the processing of temporally delayed feedback is supported by the hippocampus, a brain structure associated with declarative memory processes, but it is still unknown how delayed feedback processing and memory encoding interact. In this dissertation project, in a series of three experiments, a subsequent memory paradigm was employed to investigate how the incidental encoding of feedback pictures in a probabilistic learning task affects the event-related potential (ERP) correlate of RPEs in feedback processing, i.e., the feedback-related negativity (FRN), and how this interaction is modulated by feedback timing, valence, and explicit outcome expectations. In Experiment 1, task-unrelated scene pictures were presented together with performance feedback in the learning task. In an ensuing test phase, a surprise recognition memory test for the pictures was conducted. FRN amplitudes measured in the feedback-locked ERPs recorded during the learning phase (FRNpeak) and in the negative minus positive feedback difference wave (FRNdiff) were compared for subsequently remembered and forgotten feedback pictures. Pictures were remembered better when presented together with positive than with negative feedback, and ERP amplitudes in the FRNdiff time window predicted subsequent memory only for positive feedback pictures. Consistent with previous studies, shortly delayed (SD, 500 ms) feedback elicited larger FRNdiff amplitudes than long delayed feedback (LD, 6500 ms), whereas the reverse pattern was found in FRNpeak amplitudes. As evidenced by behavioral estimates and ERP old/new effects, positive feedback enhanced memory by boosting familiarity-based recognition. However, feedback timing did not affect memory, presumably because participants did not need to process the scene pictures in order to learn from feedback. In Experiment 2, the picture category signaled the valence of the feedback. LD feedback pictures were associated with better memory and more recollective processing than shortly delayed ones. Feedback processing as reflected in the FRNpeak was attenuated for remembered as compared to forgotten LD feedback pictures. This suggests that when feedback was delayed, feedback processing and memory encoding competed for similar neural processing resources. As evidenced by large FRNdiff amplitudes in the SD condition, the evaluation of shortly delayed feedback strongly relied on the procedural learning system. A complementary model-based single trial analysis was conducted to validate models of the functional significance of the FRN. Consistent with previous studies, feedback-locked N170 and P300 amplitudes were sensitive to feedback delay. Experiment 3 tested the hypothesis that the putative involvement of declarative learning processes in delayed feedback processing is mediated by the spontaneous generation of explicit outcome expectations during the feedback delay. A delayed feedback condition was compared with a Prediction condition in which participants were asked on each trial to predict the category of the upcoming feedback picture. Memory for the feedback pictures did not differ between the Prediction and Delay conditions. The FRNpeak subsequent memory effect obtained in Experiment 2 was replicated in both conditions, but more pronounced in the Prediction condition.
As evidenced by ERP old/new effects, negative feedback pictures that disconfirmed explicit outcome expectations were associated with stronger recollective processing than those presented in the Delay condition. Positive feedback pictures elicited a recognition bias and increased familiarity signals in the memory test, which could reflect a generalization of reward value to pictures of the same category (indoor or outdoor scene). Taken together, the findings obtained in this dissertation show multiple ways by which feedback processing and memory encoding can interact, and how this interaction is shaped by feedback timing, valence, and explicit outcome expectations.



@phdthesis{Höltje_Diss_2020,
title = {Interactions between immediate and delayed feedback processing and memory encoding: an investigation using event-related potentials},
author = {Gerrit H{\"o}ltje},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30348},
doi = {https://doi.org/10.22028/D291-32889},
year = {2020},
date = {2020},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {Feedback-based learning relies on a procedural learning system mediated by dopaminergic reward prediction error (RPE) signals. Recent neuroimaging research indicates that the processing of temporally delayed feedback is supported by the hippocampus, a brain structure associated with declarative memory processes, but it is still unknown how delayed feedback processing and memory encoding interact. In this dissertation project, in a series of three experiments, a subsequent memory paradigm was employed to investigate how the incidental encoding of feedback pictures in a probabilistic learning task affects the event-related potential (ERP) correlate of RPEs in feedback processing, i.e., the feedback-related negativity (FRN), and how this interaction is modulated by feedback timing, valence, and explicit outcome expectations. In Experiment 1, task-unrelated scene pictures were presented together with performance feedback in the learning task. In an ensuing test phase, a surprise recognition memory test for the pictures was conducted. FRN amplitudes measured in the feedback-locked ERPs recorded during the learning phase (FRNpeak) and in the negative minus positive feedback difference wave (FRNdiff) were compared for subsequently remembered and forgotten feedback pictures. Pictures were remembered better when presented together with positive than with negative feedback, and ERP amplitudes in the FRNdiff time window predicted subsequent memory only for positive feedback pictures. Consistent with previous studies, shortly delayed (SD, 500 ms) feedback elicited larger FRNdiff amplitudes than long delayed feedback (LD, 6500 ms), whereas the reverse pattern was found in FRNpeak amplitudes. As evidenced by behavioral estimates and ERP old/new effects, positive feedback enhanced memory by boosting familiarity-based recognition. However, feedback timing did not affect memory, presumably because participants did not need to process the scene pictures in order to learn from feedback. In Experiment 2, the picture category signaled the valence of the feedback. LD feedback pictures were associated with better memory and more recollective processing than shortly delayed ones. Feedback processing as reflected in the FRNpeak was attenuated for remembered as compared to forgotten LD feedback pictures. This suggests that when feedback was delayed, feedback processing and memory encoding competed for similar neural processing resources. As evidence by large FRNdiff amplitudes in the SD condition, the evaluation of shortly delayed feedback strongly relied on the procedural learning system. A complementary model-based single trial analysis was conducted to validate models of the functional significance of the FRN. Consistent with previous studies, feedback-locked N170 and P300 amplitudes were sensitive to feedback delay. Experiment 3 tested the hypothesis that the putative involvement of declarative learning processes in delayed feedback processing is mediated by the spontaneous generation of explicit outcome expectations during the feedback delay. A delayed feedback condition was compared with a Prediction condition in which participants were asked on each trial to predict the category of the upcoming feedback picture. Memory for the feedback pictures did not differ between the Prediction and Delay conditions. The FRNpeak subsequent memory effect obtained in Experiment 2 was replicated in both conditions, but more pronounced in the Prediction condition. 
As evidenced by ERP old/new effects, negative feedback pictures that disconfirmed explicit outcome expectations were associated with stronger recollective processing than those presented in the Delay condition. Positive feedback pictures elicited a recognition bias and increased familiarity signals in the memory test, which could reflect a generalization of reward value to pictures of the same category (indoor or outdoor scene). Taken together, the findings obtained in this dissertation show multiple ways by which feedback processing and memory encoding can interact, and how this interaction is shaped by feedback timing, valence, and explicit outcome expectations.


Feedbackbasiertes Lernen beruht auf einem prozeduralen Lernsystem, das auf der neurobiologischen Ebene durch dopaminerge Belohnungsvorhersagefehlersignale vermittelt wird. Studien mit bildgebenden Verfahren weisen darauf hin, dass die Verarbeitung von zeitlich verz{\"o}gertem Feedback durch den Hippocampus unterst{\"u}tzt wird, eine Hirnstruktur, die mit deklarativen Ged{\"a}chtnisprozessen assoziiert ist. Es ist jedoch noch nicht bekannt, wie die Verarbeitung von verz{\"o}gertem Feedback mit der Ged{\"a}chtnisenkodierung interagiert. In diesem Dissertationsprojekt wurde in einer Serie von drei Experimenten die Methode der nachfolgenden Erinnerung verwendet, um zu untersuchen, wie die inzidentelle Enkodierung von Feedbackbildern in einer probabilistischen Lernaufgabe sich auf das im ereigniskorrelierten Potenzial (EKP) messbare Korrelat von Belohnungsvorhersagefehlern in der Feedbackverarbeitung, die Feedback-Negativierung (FRN), auswirkt und wie diese Interaktion durch zeitliche Charakteristika und Valenz des Feedbacks sowie durch explizite Ergebniserwartungen moduliert wird. Im ersten Experiment wurden Bilder von Innenr{\"a}umen und Landschaften zusammen mit dem Feedback in der Lernaufgabe pr{\"a}sentiert, wobei die Bilder nicht relevant f{\"u}r die Aufgabe waren. In der darauf folgenden Testphase wurde ein unerwarteter Rekognitionstest f{\"u}r die Bilder durchgef{\"u}hrt. FRN-Amplituden wurden in den w{\"a}hrend der Feedbackpr{\"a}sentation aufgezeichneten EKP gemessen (FRNpeak), sowie in der Differenzwelle, die durch die Subtraktion der durch positives Feedback erzeugten EKP von den durch negatives Feedback erzeugten EKP gebildet wurde (FRNdiff). Beide FRN-Ma{\ss}e wurden f{\"u}r sp{\"a}ter erinnerte und sp{\"a}ter vergessene Bilder verglichen. Bilder, die zusammen mit positivem Feedback gezeigt wurden, wurden besser erinnert als solche, die mit negativem Feedback gepaart wurden, und EKP-Amplituden im Zeitfenster der FRNdiff pr{\"a}dizierten sp{\"a}tere Erinnerung ausschlie{\ss}lich f{\"u}r Bilder, die zusammen mit positivem Feedback pr{\"a}sentiert wurden. {\"U}bereinstimmend mit fr{\"u}heren Studien erzeugte kurz verz{\"o}gertes Feedback (500 ms) gr{\"o}{\ss}ere FRNdiff-Amplituden als lang verz{\"o}gertes Feedback (6500 ms), wohingegen das umgekehrte Muster f{\"u}r FRNpeak-Amplituden gefunden wurde. Wie durch behaviorale Ma{\ss}e und EKP-Alt/Neu-Effekte belegt, st{\"a}rkte die Verarbeitung von positivem Feedback vor allem das vertrautheitsbasierte Erinnern der zeitgleich pr{\"a}sentierten Bilder, jedoch wirkten sich die zeitlichen Parameter der Feedbackpr{\"a}sentation nicht auf das Ged{\"a}chtnis aus, vermutlich weil eine Verarbeitung der Bilder nicht notwendig war, um das Feedback zum Lernen zu nutzen. Im zweiten Experiment wurde daher die Bildkategorie (Innenraum oder Landschaft), mit der Valenz des Feedbacks verkn{\"u}pft. Lang verz{\"o}gerte Feedbackbilder waren mit besserer Erinnerung und st{\"a}rkerer rekollektiver Verarbeitung assoziiert als solche, die mit kurzer Verz{\"o}gerung pr{\"a}sentiert worden waren. Die Feedbackverarbeitung, gemessen als FRNpeak-Amplitude, war geringer f{\"u}r lang verz{\"o}gerte Feedbackbilder, die anschlie{\ss}end erinnert wurden als f{\"u}r solche, die nicht erinnert wurden. Dies legt nahe, dass die Verarbeitung von zeitlich verz{\"o}gertem Feedback und die Ged{\"a}chtnisenkodierung auf {\"a}hnliche neuronale Verarbeitungskapazit{\"a}ten zugreifen. 
Wie anhand von FRNdiff-Amplituden ersichtlich, beruhte die Evaluation von zeitlich kurz verz{\"o}gertem Feedback in starkem Ausma{\ss} auf dem prozeduralen Lernsystem. Eine erg{\"a}nzende, modellbasierte Analyse auf der Ebene einzelner Lerndurchg{\"a}nge wurde durchgef{\"u}hrt, um Modelle der funktionalen Bedeutsamkeit der FRN zu validieren. {\"U}bereinstimmend mit vorherigen Studien wurden durch die Feedbackverarbeitung hervorgerufene N170- und P300-Amplituden durch die zeitliche Verz{\"o}gerung des Feedbacks moduliert. Das dritte Experiment {\"u}berpr{\"u}fte die Hypothese, dass die mutma{\ss}liche Beteiligung von deklarativen Lernprozessen bei der Verarbeitung von verz{\"o}gertem Feedback durch die spontane Entwicklung expliziter Ergebniserwartungen w{\"a}hrend der Feedbackverz{\"o}gerung vermittelt wird. Eine Bedingung mit verz{\"o}gertem Feedback wurde mit einer Vorhersage-Bedingung kontrastiert, in der die Probanden in jedem Lerndurchgang die Kategorie des Feedbackbildes pr{\"a}dizierten. Die Erinnerung an die Feedbackbilder unterschied sich nicht zwischen den beiden Bedingungen. Der Effekt der nachfolgenden Erinnerung in den FRNpeak-Amplituden, der in Experiment 2 gefunden wurde, wurde in beiden Bedingungen repliziert, war jedoch in der Vorhersage-Bedingung st{\"a}rker ausgepr{\"a}gt. Wie durch EKP-Alt/Neu-Effekte belegt, waren negative Feedbackbilder, die die explizite Erwartung eines positiven Ergebnisses verletzten, mit einer st{\"a}rkeren rekollektiven Verarbeitung verkn{\"u}pft. Positive Bilder waren im Ged{\"a}chtnistest mit besonders vielen falsch positiven Ged{\"a}chtnisurteilen assoziiert, was mit einer Generalisierung des Belohnungswertes zu Bildern der gleichen Kategorie zusammenh{\"a}ngen k{\"o}nnte. Zusammengefasst zeigen die Ergebnisse dieser Dissertation, dass die Feedbackverarbeitung und die Ged{\"a}chtnisenkodierung auf mehreren Wegen interagieren k{\"o}nnen. Die zeitlichen Charakteristika der Feedbackpr{\"a}sentation, die Valenz des Feedbacks und explizite Ergebniserwartungen stellen wichtige Faktoren dar, die diese Interaktion beeinflussen.},
pubstate = {published},
type = {phdthesis}
}

Copy BibTeX to Clipboard

Project:   A6

Shi, Wei

Addressing the data bottleneck in implicit discourse relation classification PhD Thesis

Saarland University, Saarbruecken, Germany, 2020.

When humans comprehend language, their interpretation consists of more than just the sum of the content of the sentences. Additional logical and semantic links (known as coherence relations or discourse relations) are inferred between sentences/clauses in the text. The identification of discourse relations is beneficial for various NLP applications such as question answering, summarization, machine translation, and information extraction. Discourse relations are categorized into implicit and explicit discourse relations depending on whether there is an explicit discourse marker between the arguments. In this thesis, we mainly focus on implicit discourse relation classification: with explicit markers acting as informative cues, explicit relations are comparatively easy for machines to identify. Recent neural network-based approaches in particular suffer from insufficient training (and test) data. In Chapter 3 of this thesis, we start out by showing to what extent the limited data size is a problem in implicit discourse relation classification and propose data augmentation methods that draw on cross-lingual data. We then propose several approaches for better exploiting and encoding various types of existing data for the task. Most existing machine learning methods train on sections 2-21 of the PDTB and test on section 23, which includes fewer than 800 implicit discourse relation instances in total. Using cross-validation, we argue that the standard test section of the PDTB is too small to draw conclusions from: with more test samples in the cross-validation, we would come to very different conclusions about whether a feature is generally useful. Second, we propose a simple approach to automatically extract samples of implicit discourse relations from multilingual parallel corpora via back-translation. After back-translating from the target languages, it is easy for a discourse parser to identify those examples that are originally implicit but explicit in the back-translations. With these additional data in the training set, the experiments show significant improvements across different settings. Finally, better encoding ability is also of crucial importance for improving classification performance. We propose several methods, including a sequence-to-sequence neural network and a memory component, to obtain better representations of the arguments. We also show, with the help of the BERT model (Devlin et al., 2019), that having the correct next sentence is beneficial for the task within and across domains. When it comes to a new domain, it is beneficial to integrate external domain-specific knowledge: in Chapter 8, we show that entity enhancement improves performance on BioDRB significantly compared with other BERT-based methods. In sum, the studies reported in this dissertation contribute to addressing the data bottleneck in implicit discourse relation classification and propose approaches that achieve 54.82% and 69.57% on PDTB and BioDRB, respectively.


When humans comprehend language, their interpretation consists of more than just the sum of the content of the sentences. Additional logical and semantic links (known as coherence relations or discourse relations) are inferred between the sentences in a text. Identifying discourse relations is beneficial for various NLP applications such as question answering, summarization, machine translation, and information extraction. Discourse relations are divided into implicit and explicit discourse relations depending on whether an explicit discourse marker occurs between the arguments. In this thesis we concentrate mainly on the classification of implicit discourse relations, since explicit markers serve as helpful cues and make explicit relations comparatively easy for machines to identify. Various approaches have been proposed that achieve impressive results on implicit discourse relation classification, but most of them suffer from insufficient data for neural network-based methods. In this thesis we first address the problem of limited data for this task and then propose data augmentation methods that draw on cross-lingual data. Finally, we propose several methods for encoding the arguments better from different perspectives. Most existing machine learning methods are trained on sections 2-21 of the PDTB and tested on section 23, which contains fewer than 800 implicit discourse relation instances in total. Using cross-validation, we argue that the standard test section of the PDTB is too small to draw conclusions from. With more test samples in the cross-validation, we would come to different conclusions about whether a feature is generally beneficial for this task, especially when a relatively large label set is used; if we rely only on our small standard test set, we risk drawing wrong conclusions about which features are helpful. Second, we propose a simple approach for automatically extracting samples of implicit discourse relations from multilingual parallel corpora via back-translation, motivated by the explicitation process that occurs when humans translate a text. After back-translation from the target languages, it is easy for a discourse parser to identify those examples that are originally implicit but explicit in the back-translations. With these additional data in the training set, the experiments show significant improvements in various settings. Initially we use only French-English pairs, have no control over quality, and focus mostly on intra-sentential relations; to address these issues, we later extend the idea with more preprocessing steps and more language pairs. With majority votes from different language pairs, the mapped implicit labels become more reliable. Finally, better encoding ability is also crucial for improving classification performance. We propose a new model consisting of a classifier and a sequence-to-sequence model. Besides predicting the label correctly, they are also trained to produce a representation of the discourse relation arguments by attempting to predict the arguments including a suitable implicit connective. This novel secondary task forces the internal representation to encode the semantics of the relation arguments more completely and to perform a more fine-grained classification. To further capture the general knowledge in contexts, we also employ a memory network to obtain an explicit context representation of training examples. For each test instance we generate a knowledge vector by weighted reading of the memory. We evaluate the proposed model under various conditions, and the results show that the model with the memory network can facilitate the prediction of discourse relations by selecting examples with similar semantic representations and discourse relations. Even though better understanding, encoding, and semantic interpretation are essential and useful for implicit discourse relation classification, they do only part of the work. A good implicit discourse relation classifier should also be aware of upcoming events, causes, consequences, etc., in order to encode discourse expectation into the sentence representations. With the help of the recently proposed BERT model, we investigate whether having the correct next sentence is beneficial for the task. The experimental results show that removing the next-sentence prediction task strongly harms performance both within and across domains. The limited ability of BioBERT to learn domain-specific knowledge, i.e. entity information, entity relations, etc., motivates us to integrate external knowledge into the pre-trained language models. We propose an unsupervised method that uses information retrieval and knowledge graph techniques, under the assumption that if two instances share similar entities in both relational arguments, there is a high probability that they bear the same or a similar discourse relation. The approach achieves results on BioDRB comparable to baseline models. We then use the extracted relevant entities to enhance the pre-trained model K-BERT so as to encode the meaning of the arguments better, outperforming the original BERT and BioBERT by 6.5% and 2% in accuracy, respectively. In sum, this dissertation contributes to addressing the data bottleneck problem in implicit discourse relation classification and proposes corresponding approaches in several respects, including: demonstrating the limited-data problem and the risks of drawing conclusions from it; acquiring automatically annotated data through the explicitation process in human translation between English and other languages; better representations of discourse relation arguments; and entity enhancement with an unsupervised method and a pre-trained language model.
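
Illustration (editorial): the back-translation harvesting step described above can be sketched compactly. The snippet below is a hypothetical Python sketch, not code from the thesis; the " || " argument separator, the connective list, and the input format are all assumptions made for illustration.

import re

# Illustrative subset of explicit connectives with PDTB-style senses;
# the thesis maps surfaced connectives to implicit relation labels.
CONNECTIVES = {
    "because": "Contingency.Cause",
    "however": "Comparison.Contrast",
    "therefore": "Contingency.Cause.Result",
    "for example": "Expansion.Instantiation",
}

def find_connective(pair):
    """Return (connective, sense) if Arg2 of a back-translated pair
    starts with a known explicit connective, else None."""
    arg2 = pair.split(" || ")[1].strip().lower()
    for conn, sense in CONNECTIVES.items():
        if re.match(rf"{re.escape(conn)}\b", arg2):
            return conn, sense
    return None

def harvest(implicit_pairs, back_translations):
    """Keep originally implicit pairs whose back-translation became
    explicit, labelling them with the surfaced connective's sense."""
    out = []
    for original, bt in zip(implicit_pairs, back_translations):
        hit = find_connective(bt)
        if hit:
            out.append((original, hit[1]))
    return out

Majority voting over several language pairs, as described above, would then amount to a further filter over the harvested labels.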

@phdthesis{Shi_Diss_2020,
title = {Addressing the data bottleneck in implicit discourse relation classification},
author = {Wei Shi},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30143},
doi = {https://doi.org/10.22028/D291-32711},
year = {2020},
date = {2020},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {When humans comprehend language, their interpretation consists of more than just the sum of the content of the sentences. Additional logic and semantic links (known as coherence relations or discourse relations) are inferred between sentences/clauses in the text. The identification of discourse relations is beneficial for various NLP applications such as question-answering, summarization, machine translation, information extraction, etc. Discourse relations are categorized into implicit and explicit discourse relations depending on whether there is an explicit discourse marker between the arguments. In this thesis, we mainly focus on the implicit discourse relation classification, given that with the explicit markers acting as informative cues, the explicit relations are relatively easier to identify for machines. The recent neural network-based approaches in particular suffer from insufficient training (and test) data. As shown in Chapter 3 of this thesis, we start out by showing to what extent the limited data size is a problem in implicit discourse relation classification and propose data augmentation methods with the help of cross-lingual data. And then we propose several approaches for better exploiting and encoding various types of existing data in the discourse relation classification task. Most of the existing machine learning methods train on sections 2-21 of the PDTB and test on section 23, which only includes a total of less than 800 implicit discourse relation instances. With the help of cross validation, we argue that the standard test section of the PDTB is too small to draw conclusions upon. With more test samples in the cross validation, we would come to very different conclusions about whether a feature is generally useful. Second, we propose a simple approach to automatically extract samples of implicit discourse relations from multilingual parallel corpus via back-translation. After back-translating from target languages, it is easy for the discourse parser to identify those examples that are originally implicit but explicit in the back-translations. Having those additional data in the training set, the experiments show significant improvements on different settings. Finally, having better encoding ability is also of crucial importance in terms of improving classification performance. We propose different methods including a sequence-to-sequence neural network and a memory component to help have a better representation of the arguments. We also show that having the correct next sentence is beneficial for the task within and across domains, with the help of the BERT (Devlin et al., 2019) model. When it comes to a new domain, it is beneficial to integrate external domain-specific knowledge. In Chapter 8, we show that with the entity-enhancement, the performance on BioDRB is improved significantly, comparing with other BERT-based methods. In sum, the studies reported in this dissertation contribute to addressing the data bottleneck problem in implicit discourse relation classification and propose corresponding approaches that achieve 54.82% and 69.57% on PDTB and BioDRB respectively.


Wenn Menschen Sprache verstehen, besteht ihre Interpretation aus mehr als nur der Summe des Inhalts der S{\"a}tze. Zwischen S{\"a}tzen im Text werden zus{\"a}tzliche logische und semantische Verkn{\"u}pfungen (sogenannte Koh{\"a}renzrelationen oder Diskursrelationen) hergeleitet. Die Identifizierung von Diskursrelationen ist f{\"u}r verschiedene NLP-Anwendungen wie Frage- Antwort, Zusammenfassung, maschinelle {\"U}bersetzung, Informationsextraktion usw. von Vorteil. Diskursrelationen werden in implizite und explizite Diskursrelationen unterteilt, je nachdem, ob es eine explizite Diskursrelationen zwischen den Argumenten gibt. In dieser Arbeit konzentrieren wir uns haupts{\"a}chlich auf die Klassifizierung der impliziten Diskursrelationen, da die expliziten Marker als hilfreiche Hinweise dienen und die expliziten Beziehungen f{\"u}r Maschinen relativ leicht zu identifizieren sind. Es wurden verschiedene Ans{\"a}tze vorgeschlagen, die bei der impliziten Diskursrelationsklassifikation beeindruckende Ergebnisse erzielt haben. Die meisten von ihnen leiden jedoch darunter, dass die Daten f{\"u}r auf neuronalen Netzen basierende Methoden unzureichend sind. In dieser Arbeit gehen wir zun{\"a}chst auf das Problem begrenzter Daten bei dieser Aufgabe ein und schlagen dann Methoden zur Datenanreicherung mit Hilfe von sprach{\"u}bergreifenden Daten vor. Zuletzt schlagen wir mehrere Methoden vor, um die Argumente aus verschiedenen Aspekten besser kodieren zu k{\"o}nnen. Die meisten der existierenden Methoden des maschinellen Lernens werden auf den Abschnitten 2-21 der PDTB trainiert und auf dem Abschnitt 23 getestet, der insgesamt nur weniger als 800 implizite Diskursrelationsinstanzen enth{\"a}lt. Mit Hilfe der Kreuzvalidierung argumentieren wir, dass der Standardtestausschnitt der PDTB zu klein ist um daraus Schlussfolgerungen zu ziehen. Mit mehr Teststichproben in der Kreuzvalidierung w{\"u}rden wir zu anderen Schlussfolgerungen dar{\"u}ber kommen, ob ein Merkmal f{\"u}r diese Aufgabe generell vorteilhaft ist oder nicht, insbesondere wenn wir einen relativ gro{\ss}en Labelsatz verwenden. Wenn wir nur unseren kleinen Standardtestsatz herausstellen, laufen wir Gefahr, falsche Schl{\"u}sse dar{\"u}ber zu ziehen, welche Merkmale hilfreich sind. Zweitens schlagen wir einen einfachen Ansatz zur automatischen Extraktion von Samples impliziter Diskursrelationen aus mehrsprachigen Parallelkorpora durch R{\"u}ck{\"u}bersetzung vor. Er ist durch den Explikationsprozess motiviert, wenn Menschen einen Text {\"u}bersetzen. Nach der R{\"u}ck{\"u}bersetzung aus den Zielsprachen ist es f{\"u}r den Diskursparser leicht, diejenigen Beispiele zu identifizieren, die urspr{\"u}nglich implizit, in den R{\"u}ck{\"u}bersetzungen aber explizit enthalten sind. Da diese zus{\"a}tzlichen Daten im Trainingsset enthalten sind, zeigen die Experimente signifikante Verbesserungen in verschiedenen Situationen. Wir verwenden zun{\"a}chst nur franz{\"o}sisch-englische Paare und haben keine Kontrolle {\"u}ber die Qualit{\"a}t und konzentrieren uns meist auf die satzinternen Relationen. Um diese Fragen in Angriff zu nehmen, erweitern wir die Idee sp{\"a}ter mit mehr Vorverarbeitungsschritten und mehr Sprachpaaren. Mit den Mehrheitsentscheidungen aus verschiedenen Sprachpaaren sind die gemappten impliziten Labels zuverl{\"a}ssiger. Schlie{\ss}lich ist auch eine bessere Kodierf{\"a}higkeit von entscheidender Bedeutung f{\"u}r die Verbesserung der Klassifizierungsleistung. 
Wir schlagen ein neues Modell vor, das aus einem Klassifikator und einem Sequenz-zu-Sequenz-Modell besteht. Neben der korrekten Vorhersage des Labels werden sie auch darauf trainiert, eine Repr{\"a}sentation der Diskursrelationsargumente zu erzeugen, indem sie versuchen, die Argumente einschlie{\ss}lich eines geeigneten impliziten Konnektivs vorherzusagen. Die neuartige sekund{\"a}re Aufgabe zwingt die interne Repr{\"a}sentation dazu, die Semantik der Relationsargumente vollst{\"a}ndiger zu kodieren und eine feink{\"o}rnigere Klassifikation vorzunehmen. Um das allgemeine Wissen in Kontexten weiter zu erfassen, setzen wir auch ein Ged{\"a}chtnisnetzwerk ein, um eine explizite Kontextrepr{\"a}sentation von Trainingsbeispielen f{\"u}r Kontexte zu erhalten. F{\"u}r jede Testinstanz erzeugen wir durch gewichtetes Lesen des Ged{\"a}chtnisses einen Wissensvektor. Wir evaluieren das vorgeschlagene Modell unter verschiedenen Bedingungen und die Ergebnisse zeigen, dass das Modell mit dem Speichernetzwerk die Vorhersage von Diskursrelationen erleichtern kann, indem es Beispiele ausw{\"a}hlt, die eine {\"a}hnliche semantische Repr{\"a}sentation und Diskursrelationen aufweisen. Auch wenn ein besseres Verst{\"a}ndnis, eine Kodierung und semantische Interpretation f{\"u}r die Aufgabe der impliziten Diskursrelationsklassifikation unerl{\"a}sslich und n{\"u}tzlich sind, so leistet sie doch nur einen Teil der Arbeit. Ein guter impliziter Diskursrelationsklassifikator sollte sich auch der bevorstehenden Ereignisse, Ursachen, Folgen usw. bewusst sein, um die Diskurserwartung in die Satzdarstellungen zu kodieren. Mit Hilfe des k{\"u}rzlich vorgeschlagenen BERT-Modells versuchen wir herauszufinden, ob es f{\"u}r die Aufgabe vorteilhaft ist, den richtigen n{\"a}chsten Satz zu haben oder nicht. Die experimentellen Ergebnisse zeigen, dass das Entfernen der Aufgabe zur Vorhersage des n{\"a}chsten Satzes die Leistung sowohl innerhalb der Dom{\"a}ne als auch dom{\"a}nen{\"u}bergreifend stark beeintr{\"a}chtigt. Die begrenzte F{\"a}higkeit von BioBERT, dom{\"a}nenspezifisches Wissen, d.h. Entit{\"a}tsinformationen, Entit{\"a}tsbeziehungen etc. zu erlernen, motiviert uns, externes Wissen in die vortrainierten Sprachmodelle zu integrieren. Wir schlagen eine un{\"u}berwachte Methode vor, bei der Information-Retrieval-System und Wissensgraphen-Techniken verwendet werden, mit der Annahme, dass, wenn zwei Instanzen {\"a}hnliche Entit{\"a}ten in beiden relationalen Argumenten teilen, die Wahrscheinlichkeit gro{\ss} ist, dass sie die gleiche oder eine {\"a}hnliche Diskursrelation haben. Der Ansatz erzielt vergleichbare Ergebnisse auf BioDRB, verglichen mit Baselinemodellen. Anschlie{\ss}end verwenden wir die extrahierten relevanten Entit{\"a}ten zur Verbesserung des vortrainierten Modells K-BERT, um die Bedeutung der Argumente besser zu kodieren und das urspr{\"u}ngliche BERT und BioBERT mit einer Genauigkeit von 6,5% bzw. 2% zu {\"u}bertreffen. Zusammenfassend tr{\"a}gt diese Dissertation dazu bei, das Problem des Datenengpasses bei der impliziten Diskursrelationsklassifikation anzugehen, und schl{\"a}gt entsprechende Ans{\"a}tze in verschiedenen Aspekten vor, u.a. 
die Darstellung des begrenzten Datenproblems und der Risiken bei der Schlussfolgerung daraus; die Erfassung automatisch annotierter Daten durch den Explikationsprozess w{\"a}hrend der manuellen {\"U}bersetzung zwischen Englisch und anderen Sprachen; eine bessere Repr{\"a}sentation von Diskursrelationsargumenten; Entity-Enhancement mit einer un{\"u}berwachten Methode und einem vortrainierten Sprachmodell.2},
pubstate = {published},
type = {phdthesis}
}

Copy BibTeX to Clipboard

Project:   B2

Mosbach, Marius; Degaetano-Ortlieb, Stefania; Krielke, Marie-Pauline; Abdullah, Badr M.; Klakow, Dietrich

A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English Inproceedings

Proceedings of the 28th International Conference on Computational Linguistics, pp. 771-787, 2020.

Transformer-based language models achieve high performance on various tasks, but we still lack an understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge through sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon that requires contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks that target fine-grained linguistic knowledge, however, reveals pronounced model-specific weaknesses, especially in semantic knowledge, which strongly impact the models’ performance. Our results highlight the importance of (a) comparing models in evaluation tasks and (b) grounding claims about model performance and the linguistic knowledge models capture in more than purely probing-based evaluations.
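
Illustration (editorial): a minimal sketch of the masked prediction setup, using the Hugging Face transformers fill-mask pipeline. The example sentence is invented; the paper's actual diagnostic items, model checkpoints, and scoring are not reproduced here.

from transformers import pipeline

# Let a masked language model fill the relativizer slot and inspect
# whether grammatical candidates (e.g. "that", "which") rank highly.
fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The experiment [MASK] we conducted confirmed the hypothesis."
for pred in fill(sentence, top_k=5):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")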

@inproceedings{Mosbach2020,
title = {A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English},
author = {Marius Mosbach and Stefania Degaetano-Ortlieb and Marie-Pauline Krielke and Badr M. Abdullah and Dietrich Klakow},
url = {https://aclanthology.org/2020.coling-main.67/},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
pages = {771-787},
abstract = {Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses especially on semantic knowledge, strongly impacting models’ performance. Our results highlight the importance of (a) model comparison in evaluation task and (b) building up claims of model performance and the linguistic knowledge they capture beyond purely probing-based evaluations.},
pubstate = {published},
type = {inproceedings}
}

Copy BibTeX to Clipboard

Projects:   B1 B4 C4

Juzek, Tom; Krielke, Marie-Pauline; Teich, Elke

Exploring diachronic syntactic shifts with dependency length: the case of scientific English Inproceedings

Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), Association for Computational Linguistics, pp. 109-119, Barcelona, Spain (Online), 2020.

We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 1665 and 1996. Our starting assumption is that over time, Scientific English develops specific syntactic choice preferences that increase efficiency in (expert-to-expert) communication. The specific hypothesis we pursue in this paper is that changing syntactic choice preferences lead to greater dependency locality/dependency length minimization, which is associated with positive effects for the efficiency of human as well as computational linguistic processing. As a basis for our measurements, we parsed the RSC using Stanford CoreNLP. Overall, we observe a decrease in dependency length, with long dependency structures becoming less frequent and short dependency structures becoming more frequent over time, notably pertaining to the nominal phrase, thus marking an overall push towards greater communicative efficiency.
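
Illustration (editorial): the dependency-length measure can be made concrete. A minimal Python sketch, assuming CoNLL-U-style (id, head) pairs; the toy parse is invented, whereas the paper's actual parses come from running Stanford CoreNLP over the RSC.

# Dependency length of a token = |id - head|; the root (head 0) is skipped.
def dependency_lengths(tokens):
    return [abs(tid - head) for tid, head in tokens if head != 0]

# Toy parse of "Old observations support the new hypothesis":
# 1 Old -> 2, 2 observations -> 3, 3 support -> root,
# 4 the -> 6, 5 new -> 6, 6 hypothesis -> 3
toy = [(1, 2), (2, 3), (3, 0), (4, 6), (5, 6), (6, 3)]
lengths = dependency_lengths(toy)
print(sum(lengths) / len(lengths))  # mean dependency length: 1.6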

@inproceedings{juzek-etal-2020-exploring,
title = {Exploring diachronic syntactic shifts with dependency length: the case of scientific English},
author = {Tom Juzek and Marie-Pauline Krielke and Elke Teich},
url = {https://www.aclweb.org/anthology/2020.udw-1.13},
year = {2020},
date = {2020},
booktitle = {Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)},
pages = {109-119},
publisher = {Association for Computational Linguistics},
address = {Barcelona, Spain (Online)},
abstract = {We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 1665 and 1996. Our starting assumption is that over time, Scientific English develops specific syntactic choice preferences that increase efficiency in (expert-to-expert) communication. The specific hypothesis we pursue in this paper is that changing syntactic choice preferences lead to greater dependency locality/dependency length minimization, which is associated with positive effects for the efficiency of human as well as computational linguistic processing. As a basis for our measurements, we parsed the RSC using Stanford CoreNLP. Overall, we observe a decrease in dependency length, with long dependency structures becoming less frequent and short dependency structures becoming more frequent over time, notably pertaining to the nominal phrase, thus marking an overall push towards greater communicative efficiency.},
pubstate = {published},
type = {inproceedings}
}

Copy BibTeX to Clipboard

Project:   B1

Teich, Elke

Language variation and change: A communicative perspective Miscellaneous

Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, DGfS 2020, Hamburg, 2020.

It is widely acknowledged that language use and language structure are closely interlinked, linguistic structure emerging from language use (Bybee & Hopper 2001). Language use, in turn, is characterized by variation; in fact, speakers’ ability to adapt to changing contexts is a prerequisite for language to be functional (Weinreich et al. 1968).

Taking the perspective of rational communication, in my talk I will revisit some core questions of diachronic linguistic change: Why does a change happen? Which features are involved in change? How does change proceed? What are the effects of change? Recent work on online human language use reveals that speakers try to optimize their linguistic productions by encoding their messages with uniform information density (see Crocker et al. 2016 for an overview). Here, a major determinant in linguistic choice is predictability in context. Predictability in context is commonly represented by information content measured in bits (Shannon information): The more predictable a linguistic unit (e.g. word) is in a given context, the fewer bits are needed for encoding and the shorter its linguistic encoding may be (and vice versa, the more “surprising” a unit is in a given context, the more bits are needed for encoding and the more explicit its encoding tends to be). In this view, one major function of linguistic variation is to modulate information content so as to optimize message transmission.

In my talk, I apply this perspective to diachronic linguistic change. I show that speakers’ continuous adaptation to changing contextual conditions pushes towards linguistic innovation and results in temporary, high levels of expressivity, but the concern for maintaining communicative function pulls towards convergence and results in conventionalization. The diachronic scenario I discuss is mid-term change (200–250 years) in English in the late Modern period, focusing on the discourse domain of science (Degaetano-Ortlieb & Teich 2019). In terms of methods, I use computational language models to estimate predictability in context; and to assess diachronic change, I apply selected measures of information content, including entropy and surprisal.
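
Illustration (editorial): the surprisal measure referred to above is s(u) = -log2 p(u | context). A toy Python sketch with a bigram model; the talk's estimates come from proper computational language models, so the corpus and model here are stand-ins.

import math
from collections import Counter

corpus = "the royal society of london published the transactions".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(context, word):
    """Shannon information of `word` given `context`, in bits."""
    p = bigrams[(context, word)] / unigrams[context]
    return -math.log2(p)

# "the" is followed by "royal" in half of its occurrences here,
# so the surprisal of "royal" after "the" is exactly 1 bit.
print(surprisal("the", "royal"))  # 1.0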

@miscellaneous{Teich2020a,
title = {Language variation and change: A communicative perspective},
author = {Elke Teich},
url = {https://www.zfs.uni-hamburg.de/en/dgfs2020/programm/keynotes/elke-teich.html},
year = {2020},
date = {2020-11-04},
booktitle = {Jahrestagung der Deutschen Gesellschaft f{\"u}r Sprachwissenschaft, DGfS 2020},
address = {Hamburg},
abstract = {It is widely acknowledged that language use and language structure are closely interlinked, linguistic structure emerging from language use (Bybee & Hopper 2001). Language use, in turn, is characterized by variation; in fact, speakers’ ability to adapt to changing contexts is a prerequisite for language to be functional (Weinreich et al. 1968). Taking the perspective of rational communication, in my talk I will revisit some core questions of diachronic linguistic change: Why does a change happen? Which features are involved in change? How does change proceed? What are the eff ects of change? Recent work on online human language use reveals that speakers try to optimize their linguistic productions by encoding their messages with uniform information density (see Crocker et al. 2016 for an overview). Here, a major determinant in linguistic choice is predictability in context. Predictability in context is commonly represented by information content measured in bits (Shannon information): The more predictable a linguistic unit (e.g. word) is in a given context, the fewer bits are needed for encoding and the shorter its linguistic encoding may be (and vice versa, the more “surprising” a unit is in a given context, the more bits are needed for encoding and the more explicit its encoding tends to be). In this view, one major function of linguistic variation is to modulate information content so as to optimize message transmission. In my talk, I apply this perspective to diachronic linguistic change. I show that speakers’ continuous adaptation to changing contextual conditions pushes towards linguistic innovation and results in temporary, high levels of expressivity, but the concern for maintaining communicative function pulls towards convergence and results in conventionalization. The diachronic scenario I discuss is mid-term change (200–250 years) in English in the late Modern period, focusing on the discourse domain of science (Degaetano-Ortlieb & Teich 2019). In terms of methods, I use computational language models to estimate predictability in context; and to assess diachronic change, I apply selected measures of information content, including entropy and surprisal.},
note = {Key note},
pubstate = {published},
type = {miscellaneous}
}

Copy BibTeX to Clipboard

Project:   B1

Ortmann, Katrin; Dipper, Stefanie

Automatic Orality Identification in Historical Texts Inproceedings

Proceedings of The 12th Language Resources and Evaluation Conference (LREC), European Language Resources Association, pp. 1293-1302, Marseille, France, 2020.

Independently of the medial representation (written/spoken), language can exhibit characteristics of conceptual orality or literacy, which mainly manifest themselves on the lexical or syntactic level. In this paper we aim at automatically identifying conceptually-oral historical texts, with the ultimate goal of gaining knowledge about spoken data of historical time stages.

We apply a set of general linguistic features that have been proven to be effective for the classification of modern language data to historical German texts from various registers. Many of the features turn out to be equally useful in determining the conceptuality of historical data as they are for modern data, especially the frequency of different types of pronouns and the ratio of verbs to nouns. Other features like sentence length, particles or interjections point to peculiarities of the historical data and reveal problems with the adoption of a feature set that was developed on modern language data.
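
Illustration (editorial): two of the features named above, sketched in Python for POS-tagged input with STTS-style tags. The tag handling is a simplification and the toy clause is invented; this is not the paper's feature extractor.

from collections import Counter

def orality_features(tagged_tokens):
    """Verb-noun ratio and pronoun share from (token, POS) pairs."""
    tags = Counter(pos for _, pos in tagged_tokens)
    verbs = sum(n for t, n in tags.items() if t.startswith("V"))
    nouns = tags["NN"] + tags["NE"]
    # STTS pronoun tags start with "P", but "PTK..." marks particles.
    pronouns = sum(n for t, n in tags.items()
                   if t.startswith("P") and not t.startswith("PTK"))
    total = sum(tags.values())
    return {"verb_noun_ratio": verbs / max(nouns, 1),
            "pronoun_share": pronouns / total}

# Toy tagged clause "ich sage dir das" (conceptually oral):
toy = [("ich", "PPER"), ("sage", "VVFIN"), ("dir", "PPER"), ("das", "PDS")]
print(orality_features(toy))  # {'verb_noun_ratio': 1.0, 'pronoun_share': 0.75}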

@inproceedings{Ortmann2020,
title = {Automatic Orality Identification in Historical Texts},
author = {Katrin Ortmann and Stefanie Dipper},
url = {https://www.aclweb.org/anthology/2020.lrec-1.162/},
year = {2020},
date = {2020},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
pages = {1293-1302},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {Independently of the medial representation (written/spoken), language can exhibit characteristics of conceptual orality or literacy, which mainly manifest themselves on the lexical or syntactic level. In this paper we aim at automatically identifying conceptually-oral historical texts, with the ultimate goal of gaining knowledge about spoken data of historical time stages. We apply a set of general linguistic features that have been proven to be effective for the classification of modern language data to historical German texts from various registers. Many of the features turn out to be equally useful in determining the conceptuality of historical data as they are for modern data, especially the frequency of different types of pronouns and the ratio of verbs to nouns. Other features like sentence length, particles or interjections point to peculiarities of the historical data and reveal problems with the adoption of a feature set that was developed on modern language data.},
pubstate = {published},
type = {inproceedings}
}

Copy BibTeX to Clipboard

Project:   C6

Stenger, Irina; Avgustinova, Tania

How intelligible is spoken Bulgarian for Russian native speakers in an intercomprehension scenario? Inproceedings

Micheva, Vanya et al. (Ed.): Proceedings of the International Annual Conference of the Institute for Bulgarian Language, 2, pp. 142-151, Sofia, Bulgaria, 2020.

In a web-based experiment, Bulgarian audio stimuli in the form of recorded isolated words are presented to Russian native speakers who are required to write a suitable Russian translation. The degree of intelligibility, as revealed by the cognate guessing task, is relatively high for this pair of languages. We correlate the obtained intercomprehension scores with established linguistic factors in order to determine their influence on the cross-linguistic spoken word recognition. A detailed error analysis focuses on sound correspondences that cause translation problems in such an intercomprehension scenario.
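
Illustration (editorial): the correlation step can be sketched as follows. The per-word scores and the linguistic factor (a normalized Levenshtein distance between each Bulgarian stimulus and its Russian cognate) are hypothetical numbers, not the study's data.

from scipy.stats import pearsonr

# Share of correct Russian translations per Bulgarian stimulus word,
# paired with a normalized orthographic/phonetic distance factor.
scores = [0.92, 0.85, 0.40, 0.71, 0.15, 0.55]
distances = [0.00, 0.10, 0.45, 0.20, 0.80, 0.35]

r, p = pearsonr(scores, distances)
print(f"r = {r:.2f}, p = {p:.3f}")  # a strong negative correlation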

@inproceedings{Stenger2020b,
title = {How intelligible is spoken Bulgarian for Russian native speakers in an intercomprehension scenario?},
author = {Irina Stenger and Tania Avgustinova},
editor = {Vanya Micheva et al.},
year = {2020},
date = {2020},
booktitle = {Proceedings of the International Annual Conference of the Institute for Bulgarian Language},
pages = {142-151},
address = {Sofia, Bulgaria},
abstract = {In a web-based experiment, Bulgarian audio stimuli in the form of recorded isolated words are presented to Russian native speakers who are required to write a suitable Russian translation. The degree of intelligibility, as revealed by the cognate guessing task, is relatively high for this pair of languages. We correlate the obtained intercomprehension scores with established linguistic factors in order to determine their influence on the cross-linguistic spoken word recognition. A detailed error analysis focuses on sound correspondences that cause translation problems in such an intercomprehension scenario.},
pubstate = {published},
type = {inproceedings}
}

Copy BibTeX to Clipboard

Project:   C4
