Publications

Vogels, Jorrig; Howcroft, David M.; Demberg, Vera

Referential overspecification in response to the listener's cognitive load Inproceedings

International Cognitive Linguistics Conference, Tartu, Estonia, 2017.

According to the Uniform Information Density hypothesis (UID; Jaeger 2010, inter alia), speakers strive to distribute information equally over their utterances. They do this to avoid both peaks and troughs in information density, which may lead to processing difficulty for the listener. Several studies have shown how speakers consistently make linguistic choices that result in a more equal distribution of information (e.g., Jaeger 2010, Mahowald, Fedorenko, Piantadosi, & Gibson 2013, Piantadosi, Tily, & Gibson 2011). However, it is not clear whether speakers also adapt the information density of their utterances to the processing capacity of a specific addressee. For example, when the addressee is involved in a difficult task that is clearly reducing his cognitive capacity for processing linguistic information, will the speaker lower the overall information density of her utterances to accommodate the reduced processing capacity?
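
The information density at issue here is standardly operationalised as per-word surprisal under a language model. The sketch below (Python; not part of the paper, with invented toy probabilities) shows how the evenness of an utterance's information profile can be quantified as the variance of per-word surprisal:

import math

def surprisal(p):
    # Surprisal in bits: -log2 P(word | context). Peaks and troughs in this
    # quantity are what the UID hypothesis says speakers try to avoid.
    return -math.log2(p)

# Invented per-word probabilities P(w_i | context), e.g. from an n-gram or neural LM.
probs = [0.30, 0.02, 0.05, 0.35, 0.10]
surprisals = [surprisal(p) for p in probs]

mean_s = sum(surprisals) / len(surprisals)
variance = sum((s - mean_s) ** 2 for s in surprisals) / len(surprisals)

# Under UID, speakers should prefer formulations with lower surprisal variance,
# i.e. a more even spread of information across the utterance.
print(f"mean surprisal = {mean_s:.2f} bits, variance = {variance:.2f}")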

@inproceedings{Vogels2017,
title = {Referential overspecification in response to the listener's cognitive load},
author = {Jorrig Vogels and David M. Howcroft and Vera Demberg},
year = {2017},
date = {2017},
publisher = {International Cognitive Linguistics Conference},
address = {Tartu, Estonia},
abstract = {According to the Uniform Information Density hypothesis (UID; Jaeger 2010, inter alia), speakers strive to distribute information equally over their utterances. They do this to avoid both peaks and troughs in information density, which may lead to processing difficulty for the listener. Several studies have shown how speakers consistently make linguistic choices that result in a more equal distribution of information (e.g., Jaeger 2010, Mahowald, Fedorenko, Piantadosi, & Gibson 2013, Piantadosi, Tily, & Gibson 2011). However, it is not clear whether speakers also adapt the information density of their utterances to the processing capacity of a specific addressee. For example, when the addressee is involved in a difficult task that is clearly reducing his cognitive capacity for processing linguistic information, will the speaker lower the overall information density of her utterances to accommodate the reduced processing capacity?},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Häuser, Katja; Demberg, Vera; Kray, Jutta

Age-differences in recovery from prediction error: Evidence from a simulated driving and combined sentence verification task. Inproceedings

39th Annual Meeting of the Cognitive Science Society, 2017.

@inproceedings{Häuser2017,
title = {Age-differences in recovery from prediction error: Evidence from a simulated driving and combined sentence verification task.},
author = {Katja H{\"a}user and Vera Demberg and Jutta Kray},
year = {2017},
date = {2017-10-17},
publisher = {39th Annual Meeting of the Cognitive Science Society},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Howcroft, David M.; Klakow, Dietrich; Demberg, Vera

The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density Inproceedings

Proc. Interspeech 2017, pp. 3757-3761, 2017.

Natural language generation (NLG) systems rely on corpora for both hand-crafted approaches in a traditional NLG architecture and for statistical end-to-end (learned) generation systems. Limitations in existing resources, however, make it difficult to develop systems which can vary the linguistic properties of an utterance as needed. For example, when users’ attention is split between a linguistic and a secondary task such as driving, a generation system may need to reduce the information density of an utterance to compensate for the reduction in user attention. We introduce a new corpus in the restaurant recommendation and comparison domain, collected in a paraphrasing paradigm, where subjects wrote texts targeting either a general audience or an elderly family member. This design resulted in a corpus of more than 5000 texts which exhibit a variety of lexical and syntactic choices and differ with respect to average word & sentence length and surprisal. The corpus includes two levels of meaning representation: flat ‘semantic stacks’ for propositional content and Rhetorical Structure Theory (RST) relations between these propositions.

@inproceedings{Howcroft2017b,
title = {The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density},
author = {David M. Howcroft and Dietrich Klakow and Vera Demberg},
url = {http://dx.doi.org/10.21437/Interspeech.2017-1555},
doi = {https://doi.org/10.21437/Interspeech.2017-1555},
year = {2017},
date = {2017-10-17},
booktitle = {Proc. Interspeech 2017},
pages = {3757-3761},
abstract = {Natural language generation (NLG) systems rely on corpora for both hand-crafted approaches in a traditional NLG architecture and for statistical end-to-end (learned) generation systems. Limitations in existing resources, however, make it difficult to develop systems which can vary the linguistic properties of an utterance as needed. For example, when users’ attention is split between a linguistic and a secondary task such as driving, a generation system may need to reduce the information density of an utterance to compensate for the reduction in user attention. We introduce a new corpus in the restaurant recommendation and comparison domain, collected in a paraphrasing paradigm, where subjects wrote texts targeting either a general audience or an elderly family member. This design resulted in a corpus of more than 5000 texts which exhibit a variety of lexical and syntactic choices and differ with respect to average word & sentence length and surprisal. The corpus includes two levels of meaning representation: flat ‘semantic stacks’ for propositional content and Rhetorical Structure Theory (RST) relations between these propositions.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Howcroft, David M.; Vogels, Jorrig; Demberg, Vera

G-TUNA: a corpus of referring expressions in German, including duration information Inproceedings

Proceedings of the 10th International Conference on Natural Language Generation, Association for Computational Linguistics, pp. 149-153, Santiago de Compostela, Spain, 2017.

Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for German. Using the furniture stimuli set developed for the TUNA and D-TUNA corpora, our corpus extends on these corpora by providing data collected in a simulated driving dual-task setting, and additionally provides exact duration annotations for the spoken referring expressions. This corpus will hence allow researchers to analyze the interaction between referring expression length and speech rate, under conditions where the listener is under high vs. low cognitive load.

@inproceedings{W17-3522,
title = {G-TUNA: a corpus of referring expressions in German, including duration information},
author = {David M. Howcroft and Jorrig Vogels and Vera Demberg},
url = {http://www.aclweb.org/anthology/W17-3522},
doi = {https://doi.org/10.18653/v1/W17-3522},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 10th International Conference on Natural Language Generation},
pages = {149-153},
publisher = {Association for Computational Linguistics},
address = {Santiago de Compostela, Spain},
abstract = {Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for German. Using the furniture stimuli set developed for the TUNA and D-TUNA corpora, our corpus extends on these corpora by providing data collected in a simulated driving dual-task setting, and additionally provides exact duration annotations for the spoken referring expressions. This corpus will hence allow researchers to analyze the interaction between referring expression length and speech rate, under conditions where the listener is under high vs. low cognitive load.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Howcroft, David M.; Demberg, Vera

Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking Inproceedings

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Association for Computational Linguistics, pp. 958-968, Valencia, Spain, 2017.

While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need of sentence-level readability measures. Much of psycholinguistics has focused for many years on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures: idea density, surprisal, integration cost, and embedding depth to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline.
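
The ranking task described above can be reduced to binary classification over feature differences for sentence pairs. A minimal sketch of that reduction (Python with scikit-learn; the feature values are invented and this is an illustration, not the authors' implementation):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented per-sentence feature vectors:
# [mean surprisal, idea density, integration cost, max embedding depth]
easy = np.array([[4.1, 0.45, 1.2, 2], [4.5, 0.50, 1.0, 2], [3.9, 0.40, 1.5, 3]])
hard = np.array([[6.8, 0.62, 3.4, 5], [7.2, 0.58, 2.9, 4], [6.1, 0.65, 3.8, 5]])

# Pairwise ranking as classification on feature differences:
# label 1 if the first sentence of a pair is the harder one.
X = np.vstack([hard - easy, easy - hard])
y = np.array([1] * len(hard) + [0] * len(easy))

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0, 0.1, 1.5, 2]]))  # expect 1: first sentence ranked harder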

@inproceedings{Howcroft2017,
title = {Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking},
author = {David M. Howcroft and Vera Demberg},
url = {http://www.aclweb.org/anthology/E17-1090},
year = {2017},
date = {2017-10-17},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
pages = {958-968},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need of sentence-level readability measures. Much of psycholinguistics has focused for many years on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures: idea density, surprisal, integration cost, and embedding depth to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Roth, Michael; Thater, Stefan; Ostermann, Simon; Pinkal, Manfred

Aligning Script Events with Narrative Texts Inproceedings

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Association for Computational Linguistics, Vancouver, Canada, 2017.

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets.

We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

@inproceedings{ostermann-EtAl:2017:starSEM,
title = {Aligning Script Events with Narrative Texts},
author = {Michael Roth and Stefan Thater and Simon Ostermann and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/S17-1016},
year = {2017},
date = {2017-10-17},
booktitle = {Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)},
publisher = {Association for Computational Linguistics},
address = {Vancouver, Canada},
abstract = {Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.},
pubstate = {published},
type = {inproceedings}
}

Project:   A3

Nguyen, Dai Quoc; Nguyen, Dat Quoc; Modi, Ashutosh; Thater, Stefan; Pinkal, Manfred

A Mixture Model for Learning Multi-Sense Word Embeddings Inproceedings

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Association for Computational Linguistics, pp. 121-127, Vancouver, Canada, 2017.

Word embeddings are now a standard technique for inducing meaning representations for words. For getting good representations, it is important to take into account different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings.

Our model generalizes the previous works in that it allows to induce different weights of different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.
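
The core idea of weighting senses differently can be illustrated in a few lines. A minimal sketch (Python/NumPy; the sense vectors and weights are invented and this is not the authors' model):

import numpy as np

# Invented sense vectors for an ambiguous word, e.g. "bank".
sense_vectors = np.array([
    [0.9, 0.1, 0.0],   # financial-institution sense
    [0.0, 0.2, 0.8],   # river-bank sense
])
weights = np.array([0.7, 0.3])   # mixture weights the model would infer from context

# The word's contextual representation is the weighted sum of its sense vectors;
# allowing unequal weights is what distinguishes a mixture from a plain average.
word_vector = weights @ sense_vectors
print(word_vector)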

@inproceedings{nguyen-EtAl:2017:starSEM,
title = {A Mixture Model for Learning Multi-Sense Word Embeddings},
author = {Dai Quoc Nguyen and Dat Quoc Nguyen and Ashutosh Modi and Stefan Thater and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/S17-1015},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)},
pages = {121-127},
publisher = {Association for Computational Linguistics},
address = {Vancouver, Canada},
abstract = {Word embeddings are now a standard technique for inducing meaning representations for words. For getting good representations, it is important to take into account different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes the previous works in that it allows to induce different weights of different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A2 A3

Nguyen, Dai Quoc; Nguyen, Dat Quoc; Chu, Cuong Xuan; Thater, Stefan; Pinkal, Manfred

Sequence to Sequence Learning for Event Prediction Inproceedings

Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Asian Federation of Natural Language Processing, pp. 37-42, Taipei, Taiwan, 2017.

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively.

Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.
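
For readers unfamiliar with the metric, the sketch below shows how a BLEU score for a single predicted event description can be computed (Python with NLTK; the gold and predicted sequences are invented examples, not taken from the WikiHow or DeScript data):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented gold and predicted next-event descriptions (tokenised).
reference = ["pour", "the", "water", "into", "the", "pot"]
hypothesis = ["pour", "water", "into", "pot"]

# Smoothing avoids zero scores when short sequences share no higher-order n-grams.
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")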

@inproceedings{nguyen-EtAl:2017:I17-2,
title = {Sequence to Sequence Learning for Event Prediction},
author = {Dai Quoc Nguyen and Dat Quoc Nguyen and Cuong Xuan Chu and Stefan Thater and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/I17-2007},
year = {2017},
date = {2017-10-17},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
pages = {37-42},
publisher = {Asian Federation of Natural Language Processing},
address = {Taipei, Taiwan},
abstract = {This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A3 A2

Tourtouri, Elli; Delogu, Francesca; Crocker, Matthew W.

Overspecifications efficiently manage referential entropy in situated communication Inproceedings

Paper presented at the 39th Annual Conference of the German Linguistic Society (DGfS), Saarland University, Saarbruecken, Germany, 2017.

@inproceedings{Tourtourietal2017a,
title = {Overspecifications efficiently manage referential entropy in situated communication},
author = {Elli Tourtouri and Francesca Delogu and Matthew W. Crocker},
year = {2017},
date = {2017},
booktitle = {Paper presented at the 39th Annual Conference of the German Linguistic Society (DGfS)},
publisher = {Saarland University},
address = {Saarbruecken, Germany},
pubstate = {published},
type = {inproceedings}
}

Projects:   A1 C3

Delogu, Francesca; Crocker, Matthew W.; Drenhaus, Heiner

Teasing apart coercion and surprisal: Evidence from ERPs and eye-movements Journal Article

Cognition, 161, pp. 46-59, 2017.

Previous behavioral and electrophysiological studies have presented evidence suggesting that coercion expressions (e.g., began the book) are more difficult to process than control expressions like read the book. While this processing cost has been attributed to a specific coercion operation for recovering an event-sense of the complement (e.g., began reading the book), an alternative view based on the Surprisal Theory of language processing would attribute the cost to the relative unpredictability of the complement noun in the coercion compared to the control condition, with no need to postulate coercion-specific mechanisms. In two experiments, monitoring eye-tracking and event-related potentials (ERPs), respectively, we sought to determine whether there is any evidence for coercion-specific processing cost above-and-beyond the difficulty predicted by surprisal, by contrasting coercing and control expressions with a further control condition in which the predictability of the complement noun was similar to that in the coercion condition (e.g., bought the book). While the eye-tracking study showed significant effects of surprisal and a marginal effect of coercion on late reading measures, the ERP study clearly supported the surprisal account. Overall, our findings suggest that the coercion cost largely reflects the surprisal of the complement noun with coercion specific operations possibly influencing later processing stages.

@article{Delogu2017,
title = {Teasing apart coercion and surprisal: Evidence from ERPs and eye-movements},
author = {Francesca Delogu and Matthew W. Crocker and Heiner Drenhaus},
url = {https://www.sciencedirect.com/science/article/pii/S0010027716303122},
doi = {https://doi.org/10.1016/j.cognition.2016.12.017},
year = {2017},
date = {2017},
journal = {Cognition},
pages = {46-59},
volume = {161},
abstract = {Previous behavioral and electrophysiological studies have presented evidence suggesting that coercion expressions (e.g., began the book) are more difficult to process than control expressions like read the book. While this processing cost has been attributed to a specific coercion operation for recovering an event-sense of the complement (e.g., began reading the book), an alternative view based on the Surprisal Theory of language processing would attribute the cost to the relative unpredictability of the complement noun in the coercion compared to the control condition, with no need to postulate coercion-specific mechanisms. In two experiments, monitoring eye-tracking and event-related potentials (ERPs), respectively, we sought to determine whether there is any evidence for coercion-specific processing cost above-and-beyond the difficulty predicted by surprisal, by contrasting coercing and control expressions with a further control condition in which the predictability of the complement noun was similar to that in the coercion condition (e.g., bought the book). While the eye-tracking study showed significant effects of surprisal and a marginal effect of coercion on late reading measures, the ERP study clearly supported the surprisal account. Overall, our findings suggest that the coercion cost largely reflects the surprisal of the complement noun with coercion specific operations possibly influencing later processing stages.},
pubstate = {published},
type = {article}
}

Project:   A1

Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje

Neural semantics Journal Article

From Semantics to Dialectometry: Festschrift in Honour of John Nerbonne, pp. 75-83, 2017.

The study of language is ultimately about meaning: how can meaning be constructed from linguistic signal, and how can it be represented? The human language comprehension system is highly efficient and accurate at attributing meaning to linguistic input. Hence, in trying to identify computational principles and representations for meaning construction, we should consider how these could be implemented at the neural level in the brain. Here, we introduce a framework for such a neural semantics. This framework offers meaning representations that are neurally plausible (can be implemented in neural hardware), expressive (capture negation, quantification, and modality), compositional (capture complex propositional meaning as the sum of its parts), graded (are probabilistic in nature), and inferential (allow for inferences beyond literal propositional content). Moreover, it is shown how these meaning representations can be constructed incrementally, on a word-by-word basis in a neurocomputational model of language processing.

@article{Brouwer2017b,
title = {Neural semantics},
author = {Harm Brouwer and Matthew W. Crocker and Noortje Venhuizen},
url = {https://research.rug.nl/en/publications/from-semantics-to-dialectometry-festschrift-in-honor-of-john-nerb},
year = {2017},
date = {2017},
journal = {From Semantics to Dialectometry: Festschrift in Honour of John Nerbonne},
pages = {75-83},
abstract = {The study of language is ultimately about meaning: how can meaning be constructed from linguistic signal, and how can it be represented? The human language comprehension system is highly efficient and accurate at attributing meaning to linguistic input. Hence, in trying to identify computational principles and representations for meaning construction, we should consider how these could be implemented at the neural level in the brain. Here, we introduce a framework for such a neural semantics. This framework offers meaning representations that are neurally plausible (can be implemented in neural hardware), expressive (capture negation, quantification, and modality), compositional (capture complex propositional meaning as the sum of its parts), graded (are probabilistic in nature), and inferential (allow for inferences beyond literal propositional content). Moreover, it is shown how these meaning representations can be constructed incrementally, on a word-by-word basis in a neurocomputational model of language processing.},
pubstate = {published},
type = {article}
}

Project:   A1

Brouwer, Harm; Crocker, Matthew W.

On the proper treatment of the N400 and P600 in Language comprehension Journal Article

Frontiers in Psychology, 8, 2017, ISSN 1664-1078.

Event-Related Potentials (ERPs)—stimulus-locked, scalp-recorded voltage fluctuations caused by post-synaptic neural activity—have proven invaluable to the study of language comprehension. Of interest in the ERP signal are systematic, reoccurring voltage fluctuations called components, which are taken to reflect the neural activity underlying specific computational operations carried out in given neuroanatomical networks (cf. Näätänen and Picton, 1987). For language processing, the N400 component and the P600 component are of particular salience (see Kutas et al., 2006, for a review). The typical approach to determining whether a target word in a sentence leads to differential modulation of these components, relative to a control word, is to look for effects on mean amplitude in predetermined time-windows on the respective ERP waveforms, e.g., 350–550 ms for the N400 component and 600–900 ms for the P600 component. The common mode of operation in psycholinguistics, then, is to tabulate the presence/absence of N400- and/or P600-effects across studies, and to use this categorical data to inform neurocognitive models that attribute specific functional roles to the N400 and P600 component (see Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012, for reviews).

Here, we assert that this Waveform-based Component Structure (WCS) approach to ERPs leads to inconsistent data patterns, and hence, misinforms neurocognitive models of the electrophysiology of language processing. The reason for this is that the WCS approach ignores the latent component structure underlying ERP waveforms (cf. Luck, 2005), thereby leading to conclusions about component structure that do not factor in spatiotemporal component overlap of the N400 and the P600. This becomes particularly problematic when spatiotemporal component overlap interacts with differential P600 modulations due to task demands (cf. Kolk et al., 2003). While the problem of spatiotemporal component overlap is generally acknowledged, and occasionally invoked to account for within-study inconsistencies in the data, its implications are often overlooked in psycholinguistic theorizing that aims to integrate findings across studies. We believe WCS-centric theorizing to be the single largest reason for the lack of convergence regarding the processes underlying the N400 and the P600, thereby seriously hindering the advancement of neurocognitive theories and models of language processing.

@article{Brouwer2017,
title = {On the proper treatment of the N400 and P600 in Language comprehension},
author = {Harm Brouwer and Matthew W. Crocker},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01327/full},
doi = {https://doi.org/10.3389/fpsyg.2017.01327},
year = {2017},
date = {2017},
journal = {Frontiers in Psychology},
volume = {8},
abstract = {Event-Related Potentials (ERPs)—stimulus-locked, scalp-recorded voltage fluctuations caused by post-synaptic neural activity—have proven invaluable to the study of language comprehension. Of interest in the ERP signal are systematic, reoccurring voltage fluctuations called components, which are taken to reflect the neural activity underlying specific computational operations carried out in given neuroanatomical networks (cf. N{\"a}{\"a}t{\"a}nen and Picton, 1987). For language processing, the N400 component and the P600 component are of particular salience (see Kutas et al., 2006, for a review). The typical approach to determining whether a target word in a sentence leads to differential modulation of these components, relative to a control word, is to look for effects on mean amplitude in predetermined time-windows on the respective ERP waveforms, e.g., 350–550 ms for the N400 component and 600–900 ms for the P600 component. The common mode of operation in psycholinguistics, then, is to tabulate the presence/absence of N400- and/or P600-effects across studies, and to use this categorical data to inform neurocognitive models that attribute specific functional roles to the N400 and P600 component (see Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012, for reviews). Here, we assert that this Waveform-based Component Structure (WCS) approach to ERPs leads to inconsistent data patterns, and hence, misinforms neurocognitive models of the electrophysiology of language processing. The reason for this is that the WCS approach ignores the latent component structure underlying ERP waveforms (cf. Luck, 2005), thereby leading to conclusions about component structure that do not factor in spatiotemporal component overlap of the N400 and the P600. This becomes particularly problematic when spatiotemporal component overlap interacts with differential P600 modulations due to task demands (cf. Kolk et al., 2003). While the problem of spatiotemporal component overlap is generally acknowledged, and occasionally invoked to account for within-study inconsistencies in the data, its implications are often overlooked in psycholinguistic theorizing that aims to integrate findings across studies. We believe WCS-centric theorizing to be the single largest reason for the lack of convergence regarding the processes underlying the N400 and the P600, thereby seriously hindering the advancement of neurocognitive theories and models of language processing.},
pubstate = {published},
type = {article}
}

Project:   A1

Rabs, Elisabeth; Drenhaus, Heiner; Delogu, Francesca; Crocker, Matthew W.

The influence of script knowledge on language processing: Evidence from ERPs Miscellaneous

23rd AMLaP Conference, Lancaster, UK, 2017.
Previous research has shown that the semantic expectedness of a word – as established by the linguistic context – is negatively correlated with N400 amplitude. While such evidence has been used to argue that the N400 indexes semantic integration processes, findings can often be explained in terms of facilitated lexical retrieval, which, among other factors, is influenced by lexical/semantic priming. In the present study we examine this issue by manipulating script event knowledge – a person’s knowledge about structured event sequences – which has been previously shown to modulate the N400. An ERP-study (German) investigated whether N400 modulation by a mentioned script event is due to priming alone, or is further sensitive to linguistic cues which would be expected to modulate script influence.

@miscellaneous{Rabs2017,
title = {The influence of script knowledge on language processing: Evidence from ERPs},
author = {Elisabeth Rabs and Heiner Drenhaus and Francesca Delogu and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320988782_The_Influence_of_Script_Knowledge_on_Language_Processing_Evidence_from_ERPs},
year = {2017},
date = {2017},
publisher = {23rd AMLaP Conference},
address = {Lancaster, UK},
abstract = {Previous research has shown that the semantic expectedness of a word – as established by the linguistic context – is negatively correlated with N400 amplitude. While such evidence has been used to argue that the N400 indexes semantic integration processes, findings can often be explained in terms of facilitated lexical retrieval, which, among other factors, is influenced by lexical/semantic priming. In the present study we examine this issue by manipulating script event knowledge – a person’s knowledge about structured event sequences – which has been previously shown to modulate the N400. An ERP-study (German) investigated whether N400 modulation by a mentioned script event is due to priming alone, or is further sensitive to linguistic cues which would be expected to modulate script influence.},
pubstate = {published},
type = {miscellaneous}
}

Project:   A1

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The influence of lexical priming versus event knowledge on the N400 and the P600 Miscellaneous

23rd AMLaP Conference, Lancaster, UK, 2017.
In online language comprehension, the N400 component of the Event-Related Potentials (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) as well as event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or when it is consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012 for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggest that the critical word eat is equally facilitated in both the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.

@miscellaneous{Delogu2017b,
title = {The influence of lexical priming versus event knowledge on the N400 and the P600},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/319543522_The_influence_of_lexical_priming_versus_event_knowledge_on_the_N400_and_the_P600},
year = {2017},
date = {2017},
publisher = {23rd AMLaP Conference},
address = {Lancaster, UK},
abstract = {In online language comprehension, the N400 component of the Event-Related Potentials (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) as well as event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or when it is consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012 for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggest that the critical word eat is equally facilitated in both the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.},
pubstate = {published},
type = {miscellaneous}
}

Project:   A1

Simova, Iliana; Uszkoreit, Hans

Word Embeddings as Features for Supervised Coreference Resolution Inproceedings

Proceedings of Recent Advances in Natural Language Processing, INCOMA Ltd., pp. 686-693, Varna, Bulgaria, 2017.

A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models, and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.
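
One of the simplest features of the kind described above is the cosine similarity between the embeddings of two mention head words. A minimal sketch (Python/NumPy; the vectors and the mention pair are invented, and this is not the paper's full feature set):

import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented pre-trained embeddings for two mention head words.
embeddings = {
    "president": np.array([0.2, 0.8, 0.1]),
    "leader":    np.array([0.3, 0.7, 0.2]),
}

# One embedding-derived feature for a candidate mention pair, used alongside the
# usual string-match and distance features of a supervised coreference resolver.
feature = cosine(embeddings["president"], embeddings["leader"])
print(f"head-word cosine similarity: {feature:.3f}")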

@inproceedings{simova:2017,
title = {Word Embeddings as Features for Supervised Coreference Resolution},
author = {Iliana Simova and Hans Uszkoreit},
url = {https://aclanthology.org/R17-1088/},
doi = {https://doi.org/10.26615/978-954-452-049-6_088},
year = {2017},
date = {2017},
booktitle = {Proceedings of Recent Advances in Natural Language Processing},
pages = {686-693},
publisher = {INCOMA Ltd.},
address = {Varna, Bulgaria},
abstract = {A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models, and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.},
keywords = {B5, sfb 1102},
pubstate = {published},
type = {inproceedings}
}

Project:   B5

Le Maguer, Sébastien; Steiner, Ingmar

The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017 Inproceedings

Blizzard Challenge, Stockholm, Sweden, 2017.

The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. In contrast to last year’s MaryTTS system, based on a unit selection baseline using the latest stable MaryTTS version, the basis for this year’s system is a new, experimental version with a completely redesigned architecture.

@inproceedings{LeMaguer2017BC,
title = {The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
url = {http://mary.dfki.de/documentation/publications/index.html},
year = {2017},
date = {2017},
booktitle = {Blizzard Challenge},
address = {Stockholm, Sweden},
abstract = {The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. In contrast to last year’s MaryTTS system, based on a unit selection baseline using the latest stable MaryTTS version, the basis for this year’s system is a new, experimental version with a completely redesigned architecture.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Gessinger, Iona; Raveh, Eran; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence Inproceedings

Interspeech, pp. 3797-3801, Stockholm, Sweden, 2017.

To shed light on the question whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli – natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well-known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature showed generally a small amount of convergence, which may be due to the nature of the feature itself. Still the effect was strongest for the natural stimuli, followed by the HMM stimuli and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems

@inproceedings{Gessinger2017IS,
title = {Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence},
author = {Iona Gessinger and Eran Raveh and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/29623},
year = {2017},
date = {2017},
booktitle = {Interspeech},
pages = {3797-3801},
address = {Stockholm, Sweden},
abstract = {To shed light on the question whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli – natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well-known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature showed generally a small amount of convergence, which may be due to the nature of the feature itself. Still the effect was strongest for the natural stimuli, followed by the HMM stimuli and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Le Maguer, Sébastien; Steiner, Ingmar; Hewer, Alexander

An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis Inproceedings

Proc. Interspeech 2017, pp. 239-243, Stockholm, Sweden, 2017.

We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.

@inproceedings{LeMaguer2017IS,
title = {An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner and Alexander Hewer},
url = {https://www.isca-speech.org/archive/interspeech_2017/maguer17_interspeech.html},
doi = {https://doi.org/10.21437/Interspeech.2017-936},
year = {2017},
date = {2017},
booktitle = {Proc. Interspeech 2017},
pages = {239-243},
address = {Stockholm, Sweden},
abstract = {We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The P600 - not the N400 - indexes semantic integration Inproceedings

9th Annual Meeting of the Society for the Neurobiology of Language (SNL), Baltimore, US, 2017.
The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600-rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information form long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and […]) vs. difficult (b. John left the restaurant. Before long he opened the menu and […]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings. Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect, and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses to the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.

@inproceedings{Delogu2017c,
title = {The P600 - not the N400 - indexes semantic integration},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320979082_The_P600_-_not_the_N400_-_indexes_semantic_integration},
year = {2017},
date = {2017},
publisher = {9th Annual Meeting of the Society for the Neurobiology of Language (SNL)},
address = {Baltimore, US},
abstract = {The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600-rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information form long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and [...]) vs. difficult (b. John left the restaurant. Before long he opened the menu and [...]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings. Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect, and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses to the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.},
pubstate = {published},
type = {inproceedings}
}

Project:   A1

Oualil, Youssef; Klakow, Dietrich

A batch noise contrastive estimation approach for training large vocabulary language models Inproceedings

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), 2017.

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.
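
A rough sketch of the central idea, restricting the output layer to the target words of the current batch so that they double as noise samples for each other (Python/NumPy; shapes and values are invented, and the actual systems are recurrent NNLMs trained on the benchmarks above):

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden, batch = 50000, 16, 4

h = rng.standard_normal((batch, hidden))       # one hidden state per target position
W = rng.standard_normal((vocab_size, hidden))  # output word embeddings (full vocabulary)
targets = np.array([11, 503, 42, 17])          # target word ids in this batch

# Instead of a softmax over all 50,000 words, score only the words that occur as
# targets in the batch; each target also acts as a noise sample for the other
# positions. The whole step is a single dense (batch x batch) matrix product.
W_batch = W[targets]        # (batch, hidden)
scores = h @ W_batch.T      # (batch, batch); row i scores every batch word at position i
# Diagonal entries are the true-target scores, off-diagonal entries serve as noise scores.
print(scores.shape, np.diag(scores))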

@inproceedings{Oualil2017,
title = {A batch noise contrastive estimation approach for training large vocabulary language models},
author = {Youssef Oualil and Dietrich Klakow},
url = {https://arxiv.org/abs/1708.05997},
year = {2017},
date = {2017},
publisher = {18th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
abstract = {Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
