Publications

Roth, Michael; Thater, Stefan; Ostermann, Simon; Pinkal, Manfred

Aligning Script Events with Narrative Texts Inproceedings

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Association for Computational Linguistics, Vancouver, Canada, 2017.

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets.

We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

@inproceedings{ostermann-EtAl:2017:starSEM,
title = {Aligning Script Events with Narrative Texts},
author = {Michael Roth and Stefan Thater and Simon Ostermann and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/S17-1016},
year = {2017},
date = {2017-10-17},
booktitle = {Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)},
publisher = {Association for Computational Linguistics},
address = {Vancouver, Canada},
abstract = {Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.},
pubstate = {published},
type = {inproceedings}
}

Project:   A3

Nguyen, Dai Quoc; Nguyen, Dat Quoc; Modi, Ashutosh; Thater, Stefan; Pinkal, Manfred

A Mixture Model for Learning Multi-Sense Word Embeddings Inproceedings

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Association for Computational Linguistics, pp. 121-127, Vancouver, Canada, 2017.

Word embeddings are now a standard technique for inducing meaning representations for words. To obtain good representations, it is important to take into account the different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings.

Our model generalizes previous work in that it allows inducing different weights for different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.

@inproceedings{nguyen-EtAl:2017:starSEM,
title = {A Mixture Model for Learning Multi-Sense Word Embeddings},
author = {Dai Quoc Nguyen and Dat Quoc Nguyen and Ashutosh Modi and Stefan Thater and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/S17-1015},
year = {2017},
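date = {2017},
booktitle = {Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)},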
pages = {121-127},
publisher = {Association for Computational Linguistics},
address = {Vancouver, Canada},
abstract = {Word embeddings are now a standard technique for inducing meaning representations for words. To obtain good representations, it is important to take into account the different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes previous work in that it allows inducing different weights for different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A2 A3

Nguyen, Dai Quoc; Nguyen, Dat Quoc; Chu, Cuong Xuan; Thater, Stefan; Pinkal, Manfred

Sequence to Sequence Learning for Event Prediction Inproceedings

Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Asian Federation of Natural Language Processing, pp. 37-42, Taipei, Taiwan, 2017.

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively.

Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.

@inproceedings{nguyen-EtAl:2017:I17-2,
title = {Sequence to Sequence Learning for Event Prediction},
author = {Dai Quoc Nguyen and Dat Quoc Nguyen and Cuong Xuan Chu and Stefan Thater and Manfred Pinkal},
url = {http://www.aclweb.org/anthology/I17-2007},
year = {2017},
date = {2017-10-17},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
pages = {37-42},
publisher = {Asian Federation of Natural Language Processing},
address = {Taipei, Taiwan},
abstract = {This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A3 A2

Tourtouri, Elli; Delogu, Francesca; Crocker, Matthew W.

Overspecifications efficiently manage referential entropy in situated communication Inproceedings

Paper presented at the 39th Annual Conference of the German Linguistic Society (DGfS), Saarland University, Saarbruecken, Germany, 2017.

@inproceedings{Tourtourietal2017a,
title = {Overspecifications efficiently manage referential entropy in situated communication},
author = {Elli Tourtouri and Francesca Delogu and Matthew W. Crocker},
year = {2017},
date = {2017},
booktitle = {Paper presented at the 39th Annual Conference of the German Linguistic Society (DGfS)},
publisher = {Saarland University},
address = {Saarbruecken, Germany},
pubstate = {published},
type = {inproceedings}
}

Projects:   A1 C3

Delogu, Francesca; Crocker, Matthew W.; Drenhaus, Heiner

Teasing apart coercion and surprisal: Evidence from ERPs and eye-movements Journal Article

Cognition, 161, pp. 46-59, 2017.

Previous behavioral and electrophysiological studies have presented evidence suggesting that coercion expressions (e.g., began the book) are more difficult to process than control expressions like read the book. While this processing cost has been attributed to a specific coercion operation for recovering an event-sense of the complement (e.g., began reading the book), an alternative view based on the Surprisal Theory of language processing would attribute the cost to the relative unpredictability of the complement noun in the coercion compared to the control condition, with no need to postulate coercion-specific mechanisms. In two experiments, monitoring eye-tracking and event-related potentials (ERPs), respectively, we sought to determine whether there is any evidence for coercion-specific processing cost above-and-beyond the difficulty predicted by surprisal, by contrasting coercing and control expressions with a further control condition in which the predictability of the complement noun was similar to that in the coercion condition (e.g., bought the book). While the eye-tracking study showed significant effects of surprisal and a marginal effect of coercion on late reading measures, the ERP study clearly supported the surprisal account. Overall, our findings suggest that the coercion cost largely reflects the surprisal of the complement noun with coercion-specific operations possibly influencing later processing stages.

@article{Delogu2017a,
title = {Teasing apart coercion and surprisal: Evidence from ERPs and eye-movements},
author = {Francesca Delogu and Matthew W. Crocker and Heiner Drenhaus},
url = {https://www.sciencedirect.com/science/article/pii/S0010027716303122},
doi = {https://doi.org/10.1016/j.cognition.2016.12.017},
year = {2017},
date = {2017},
journal = {Cognition},
pages = {46-59},
volume = {161},
abstract = {

Previous behavioral and electrophysiological studies have presented evidence suggesting that coercion expressions (e.g., began the book) are more difficult to process than control expressions like read the book. While this processing cost has been attributed to a specific coercion operation for recovering an event-sense of the complement (e.g., began reading the book), an alternative view based on the Surprisal Theory of language processing would attribute the cost to the relative unpredictability of the complement noun in the coercion compared to the control condition, with no need to postulate coercion-specific mechanisms. In two experiments, monitoring eye-tracking and event-related potentials (ERPs), respectively, we sought to determine whether there is any evidence for coercion-specific processing cost above-and-beyond the difficulty predicted by surprisal, by contrasting coercing and control expressions with a further control condition in which the predictability of the complement noun was similar to that in the coercion condition (e.g., bought the book). While the eye-tracking study showed significant effects of surprisal and a marginal effect of coercion on late reading measures, the ERP study clearly supported the surprisal account. Overall, our findings suggest that the coercion cost largely reflects the surprisal of the complement noun with coercion-specific operations possibly influencing later processing stages.

},
pubstate = {published},
type = {article}
}

Project:   A1

Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje

Neural semantics Journal Article

From Semantics to Dialectometry: Festschrift in Honour of John Nerbonne, pp. 75-83, 2017.

The study of language is ultimately about meaning: how can meaning be constructed from linguistic signal, and how can it be represented? The human language comprehension system is highly efficient and accurate at attributing meaning to linguistic input. Hence, in trying to identify computational principles and representations for meaning construction, we should consider how these could be implemented at the neural level in the brain. Here, we introduce a framework for such a neural semantics. This framework offers meaning representations that are neurally plausible (can be implemented in neural hardware), expressive (capture negation, quantification, and modality), compositional (capture complex propositional meaning as the sum of its parts), graded (are probabilistic in nature), and inferential (allow for inferences beyond literal propositional content). Moreover, it is shown how these meaning representations can be constructed incrementally, on a word-by-word basis in a neurocomputational model of language processing.

@article{Brouwer2017b,
title = {Neural semantics},
author = {Harm Brouwer and Matthew W. Crocker and Noortje Venhuizen},
url = {https://research.rug.nl/en/publications/from-semantics-to-dialectometry-festschrift-in-honor-of-john-nerb},
year = {2017},
date = {2017},
journal = {From Semantics to Dialectometry: Festschrift in Honour of John Nerbonne},
pages = {75-83},
abstract = {The study of language is ultimately about meaning: how can meaning be constructed from linguistic signal, and how can it be represented? The human language comprehension system is highly efficient and accurate at attributing meaning to linguistic input. Hence, in trying to identify computational principles and representations for meaning construction, we should consider how these could be implemented at the neural level in the brain. Here, we introduce a framework for such a neural semantics. This framework offers meaning representations that are neurally plausible (can be implemented in neural hardware), expressive (capture negation, quantification, and modality), compositional (capture complex propositional meaning as the sum of its parts), graded (are probabilistic in nature), and inferential (allow for inferences beyond literal propositional content). Moreover, it is shown how these meaning representations can be constructed incrementally, on a word-by-word basis in a neurocomputational model of language processing.},
pubstate = {published},
type = {article}
}

Project:   A1

Brouwer, Harm; Crocker, Matthew W.

On the proper treatment of the N400 and P600 in Language comprehension Journal Article

Frontiers in Psychology, 8, 2017, ISSN 1664-1078.

Event-Related Potentials (ERPs)—stimulus-locked, scalp-recorded voltage fluctuations caused by post-synaptic neural activity—have proven invaluable to the study of language comprehension. Of interest in the ERP signal are systematic, recurring voltage fluctuations called components, which are taken to reflect the neural activity underlying specific computational operations carried out in given neuroanatomical networks (cf. Näätänen and Picton, 1987). For language processing, the N400 component and the P600 component are of particular salience (see Kutas et al., 2006, for a review). The typical approach to determining whether a target word in a sentence leads to differential modulation of these components, relative to a control word, is to look for effects on mean amplitude in predetermined time-windows on the respective ERP waveforms, e.g., 350–550 ms for the N400 component and 600–900 ms for the P600 component. The common mode of operation in psycholinguistics, then, is to tabulate the presence/absence of N400- and/or P600-effects across studies, and to use this categorical data to inform neurocognitive models that attribute specific functional roles to the N400 and P600 components (see Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012, for reviews).

Here, we assert that this Waveform-based Component Structure (WCS) approach to ERPs leads to inconsistent data patterns, and hence, misinforms neurocognitive models of the electrophysiology of language processing. The reason for this is that the WCS approach ignores the latent component structure underlying ERP waveforms (cf. Luck, 2005), thereby leading to conclusions about component structure that do not factor in spatiotemporal component overlap of the N400 and the P600. This becomes particularly problematic when spatiotemporal component overlap interacts with differential P600 modulations due to task demands (cf. Kolk et al., 2003). While the problem of spatiotemporal component overlap is generally acknowledged, and occasionally invoked to account for within-study inconsistencies in the data, its implications are often overlooked in psycholinguistic theorizing that aims to integrate findings across studies. We believe WCS-centric theorizing to be the single largest reason for the lack of convergence regarding the processes underlying the N400 and the P600, thereby seriously hindering the advancement of neurocognitive theories and models of language processing.

@article{Brouwer2017,
title = {On the proper treatment of the N400 and P600 in Language comprehension},
author = {Harm Brouwer and Matthew W. Crocker},
url = {https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01327/full},
doi = {https://doi.org/10.3389/fpsyg.2017.01327},
year = {2017},
date = {2017},
journal = {Frontiers in Psychology},
volume = {8},
abstract = {

Event-Related Potentials (ERPs)—stimulus-locked, scalp-recorded voltage fluctuations caused by post-synaptic neural activity—have proven invaluable to the study of language comprehension. Of interest in the ERP signal are systematic, recurring voltage fluctuations called components, which are taken to reflect the neural activity underlying specific computational operations carried out in given neuroanatomical networks (cf. N{\"a}{\"a}t{\"a}nen and Picton, 1987). For language processing, the N400 component and the P600 component are of particular salience (see Kutas et al., 2006, for a review). The typical approach to determining whether a target word in a sentence leads to differential modulation of these components, relative to a control word, is to look for effects on mean amplitude in predetermined time-windows on the respective ERP waveforms, e.g., 350–550 ms for the N400 component and 600–900 ms for the P600 component. The common mode of operation in psycholinguistics, then, is to tabulate the presence/absence of N400- and/or P600-effects across studies, and to use this categorical data to inform neurocognitive models that attribute specific functional roles to the N400 and P600 components (see Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012, for reviews).

Here, we assert that this Waveform-based Component Structure (WCS) approach to ERPs leads to inconsistent data patterns, and hence, misinforms neurocognitive models of the electrophysiology of language processing. The reason for this is that the WCS approach ignores the latent component structure underlying ERP waveforms (cf. Luck, 2005), thereby leading to conclusions about component structure that do not factor in spatiotemporal component overlap of the N400 and the P600. This becomes particularly problematic when spatiotemporal component overlap interacts with differential P600 modulations due to task demands (cf. Kolk et al., 2003). While the problem of spatiotemporal component overlap is generally acknowledged, and occasionally invoked to account for within-study inconsistencies in the data, its implications are often overlooked in psycholinguistic theorizing that aims to integrate findings across studies. We believe WCS-centric theorizing to be the single largest reason for the lack of convergence regarding the processes underlying the N400 and the P600, thereby seriously hindering the advancement of neurocognitive theories and models of language processing.

},
pubstate = {published},
type = {article}
}

Project:   A1

Rabs, Elisabeth; Drenhaus, Heiner; Delogu, Francesca; Crocker, Matthew W.

The influence of script knowledge on language processing: Evidence from ERPs Miscellaneous

23rd AMLaP Conference, Lancaster, UK, 2017.

Previous research has shown that the semantic expectedness of a word – as established by the linguistic context – is negatively correlated with N400 amplitude. While such evidence has been used to argue that the N400 indexes semantic integration processes, findings can often be explained in terms of facilitated lexical retrieval, which, among other factors, is influenced by lexical/semantic priming. In the present study we examine this issue by manipulating script event knowledge – a person’s knowledge about structured event sequences – which has been previously shown to modulate the N400. An ERP study (in German) investigated whether N400 modulation by a mentioned script event is due to priming alone, or is further sensitive to linguistic cues which would be expected to modulate script influence.

@misc{Rabs2017,
title = {The influence of script knowledge on language processing: Evidence from ERPs},
author = {Elisabeth Rabs and Heiner Drenhaus and Francesca Delogu and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320988782_The_Influence_of_Script_Knowledge_on_Language_Processing_Evidence_from_ERPs},
year = {2017},
date = {2017},
publisher = {23rd AMLaP Conference},
address = {Lancaster, UK},
abstract = {

Previous research has shown that the semantic expectedness of a word – as established by the linguistic context – is negatively correlated with N400 amplitude. While such evidence has been used to argue that the N400 indexes semantic integration processes, findings can often be explained in terms of facilitated lexical retrieval, which, among other factors, is influenced by lexical/semantic priming. In the present study we examine this issue by manipulating script event knowledge – a person’s knowledge about structured event sequences – which has been previously shown to modulate the N400. An ERP study (in German) investigated whether N400 modulation by a mentioned script event is due to priming alone, or is further sensitive to linguistic cues which would be expected to modulate script influence.
},
pubstate = {published},
type = {miscellaneous}
}

Project:   A1

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The influence of lexical priming versus event knowledge on the N400 and the P600 Miscellaneous

23rd AMLaP Conference, Lancaster, UK, 2017.

In online language comprehension, the N400 component of the Event-Related Potentials (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) as well as event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or when it is consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012 for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggests that the critical word “eat” is equally facilitated in both the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.

@misc{Delogu2017b,
title = {The influence of lexical priming versus event knowledge on the N400 and the P600},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/319543522_The_influence_of_lexical_priming_versus_event_knowledge_on_the_N400_and_the_P600},
year = {2017},
date = {2017},
publisher = {23rd AMLaP Conference},
address = {Lancaster, UK},
abstract = {

In online language comprehension, the N400 component of the Event-Related Potentials (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) as well as event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or when it is consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012 for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggests that the critical word “eat” is equally facilitated in both the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.
},
pubstate = {published},
type = {miscellaneous}
}

Project:   A1

Simova, Iliana; Uszkoreit, Hans

Word Embeddings as Features for Supervised Coreference Resolution Inproceedings

Proceedings of Recent Advances in Natural Language Processing, INCOMA Ltd., pp. 686-693, Varna, Bulgaria, 2017.

A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown to be successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models, and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.

@inproceedings{simova:2017,
title = {Word Embeddings as Features for Supervised Coreference Resolution},
author = {Iliana Simova and Hans Uszkoreit},
url = {https://aclanthology.org/R17-1088/},
doi = {https://doi.org/10.26615/978-954-452-049-6_088},
year = {2017},
date = {2017},
booktitle = {Proceedings of Recent Advances in Natural Language Processing},
pages = {686-693},
publisher = {INCOMA Ltd.},
address = {Varna, Bulgaria},
abstract = {A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown to be successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models, and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.},
keywords = {B5, sfb 1102},
pubstate = {published},
type = {inproceedings}
}

Project:   B5

Le Maguer, Sébastien; Steiner, Ingmar

The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017 Inproceedings

Blizzard Challenge, Stockholm, Sweden, 2017.

The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. In contrast to last year’s MaryTTS system, based on a unit selection baseline using the latest stable MaryTTS version, the basis for this year’s system is a new, experimental version with a completely redesigned architecture.

@inproceedings{LeMaguer2017BC,
title = {The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
url = {http://mary.dfki.de/documentation/publications/index.html},
year = {2017},
date = {2017},
booktitle = {Blizzard Challenge},
address = {Stockholm, Sweden},
abstract = {The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. In contrast to last year’s MaryTTS system, based on a unit selection baseline using the latest stable MaryTTS version, the basis for this year’s system is a new, experimental version with a completely redesigned architecture.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Gessinger, Iona; Raveh, Eran; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence Inproceedings

Interspeech, pp. 3797-3801, Stockholm, Sweden, 2017.

To shed light on the question of whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli – natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well-known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature generally showed a small amount of convergence, which may be due to the nature of the feature itself. Still, the effect was strongest for the natural stimuli, followed by the HMM stimuli, and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems.

@inproceedings{Gessinger2017IS,
title = {Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence},
author = {Iona Gessinger and Eran Raveh and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/29623},
year = {2017},
date = {2017},
booktitle = {Interspeech},
pages = {3797-3801},
address = {Stockholm, Sweden},
abstract = {To shed light on the question of whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli – natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well-known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature generally showed a small amount of convergence, which may be due to the nature of the feature itself. Still, the effect was strongest for the natural stimuli, followed by the HMM stimuli, and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Le Maguer, Sébastien; Steiner, Ingmar; Hewer, Alexander

An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis Inproceedings

Proc. Interspeech 2017, pp. 239-243, Stockholm, Sweden, 2017.

We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.

@inproceedings{LeMaguer2017IS,
title = {An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner and Alexander Hewer},
url = {https://www.isca-speech.org/archive/interspeech_2017/maguer17_interspeech.html},
doi = {10.21437/Interspeech.2017-936},
year = {2017},
date = {2017},
booktitle = {Proc. Interspeech 2017},
pages = {239--243},
address = {Stockholm, Sweden},
abstract = {We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5
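
The synthesis system above predicts tongue model parameter weights rather than raw surface coordinates. A minimal sketch of how predicted weights could be turned back into a tongue surface and compared against reference data follows. It assumes a purely linear (PCA-style) shape space, i.e. a mean shape plus one displacement field per weight, and uses a per-vertex root-mean-square error as the comparison measure. Both are illustrative assumptions, not the model or metric reported in the paper, and the names `reconstruct_surface` and `vertex_rmse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def reconstruct_surface(mean: Sequence[Vertex],
                        basis: Sequence[Sequence[Vertex]],
                        weights: Sequence[float]) -> List[Vertex]:
    """Mean shape plus a weighted sum of displacement fields (assumed linear model)."""
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basis displacement field")
    surface = []
    for idx, (x, y, z) in enumerate(mean):
        # add each weighted displacement of this vertex to the mean position
        for w, field in zip(weights, basis):
            dx, dy, dz = field[idx]
            x, y, z = x + w * dx, y + w * dy, z + w * dz
        surface.append((x, y, z))
    return surface


def vertex_rmse(predicted: Sequence[Vertex], reference: Sequence[Vertex]) -> float:
    """Root mean squared Euclidean distance between corresponding vertices."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("surfaces must be non-empty and have matching vertex counts")
    total = sum((px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
                for (px, py, pz), (rx, ry, rz) in zip(predicted, reference))
    return math.sqrt(total / len(reference))
```

For an utterance, one would apply `reconstruct_surface` to the predicted weights of each frame, call `vertex_rmse` against the reference surface of that frame, and average; how predicted and reference frames are aligned is left open here.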

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The P600 - not the N400 - indexes semantic integration Inproceedings

9th Annual Meeting of the Society for the Neurobiology of Language (SNL), Baltimore, US, 2017.

The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600- rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in the absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for the elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information from long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and […]) vs. difficult (b. John left the restaurant. Before long he opened the menu and […]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings.
Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect, and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses to the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.

@inproceedings{Delogu2017c,
title = {The P600 - not the N400 - indexes semantic integration},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320979082_The_P600_-_not_the_N400_-_indexes_semantic_integration},
year = {2017},
date = {2017},
booktitle = {9th Annual Meeting of the Society for the Neurobiology of Language (SNL)},
address = {Baltimore, US},
abstract = {The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600- rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in the absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for the elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information from long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and [...]) vs. difficult (b. John left the restaurant. Before long he opened the menu and [...]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings.
Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect, and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses to the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.
},
pubstate = {published},
type = {inproceedings}
}

Project:   A1
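
The predictions contrasted above are stated as amplitude differences between conditions in two components. As a rough sketch of how such an effect can be quantified, the code below averages a condition's waveform within a latency window and subtracts the averages of two conditions. The windows (300 to 500 ms for the N400, 600 to 1000 ms for the P600) are conventional choices assumed for illustration only; the abstract does not state the windows, electrodes, or statistics used in the study, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical latency windows in ms (half-open: start <= t < end); not taken from the study.
N400_WINDOW: Tuple[float, float] = (300.0, 500.0)
P600_WINDOW: Tuple[float, float] = (600.0, 1000.0)


def mean_amplitude(samples: Sequence[float], times_ms: Sequence[float],
                   window: Tuple[float, float]) -> float:
    """Mean voltage of one averaged waveform within a latency window."""
    start, end = window
    inside = [v for v, t in zip(samples, times_ms) if start <= t < end]
    if not inside:
        raise ValueError("no samples fall inside the window")
    return sum(inside) / len(inside)


def component_effect(baseline: Sequence[float], comparison: Sequence[float],
                     times_ms: Sequence[float], window: Tuple[float, float]) -> float:
    """Comparison minus baseline mean amplitude, e.g. condition (b) minus condition (a)."""
    return (mean_amplitude(comparison, times_ms, window)
            - mean_amplitude(baseline, times_ms, window))
```

The sign convention matters when reading the result: an N400 effect appears as a more negative value in the N400 window, a P600 effect as a more positive value in the P600 window.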

Oualil, Youssef; Klakow, Dietrich

A batch noise contrastive estimation approach for training large vocabulary language models Inproceedings

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), 2017.

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.

@inproceedings{Oualil2017,
title = {A batch noise contrastive estimation approach for training large vocabulary language models},
author = {Youssef Oualil and Dietrich Klakow},
url = {https://arxiv.org/abs/1708.05997},
year = {2017},
date = {2017},
booktitle = {18th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
abstract = {Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
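
The core idea of B-NCE described above, restricting the output layer at each step to the words that occur as targets in the current batch and letting those words act as noise samples for each other, can be illustrated with a small NCE loss. The sketch below is written in plain Python loops for readability rather than as the dense matrix operations the paper relies on. It assumes a unigram noise distribution (`unigram`), unnormalized dot-product scores plus a bias, and the standard NCE correction term log(k * q(w)) with k set to the number of other distinct words in the batch. These are illustrative choices; the paper's exact formulation may differ, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def batch_nce_loss(hidden: Sequence[Sequence[float]], targets: Sequence[int],
                   out_emb: Dict[int, Sequence[float]], out_bias: Dict[int, float],
                   unigram: Dict[int, float]) -> float:
    """Mean NCE loss over one batch, using the batch targets as each other's noise."""
    if len(hidden) != len(targets):
        raise ValueError("need one hidden state per target")
    batch_vocab = sorted(set(targets))  # reduced vocabulary: words that are targets in this batch
    k = len(batch_vocab) - 1            # noise samples per step: the other batch words
    if k < 1:
        raise ValueError("the batch must contain at least two distinct target words")
    total = 0.0
    for h, t in zip(hidden, targets):
        for w in batch_vocab:
            # unnormalized score: no softmax over the full vocabulary is computed
            score = sum(a * b for a, b in zip(h, out_emb[w])) + out_bias[w]
            logit = score - math.log(k * unigram[w])  # NCE correction for the noise prior
            if w == t:
                total -= log_sigmoid(logit)   # true target: classify as data
            else:
                total -= log_sigmoid(-logit)  # other batch words: classify as noise
    return total / len(targets)
```

In a vectorized implementation, the inner loops collapse into one matrix product between the batch's hidden states and the output embeddings of the batch vocabulary, which is what lets the approach run as dense matrix operations on a GPU.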

Brandt, Erika; Zimmerer, Frank; Möbius, Bernd; Andreeva, Bistra

Mel-cepstral distortion of German vowels in different information density contexts Inproceedings

Proceedings of Interspeech, Stockholm, Sweden, 2017.

This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors as well as corpus-specific frequency values were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity with a hierarchy between noncontrasting and contrasting conditions. However, these did not form a homogeneous group since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate, and between unstressed vowels.

@inproceedings{Brandt/etal:2017,
title = {Mel-cepstral distortion of German vowels in different information density contexts},
author = {Erika Brandt and Frank Zimmerer and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.researchgate.net/publication/319185343_Mel-Cepstral_Distortion_of_German_Vowels_in_Different_Information_Density_Contexts},
year = {2017},
date = {2017},
booktitle = {Proceedings of Interspeech},
address = {Stockholm, Sweden},
abstract = {This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors as well as corpus-specific frequency values were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity with a hierarchy between noncontrasting and contrasting conditions. However, these did not form a homogeneous group since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate, and between unstressed vowels.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1
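
For reference, mel-cepstral distortion between a reference frame c and a test frame c' is commonly computed in dB as MCD = (10 / ln 10) * sqrt(2 * sum over d of (c_d - c'_d)^2), summing over the cepstral coefficients and usually skipping the energy term c_0. The sketch below implements this common definition. The cepstral order, frame alignment, and per-vowel averaging used in the study are not given in the abstract, so the averaging helper simply assumes that the two frame sequences are already time-aligned and equally long. Function names are hypothetical.

```python
import math
from typing import Sequence

MCD_CONST = 10.0 / math.log(10.0)  # scales the cepstral distance to dB


def mcd_frame(ref: Sequence[float], test: Sequence[float], skip_c0: bool = True) -> float:
    """Mel-cepstral distortion (dB) between two mel-cepstral vectors."""
    if len(ref) != len(test):
        raise ValueError("cepstral vectors must have the same order")
    start = 1 if skip_c0 else 0  # c0 (energy) is commonly excluded
    sq = sum((a - b) ** 2 for a, b in zip(ref[start:], test[start:]))
    return MCD_CONST * math.sqrt(2.0 * sq)


def mcd_mean(ref_frames: Sequence[Sequence[float]],
             test_frames: Sequence[Sequence[float]], skip_c0: bool = True) -> float:
    """Average frame-wise MCD; assumes the frames are already time-aligned."""
    if len(ref_frames) != len(test_frames) or not ref_frames:
        raise ValueError("need the same, non-zero number of aligned frames")
    return sum(mcd_frame(r, t, skip_c0)
               for r, t in zip(ref_frames, test_frames)) / len(ref_frames)
```

Comparing two vowel tokens of different durations would first require an alignment step such as dynamic time warping, which this sketch deliberately leaves out.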

Horch, Eva; Reich, Ingo

The Fragment Corpus Inproceedings

Proceedings of the 9th International Corpus Linguistics Conference, pp. 392-393, Birmingham, UK, 2017.

We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types including written (newspaper texts, legal texts, blogs etc.) and spoken texts (dialogues, interviews, radio moderations etc.), as well as social media texts (tweets, sms, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence) which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.

@inproceedings{HorchReich:17,
title = {The Fragment Corpus},
author = {Eva Horch and Ingo Reich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30290},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 9th International Corpus Linguistics Conference},
pages = {392--393},
address = {Birmingham, UK},
abstract = {We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types including written (newspaper texts, legal texts, blogs etc.) and spoken texts (dialogues, interviews, radio moderations etc.), as well as social media texts (tweets, sms, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence) which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Wanzare, Lilian Diana Awuor; Zarcone, Alessandra; Thater, Stefan; Pinkal, Manfred

Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering Inproceedings

Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Association for Computational Linguistics, pp. 1-11, Valencia, Spain, 2017.

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

@inproceedings{wanzare-EtAl:2017:LSDSem,
title = {Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering},
author = {Lilian Diana Awuor Wanzare and Alessandra Zarcone and Stefan Thater and Manfred Pinkal},
url = {https://www.aclweb.org/anthology/W17-0901},
doi = {10.18653/v1/W17-0901},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics},
pages = {1--11},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.},
pubstate = {published},
type = {inproceedings}
}

Project:   A2
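
To illustrate how a small number of crowdsourced alignments can act as prior knowledge in clustering, the sketch below seeds an average-linkage agglomerative clustering with must-link pairs and scores event descriptions by a weighted mix of semantic and positional similarity. This is not the model from the paper: word-set Jaccard overlap stands in for semantic similarity, the weight `alpha` and the stopping `threshold` are arbitrary, temporal order induction is omitted, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class EventDescription:
    words: FrozenSet[str]  # bag of words, a stand-in for a richer semantic representation
    position: float        # relative position in its event sequence, in [0, 1]


def combined_similarity(a: EventDescription, b: EventDescription, alpha: float = 0.7) -> float:
    """Hypothetical mix of semantic (Jaccard) and positional similarity."""
    union = a.words | b.words
    semantic = len(a.words & b.words) / len(union) if union else 0.0
    positional = 1.0 - abs(a.position - b.position)
    return alpha * semantic + (1.0 - alpha) * positional


def seeded_clustering(n: int, sim: Callable[[int, int], float],
                      must_link: Sequence[Tuple[int, int]],
                      threshold: float) -> List[Set[int]]:
    """Average-linkage agglomerative clustering seeded with must-link pairs."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:  # prior knowledge: aligned pairs start in the same cluster
        parent[find(a)] = find(b)
    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    clusters = list(groups.values())

    def avg_link(c1: Set[int], c2: Set[int]) -> float:
        return sum(sim(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, best_pair = float("-inf"), (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = avg_link(clusters[i], clusters[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:  # no pair is similar enough: stop merging
            break
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

Without must-link pairs, descriptions that share no words can only be merged through positional similarity; seeding with a few aligned pairs is one simple way to inject the kind of prior knowledge the abstract describes.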

Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje; Hoeks, John

A neurocomputational model of the N400 and P600 in language processing Journal Article

Cognitive Science, 41, pp. 1318-1352, 2017.

Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.

@article{Brouwer2017,
title = {A neurocomputational model of the N400 and P600 in language processing},
author = {Harm Brouwer and Matthew W. Crocker and Noortje Venhuizen and John Hoeks},
url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5484319/},
year = {2017},
date = {2017},
journal = {Cognitive Science},
pages = {1318--1352},
volume = {41},
abstract = {

Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.

},
pubstate = {published},
type = {article}
}

Project:   A1

Raveh, Eran; Gessinger, Iona; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Eds.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 254-261, Saarbrücken, Germany, 2017.

This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.

@inproceedings{Raveh2017ESSV,
title = {Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli},
author = {Eran Raveh and Iona Gessinger and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.semanticscholar.org/paper/Investigating-Phonetic-Convergence-in-a-Shadowing-Raveh-Gessinger/c296fb0e3ad53cd690a2845827c762046fce2bbe},
year = {2017},
date = {2017},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {254--261},
address = {Saarbr{\"u}cken, Germany},
abstract = {This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5
