Publications

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The influence of lexical priming versus event knowledge on the N400 and the P600 Miscellaneous

23rd AMLaP Conference, Lancaster, UK, 2017.

In online language comprehension, the N400 component of the Event-Related Potential (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) and event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012, for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggests that the critical word “eat” is equally facilitated in the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.

@misc{Delogu2017b,
title = {The influence of lexical priming versus event knowledge on the N400 and the P600},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/319543522_The_influence_of_lexical_priming_versus_event_knowledge_on_the_N400_and_the_P600},
year = {2017},
date = {2017},
publisher = {23rd AMLaP Conference},
address = {Lancaster, UK},
abstract = {

In online language comprehension, the N400 component of the Event-Related Potential (ERP) signal is inversely proportional to semantic expectancy (Kutas & Federmeier, 2011). Among other factors, a word’s expectancy is influenced by both lexical-level (Bentin et al., 1985) and event-level (Metusalem et al., 2012) priming: the N400 amplitude is reduced if the eliciting word is semantically related to prior words in the context and/or consistent with the event being described. Perhaps the most extreme instance of such facilitatory effects arises in the processing of reversal anomalies (see Brouwer et al., 2012, for review). Here, a word that renders a sentence semantically anomalous, such as “eat” in “For breakfast the eggs would eat”, produces no difference in N400 amplitude relative to a non-anomalous control “For breakfast the boys would eat” (Kuperberg et al., 2007). Indeed, the absence of an N400-effect for contrasts such as these suggests that the critical word “eat” is equally facilitated in the target and the control condition. An open question, however, is whether these effects are predominantly driven by lexical-level or event-level priming. To address this question, we conducted an ERP experiment in which we explicitly deactivate the event under discussion in order to mitigate event-level priming effects on the critical word.
},
pubstate = {published},
type = {miscellaneous}
}

Project:   A1

Simova, Iliana; Uszkoreit, Hans

Word Embeddings as Features for Supervised Coreference Resolution Inproceedings

Proceedings of Recent Advances in Natural Language Processing, INCOMA Ltd., pp. 686-693, Varna, Bulgaria, 2017.

A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown to be successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.
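
As an illustration of the cosine similarity-based features mentioned in the abstract, here is a minimal sketch with toy vectors; the actual system uses trained embedding models (e.g. word2vec-style vectors) and many additional features, so the vectors and words below are purely hypothetical:

```python
import numpy as np

def cosine_feature(vec_a, vec_b):
    """Cosine similarity between two word vectors, usable as a
    real-valued feature for a mention-pair classifier."""
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(vec_a, vec_b) / denom)

# Toy embeddings (hypothetical; a real system loads pretrained vectors)
embeddings = {
    "president": np.array([0.9, 0.1, 0.3]),
    "leader":    np.array([0.8, 0.2, 0.4]),
    "table":     np.array([0.1, 0.9, 0.0]),
}

# Feature values for two candidate mention pairs: related head words
# ("president", "leader") score higher than unrelated ones.
sim_coref = cosine_feature(embeddings["president"], embeddings["leader"])
sim_other = cosine_feature(embeddings["president"], embeddings["table"])
assert sim_coref > sim_other
```

Such a scalar can be binned or used directly alongside conventional string-match and syntactic features in the pair classifier.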

@inproceedings{simova:2017,
title = {Word Embeddings as Features for Supervised Coreference Resolution},
author = {Iliana Simova and Hans Uszkoreit},
url = {https://aclanthology.org/R17-1088/},
doi = {10.26615/978-954-452-049-6_088},
year = {2017},
date = {2017},
booktitle = {Proceedings of Recent Advances in Natural Language Processing},
pages = {686-693},
publisher = {INCOMA Ltd.},
address = {Varna, Bulgaria},
abstract = {A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown to be successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.},
keywords = {B5, sfb 1102},
pubstate = {published},
type = {inproceedings}
}

Project:   B5

Le Maguer, Sébastien; Steiner, Ingmar

The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017 Inproceedings

Blizzard Challenge, Stockholm, Sweden, 2017.

The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. Whereas last year’s entry was a unit selection baseline built on the latest stable MaryTTS version, this year’s system is based on a new, experimental version with a completely redesigned architecture.

@inproceedings{LeMaguer2017BC,
title = {The "Uprooted" MaryTTS Entry for the Blizzard Challenge 2017},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
url = {http://mary.dfki.de/documentation/publications/index.html},
year = {2017},
date = {2017},
booktitle = {Blizzard Challenge},
address = {Stockholm, Sweden},
abstract = {The MaryTTS system is a modular text-to-speech (TTS) system which has been developed for nearly 20 years. This paper describes the MaryTTS entry for the Blizzard Challenge 2017. Whereas last year’s entry was a unit selection baseline built on the latest stable MaryTTS version, this year’s system is based on a new, experimental version with a completely redesigned architecture.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Gessinger, Iona; Raveh, Eran; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence Inproceedings

Interspeech, pp. 3797-3801, Stockholm, Sweden, 2017.

To shed light on the question of whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli: natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature showed generally a small amount of convergence, which may be due to the nature of the feature itself. Still, the effect was strongest for the natural stimuli, followed by the HMM stimuli, and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems.

@inproceedings{Gessinger2017IS,
title = {Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence},
author = {Iona Gessinger and Eran Raveh and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/29623},
year = {2017},
date = {2017},
booktitle = {Interspeech},
pages = {3797-3801},
address = {Stockholm, Sweden},
abstract = {To shed light on the question of whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli: natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well known to vary across native speakers were examined. The first feature triggered convergence in roughly one third of the cases for all stimulus types. The second feature showed generally a small amount of convergence, which may be due to the nature of the feature itself. Still, the effect was strongest for the natural stimuli, followed by the HMM stimuli, and weakest for the diphone stimuli. The effect of the third feature was clearly observable for the natural stimuli and less pronounced in the synthetic stimuli. This is presumably a result of the partly insufficient perceptibility of this target feature in the synthetic stimuli and demonstrates the necessity of gaining fine-grained control over the synthesis output, should it be intended to implement capabilities of phonetic convergence on the segmental level in spoken dialogue systems.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Le Maguer, Sébastien; Steiner, Ingmar; Hewer, Alexander

An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis Inproceedings

Proc. Interspeech 2017, pp. 239-243, Stockholm, Sweden, 2017.

We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.

@inproceedings{LeMaguer2017IS,
title = {An HMM/DNN comparison for synchronized text-to-speech and tongue motion synthesis},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner and Alexander Hewer},
url = {https://www.isca-speech.org/archive/interspeech_2017/maguer17_interspeech.html},
doi = {10.21437/Interspeech.2017-936},
year = {2017},
date = {2017},
booktitle = {Proc. Interspeech 2017},
pages = {239-243},
address = {Stockholm, Sweden},
abstract = {We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than 2h of data, DNNs already outperform HMMs.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Delogu, Francesca; Brouwer, Harm; Crocker, Matthew W.

The P600 - not the N400 - indexes semantic integration Inproceedings

9th Annual Meeting of the Society for the Neurobiology of Language (SNL), Baltimore, US, 2017.

The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration, while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600- rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in the absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for the elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information from long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and […]) vs. difficult (b. John left the restaurant. Before long he opened the menu and […]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings.
Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses for the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.

@inproceedings{Delogu2017c,
title = {The P600 - not the N400 - indexes semantic integration},
author = {Francesca Delogu and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320979082_The_P600_-_not_the_N400_-_indexes_semantic_integration},
year = {2017},
date = {2017},
booktitle = {9th Annual Meeting of the Society for the Neurobiology of Language (SNL)},
address = {Baltimore, US},
abstract = {

The N400 and P600 are the two most salient language-sensitive components of the Event-Related Potential (ERP) signal. Yet, their functional interpretation is still a matter of debate. Traditionally, the N400 is taken to reflect processes of semantic integration, while the P600 is linked to structural reanalysis [1,2]. These views have, however, been challenged by so-called Semantic Illusions (SIs), where semantically anomalous target words produce P600- rather than N400-effects (e.g., “For breakfast the eggs/boys would eat”, [3]). To account for these findings, complex multi-stream models of language processing have been proposed in an attempt to maintain the traditional views on the N400 and the P600 (see [4] for a review). However, these models fail to account for SIs in wider discourse [5] and/or in the absence of semantic violations [6]. In contrast, the Retrieval-Integration (RI) account [4] puts forward an explanation for the elicitation pattern of the N400 and the P600 by rethinking their functional interpretations. According to the RI account, N400 amplitude reflects retrieval of lexical-semantic information from long-term memory, and is therefore sensitive to priming (in line with [7,8]), while processes of semantic integration are indexed by the P600. To provide decisive evidence for the P600/Integration hypothesis, we conducted an ERP study in which twenty-one participants read short discourses in which a non-anomalous target word (“menu”) was easy (a. John entered the restaurant. Before long he opened the menu and [...]) vs. difficult (b. John left the restaurant. Before long he opened the menu and [...]) to integrate into the unfolding discourse representation, but, crucially, was equally primed by the two contexts (through the word “restaurant”). The reduced plausibility of (b) compared to (a) was confirmed by offline plausibility ratings.
Here, traditional accounts predict that difficulty in integrating the target word in (b) should elicit an N400-effect and no P600-effect. By contrast, the RI account predicts no N400-effect (due to similar priming), but a P600-effect indexing semantic integration difficulty. As predicted by RI, we observed a larger P600 for (b) relative to (a), and no difference in N400 amplitude. Importantly, an N400-effect was observed for a further control condition in which the target word “menu” was not primed by the context (e.g., “John entered the apartment”), which elicited an increased N400 amplitude relative to (a) and (b). Taken together, our results provide clear evidence for the RI account: semantic integration is indexed by the P600 component, while the N400 is predominantly driven by priming. Our findings highlight the importance of establishing specific linking hypotheses for the N400 and P600 components in order to properly interpret ERP results for the development of more informed neurobiological models of language. [1] Brown & Hagoort (1993), JCN; [2] Osterhout & Holcomb (1992), JML; [3] Kuperberg et al. (2003), Brain Res Cogn Brain Res.; [4] Brouwer et al. (2012), Brain Res.; [5] Nieuwland & Van Berkum (2005), Cogn. Brain Res.; [6] Chow & Phillips (2013), Brain Res.; [7] Kutas & Federmeier (2000), TiCS; [8] Lau et al. (2008), Nat. Rev. Neurosci.
},
pubstate = {published},
type = {inproceedings}
}

Project:   A1

Oualil, Youssef; Klakow, Dietrich

A batch noise contrastive estimation approach for training large vocabulary language models Inproceedings

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), 2017.

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.
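
The batching trick described in the abstract can be sketched as follows. All sizes and values below are toy assumptions, and the simplified logistic loss omits the noise-prior correction terms of full NCE; it only illustrates how the per-step vocabulary shrinks to the batch's target words, which then double as noise samples for each other:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, hidden_dim, vocab_size = 4, 8, 10000

# Hidden states from the language model for one time step (toy values)
h = rng.normal(size=(batch_size, hidden_dim))

# Full output embedding matrix; B-NCE only ever touches the rows
# belonging to the current batch's target words.
W_out = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))

targets = np.array([42, 7, 1337, 7])   # target word ids in the batch
batch_vocab = np.unique(targets)       # reduced, sorted vocabulary for this step

# Dense score matrix over the batch vocabulary only:
# (batch_size x |batch_vocab|) instead of (batch_size x vocab_size)
scores = h @ W_out[batch_vocab].T

# Each row's own target is the positive example; every other word in the
# batch vocabulary acts as a noise sample for that row.
pos_idx = np.searchsorted(batch_vocab, targets)
pos_scores = scores[np.arange(batch_size), pos_idx]

# Simplified NCE-style logistic loss: push target scores up, noise down.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
noise_mask = np.ones_like(scores, dtype=bool)
noise_mask[np.arange(batch_size), pos_idx] = False
loss = (-np.log(sigmoid(pos_scores)).mean()
        - np.log(1.0 - sigmoid(scores[noise_mask])).mean())
```

The key point is that every operation is a dense matrix product or slice over at most `batch_size` columns, so the full-vocabulary softmax never has to be evaluated during training.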

@inproceedings{Oualil2017,
title = {A batch noise contrastive estimation approach for training large vocabulary language models},
author = {Youssef Oualil and Dietrich Klakow},
url = {https://arxiv.org/abs/1708.05997},
year = {2017},
date = {2017},
booktitle = {18th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
abstract = {Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Brandt, Erika; Zimmerer, Frank; Möbius, Bernd; Andreeva, Bistra

Mel-cepstral distortion of German vowels in different information density contexts Inproceedings

Proceedings of Interspeech, Stockholm, Sweden, 2017.

This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German native speakers (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors, as well as corpus-specific frequency values, were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity, with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group, since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate and between unstressed vowels.
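
The distance measure itself is standard. A minimal sketch of frame-averaged MCD between two aligned mel-cepstral sequences, assuming the conventional exclusion of the 0th (energy) coefficient; the data here is random toy input, not the PhonDat2 material:

```python
import numpy as np

def mel_cepstral_distortion(mc_ref, mc_test):
    """Frame-averaged mel-cepstral distortion in dB between two
    time-aligned mel-cepstral sequences of shape (frames, coeffs).
    The 0th (energy) coefficient is excluded by convention."""
    diff = mc_ref[:, 1:] - mc_test[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())

# Two hypothetical 3-frame, 13-coefficient sequences
rng = np.random.default_rng(1)
a = rng.normal(size=(3, 13))
mcd_self = mel_cepstral_distortion(a, a)         # identical sequences: 0 dB
mcd_shift = mel_cepstral_distortion(a, a + 0.1)  # small uniform offset
```

In practice the two sequences are first aligned (e.g. by dynamic time warping) before the per-frame distortion is averaged.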

@inproceedings{Brandt/etal:2017,
title = {Mel-cepstral distortion of German vowels in different information density contexts},
author = {Erika Brandt and Frank Zimmerer and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.researchgate.net/publication/319185343_Mel-Cepstral_Distortion_of_German_Vowels_in_Different_Information_Density_Contexts},
year = {2017},
date = {2017},
booktitle = {Proceedings of Interspeech},
address = {Stockholm, Sweden},
abstract = {This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German native speakers (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors, as well as corpus-specific frequency values, were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity, with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group, since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate and between unstressed vowels.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Horch, Eva; Reich, Ingo

The Fragment Corpus Inproceedings

Proceedings of the 9th International Corpus Linguistics Conference, pp. 392-393, Birmingham, UK, 2017.

We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types, including written texts (newspaper texts, legal texts, blogs, etc.), spoken texts (dialogues, interviews, radio moderations, etc.), and social media texts (tweets, SMS, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence), which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.

@inproceedings{HorchReich:17,
title = {The Fragment Corpus},
author = {Eva Horch and Ingo Reich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30290},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 9th International Corpus Linguistics Conference},
pages = {392-393},
address = {Birmingham, UK},
abstract = {We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types, including written texts (newspaper texts, legal texts, blogs, etc.), spoken texts (dialogues, interviews, radio moderations, etc.), and social media texts (tweets, SMS, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence), which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Wanzare, Lilian Diana Awuor; Zarcone, Alessandra; Thater, Stefan; Pinkal, Manfred

Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering Inproceedings

Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Association for Computational Linguistics, pp. 1-11, Valencia, Spain, 2017.

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also present a coverage study to demonstrate the scalability of our approach.
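
As a minimal illustration of how a small number of crowdsourced alignments can act as prior knowledge, the sketch below seeds paraphrase clusters from must-link pairs via union-find. This is only the constraint-seeding step under toy assumptions; the paper's actual model additionally exploits semantic and positional similarity of the event descriptions:

```python
def seed_clusters(n_items, must_link_pairs):
    """Union-find over event descriptions: each crowdsourced alignment
    is treated as a must-link constraint, and connected descriptions
    end up with the same cluster id."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link_pairs:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n_items)]

# Five event descriptions; annotators aligned (0,1) and (2,3)
labels = seed_clusters(5, [(0, 1), (2, 3)])
# Items 0/1 share a cluster id, 2/3 share another, 4 stays a singleton
assert labels[0] == labels[1] and labels[2] == labels[3]
assert labels[4] not in (labels[0], labels[2])
```

Unconstrained descriptions would then be attached to the seeded clusters by similarity, which is where the semantic and positional features come in.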

@inproceedings{wanzare-EtAl:2017:LSDSem,
title = {Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering},
author = {Lilian Diana Awuor Wanzare and Alessandra Zarcone and Stefan Thater and Manfred Pinkal},
url = {https://www.aclweb.org/anthology/W17-0901},
doi = {10.18653/v1/W17-0901},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics},
pages = {1-11},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also present a coverage study to demonstrate the scalability of our approach.},
pubstate = {published},
type = {inproceedings}
}

Project:   A2

Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje; Hoeks, John

A neurocomputational model of the N400 and P600 in language processing Journal Article

Cognitive Science, 41, pp. 1318-1352, 2017.

Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.

@article{Brouwer2017,
title = {A neurocomputational model of the N400 and P600 in language processing},
author = {Harm Brouwer and Matthew W. Crocker and Noortje Venhuizen and John Hoeks},
url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5484319/},
year = {2017},
date = {2017},
journal = {Cognitive Science},
pages = {1318-1352},
volume = {41},
abstract = {Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.},
pubstate = {published},
type = {article}
}


Project:   A1

Raveh, Eran; Gessinger, Iona; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 254-261, Saarbrücken, Germany, 2017.

This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.

@inproceedings{Raveh2017ESSV,
title = {Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli},
author = {Eran Raveh and Iona Gessinger and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.semanticscholar.org/paper/Investigating-Phonetic-Convergence-in-a-Shadowing-Raveh-Gessinger/c296fb0e3ad53cd690a2845827c762046fce2bbe},
year = {2017},
date = {2017},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {254-261},
address = {Saarbr{\"u}cken, Germany},
abstract = {This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.},
pubstate = {published},
type = {inproceedings}
}


Project:   C5

Steiner, Ingmar; Le Maguer, Sébastien; Manzoni, Judith; Gilles, Peter; Trouvain, Jürgen

Developing new language tools for MaryTTS: the case of Luxembourgish Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 186-192, Saarbrücken, Germany, 2017.

We present new methods and resources which have been used to create a text to speech (TTS) synthesis system for the Luxembourgish language. The system uses the MaryTTS platform, which is extended with new natural language processing (NLP) components. We designed and recorded a multilingual, phonetically balanced speech corpus, and used it to build a new Luxembourgish synthesis voice. All speech data and software has been published under an open-source license and is freely available online.

@inproceedings{Steiner2017ESSVb,
title = {Developing new language tools for MaryTTS: the case of Luxembourgish},
author = {Ingmar Steiner and S{\'e}bastien Le Maguer and Judith Manzoni and Peter Gilles and J{\"u}rgen Trouvain},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.semanticscholar.org/paper/THE-CASE-OF-LUXEMBOURGISH-Steiner-Maguer/7ca34b3c6460008c013a6ac799336a5f30fc9878},
year = {2017},
date = {2017},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {186-192},
address = {Saarbr{\"u}cken, Germany},
abstract = {We present new methods and resources which have been used to create a text to speech (TTS) synthesis system for the Luxembourgish language. The system uses the MaryTTS platform, which is extended with new natural language processing (NLP) components. We designed and recorded a multilingual, phonetically balanced speech corpus, and used it to build a new Luxembourgish synthesis voice. All speech data and software has been published under an open-source license and is freely available online.},
pubstate = {published},
type = {inproceedings}
}


Project:   C5

Zimmerer, Frank; Andreeva, Bistra; Möbius, Bernd; Malisz, Zofia; Ferragne, Emmanuel; Pellegrino, François; Brandt, Erika

Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal Inproceedings

Möbius, Bernd (Ed.): Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbrücken, 15.-17. März 2017. Studientexte zur Sprachkommunikation, Band 86, pp. 174-179, 2017.

Two perception experiments investigated the perception of speech rate. A factor of particular interest here is surprisal, an information-theoretic measure of the predictability of a linguistic unit in its context. Taken together, the results of the experiments suggest that surprisal has no significant influence on the perception of speech rate.
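The surprisal measure used in this study has a simple information-theoretic definition: the surprisal of a unit is the negative log probability of that unit given its context. A minimal Python sketch; the probabilities below are illustrative placeholders, not values from the experiments:

```python
import math

def surprisal(probability):
    """Surprisal (in bits) of a linguistic unit with the given
    in-context probability: S = -log2 P(unit | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)

# A predictable unit carries little information; an unexpected one, more:
surprisal(0.5)    # 1.0 bit
surprisal(0.125)  # 3.0 bits
```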

@inproceedings{Zimmerer/etal:2017a,
title = {Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal},
author = {Frank Zimmerer and Bistra Andreeva and Bernd M{\"o}bius and Zofia Malisz and Emmanuel Ferragne and Fran{\c c}ois Pellegrino and Erika Brandt},
editor = {Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/318589916_PERZEPTION_VON_SPRECHGESCHWINDIGKEIT_UND_DER_NICHT_NACHGEWIESENE_EINFLUSS_VON_SURPRISAL},
year = {2017},
date = {2017-03-15},
booktitle = {Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbr{\"u}cken, 15.-17. M{\"a}rz 2017. Studientexte zur Sprachkommunikation, Band 86},
pages = {174-179},
abstract = {In zwei Perzeptionsexperimenten wurde die Perzeption von Sprechgeschwindigkeit untersucht. Ein Faktor, der dabei besonders im Zentrum des Interesses steht, ist Surprisal, ein informationstheoretisches Ma{\ss} f{\"u}r die Vorhersagbarkeit einer linguistischen Einheit im Kontext. Zusammengenommen legen die Ergebnisse der Experimente den Schluss nahe, dass Surprisal keinen signifikanten Einfluss auf die Wahrnehmung von Sprechgeschwindigkeit aus{\"u}bt.},
pubstate = {published},
type = {inproceedings}
}


Project:   C1

Le Maguer, Sébastien; Steiner, Ingmar

Uprooting MaryTTS: Agile Processing and Voicebuilding Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 152-159, Saarbrücken, Germany, 2017.

MaryTTS is a modular speech synthesis system whose development started around 2003. The system is open-source and has grown significantly thanks to the contribution of the community. However, the drawback is an increase in the complexity of the system. This complexity has now reached a stage where the system is complicated to analyze and maintain. The current paper presents the new architecture of the MaryTTS system. This architecture aims to simplify the maintenance but also to provide more flexibility in the use of the system. To achieve this goal we have completely redesigned the core of the system using the structure ROOTS. We also have changed the module sequence logic to make the system more consistent with the designer. Finally, the voicebuilding has been redesigned to follow a continuous delivery methodology. All of these changes lead to more accurate development of the system and therefore more consistent results in its use.

@inproceedings{LeMaguer2017ESSV,
title = {Uprooting MaryTTS: Agile Processing and Voicebuilding},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.essv.de/paper.php?id=232},
year = {2017},
date = {2017-03-15},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {152-159},
address = {Saarbr{\"u}cken, Germany},
abstract = {MaryTTS is a modular speech synthesis system whose development started around 2003. The system is open-source and has grown significantly thanks to the contribution of the community. However, the drawback is an increase in the complexity of the system. This complexity has now reached a stage where the system is complicated to analyze and maintain. The current paper presents the new architecture of the MaryTTS system. This architecture aims to simplify the maintenance but also to provide more flexibility in the use of the system. To achieve this goal we have completely redesigned the core of the system using the structure ROOTS. We also have changed the module sequence logic to make the system more consistent with the designer. Finally, the voicebuilding has been redesigned to follow a continuous delivery methodology. All of these changes lead to more accurate development of the system and therefore more consistent results in its use.},
pubstate = {published},
type = {inproceedings}
}


Project:   C5

Singh, Mittul; Greenberg, Clayton; Oualil, Youssef; Klakow, Dietrich

Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling Inproceedings

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, Osaka, Japan, 2016.

Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings.

Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.
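The sub-word search plus string-similarity weighting described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the 3-gram width, the choice of `difflib.SequenceMatcher` as the string-similarity function, and the toy vocabulary are all assumptions.

```python
import difflib

def induce_rare_embedding(rare_word, embeddings, top_k=3):
    """Build an embedding for an out-of-vocabulary word from
    in-vocabulary words that share sub-word material with it."""
    # Sub-word search: keep vocabulary words sharing any character
    # 3-gram with the rare word.
    ngrams = {rare_word[i:i + 3] for i in range(len(rare_word) - 2)}
    candidates = [w for w in embeddings if any(g in w for g in ngrams)]
    # Score candidates with a string-similarity function; keep the best.
    def sim(w):
        return difflib.SequenceMatcher(None, rare_word, w).ratio()
    best = sorted(candidates, key=sim, reverse=True)[:top_k]
    if not best:
        return None
    # Similarity-weighted average of the candidates' embeddings.
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for w in best:
        s = sim(w)
        total += s
        vec = [v + s * e for v, e in zip(vec, embeddings[w])]
    return [v / total for v in vec]

# Toy 2-d embeddings; "happier" is out of vocabulary:
vocab = {"unhappiness": [1.0, 0.0], "happiness": [0.9, 0.1], "table": [0.0, 1.0]}
vec = induce_rare_embedding("happier", vocab)  # blends "happiness" and "unhappiness"
```

Note that "table" shares no 3-gram with "happier" and is filtered out before any similarity scoring, which is what keeps the sub-word search cheap compared to full morphological analysis.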

@inproceedings{singh-EtAl:2016:COLING1,
title = {Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling},
author = {Mittul Singh and Clayton Greenberg and Youssef Oualil and Dietrich Klakow},
url = {http://aclweb.org/anthology/C16-1194},
year = {2016},
date = {2016-12-01},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
publisher = {The COLING 2016 Organizing Committee},
address = {Osaka, Japan},
abstract = {Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.},
pubstate = {published},
type = {inproceedings}
}


Project:   B4

Schwenger, Maximilian; Torralba, Álvaro; Hoffmann, Jörg; Howcroft, David M.; Demberg, Vera

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation Inproceedings

Calzolari, Nicoletta; Matsumoto, Yuji; Prasad, Rashmi (Ed.): COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, ACL, pp. 1524-1534, Osaka, 2016, ISBN 978-4-87974-702-0.

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.

@inproceedings{DBLP:conf/coling/SchwengerTHHD16,
title = {From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation},
author = {Maximilian Schwenger and {\'A}lvaro Torralba and J{\"o}rg Hoffmann and David M. Howcroft and Vera Demberg},
editor = {Nicoletta Calzolari and Yuji Matsumoto and Rashmi Prasad},
url = {https://davehowcroft.com/publication/2016-12_coling_detecting-infeasible-edges/},
year = {2016},
date = {2016-12-01},
booktitle = {COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers},
isbn = {978-4-87974-702-0},
pages = {1524-1534},
publisher = {ACL},
address = {Osaka},
abstract = {The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.},
pubstate = {published},
type = {inproceedings}
}


Project:   A4

Stenger, Irina

How reading intercomprehension works among Slavic languages with Cyrillic script Inproceedings

Köllner, Marisa; Ziai, Ramon (Ed.): ESSLLI 2016, pp. 30-42, 2016.

@inproceedings{Stenger2016,
title = {How reading intercomprehension works among Slavic languages with Cyrillic script},
author = {Irina Stenger},
editor = {Marisa K{\"o}llner and Ramon Ziai},
url = {https://esslli2016.unibz.it/wp-content/uploads/2016/09/esslli-stus-2016-proceedings.pdf},
year = {2016},
date = {2016},
pages = {30-42},
booktitle = {ESSLLI 2016},
pubstate = {published},
type = {inproceedings}
}


Project:   C4

Calvillo, Jesús; Brouwer, Harm; Crocker, Matthew W.

Connectionist semantic systematicity in language production Inproceedings

38th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2016.

A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.

@inproceedings{Calvillo2016,
title = {Connectionist semantic systematicity in language production},
author = {Jes{\'u}s Calvillo and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/306400823_Connectionist_Semantic_Systematicity_in_Language_Production},
year = {2016},
date = {2016},
booktitle = {38th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.},
pubstate = {published},
type = {inproceedings}
}


Project:   C3

Malisz, Zofia; O'Dell, Michael; Nieminen, Tommi; Wagner, Petra

Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish Journal Article

Phonetica, 73, pp. 229-255, 2016.

This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O’Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.
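In the O'Dell and Nieminen coupled oscillator model, mean foot duration is (to a first approximation) linear in the number of syllables, D(n) = a + b*n, and the ratio r = a/b indexes the relative coupling strength of the stress and syllable oscillators. A minimal sketch of estimating these parameters by ordinary least squares; the duration data below are invented for illustration and are not from the Polish or Finnish corpora:

```python
def fit_foot_duration(ns, durations):
    """Ordinary least-squares fit of D(n) = a + b*n, where n is the
    number of syllables in a foot and D its duration in ms."""
    mean_n = sum(ns) / len(ns)
    mean_d = sum(durations) / len(durations)
    b = (sum((n - mean_n) * (d - mean_d) for n, d in zip(ns, durations))
         / sum((n - mean_n) ** 2 for n in ns))
    a = mean_d - b * mean_n
    return a, b, a / b  # intercept, slope, and coupling ratio r = a/b

# Invented mean durations (ms) for feet of 1-4 syllables:
a, b, r = fit_foot_duration([1, 2, 3, 4], [300, 380, 460, 540])
# a = 220.0, b = 80.0, r = 2.75
```

The paper's Bayesian treatment replaces this point estimate with posterior distributions over the rate-related parameters, but the linear relation being fit is the same.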

@article{Malisz/etal:2016,
title = {Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish},
author = {Zofia Malisz and Michael O'Dell and Tommi Nieminen and Petra Wagner},
url = {https://www.degruyter.com/document/doi/10.1159/000450829/html},
doi = {https://doi.org/10.1159/000450829},
year = {2016},
date = {2016},
journal = {Phonetica},
pages = {229-255},
volume = {73},
abstract = {This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O'Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.},
pubstate = {published},
type = {article}
}


Project:   C1
