Publications

Brandt, Erika; Zimmerer, Frank; Möbius, Bernd; Andreeva, Bistra

Mel-cepstral distortion of German vowels in different information density contexts Inproceedings

Proceedings of Interspeech, Stockholm, Sweden, 2017.

This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors, as well as corpus-specific frequency values, were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity, with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group, since there were group-internal significant differences. The largest distances were found between vowels produced at fast speech rate and between unstressed vowels.
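Mel-cepstral distortion, the similarity measure used in this study, is conventionally computed per frame from the difference between two mel-cepstral coefficient vectors. A minimal sketch of the standard dB formula (function name and inputs are illustrative, not taken from the paper):

```python
import math

def mel_cepstral_distortion(c_ref, c_test):
    """Frame-level mel-cepstral distortion in dB between two
    mel-cepstral coefficient vectors (c0 is typically excluded)."""
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have equal length")
    # MCD = (10 / ln 10) * sqrt(2 * sum of squared coefficient differences)
    sq_diff = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq_diff)
```

Identical vectors yield an MCD of 0 dB; larger spectral differences yield larger values, which is what makes the measure usable as a vowel-similarity metric.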

@inproceedings{Brandt/etal:2017,
title = {Mel-cepstral distortion of German vowels in different information density contexts},
author = {Erika Brandt and Frank Zimmerer and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.researchgate.net/publication/319185343_Mel-Cepstral_Distortion_of_German_Vowels_in_Different_Information_Density_Contexts},
year = {2017},
date = {2017},
booktitle = {Proceedings of Interspeech},
address = {Stockholm, Sweden},
abstract = {This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors, as well as corpus-specific frequency values, were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity, with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group, since there were group-internal significant differences. The largest distances were found between vowels produced at fast speech rate and between unstressed vowels.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Horch, Eva; Reich, Ingo

The Fragment Corpus Inproceedings

Proceedings of the 9th International Corpus Linguistics Conference, pp. 392-393, Birmingham, UK, 2017.

We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types including written (newspaper texts, legal texts, blogs etc.) and spoken texts (dialogues, interviews, radio moderations etc.), as well as social media texts (tweets, sms, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence) which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.

@inproceedings{HorchReich:17,
title = {The Fragment Corpus},
author = {Eva Horch and Ingo Reich},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30290},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 9th International Corpus Linguistics Conference},
pages = {392-393},
address = {Birmingham, UK},
abstract = {We present the Fragment Corpus (FraC), a corpus for the investigation of fragments (see Morgan 1973), i.e. incomplete sentences, in German. The corpus is a mixed-register corpus and consists of 17 different text types including written (newspaper texts, legal texts, blogs etc.) and spoken texts (dialogues, interviews, radio moderations etc.), as well as social media texts (tweets, sms, chats). Each subcorpus comprises approx. 2000 utterance units (including fragments, following the orthographic notion of sentence) which amounts to a total corpus size of 380K tokens. The data was taken from electronically available sources.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Wanzare, Lilian Diana Awuor; Zarcone, Alessandra; Thater, Stefan; Pinkal, Manfred

Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering Inproceedings

Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Association for Computational Linguistics, pp. 1-11, Valencia, Spain, 2017.

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

@inproceedings{wanzare-EtAl:2017:LSDSem,
title = {Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering},
author = {Lilian Diana Awuor Wanzare and Alessandra Zarcone and Stefan Thater and Manfred Pinkal},
url = {https://www.aclweb.org/anthology/W17-0901},
doi = {https://doi.org/10.18653/v1/W17-0901},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics},
pages = {1-11},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.},
pubstate = {published},
type = {inproceedings}
}

Project:   A2

Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje; Hoeks, John

A neurocomputational model of the N400 and P600 in language processing Journal Article

Cognitive Science, 41, pp. 1318-1352, 2017.

Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.

@article{Brouwer2017,
title = {A neurocomputational model of the N400 and P600 in language processing},
author = {Harm Brouwer and Matthew W. Crocker and Noortje Venhuizen and John Hoeks},
url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5484319/},
year = {2017},
date = {2017},
journal = {Cognitive Science},
pages = {1318-1352},
volume = {41},
abstract = {Ten years ago, researchers using event‐related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a Semantic Illusion: Semantically anomalous, but structurally well‐formed sentences did not affect the N400 component—traditionally taken to reflect semantic integration—but instead produced a P600 effect, which is generally linked to syntactic processing. This finding led to a considerable amount of debate, and a number of complex processing models have been proposed as an explanation. What these models have in common is that they postulate two or more separate processing streams, in order to reconcile the Semantic Illusion and other semantically induced P600 effects with the traditional interpretations of the N400 and the P600. Recently, however, these multi‐stream models have been called into question, and a simpler single‐stream model has been proposed. According to this alternative model, the N400 component reflects the retrieval of word meaning from semantic memory, and the P600 component indexes the integration of this meaning into the unfolding utterance interpretation. In the present paper, we provide support for this “Retrieval–Integration (RI)” account by instantiating it as a neurocomputational model. This neurocomputational model is the first to successfully simulate the N400 and P600 amplitude in language comprehension, and simulations with this model provide a proof of concept of the single‐stream RI account of semantically induced patterns of N400 and P600 modulations.},
pubstate = {published},
type = {article}
}

Project:   A1

Raveh, Eran; Gessinger, Iona; Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 254-261, Saarbrücken, Germany, 2017.

This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.

@inproceedings{Raveh2017ESSV,
title = {Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli},
author = {Eran Raveh and Iona Gessinger and S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.semanticscholar.org/paper/Investigating-Phonetic-Convergence-in-a-Shadowing-Raveh-Gessinger/c296fb0e3ad53cd690a2845827c762046fce2bbe},
year = {2017},
date = {2017},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {254-261},
address = {Saarbr{\"u}cken, Germany},
abstract = {This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Steiner, Ingmar; Le Maguer, Sébastien; Manzoni, Judith; Gilles, Peter; Trouvain, Jürgen

Developing new language tools for MaryTTS: the case of Luxembourgish Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 186-192, Saarbrücken, Germany, 2017.

We present new methods and resources which have been used to create a text to speech (TTS) synthesis system for the Luxembourgish language. The system uses the MaryTTS platform, which is extended with new natural language processing (NLP) components. We designed and recorded a multilingual, phonetically balanced speech corpus, and used it to build a new Luxembourgish synthesis voice. All speech data and software has been published under an open-source license and is freely available online.

@inproceedings{Steiner2017ESSVb,
title = {Developing new language tools for MaryTTS: the case of Luxembourgish},
author = {Ingmar Steiner and S{\'e}bastien Le Maguer and Judith Manzoni and Peter Gilles and J{\"u}rgen Trouvain},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.semanticscholar.org/paper/THE-CASE-OF-LUXEMBOURGISH-Steiner-Maguer/7ca34b3c6460008c013a6ac799336a5f30fc9878},
year = {2017},
date = {2017},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {186-192},
address = {Saarbr{\"u}cken, Germany},
abstract = {We present new methods and resources which have been used to create a text to speech (TTS) synthesis system for the Luxembourgish language. The system uses the MaryTTS platform, which is extended with new natural language processing (NLP) components. We designed and recorded a multilingual, phonetically balanced speech corpus, and used it to build a new Luxembourgish synthesis voice. All speech data and software has been published under an open-source license and is freely available online.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Zimmerer, Frank; Andreeva, Bistra; Möbius, Bernd; Malisz, Zofia; Ferragne, Emmanuel; Pellegrino, François; Brandt, Erika

Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal Inproceedings

Möbius, Bernd (Ed.): Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbrücken, 15.-17. März 2017. Studientexte zur Sprachkommunikation, Band 86, pp. 174-179, 2017.

Two perception experiments investigated the perception of speech rate. A factor of particular interest here is surprisal, an information-theoretic measure of the predictability of a linguistic unit in context. Taken together, the results of the experiments suggest that surprisal exerts no significant influence on the perception of speech rate.
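Surprisal, as used throughout these studies, is the standard information-theoretic quantity: a unit is high-surprisal exactly when it is unpredictable given its context.

```latex
S(u_n) = -\log_2 P(u_n \mid \text{context})
```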

@inproceedings{Zimmerer/etal:2017a,
title = {Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal},
author = {Frank Zimmerer and Bistra Andreeva and Bernd M{\"o}bius and Zofia Malisz and Emmanuel Ferragne and Fran{\c{c}}ois Pellegrino and Erika Brandt},
editor = {Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/318589916_PERZEPTION_VON_SPRECHGESCHWINDIGKEIT_UND_DER_NICHT_NACHGEWIESENE_EINFLUSS_VON_SURPRISAL},
year = {2017},
date = {2017-03-15},
booktitle = {Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbr{\"u}cken, 15.-17. M{\"a}rz 2017. Studientexte zur Sprachkommunikation, Band 86},
pages = {174-179},
abstract = {In zwei Perzeptionsexperimenten wurde die Perzeption von Sprechgeschwindigkeit untersucht. Ein Faktor, der dabei besonders im Zentrum des Interesses steht, ist Surprisal, ein informationstheoretisches Ma{\ss} f{\"u}r die Vorhersagbarkeit einer linguistischen Einheit im Kontext. Zusammengenommen legen die Ergebnisse der Experimente den Schluss nahe, dass Surprisal keinen signifikanten Einfluss auf die Wahrnehmung von Sprechgeschwindigkeit aus{\"u}bt.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Le Maguer, Sébastien; Steiner, Ingmar

Uprooting MaryTTS: Agile Processing and Voicebuilding Inproceedings

Trouvain, Jürgen; Steiner, Ingmar; Möbius, Bernd (Ed.): 28th Conference on Electronic Speech Signal Processing (ESSV), pp. 152-159, Saarbrücken, Germany, 2017.

MaryTTS is a modular speech synthesis system whose development started around 2003. The system is open-source and has grown significantly thanks to the contribution of the community. However, the drawback is an increase in the complexity of the system. This complexity has now reached a stage where the system is complicated to analyze and maintain. The current paper presents the new architecture of the MaryTTS system. This architecture aims to simplify the maintenance but also to provide more flexibility in the use of the system. To achieve this goal we have completely redesigned the core of the system using the structure ROOTS. We also have changed the module sequence logic to make the system more consistent with the designer. Finally, the voicebuilding has been redesigned to follow a continuous delivery methodology. All of these changes lead to more accurate development of the system and therefore more consistent results in its use.

@inproceedings{LeMaguer2017ESSV,
title = {Uprooting MaryTTS: Agile Processing and Voicebuilding},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
editor = {J{\"u}rgen Trouvain and Ingmar Steiner and Bernd M{\"o}bius},
url = {https://www.essv.de/paper.php?id=232},
year = {2017},
date = {2017-03-15},
booktitle = {28th Conference on Electronic Speech Signal Processing (ESSV)},
pages = {152-159},
address = {Saarbr{\"u}cken, Germany},
abstract = {MaryTTS is a modular speech synthesis system whose development started around 2003. The system is open-source and has grown significantly thanks to the contribution of the community. However, the drawback is an increase in the complexity of the system. This complexity has now reached a stage where the system is complicated to analyze and maintain. The current paper presents the new architecture of the MaryTTS system. This architecture aims to simplify the maintenance but also to provide more flexibility in the use of the system. To achieve this goal we have completely redesigned the core of the system using the structure ROOTS. We also have changed the module sequence logic to make the system more consistent with the designer. Finally, the voicebuilding has been redesigned to follow a continuous delivery methodology. All of these changes lead to more accurate development of the system and therefore more consistent results in its use.},
pubstate = {published},
type = {inproceedings}
}

Project:   C5

Singh, Mittul; Greenberg, Clayton; Oualil, Youssef; Klakow, Dietrich

Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling Inproceedings

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, Osaka, Japan, 2016.

Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings.

Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.
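The combination step described above can be illustrated with a toy sketch: gather in-vocabulary words that share a sub-word with the rare word and average their vectors, weighted by a string-similarity score. The similarity function (Jaccard overlap of character 4-grams) and the function name are assumptions for illustration, not the paper's exact method:

```python
def induce_embedding(rare_word, embeddings):
    """Weighted average of embeddings of known words that share a
    character 4-gram with `rare_word`; weights are a simple
    string-similarity score (illustrative only)."""
    def ngrams(w, n=4):
        return {w[i:i + n] for i in range(len(w) - n + 1)}

    def similarity(a, b):
        # Jaccard overlap of character 4-gram sets
        ga, gb = ngrams(a), ngrams(b)
        return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

    total, weight = None, 0.0
    for word, vec in embeddings.items():
        s = similarity(rare_word, word)
        if s > 0.0:
            scaled = [s * x for x in vec]
            total = scaled if total is None else [t + x for t, x in zip(total, scaled)]
            weight += s
    return [t / weight for t in total] if total else None
```

With pre-trained vectors for "walking" and "talking", this would place an unseen word like "stalking" between its morphological neighbors, while unrelated short words contribute nothing.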

@inproceedings{singh-EtAl:2016:COLING1,
title = {Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling},
author = {Mittul Singh and Clayton Greenberg and Youssef Oualil and Dietrich Klakow},
url = {http://aclweb.org/anthology/C16-1194},
year = {2016},
date = {2016-12-01},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
publisher = {The COLING 2016 Organizing Committee},
address = {Osaka, Japan},
abstract = {Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Schwenger, Maximilian; Torralba, Álvaro; Hoffmann, Jörg; Howcroft, David M.; Demberg, Vera

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation Inproceedings

Calzolari, Nicoletta; Matsumoto, Yuji; Prasad, Rashmi (Ed.): COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, ACL, pp. 1524-1534, Osaka, 2016, ISBN 978-4-87974-702-0.

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.

@inproceedings{DBLP:conf/coling/SchwengerTHHD16,
title = {From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation},
author = {Maximilian Schwenger and {\'A}lvaro Torralba and J{\"o}rg Hoffmann and David M. Howcroft and Vera Demberg},
editor = {Nicoletta Calzolari and Yuji Matsumoto and Rashmi Prasad},
url = {https://davehowcroft.com/publication/2016-12_coling_detecting-infeasible-edges/},
year = {2016},
date = {2016-12-01},
booktitle = {COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers},
isbn = {978-4-87974-702-0},
pages = {1524-1534},
publisher = {ACL},
address = {Osaka},
abstract = {The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Stenger, Irina

How reading intercomprehension works among Slavic languages with Cyrillic script Inproceedings

Köllner, Marisa; Ziai, Ramon (Ed.): ESSLLI 2016, pp. 30-42, 2016.

@inproceedings{Stenger2016,
title = {How reading intercomprehension works among Slavic languages with Cyrillic script},
author = {Irina Stenger},
editor = {Marisa K{\"o}llner and Ramon Ziai},
url = {https://esslli2016.unibz.it/wp-content/uploads/2016/09/esslli-stus-2016-proceedings.pdf},
year = {2016},
date = {2016},
pages = {30-42},
booktitle = {ESSLLI 2016},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Calvillo, Jesús; Brouwer, Harm; Crocker, Matthew W.

Connectionist semantic systematicity in language production Inproceedings

38th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2016.

A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.

@inproceedings{Calvillo2016,
title = {Connectionist semantic systematicity in language production},
author = {Jes{\'u}s Calvillo and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/306400823_Connectionist_Semantic_Systematicity_in_Language_Production},
year = {2016},
date = {2016},
booktitle = {38th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3

Malisz, Zofia; O'Dell, Michael; Nieminen, Tommi; Wagner, Petra

Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish Journal Article

Phonetica, 73, pp. 229-255, 2016.

This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O’Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.

@article{Malisz/etal:2016,
title = {Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish},
author = {Zofia Malisz and Michael O'Dell and Tommi Nieminen and Petra Wagner},
url = {https://www.degruyter.com/document/doi/10.1159/000450829/html},
doi = {https://doi.org/10.1159/000450829},
year = {2016},
date = {2016},
journal = {Phonetica},
pages = {229-255},
volume = {73},
abstract = {This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O'Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.},
pubstate = {published},
type = {article}
}

Project:   C1

Schulz, Erika; Oh, Yoon Mi; Andreeva, Bistra; Möbius, Bernd

Impact of Prosodic Structure and Information Density on Vowel Space Size Inproceedings

Proceedings of Speech Prosody, pp. 350-354, Boston, MA, USA, 2016.

We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone Xn|Xn−1. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.
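The biphone surprisal used as the information-density measure above can be estimated from bigram counts; a minimal maximum-likelihood sketch on toy data (function name assumed; the paper derives its estimates from large corpora):

```python
import math
from collections import Counter

def biphone_surprisal(phone_seqs):
    """Estimate surprisal -log2 P(Xn | Xn-1) for each attested biphone
    from simple maximum-likelihood bigram counts (illustrative only)."""
    bigrams, contexts = Counter(), Counter()
    for seq in phone_seqs:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1  # count of prev as a biphone context
    return {bg: -math.log2(n / contexts[bg[0]]) for bg, n in bigrams.items()}
```

A biphone that occurs in only a quarter of its context's continuations gets a surprisal of 2 bits; fully predictable continuations get 0 bits, matching the intuition that low-probability units carry more information.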

@inproceedings{Schulz/etal:2016a,
title = {Impact of Prosodic Structure and Information Density on Vowel Space Size},
author = {Erika Schulz and Yoon Mi Oh and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/303755409_Impact_of_prosodic_structure_and_information_density_on_vowel_space_size},
year = {2016},
date = {2016},
booktitle = {Proceedings of Speech Prosody},
pages = {350-354},
address = {Boston, MA, USA},
abstract = {We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone Xn|Xn−1. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1
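The two quantities this abstract relies on, biphone surprisal and Euclidean distance from the vowel-space midpoint, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the counts and formant values are invented:

```python
import math

def biphone_surprisal(bigram_counts, unigram_counts, prev, cur):
    """Surprisal -log2 P(cur | prev), estimated from corpus counts."""
    p = bigram_counts[(prev, cur)] / unigram_counts[prev]
    return -math.log2(p)

def vowel_distinctiveness(midpoint, formants):
    """Euclidean distance from a speaker's vowel-space midpoint (F1, F2)
    to a measured vowel token's (F1, F2); larger = more distinct."""
    return math.dist(midpoint, formants)

# Toy example with invented counts and Hz values:
bigrams = {("a", "n"): 30, ("a", "t"): 10}
unigrams = {"a": 40}
s = biphone_surprisal(bigrams, unigrams, "a", "t")   # -log2(10/40) = 2.0 bits
d = vowel_distinctiveness((500.0, 1500.0), (800.0, 1200.0))
```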

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Toward the use of information density based descriptive features in HMM based speech synthesis Inproceedings

8th International Conference on Speech Prosody, pp. 1029-1033, Boston, MA, USA, 2016.

Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.

@inproceedings{LeMaguer2016SP,
title = {Toward the use of information density based descriptive features in HMM based speech synthesis},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://www.researchgate.net/publication/305684951_Toward_the_use_of_information_density_based_descriptive_features_in_HMM_based_speech_synthesis},
year = {2016},
date = {2016},
booktitle = {8th International Conference on Speech Prosody},
pages = {1029-1033},
address = {Boston, MA, USA},
abstract = {Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar; Lolive, Damien

De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM Inproceedings

Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP, AFCP - ATALA, pp. 714-722, Paris, France, 2016.

Over the last decades, acoustic modeling in parametric speech synthesis systems has received particular attention. However, in most known systems, the set of linguistic descriptors used to represent the text has remained the same. More specifically, prosody modeling is still driven by low-level descriptors such as syllable stress or the word's part-of-speech tag. In this paper, we propose to integrate information based on the predictability of an event (the syllable or the word). Several studies report a strong correlation between this measure, widely used in computational linguistics, and certain characteristics of human speech production. Our hypothesis is therefore that adding these descriptors improves prosody modeling. This paper focuses on an objective analysis of the contribution of these descriptors to HMM-based synthesis for English and French.

@inproceedings{Lemaguer/etal:2016b,
title = {De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner and Damien Lolive},
url = {https://aclanthology.org/2016.jeptalnrecital-jep.80},
year = {2016},
date = {2016},
booktitle = {Actes de la conf{\'e}rence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP},
pages = {714-722},
publisher = {AFCP - ATALA},
address = {Paris, France},
abstract = {Durant les derni{\`e}res d{\'e}cennies, la mod{\'e}lisation acoustique effectu{\'e}e par les syst{\`e}mes de synth{\`e}se de parole param{\'e}trique a fait l'objet d'une attention particuli{\`e}re. Toutefois, dans la plupart des syst{\`e}mes connus, l'ensemble des descripteurs linguistiques utilis{\'e}s pour repr{\'e}senter le texte reste identique. Plus sp{\'e}cifiquement, la mod{\'e}lisation de la prosodie reste guid{\'e}e par des descripteurs de bas niveau comme l'information d'accentuation de la syllabe ou bien l'{\'e}tiquette grammaticale du mot. Dans cet article, nous proposons d'int{\'e}grer des informations bas{\'e}es sur la pr{\'e}dictibilit{\'e} d'un {\'e}v{\`e}nement (la syllabe ou le mot). Plusieurs {\'e}tudes indiquent une corr{\'e}lation forte entre cette mesure, fortement pr{\'e}sente dans la linguistique computationnelle, et certaines sp{\'e}cificit{\'e}s lors de la production humaine de la parole. Notre hypoth{\`e}se est donc que l'ajout de ces descripteurs am{\'e}liore la mod{\'e}lisation de la prosodie. Cet article se focalise sur une analyse objective de l'apport de ces descripteurs sur la synth{\`e}se HMM pour la langue anglaise et fran{\c c}aise.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Rubino, Raphael; Degaetano-Ortlieb, Stefania; Teich, Elke; van Genabith, Josef

Modeling Diachronic Change in Scientific Writing with Information Density Inproceedings

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, pp. 750-761, Osaka, Japan, 2016.

Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.

@inproceedings{C16-1072,
title = {Modeling Diachronic Change in Scientific Writing with Information Density},
author = {Raphael Rubino and Stefania Degaetano-Ortlieb and Elke Teich and Josef van Genabith},
url = {https://aclanthology.org/C16-1072},
year = {2016},
date = {2016},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
pages = {750-761},
publisher = {The COLING 2016 Organizing Committee},
address = {Osaka, Japan},
abstract = {Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Rubino, Raphael; Lapshinova-Koltunski, Ekaterina; van Genabith, Josef

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification Inproceedings

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 960-970, San Diego, California, 2016.

This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.

@inproceedings{N16-1110,
title = {Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification},
author = {Raphael Rubino and Ekaterina Lapshinova-Koltunski and Josef van Genabith},
url = {http://aclweb.org/anthology/N16-1110},
doi = {https://doi.org/10.18653/v1/N16-1110},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {960-970},
publisher = {Association for Computational Linguistics},
address = {San Diego, California},
abstract = {This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6
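The delexicalised sentence-level dense vectors mentioned in this abstract can be illustrated with a minimal sketch. The feature set below (mean surprisal, surprisal peak, length) is a hypothetical simplification; the paper's actual feature space is far richer:

```python
import math

def sentence_density_features(tokens, unigram_probs):
    """Map a sentence to a dense, delexicalised vector: no word identities
    survive, only information-density summary statistics."""
    surprisals = [-math.log2(unigram_probs.get(t, 1e-6)) for t in tokens]
    return [
        sum(surprisals) / len(surprisals),  # mean information density
        max(surprisals),                    # information peak
        float(len(tokens)),                 # sentence length
    ]

# Toy unigram model; such vectors would then feed a supervised classifier.
probs = {"the": 0.05, "cat": 0.001, "sat": 0.002}
vec = sentence_density_features(["the", "cat", "sat"], probs)
```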

Singh, Mittul; Greenberg, Clayton; Klakow, Dietrich

The Custom Decay Language Model for Long Range Dependencies Book Chapter

Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings, Springer International Publishing, pp. 343-351, Cham, 2016, ISBN 978-3-319-45510-5.

Significant correlations between words can be observed over long distances, but contemporary language models like N-grams, Skip grams, and recurrent neural network language models (RNNLMs) require a large number of parameters to capture these dependencies, if the models can do so at all. In this paper, we propose the Custom Decay Language Model (CDLM), which captures long range correlations while maintaining sub-linear increase in parameters with vocabulary size. This model has a robust and stable training procedure (unlike RNNLMs), a more powerful modeling scheme than the Skip models, and a customizable representation. In perplexity experiments, CDLMs outperform the Skip models using fewer number of parameters. A CDLM also nominally outperformed a similar-sized RNNLM, meaning that it learned as much as the RNNLM but without recurrence.

@inbook{Singh2016,
title = {The Custom Decay Language Model for Long Range Dependencies},
author = {Mittul Singh and Clayton Greenberg and Dietrich Klakow},
url = {http://dx.doi.org/10.1007/978-3-319-45510-5_39},
doi = {https://doi.org/10.1007/978-3-319-45510-5_39},
year = {2016},
date = {2016},
booktitle = {Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings},
isbn = {978-3-319-45510-5},
pages = {343-351},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {Significant correlations between words can be observed over long distances, but contemporary language models like N-grams, Skip grams, and recurrent neural network language models (RNNLMs) require a large number of parameters to capture these dependencies, if the models can do so at all. In this paper, we propose the Custom Decay Language Model (CDLM), which captures long range correlations while maintaining sub-linear increase in parameters with vocabulary size. This model has a robust and stable training procedure (unlike RNNLMs), a more powerful modeling scheme than the Skip models, and a customizable representation. In perplexity experiments, CDLMs outperform the Skip models using fewer number of parameters. A CDLM also nominally outperformed a similar-sized RNNLM, meaning that it learned as much as the RNNLM but without recurrence.},
pubstate = {published},
type = {inbook}
}

Project:   B4
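The perplexity used above to compare CDLMs against Skip models and RNNLMs is the standard held-out metric; as a reference, a minimal definition:

```python
import math

def perplexity(token_probs):
    """Perplexity = 2 ** (mean per-token surprisal in bits);
    lower means the model predicts the held-out text better."""
    bits = [-math.log2(p) for p in token_probs]
    return 2 ** (sum(bits) / len(bits))

# A model assigning probability 0.25 to every token has perplexity 4.
pp = perplexity([0.25, 0.25, 0.25, 0.25])
```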

Oualil, Youssef; Greenberg, Clayton; Singh, Mittul; Klakow, Dietrich

Sequential recurrent neural networks for language modeling Inproceedings

Interspeech 2016, pp. 3509-3513, 2016.

Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in FNN structure while enhancing each word representation at the projection layer through recurrent context information that evolves in the network. The context integration is performed using an additional word-dependent weight matrix that is also learned during the training. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.

@inproceedings{oualil2016sequential,
title = {Sequential recurrent neural networks for language modeling},
author = {Youssef Oualil and Clayton Greenberg and Mittul Singh and Dietrich Klakow},
url = {https://arxiv.org/abs/1703.08068},
year = {2016},
date = {2016},
booktitle = {Interspeech 2016},
pages = {3509-3513},
abstract = {Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in FNN structure while enhancing each word representation at the projection layer through recurrent context information that evolves in the network. The context integration is performed using an additional word-dependent weight matrix that is also learned during the training. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
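The architecture described in this abstract, FNN-style enumeration of the last N words with each projection enhanced by recurrent context through a word-dependent weight matrix, can be sketched roughly as follows. This is a toy NumPy reconstruction from the abstract alone; the dimensions, nonlinearity, and context-update rule are assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, E, N = 10, 4, 3   # toy vocab size, embedding dim, history length

emb = rng.normal(size=(V, E))       # shared projection (embedding) matrix
W_ctx = rng.normal(size=(V, E, E))  # word-dependent context weight matrices

def project_history(history, context):
    """Embed each of the last N words; enhance each embedding with
    context transformed by that word's own weight matrix, letting the
    context evolve sequentially through the history."""
    enhanced = []
    for w in history:
        h = np.tanh(emb[w] + W_ctx[w] @ context)  # enhanced representation
        context = h                               # context cycles forward
        enhanced.append(h)
    return np.concatenate(enhanced)  # FNN-style concatenated input layer

x = project_history([1, 5, 2], np.zeros(E))  # shape (N * E,)
```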
