Publications

Brandt, Erika; Zimmerer, Frank; Möbius, Bernd; Andreeva, Bistra

Mel-cepstral distortion of German vowels in different information density contexts Inproceedings

Proceedings of Interspeech, Stockholm, Sweden, 2017.

This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors as well as corpus-specific frequency values were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate, and between unstressed vowels.

@inproceedings{Brandt/etal:2017,
title = {Mel-cepstral distortion of German vowels in different information density contexts},
author = {Erika Brandt and Frank Zimmerer and Bernd M{\"o}bius and Bistra Andreeva},
url = {https://www.researchgate.net/publication/319185343_Mel-Cepstral_Distortion_of_German_Vowels_in_Different_Information_Density_Contexts},
year = {2017},
date = {2017},
booktitle = {Proceedings of Interspeech},
address = {Stockholm, Sweden},
abstract = {This study investigated whether German vowels differ significantly from each other in mel-cepstral distortion (MCD) when they stand in different information density (ID) contexts. We hypothesized that vowels in the same ID contexts are more similar to each other than vowels that stand in different ID conditions. Read speech material from PhonDat2 of 16 German natives (m = 10, f = 6) was analyzed. Bi-phone and word language models were calculated based on DeWaC. To account for additional variability in the data, prosodic factors as well as corpus-specific frequency values were also entered into the statistical models. Results showed that vowels in different ID conditions were significantly different in their MCD values. Unigram word probability and corpus-specific word frequency showed the expected effect on vowel similarity with a hierarchy between non-contrasting and contrasting conditions. However, these did not form a homogeneous group since there were group-internal significant differences. The largest distance can be found between vowels produced at fast speech rate, and between unstressed vowels.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Zimmerer, Frank; Andreeva, Bistra; Möbius, Bernd; Malisz, Zofia; Ferragne, Emmanuel; Pellegrino, François; Brandt, Erika

Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal Inproceedings

Möbius, Bernd (Ed.): Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbrücken, 15.-17. März 2017. Studientexte zur Sprachkommunikation, Band 86, pp. 174-179, 2017.

Two perception experiments investigated the perception of speech rate. A factor of particular interest is surprisal, an information-theoretic measure of the predictability of a linguistic unit in context. Taken together, the results of the experiments suggest that surprisal has no significant influence on the perception of speech rate.

@inproceedings{Zimmerer/etal:2017a,
title = {Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal},
author = {Frank Zimmerer and Bistra Andreeva and Bernd M{\"o}bius and Zofia Malisz and Emmanuel Ferragne and François Pellegrino and Erika Brandt},
editor = {Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/318589916_PERZEPTION_VON_SPRECHGESCHWINDIGKEIT_UND_DER_NICHT_NACHGEWIESENE_EINFLUSS_VON_SURPRISAL},
year = {2017},
date = {2017-03-15},
booktitle = {Elektronische Sprachsignalverarbeitung 2017 - Tagungsband der 28. Konferenz, Saarbr{\"u}cken, 15.-17. M{\"a}rz 2017. Studientexte zur Sprachkommunikation, Band 86},
pages = {174-179},
abstract = {In zwei Perzeptionsexperimenten wurde die Perzeption von Sprechgeschwindigkeit untersucht. Ein Faktor, der dabei besonders im Zentrum des Interesses steht, ist Surprisal, ein informationstheoretisches Ma{\ss} f{\"u}r die Vorhersagbarkeit einer linguistischen Einheit im Kontext. Zusammengenommen legen die Ergebnisse der Experimente den Schluss nahe, dass Surprisal keinen signifikanten Einfluss auf die Wahrnehmung von Sprechgeschwindigkeit aus{\"u}bt.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Malisz, Zofia; O'Dell, Michael; Nieminen, Tommi; Wagner, Petra

Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish Journal Article

Phonetica, 73, pp. 229-255, 2016.

This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O’Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.

@article{Malisz/etal:2016,
title = {Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish},
author = {Zofia Malisz and Michael O'Dell and Tommi Nieminen and Petra Wagner},
url = {https://www.degruyter.com/document/doi/10.1159/000450829/html},
doi = {10.1159/000450829},
year = {2016},
date = {2016},
journal = {Phonetica},
pages = {229-255},
volume = {73},
abstract = {This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O'Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.},
pubstate = {published},
type = {article}
}

Project:   C1

Schulz, Erika; Oh, Yoon Mi; Andreeva, Bistra; Möbius, Bernd

Impact of Prosodic Structure and Information Density on Vowel Space Size Inproceedings

Proceedings of Speech Prosody, pp. 350-354, Boston, MA, USA, 2016.

We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone $X_n \mid X_{n-1}$. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.

@inproceedings{Schulz/etal:2016a,
title = {Impact of Prosodic Structure and Information Density on Vowel Space Size},
author = {Erika Schulz and Yoon Mi Oh and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/303755409_Impact_of_prosodic_structure_and_information_density_on_vowel_space_size},
year = {2016},
date = {2016},
booktitle = {Proceedings of Speech Prosody},
pages = {350-354},
address = {Boston, MA, USA},
abstract = {We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone $X_n \mid X_{n-1}$. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Toward the use of information density based descriptive features in HMM based speech synthesis Inproceedings

8th International Conference on Speech Prosody, pp. 1029-1033, Boston, MA, USA, 2016.

Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.

@inproceedings{LeMaguer2016SP,
title = {Toward the use of information density based descriptive features in HMM based speech synthesis},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://www.researchgate.net/publication/305684951_Toward_the_use_of_information_density_based_descriptive_features_in_HMM_based_speech_synthesis},
year = {2016},
date = {2016},
booktitle = {8th International Conference on Speech Prosody},
pages = {1029-1033},
address = {Boston, MA, USA},
abstract = {Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar; Lolive, Damien

De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM Inproceedings

Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP, AFCP - ATALA, pp. 714-722, Paris, France, 2016.

Over the last decades, the acoustic modeling performed by parametric speech synthesis systems has received particular attention. However, in most known systems, the set of linguistic descriptors used to represent the text remains the same. More specifically, prosody modeling is still guided by low-level descriptors such as syllable stress information or the part-of-speech tag of the word. In this article, we propose to integrate information based on the predictability of an event (the syllable or the word). Several studies indicate a strong correlation between this measure, which is widely used in computational linguistics, and certain characteristics of human speech production. Our hypothesis is therefore that adding these descriptors improves prosody modeling. This article focuses on an objective analysis of the contribution of these descriptors to HMM synthesis for English and French.

@inproceedings{Lemaguer/etal:2016b,
title = {De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner and Damien Lolive},
url = {https://aclanthology.org/2016.jeptalnrecital-jep.80},
year = {2016},
date = {2016},
booktitle = {Actes de la conf{\'e}rence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP},
pages = {714-722},
publisher = {AFCP - ATALA},
address = {Paris, France},
abstract = {Durant les dernières d{\'e}cennies, la mod{\'e}lisation acoustique effectu{\'e}e par les systèmes de synthèse de parole param{\'e}trique a fait l’objet d’une attention particulière. Toutefois, dans la plupart des systèmes connus, l’ensemble des descripteurs linguistiques utilis{\'e}s pour repr{\'e}senter le texte reste identique. Plus sp{\'e}cifiquement, la mod{\'e}lisation de la prosodie reste guid{\'e}e par des descripteurs de bas niveau comme l’information d’accentuation de la syllabe ou bien l’{\'e}tiquette grammaticale du mot. Dans cet article, nous proposons d’int{\'e}grer des informations bas{\'e}es sur la pr{\'e}dictibilit{\'e} d’un {\'e}vènement (la syllabe ou le mot). Plusieurs {\'e}tudes indiquent une corr{\'e}lation forte entre cette mesure, fortement pr{\'e}sente dans la linguistique computationnelle, et certaines sp{\'e}cificit{\'e}s lors de la production humaine de la parole. Notre hypothèse est donc que l’ajout de ces descripteurs am{\'e}liore la mod{\'e}lisation de la prosodie. Cet article se focalise sur une analyse objective de l’apport de ces descripteurs sur la synthèse HMM pour la langue anglaise et française.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Schulz, Erika; Malisz, Zofia; Andreeva, Bistra; Möbius, Bernd

Einfluss von Informationsdichte und prosodischer Struktur auf Vokalraumausdehnung Inproceedings

Phonetik und Phonologie 11, Marburg, 2015.

Vowel space expansion is determined by several factors, e.g., gender (Simpson and Ericsdotter 2007), speaking style (Bradlow, Kraus and Hayes 2003), prosody (Bergem 1993), speech rate (Weirich and Simpson 2014), or phonological neighborhood density (Munson and Solomon 2004). Language redundancy can also serve as a predictor of the spectral realization of vowels (Aylett and Turk 2006). This study investigates the influence of information density and prosodic structure on vowel space expansion in French, German, American English, and Finnish.

@inproceedings{pundp11,
title = {Einfluss von Informationsdichte und prosodischer Struktur auf Vokalraumausdehnung},
author = {Erika Schulz and Zofia Malisz and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.online.uni-marburg.de/pundp11/talks/Schulz_etal.pdf},
year = {2015},
date = {2015},
booktitle = {Phonetik und Phonologie 11},
address = {Marburg},
abstract = {Vokalraumausdehnung wird von mehreren Faktoren bestimmt, z. B. von Geschlecht (Simpson und Ericsdotter 2007), Sprechstil (Bradlow, Kraus und Hayes 2003), Prosodie (Bergem 1993), Sprechgeschwindigkeit (Weirich und Simpson 2014) oder phonologischer Nachbarschaftsdichte (Munson und Solomon 2004). Auch Sprachredundanz kann als Pr{\"a}diktor spektraler Auspr{\"a}gung von Vokalen dienen (Aylett und Turk 2006). Diese Studie untersucht den Einfluss von Informationsdichte und prosodischen Strukturen auf Vokalraumausdehnung in Franz{\"o}sisch, Deutsch, Amerikanischem Englisch und Finnisch.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Malisz, Zofia; Schulz, Erika; Oh, Yoon Mi; Andreeva, Bistra; Möbius, Bernd

Dimensions of segmental variability: relationships between information density and prosodic structure Inproceedings

Workshop "Modeling variability in speech", Stuttgart, 2015.

Contextual predictability variation affects phonological and phonetic structure. Reduction and expansion of acoustic-phonetic features is also characteristic of prosodic variability. In this study, we assess the impact of surprisal and prosodic structure on phonetic encoding, both independently of each other and in interaction. We model segmental duration, vowel space size and spectral characteristics of vowels and consonants as a function of surprisal as well as of syllable prominence, phrase boundary, and speech rate. Correlates of phonetic encoding density are extracted from a subset of the BonnTempo corpus for six languages: American English, Czech, Finnish, French, German, and Polish. Surprisal is estimated from segmental n-gram language models trained on large text corpora. Our findings are generally compatible with a weak version of Aylett and Turk’s Smooth Signal Redundancy hypothesis, suggesting that prosodic structure mediates between the requirements of efficient communication and the speech signal. However, this mediation is not perfect, as we found evidence for additional, direct effects of changes in surprisal on the phonetic structure of utterances. These effects appear to be stable across different speech rates.

@inproceedings{malisz15,
title = {Dimensions of segmental variability: relationships between information density and prosodic structure},
author = {Zofia Malisz and Erika Schulz and Yoon Mi Oh and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.frontiersin.org/articles/10.3389/fcomm.2018.00025/full},
year = {2015},
date = {2015},
booktitle = {Workshop "Modeling variability in speech"},
address = {Stuttgart},
abstract = {Contextual predictability variation affects phonological and phonetic structure. Reduction and expansion of acoustic-phonetic features is also characteristic of prosodic variability. In this study, we assess the impact of surprisal and prosodic structure on phonetic encoding, both independently of each other and in interaction. We model segmental duration, vowel space size and spectral characteristics of vowels and consonants as a function of surprisal as well as of syllable prominence, phrase boundary, and speech rate. Correlates of phonetic encoding density are extracted from a subset of the BonnTempo corpus for six languages: American English, Czech, Finnish, French, German, and Polish. Surprisal is estimated from segmental n-gram language models trained on large text corpora. Our findings are generally compatible with a weak version of Aylett and Turk's Smooth Signal Redundancy hypothesis, suggesting that prosodic structure mediates between the requirements of efficient communication and the speech signal. However, this mediation is not perfect, as we found evidence for additional, direct effects of changes in surprisal on the phonetic structure of utterances. These effects appear to be stable across different speech rates.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1
