Publications

Teich, Elke; Martínez Martínez, José; Karakanta, Alina

Translation, information theory and cognition Book Chapter

Alves, Fabio; Lykke Jakobsen, Arnt (Ed.): The Routledge Handbook of Translation and Cognition, Routledge, pp. 360-375, London, UK, 2020, ISBN 9781138037007.

The chapter sketches a formal basis for the probabilistic modelling of human translation on the basis of information theory. We provide a definition of Shannon information applied to linguistic communication and discuss its relevance for modelling translation. We further explain the concept of the noisy channel and provide the link to modelling human translational choice. We suggest that a number of translation-relevant variables, notably (dis)similarity between languages, level of expertise and translation mode (i.e., interpreting vs. translation), may be appropriately indexed by entropy, which in turn has been shown to indicate production effort.
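
For readers unfamiliar with the information-theoretic notions the chapter builds on, the two core quantities are easy to state. The following minimal Python sketch (purely illustrative, not code from the chapter; the toy distribution over translation choices is invented) computes the surprisal of an outcome and the entropy of a distribution:

import math

def surprisal(p):
    """Shannon information (in bits) of an outcome with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal (in bits) over a probability distribution."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Invented distribution over translation choices for a source word:
choices = {"house": 0.7, "home": 0.2, "building": 0.1}
print(surprisal(choices["home"]))  # ~2.32 bits
print(entropy(choices))            # ~1.16 bits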

@inbook{Teich-etal2020-handbook,
title = {Translation, information theory and cognition},
author = {Elke Teich and Jos{\'e} Mart{\'i}nez Mart{\'i}nez and Alina Karakanta},
editor = {Fabio Alves and Arnt Lykke Jakobsen},
url = {https://www.taylorfrancis.com/chapters/edit/10.4324/9781315178127-24/translation-information-theory-cognition-elke-teich-josé-martínez-martínez-alina-karakanta},
year = {2020},
date = {2020},
booktitle = {The Routledge Handbook of Translation and Cognition},
isbn = {9781138037007},
pages = {360-375},
publisher = {Routledge},
address = {London, UK},
abstract = {The chapter sketches a formal basis for the probabilistic modelling of human translation on the basis of information theory. We provide a definition of Shannon information applied to linguistic communication and discuss its relevance for modelling translation. We further explain the concept of the noisy channel and provide the link to modelling human translational choice. We suggest that a number of translation-relevant variables, notably (dis)similarity between languages, level of expertise and translation mode (i.e., interpreting vs. translation), may be appropriately indexed by entropy, which in turn has been shown to indicate production effort.},
pubstate = {published},
type = {inbook}
}

Project:   B7

Bizzoni, Yuri; Juzek, Tom; España-Bonet, Cristina; Dutta Chowdhury, Koel; van Genabith, Josef; Teich, Elke

How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech Inproceedings

The 17th International Workshop on Spoken Language Translation, Seattle, WA, United States, 2020.

Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations from text and speech. We find that machine translation shows traces of translationese, but does not reproduce the patterns found in human translation, offering support to the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).
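
The paper itself details two non-invasive detection methods; as a rough, hypothetical illustration of the general approach of detecting translationese from shallow surface features, one can train a classifier on part-of-speech n-grams (toy data invented, not the authors' setup):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: each "document" is its POS-tag sequence; 1 = translation, 0 = original.
docs = [
    "DET NOUN VERB DET ADJ NOUN",
    "PRON VERB ADP DET NOUN NOUN",
    "DET ADJ NOUN VERB ADV",
    "PRON AUX VERB DET NOUN",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),  # POS n-gram counts as features
    LogisticRegression(),
)
clf.fit(docs, labels)
print(clf.predict(["DET ADJ NOUN VERB DET NOUN"]))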

@inproceedings{Bizzoni2020,
title = {How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech},
author = {Yuri Bizzoni and Tom Juzek and Cristina Espa{\~n}a-Bonet and Koel Dutta Chowdhury and Josef van Genabith and Elke Teich},
url = {https://aclanthology.org/2020.iwslt-1.34/},
doi = {https://doi.org/10.18653/v1/2020.iwslt-1.34},
year = {2020},
date = {2020},
booktitle = {The 17th International Workshop on Spoken Language Translation},
address = {Seattle, WA, United States},
abstract = {Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations from text and speech. We find that machine translation shows traces of translationese, but does not reproduce the patterns found in human translation, offering support to the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).},
pubstate = {published},
type = {inproceedings}
}

Projects:   B6 B7

Adelani, David; Hedderich, Michael; Zhu, Dawei; van den Berg, Esther; Klakow, Dietrich

Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá Miscellaneous

arXiv:2003.08370, 2020.

The lack of labeled training data has limited the development of natural language processing tools, such as named entity recognition, for many languages spoken in developing countries. Techniques such as distant and weak supervision can be used to create labeled data in a (semi-) automatic way.

Additionally, to alleviate some of the negative effects of the errors in automatic annotation, noise-handling methods can be integrated. Pretrained word embeddings are another key component of most neural named entity classifiers. With the advent of more complex contextual word embeddings, an interesting trade-off between model size and performance arises. While these techniques have been shown to work well in high-resource settings, we want to study how they perform in low-resource scenarios.

In this work, we perform named entity recognition for Hausa and Yorùbá, two languages that are widely spoken in several developing countries. We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario where it can more than double a classifier’s performance.
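
As a hedged illustration of the distant-supervision idea described above (a toy sketch with an invented gazetteer, not the authors' pipeline), labels can be projected onto unlabeled text by dictionary lookup:

# Distant supervision in miniature: project gazetteer matches onto tokens.
gazetteer = {"Lagos": "LOC", "Kano": "LOC", "Buhari": "PER"}  # invented entries

def distant_labels(tokens):
    """Assign a noisy NER tag to each token via exact gazetteer lookup."""
    return [(tok, gazetteer.get(tok, "O")) for tok in tokens]

print(distant_labels("Buhari visited Kano yesterday".split()))
# [('Buhari', 'PER'), ('visited', 'O'), ('Kano', 'LOC'), ('yesterday', 'O')]

The tags produced this way are noisy (e.g., ambiguous names), which is what the noise-handling methods mentioned above are meant to absorb.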

@miscellaneous{Adelani2020,
title = {Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùb{\'a}},
author = {David Adelani and Michael Hedderich and Dawei Zhu and Esther van den Berg and Dietrich Klakow},
url = {https://arxiv.org/abs/2003.08370},
year = {2020},
date = {2020},
abstract = {The lack of labeled training data has limited the development of natural language processing tools, such as named entity recognition, for many languages spoken in developing countries. Techniques such as distant and weak supervision can be used to create labeled data in a (semi-) automatic way. Additionally, to alleviate some of the negative effects of the errors in automatic annotation, noise-handling methods can be integrated. Pretrained word embeddings are another key component of most neural named entity classifiers. With the advent of more complex contextual word embeddings, an interesting trade-off between model size and performance arises. While these techniques have been shown to work well in high-resource settings, we want to study how they perform in low-resource scenarios. In this work, we perform named entity recognition for Hausa and Yorùb{\'a}, two languages that are widely spoken in several developing countries. We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario where it can more than double a classifier's performance.},
pubstate = {published},
type = {miscellaneous}
}

Project:   B4

Lemke, Tyll Robin; Schäfer, Lisa; Drenhaus, Heiner; Reich, Ingo

Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling Inproceedings

Proceedings of the Society for Computation in Linguistics, 3, 2020.

We investigate the effect of script-based (Schank and Abelson 1977) extralinguistic context on the omission of words in fragments. Our data elicited with a production task show that predictable words are more often omitted than unpredictable ones, as predicted by the Uniform Information Density (UID) hypothesis (Levy and Jaeger, 2007).

We take into account effects of linguistic and extralinguistic context on predictability and propose a method for estimating the surprisal of words in presence of ellipsis. Our study extends previous evidence for UID in two ways: First, we show that not only local linguistic context, but also extralinguistic context determines the likelihood of omissions. Second, we find UID effects on the omission of content words.

@inproceedings{Lemke2020,
title = {Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling},
author = {Tyll Robin Lemke and Lisa Sch{\"a}fer and Heiner Drenhaus and Ingo Reich},
url = {https://scholarworks.umass.edu/scil/vol3/iss1/45},
doi = {https://doi.org/10.7275/mpby-zr74},
year = {2020},
date = {2020},
booktitle = {Proceedings of the Society for Computation in Linguistics},
abstract = {We investigate the effect of script-based (Schank and Abelson 1977) extralinguistic context on the omission of words in fragments. Our data elicited with a production task show that predictable words are more often omitted than unpredictable ones, as predicted by the Uniform Information Density (UID) hypothesis (Levy and Jaeger, 2007). We take into account effects of linguistic and extralinguistic context on predictability and propose a method for estimating the surprisal of words in presence of ellipsis. Our study extends previous evidence for UID in two ways: First, we show that not only local linguistic context, but also extralinguistic context determines the likelihood of omissions. Second, we find UID effects on the omission of content words.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Fischer, Stefan; Knappen, Jörg; Menzel, Katrin; Teich, Elke

The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study Inproceedings

Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, pp. 794-802, Marseille, France, 2020.

We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings.

The corpus has been built on the basis of the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We provide information on how the corpus can be found, the file formats available for download as well as accessibility via a web-based corpus query platform. We show a number of analytic tools that we have implemented for better usability and provide an example of use of the corpus for linguistic analysis as well as examples of subsequent, external uses of earlier releases.

We place the RSC against the background of existing English diachronic/scientific corpora, elaborating on its value for linguistic and humanistic study.

@inproceedings{fischer-EtAl:2020:LREC,
title = {The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study},
author = {Stefan Fischer and J{\"o}rg Knappen and Katrin Menzel and Elke Teich},
url = {https://www.aclweb.org/anthology/2020.lrec-1.99/},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
pages = {794-802},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings. The corpus has been built on the basis of the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We provide information on how the corpus can be found, the file formats available for download as well as accessibility via a web-based corpus query platform. We show a number of analytic tools that we have implemented for better usability and provide an example of use of the corpus for linguistic analysis as well as examples of subsequent, external uses of earlier releases. We place the RSC against the background of existing English diachronic/scientific corpora, elaborating on its value for linguistic and humanistic study.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1

Bizzoni, Yuri; Degaetano-Ortlieb, Stefania; Fankhauser, Peter; Teich, Elke

Linguistic Variation and Change in 250 years of English Scientific Writing: A Data-driven Approach Journal Article

Jurgens, David (Ed.): Frontiers in Artificial Intelligence, section Language and Computation, 2020.

We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665.

Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register).

Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
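
For orientation, the relative entropy (Kullback-Leibler divergence) used for comparing models can be illustrated with smoothed unigram models over two invented time slices (a sketch only, not the authors' code):

import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D(p||q) in bits: expected extra surprisal when coding p with q."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

early = "the electricity is conveyed through the wire".split()
late = "the current is measured through the circuit".split()
vocab = set(early) | set(late)
p, q = unigram_model(early, vocab), unigram_model(late, vocab)
print(kl_divergence(p, q))  # larger value = stronger divergence between slices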

@article{Bizzoni2020b,
title = {Linguistic Variation and Change in 250 years of English Scientific Writing: A Data-driven Approach},
author = {Yuri Bizzoni and Stefania Degaetano-Ortlieb and Peter Fankhauser and Elke Teich},
editor = {David Jurgens},
url = {https://www.frontiersin.org/articles/10.3389/frai.2020.00073/full},
doi = {https://doi.org/10.3389/frai.2020.00073},
year = {2020},
date = {2020-10-18},
journal = {Frontiers in Artificial Intelligence, section Language and Computation},
abstract = {We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.},
pubstate = {published},
type = {article}
}

Project:   B1

Wichlacz, Julia; Höller, Daniel; Torralba, Álvaro; Hoffmann, Jörg

Applying Monte-Carlo Tree Search in HTN Planning Inproceedings

Proceedings of the 13th International Symposium on Combinatorial Search (SoCS), AAAI Press, pp. 82-90, Vienna, Austria, 2020.

Search methods are useful in hierarchical task network (HTN) planning to make performance less dependent on the domain knowledge provided, and to minimize plan costs. Here we investigate Monte-Carlo tree search (MCTS) as a new algorithmic alternative in HTN planning. We implement combinations of MCTS with heuristic search in Panda. We furthermore investigate MCTS in JSHOP, to address lifted (non-grounded) planning, leveraging the fact that, in contrast to other search methods, MCTS does not require a grounded task representation. Our new methods yield coverage performance on par with the state of the art, but in addition can effectively minimize plan cost over time.
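
MCTS itself is generic; the selection step at its core picks children by an upper confidence bound. A compact Python sketch of that rule (an illustration of the algorithm family with invented statistics, not the PANDA or JSHOP integration):

import math

def uct_select(node, c=1.4):
    """Pick the child maximizing UCB1: average reward plus exploration bonus."""
    return max(
        node["children"],
        key=lambda ch: ch["value"] / (ch["visits"] + 1e-9)
        + c * math.sqrt(math.log(node["visits"] + 1) / (ch["visits"] + 1e-9)),
    )

# Invented statistics for two candidate decompositions of an abstract task:
root = {"visits": 10, "children": [
    {"name": "method-A", "visits": 6, "value": 4.0},
    {"name": "method-B", "visits": 4, "value": 3.2},
]}
print(uct_select(root)["name"])  # the less-explored child wins the bonus here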

@inproceedings{Wichlacz20MCTSSOCS,
title = {Applying Monte-Carlo Tree Search in HTN Planning},
author = {Julia Wichlacz and Daniel H{\"o}ller and {\'A}lvaro Torralba and J{\"o}rg Hoffmann},
url = {https://ojs.aaai.org/index.php/SOCS/article/view/18538},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 13th International Symposium on Combinatorial Search (SoCS)},
pages = {82-90},
publisher = {AAAI Press},
address = {Vienna, Austria},
abstract = {Search methods are useful in hierarchical task network (HTN) planning to make performance less dependent on the domain knowledge provided, and to minimize plan costs. Here we investigate Monte-Carlo tree search (MCTS) as a new algorithmic alternative in HTN planning. We implement combinations of MCTS with heuristic search in Panda. We furthermore investigate MCTS in JSHOP, to address lifted (non-grounded) planning, leveraging the fact that, in contrast to other search methods, MCTS does not require a grounded task representation. Our new methods yield coverage performance on par with the state of the art, but in addition can effectively minimize plan cost over time.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Höller, Daniel; Bercher, Pascal; Behnke, Gregor

Delete- and Ordering-Relaxation Heuristics for HTN Planning Inproceedings

Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), IJCAI organization, pp. 4076-4083, Yokohama, Japan, 2020.

In HTN planning, the hierarchy has a wide impact on solutions. First, there is (usually) no state-based goal given; the objective is given via the hierarchy. Second, it enforces actions to be in a plan. Third, planners are not allowed to add actions apart from those introduced via decomposition, i.e. via the hierarchy. However, no heuristic considers the interplay of hierarchy and actions in the plan exactly (without relaxation) because this makes heuristic calculation NP-hard even under delete relaxation. We introduce the problem class of delete- and ordering-free HTN planning as a basis for novel HTN heuristics and show that its plan existence problem is still NP-complete. We then introduce heuristics based on the new class using an integer programming model to solve it.

@inproceedings{Hoeller2020IJCAI,
title = {Delete- and Ordering-Relaxation Heuristics for HTN Planning},
author = {Daniel H{\"o}ller and Pascal Bercher and Gregor Behnke},
url = {https://www.ijcai.org/proceedings/2020/564},
doi = {https://doi.org/10.24963/ijcai.2020/564},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {4076-4083},
publisher = {IJCAI organization},
address = {Yokohama, Japan},
abstract = {In HTN planning, the hierarchy has a wide impact on solutions. First, there is (usually) no state-based goal given, the objective is given via the hierarchy. Second, it enforces actions to be in a plan. Third, planners are not allowed to add actions apart from those introduced via decomposition, i.e. via the hierarchy. However, no heuristic considers the interplay of hierarchy and actions in the plan exactly (without relaxation) because this makes heuristic calculation NP-hard even under delete relaxation. We introduce the problem class of delete- and ordering-free HTN planning as basis for novel HTN heuristics and show that its plan existence problem is still NP-complete. We then introduce heuristics based on the new class using an integer programming model to solve it.},
pubstate = {published},
type = {inproceedings}
}

Project:   A7

Ryzhova, Margarita; Demberg, Vera

Processing particularized pragmatic inferences under load Inproceedings

Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020), 2020.

A long-standing question in language understanding is whether pragmatic inferences are effortful or whether they happen seamlessly without measurable cognitive effort. We here measure the strength of particularized pragmatic inferences in a setting with high vs. low cognitive load. Cognitive load is induced by a secondary dot tracking task.

If this type of pragmatic inference comes at no cognitive processing cost, inferences should be similarly strong in both the high and the low load condition. If they are effortful, we expect a smaller effect size in the dual tasking condition. Our results show that participants who have difficulty in dual tasking (as evidenced by incorrect answers to comprehension questions) exhibit a smaller pragmatic effect when they were distracted with a secondary task in comparison to the single task condition. This finding supports the idea that pragmatic inferences are effortful.

@inproceedings{Ryzhova2020,
title = {Processing particularized pragmatic inferences under load},
author = {Margarita Ryzhova and Vera Demberg},
url = {https://www.semanticscholar.org/paper/Processing-particularized-pragmatic-inferences-load-Ryzhova-Demberg/a5b8d4c72590eaaf965d91d8fafa2495f680313d},
year = {2020},
date = {2020-10-17},
booktitle = {Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020)},
abstract = {A long-standing question in language understanding is whether pragmatic inferences are effortful or whether they happen seamlessly without measurable cognitive effort. We here measure the strength of particularized pragmatic inferences in a setting with high vs. low cognitive load. Cognitive load is induced by a secondary dot tracking task. If this type of pragmatic inference comes at no cognitive processing cost, inferences should be similarly strong in both the high and the low load condition. If they are effortful, we expect a smaller effect size in the dual tasking condition. Our results show that participants who have difficulty in dual tasking (as evidenced by incorrect answers to comprehension questions) exhibit a smaller pragmatic effect when they were distracted with a secondary task in comparison to the single task condition. This finding supports the idea that pragmatic inferences are effortful.},
pubstate = {published},
type = {inproceedings}
}

Project:   A3

Scholman, Merel; Demberg, Vera; Sanders, Ted J. M.

Individual differences in expecting coherence relations: Exploring the variability in sensitivity to contextual signals in discourse Journal Article

Discourse Processes, 57, pp. 844-861, 2020.

The current study investigated how a contextual list signal influences comprehenders’ inference generation of upcoming discourse relations and whether individual differences in working memory capacity and linguistic experience influence the generation of these inferences. Participants were asked to complete two-sentence stories, the first sentence of which contained an expression of quantity (a few, multiple). Several individual-difference measures were calculated to explore whether individual characteristics can explain the sensitivity to the contextual list signal. The results revealed that participants were sensitive to a contextual list signal (i.e., they provided list continuations), and this sensitivity was modulated by the participants’ linguistic experience, as measured by an author recognition test. The results showed no evidence that working memory affected participants’ responses. These results extend prior research by showing that contextual signals influence participants’ coherence-relation-inference generation. Further, the results of the current study emphasize the importance of individual reader characteristics when it comes to coherence-relation inferences.

@article{Scholman2020,
title = {Individual differences in expecting coherence relations: Exploring the variability in sensitivity to contextual signals in discourse},
author = {Merel Scholman and Vera Demberg and Ted J. M. Sanders},
url = {https://www.tandfonline.com/doi/full/10.1080/0163853X.2020.1813492},
doi = {https://doi.org/10.1080/0163853X.2020.1813492},
year = {2020},
date = {2020-10-02},
journal = {Discourse Processes},
pages = {844-861},
volume = {57},
number = {10},
abstract = {The current study investigated how a contextual list signal influences comprehenders’ inference generation of upcoming discourse relations and whether individual differences in working memory capacity and linguistic experience influence the generation of these inferences. Participants were asked to complete two-sentence stories, the first sentence of which contained an expression of quantity (a few, multiple). Several individual-difference measures were calculated to explore whether individual characteristics can explain the sensitivity to the contextual list signal. The results revealed that participants were sensitive to a contextual list signal (i.e., they provided list continuations), and this sensitivity was modulated by the participants’ linguistic experience, as measured by an author recognition test. The results showed no evidence that working memory affected participants’ responses. These results extend prior research by showing that contextual signals influence participants’ coherence-relation-inference generation. Further, the results of the current study emphasize the importance of individual reader characteristics when it comes to coherence-relation inferences.},
pubstate = {published},
type = {article}
}

Project:   B2

Brouwer, Harm; Delogu, Francesca; Crocker, Matthew W.

Splitting event‐related potentials: Modeling latent components using regression‐based waveform estimation Journal Article

European Journal of Neuroscience, 2020.

Event‐related potentials (ERPs) provide a multidimensional and real‐time window into neurocognitive processing. The typical Waveform‐based Component Structure (WCS) approach to ERPs assesses the modulation pattern of components—systematic, reoccurring voltage fluctuations reflecting specific computational operations—by looking at mean amplitude in predetermined time‐windows.

This WCS approach, however, often leads to inconsistent results within as well as across studies. It has been argued that at least some inconsistencies may be reconciled by considering spatiotemporal overlap between components; that is, components may overlap in both space and time, and given their additive nature, this means that the WCS may fail to accurately represent its underlying latent component structure (LCS). We employ regression‐based ERP (rERP) estimation to extend traditional approaches with an additional layer of analysis, which enables the explicit modeling of the LCS underlying WCS. To demonstrate its utility, we incrementally derive an rERP analysis of a recent study on language comprehension with seemingly inconsistent WCS‐derived results.

Analysis of the resultant regression models allows one to derive an explanation for the WCS in terms of how relevant regression predictors combine in space and time, and crucially, how individual predictors may be mapped onto unique components in LCS, revealing how these spatiotemporally overlap in the WCS. We conclude that rERP estimation allows for investigating how scalp‐recorded voltages derive from the spatiotemporal combination of experimentally manipulated factors. Moreover, when factors can be uniquely mapped onto components, rERPs may offer explanations for seemingly inconsistent ERP waveforms at the level of their underlying latent component structure.
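
The core of rERP estimation is a linear regression fitted at every time sample, so that each predictor receives its own waveform of coefficients. A minimal numpy sketch with random stand-in data (illustrative assumptions only, not the authors' implementation):

import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 200, 50

# Design matrix: intercept plus two hypothetical single-trial predictors.
X = np.column_stack([np.ones(n_trials),
                     rng.normal(size=n_trials),    # e.g. cloze probability
                     rng.normal(size=n_trials)])   # e.g. plausibility rating
eeg = rng.normal(size=(n_trials, n_samples))       # trials x time, toy EEG

# One least-squares fit per time sample: columns of eeg are solved jointly.
betas, *_ = np.linalg.lstsq(X, eeg, rcond=None)    # shape (3, n_samples)
print(betas.shape)  # each row is the estimated waveform of one predictor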

@article{Brouwer2020,
title = {Splitting event‐related potentials: Modeling latent components using regression‐based waveform estimation},
author = {Harm Brouwer and Francesca Delogu and Matthew W. Crocker},
url = {https://onlinelibrary.wiley.com/doi/10.1111/ejn.14961},
doi = {https://doi.org/10.1111/ejn.14961},
year = {2020},
date = {2020-09-08},
journal = {European Journal of Neuroscience},
abstract = {Event‐related potentials (ERPs) provide a multidimensional and real‐time window into neurocognitive processing. The typical Waveform‐based Component Structure (WCS) approach to ERPs assesses the modulation pattern of components—systematic, reoccurring voltage fluctuations reflecting specific computational operations—by looking at mean amplitude in predetermined time‐windows. This WCS approach, however, often leads to inconsistent results within as well as across studies. It has been argued that at least some inconsistencies may be reconciled by considering spatiotemporal overlap between components; that is, components may overlap in both space and time, and given their additive nature, this means that the WCS may fail to accurately represent its underlying latent component structure (LCS). We employ regression‐based ERP (rERP) estimation to extend traditional approaches with an additional layer of analysis, which enables the explicit modeling of the LCS underlying WCS. To demonstrate its utility, we incrementally derive an rERP analysis of a recent study on language comprehension with seemingly inconsistent WCS‐derived results. Analysis of the resultant regression models allows one to derive an explanation for the WCS in terms of how relevant regression predictors combine in space and time, and crucially, how individual predictors may be mapped onto unique components in LCS, revealing how these spatiotemporally overlap in the WCS. We conclude that rERP estimation allows for investigating how scalp‐recorded voltages derive from the spatiotemporal combination of experimentally manipulated factors. Moreover, when factors can be uniquely mapped onto components, rERPs may offer explanations for seemingly inconsistent ERP waveforms at the level of their underlying latent component structure.},
pubstate = {published},
type = {article}
}

Project:   A1

Dutta Chowdhury, Koel; España-Bonet, Cristina; van Genabith, Josef

Understanding Translationese in Multi-view Embedding Spaces Inproceedings

Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, pp. 6056-6062, Barcelona, Catalonia (Online), 2020.

Recent studies use a combination of lexical and syntactic features to show that footprints of the source language remain visible in translations, to the extent that it is possible to predict the original source language from the translation. In this paper, we focus on embedding-based semantic spaces, exploiting departures from isomorphism between spaces built from original target language and translations into this target language to predict relations between languages in an unsupervised way. We use different views of the data — words, parts of speech, semantic tags and synsets — to track translationese. Our analysis shows that (i) semantic distances between original target language and translations into this target language can be detected using the notion of isomorphism, (ii) language family ties with characteristics similar to linguistically motivated phylogenetic trees can be inferred from the distances and (iii) with delexicalised embeddings exhibiting source-language interference most significantly, other levels of abstraction display the same tendency, indicating the lexicalised results to be not “just” due to possible topic differences between original and translated texts. To the best of our knowledge, this is the first time departures from isomorphism between embedding spaces are used to track translationese.
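
One way to make "departure from isomorphism" concrete is to compare the Laplacian spectra of nearest-neighbour graphs built over the two embedding spaces; the sketch below illustrates that family of eigenvalue-based measures (random stand-in embeddings; not necessarily the exact metric used in the paper):

import numpy as np

def laplacian_spectrum(emb, k=2):
    """Sorted eigenvalues of the Laplacian of the k-NN graph of emb."""
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    adj = np.zeros_like(d)
    for i, row in enumerate(d):
        adj[i, np.argsort(row)[:k]] = 1.0
    adj = np.maximum(adj, adj.T)          # symmetrize the neighbour relation
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))

rng = np.random.default_rng(1)
a, b = rng.normal(size=(20, 8)), rng.normal(size=(20, 8))
# Smaller spectral distance = more nearly isomorphic neighbourhood structure.
print(np.sum((laplacian_spectrum(a) - laplacian_spectrum(b)) ** 2))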

@inproceedings{DuttaEtal:COLING:2020,
title = {Understanding Translationese in Multi-view Embedding Spaces},
author = {Koel Dutta Chowdhury and Cristina Espa{\~n}a-Bonet and Josef van Genabith},
url = {https://www.aclweb.org/anthology/2020.coling-main.532/},
doi = {https://doi.org/10.18653/v1/2020.coling-main.532},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
pages = {6056-6062},
publisher = {International Committee on Computational Linguistics},
address = {Barcelona, Catalonia (Online)},
abstract = {Recent studies use a combination of lexical and syntactic features to show that footprints of the source language remain visible in translations, to the extent that it is possible to predict the original source language from the translation. In this paper, we focus on embedding-based semantic spaces, exploiting departures from isomorphism between spaces built from original target language and translations into this target language to predict relations between languages in an unsupervised way. We use different views of the data {---} words, parts of speech, semantic tags and synsets {---} to track translationese. Our analysis shows that (i) semantic distances between original target language and translations into this target language can be detected using the notion of isomorphism, (ii) language family ties with characteristics similar to linguistically motivated phylogenetic trees can be inferred from the distances and (iii) with delexicalised embeddings exhibiting source-language interference most significantly, other levels of abstraction display the same tendency, indicating the lexicalised results to be not “just” due to possible topic differences between original and translated texts. To the best of our knowledge, this is the first time departures from isomorphism between embedding spaces are used to track translationese.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Hedderich, Michael; Adelani, David; Zhu, Dawei; Alabi, Jesujoba; Markus, Udia; Klakow, Dietrich

Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages Inproceedings

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 2580-2591, 2020.

Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that in combination with transfer learning or distant supervision, these models can achieve with as little as 10 or 100 labeled sentences the same performance as baselines with much more supervised training data. However, we also find settings where this does not hold. Our discussions and additional experiments on assumptions such as time and hardware restrictions highlight challenges and opportunities in low-resource learning.

@inproceedings{hedderich-etal-2020-transfer,
title = {Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages},
author = {Michael Hedderich and David Adelani and Dawei Zhu and Jesujoba Alabi and Udia Markus and Dietrich Klakow},
url = {https://www.aclweb.org/anthology/2020.emnlp-main.204},
doi = {https://doi.org/10.18653/v1/2020.emnlp-main.204},
year = {2020},
date = {2020},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages = {2580-2591},
publisher = {Association for Computational Linguistics},
abstract = {Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùb{\'a} on both NER and topic classification. We show that in combination with transfer learning or distant supervision, these models can achieve with as little as 10 or 100 labeled sentences the same performance as baselines with much more supervised training data. However, we also find settings where this does not hold. Our discussions and additional experiments on assumptions such as time and hardware restrictions highlight challenges and opportunities in low-resource learning.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Mosbach, Marius; Khokhlova, Anna; Hedderich, Michael; Klakow, Dietrich

On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers Inproceedings

Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, pp. 2502-2516, 2020.

Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is, however, understood about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across different models, fine-tuning and probing tasks. Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model and these changes are typically larger for higher layers, only in very few cases, fine-tuning has a positive effect on probing accuracy that is larger than just using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing require a careful interpretation.
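
Sentence-level probing in this sense amounts to training a shallow classifier on frozen sentence representations; a self-contained sketch with random stand-in embeddings (hypothetical data, not the authors' setup):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for frozen sentence embeddings (e.g. pooled transformer states).
X = rng.normal(size=(300, 32))
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)  # toy property

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Probing accuracy = how linearly decodable the property is from the layer.
print(probe.score(X_te, y_te))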

@inproceedings{mosbach-etal-2020-interplay-fine,
title = {On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers},
author = {Marius Mosbach and Anna Khokhlova and Michael Hedderich and Dietrich Klakow},
url = {https://www.aclweb.org/anthology/2020.findings-emnlp.227},
doi = {https://doi.org/10.18653/v1/2020.findings-emnlp.227},
year = {2020},
date = {2020},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
pages = {2502-2516},
publisher = {Association for Computational Linguistics},
abstract = {Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is, however, understood about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across different models, fine-tuning and probing tasks. Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model and these changes are typically larger for higher layers, only in very few cases, fine-tuning has a positive effect on probing accuracy that is larger than just using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing require a careful interpretation.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Crible, Ludivine; Demberg, Vera

When Do We Leave Discourse Relations Underspecified? The Effect of Formality and Relation Type Journal Article

Discours, 2020.

Speakers have several options when they express a discourse relation: they can leave it implicit, or make it explicit, usually through a connective. Although not all connectives can go with every relation, there is one that is particularly frequent and compatible with very many discourse relations, namely and. In this paper, we investigate the effect of discourse relation type and text genre on the production and perception of underspecified relations of contrast and consequence signalled by and. We combine a corpus study of spoken English, a production experiment and a perception experiment in order to test two hypotheses: (1) and is more compatible with relations of consequence than of contrast, due to factors of cognitive complexity and conceptual differences; (2) and is more compatible with informal than formal genres, because of requirements of recipient design. The three studies partially converge in identifying a stable effect of relation type and genre on the production and perception of underspecified relations of consequence and contrast marked by and.

@article{Crible2020,
title = {When Do We Leave Discourse Relations Underspecified? The Effect of Formality and Relation Type},
author = {Ludivine Crible and Vera Demberg},
url = {https://journals.openedition.org/discours/10848},
doi = {https://doi.org/10.4000/discours.10848},
year = {2020},
date = {2020},
journal = {Discours},
number = {26},
abstract = {Speakers have several options when they express a discourse relation: they can leave it implicit, or make it explicit, usually through a connective. Although not all connectives can go with every relation, there is one that is particularly frequent and compatible with very many discourse relations, namely and. In this paper, we investigate the effect of discourse relation type and text genre on the production and perception of underspecified relations of contrast and consequence signalled by and. We combine a corpus study of spoken English, a production experiment and a perception experiment in order to test two hypotheses: (1) and is more compatible with relations of consequence than of contrast, due to factors of cognitive complexity and conceptual differences; (2) and is more compatible with informal than formal genres, because of requirements of recipient design. The three studies partially converge in identifying a stable effect of relation type and genre on the production and perception of underspecified relations of consequence and contrast marked by and.},
pubstate = {published},
type = {article}
}

Project:   B2

Avgustinova, Tania

Surprisal in Intercomprehension Book Chapter

Slavcheva, Milena; Simov, Kiril; Osenova, Petya; Boytcheva, Svetla (Ed.): Knowledge, Language, Models, INCOMA Ltd., pp. 6-19, Shoumen, Bulgaria, 2020, ISBN 978-954-452-062-5.

A large-scale interdisciplinary research collaboration at Saarland University (Crocker et al. 2016) investigates the hypothesis that language use may be driven by the optimal utilization of the communication channel. The information-theoretic concepts of entropy (Shannon, 1949) and surprisal (Hale 2001; Levy 2008) have gained in popularity due to their potential to predict human linguistic behavior. The underlying assumption is that there is a certain total amount of information contained in a message, which is distributed over the individual units constituting it. Capturing this distribution of information is the goal of surprisal-based modeling with the intention of predicting the processing effort experienced by humans upon encountering these units. The ease of processing linguistic material is thus correlated with its contextually determined predictability, which may be appropriately indexed by Shannon’s notion of information. The pervasiveness of multilingualism suggests that human language competence is used quite robustly, taking on various types of information and employing multi-source compensatory and guessing strategies. While it is not realistic to require every single person to master several languages, it is certainly beneficial to strive for and promote a significantly higher degree of receptive skills facilitating the access to other languages. Taking advantage of linguistic similarity – genetic, typological or areal – is the key to acquiring such abilities as efficiently as possible. Awareness that linguistic structures known from a specific language apply to other varieties in which similar phenomena are detectable is indeed essential.

@inbook{TAfestGA,
title = {Surprisal in Intercomprehension},
author = {Tania Avgustinova},
editor = {Milena Slavcheva and Kiril Simov and Petya Osenova and Svetla Boytcheva},
url = {https://www.coli.uni-saarland.de/~tania/ta-pub/Avgustinova2020.Festschrift.pdf},
year = {2020},
date = {2020},
booktitle = {Knowledge, Language, Models},
isbn = {978-954-452-062-5},
pages = {6-19},
publisher = {INCOMA Ltd.},
address = {Shoumen, Bulgaria},
abstract = {A large-scale interdisciplinary research collaboration at Saarland University (Crocker et al. 2016) investigates the hypothesis that language use may be driven by the optimal utilization of the communication channel. The information-theoretic concepts of entropy (Shannon, 1949) and surprisal (Hale 2001; Levy 2008) have gained in popularity due to their potential to predict human linguistic behavior. The underlying assumption is that there is a certain total amount of information contained in a message, which is distributed over the individual units constituting it. Capturing this distribution of information is the goal of surprisal-based modeling with the intention of predicting the processing effort experienced by humans upon encountering these units. The ease of processing linguistic material is thus correlated with its contextually determined predictability, which may be appropriately indexed by Shannon’s notion of information. Multilingualism pervasiveness suggests that human language competence is used quite robustly, taking on various types of information and employing multi-source compensatory and guessing strategies. While it is not realistic to require from every single person to master several languages, it is certainly beneficial to strive and promote a significantly higher degree of receptive skills facilitating the access to other languages. Taking advantage of linguistic similarity – genetic, typological or areal – is the key to acquiring such abilities as efficiently as possible. Awareness that linguistic structures known of a specific language apply to other varieties in which similar phenomena are detectable is indeed essential},
pubstate = {published},
type = {inbook}
}

Project:   C4

Tourtouri, Elli

Rational redundancy in situated communication PhD Thesis

Saarland University, Saarbrücken, 2020.

Contrary to the Gricean maxims of Quantity (Grice, 1975), it has been repeatedly shown that speakers often include redundant information in their utterances (over-specifications). Previous research on referential communication has long debated whether this redundancy is the result of speaker-internal or addressee-oriented processes, while it is also unclear whether referential redundancy hinders or facilitates comprehension. We present a bounded-rational account of referential redundancy, according to which any word in an utterance, even if it is redundant, can be beneficial to comprehension, to the extent that it facilitates the reduction of listeners’ uncertainty regarding the target referent in a co-present visual scene. Information-theoretic metrics, such as Shannon’s entropy (Shannon, 1948), were employed in order to quantify this uncertainty in bits of information, and gain an estimate of the cognitive effort related to referential processing. Under this account, speakers may, therefore, utilise redundant adjectives in order to reduce the visually-determined entropy (and thereby their listeners’ cognitive effort) more uniformly across their utterances. In a series of experiments, we examined both the comprehension and the production of over-specifications in complex visual contexts. Our findings are in line with the bounded-rational account. Specifically, we present evidence that: (a) in view of complex visual scenes, listeners’ processing and identification of the target referent may be facilitated by the use of redundant adjectives, as well as by a more uniform reduction of uncertainty across the utterance, and (b) that, while both speaker-internal and addressee-oriented processes are at play in the production of over-specifications, listeners’ processing concerns may also influence the encoding of redundant adjectives, at least for some speakers, who encode redundant adjectives more frequently when these adjectives contribute to a more uniform reduction of referential entropy.
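
The central quantity, the uncertainty about the target referent given a visual scene, can be made concrete with a toy example (invented scene, not the thesis materials): with four equally likely candidate objects, each informative word removes part of the 2 bits of referential entropy.

import math

def referential_entropy(candidates):
    """Entropy in bits of a uniform distribution over candidate referents."""
    return math.log2(len(candidates))

scene = [("ball", "red"), ("cube", "red"), ("cube", "blue"), ("cube", "green")]
print(referential_entropy(scene))       # 2.0 bits before any description
after_adj = [o for o in scene if o[1] == "red"]
print(referential_entropy(after_adj))   # 1.0 bit: "red" halves the set
after_noun = [o for o in after_adj if o[0] == "ball"]
print(referential_entropy(after_noun))  # 0.0 bits: referent identified
# The adjective is redundant ("the ball" is already unique), but it spreads
# the 2-bit reduction over two words instead of loading it all on the noun.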

@phdthesis{Tourtouri2020,
title = {Rational redundancy in situated communication},
author = {Elli Tourtouri},
url = {https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/29453},
doi = {https://doi.org/10.22028/D291-31436},
year = {2020},
date = {2020},
school = {Saarland University},
address = {Saarbr{\"u}cken},
abstract = {Contrary to the Gricean maxims of Quantity (Grice, 1975), it has been repeatedly shown that speakers often include redundant information in their utterances (over-specifications). Previous research on referential communication has long debated whether this redundancy is the result of speaker-internal or addressee-oriented processes, while it is also unclear whether referential redundancy hinders or facilitates comprehension. We present a bounded-rational account of referential redundancy, according to which any word in an utterance, even if it is redundant, can be beneficial to comprehension, to the extent that it facilitates the reduction of listeners’ uncertainty regarding the target referent in a co-present visual scene. Information-theoretic metrics, such as Shannon’s entropy (Shannon, 1948), were employed in order to quantify this uncertainty in bits of information, and gain an estimate of the cognitive effort related to referential processing. Under this account, speakers may, therefore, utilise redundant adjectives in order to reduce the visually-determined entropy (and thereby their listeners’ cognitive effort) more uniformly across their utterances. In a series of experiments, we examined both the comprehension and the production of over-specifications in complex visual contexts. Our findings are in line with the bounded-rational account. Specifically, we present evidence that: (a) in view of complex visual scenes, listeners’ processing and identification of the target referent may be facilitated by the use of redundant adjectives, as well as by a more uniform reduction of uncertainty across the utterance, and (b) that, while both speaker-internal and addressee-oriented processes are at play in the production of over-specifications, listeners’ processing concerns may also influence the encoding of redundant adjectives, at least for some speakers, who encode redundant adjectives more frequently when these adjectives contribute to a more uniform reduction of referential entropy.},
pubstate = {published},
type = {phdthesis}
}

Project:   C3

Batliner, Anton; Möbius, Bernd

Prosody in automatic speech processing Book Chapter

Gussenhoven, Carlos; Chen, Aoju (Ed.): The Oxford Handbook of Language Prosody, Chap. 46, Oxford University Press, pp. 633-645, 2020, ISBN 9780198832232.

Automatic speech processing (ASP) is understood as covering word recognition, the processing of higher linguistic components (syntax, semantics, and pragmatics), and the processing of computational paralinguistics (CP), which deals with speaker states and traits. This chapter attempts to track the role of prosody in ASP from the word level up to CP. A short history of the field from 1980 to 2020 distinguishes the early years (until 2000)—when the prosodic contribution to the modelling of linguistic phenomena, such as accents, boundaries, syntax, semantics, and dialogue acts, was the focus—from the later years, when the focus shifted to paralinguistics; prosody ceased to be visible. Different types of predictor variables are addressed, among them high-performance power features as well as leverage features, which can also be employed in teaching and therapy.

@inbook{Batliner/Moebius:2020,
title = {Prosody in automatic speech processing},
author = {Anton Batliner and Bernd M{\"o}bius},
editor = {Carlos Gussenhoven and Aoju Chen},
url = {https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780198832232.001.0001/oxfordhb-9780198832232-e-42},
doi = {https://doi.org/10.1093/oxfordhb/9780198832232.013.42},
year = {2020},
date = {2020},
booktitle = {The Oxford Handbook of Language Prosody, Chap. 46},
isbn = {9780198832232},
pages = {633-645},
publisher = {Oxford University Press},
abstract = {Automatic speech processing (ASP) is understood as covering word recognition, the processing of higher linguistic components (syntax, semantics, and pragmatics), and the processing of computational paralinguistics (CP), which deals with speaker states and traits. This chapter attempts to track the role of prosody in ASP from the word level up to CP. A short history of the field from 1980 to 2020 distinguishes the early years (until 2000)— when the prosodic contribution to the modelling of linguistic phenomena, such as accents, boundaries, syntax, semantics, and dialogue acts, was the focus—from the later years, when the focus shifted to paralinguistics; prosody ceased to be visible. Different types of predictor variables are addressed, among them high-performance power features as well as leverage features, which can also be employed in teaching and therapy.},
pubstate = {published},
type = {inbook}
}

Project:   C1

Karpiński, Maciej; Andreeva, Bistra; Asu, Eva Liina; Beňuš, Štefan; Daugavet, Anna; Mády, Katalin

Central and Eastern Europe Book Chapter

Gussenhoven, Carlos; Chen, Aoju (Ed.): The Oxford Handbook of Language Prosody, Chap. 15, Oxford University Press, pp. 225-235, 2020, ISBN 9780198832232.

The languages of Central and Eastern Europe addressed in this chapter form a typologically divergent collection that includes Slavic (Belarusian, Bulgarian, Czech, Macedonian, Polish, Russian, pluricentric Bosnian-Croatian-Montenegrin-Serbian, Slovak, Slovenian, Ukrainian), Baltic (Latvian, Lithuanian), Finno-Ugric (Hungarian, Finnish, Estonian), and Romance (Romanian). Their prosodic features and structures have been explored to various depths, from different theoretical perspectives, sometimes on the basis of relatively sparse material. Still, enough is known to see that their typological divergence as well as other factors contribute to vivid differences in their prosodic systems. While belonging to intonational languages, they differ in pitch patterns and their usage, duration, and rhythm (some involve phonological duration), as well as prominence mechanisms, accentuation, and word stress (fixed or mobile). Several languages in the area have what is referred to by different traditions as pitch accents, tones or syllable accents, or intonations.

@inbook{Karpinski/etal:2020,
title = {Central and Eastern Europe},
author = {Maciej Karpiński and Bistra Andreeva and Eva Liina Asu and Štefan Beňuš and Anna Daugavet and Katalin M{\'a}dy},
editor = {Carlos Gussenhoven and Aoju Chen},
url = {https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780198832232.001.0001/oxfordhb-9780198832232-e-14},
year = {2020},
date = {2020},
booktitle = {The Oxford Handbook of Language Prosody, Chap. 15},
isbn = {9780198832232},
pages = {225-235},
publisher = {Oxford University Press},
abstract = {The languages of Central and Eastern Europe addressed in this chapter form a typologically divergent collection that includes Slavic (Belarusian, Bulgarian, Czech, Macedonian, Polish, Russian, pluricentric Bosnian-Croatian-Montenegrin-Serbian, Slovak, Slovenian, Ukrainian), Baltic (Latvian, Lithuanian), Finno-Ugric (Hungarian, Finnish, Estonian), and Romance (Romanian). Their prosodic features and structures have been explored to various depths, from different theoretical perspectives, sometimes on the basis of relatively sparse material. Still, enough is known to see that their typological divergence as well as other factors contribute to vivid differences in their prosodic systems. While belonging to intonational languages, they differ in pitch patterns and their usage, duration, and rhythm (some involve phonological duration), as well as prominence mechanisms, accentuation, and word stress (fixed or mobile). Several languages in the area have what is referred to by different traditions as pitch accents, tones or syllable accents, or intonations.},
pubstate = {published},
type = {inbook}
}

Project:   C1

Abdullah, Badr M.; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages Inproceedings

Proceedings of Interspeech 2020, pp. 477-481, 2020.

State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
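
The two low-level descriptors compared here are standard speech features; a minimal sketch of extracting both with librosa (default-ish settings assumed for illustration, not the authors' exact configuration):

import numpy as np
import librosa

# Toy 1-second waveform; in practice: y, sr = librosa.load("clip.wav", sr=16000)
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# Spectral features: log-mel spectrogram.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)

# Cepstral features: MFCCs.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(log_mel.shape, mfcc.shape)  # (n_mels, frames) and (n_mfcc, frames)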

@inproceedings{abdullah_etal_is2020,
title = {Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages},
author = {Badr M. Abdullah and Tania Avgustinova and Bernd M{\"o}bius and Dietrich Klakow},
url = {https://arxiv.org/abs/2008.00545},
doi = {https://doi.org/10.21437/Interspeech.2020-2930},
year = {2020},
date = {2020},
booktitle = {Proceedings of Interspeech 2020},
pages = {477-481},
abstract = {State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C4
