Publications - SFB 1102

Singh, Mittul; Oualil, Youssef; Klakow, Dietrich

Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition Inproceedings

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, 2017.

Abstract
|
Links
|
BibTeX

Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the ﬁrst pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied for ﬁrst-pass decoding. Thus, we introduce two ways of applying adapted long-short-term-memory (LSTM) based RNNLMs for ﬁrst-pass decoding. Using available techniques to convert LSTMs to approximated versions for ﬁrst-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation framework (FMA) and an approximated version of architecture-based-adaptation of LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs combined with a trigram LM outperform other adapted and unadapted LMs. Here, the architecture-adapted LSTM combination obtains a 35.9 % word error rate (WER) and is outperformed by FMA-based LSTM combination obtaining the overall lowest WER of 34.4 %

https://www.researchgate.net/publication/319185101_Approximated_and_Domain-Adapted_LSTM_Language_Models_for_First-Pass_Decoding_in_Speech_Recognition

@inproceedings{Singh2017,
title = {Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition},
author = {Mittul Singh and Youssef Oualil and Dietrich Klakow},
url = {https://www.researchgate.net/publication/319185101_Approximated_and_Domain-Adapted_LSTM_Language_Models_for_First-Pass_Decoding_in_Speech_Recognition},
year = {2017},
date = {2017},
publisher = {18th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
address = {Stockholm, Sweden},
abstract = {Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the ﬁrst pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied for ﬁrst-pass decoding. Thus, we introduce two ways of applying adapted long-short-term-memory (LSTM) based RNNLMs for ﬁrst-pass decoding. Using available techniques to convert LSTMs to approximated versions for ﬁrst-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation framework (FMA) and an approximated version of architecture-based-adaptation of LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs combined with a trigram LM outperform other adapted and unadapted LMs. Here, the architecture-adapted LSTM combination obtains a 35.9 % word error rate (WER) and is outperformed by FMA-based LSTM combination obtaining the overall lowest WER of 34.4 %},
pubstate = {published},
type = {inproceedings}
}