Singh, Mittul; Oualil, Youssef; Klakow, Dietrich

Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, 2017.

Traditionally, short-range Language Models (LMs) such as conventional n-gram models have been used for language model adaptation. Recent work has improved performance on such tasks using adapted long-span models such as Recurrent Neural Network LMs (RNNLMs). Because the first pass is performed with a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied in first-pass decoding. We therefore introduce two ways of applying adapted long short-term memory (LSTM) based RNNLMs in first-pass decoding. Using available techniques to convert LSTMs into approximated versions suitable for first-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation (FMA) framework with an approximated version of architecture-based adaptation of LSTMs. On a conversational speech recognition task, these differently approximated and adapted LSTMs, combined with a trigram LM, outperform other adapted and unadapted LMs. The architecture-adapted LSTM combination obtains a 35.9 % word error rate (WER) and is outperformed by the FMA-based LSTM combination, which obtains the lowest overall WER of 34.4 %.
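For readers unfamiliar with Fast Marginal Adaptation, the sketch below shows the generic FMA idea: a background conditional distribution P_bg(w|h) is rescaled word by word with the factor (P_adapt(w) / P_bg(w))^beta, where the unigram P_adapt(w) is estimated on in-domain data, and the result is renormalized over the vocabulary. This is a minimal illustration of the general technique under stated assumptions, not the authors' implementation; the function name `fma_adapt`, its dictionary-based inputs and the `beta` value are hypothetical, and how the paper applies FMA to approximated LSTMs is not described here.

```python
def fma_adapt(p_bg_cond, p_bg_unigram, p_adapt_unigram, beta=0.5):
    """Generic Fast Marginal Adaptation of P_bg(w|h) for one fixed history h.

    Hypothetical helper for illustration only.
    p_bg_cond:       dict word -> P_bg(w|h) from the background model
    p_bg_unigram:    dict word -> P_bg(w), background unigram marginal
    p_adapt_unigram: dict word -> P_adapt(w), in-domain unigram marginal
    beta:            tuning exponent (assumed value; beta=0 disables adaptation)
    """
    # Scale each word by alpha(w) = (P_adapt(w) / P_bg(w)) ** beta.
    scaled = {
        w: p * (p_adapt_unigram[w] / p_bg_unigram[w]) ** beta
        for w, p in p_bg_cond.items()
    }
    # Renormalize over the vocabulary: Z(h) = sum_w alpha(w) * P_bg(w|h).
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}


if __name__ == "__main__":
    # Toy two-word vocabulary; the in-domain data favours "a".
    adapted = fma_adapt(
        {"a": 0.5, "b": 0.5},
        {"a": 0.5, "b": 0.5},
        {"a": 0.8, "b": 0.2},
        beta=1.0,
    )
    print(adapted)
```

The appeal of this scheme is that only unigram statistics are needed from the in-domain data, so the expensive background model does not have to be retrained; the exponent `beta` controls how strongly the in-domain marginals pull the distribution away from the background model.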
