Unsupervised Language Acquisition from Raw Speech

Abstract

Reinhold Häb-Umbach
University of Paderborn, Nachrichtentechnik

We consider the problem of segmenting an input sequence of symbols into recurrent patterns. This is achieved by employing nonparametric Bayesian statistical models, in particular the nested Pitman-Yor process. We then consider the case where the input sequence is noisy, i.e., contains errors, and propose an iterative word segmentation algorithm. One application is automatic speech recognition for a language for which neither a pronunciation lexicon nor a language model is available. Results will be presented for an English task and, for the segmentation of noise-free input, for two Austronesian languages, Wooi and Waima’a.

If you would like to meet with the speaker, please contact Dietrich Klakow.