AI, Automatic Speech Recognition and phonetics

Abstract

Over the last couple of years, we have witnessed the enormous impact of AI-inspired end-to-end approaches on Automatic Speech Recognition. In this talk I will discuss the progress in the field by sketching a number of experiments that aim to find phonetic relations in the latent representations of Wav2vec2.0. I will also discuss the use of end-to-end audio decoding for the early detection of neurodegenerative diseases (Alzheimer's, Parkinson's, etc.) from the speech signal. Finally, I will address the interpretability of a foundation model by discussing various techniques that can be used to ‘crack’ the black box. A nice point for discussion is whether, and to what extent, the old ‘classical’ knowledge about speech is represented in the parameters of a trained end-to-end model, and if so, how.