Batliner, Anton; Möbius, Bernd

Prosody in automatic speech processing

Gussenhoven, Carlos; Chen, Aoju (Eds.): The Oxford Handbook of Language Prosody, Chap. 46, Oxford University Press, pp. 633-645, 2020, ISBN 9780198832232.

Automatic speech processing (ASP) is understood as covering word recognition, the processing of higher linguistic components (syntax, semantics, and pragmatics), and the processing of computational paralinguistics (CP), which deals with speaker states and traits. This chapter attempts to track the role of prosody in ASP from the word level up to CP. A short history of the field from 1980 to 2020 distinguishes the early years (until 2000), when the focus was the prosodic contribution to the modelling of linguistic phenomena such as accents, boundaries, syntax, semantics, and dialogue acts, from the later years, when the focus shifted to paralinguistics and prosody ceased to be visible. Different types of predictor variables are addressed, among them high-performance power features as well as leverage features, which can also be employed in teaching and therapy.