Zaitova, Iuliia; Abdullah, Badr M.; Klakow, Dietrich

Mapping Phonology to Semantics: A Computational Model of Cross-Lingual Spoken-Word Recognition

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (October 2022, Gyeongju, Republic of Korea), Association for Computational Linguistics, pp. 54-63, 2022.

Closely related languages are often mutually intelligible to varying degrees. Speakers of closely related languages are therefore usually able to (partially) comprehend each other’s speech without having explicitly learned the other language. Cross-linguistic intelligibility among closely related languages is driven mainly by linguistic factors such as lexical similarity. This paper presents a computational model of spoken-word recognition and investigates its ability to recognize word forms from languages other than its native (training) language. Our model is based on a recurrent neural network that learns to map a word’s phonological sequence onto a semantic representation of the word. Furthermore, we present a case study on related Slavic languages and demonstrate that the cross-lingual performance of our model not only predicts mutual intelligibility to a large extent but also reflects the genetic classification of the languages in our study.
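The core mechanism named in the abstract is a recurrent network that reads a word's phonemes one at a time and outputs a point in semantic space. Recognition then means retrieving the word whose meaning lies closest to that point. The sketch below is a minimal, untrained illustration under stated assumptions, not the authors' implementation. The phoneme inventory, the plain Elman (tanh) recurrence, the layer sizes, the random semantic vectors, and the cosine nearest-neighbour decision rule are all hypothetical choices. The abstract does not specify the recurrent cell, the form of the semantic representation, or the training objective.

```python
import math
import random

# Hypothetical phoneme inventory and layer sizes (not taken from the paper).
PHONEMES = ["a", "b", "d", "e", "i", "k", "l", "m", "n", "o", "r", "s", "t", "u", "v"]
HIDDEN = 16
SEM_DIM = 8


def one_hot(ph):
    """Encode a single phoneme as a one-hot vector over PHONEMES."""
    vec = [0.0] * len(PHONEMES)
    vec[PHONEMES.index(ph)] = 1.0
    return vec


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ElmanRNN:
    """Untrained Elman RNN mapping a phoneme sequence to a semantic vector."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(HIDDEN, len(PHONEMES), rng)
        self.w_rec = rand_matrix(HIDDEN, HIDDEN, rng)
        self.w_out = rand_matrix(SEM_DIM, HIDDEN, rng)

    def forward(self, phonemes):
        h = [0.0] * HIDDEN
        for ph in phonemes:
            # h_t = tanh(W_in x_t + W_rec h_{t-1})
            inp = matvec(self.w_in, one_hot(ph))
            rec = matvec(self.w_rec, h)
            h = [math.tanh(a + b) for a, b in zip(inp, rec)]
        # Linear projection of the final hidden state into semantic space.
        return matvec(self.w_out, h)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recognize(model, phonemes, lexicon):
    """Return the lexicon word whose semantic vector is closest to the model output."""
    pred = model.forward(phonemes)
    return max(lexicon, key=lambda word: cosine(pred, lexicon[word]))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical random "semantic" vectors standing in for real word meanings.
    lexicon = {w: [rng.gauss(0, 1) for _ in range(SEM_DIM)] for w in ["voda", "dom", "mleko"]}
    model = ElmanRNN()
    print(recognize(model, list("voda"), lexicon))
```

A cross-lingual test in the spirit of the abstract would train such a model on word forms of one language and then present word forms from a related language. A word would count as recognized when the retrieved meaning matches its translation equivalent. Training is omitted here, so with random weights the retrieved word is arbitrary.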