Alves, Diego; Gamallo, Pablo; Claro, Daniela; Teixeira, António; Real, Livy; Garcia, Marcos; Gonçalo Oliveira, Hugo; Amaro, Raquel
An evaluation of Portuguese language models' adaptation to African Portuguese varieties
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, Association for Computational Linguistics, pp. 544-550, Santiago de Compostela, Galicia/Spain, 2024.
In this study, we conduct a comparative evaluation of two state-of-the-art language models, Albertina PT-PT and Albertina PT-BR, which are trained on European Portuguese and Brazilian Portuguese, respectively. Our aim is to assess their suitability for African varieties of Portuguese. To evaluate their performance, we create two test sets for each variety, encompassing both spoken and written language. We measure the percentage of sentences in which one model outperforms the other in terms of perplexity. This evaluation seeks to ascertain whether one model shows more adaptability to the African varieties of Portuguese. Our findings reveal that Albertina PT-PT consistently outperforms Albertina PT-BR in scenarios involving spoken language corpora. However, in written registers, the advantage of Albertina PT-PT is less pronounced for the Portuguese varieties of Guinea-Bissau, Mozambique, and São Tomé and Príncipe. These insights contribute to our understanding of the adaptability of existing language models to African Portuguese varieties and emphasize the need for specialized models to address the unique linguistic nuances of this region.
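The comparison metric described in the abstract (the percentage of sentences on which one model obtains better perplexity than the other) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `win_rate`, the variable names, the perplexity values, and the choice to count ties as non-wins are all assumptions, and the per-sentence perplexities are assumed to have been computed beforehand for each model.

```python
def win_rate(ppl_a, ppl_b):
    """Percentage of sentences where model A has strictly lower perplexity than model B.

    Both inputs hold one perplexity per sentence, aligned by position.
    Ties are counted as non-wins for A (an assumption, not stated in the abstract).
    """
    if len(ppl_a) != len(ppl_b):
        raise ValueError("both models must be scored on the same sentences")
    if not ppl_a:
        raise ValueError("at least one sentence is required")
    wins = sum(1 for a, b in zip(ppl_a, ppl_b) if a < b)
    return 100.0 * wins / len(ppl_a)


# Made-up perplexities for four sentences; not results from the paper.
ppl_pt_pt = [12.0, 30.5, 8.2, 19.9]
ppl_pt_br = [15.0, 28.0, 9.1, 25.4]

score = win_rate(ppl_pt_pt, ppl_pt_br)
print(f"PT-PT is better on {score:.1f}% of sentences")
```

Computing the same figure separately for each variety and register (spoken, written) would give one percentage per test set, which is the kind of comparison the abstract reports.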