Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax (Inproceedings)
Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.): Findings of the Association for Computational Linguistics: NAACL 2025, Association for Computational Linguistics, pp. 4083-4092, Albuquerque, New Mexico, 2025, ISBN 979-8-89176-195-7.

This study analyzes the attention patterns of fine-tuned encoder-only models based on the BERT architecture (BERT-based models) towards two distinct types of Multiword Expressions (MWEs): idioms and microsyntactic units (MSUs). Idioms present challenges in semantic non-compositionality, whereas MSUs demonstrate unconventional syntactic behavior that does not conform to standard grammatical categorizations. We aim to understand whether fine-tuning BERT-based models on specific tasks influences their attention to MWEs, and how this attention differs between semantic and syntactic tasks. We examine attention scores to MWEs in both pre-trained and fine-tuned BERT-based models. We utilize monolingual models and datasets in six Indo-European languages — English, German, Dutch, Polish, Russian, and Ukrainian. Our results show that fine-tuning significantly influences how models allocate attention to MWEs. Specifically, models fine-tuned on semantic tasks tend to distribute attention to idiomatic expressions more evenly across layers. Models fine-tuned on syntactic tasks show an increase in attention to MSUs in the lower layers, corresponding with syntactic processing requirements.
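The following is a minimal sketch, not the authors' released code, of the kind of measurement the abstract describes: extracting per-layer attention directed to an MWE span from a BERT-based encoder. The model name, example sentence, character span, and head-averaging choice are illustrative assumptions; the paper uses monolingual models in six languages and compares pre-trained with fine-tuned checkpoints.

# Hedged sketch: per-layer attention mass flowing to a multiword expression.
# Assumptions: bert-base-cased stands in for the monolingual models used in
# the paper; the sentence, MWE character span, and head averaging are ours.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

sentence = "He decided to kick the bucket."
mwe_char_span = (14, 29)  # characters covering "kick the bucket" (illustrative)

enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]  # (seq_len, 2) character offsets per token

# Indices of wordpiece tokens whose character offsets fall inside the MWE span
mwe_token_ids = [
    i for i, (s, e) in enumerate(offsets.tolist())
    if s < mwe_char_span[1] and e > mwe_char_span[0] and e > s
]

with torch.no_grad():
    out = model(**enc)

# out.attentions is a tuple with one tensor per layer, each of shape
# (batch, heads, seq_len, seq_len). For each layer, average over heads and
# report the mean attention mass that all tokens send to the MWE tokens.
for layer, att in enumerate(out.attentions, start=1):
    att = att[0].mean(dim=0)                # (seq_len, seq_len), head-averaged
    to_mwe = att[:, mwe_token_ids].sum(-1)  # attention each token sends to the MWE
    print(f"layer {layer:2d}: mean attention to MWE = {to_mwe.mean().item():.4f}")

Running the same loop on a pre-trained checkpoint and on a task-specific fine-tuned one, and comparing the resulting per-layer curves, mirrors the kind of semantic-versus-syntactic comparison the abstract reports.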
@inproceedings{zaitova-etal-2025-attention,
title = {Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax},
author = {Iuliia Zaitova and Vitalii Hirak and Badr M. Abdullah and Dietrich Klakow and Bernd M{\"o}bius and Tania Avgustinova},
editor = {Luis Chiruzzo and Alan Ritter and Lu Wang},
url = {https://aclanthology.org/2025.findings-naacl.228/},
year = {2025},
date = {2025},
booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
isbn = {979-8-89176-195-7},
pages = {4083--4092},
publisher = {Association for Computational Linguistics},
address = {Albuquerque, New Mexico},
abstract = {This study analyzes the attention patterns of fine-tuned encoder-only models based on the BERT architecture (BERT-based models) towards two distinct types of Multiword Expressions (MWEs): idioms and microsyntactic units (MSUs). Idioms present challenges in semantic non-compositionality, whereas MSUs demonstrate unconventional syntactic behavior that does not conform to standard grammatical categorizations. We aim to understand whether fine-tuning BERT-based models on specific tasks influences their attention to MWEs, and how this attention differs between semantic and syntactic tasks. We examine attention scores to MWEs in both pre-trained and fine-tuned BERT-based models. We utilize monolingual models and datasets in six Indo-European languages — English, German, Dutch, Polish, Russian, and Ukrainian. Our results show that fine-tuning significantly influences how models allocate attention to MWEs. Specifically, models fine-tuned on semantic tasks tend to distribute attention to idiomatic expressions more evenly across layers. Models fine-tuned on syntactic tasks show an increase in attention to MSUs in the lower layers, corresponding with syntactic processing requirements.},
pubstate = {published},
type = {inproceedings}
}
Project: C4