Exploiting referential gaze for uncertainty reduction in situated language processing: an information-theoretic approach
Saarland University, Saarbrücken, 2019.
A large body of contemporary psycholinguistic research uses information-theoretic notions of information transmission to better understand and formalize the regularities of language production and comprehension. The overarching hypothesis is that prediction is a core mechanism underlying language comprehension. Anticipating what is likely to be mentioned next, based on the previous context, is assumed to allow for smooth and effortless communication. Anticipating linguistic units that fit the current context reduces the uncertainty about the upcoming material, which in turn facilitates the processing of that material in a typically noisy channel.
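The notion of uncertainty reduction can be illustrated with the standard Shannon definitions of surprisal and entropy. The following is a minimal sketch only, not the thesis's own formalization: the function names, the candidate referents, and all probabilities are hypothetical and chosen purely for illustration.

```python
import math


def surprisal(p):
    """Shannon surprisal, in bits, of an outcome with probability p."""
    return -math.log2(p)


def entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution (dict of probabilities)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical distributions over four candidate referents (illustrative values only).
# Before any constraining context, all four objects are equally likely to be mentioned.
no_context = {"apple": 0.25, "pear": 0.25, "book": 0.25, "chair": 0.25}
# After a constraining context, only two objects remain plausible.
with_context = {"apple": 0.5, "pear": 0.5, "book": 0.0, "chair": 0.0}

print(entropy(no_context))    # 2.0 bits of referential uncertainty
print(entropy(with_context))  # 1.0 bit: the context has halved the uncertainty
print(surprisal(with_context["apple"]))  # 1.0 bit when "apple" is then mentioned
```

In this toy setting, context that concentrates probability on fewer candidates lowers the entropy over upcoming referents, and a more expected referent carries lower surprisal when it is mentioned.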
Situated language processing allows for the integration of not only linguistic but also non-linguistic visual information, which contributes to establishing the context and facilitates the creation of anticipations regarding the upcoming linguistic material. Moreover, noticing that our interlocutor is directing her attention to a certain object prompts a shift in our own visual attention towards the same entity, since what is relevant for our interlocutor is highly likely to be relevant for us, too, whether simply conversationally or, more importantly, even existentially (Emery, 2000). Hence, following the speaker’s referential gaze cue towards an object relevant for the current conversation has been shown to benefit listeners’ language processing, as measured by shorter reaction times on subsequent tasks (e.g., Staudte & Crocker, 2011; Staudte, Crocker, Heloir, & Kipp, 2014; Knoeferle & Kreysa, 2012; Macdonald & Tatler, 2013, 2014).
The present thesis aimed to provide insight into the mechanisms behind this facilitation. We examined the dynamics of combining visual and linguistic information in creating anticipation for a specific object to be mentioned, and the effect this has on language processing. To this end, we used a pupillary measure of cognitive load that is robust enough to allow for free eye movements (the Index of Cognitive Activity, ICA; Marshall, 2000). This enabled us to measure not only visual attention during language comprehension, but also the cognitive load induced immediately at various relevant points during the auditory presentation of the linguistic stimulus.
Eight experiments were conducted to address our research questions. The initial three experiments established the ICA measurement in the context of our linguistic manipulation. This series covered reading, cognitive load during listening, and the examination of visual attention together with cognitive load in the visual world paradigm (VWP). Subsequently, we conducted five eye-tracking experiments in the VWP in which the linguistic context was further enriched by a referential gaze cue. All five experiments simultaneously assessed both visual attention and the immediate cognitive load induced at different stages of sentence processing. We manipulated the presence of the referential gaze cue (Exp. 4), the probability of mention of the cued object (Exp. 4, 5), the congruency of the gaze cue and the subsequent referring expression (Exp. 6), as well as the number of cued objects with equal probability of mention (Exp. 7, 8). Finally, we examined whether the gaze cue can take the role of fully disambiguating the target referent (Exp. 8).
We quantified the importance of the visual context in language processing, and showed that if a certain object from the visual context has a higher likelihood of mention given the linguistic context, its processing is facilitated in comparison to the processing of the same sentence without the visual context. Furthermore, our results support previous findings that the referential gaze cue leads to a shift in visual attention towards the cued object, thereby facilitating language processing. We extended these findings by showing that what gaze-following facilitates is the processing of the linguistic reference, that is, the referent noun. Importantly, perceiving and following the gaze cue did not prove costly in terms of cognitive effort, unless the cued object did not fit the selectional preferences of the verb. This held regardless of the number of objects cued and regardless of a lower likelihood of mention of the cued object.
We conclude that listeners strategically use visual information to reduce the referential uncertainty for upcoming nouns, but that visual cues, such as the referential gaze cue, do not underlie the same kinds of expectations (and resulting cognitive costs) as linguistic references. We did not find evidence that the gaze cue is processed in a manner comparable to noun processing; rather, it is likely perceived as a relevant piece of information introduced in addition to the linguistic material in order to aid language processing but, importantly, not to substitute for it.