Zhai, Fangzhou

Towards wider coverage script knowledge for NLP

Saarländische Universitäts- und Landesbibliothek, Saarland University, Saarbruecken, Germany, 2023.

This thesis focuses on acquiring wide-coverage script knowledge. Script knowledge is a category of common sense knowledge that describes the procedural aspects of daily activities, such as taking a train or going grocery shopping. It is believed to reside in human memory and is generally assumed to be shared by all parties to a conversation. Speakers often omit details they assume their listeners know, and listeners in turn interpret these concise utterances through that shared understanding, of which common sense knowledge forms the basis. Common sense knowledge is therefore indispensable for both the production and the comprehension of conversation. As outlined in Chapters 2 and 3, various Natural Language Processing (NLP) tasks show substantial performance improvements when given access to script knowledge, which suggests that these applications do not yet have full command of such knowledge on their own. However, acquiring high-quality script knowledge is costly, so existing resources cover only a few scenarios, and this insufficient coverage limits their practical utility. This thesis is dedicated to developing cost-effective methods for acquiring script knowledge, in order to augment NLP applications and expand the coverage of explicit script knowledge. Previous resources were created through intricate manual annotation pipelines. In this work, we introduce automated methods to streamline the annotation process. Specifically, we propose a zero-shot script parser in Chapter 5. By leveraging representation learning, we extract script knowledge from the annotations in existing resources and use it to automatically annotate texts from unknown scenarios. When applied to parallel descriptions of unknown scenarios, the acquired script knowledge proves adequate to support NLP applications such as story generation (Chapter 6).
In Chapter 7, we explore the potential of pretrained language models as a source of script knowledge.