Open Knowledge Representation for Texts
Ido Dagan
Bar Ilan University, Department of Computer Science
How can we capture the information expressed in large amounts of text? And how can we allow people, as well as computer applications, to easily explore it? When comparing textual knowledge to formal knowledge representation (KR) paradigms, two prominent differences arise. First, typical KR paradigms rely on pre-specified vocabularies, which are limited in scope, while natural language is inherently open. Second, in a formal knowledge base each fact is encoded once, in a single canonical manner, while across multiple texts the same fact may be repeated, accompanied by redundant, complementary and even contradictory information.
In this talk, I will outline a new research direction that we term Open Knowledge Representation (OKR), which aims to represent textual information in a consolidated manner, based on the available natural language vocabulary and structure. I will describe our first specification for the OKR structure, motivated by a use case of representing multiple tweets that describe an event, for which we have created a medium-scale annotated dataset. Our structure merges co-referring individual proposition extractions, created in the style of Open Information Extraction (Open IE), into a representation of consolidated entities and propositions, inspired by formal knowledge graphs. The different language expressions that denote entities, arguments and propositions are further organized into entailment graphs, which allow tracing information redundancy and containment. I will also present some analysis of our dataset and baseline results, illustrate the potential application of OKR to text exploration, and point at possible directions in which the OKR paradigm might evolve.