Structuring Wikipedia for “a machine that can explain its decision in language”

Abstract

Satoshi Sekine
RIKEN Center for Advanced Intelligence Project, Japan

The final goal of our project is to build “a machine that can explain its decision in language.” One of the resources needed to achieve this goal is world knowledge in a form that machines can easily handle. Wikipedia is a great resource of world knowledge, in particular for named entities. However, it is written to be read by humans, and for machines to access it, we need to make it well structured. DBpedia, Freebase, and YAGO contain a lot of noise because they are basically categorized and structured by crowds; in contrast, we believe the structure, e.g. categories and attributes, has to be designed in a top-down manner. We adopted the “Extended Named Entity” definition created at NYU, and we are trying to map most Wikipedia entries into that structure. This task is known as Knowledge Base Population (KBP), and the technologies have been improving through shared tasks. However, the fruits of these advances have not been used for resource construction. We conducted the “SHINRA” project under the “Resource by Collaborative Contribution (RbCC)” scheme. We run a shared task on structuring Japanese Wikipedia for 5 categories, which also aims to create a resource based on the output of the participating systems. The SHINRA-2018 project started in December 2017 and concluded its first trial in September 2018. The tasks in SHINRA-2019 will include a categorization task for 9 languages, as well as a structuring task for 39 categories of Japanese Wikipedia.