Learning Shallow Semantics with Little or No Supervision

Abstract

Ivan Titov

University of Amsterdam (UvA), Institute for Logic, Language and Computation (ILLC)

Inducing meaning representations from text is one of the key objectives of natural language processing. Most existing statistical semantic analyzers rely on large human-annotated datasets, which are expensive to create and exist only for a very limited number of languages. Even then, these analyzers are not very robust, cover only a small proportion of the semantic constructions appearing in the labeled data, and are domain-dependent. We investigate approaches that do not use any labeled data but instead induce shallow semantic representations (i.e., semantic roles and frames) from unannotated texts. Unlike semantically annotated data, unannotated texts are plentiful and available for many languages and many domains, which makes our approach particularly promising. I will contrast the generative framework (including our non-parametric Bayesian model) with a new approach to semantics called reconstruction-error minimization (REM). Unlike the more traditional generative framework, REM lets us effectively train expressive, feature-rich models in an unsupervised way. Moreover, it allows us to specialize our representations to be useful for (basic forms of) semantic inference. We show that REM achieves state-of-the-art results on the unsupervised semantic role labeling task (across languages, without any language-specific tuning) and significantly outperforms its generative counterparts on the unsupervised relation discovery task.

Joint work with Ehsan Khoddam, Alex Klementiev and Diego Marcheggiani.

If you would like to meet with the speaker, please contact Manfred Pinkal.