Neural Feature and Representation Learning for Information Density Based Translationese Classification

Project B6

The project B6 continuation application is focused on addressing limitations of the information density methodological framework based on hand-crafted features and implemented and evaluated during the first phase of the project by making use of neural network approaches to capture and explore information density-based textual features, namely surprisal, with applications to translationese identification, machine translation evaluation and improvement.

A systematic comparison will be conducted between neural and standard count-based textual features with the objective of exploring the information encoded in continuous representations obtained by unsupervised and end-to-end learning methods. The combination of hand-crafted and neural information density features will provide an extension to the classic surprisal measure. In addition, neural approaches facilitate multi-granularity input representations and various context sizes for surprisal measure calculation. An important part of project B6 in the context of the CRC is the analysis and visualisation of representations learned by neural networks, in order to compare with features inspired by our linguistic intuitions.

Keywords: machine translation