Publications

Singh, Mittul; Greenberg, Clayton; Oualil, Youssef; Klakow, Dietrich

Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling Inproceedings

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, Osaka, Japan, 2016.

Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings.

Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.

@inproceedings{singh-EtAl:2016:COLING1,
title = {Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling},
author = {Mittul Singh and Clayton Greenberg and Youssef Oualil and Dietrich Klakow},
url = {http://aclweb.org/anthology/C16-1194},
year = {2016},
date = {2016-12-01},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
publisher = {The COLING 2016 Organizing Committee},
address = {Osaka, Japan},
abstract = {Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4

Schwenger, Maximilian; Torralba, Álvaro; Hoffmann, Jörg; Howcroft, David M.; Demberg, Vera

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation Inproceedings

Calzolari, Nicoletta; Matsumoto, Yuji; Prasad, Rashmi (Eds.): COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, ACL, pp. 1524-1534, Osaka, 2016, ISBN 978-4-87974-702-0.

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.

@inproceedings{DBLP:conf/coling/SchwengerTHHD16,
title = {From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation},
author = {Maximilian Schwenger and {\'A}lvaro Torralba and J{\"o}rg Hoffmann and David M. Howcroft and Vera Demberg},
editor = {Nicoletta Calzolari and Yuji Matsumoto and Rashmi Prasad},
url = {https://davehowcroft.com/publication/2016-12_coling_detecting-infeasible-edges/},
year = {2016},
date = {2016-12-01},
booktitle = {COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers},
isbn = {978-4-87974-702-0},
pages = {1524-1534},
publisher = {ACL},
address = {Osaka},
abstract = {The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) which cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning allowing the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.},
pubstate = {published},
type = {inproceedings}
}

Project:   A4

Stenger, Irina

How reading intercomprehension works among Slavic languages with Cyrillic script Inproceedings

Köllner, Marisa; Ziai, Ramon (Eds.): ESSLLI 2016, pp. 30-42, 2016.

@inproceedings{Stenger2016,
title = {How reading intercomprehension works among Slavic languages with Cyrillic script},
author = {Irina Stenger},
editor = {Marisa K{\"o}llner and Ramon Ziai},
url = {https://esslli2016.unibz.it/wp-content/uploads/2016/09/esslli-stus-2016-proceedings.pdf},
year = {2016},
date = {2016},
pages = {30-42},
publisher = {ESSLLI 2016},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Calvillo, Jesús; Brouwer, Harm; Crocker, Matthew W.

Connectionist semantic systematicity in language production Inproceedings

38th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2016.

A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.

@inproceedings{Calvillo2016,
title = {Connectionist semantic systematicity in language production},
author = {Jesús Calvillo and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/306400823_Connectionist_Semantic_Systematicity_in_Language_Production},
year = {2016},
date = {2016},
publisher = {38th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {A novel connectionist model of sentence production is presented, which employs rich situation model representations originally proposed for modeling systematicity in comprehension (Frank, Haselager, & van Rooij, 2009). The high overall performance of our model demonstrates that such representations are not only suitable for comprehension, but also for modeling language production. Further, the model is able to produce novel encodings (active vs. passive) for a particular semantics, as well as generate such encodings for previously unseen situations, thus demonstrating both syntactic and semantic systematicity. Our results provide yet further evidence that such connectionist approaches can achieve systematicity, in production as well as comprehension.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3

Malisz, Zofia; O'Dell, Michael; Nieminen, Tommi; Wagner, Petra

Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish Journal Article

Phonetica, 73, pp. 229-255, 2016.

This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O’Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.

@article{Malisz/etal:2016,
title = {Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish},
author = {Zofia Malisz and Michael O'Dell and Tommi Nieminen and Petra Wagner},
url = {https://www.degruyter.com/document/doi/10.1159/000450829/html},
doi = {10.1159/000450829},
year = {2016},
date = {2016},
journal = {Phonetica},
pages = {229-255},
volume = {73},
abstract = {This study was aimed at analyzing empirical duration data for Polish spoken at different tempos using an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O'Dell and Nieminen, 1999, 2009). We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared with parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.},
pubstate = {published},
type = {article}
}

Project:   C1

Schulz, Erika; Oh, Yoon Mi; Andreeva, Bistra; Möbius, Bernd

Impact of Prosodic Structure and Information Density on Vowel Space Size Inproceedings

Proceedings of Speech Prosody, pp. 350-354, Boston, MA, USA, 2016.

We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone Xn|Xn−1. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.

@inproceedings{Schulz/etal:2016a,
title = {Impact of Prosodic Structure and Information Density on Vowel Space Size},
author = {Erika Schulz and Yoon Mi Oh and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.researchgate.net/publication/303755409_Impact_of_prosodic_structure_and_information_density_on_vowel_space_size},
year = {2016},
date = {2016},
booktitle = {Proceedings of Speech Prosody},
pages = {350-354},
address = {Boston, MA, USA},
abstract = {We investigated the influence of prosodic structure and information density on vowel space size. Vowels were measured in five languages from the BonnTempo corpus, French, German, Finnish, Czech, and Polish, each with three female and three male speakers. Speakers read the text at normal, slow, and fast speech rate. The Euclidean distance between vowel space midpoint and formant values for each speaker was used as a measure for vowel distinctiveness. The prosodic model consisted of prominence and boundary. Information density was calculated for each language using the surprisal of the biphone Xn|Xn−1. On average, there is a positive relationship between vowel space expansion and information density. Detailed analysis revealed that this relationship did not hold for Finnish, and was only weak for Polish. When vowel distinctiveness was modeled as a function of prosodic factors and information density in linear mixed effects models (LMM), only prosodic factors were significant in explaining the variance in vowel space expansion. All prosodic factors, except word boundary, showed significant positive results in LMM. Vowels were more distinct in stressed syllables, before a prosodic boundary and at normal and slow speech rate compared to fast speech.},
pubstate = {published},
type = {inproceedings}
}

Project:   C1

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar

Toward the use of information density based descriptive features in HMM based speech synthesis Inproceedings

8th International Conference on Speech Prosody, pp. 1029-1033, Boston, MA, USA, 2016.

Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.

@inproceedings{LeMaguer2016SP,
title = {Toward the use of information density based descriptive features in HMM based speech synthesis},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner},
url = {https://www.researchgate.net/publication/305684951_Toward_the_use_of_information_density_based_descriptive_features_in_HMM_based_speech_synthesis},
year = {2016},
date = {2016},
booktitle = {8th International Conference on Speech Prosody},
pages = {1029-1033},
address = {Boston, MA, USA},
abstract = {Over the last decades, acoustic modeling for speech synthesis has been improved significantly. However, in most systems, the descriptive feature set used to represent annotated text has been the same for many years. Specifically, the prosody models in most systems are based on low level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set by adding a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. By adding such descriptive features, we assume that we will improve prosody modeling. This new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Le Maguer, Sébastien; Möbius, Bernd; Steiner, Ingmar; Lolive, Damien

De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM Inproceedings

Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP, AFCP - ATALA, pp. 714-722, Paris, France, 2016.

Over the last decades, the acoustic modeling performed by parametric speech synthesis systems has received particular attention. However, in most known systems, the set of linguistic descriptors used to represent the text has remained the same. More specifically, prosody modeling is still guided by low-level descriptors such as syllable stress information or the part-of-speech tag of the word. In this article, we propose to integrate information based on the predictability of an event (the syllable or the word). Several studies indicate a strong correlation between this measure, widely used in computational linguistics, and certain characteristics of human speech production. Our hypothesis is therefore that adding these descriptors improves prosody modeling. This article focuses on an objective analysis of the contribution of these descriptors to HMM synthesis for English and French.

@inproceedings{Lemaguer/etal:2016b,
title = {De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM},
author = {S{\'e}bastien Le Maguer and Bernd M{\"o}bius and Ingmar Steiner and Damien Lolive},
url = {https://aclanthology.org/2016.jeptalnrecital-jep.80},
year = {2016},
date = {2016},
booktitle = {Actes de la conf{\'e}rence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP},
pages = {714-722},
publisher = {AFCP - ATALA},
address = {Paris, France},
abstract = {Over the last decades, the acoustic modeling performed by parametric speech synthesis systems has received particular attention. However, in most known systems, the set of linguistic descriptors used to represent the text has remained the same. More specifically, prosody modeling is still guided by low-level descriptors such as syllable stress information or the part-of-speech tag of the word. In this article, we propose to integrate information based on the predictability of an event (the syllable or the word). Several studies indicate a strong correlation between this measure, widely used in computational linguistics, and certain characteristics of human speech production. Our hypothesis is therefore that adding these descriptors improves prosody modeling. This article focuses on an objective analysis of the contribution of these descriptors to HMM synthesis for English and French.},
pubstate = {published},
type = {inproceedings}
}

Projects:   C1 C5

Rubino, Raphael; Degaetano-Ortlieb, Stefania; Teich, Elke; van Genabith, Josef

Modeling Diachronic Change in Scientific Writing with Information Density Inproceedings

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, pp. 750-761, Osaka, Japan, 2016.

Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.

@inproceedings{C16-1072,
title = {Modeling Diachronic Change in Scientific Writing with Information Density},
author = {Raphael Rubino and Stefania Degaetano-Ortlieb and Elke Teich and Josef van Genabith},
url = {https://aclanthology.org/C16-1072},
year = {2016},
date = {2016},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
pages = {750-761},
publisher = {The COLING 2016 Organizing Committee},
address = {Osaka, Japan},
abstract = {Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Rubino, Raphael; Lapshinova-Koltunski, Ekaterina; van Genabith, Josef

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification Inproceedings

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 960-970, San Diego, California, 2016.

This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.

@inproceedings{N16-1110,
title = {Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification},
author = {Raphael Rubino and Ekaterina Lapshinova-Koltunski and Josef van Genabith},
url = {http://aclweb.org/anthology/N16-1110},
doi = {10.18653/v1/N16-1110},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {960-970},
publisher = {Association for Computational Linguistics},
address = {San Diego, California},
abstract = {This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.},
pubstate = {published},
type = {inproceedings}
}

Project:   B6

Singh, Mittul; Greenberg, Clayton; Klakow, Dietrich

The Custom Decay Language Model for Long Range Dependencies Book Chapter

Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings, Springer International Publishing, pp. 343-351, Cham, 2016, ISBN 978-3-319-45510-5.

Significant correlations between words can be observed over long distances, but contemporary language models like N-grams, Skip grams, and recurrent neural network language models (RNNLMs) require a large number of parameters to capture these dependencies, if the models can do so at all. In this paper, we propose the Custom Decay Language Model (CDLM), which captures long range correlations while maintaining sub-linear increase in parameters with vocabulary size. This model has a robust and stable training procedure (unlike RNNLMs), a more powerful modeling scheme than the Skip models, and a customizable representation. In perplexity experiments, CDLMs outperform the Skip models using fewer number of parameters. A CDLM also nominally outperformed a similar-sized RNNLM, meaning that it learned as much as the RNNLM but without recurrence.

@inbook{Singh2016,
title = {The Custom Decay Language Model for Long Range Dependencies},
author = {Mittul Singh and Clayton Greenberg and Dietrich Klakow},
url = {http://dx.doi.org/10.1007/978-3-319-45510-5_39},
doi = {10.1007/978-3-319-45510-5_39},
year = {2016},
date = {2016},
booktitle = {Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings},
isbn = {978-3-319-45510-5},
pages = {343-351},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {Significant correlations between words can be observed over long distances, but contemporary language models like N-grams, Skip grams, and recurrent neural network language models (RNNLMs) require a large number of parameters to capture these dependencies, if the models can do so at all. In this paper, we propose the Custom Decay Language Model (CDLM), which captures long range correlations while maintaining sub-linear increase in parameters with vocabulary size. This model has a robust and stable training procedure (unlike RNNLMs), a more powerful modeling scheme than the Skip models, and a customizable representation. In perplexity experiments, CDLMs outperform the Skip models using fewer number of parameters. A CDLM also nominally outperformed a similar-sized RNNLM, meaning that it learned as much as the RNNLM but without recurrence.},
pubstate = {published},
type = {inbook}
}

Project:   B4

Oualil, Youssef; Greenberg, Clayton; Singh, Mittul; Klakow, Dietrich

Sequential recurrent neural networks for language modeling Journal Article

Interspeech 2016, pp. 3509-3513, 2016.

Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in FNN structure while enhancing each word representation at the projection layer through recurrent context information that evolves in the network. The context integration is performed using an additional word-dependent weight matrix that is also learned during the training. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.

@article{oualil2016sequential,
title = {Sequential recurrent neural networks for language modeling},
author = {Youssef Oualil and Clayton Greenberg and Mittul Singh and Dietrich Klakow},
url = {https://arxiv.org/abs/1703.08068},
year = {2016},
date = {2016},
journal = {Interspeech 2016},
pages = {3509-3513},
abstract = {Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in FNN structure while enhancing each word representation at the projection layer through recurrent context information that evolves in the network. The context integration is performed using an additional word-dependent weight matrix that is also learned during the training. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.},
pubstate = {published},
type = {article}
}

Project:   B4

Sayeed, Asad; Greenberg, Clayton; Demberg, Vera

Thematic fit evaluation: an aspect of selectional preferences Journal Article

Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, pp. 99-105, 2016, ISBN 9781945626142.

In this paper, we discuss the human thematic fit judgement correlation task in the context of real-valued vector space word representations. Thematic fit is the extent to which an argument fulfils the selectional preference of a verb given a role: for example, how well “cake” fulfils the patient role of “cut”. In recent work, systems have been evaluated on this task by finding the correlations of their output judgements with human-collected judgement data. This task is a representation-independent way of evaluating models that can be applied whenever a system score can be generated, and it is applicable wherever predicate-argument relations are significant to performance in end-user tasks. Significant progress has been made on this cognitive modeling task, leaving considerable space for future, more comprehensive types of evaluation.

@article{Sayeed2016,
title = {Thematic fit evaluation: an aspect of selectional preferences},
author = {Asad Sayeed and Clayton Greenberg and Vera Demberg},
url = {https://www.researchgate.net/publication/306094219_Thematic_fit_evaluation_an_aspect_of_selectional_preferences},
year = {2016},
date = {2016},
journal = {Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP},
pages = {99-105},
abstract = {In this paper, we discuss the human thematic fit judgement correlation task in the context of real-valued vector space word representations. Thematic fit is the extent to which an argument fulfils the selectional preference of a verb given a role: for example, how well “cake” fulfils the patient role of “cut”. In recent work, systems have been evaluated on this task by finding the correlations of their output judgements with human-collected judgement data. This task is a representation-independent way of evaluating models that can be applied whenever a system score can be generated, and it is applicable wherever predicate-argument relations are significant to performance in end-user tasks. Significant progress has been made on this cognitive modeling task, leaving considerable space for future, more comprehensive types of evaluation.},
pubstate = {published},
type = {article}
}

Projects:   B2 B4

Reich, Ingo; Horch, Eva

On “Article Omission” in German and the “Uniform Information Density Hypothesis” Inproceedings

Dipper, Stefanie; Neubarth, Friedrich; Zinsmeister, Heike (Ed.): Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), 16, pp. 125-127, Bochum, 2016.

This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948) and in particular the “Uniform Information Density Hypothesis” (UID, see Jaeger 2010) might contribute to our understanding of a phenomenon called “article omission” (AO) in the literature. To this effect, we trained language models on a corpus of 17 different text types (from prototypically written text types like legal texts to prototypically spoken text types like dialogue) with about 2,000 sentences each and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as was to be expected). They also show, however, that omitting the article results in a non-uniform distribution (thus contradicting the UID). Since empirically AO seems not to depend on specific lexical items, we also trained our language models on a more abstract level (part of speech). With respect to this level of analysis we were able to show that, again, an overtly realized article significantly reduces the surprisal on the following head noun, but at the same time AO results in a more uniform distribution of information. In the case of AO the UID thus seems to operate on the level of POS rather than on the lexical level.

@inproceedings{HorchReich2016,
title = {On “Article Omission” in German and the “Uniform Information Density Hypothesis”},
author = {Ingo Reich and Eva Horch},
editor = {Stefanie Dipper and Friedrich Neubarth and Heike Zinsmeister},
url = {https://www.linguistics.rub.de/konvens16/pub/16_konvensproc.pdf},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016)},
pages = {125-127},
address = {Bochum},
abstract = {This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948) and in particular the “Uniform Information Density Hypothesis” (UID, see Jaeger 2010) might contribute to our understanding of a phenomenon called “article omission” (AO) in the literature. To this effect, we trained language models on a corpus of 17 different text types (from prototypically written text types like legal texts to prototypically spoken text types like dialogue) with about 2,000 sentences each and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as was to be expected). They also show, however, that omitting the article results in a non-uniform distribution (thus contradicting the UID). Since empirically AO seems not to depend on specific lexical items, we also trained our language models on a more abstract level (part of speech). With respect to this level of analysis we were able to show that, again, an overtly realized article significantly reduces the surprisal on the following head noun, but at the same time AO results in a more uniform distribution of information. In the case of AO the UID thus seems to operate on the level of POS rather than on the lexical level.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Rutherford, Attapol; Demberg, Vera; Xue, Nianwen

Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features Journal Article

CoRR, 2016.

Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous neural models require parses, surface features, or a small label set to work well. Here, we propose neural network models that are based on feedforward and long short-term memory architectures without any surface features. To our surprise, our best configured feedforward architecture outperforms the LSTM-based model in most cases despite thorough tuning. Under various fine-grained label sets and a cross-linguistic setting, our feedforward models perform consistently better or at least just as well as systems that require hand-crafted surface features. Our models present the first neural Chinese discourse parser in the style of Chinese Discourse Treebank, showing that our results hold cross-linguistically.

@article{DBLP:journals/corr/RutherfordDX16,
title = {Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features},
author = {Attapol Rutherford and Vera Demberg and Nianwen Xue},
url = {http://arxiv.org/abs/1606.01990},
year = {2016},
date = {2016},
journal = {CoRR},
abstract = {Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous neural models require parses, surface features, or a small label set to work well. Here, we propose neural network models that are based on feedforward and long short-term memory architectures without any surface features. To our surprise, our best configured feedforward architecture outperforms the LSTM-based model in most cases despite thorough tuning. Under various fine-grained label sets and a cross-linguistic setting, our feedforward models perform consistently better or at least just as well as systems that require hand-crafted surface features. Our models present the first neural Chinese discourse parser in the style of Chinese Discourse Treebank, showing that our results hold cross-linguistically.},
pubstate = {published},
type = {article}
}

Project:   B2

Torabi Asr, Fatemeh; Demberg, Vera

But vs. Although under the microscope Inproceedings

Proceedings of the 38th Meeting of the Cognitive Science Society, pp. 366-371, Philadelphia, Pennsylvania, USA, 2016.

Previous experimental studies on concessive connectives have only looked at their local facilitating or predictive effect on discourse relation comprehension and have often viewed them as a class of discourse markers with similar effects. We look into the effect of two connectives, but and although, for inferring contrastive vs. concessive discourse relations to complement previous experimental work on causal inferences. An offline survey on AMTurk and an online eye-tracking-while-reading experiment are conducted to show that even between these two connectives, which mark the same set of relations, interpretations are biased. The bias is consistent with the distribution of the connective across discourse relations. This suggests that an account of discourse connective meaning based on probability distributions can better account for comprehension data than a classic categorical approach, or an approach where closely related connectives only have a core meaning and the rest of the interpretation comes from the discourse arguments.

@inproceedings{Asr2016b,
title = {But vs. Although under the microscope},
author = {Fatemeh Torabi Asr and Vera Demberg},
url = {https://www.semanticscholar.org/paper/But-vs.-Although-under-the-microscope-Asr-Demberg/68be3f7ec0d7642f4371d991fc15471416141dfd},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 38th Meeting of the Cognitive Science Society},
pages = {366-371},
address = {Philadelphia, Pennsylvania, USA},
abstract = {Previous experimental studies on concessive connectives have only looked at their local facilitating or predictive effect on discourse relation comprehension and have often viewed them as a class of discourse markers with similar effects. We look into the effect of two connectives, but and although, for inferring contrastive vs. concessive discourse relations to complement previous experimental work on causal inferences. An offline survey on AMTurk and an online eye-tracking-while-reading experiment are conducted to show that even between these two connectives, which mark the same set of relations, interpretations are biased. The bias is consistent with the distribution of the connective across discourse relations. This suggests that an account of discourse connective meaning based on probability distributions can better account for comprehension data than a classic categorical approach, or an approach where closely related connectives only have a core meaning and the rest of the interpretation comes from the discourse arguments.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Rehbein, Ines; Scholman, Merel; Demberg, Vera

Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks Inproceedings

Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Ed.): Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), pp. 1039-1046, Portorož, Slovenia, 2016, ISBN 978-2-9517408-9-1.

In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiments on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.

@inproceedings{REHBEIN16.457,
title = {Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks},
author = {Ines Rehbein and Merel Scholman and Vera Demberg},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
url = {https://aclanthology.org/L16-1165},
year = {2016},
date = {2016},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
isbn = {978-2-9517408-9-1},
pages = {1039-1046},
publisher = {European Language Resources Association (ELRA)},
address = {Portoro{\v{z}}, Slovenia},
abstract = {In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiments on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Rehbein, Ines; Scholman, Merel; Demberg, Vera

Disco-SPICE (Spoken conversations from the SPICE-Ireland corpus annotated with discourse relations) Inproceedings

Annotating discourse relations in spoken language: A comparison of the PDTB and CCR frameworks. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 16), Portorož, Slovenia, 2016.

The resource contains all texts from the Broadcast interview and Telephone conversation genres from the SPICE-Ireland corpus, annotated with discourse relations according to the PDTB 3.0 and CCR frameworks. Contact person: Merel Scholman

@inproceedings{merel2016,
title = {Disco-SPICE (Spoken conversations from the SPICE-Ireland corpus annotated with discourse relations)},
author = {Ines Rehbein and Merel Scholman and Vera Demberg},
year = {2016},
date = {2016},
booktitle = {Annotating discourse relations in spoken language: A comparison of the PDTB and CCR frameworks. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 16)},
address = {Portoro{\v{z}}, Slovenia},
abstract = {The resource contains all texts from the Broadcast interview and Telephone conversation genres from the SPICE-Ireland corpus, annotated with discourse relations according to the PDTB 3.0 and CCR frameworks. Contact person: Merel Scholman},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Demberg, Vera; Sayeed, Asad

The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty Journal Article

Stamatakis, Emmanuel Andreas (Ed.): PLOS ONE, 11, 2016.

While it has long been known that the pupil reacts to cognitive load, pupil size has received little attention in cognitive research because of its long latency and the difficulty of separating effects of cognitive load from the light reflex or effects due to eye movements. A novel measure, the Index of Cognitive Activity (ICA), relates cognitive effort to the frequency of small rapid dilations of the pupil. We report here on a total of seven experiments which test whether the ICA reliably indexes linguistically induced cognitive load: three experiments in reading (a manipulation of grammatical gender match / mismatch, an experiment of semantic fit, and an experiment comparing locally ambiguous subject versus object relative clauses, all in German), three dual-task experiments with simultaneous driving and spoken language comprehension (using the same manipulations as in the single-task reading experiments), and a visual world experiment comparing the processing of causal versus concessive discourse markers. These experiments are the first to investigate the effect and time course of the ICA in language processing. All of our experiments support the idea that the ICA indexes linguistic processing difficulty. The effects of our linguistic manipulations on the ICA are consistent for reading and auditory presentation. Furthermore, our experiments show that the ICA allows for usage within a multi-task paradigm. Its robustness with respect to eye movements means that it is a valid measure of processing difficulty for usage within the visual world paradigm, which will allow researchers to assess both visual attention and processing difficulty at the same time, using an eye-tracker. We argue that the ICA is indicative of activity in the locus caeruleus area of the brain stem, which has recently also been linked to P600 effects observed in psycholinguistic EEG experiments.

@article{demberg:sayeed:2016:plosone,
title = {The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty},
author = {Vera Demberg and Asad Sayeed},
editor = {Emmanuel Andreas Stamatakis},
url = {http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4723154/},
doi = {10.1371/journal.pone.0146194},
year = {2016},
date = {2016},
journal = {PLOS ONE},
volume = {11},
number = {1},
abstract = {While it has long been known that the pupil reacts to cognitive load, pupil size has received little attention in cognitive research because of its long latency and the difficulty of separating effects of cognitive load from the light reflex or effects due to eye movements. A novel measure, the Index of Cognitive Activity (ICA), relates cognitive effort to the frequency of small rapid dilations of the pupil. We report here on a total of seven experiments which test whether the ICA reliably indexes linguistically induced cognitive load: three experiments in reading (a manipulation of grammatical gender match / mismatch, an experiment of semantic fit, and an experiment comparing locally ambiguous subject versus object relative clauses, all in German), three dual-task experiments with simultaneous driving and spoken language comprehension (using the same manipulations as in the single-task reading experiments), and a visual world experiment comparing the processing of causal versus concessive discourse markers. These experiments are the first to investigate the effect and time course of the ICA in language processing. All of our experiments support the idea that the ICA indexes linguistic processing difficulty. The effects of our linguistic manipulations on the ICA are consistent for reading and auditory presentation. Furthermore, our experiments show that the ICA allows for usage within a multi-task paradigm. Its robustness with respect to eye movements means that it is a valid measure of processing difficulty for usage within the visual world paradigm, which will allow researchers to assess both visual attention and processing difficulty at the same time, using an eye-tracker. We argue that the ICA is indicative of activity in the locus caeruleus area of the brain stem, which has recently also been linked to P600 effects observed in psycholinguistic EEG experiments.},
pubstate = {published},
type = {article}
}

Project:   B2

Kermes, Hannah; Degaetano-Ortlieb, Stefania; Knappen, Jörg; Khamis, Ashraf; Teich, Elke

The Royal Society Corpus: From Uncharted Data to Corpus Inproceedings

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), European Language Resources Association (ELRA), pp. 1928-1931, Portorož, Slovenia, 2016.

We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665-1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically, we assume that due to specialization, linguistic encodings become more compact over time (Halliday, 1988; Halliday and Martin, 1993), thus creating a specific discourse type characterized by high information density that is functional for expert communication. When building corpora from uncharted material, typically not all relevant meta-data (e.g. author, time, genre) or linguistic data (e.g. sentence/word boundaries, words, parts of speech) is readily available. We present an approach to obtain good quality meta-data and base text data adopting the concept of Agile Software Development.

@inproceedings{Kermes2016,
title = {The Royal Society Corpus: From Uncharted Data to Corpus},
author = {Hannah Kermes and Stefania Degaetano-Ortlieb and J{\"o}rg Knappen and Ashraf Khamis and Elke Teich},
url = {https://aclanthology.org/L16-1305},
year = {2016},
date = {2016},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)},
pages = {1928-1931},
publisher = {European Language Resources Association (ELRA)},
address = {Portoro{\v{z}}, Slovenia},
abstract = {We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665-1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically, we assume that due to specialization, linguistic encodings become more compact over time (Halliday, 1988; Halliday and Martin, 1993), thus creating a specific discourse type characterized by high information density that is functional for expert communication. When building corpora from uncharted material, typically not all relevant meta-data (e.g. author, time, genre) or linguistic data (e.g. sentence/word boundaries, words, parts of speech) is readily available. We present an approach to obtain good quality meta-data and base text data adopting the concept of Agile Software Development.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1
