Publications

Jachmann, Torsten; Drenhaus, Heiner; Staudte, Maria; Crocker, Matthew W.

The Influence of Speaker's Gaze on Sentence Comprehension: An ERP Investigation Inproceedings

Proceedings of the 39th Annual Conference of the Cognitive Science Society, pp. 2261-2266, 2017.

Behavioral studies demonstrate the influence of speaker gaze in visually-situated spoken language comprehension. We present an ERP experiment examining the influence of speaker's gaze congruency on listeners' comprehension of referential expressions related to a shared visual scene. We demonstrate that listeners exploit speakers' gaze toward objects in order to form sentence continuation expectations: Compared to a congruent gaze condition, we observe an increased N400 when (a) the lack of gaze (neutral) does not allow for upcoming noun prediction, and (b) the noun violates gaze-driven expectations (incongruent). The latter also results in a late (sustained) positivity, reflecting the need to update the assumed situation model. We take the combination of the N400 and the late positivity as evidence that speaker gaze influences lexical retrieval and integration processes, respectively (Brouwer et al., in press). Moreover, speaker gaze is interpreted as reflecting referential intentions (Staudte & Crocker, 2011).

@inproceedings{Jachmann2017,
title = {The Influence of Speaker's Gaze on Sentence Comprehension: An ERP Investigation},
author = {Torsten Jachmann and Heiner Drenhaus and Maria Staudte and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/325969989_The_Influence_of_Speaker%27s_Gaze_on_Sentence_Comprehension_An_ERP_Investigation},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 39th Annual Conference of the Cognitive Science Society},
pages = {2261-2266},
abstract = {Behavioral studies demonstrate the influence of speaker gaze in visually-situated spoken language comprehension. We present an ERP experiment examining the influence of speaker's gaze congruency on listeners' comprehension of referential expressions related to a shared visual scene. We demonstrate that listeners exploit speakers' gaze toward objects in order to form sentence continuation expectations: Compared to a congruent gaze condition, we observe an increased N400 when (a) the lack of gaze (neutral) does not allow for upcoming noun prediction, and (b) the noun violates gaze-driven expectations (incongruent). The latter also results in a late (sustained) positivity, reflecting the need to update the assumed situation model. We take the combination of the N400 and the late positivity as evidence that speaker gaze influences lexical retrieval and integration processes, respectively (Brouwer et al., in press). Moreover, speaker gaze is interpreted as reflecting referential intentions (Staudte & Crocker, 2011).},
pubstate = {published},
type = {inproceedings}
}

Project:   C3

Tourtouri, Elli; Delogu, Francesca; Crocker, Matthew W.

Specificity and entropy reduction in situated referential processing Inproceedings

39th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2017.

In situated communication, reference to an entity in the shared visual context can be established using either an expression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, a long-lasting debate in psycholinguistics concerning whether the latter hinders referential processing. We present evidence from an eye-tracking experiment recording fixations as well as the Index of Cognitive Activity, a novel measure of cognitive workload, supporting the view that over-specifications facilitate processing. We further present original evidence that, above and beyond the effect of specificity, referring expressions that uniformly reduce referential entropy also benefit processing.

@inproceedings{Tourtouri2017,
title = {Specificity and entropy reduction in situated referential processing},
author = {Elli Tourtouri and Francesca Delogu and Matthew W. Crocker},
url = {https://www.mpi.nl/publications/item3309545/specificity-and-entropy-reduction-situated-referential-processing},
year = {2017},
date = {2017},
booktitle = {39th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {In situated communication, reference to an entity in the shared visual context can be established using either an expression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, a long-lasting debate in psycholinguistics concerning whether the latter hinders referential processing. We present evidence from an eye-tracking experiment recording fixations as well as the Index of Cognitive Activity, a novel measure of cognitive workload, supporting the view that over-specifications facilitate processing. We further present original evidence that, above and beyond the effect of specificity, referring expressions that uniformly reduce referential entropy also benefit processing.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3
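
To make the notion of referential entropy reduction in the entry above concrete: treating all objects in a scene as equiprobable referents, each incoming word of a referring expression shrinks the candidate set and thereby reduces Shannon entropy. The following minimal Python sketch uses an invented scene and expression; it is an illustration of the concept, not the authors' stimuli or analysis.

import math

def entropy(candidates):
    # Shannon entropy (bits) over a set of equiprobable candidate referents
    n = len(candidates)
    return math.log2(n) if n > 0 else 0.0

# Hypothetical scene: six objects, described by (colour, shape).
scene = [("red", "ball"), ("red", "cube"), ("blue", "ball"),
         ("blue", "cube"), ("green", "ball"), ("green", "cube")]

# "the red ball": each word filters the referent set and reduces entropy.
steps = [("the", scene),
         ("red", [o for o in scene if o[0] == "red"]),
         ("ball", [o for o in scene if o == ("red", "ball")])]

prev = entropy(scene)
for word, remaining in steps:
    h = entropy(remaining)
    print(f"{word:>5}: {len(remaining)} candidates, H = {h:.2f} bits, reduction = {prev - h:.2f}")
    prev = h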

Sikos, Les; Greenberg, Clayton; Drenhaus, Heiner; Crocker, Matthew W.

Information density of encodings: The role of syntactic variation in comprehension Inproceedings

Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017), pp. 3168-3173, Austin, Texas, USA, 2017.

The Uniform Information Density (UID) hypothesis links production strategies with comprehension processes, predicting that speakers will utilize flexibility in encoding in order to increase uniformity in the rate of information transmission, as measured by surprisal (Jaeger, 2010). Evidence in support of UID comes primarily from studies focusing on word-level effects, e.g. demonstrating that surprisal predicts the omission/inclusion of optional words. Here we investigate whether comprehenders are sensitive to the information density of alternative encodings that are more syntactically complex. We manipulated the syntactic encoding of complex noun phrases in German via meaning-preserving pre-nominal and post-nominal modification in contexts that were either predictive or non-predictive. We then used the G-maze reading task to measure online comprehension during self-paced reading. The results are consistent with the UID hypothesis. Length-adjusted reading times were facilitated for pre-nominally modified head nouns, and this effect was larger in non-predictive contexts.

@inproceedings{Sikos2017,
title = {Information density of encodings: The role of syntactic variation in comprehension},
author = {Les Sikos and Clayton Greenberg and Heiner Drenhaus and Matthew W. Crocker},
url = {https://www.semanticscholar.org/paper/Information-density-of-encodings%3A-The-role-of-in-Sikos-Greenberg/06a47324b53bc53e0e4762fd1547091d8b2392f1},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017)},
pages = {3168-3173},
address = {Austin, Texas, USA},
abstract = {The Uniform Information Density (UID) hypothesis links production strategies with comprehension processes, predicting that speakers will utilize flexibility in encoding in order to increase uniformity in the rate of information transmission, as measured by surprisal (Jaeger, 2010). Evidence in support of UID comes primarily from studies focusing on word-level effects, e.g. demonstrating that surprisal predicts the omission/inclusion of optional words. Here we investigate whether comprehenders are sensitive to the information density of alternative encodings that are more syntactically complex. We manipulated the syntactic encoding of complex noun phrases in German via meaning-preserving pre-nominal and post-nominal modification in contexts that were either predictive or non-predictive. We then used the G-maze reading task to measure online comprehension during self-paced reading. The results are consistent with the UID hypothesis. Length-adjusted reading times were facilitated for pre-nominally modified head nouns, and this effect was larger in non-predictive contexts.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3
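
A minimal illustration of the surprisal logic behind UID, with invented conditional probabilities: the surprisal of a word is -log2 P(word | context), and two encodings of the same content can carry the same total information while distributing it more or less evenly over words. This is a conceptual sketch, not the paper's materials.

import math
import statistics

def surprisal_profile(word_probs):
    # per-word surprisal in bits, given P(word | preceding context)
    return [-math.log2(p) for p in word_probs]

# Invented probabilities for two meaning-equivalent encodings; the products
# (and hence the total surprisals) are identical, only the shape differs.
even   = [0.20, 0.20, 0.20]
peaked = [0.50, 0.32, 0.05]

for name, probs in (("even", even), ("peaked", peaked)):
    s = surprisal_profile(probs)
    print(f"{name:>6}: total = {sum(s):.2f} bits, variance = {statistics.pvariance(s):.2f}")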

Calvillo, Jesús

Fast and Easy: Approximating uniform information density in language production Inproceedings

39th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2017.

A model of sentence production is presented, which implements a strategy that produces sentences with more uniform surprisal profiles than other strategies, in accordance with the Uniform Information Density hypothesis (Jaeger, 2006; Levy & Jaeger, 2007). The model operates at the algorithmic level, combining information concerning word probabilities and sentence lengths, and represents a first attempt to model UID as resulting from underlying factors during language production. The sentences produced by this model indeed showed the expected tendency, having more uniform surprisal profiles and lower average word surprisal than other production strategies.

@inproceedings{Calvillo2017,
title = {Fast and Easy: Approximating uniform information density in language production},
author = {Jesús Calvillo},
url = {https://cogsci.mindmodeling.org/2017/papers/0333/paper0333.pdf},
year = {2017},
date = {2017},
booktitle = {39th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {A model of sentence production is presented, which implements a strategy that produces sentences with more uniform surprisal profiles than other strategies, in accordance with the Uniform Information Density hypothesis (Jaeger, 2006; Levy & Jaeger, 2007). The model operates at the algorithmic level, combining information concerning word probabilities and sentence lengths, and represents a first attempt to model UID as resulting from underlying factors during language production. The sentences produced by this model indeed showed the expected tendency, having more uniform surprisal profiles and lower average word surprisal than other production strategies.},
pubstate = {published},
type = {inproceedings}
}

Project:   C3
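
One way to picture the kind of production strategy described above is a greedy chooser that, among the lexical candidates available at each point, prefers the one whose surprisal is closest to a target per-word information rate. This is a schematic stand-in under that assumption, not the paper's actual model; the candidates and probabilities are invented.

import math

def pick_next(candidates, target_rate):
    # prefer the word whose surprisal (bits) is closest to the target rate
    return min(candidates, key=lambda wp: abs(-math.log2(wp[1]) - target_rate))

candidates = [("dog", 0.40), ("dalmatian", 0.02), ("animal", 0.15)]
print(pick_next(candidates, target_rate=2.5))  # -> ("animal", 0.15), ~2.74 bits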

Oualil, Youssef

Sequential estimation techniques and application to multiple speaker tracking and language modeling PhD Thesis

Saarland University, Saarbruecken, Germany, 2017.

For many real-world applications, the considered data is given as a time sequence that becomes available in an orderly fashion, where the order incorporates important information about the entities of interest. The work presented in this thesis deals with two such cases by introducing new sequential estimation solutions. More precisely, we introduce: I. A sequential Bayesian estimation framework to solve the multiple speaker localization, detection and tracking problem. This framework is a complete pipeline that includes 1) new observation estimators, which extract a fixed number of potential locations per time frame; 2) new unsupervised Bayesian detectors, which classify these estimates into noise/speaker classes and 3) new Bayesian filters, which use the speaker class estimates to track multiple speakers.

This framework was developed to tackle the low overlap detection rate of multiple speakers and to reduce the number of constraints generally imposed in standard solutions. II. A sequential neural estimation framework for language modeling, which overcomes some of the shortcomings of standard approaches by merging different models in a hybrid architecture. That is, we introduce two solutions that tightly merge particular models and then show how a generalization can be achieved through a new mixture model. In order to speed up the training of large vocabulary language models, we introduce a new extension of the noise contrastive estimation approach to batch training.

@phdthesis{Oualil2017b,
title = {Sequential estimation techniques and application to multiple speaker tracking and language modeling},
author = {Youssef Oualil},
url = {http://nbn-resolving.de/urn:nbn:de:bsz:291-scidok-ds-272280},
doi = {https://doi.org/10.22028/D291-27228},
year = {2017},
date = {2017},
school = {Saarland University},
address = {Saarbruecken, Germany},
abstract = {For many real-world applications, the considered data is given as a time sequence that becomes available in an orderly fashion, where the order incorporates important information about the entities of interest. The work presented in this thesis deals with two such cases by introducing new sequential estimation solutions. More precisely, we introduce: I. A sequential Bayesian estimation framework to solve the multiple speaker localization, detection and tracking problem. This framework is a complete pipeline that includes 1) new observation estimators, which extract a fixed number of potential locations per time frame; 2) new unsupervised Bayesian detectors, which classify these estimates into noise/speaker classes and 3) new Bayesian filters, which use the speaker class estimates to track multiple speakers. This framework was developed to tackle the low overlap detection rate of multiple speakers and to reduce the number of constraints generally imposed in standard solutions. II. A sequential neural estimation framework for language modeling, which overcomes some of the shortcomings of standard approaches by merging different models in a hybrid architecture. That is, we introduce two solutions that tightly merge particular models and then show how a generalization can be achieved through a new mixture model. In order to speed up the training of large vocabulary language models, we introduce a new extension of the noise contrastive estimation approach to batch training.},
pubstate = {published},
type = {phdthesis}
}

Project:   B4

Oualil, Youssef; Klakow, Dietrich

A neural network approach for mixing language models Inproceedings

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), 2017.

The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures, which are able to learn different natural language characteristics. This paper presents a novel framework, which shows that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models and 2) a mixture layer, which merges the resulting model features. In doing so, this architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.

@inproceedings{Oualil2017a,
title = {A neural network approach for mixing language models},
author = {Youssef Oualil and Dietrich Klakow},
url = {https://arxiv.org/abs/1708.06989},
year = {2017},
date = {2017},
booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)},
abstract = {The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures, which are able to learn different natural language characteristics. This paper presents a novel framework, which shows that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models and 2) a mixture layer, which merges the resulting model features. In doing so, this architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
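
A toy PyTorch sketch of the two-stage idea described in the abstract: a feature layer with heterogeneous branches (here an LSTM and a feedforward branch) and a mixture layer that merges their features before the output layer. Branch choices and dimensions are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class MixtureLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)          # branch 1
        self.ff = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.Tanh())  # branch 2
        self.mix = nn.Linear(2 * hid_dim, hid_dim)                       # mixture layer
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq)
        e = self.embed(tokens)
        h_lstm, _ = self.lstm(e)                # (batch, seq, hid)
        h_ff = self.ff(e)                       # (batch, seq, hid)
        h = torch.tanh(self.mix(torch.cat([h_lstm, h_ff], dim=-1)))
        return self.out(h)                      # next-word logits

model = MixtureLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 7)))  # shape: (2, 7, 10000)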

Singh, Mittul; Oualil, Youssef; Klakow, Dietrich

Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition Inproceedings

18th Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, 2017.

Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied for first-pass decoding. Thus, we introduce two ways of applying adapted long short-term memory (LSTM) based RNNLMs for first-pass decoding. Using available techniques to convert LSTMs to approximated versions for first-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation framework (FMA) and an approximated version of an architecture-based adaptation of the LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs combined with a trigram LM outperform other adapted and unadapted LMs. Here, the architecture-adapted LSTM combination obtains a 35.9% word error rate (WER) and is outperformed by the FMA-based LSTM combination, which obtains the overall lowest WER of 34.4%.

@inproceedings{Singh2017,
title = {Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition},
author = {Mittul Singh and Youssef Oualil and Dietrich Klakow},
url = {https://www.researchgate.net/publication/319185101_Approximated_and_Domain-Adapted_LSTM_Language_Models_for_First-Pass_Decoding_in_Speech_Recognition},
year = {2017},
date = {2017},
booktitle = {18th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
address = {Stockholm, Sweden},
abstract = {Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied for first-pass decoding. Thus, we introduce two ways of applying adapted long short-term memory (LSTM) based RNNLMs for first-pass decoding. Using available techniques to convert LSTMs to approximated versions for first-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation framework (FMA) and an approximated version of an architecture-based adaptation of the LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs combined with a trigram LM outperform other adapted and unadapted LMs. Here, the architecture-adapted LSTM combination obtains a 35.9% word error rate (WER) and is outperformed by the FMA-based LSTM combination, which obtains the overall lowest WER of 34.4%.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
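
The Fast Marginal Adaptation framework mentioned above rescales a background LM distribution by the ratio of in-domain to background unigram probabilities, raised to a power beta, and renormalizes (Kneser et al., 1997). A schematic version over an invented three-word vocabulary; real systems apply this over full vocabularies inside the decoder.

def fma_adapt(p_bg_given_h, p_dom_unigram, p_bg_unigram, beta=0.5):
    # P_adapted(w|h) is proportional to P_bg(w|h) * (P_dom(w) / P_bg(w))**beta
    scores = {w: p * (p_dom_unigram[w] / p_bg_unigram[w]) ** beta
              for w, p in p_bg_given_h.items()}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# "court" is far more frequent in the target domain than in the background.
p_bg_h  = {"court": 0.1, "game": 0.4, "tree": 0.5}   # background P(w|h)
p_dom_u = {"court": 0.6, "game": 0.3, "tree": 0.1}   # in-domain unigrams
p_bg_u  = {"court": 0.2, "game": 0.4, "tree": 0.4}   # background unigrams
print(fma_adapt(p_bg_h, p_dom_u, p_bg_u))  # probability mass shifts to "court"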

Trost, Thomas; Klakow, Dietrich

Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings Inproceedings

Proceedings of TextGraphs-11: Graph-based Methods for Natural Language Processing (Workshop at ACL 2017), Association for Computational Linguistics, pp. 30-38, Vancouver, Canada, 2017.

Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. In order to deal with this, we introduce an unsupervised, parameter-free method for creating a hierarchical graphical clustering of the full ensemble of word vectors and show that this structure is a geometrically meaningful representation of the original relations between the words. This newly obtained representation can be used for better understanding and thus improving the embedding algorithm, and it exhibits semantic meaning, so it can also be utilized in a variety of language processing tasks like categorization or measuring similarity.

@inproceedings{TroKla2017,
title = {Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings},
author = {Thomas Trost and Dietrich Klakow},
url = {https://aclanthology.org/W17-2404},
doi = {https://doi.org/10.18653/v1/W17-2404},
year = {2017},
date = {2017},
booktitle = {Proceedings of TextGraphs-11: Graph-based Methods for Natural Language Processing (Workshop at ACL 2017)},
pages = {30-38},
publisher = {Association for Computational Linguistics},
address = {Vancouver, Canada},
abstract = {Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. In order to deal with this, we introduce an unsupervised, parameter-free method for creating a hierarchical graphical clustering of the full ensemble of word vectors and show that this structure is a geometrically meaningful representation of the original relations between the words. This newly obtained representation can be used for better understanding and thus improving the embedding algorithm, and it exhibits semantic meaning, so it can also be utilized in a variety of language processing tasks like categorization or measuring similarity.},
pubstate = {published},
type = {inproceedings}
}

Project:   B4
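
For contrast with the parameter-free, graph-based method above, here is what a conventional hierarchy over word vectors looks like with SciPy's agglomerative clustering: it requires a linkage choice and a cut criterion, which is precisely what the paper's approach avoids. The vectors below are random stand-ins for trained embeddings.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
words = ["cat", "dog", "car", "bus", "apple", "pear"]
vecs = rng.normal(size=(len(words), 50))   # stand-in word embeddings

tree = linkage(pdist(vecs, metric="cosine"), method="average")
print(fcluster(tree, t=3, criterion="maxclust"))  # cluster id per word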

Lemke, Tyll Robin; Horch, Eva; Reich, Ingo

Optimal encoding! - Information Theory constrains article omission in newspaper headlines Inproceedings

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics, pp. 131-135, Valencia, Spain, 2017.

In this paper we pursue the hypothesis that the distribution of article omission specifically is constrained by principles of Information Theory (Shannon, 1948). In particular, Information Theory predicts a stronger preference for article omission before nouns which are relatively predictable in the context of the preceding words. We investigated article omission in German newspaper headlines with a corpus study and an acceptability rating study. Both support our hypothesis: Articles are inserted more often before unpredictable nouns, and subjects perceive article omission before predictable nouns as more well-formed than before unpredictable ones. This suggests that information-theoretic principles constrain the distribution of article omission in headlines.

@inproceedings{LemkeHorchReich:17,
title = {Optimal encoding! - Information Theory constrains article omission in newspaper headlines},
author = {Tyll Robin Lemke and Eva Horch and Ingo Reich},
url = {https://www.aclweb.org/anthology/E17-2021},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
pages = {131-135},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {In this paper we pursue the hypothesis that the distribution of article omission specifically is constrained by principles of Information Theory (Shannon, 1948). In particular, Information Theory predicts a stronger preference for article omission before nouns which are relatively predictable in the context of the preceding words. We investigated article omission in German newspaper headlines with a corpus study and an acceptability rating study. Both support our hypothesis: Articles are inserted more often before unpredictable nouns, and subjects perceive article omission before predictable nouns as more well-formed than before unpredictable ones. This suggests that information-theoretic principles constrain the distribution of article omission in headlines.},
pubstate = {published},
type = {inproceedings}
}

Project:   B3
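
The predictability of a noun in its left context can be operationalized as its surprisal under a language model. The original study relied on corpus-based estimates; as a modern stand-in, here is a sketch using GPT-2 via the Hugging Face transformers library (the example context and nouns are invented, and the choice of model is an assumption).

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def noun_surprisal(context, noun):
    # surprisal (bits) of `noun` given `context`, summed over subword tokens
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    ids = tok(context + " " + noun, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(lm(ids).logits, dim=-1)
    nats = -sum(logprobs[0, i - 1, ids[0, i]].item()
                for i in range(ctx_len, ids.shape[1]))
    return nats / math.log(2)

print(noun_surprisal("Police arrest", "suspect"))     # predictable noun
print(noun_surprisal("Police arrest", "violinist"))   # unpredictable noun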

Lemke, Tyll Robin

Sentential or not? - An experimental investigation on the syntax of fragments Inproceedings

Proceedings of Linguistic Evidence 2016, Tübingen, 2017.

This paper presents four experiments on the syntactic structure of fragments, i.e., nonsentential utterances with propositional meaning and illocutionary force (Morgan, 1973). The experiments evaluate the predictions of two competing theories of fragments: Merchant's (2004) movement-and-deletion account and Barton & Progovac's (2005) nonsentential account. Experiment 1 provides evidence for case connectivity effects, which suggests that there is indeed unarticulated linguistic structure in fragments (contrary to Barton & Progovac, 2005). Experiments 2-4 address a central prediction of the movement-and-deletion account: only those constituents which may appear in the left periphery are possible fragments. Merchant et al. (2013) present two studies on preposition stranding and complement clause topicalization in favor of this prediction. My experiments 2-4 replicate and extend these studies in German and English. Taken together, the acceptability pattern predicted by Merchant (2004) holds only for the preposition stranding data (Exp. 2), but not for complement clauses (Exp. 3) or German multiple prefield constituents (Exp. 4).

@inproceedings{Lemke-toappear,
title = {Sentential or not? - An experimental investigation on the syntax of fragments},
author = {Tyll Robin Lemke},
url = {https://publikationen.uni-tuebingen.de/xmlui/handle/10900/77657},
doi = {https://doi.org/10.15496/publikation-19058},
year = {2017},
date = {2017},
booktitle = {Proceedings of Linguistic Evidence 2016},
address = {T{\"u}bingen},
abstract = {This paper presents four experiments on the syntactic structure of fragments, i.e., nonsentential utterances with propositional meaning and illocutionary force (Morgan, 1973). The experiments evaluate the predictions of two competing theories of fragments: Merchant's (2004) movement-and-deletion account and Barton & Progovac's (2005) nonsentential account. Experiment 1 provides evidence for case connectivity effects, which suggests that there is indeed unarticulated linguistic structure in fragments (contrary to Barton & Progovac, 2005). Experiments 2-4 address a central prediction of the movement-and-deletion account: only those constituents which may appear in the left periphery are possible fragments. Merchant et al. (2013) present two studies on preposition stranding and complement clause topicalization in favor of this prediction. My experiments 2-4 replicate and extend these studies in German and English. Taken together, the acceptability pattern predicted by Merchant (2004) holds only for the preposition stranding data (Exp. 2), but not for complement clauses (Exp. 3) or German multiple prefield constituents (Exp. 4).},
pubstate = {published},
type = {inproceedings}
}

Project:   B3

Reich, Ingo

On the omission of articles and copulae in German newspaper headlines Journal Article

Linguistic Variation, 17, pp. 186-204, 2017.

Based on a corpus-linguistic study, this paper argues that both omitted articles and omitted copulae in German headlines are to be treated as null elements NA and NC. Both items need to be licensed by a specific (parsing) strategy known as discourse orientation (Huang, 1984), which is also applicable in the special register of headlines. It is shown that distinguishing between discourse and sentence orientation, and correlating these two strategies with λ-binding and existential quantification, respectively, naturally accounts for an asymmetry in article omission observed in Stowell (1991).

@article{Reich-inpress,
title = {On the omission of articles and copulae in German newspaper headlines},
author = {Ingo Reich},
url = {https://benjamins.com/catalog/lv.14017.rei},
doi = {https://doi.org/10.1075/lv.14017.rei},
year = {2017},
date = {2017},
journal = {Linguistic Variation},
pages = {186-204},
volume = {17},
number = {2},
abstract = {Based on a corpus-linguistic study, this paper argues that both omitted articles and omitted copulae in German headlines are to be treated as null elements NA and NC. Both items need to be licensed by a specific (parsing) strategy known as discourse orientation (Huang, 1984), which is also applicable in the special register of headlines. It is shown that distinguishing between discourse and sentence orientation, and correlating these two strategies with λ-binding and existential quantification, respectively, naturally accounts for an asymmetry in article omission observed in Stowell (1991).},
pubstate = {published},
type = {article}
}

Project:   B3

Hoek, Jet; Scholman, Merel

Evaluating discourse annotation: Some recent insights and new approaches Inproceedings

Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13), 2017.

Annotated data is an important resource for the linguistics community, which is why researchers need to be sure that such data are reliable. However, arriving at sufficiently reliable annotations appears to be an issue within the field of discourse, possibly due to the fact that coherence is a mental phenomenon rather than a textual one. In this paper, we discuss recent insights and developments regarding annotation and reliability evaluation that are relevant to the field of discourse. We focus on characteristics of coherence that impact reliability scores and look at how different measures are affected by this. We discuss benefits and disadvantages of these measures, and propose that discourse annotation results be accompanied by a detailed report of the annotation process and data, as well as a careful consideration of the reliability measure that is applied.

@inproceedings{hoek2017evaluating,
title = {Evaluating discourse annotation: Some recent insights and new approaches},
author = {Jet Hoek and Merel Scholman},
url = {https://aclanthology.org/W17-7401},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)},
abstract = {Annotated data is an important resource for the linguistics community, which is why researchers need to be sure that such data are reliable. However, arriving at sufficiently reliable annotations appears to be an issue within the field of discourse, possibly due to the fact that coherence is a mental phenomenon rather than a textual one. In this paper, we discuss recent insights and developments regarding annotation and reliability evaluation that are relevant to the field of discourse. We focus on characteristics of coherence that impact reliability scores and look at how different measures are affected by this. We discuss benefits and disadvantages of these measures, and propose that discourse annotation results be accompanied by a detailed report of the annotation process and data, as well as a careful consideration of the reliability measure that is applied.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2
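
Chance-corrected agreement coefficients are the starting point for the evaluation issues discussed above. A minimal example computing Cohen's kappa with scikit-learn over two invented annotators; as the paper stresses, the chance model behind a coefficient matters, so a single number should be read with care.

from sklearn.metrics import cohen_kappa_score

# Two hypothetical annotators labelling ten discourse relations.
ann1 = ["causal", "additive", "causal", "temporal", "causal",
        "additive", "causal", "causal", "temporal", "additive"]
ann2 = ["causal", "additive", "additive", "temporal", "causal",
        "additive", "causal", "temporal", "temporal", "additive"]

print(cohen_kappa_score(ann1, ann2))  # agreement corrected for chance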

Shi, Wei; Yung, Frances Pik Yu; Rubino, Raphael; Demberg, Vera

Using explicit discourse connectives in translation for implicit discourse relation classification Inproceedings

Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Asian Federation of Natural Language Processing, pp. 484-495, Taipei, Taiwan, 2017.

Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate these added connectives into English connectives and use them to infer relation labels with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.

@inproceedings{Shi2017b,
title = {Using explicit discourse connectives in translation for implicit discourse relation classification},
author = {Wei Shi and Frances Pik Yu Yung and Raphael Rubino and Vera Demberg},
url = {https://aclanthology.org/I17-1049},
year = {2017},
date = {2017},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
pages = {484-495},
publisher = {Asian Federation of Natural Language Processing},
address = {Taipei, Taiwan},
abstract = {Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate these added connectives into English connectives and use them to infer relation labels with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2
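
The labelling step described in the abstract can be pictured as a lookup from the connective that surfaced in translation to a PDTB-style sense. The mapping below is a tiny invented excerpt for illustration, not the resource used in the paper.

# Map a back-translated English connective to a PDTB-style sense label.
CONNECTIVE_TO_SENSE = {
    "because": "Contingency.Cause.Reason",
    "however": "Comparison.Contrast",
    "for example": "Expansion.Instantiation",
    "then": "Temporal.Asynchronous.Precedence",
}

def silver_label(connective):
    # returns None when the connective is ambiguous or unknown
    return CONNECTIVE_TO_SENSE.get(connective.lower())

print(silver_label("However"))  # -> Comparison.Contrast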

Shi, Wei; Demberg, Vera

On the need of cross validation for discourse relation classification Inproceedings

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics, pp. 150-156, Valencia, Spain, 2017.

The task of implicit discourse relation classification has received increased attention in recent years, including two CoNLL shared tasks on the topic. Existing machine learning models for the task train on sections 2-21 of the PDTB and test on section 23, which includes a total of 761 implicit discourse relations. In this paper, we'd like to make a methodological point, arguing that the standard test set is too small to draw conclusions about whether the inclusion of certain features constitutes a genuine improvement, or whether one got lucky with some properties of the test set, and we argue for the adoption of cross validation for the discourse relation classification task by the community.

@inproceedings{Shi2017,
title = {On the need of cross validation for discourse relation classification},
author = {Wei Shi and Vera Demberg},
url = {https://aclanthology.org/E17-2024},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
pages = {150-156},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {The task of implicit discourse relation classification has received increased attention in recent years, including two CoNLL shared tasks on the topic. Existing machine learning models for the task train on sections 2-21 of the PDTB and test on section 23, which includes a total of 761 implicit discourse relations. In this paper, we'd like to make a methodological point, arguing that the standard test set is too small to draw conclusions about whether the inclusion of certain features constitutes a genuine improvement, or whether one got lucky with some properties of the test set, and we argue for the adoption of cross validation for the discourse relation classification task by the community.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2
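
The methodological point translates directly into practice: evaluate with k-fold cross-validation and report variability across folds rather than a single score on one small test set. A generic scikit-learn sketch with synthetic stand-in data (761 samples, echoing the size of the standard test section):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data; in the paper's setting, X would be relation features and
# y the implicit discourse relation labels from the PDTB.
X, y = make_classification(n_samples=761, n_features=20, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(scores.mean(), scores.std())  # fold-to-fold variability is the point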

Scholman, Merel; Rohde, Hannah; Demberg, Vera

"On the one hand" as a cue to anticipate upcoming discourse structure Journal Article

Journal of Memory and Language, 97, pp. 47-60, 2017.

Research has shown that people anticipate upcoming linguistic content, but most work to date has focused on relatively short-range expectation-driven processes within the current sentence or between adjacent sentences. We use the discourse marker On the one hand to test whether comprehenders maintain expectations regarding upcoming content in discourse representations that span multiple sentences. Three experiments show that comprehenders anticipate more than just On the other hand; rather, they keep track of embedded constituents and establish non-local dependencies. Our results show that comprehenders disprefer a subsequent contrast marked with On the other hand when a passage has already provided intervening content that establishes an appropriate contrast with On the one hand. Furthermore, comprehenders maintain their expectation for an upcoming contrast across intervening material, even if the embedded constituent itself contains contrast. The results are taken to support expectation-driven models of processing in which comprehenders posit and maintain structural representations of discourse structure.

@article{Merel2017,
title = {"On the one hand" as a cue to anticipate upcoming discourse structure},
author = {Merel Scholman and Hannah Rohde and Vera Demberg},
url = {https://www.sciencedirect.com/science/article/pii/S0749596X17300566},
year = {2017},
date = {2017},
journal = {Journal of Memory and Language},
pages = {47-60},
volume = {97},
abstract = {Research has shown that people anticipate upcoming linguistic content, but most work to date has focused on relatively short-range expectation-driven processes within the current sentence or between adjacent sentences. We use the discourse marker On the one hand to test whether comprehenders maintain expectations regarding upcoming content in discourse representations that span multiple sentences. Three experiments show that comprehenders anticipate more than just On the other hand; rather, they keep track of embedded constituents and establish non-local dependencies. Our results show that comprehenders disprefer a subsequent contrast marked with On the other hand when a passage has already provided intervening content that establishes an appropriate contrast with On the one hand. Furthermore, comprehenders maintain their expectation for an upcoming contrast across intervening material, even if the embedded constituent itself contains contrast. The results are taken to support expectation-driven models of processing in which comprehenders posit and maintain structural representations of discourse structure.},
pubstate = {published},
type = {article}
}

Project:   B2

Scholman, Merel; Demberg, Vera

Examples and specifications that prove a point: Distinguishing between elaborative and argumentative discourse relations Journal Article

Dialogue and Discourse, 8, pp. 53-86, 2017.

Examples and specifications occur frequently in text, but not much is known about how readers interpret them. Looking at how they are annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used both to illustrate/specify a situation and to serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.

@article{Scholman2017,
title = {Examples and specifications that prove a point: Distinguishing between elaborative and argumentative discourse relations},
author = {Merel Scholman and Vera Demberg},
url = {https://www.researchgate.net/publication/318569668_Examples_and_Specifications_that_Prove_a_Point_Identifying_Elaborative_and_Argumentative_Discourse_Relations},
year = {2017},
date = {2017},
journal = {Dialogue and Discourse},
pages = {53-86},
volume = {8},
number = {2},
abstract = {Examples and specifications occur frequently in text, but not much is known about how readers interpret them. Looking at how they are annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used both to illustrate/specify a situation and to serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.},
pubstate = {published},
type = {article}
}

Project:   B2

Scholman, Merel; Demberg, Vera

Crowdsourcing discourse interpretations: On the influence of context and the reliability of a connective insertion task Inproceedings

Proceedings of the 11th Linguistic Annotation Workshop, Association for Computational Linguistics, pp. 24-33, Valencia, Spain, 2017.

Traditional discourse annotation tasks are considered costly and time-consuming, and the reliability and validity of these tasks is in question. In this paper, we investigate whether crowdsourcing can be used to obtain reliable discourse relation annotations. We also examine the influence of context on the reliability of the data. The results of a crowdsourced connective insertion task showed that the method can be used to obtain reliable annotations: The majority of the inserted connectives converged with the original label. Further, the method is sensitive to the fact that multiple senses can often be inferred for a single relation. Regarding the presence of context, the results show no significant difference in distributions of insertions between conditions overall. However, a by-item comparison revealed several characteristics of segments that determine whether the presence of context makes a difference in annotations. The findings discussed in this paper can be taken as evidence that crowdsourcing can be used as a valuable method to obtain insights into the sense(s) of relations.

@inproceedings{Scholman2017b,
title = {Crowdsourcing discourse interpretations: On the influence of context and the reliability of a connective insertion task},
author = {Merel Scholman and Vera Demberg},
url = {https://aclanthology.org/W17-0803},
doi = {https://doi.org/10.18653/v1/W17-0803},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 11th Linguistic Annotation Workshop},
pages = {24-33},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {Traditional discourse annotation tasks are considered costly and time-consuming, and the reliability and validity of these tasks is in question. In this paper, we investigate whether crowdsourcing can be used to obtain reliable discourse relation annotations. We also examine the influence of context on the reliability of the data. The results of a crowdsourced connective insertion task showed that the method can be used to obtain reliable annotations: The majority of the inserted connectives converged with the original label. Further, the method is sensitive to the fact that multiple senses can often be inferred for a single relation. Regarding the presence of context, the results show no significant difference in distributions of insertions between conditions overall. However, a by-item comparison revealed several characteristics of segments that determine whether the presence of context makes a difference in annotations. The findings discussed in this paper can be taken as evidence that crowdsourcing can be used as a valuable method to obtain insights into the sense(s) of relations.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2

Rutherford, Attapol; Demberg, Vera; Xue, Nianwen

A systematic study of neural discourse models for implicit discourse relation Inproceedings

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Association for Computational Linguistics, pp. 281-291, Valencia, Spain, 2017.

Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Many neural network models have been proposed to tackle this problem. However, the comparison for this task is not unified, so we could hardly draw clear conclusions about the effectiveness of various architectures. Here, we propose neural network models based on feedforward and long short-term memory architectures and systematically study the effects of varying structures. To our surprise, the best-configured feedforward architecture outperforms LSTM-based models in most cases despite thorough tuning. Further, we compare our best feedforward system with competitive convolutional and recurrent networks and find that feedforward can actually be more effective. For the first time for this task, we compile and publish outputs from previous neural and non-neural systems to establish the standard for further comparison.

@inproceedings{Rutherford2017,
title = {A systematic study of neural discourse models for implicit discourse relation},
author = {Attapol Rutherford and Vera Demberg and Nianwen Xue},
url = {https://aclanthology.org/E17-1027},
year = {2017},
date = {2017},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
pages = {281-291},
publisher = {Association for Computational Linguistics},
address = {Valencia, Spain},
abstract = {Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Many neural network models have been proposed to tackle this problem. However, the comparison for this task is not unified, so we could hardly draw clear conclusions about the effectiveness of various architectures. Here, we propose neural network models based on feedforward and long short-term memory architectures and systematically study the effects of varying structures. To our surprise, the best-configured feedforward architecture outperforms LSTM-based models in most cases despite thorough tuning. Further, we compare our best feedforward system with competitive convolutional and recurrent networks and find that feedforward can actually be more effective. For the first time for this task, we compile and publish outputs from previous neural and non-neural systems to establish the standard for further comparison.},
pubstate = {published},
type = {inproceedings}
}

Project:   B2
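
A bare-bones member of the feedforward family compared in the paper: mean-pool each argument's word embeddings, concatenate the two summaries, and classify with an MLP. All sizes are arbitrary assumptions; the paper's tuned configurations differ.

import torch
import torch.nn as nn

class FFDiscourseModel(nn.Module):
    def __init__(self, vocab=10000, emb=50, hid=100, n_senses=11):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.mlp = nn.Sequential(nn.Linear(2 * emb, hid), nn.ReLU(),
                                 nn.Linear(hid, n_senses))

    def forward(self, arg1, arg2):            # (batch, len) token ids
        a1 = self.embed(arg1).mean(dim=1)     # pool over Arg1 words
        a2 = self.embed(arg2).mean(dim=1)     # pool over Arg2 words
        return self.mlp(torch.cat([a1, a2], dim=-1))

m = FFDiscourseModel()
logits = m(torch.randint(0, 10000, (4, 12)), torch.randint(0, 10000, (4, 9)))
print(logits.shape)  # torch.Size([4, 11])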

Evers-Vermeul, Jacqueline; Hoek, Jet; Scholman, Merel

On temporality in discourse annotation: Theoretical and practical considerations Journal Article

Dialogue and Discourse, 8, pp. 1-20, 2017.

Temporal information is one of the prominent features that determine coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we will argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach to Coherence Relations (CCR) to be considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We will present merits and drawbacks of different options for reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.

@article{Vermeul2017,
title = {On temporality in discourse annotation: Theoretical and practical considerations},
author = {Jacqueline Evers-Vermeul and Jet Hoek and Merel Scholman},
url = {https://journals.uic.edu/ojs/index.php/dad/article/view/10777},
doi = {https://doi.org/10.5087/dad.2017.201},
year = {2017},
date = {2017},
journal = {Dialogue and Discourse},
pages = {1-20},
volume = {8},
number = {2},
abstract = {Temporal information is one of the prominent features that determine coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we will argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach to Coherence Relations (CCR) to be considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We will present merits and drawbacks of different options for reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.},
pubstate = {published},
type = {article}
}

Project:   B2

Degaetano-Ortlieb, Stefania

Variation in language use across social variables: a data-driven approach Inproceedings

Proceedings of the Corpus and Language Variation in English Research Conference (CLAVIER), Bari, Italy, 2017.

We present a data-driven approach to study language use over time according to social variables (henceforth SVs), also considering interactions between different variables. Besides sociolinguistic studies on language variation according to SVs (e.g., Weinreich et al. 1968, Bernstein 1971, Eckert 1989, Milroy and Milroy 1985), computational approaches have recently gained prominence (see, e.g., Eisenstein 2015, Danescu-Niculescu-Mizil et al. 2013, and Nguyen et al. 2017 for an overview), not least due to an increase in data availability from social media and an increasing awareness in the NLP community of the importance of linguistic variation according to SVs.

@inproceedings{Degaetano-Ortlieb2017b,
title = {Variation in language use across social variables: a data-driven approach},
author = {Stefania Degaetano-Ortlieb},
url = {https://stefaniadegaetano.files.wordpress.com/2017/07/clavier2017_slingpro_accepted.pdf},
year = {2017},
date = {2017},
booktitle = {Proceedings of the Corpus and Language Variation in English Research Conference (CLAVIER)},
address = {Bari, Italy},
abstract = {We present a data-driven approach to study language use over time according to social variables (henceforth SVs), also considering interactions between different variables. Besides sociolinguistic studies on language variation according to SVs (e.g., Weinreich et al. 1968, Bernstein 1971, Eckert 1989, Milroy and Milroy 1985), computational approaches have recently gained prominence (see, e.g., Eisenstein 2015, Danescu-Niculescu-Mizil et al. 2013, and Nguyen et al. 2017 for an overview), not least due to an increase in data availability from social media and an increasing awareness in the NLP community of the importance of linguistic variation according to SVs.},
pubstate = {published},
type = {inproceedings}
}

Project:   B1
