Publications - SFB 1102

Modi, Ashutosh; Anikina, Tatjana; Ostermann, Simon; Pinkal, Manfred

InScript: Narrative texts annotated with script information Inproceedings

Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Ed.): Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), pp. 3485-3493, Portorož, Slovenia, 2016, ISBN 978-2-9517408-9-1.

This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.

https://aclanthology.org/L16-1555

@inproceedings{MODI16.352,
title = {InScript: Narrative texts annotated with script information},
author = {Ashutosh Modi and Tatjana Anikina and Simon Ostermann and Manfred Pinkal},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
url = {https://aclanthology.org/L16-1555},
year = {2016},
date = {2016-10-17},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
isbn = {978-2-9517408-9-1},
pages = {3485--3493},
publisher = {European Language Resources Association (ELRA)},
address = {Portoro{\v{z}}, Slovenia},
abstract = {This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.},
pubstate = {published},
type = {inproceedings}
}

Project: A3

Venhuizen, Noortje; Brouwer, Harm; Crocker, Matthew W.

When the food arrives before the menu: Modeling event-driven surprisal in language comprehension Inproceedings

29th CUNY Conference on Human Sentence Processing, Events in Language and Cognition workshops, University of Florida, 2016.

We present a neurocomputational—recurrent artificial neural network—model of language processing that integrates linguistic knowledge and world/event knowledge, and that produces word surprisal estimates that take into account both. Our model constructs a cognitively motivated situation model of the state-of-the-affairs as described by a sentence. Critically, these situation model representations inherently encode world/event knowledge. We show that the surprisal estimates that our model produces reflect both linguistic surprisal as well as surprisal that is driven by knowledge about structured events. We outline how we can employ the model to explore the interaction between these types of knowledge in online language processing.

https://www.researchgate.net/publication/321621784_When_the_food_arrives_before_the_menu_Modeling_event-driven_surprisal_in_language_comprehension

@inproceedings{Venhuizen2016,
title = {When the food arrives before the menu: Modeling event-driven surprisal in language comprehension},
author = {Noortje Venhuizen and Harm Brouwer and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/321621784_When_the_food_arrives_before_the_menu_Modeling_event-driven_surprisal_in_language_comprehension},
year = {2016},
date = {2016},
booktitle = {29th CUNY Conference on Human Sentence Processing},
publisher = {Events in Language and Cognition workshops},
address = {University of Florida},
abstract = {We present a neurocomputational---recurrent artificial neural network---model of language processing that integrates linguistic knowledge and world/event knowledge, and that produces word surprisal estimates that take into account both. Our model constructs a cognitively motivated situation model of the state-of-the-affairs as described by a sentence. Critically, these situation model representations inherently encode world/event knowledge. We show that the surprisal estimates that our model produces reflect both linguistic surprisal as well as surprisal that is driven by knowledge about structured events. We outline how we can employ the model to explore the interaction between these types of knowledge in online language processing.},
pubstate = {published},
type = {inproceedings}
}

Project: A1

Rabs, Elisabeth; Drenhaus, Heiner; Delogu, Francesca; Crocker, Matthew W.

Reading between the lines: The influence of script knowledge on on-line comprehension Inproceedings

29th CUNY Conference on Human Sentence Processing, Events in Language and Cognition workshops, University of Florida, 2016.

While the influence of linguistic context on language processing has been extensively studied, less is known about the mental representation, structure and use of so-called script knowledge. Scripts are defined as a person’s knowledge about temporally and causally ordered sequences of events. They are often activated by linguistic context, but otherwise left implicit. In two ERP studies we examine how such non-linguistic event knowledge influences predictive language processing beyond what linguistic prediction or lexical priming alone can explain. Specifically, we find evidence for a decrease in N400 amplitude – known to reflect a word’s unexpectedness – for target nouns consistent with events that are expected according to script knowledge. Experiment 1 focuses on differentiating the relative contribution of lexical priming and script knowledge. Assuming the temporal structure of scripts is accessible and used for prediction, but does not alter any influence of priming, we inserted temporal shifts affecting the plausibility of the critical object. Results from Exp. 1 suggest that, even after a large temporal shift, a script-fitting object noun is still easier to process than a neutral one. One reason for this may be that the temporal shift used in Exp. 1 was not salient enough to completely deactivate a script. Experiment 2, for which data is currently being collected, explores how script knowledge is used when context provides two scripts. One script is active, and thus expected to influence processing of target nouns to a greater extent. By demonstrating that minimal linguistic material is sufficient to rapidly activate detailed script knowledge and make it accessible for language processing, we conclude that scripts provide an interesting method to investigate the interaction of non-linguistic knowledge in on-line comprehension. 
Specifically, drawing on aspects of their temporal and hierarchical structure we hope to further explore the role of implicit causal, temporal, and spatial relations in language comprehension.

https://www.researchgate.net/publication/320988696_Reading_Between_the_Lines_The_Influence_of_Script_Knowledge_on_On-Line_Comprehension

@inproceedings{Rabs2016,
title = {Reading between the lines: The influence of script knowledge on on-line comprehension},
author = {Elisabeth Rabs and Heiner Drenhaus and Francesca Delogu and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/320988696_Reading_Between_the_Lines_The_Influence_of_Script_Knowledge_on_On-Line_Comprehension},
year = {2016},
date = {2016},
booktitle = {29th CUNY Conference on Human Sentence Processing},
publisher = {Events in Language and Cognition workshops},
address = {University of Florida},
abstract = {While the influence of linguistic context on language processing has been extensively studied, less is known about the mental representation, structure and use of so-called script knowledge. Scripts are defined as a person’s knowledge about temporally and causally ordered sequences of events. They are often activated by linguistic context, but otherwise left implicit. In two ERP studies we examine how such non-linguistic event knowledge influences predictive language processing beyond what linguistic prediction or lexical priming alone can explain. Specifically, we find evidence for a decrease in N400 amplitude -- known to reflect a word’s unexpectedness -- for target nouns consistent with events that are expected according to script knowledge. Experiment 1 focuses on differentiating the relative contribution of lexical priming and script knowledge. Assuming the temporal structure of scripts is accessible and used for prediction, but does not alter any influence of priming, we inserted temporal shifts affecting the plausibility of the critical object. Results from Exp. 1 suggest that, even after a large temporal shift, a script-fitting object noun is still easier to process than a neutral one. One reason for this may be that the temporal shift used in Exp. 1 was not salient enough to completely deactivate a script. Experiment 2, for which data is currently being collected, explores how script knowledge is used when context provides two scripts. One script is active, and thus expected to influence processing of target nouns to a greater extent. By demonstrating that minimal linguistic material is sufficient to rapidly activate detailed script knowledge and make it accessible for language processing, we conclude that scripts provide an interesting method to investigate the interaction of non-linguistic knowledge in on-line comprehension. Specifically, drawing on aspects of their temporal and hierarchical structure we hope to further explore the role of implicit causal, temporal, and spatial relations in language comprehension.},
pubstate = {published},
type = {inproceedings}
}

Project: A1

Le Maguer, Sébastien; Steiner, Ingmar

The MaryTTS entry for the Blizzard Challenge 2016 Inproceedings

Blizzard Challenge, Cupertino, CA, USA, 2016.

The MaryTTS system is a modular architecture text-to-speech (TTS) system whose development started around 15 years ago. This paper presents the MaryTTS entry for the Blizzard Challenge 2016. For this entry, we used the default configuration of MaryTTS based on the unit selection paradigm.

However, the architecture is currently undergoing a massive refactoring process in order to provide a more fully modular system. This will allow researchers to focus only on some part of the synthesis process. The current participation objective includes assessing the current baseline quality in order to evaluate any future improvements. These can be achieved more easily thanks to a more flexible and robust architecture. The results obtained in this challenge prove that our system is not obsolete, but improvements need to be made to maintain it in the state of the art in the future.

@inproceedings{LeMaguer2016BC,
title = {The MaryTTS entry for the Blizzard Challenge 2016},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner},
url = {https://www.semanticscholar.org/paper/The-MaryTTS-entry-for-the-Blizzard-Challenge-2016-Maguer-Steiner/62e04ad78ba1a531e419bea25cb9eb8799aaf07e},
year = {2016},
date = {2016-09-16},
booktitle = {Blizzard Challenge},
address = {Cupertino, CA, USA},
abstract = {The MaryTTS system is a modular architecture text-to-speech (TTS) system whose development started around 15 years ago. This paper presents the MaryTTS entry for the Blizzard Challenge 2016. For this entry, we used the default configuration of MaryTTS based on the unit selection paradigm. However, the architecture is currently undergoing a massive refactoring process in order to provide a more fully modular system. This will allow researchers to focus only on some part of the synthesis process. The current participation objective includes assessing the current baseline quality in order to evaluate any future improvements. These can be achieved more easily thanks to a more flexible and robust architecture. The results obtained in this challenge prove that our system is not obsolete, but improvements need to be made to maintain it in the state of the art in the future.},
pubstate = {published},
type = {inproceedings}
}

Project: C5

Oualil, Youssef; Singh, Mittul; Greenberg, Clayton; Klakow, Dietrich

Long-short range context neural networks for language models Inproceedings

EMNLP 2016, 2016.

The goal of language modeling techniques is to capture the statistical and structural properties of natural languages from training corpora. This task typically involves the learning of short range dependencies, which generally model the syntactic properties of a language and/or long range dependencies, which are semantic in nature. We propose in this paper a new multi-span architecture, which separately models the short and long context information while it dynamically merges them to perform the language modeling task. This is done through a novel recurrent Long-Short Range Context (LSRC) network, which explicitly models the local (short) and global (long) context using two separate hidden states that evolve in time. This new architecture is an adaptation of the Long-Short Term Memory network (LSTM) to take into account the linguistic properties. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art language modeling techniques.

https://aclanthology.org/D16-1154/

@inproceedings{Oualil2016,
title = {Long-short range context neural networks for language models},
author = {Youssef Oualil and Mittul Singh and Clayton Greenberg and Dietrich Klakow},
url = {https://aclanthology.org/D16-1154/},
year = {2016},
date = {2016},
publisher = {EMNLP 2016},
abstract = {The goal of language modeling techniques is to capture the statistical and structural properties of natural languages from training corpora. This task typically involves the learning of short range dependencies, which generally model the syntactic properties of a language and/or long range dependencies, which are semantic in nature. We propose in this paper a new multi-span architecture, which separately models the short and long context information while it dynamically merges them to perform the language modeling task. This is done through a novel recurrent Long-Short Range Context (LSRC) network, which explicitly models the local (short) and global (long) context using two separate hidden states that evolve in time. This new architecture is an adaptation of the Long-Short Term Memory network (LSTM) to take into account the linguistic properties. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art language modeling techniques.},
pubstate = {published},
type = {inproceedings}
}

Project: B4

Schneegass, Stefan; Oualil, Youssef; Bulling, Andreas

SkullConduct: Biometric User Identification on Eyewear Computers Using Bone Conduction Through the Skull Inproceedings

Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, ACM, pp. 1379-1384, New York, NY, USA, 2016, ISBN 978-1-4503-3362-7.

Secure user identification is important for the increasing number of eyewear computers but limited input capabilities pose significant usability challenges for established knowledge-based schemes, such as passwords or PINs. We present SkullConduct, a biometric system that uses bone conduction of sound through the user’s skull as well as a microphone readily integrated into many of these devices, such as Google Glass. At the core of SkullConduct is a method to analyze the characteristic frequency response created by the user’s skull using a combination of Mel Frequency Cepstral Coefficient (MFCC) features as well as a computationally light-weight 1NN classifier. We report on a controlled experiment with 10 participants that shows that this frequency response is person-specific and stable — even when taking off and putting on the device multiple times — and thus serves as a robust biometric. We show that our method can identify users with 97.0% accuracy and authenticate them with an equal error rate of 6.9%, thereby bringing biometric user identification to eyewear computers equipped with bone conduction technology.

@inproceedings{Schneegass:2016:SBU:2858036.2858152,
title = {SkullConduct: Biometric User Identification on Eyewear Computers Using Bone Conduction Through the Skull},
author = {Stefan Schneegass and Youssef Oualil and Andreas Bulling},
url = {http://doi.acm.org/10.1145/2858036.2858152},
doi = {10.1145/2858036.2858152},
year = {2016},
date = {2016},
booktitle = {Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems},
isbn = {978-1-4503-3362-7},
pages = {1379--1384},
publisher = {ACM},
address = {New York, NY, USA},
abstract = {Secure user identification is important for the increasing number of eyewear computers but limited input capabilities pose significant usability challenges for established knowledge-based schemes, such as passwords or PINs. We present SkullConduct, a biometric system that uses bone conduction of sound through the user's skull as well as a microphone readily integrated into many of these devices, such as Google Glass. At the core of SkullConduct is a method to analyze the characteristic frequency response created by the user's skull using a combination of Mel Frequency Cepstral Coefficient (MFCC) features as well as a computationally light-weight 1NN classifier. We report on a controlled experiment with 10 participants that shows that this frequency response is person-specific and stable -- even when taking off and putting on the device multiple times -- and thus serves as a robust biometric. We show that our method can identify users with 97.0% accuracy and authenticate them with an equal error rate of 6.9%, thereby bringing biometric user identification to eyewear computers equipped with bone conduction technology.},
pubstate = {published},
type = {inproceedings}
}

Project: B4

Findings of the 2016 Conference on Machine Translation Inproceedings

Proceedings of the First Conference on Machine Translation, Association for Computational Linguistics, pp. 131-198, Berlin, Germany, 2016.

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions systems from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.

http://www.aclweb.org/anthology/W/W16/W16-2301

@inproceedings{bojar-EtAl:2016:WMT1,
title = {Findings of the 2016 Conference on Machine Translation},
author = {Ond{\v{r}}ej Bojar and Rajen Chatterjee and Christian Federmann and Yvette Graham and Barry Haddow and Matthias Huck and Antonio Jimeno Yepes and Philipp Koehn and Varvara Logacheva and Christof Monz and Matteo Negri and Aurelie Neveol and Mariana Neves and Martin Popel and Matt Post and Raphael Rubino and Carolina Scarton and Lucia Specia and Marco Turchi and Karin Verspoor and Marcos Zampieri},
url = {http://www.aclweb.org/anthology/W/W16/W16-2301},
year = {2016},
date = {2016-08-01},
booktitle = {Proceedings of the First Conference on Machine Translation},
pages = {131--198},
publisher = {Association for Computational Linguistics},
address = {Berlin, Germany},
abstract = {This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions systems from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.},
pubstate = {published},
type = {inproceedings}
}

Project: B6

Varjokallio, Matti; Klakow, Dietrich

Unsupervised morph segmentation and statistical language models for vocabulary expansion Inproceedings

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, pp. 175-180, Berlin, Germany, 2016.

This work explores the use of unsupervised morph segmentation along with statistical language models for the task of vocabulary expansion. Unsupervised vocabulary expansion has large potential for improving vocabulary coverage and performance in different natural language processing tasks, especially in less-resourced settings on morphologically rich languages. We propose a combination of unsupervised morph segmentation and statistical language models and evaluate on languages from the Babel corpus. The method is shown to perform well for all the evaluated languages when compared to the previous work on the task.

http://anthology.aclweb.org/P16-2029

@inproceedings{varjokallio-klakow:2016:P16-2,
title = {Unsupervised morph segmentation and statistical language models for vocabulary expansion},
author = {Matti Varjokallio and Dietrich Klakow},
url = {http://anthology.aclweb.org/P16-2029},
year = {2016},
date = {2016-08-01},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
pages = {175--180},
publisher = {Association for Computational Linguistics},
address = {Berlin, Germany},
abstract = {This work explores the use of unsupervised morph segmentation along with statistical language models for the task of vocabulary expansion. Unsupervised vocabulary expansion has large potential for improving vocabulary coverage and performance in different natural language processing tasks, especially in less-resourced settings on morphologically rich languages. We propose a combination of unsupervised morph segmentation and statistical language models and evaluate on languages from the Babel corpus. The method is shown to perform well for all the evaluated languages when compared to the previous work on the task.},
pubstate = {published},
type = {inproceedings}
}

Project: B4

Sayeed, Asad; Hong, Xudong; Demberg, Vera

Roleo: Visualising Thematic Fit Spaces on the Web Inproceedings

Proceedings of ACL-2016 System Demonstrations, Association for Computational Linguistics, pp. 139-144, Berlin, Germany, 2016.

In this paper, we present Roleo, a web tool for visualizing the vector spaces generated by the evaluation of distributional memory (DM) models over thematic fit judgements. A thematic fit judgement is a rating of the selectional preference of a verb for an argument that fills a given thematic role. The DM approach to thematic fit judgements involves the construction of a sub-space in which a prototypical role-filler can be built for comparison to the noun being judged. We describe a publicly-accessible web tool that allows for querying and exploring these spaces as well as a technique for visualizing thematic fit sub-spaces efficiently for web use.

@inproceedings{sayeed-hong-demberg:2016:P16-4,
title = {Roleo: Visualising Thematic Fit Spaces on the Web},
author = {Asad Sayeed and Xudong Hong and Vera Demberg},
url = {https://www.researchgate.net/publication/306093691_Roleo_Visualising_Thematic_Fit_Spaces_on_the_Web},
year = {2016},
date = {2016-08-01},
booktitle = {Proceedings of ACL-2016 System Demonstrations},
pages = {139--144},
publisher = {Association for Computational Linguistics},
address = {Berlin, Germany},
abstract = {In this paper, we present Roleo, a web tool for visualizing the vector spaces generated by the evaluation of distributional memory (DM) models over thematic fit judgements. A thematic fit judgement is a rating of the selectional preference of a verb for an argument that fills a given thematic role. The DM approach to thematic fit judgements involves the construction of a sub-space in which a prototypical role-filler can be built for comparison to the noun being judged. We describe a publicly-accessible web tool that allows for querying and exploring these spaces as well as a technique for visualizing thematic fit sub-spaces efficiently for web use.},
pubstate = {published},
type = {inproceedings}
}

Project: B2

Ahrendt, Simon; Demberg, Vera

Improving event prediction by representing script participants Inproceedings

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, pp. 546-551, San Diego, California, 2016.

Automatically learning script knowledge has proved difficult, with previous work not or just barely beating a most-frequent baseline. Script knowledge is a type of world knowledge which can however be useful for various tasks in NLP and psycholinguistic modelling. We here propose a model that includes participant information (i.e., knowledge about which participants are relevant for a script) and show, on the Dinners from Hell corpus as well as the InScript corpus, that this knowledge helps us to significantly improve prediction performance on the narrative cloze task.

http://www.aclweb.org/anthology/N16-1067

@inproceedings{ahrendt-demberg:2016:N16-1,
title = {Improving event prediction by representing script participants},
author = {Simon Ahrendt and Vera Demberg},
url = {http://www.aclweb.org/anthology/N16-1067},
year = {2016},
date = {2016-06-01},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {546--551},
publisher = {Association for Computational Linguistics},
address = {San Diego, California},
abstract = {Automatically learning script knowledge has proved difficult, with previous work not or just barely beating a most-frequent baseline. Script knowledge is a type of world knowledge which can however be useful for various tasks in NLP and psycholinguistic modelling. We here propose a model that includes participant information (i.e., knowledge about which participants are relevant for a script) and show, on the Dinners from Hell corpus as well as the InScript corpus, that this knowledge helps us to significantly improve prediction performance on the narrative cloze task.},
pubstate = {published},
type = {inproceedings}
}

Project: A4

Pusse, Florian; Sayeed, Asad; Demberg, Vera

LingoTurk: managing crowdsourced tasks for psycholinguistics Inproceedings

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, Association for Computational Linguistics, pp. 57-61, San Diego, California, 2016.

LingoTurk is an open-source, freely available crowdsourcing client/server system aimed primarily at psycholinguistic experimentation where custom and specialized user interfaces are required but not supported by popular crowdsourcing task management platforms. LingoTurk enables user-friendly local hosting of experiments as well as condition management and participant exclusion. It is compatible with Amazon Mechanical Turk and Prolific Academic. New experiments can easily be set up via the Play Framework and the LingoTurk API, while multiple experiments can be managed from a single system.

http://www.aclweb.org/anthology/N16-3012

@inproceedings{pusse-sayeed-demberg:2016:N16-3,
title = {LingoTurk: managing crowdsourced tasks for psycholinguistics},
author = {Florian Pusse and Asad Sayeed and Vera Demberg},
url = {http://www.aclweb.org/anthology/N16-3012},
year = {2016},
date = {2016-06-01},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations},
pages = {57--61},
publisher = {Association for Computational Linguistics},
address = {San Diego, California},
abstract = {LingoTurk is an open-source, freely available crowdsourcing client/server system aimed primarily at psycholinguistic experimentation where custom and specialized user interfaces are required but not supported by popular crowdsourcing task management platforms. LingoTurk enables user-friendly local hosting of experiments as well as condition management and participant exclusion. It is compatible with Amazon Mechanical Turk and Prolific Academic. New experiments can easily be set up via the Play Framework and the LingoTurk API, while multiple experiments can be managed from a single system.},
pubstate = {published},
type = {inproceedings}
}

Project: B2

Wanzare, Lilian Diana Awuor; Zarcone, Alessandra; Thater, Stefan; Pinkal, Manfred

A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge Inproceedings

Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Ed.): Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), Portorož, Slovenia, 2016, ISBN 978-2-9517408-9-1.

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.

https://aclanthology.org/L16-1556/

@inproceedings{WANZARE16.913,
title = {A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge},
author = {Lilian Diana Awuor Wanzare and Alessandra Zarcone and Stefan Thater and Manfred Pinkal},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
url = {https://aclanthology.org/L16-1556/},
year = {2016},
date = {2016},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
isbn = {978-2-9517408-9-1},
publisher = {European Language Resources Association (ELRA)},
address = {Portoro{\v{z}}, Slovenia},
abstract = {Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.},
pubstate = {published},
type = {inproceedings}
}

Project: A2

Le Maguer, Sébastien; Steiner, Ingmar; Möbius, Bernd

Toward a Speech Synthesis Guided by the Modeling of Unexpected Events Inproceedings

Schweitzer, Antje; Dogil, Grzegorz (Ed.): Workshop on Modeling Variability in Speech, Stuttgart, Germany, 2015.

https://www.bibsonomy.org/bibtex/217fb65d2ef291a8a10df15db8a8cf5c7/sfb1102

@inproceedings{LeMaguer2015Variability,
title = {Toward a Speech Synthesis Guided by the Modeling of Unexpected Events},
author = {S{\'e}bastien Le Maguer and Ingmar Steiner and Bernd M{\"o}bius},
editor = {Antje Schweitzer and Grzegorz Dogil},
url = {https://www.bibsonomy.org/bibtex/217fb65d2ef291a8a10df15db8a8cf5c7/sfb1102},
year = {2015},
date = {2015},
booktitle = {Workshop on Modeling Variability in Speech},
address = {Stuttgart, Germany},
pubstate = {published},
type = {inproceedings}
}

Project: C5

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

Models for Mutual Intelligibility Inproceedings

Data Mining and its Use and Usability for Linguistic Analysis, Universität des Saarlandes, Saarbrücken, Germany, 2015.

sfb-b1-coll-C4-March2015 (0.51MB)

@inproceedings{andrea2015models,
title = {Models for Mutual Intelligibility},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
year = {2015},
date = {2015},
booktitle = {Data Mining and its Use and Usability for Linguistic Analysis},
publisher = {Universit{\"a}t des Saarlandes},
address = {Saarbr{\"u}cken, Germany},
pubstate = {published},
type = {inproceedings}
}

Project: C4

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets Inproceedings

Sharp, Bernadette; Lubaszewski, Wiesław; Delmonte, Rodolfo (Ed.): Natural Language Processing and Cognitive Science, Ca Foscarina Editrice, Venezia, pp. 115-126, 2015.

This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.

https://www.bibsonomy.org/bibtex/231c7c8a9b94a872a7396d5b1a1ef7962/sfb1102

@inproceedings{klara2015orthography,
title = {An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
editor = {Bernadette Sharp and Wies{\l}aw Lubaszewski and Rodolfo Delmonte},
url = {https://www.bibsonomy.org/bibtex/231c7c8a9b94a872a7396d5b1a1ef7962/sfb1102},
year = {2015},
date = {2015},
booktitle = {Natural Language Processing and Cognitive Science},
pages = {115-126},
publisher = {Ca Foscarina Editrice, Venezia},
abstract = {This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.},
pubstate = {published},
type = {inproceedings}
}

Project: C4

Avgustinova, Tania; Fischer, Andrea; Jágrová, Klára; Stenger, Irina

The Empirical Basis of Slavic Intercomprehension Inproceedings

REMU, Joensuu, Finland, 2015.

The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).

@inproceedings{tania2015empirical,
title = {The Empirical Basis of Slavic Intercomprehension},
author = {Tania Avgustinova and Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger},
url = {https://www.bibsonomy.org/bibtex/187b1c53b1bad76027e0a305d2a6e2cce/sfb1102},
year = {2015},
date = {2015},
booktitle = {REMU},
address = {Joensuu, Finland},
abstract = {The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).},
pubstate = {published},
type = {inproceedings}
}

Project: C4

Fischer, Andrea; Demberg, Vera; Klakow, Dietrich

Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches Inproceedings

Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), Association for Computational Linguistics, pp. 105-108, Brighton, England, UK, 2015.

As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.

https://www.aclweb.org/anthology/W15-4718/

@inproceedings{fischer:demberg:klakow,
title = {Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches},
author = {Andrea Fischer and Vera Demberg and Dietrich Klakow},
url = {https://www.aclweb.org/anthology/W15-4718/},
year = {2015},
date = {2015},
booktitle = {Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)},
pages = {105-108},
publisher = {Association for Computational Linguistics},
address = {Brighton, England, UK},
abstract = {As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.},
pubstate = {published},
type = {inproceedings}
}

Projects: A4 C4

Tourtouri, Elli; Delogu, Francesca; Crocker, Matthew W.

ERP Indices of situated reference in visual contexts Journal Article

37th Annual Conference of the Cognitive Science Society, Austin, Texas, USA, 2015.

Violations of the maxims of Quantity occur when utterances provide more (over-specified) or less (under-specified) information than strictly required for referent identification. While behavioural data suggest that under-specified expressions lead to comprehension difficulty and communicative failure, there is no consensus as to whether over-specified expressions are also detrimental to comprehension. In this study we shed light on this debate, providing neurophysiological evidence supporting the view that extra information facilitates comprehension. We further present novel evidence that referential failure due to underspecification is qualitatively different from explicit cases of referential failure, when no matching referential candidate is available in the context.

https://www.researchgate.net/publication/312296322_ERP_indices_of_situated_reference_in_visual_contexts

@article{Tourtouri2015,
title = {ERP Indices of situated reference in visual contexts},
author = {Elli Tourtouri and Francesca Delogu and Matthew W. Crocker},
url = {https://www.researchgate.net/publication/312296322_ERP_indices_of_situated_reference_in_visual_contexts},
year = {2015},
date = {2015},
publisher = {37th Annual Conference of the Cognitive Science Society},
address = {Austin, Texas, USA},
abstract = {Violations of the maxims of Quantity occur when utterances provide more (over-specified) or less (under-specified) information than strictly required for referent identification. While behavioural data suggest that under-specified expressions lead to comprehension difficulty and communicative failure, there is no consensus as to whether over-specified expressions are also detrimental to comprehension. In this study we shed light on this debate, providing neurophysiological evidence supporting the view that extra information facilitates comprehension. We further present novel evidence that referential failure due to underspecification is qualitatively different from explicit cases of referential failure, when no matching referential candidate is available in the context.},
pubstate = {published},
type = {article}
}

Project: C3

Schulz, Erika; Malisz, Zofia; Andreeva, Bistra; Möbius, Bernd

Einfluss von Informationsdichte und prosodischer Struktur auf Vokalraumausdehnung Inproceedings

Phonetik und Phonologie 11, Marburg, 2015.

Vowel space expansion is determined by several factors, e.g. gender (Simpson and Ericsdotter 2007), speaking style (Bradlow, Kraus and Hayes 2003), prosody (Bergem 1993), speech rate (Weirich and Simpson 2014), or phonological neighbourhood density (Munson and Solomon 2004). Language redundancy can also serve as a predictor of the spectral characteristics of vowels (Aylett and Turk 2006). This study investigates the influence of information density and prosodic structure on vowel space expansion in French, German, American English, and Finnish.

https://www.online.uni-marburg.de/pundp11/talks/Schulz_etal.pdf

@inproceedings{pundp11,
title = {Einfluss von Informationsdichte und prosodischer Struktur auf Vokalraumausdehnung},
author = {Erika Schulz and Zofia Malisz and Bistra Andreeva and Bernd M{\"o}bius},
url = {https://www.online.uni-marburg.de/pundp11/talks/Schulz_etal.pdf},
year = {2015},
date = {2015},
booktitle = {Phonetik und Phonologie 11},
address = {Marburg},
abstract = {Vowel space expansion is determined by several factors, e.g. gender (Simpson and Ericsdotter 2007), speaking style (Bradlow, Kraus and Hayes 2003), prosody (Bergem 1993), speech rate (Weirich and Simpson 2014), or phonological neighbourhood density (Munson and Solomon 2004). Language redundancy can also serve as a predictor of the spectral characteristics of vowels (Aylett and Turk 2006). This study investigates the influence of information density and prosodic structure on vowel space expansion in French, German, American English, and Finnish.},
pubstate = {published},
type = {inproceedings}
}

Project: C1

Oualil, Youssef; Schulder, Marc; Helmke, Hartmut; Schmidt, Anna; Klakow, Dietrich

Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition Inproceedings

INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, 2015.

The use of prior situational/contextual knowledge about a given task can significantly improve automatic speech recognition (ASR) performance. This is typically done through adaptation of acoustic or language models if data is available or using knowledge-based rescoring. The main adaptation techniques, however, are either domain-specific, which makes them inadequate for other tasks, or static and offline, and therefore cannot deal with dynamic knowledge. To circumvent this problem, we propose a real-time system which dynamically integrates situational context into ASR. The context integration is done either post-recognition, in which case a weighted Levenshtein distance between the ASR hypotheses and the context information based on the ASR confidence scores is proposed to extract the most likely sequence of spoken words, or pre-recognition, where the search space is adjusted to the new situational knowledge through adaptation of the finite state machine modeling the spoken language. Experiments conducted on 3 hours of Air Traffic Control (ATC) data achieved a 51% reduction of the Command Error Rate (CmdER) which is used as evaluation metric in the ATC domain.

https://core.ac.uk/display/31018097

@inproceedings{youalil_interspeech_2015,
title = {Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition},
author = {Youssef Oualil and Marc Schulder and Hartmut Helmke and Anna Schmidt and Dietrich Klakow},
url = {https://core.ac.uk/display/31018097},
year = {2015},
date = {2015},
booktitle = {INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany},
abstract = {The use of prior situational/contextual knowledge about a given task can significantly improve automatic speech recognition (ASR) performance. This is typically done through adaptation of acoustic or language models if data is available or using knowledge-based rescoring. The main adaptation techniques, however, are either domain-specific, which makes them inadequate for other tasks, or static and offline, and therefore cannot deal with dynamic knowledge. To circumvent this problem, we propose a real-time system which dynamically integrates situational context into ASR. The context integration is done either post-recognition, in which case a weighted Levenshtein distance between the ASR hypotheses and the context information based on the ASR confidence scores is proposed to extract the most likely sequence of spoken words, or pre-recognition, where the search space is adjusted to the new situational knowledge through adaptation of the finite state machine modeling the spoken language. Experiments conducted on 3 hours of Air Traffic Control (ATC) data achieved a 51% reduction of the Command Error Rate (CmdER) which is used as evaluation metric in the ATC domain.},
pubstate = {published},
type = {inproceedings}
}

Project: B4
