Publications

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

Models for Mutual Intelligibility Inproceedings

Data Mining and its Use and Usability for Linguistic Analysis, Universität des Saarlandes, Saarbrücken, Germany, 2015.

@inproceedings{andrea2015models,
title = {Models for Mutual Intelligibility},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
year = {2015},
date = {2015},
booktitle = {Data Mining and its Use and Usability for Linguistic Analysis},
publisher = {Universit{\"a}t des Saarlandes},
address = {Saarbr{\"u}cken, Germany},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets Inproceedings

Sharp, Bernadette; Lubaszewski, Wiesław; Delmonte, Rodolfo (Ed.): Natural Language Processing and Cognitive Science, Ca Foscarina Editrice, Venezia, pp. 115-126, 2015.

This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.

@inproceedings{klara2015orthography,
title = {An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
editor = {Bernadette Sharp and Wiesław Lubaszewski and Rodolfo Delmonte},
url = {https://www.bibsonomy.org/bibtex/231c7c8a9b94a872a7396d5b1a1ef7962/sfb1102},
year = {2015},
date = {2015},
booktitle = {Natural Language Processing and Cognitive Science},
pages = {115--126},
publisher = {Ca Foscarina Editrice, Venezia},
abstract = {This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Avgustinova, Tania; Fischer, Andrea; Jágrová, Klára; Stenger, Irina

The Empirical Basis of Slavic Intercomprehension Inproceedings

REMU, Joensuu, Finland, 2015.

The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).

@inproceedings{tania2015empirical,
title = {The Empirical Basis of Slavic Intercomprehension},
author = {Tania Avgustinova and Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger},
url = {https://www.bibsonomy.org/bibtex/187b1c53b1bad76027e0a305d2a6e2cce/sfb1102},
year = {2015},
date = {2015},
booktitle = {REMU},
address = {Joensuu, Finland},
abstract = {The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Fischer, Andrea; Demberg, Vera; Klakow, Dietrich

Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches Inproceedings

Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), Association for Computational Linguistics, pp. 105-108, Brighton, England, UK, 2015.

As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.

@inproceedings{fischer:demberg:klakow,
title = {Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches},
author = {Andrea Fischer and Vera Demberg and Dietrich Klakow},
url = {https://www.aclweb.org/anthology/W15-4718/},
year = {2015},
date = {2015},
booktitle = {Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)},
pages = {105--108},
publisher = {Association for Computational Linguistics},
address = {Brighton, England, UK},
abstract = {As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A4 C4

Klakow, Dietrich; Avgustinova, Tania; Stenger, Irina; Fischer, Andrea; Jágrová, Klára

The INCOMSLAV Project Inproceedings

Seminar in formal linguistics at ÚFAL, Charles University, Prague, 2014.

The human language processing mechanism shows a remarkable robustness with different kinds of imperfect linguistic signal. The INCOMSLAV project aims at gaining insights about human retrieval of information in the mode of intercomprehension, i.e. from texts in genetically related languages not acquired through language learning. Furthermore it adds to this synchronic approach a diachronic perspective which provides the vital common denominator in establishing the extent of linguistic proximity. The languages to be analysed are chosen from the group of Slavic languages (CZ, PL, RU, BG). Whereas the possibility of intercomprehension between related languages is a generally accepted fact and the ways it functions have been studied for certain language groups, such analyses have not yet been undertaken from a systematic point of view focusing on information en- and decoding at different linguistic levels. The research programme will bring together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages and will compare them with insights of comparative historical linguistics on the relationship between Slavic languages. The results should add a cross-linguistic perspective to the question of how language users master high degrees of surprisal (due to partial incomprehensibility) and extract information from “noisy” code.

@inproceedings{dietrich2014incomslav,
title = {The INCOMSLAV Project},
author = {Dietrich Klakow and Tania Avgustinova and Irina Stenger and Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a}},
url = {https://ufal.mff.cuni.cz/events/incomslav-project},
year = {2014},
date = {2014},
booktitle = {Seminar in formal linguistics at ÚFAL},
publisher = {Charles University},
address = {Prague},
abstract = {The human language processing mechanism shows a remarkable robustness with different kinds of imperfect linguistic signal. The INCOMSLAV project aims at gaining insights about human retrieval of information in the mode of intercomprehension, i.e. from texts in genetically related languages not acquired through language learning. Furthermore it adds to this synchronic approach a diachronic perspective which provides the vital common denominator in establishing the extent of linguistic proximity. The languages to be analysed are chosen from the group of Slavic languages (CZ, PL, RU, BG). Whereas the possibility of intercomprehension between related languages is a generally accepted fact and the ways it functions have been studied for certain language groups, such analyses have not yet been undertaken from a systematic point of view focusing on information en- and decoding at different linguistic levels. The research programme will bring together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages and will compare them with insights of comparative historical linguistics on the relationship between Slavic languages. The results should add a cross-linguistic perspective to the question of how language users master high degrees of surprisal (due to partial incomprehensibility) and extract information from “noisy” code.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4
