Publications

Mosbach, Marius; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich

incom.py - A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages Inproceedings

Angelova, Galia; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (Ed.): Proceedings of Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019, pp. 811-819, Varna, Bulgaria, 2019.

Languages may be differently distant from each other and their mutual intelligibility may be asymmetric. In this paper we introduce incom.py, a toolbox for calculating linguistic distances and asymmetries between related languages. incom.py allows linguist experts to quickly and easily perform statistical analyses and compare those with experimental results. We demonstrate the efficacy of incom.py in an intercomprehension experiment on two Slavic languages: Bulgarian and Russian. Using incom.py we were able to validate three methods to measure linguistic distances and asymmetries: Levenshtein distance, word adaptation surprisal, and conditional entropy as predictors of success in a reading intercomprehension experiment.
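
As an illustration of the first of these measures, the sketch below computes a length-normalized Levenshtein distance for a single Bulgarian-Russian cognate pair. It is a minimal, self-contained example, not the incom.py API; the cognate pair and the normalization by the longer word are assumptions chosen for illustration.

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insertions, deletions and
    # substitutions all cost 1), computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_levenshtein(a: str, b: str) -> float:
    # Normalize by the longer word so distances fall into [0, 1].
    return levenshtein(a, b) / max(len(a), len(b))

# Hypothetical Bulgarian-Russian cognate pair ('bread'):
print(normalized_levenshtein("хляб", "хлеб"))  # 0.25 (one substitution over four characters)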

@inproceedings{Mosbach2019,
title = {incom.py - A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages},
author = {Marius Mosbach and Irina Stenger and Tania Avgustinova and Dietrich Klakow},
editor = {Galia Angelova and Ruslan Mitkov and Ivelina Nikolova and Irina Temnikova},
url = {https://aclanthology.org/R19-1094/},
doi = {10.26615/978-954-452-056-4_094},
year = {2019},
date = {2019},
booktitle = {Proceedings of Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019},
pages = {811-819},
address = {Varna, Bulgaria},
abstract = {Languages may be differently distant from each other and their mutual intelligibility may be asymmetric. In this paper we introduce incom.py, a toolbox for calculating linguistic distances and asymmetries between related languages. incom.py allows linguist experts to quickly and easily perform statistical analyses and compare those with experimental results. We demonstrate the efficacy of incom.py in an intercomprehension experiment on two Slavic languages: Bulgarian and Russian. Using incom.py we were able to validate three methods to measure linguistic distances and asymmetries: Levenshtein distance, word adaptation surprisal, and conditional entropy as predictors of success in a reading intercomprehension experiment.},
pubstate = {published},
type = {inproceedings}
}

Projects:   B4 C4

Avgustinova, Tania; Iomdin, Leonid

Towards a Typology of Microsyntactic Constructions Inproceedings

Corpas-Pastor, Gloria; Mitkov, Ruslan (Ed.): Computational and Corpus-Based Phraseology, Springer, Cham, pp. 15-30, 2019.

This contribution outlines an international research effort for creating a typology of syntactic idioms on the borderline of the dictionary and the grammar. Recent studies focusing on the adequate description of such units, especially for modern Russian, have resulted in two types of linguistic resources: a microsyntactic dictionary of Russian, and a microsyntactically annotated corpus of Russian texts. Our goal now is to discover to what extent the findings can be generalized cross-linguistically in order to create analogous multilingual resources. The initial work consists in constructing a typology of relevant phenomena. The empirical base is provided by closely related languages which are mutually intelligible to various degrees. We start by creating an inventory for this typology for four representative Slavic languages: Russian (East Slavic), Bulgarian (South Slavic), Polish and Czech (West Slavic). Our preliminary results show that the aim is attainable and can be of relevance to theoretical, comparative and applied linguistics as well as in NLP tasks.

@inproceedings{Avgustinova2019,
title = {Towards a Typology of Microsyntactic Constructions},
author = {Tania Avgustinova and Leonid Iomdin},
editor = {Gloria Corpas-Pastor and Ruslan Mitkov},
url = {https://link.springer.com/chapter/10.1007/978-3-030-30135-4_2},
year = {2019},
date = {2019-09-18},
booktitle = {Computational and Corpus-Based Phraseology},
pages = {15-30},
publisher = {Springer, Cham},
abstract = {This contribution outlines an international research effort for creating a typology of syntactic idioms on the borderline of the dictionary and the grammar. Recent studies focusing on the adequate description of such units, especially for modern Russian, have resulted in two types of linguistic resources: a microsyntactic dictionary of Russian, and a microsyntactically annotated corpus of Russian texts. Our goal now is to discover to what extent the findings can be generalized cross-linguistically in order to create analogous multilingual resources. The initial work consists in constructing a typology of relevant phenomena. The empirical base is provided by closely related languages which are mutually intelligible to various degrees. We start by creating an inventory for this typology for four representative Slavic languages: Russian (East Slavic), Bulgarian (South Slavic), Polish and Czech (West Slavic). Our preliminary results show that the aim is attainable and can be of relevance to theoretical, comparative and applied linguistics as well as in NLP tasks.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Jágrová, Klára; Avgustinova, Tania; Stenger, Irina; Fischer, Andrea

Language models, surprisal and fantasy in Slavic intercomprehension Journal Article

Computer Speech & Language, 2018.

In monolingual human language processing, the predictability of a word given its surrounding sentential context is crucial. With regard to receptive multilingualism, it is unclear to what extent predictability in context interplays with other linguistic factors in understanding a related but unknown language – a process called intercomprehension. We distinguish two dimensions influencing processing effort during intercomprehension: surprisal in sentential context and linguistic distance.

Based on this hypothesis, we formulate expectations regarding the difficulty of designed experimental stimuli and compare them to the results from think-aloud protocols of experiments in which Czech native speakers decode Polish sentences by agreeing on an appropriate translation. On the one hand, orthographic and lexical distances are reliable predictors of linguistic similarity. On the other hand, we obtain the predictability of words in a sentence with the help of trigram language models.

We find that linguistic distance (encoding similarity) and in-context surprisal (predictability in context) appear to be complementary, with neither factor outweighing the other, and that our distinguishing of these two measurable dimensions is helpful in understanding certain unexpected effects in human behaviour.
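
For the predictability dimension, a trigram language model assigns each word a surprisal of -log2 P(word | two preceding words). The following is a minimal sketch assuming plain maximum-likelihood counts; the models used in the study are trained on large corpora and smoothed, so this only illustrates the computation itself.

import math
from collections import Counter

def train_trigram_counts(sentences):
    # Count trigrams and their bigram histories from whitespace-tokenized sentences.
    trigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(w1, w2, w3)] += 1
            bigrams[(w1, w2)] += 1
    return trigrams, bigrams

def surprisal_bits(trigrams, bigrams, w1, w2, w3):
    # Surprisal in bits: -log2 P(w3 | w1, w2) under maximum likelihood.
    if trigrams[(w1, w2, w3)] == 0:
        return float("inf")  # unseen event; a real model would smooth here
    return -math.log2(trigrams[(w1, w2, w3)] / bigrams[(w1, w2)])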

@article{Jágrová2018b,
title = {Language models, surprisal and fantasy in Slavic intercomprehension},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Tania Avgustinova and Irina Stenger and Andrea Fischer},
url = {https://www.sciencedirect.com/science/article/pii/S0885230817300451},
year = {2018},
date = {2018},
journal = {Computer Speech & Language},
abstract = {In monolingual human language processing, the predictability of a word given its surrounding sentential context is crucial. With regard to receptive multilingualism, it is unclear to what extent predictability in context interplays with other linguistic factors in understanding a related but unknown language – a process called intercomprehension. We distinguish two dimensions influencing processing effort during intercomprehension: surprisal in sentential context and linguistic distance. Based on this hypothesis, we formulate expectations regarding the difficulty of designed experimental stimuli and compare them to the results from think-aloud protocols of experiments in which Czech native speakers decode Polish sentences by agreeing on an appropriate translation. On the one hand, orthographic and lexical distances are reliable predictors of linguistic similarity. On the other hand, we obtain the predictability of words in a sentence with the help of trigram language models. We find that linguistic distance (encoding similarity) and in-context surprisal (predictability in context) appear to be complementary, with neither factor outweighing the other, and that our distinguishing of these two measurable dimensions is helpful in understanding certain unexpected effects in human behaviour.},
pubstate = {published},
type = {article}
}

Project:   C4

Jágrová, Klára; Stenger, Irina; Avgustinova, Tania

Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen Journal Article

Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkräfte, 5/2017, pp. 20-37, 2018.

@article{Jágrová2018,
title = {Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova},
year = {2018},
date = {2018},
journal = {Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkr{\"a}fte},
pages = {20-37},
volume = {5/2017},
pubstate = {published},
type = {article}
}

Project:   C4

Fischer, Andrea; Vreeken, Jilles; Klakow, Dietrich

Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL Journal Article

Computación y Sistemas, 21, pp. 829-839, 2017.

We present a minimum description length based algorithm for finding the regular correspondences between related languages and show how it can be used to quantify the similarity between not only pairs, but whole groups of languages directly from cognate sets. We employ a two-part code, which allows to use the data and model complexity of the discovered correspondences as information-theoretic quantifications of the degree of regularity of cognate realizations in these languages. Unlike previous work, our approach is not limited to pairs of languages, does not limit the size of discovered correspondences, does not make assumptions about the shape or distribution of correspondences, and requires no expert knowledge or fine-tuning of parameters. We here test our approach on the Slavic languages. In a pairwise analysis of 13 Slavic languages, we show that our algorithm replicates their linguistic classification exactly. In a four-language experiment, we demonstrate how our algorithm efficiently quantifies similarity between all subsets of the analyzed four languages and find that it is excellently suited to quantifying the orthographic regularity of closely-related languages.
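
The core of the two-part code is the accounting L(model) + L(data | model): the cost of describing the correspondence table plus the cost of encoding the cognate data with it. The toy function below illustrates only that accounting; the encodings and the search for good correspondences in the paper are considerably more refined, and the Czech-Polish correspondences and counts are invented for the example.

import math

def two_part_cost(usage_counts, bits_per_char=8):
    # usage_counts maps a correspondence (source string, target string) to how
    # often it is used when encoding the cognate data.
    total = sum(usage_counts.values())
    # L(M): crudely spell out the characters of every correspondence in the model.
    l_model = sum((len(src) + len(tgt)) * bits_per_char for src, tgt in usage_counts)
    # L(D|M): optimal code lengths derived from the usage distribution.
    l_data = sum(-count * math.log2(count / total) for count in usage_counts.values())
    return l_model + l_data

# Invented Czech -> Polish correspondences with usage counts:
counts = {("ř", "rz"): 40, ("h", "g"): 25, ("ou", "ą"): 10}
print(two_part_cost(counts))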

@article{Fischer2017,
title = {Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL},
author = {Andrea Fischer and Jilles Vreeken and Dietrich Klakow},
url = {http://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/2865},
year = {2017},
date = {2017},
journal = {Computación y Sistemas},
pages = {829-839},
volume = {21},
number = {4},
abstract = {We present a minimum description length based algorithm for finding the regular correspondences between related languages and show how it can be used to quantify the similarity between not only pairs, but whole groups of languages directly from cognate sets. We employ a two-part code, which allows to use the data and model complexity of the discovered correspondences as information-theoretic quantifications of the degree of regularity of cognate realizations in these languages. Unlike previous work, our approach is not limited to pairs of languages, does not limit the size of discovered correspondences, does not make assumptions about the shape or distribution of correspondences, and requires no expert knowledge or fine-tuning of parameters. We here test our approach on the Slavic languages. In a pairwise analysis of 13 Slavic languages, we show that our algorithm replicates their linguistic classification exactly. In a four-language experiment, we demonstrate how our algorithm efficiently quantifies similarity between all subsets of the analyzed four languages and find that it is excellently suited to quantifying the orthographic regularity of closely-related languages.},
pubstate = {published},
type = {article}
}

Project:   C4

Jágrová, Klára; Stenger, Irina; Marti, Roland; Avgustinova, Tania

Lexical and orthographic distances between Bulgarian, Czech, Polish, and Russian: A comparative analysis of the most frequent nouns Inproceedings

Emonds, Joseph; Janebová, Markéta (Ed.): Language Use and Linguistic Structure. Proceedings of the Olomouc Linguistics Colloquium 2016, pp. 401-416, Palacký University, Olomouc, 2017.

@inproceedings{Klára2017,
title = {Lexical and orthographic distances between Bulgarian, Czech, Polish, and Russian: A comparative analysis of the most frequent nouns},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Roland Marti and Tania Avgustinova},
year = {2017},
date = {2017},
editor = {Joseph Emonds and Mark{\'e}ta Janebov{\'a}},
booktitle = {Language Use and Linguistic Structure. Proceedings of the Olomouc Linguistics Colloquium 2016},
pages = {401-416},
publisher = {Palack{\'y} University},
address = {Olomouc},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Stenger, Irina; Jágrová, Klára; Fischer, Andrea; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

Modeling the Impact of Orthographic Coding on Czech-Polish and Bulgarian-Russian Reading Intercomprehension Journal Article

Nordic Journal of Linguistics, 40, pp. 175-199, 2017.

Focusing on orthography as a primary linguistic interface in every reading activity, the central research question we address here is how orthographic intelligibility can be measured and predicted between closely related languages. This paper presents methods and findings of modeling orthographic intelligibility in a reading intercomprehension scenario from the information-theoretic perspective. The focus of the study is on two Slavic language pairs: Czech–Polish (West Slavic, using the Latin script) and Bulgarian–Russian (South Slavic and East Slavic, respectively, using the Cyrillic script). In this article, we present computational methods for measuring orthographic distance and orthographic asymmetry by means of the Levenshtein algorithm, conditional entropy and adaptation surprisal method that are expected to predict the influence of orthography on mutual intelligibility in reading.
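
Of the measures named above, conditional entropy captures how predictable target-language spellings are from source-language spellings in the aggregate. A minimal sketch, assuming the character alignments are already given as pairs (the alignment procedure and any normalization follow the article, not this toy version):

import math
from collections import Counter, defaultdict

def conditional_entropy(aligned_pairs):
    # aligned_pairs: iterable of (source_char, target_char) alignments, pooled
    # over a set of cognate pairs. Returns H(target | source) in bits.
    joint = Counter(aligned_pairs)
    by_source = defaultdict(Counter)
    for (src, tgt), n in joint.items():
        by_source[src][tgt] += n
    total = sum(joint.values())
    entropy = 0.0
    for src, targets in by_source.items():
        n_src = sum(targets.values())
        p_src = n_src / total
        entropy += p_src * sum(-(n / n_src) * math.log2(n / n_src) for n in targets.values())
    return entropy

# Hypothetical Bulgarian -> Russian character alignments:
print(conditional_entropy([("я", "е"), ("я", "я"), ("л", "л"), ("б", "б")]))  # 0.5 bits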

@article{Stenger2017b,
title = {Modeling the Impact of Orthographic Coding on Czech-Polish and Bulgarian-Russian Reading Intercomprehension},
author = {Irina Stenger and Kl{\'a}ra J{\'a}grov{\'a} and Andrea Fischer and Tania Avgustinova and Dietrich Klakow and Roland Marti},
url = {https://www.cambridge.org/core/journals/nordic-journal-of-linguistics/article/modeling-the-impact-of-orthographic-coding-on-czechpolish-and-bulgarianrussian-reading-intercomprehension/363BEB5C556DFBDAC7FEED0AE06B06AA},
year = {2017},
date = {2017},
journal = {Nordic Journal of Linguistics},
pages = {175-199},
volume = {40},
number = {2},
abstract = {Focusing on orthography as a primary linguistic interface in every reading activity, the central research question we address here is how orthographic intelligibility can be measured and predicted between closely related languages. This paper presents methods and findings of modeling orthographic intelligibility in a reading intercomprehension scenario from the information-theoretic perspective. The focus of the study is on two Slavic language pairs: Czech–Polish (West Slavic, using the Latin script) and Bulgarian–Russian (South Slavic and East Slavic, respectively, using the Cyrillic script). In this article, we present computational methods for measuring orthographic distance and orthographic asymmetry by means of the Levenshtein algorithm, conditional entropy and adaptation surprisal method that are expected to predict the influence of orthography on mutual intelligibility in reading.},
pubstate = {published},
type = {article}
}

Project:   C4

Stenger, Irina; Avgustinova, Tania; Marti, Roland

Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages Inproceedings

Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017", 1, pp. 304-317, 2017.

In this article we validate two measuring methods: Levenshtein distance and word adaptation surprisal as potential predictors of success in reading intercomprehension. We investigate to what extent orthographic distances between Russian and other East Slavic (Ukrainian, Belarusian) and South Slavic (Bulgarian, Macedonian, Serbian) languages found by means of the Levenshtein algorithm and word adaptation surprisal correlate with comprehension of unknown Slavic languages on the basis of data obtained from Russian native speakers in online free translation task experiments. We try to find an answer to the following question: Can measuring methods such as Levenshtein distance and word adaptation surprisal be considered as a good approximation of orthographic intelligibility of unknown Slavic languages using the Cyrillic script?
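
Word adaptation surprisal, the second measure, sums -log2 P(target segment | source segment) over an aligned cognate pair and is often normalized by the length of the alignment. A minimal sketch; the correspondence probabilities below are invented for illustration, whereas in the study they are estimated from the full stimulus material:

import math

def word_adaptation_surprisal(alignment, corr_prob, normalize=True):
    # alignment: list of (source_segment, target_segment) pairs for one cognate;
    # corr_prob: probability of each target segment given its source segment.
    bits = sum(-math.log2(corr_prob[(src, tgt)]) for src, tgt in alignment)
    return bits / len(alignment) if normalize else bits

# Invented Bulgarian -> Russian correspondence probabilities:
probs = {("х", "х"): 0.95, ("л", "л"): 0.98, ("я", "е"): 0.40, ("б", "б"): 0.97}
print(word_adaptation_surprisal([("х", "х"), ("л", "л"), ("я", "е"), ("б", "б")], probs))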

@inproceedings{Stenger2017,
title = {Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages},
author = {Irina Stenger and Tania Avgustinova and Roland Marti},
url = {https://www.semanticscholar.org/paper/Levenshtein-Distance-anD-WorD-aDaptation-surprisaL-Distance/6103d388cb0398b89dec8ca36ec0be025bb6dea2},
year = {2017},
date = {2017},
booktitle = {Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017"},
pages = {304-317},
abstract = {In this article we validate two measuring methods: Levenshtein distance and word adaptation surprisal as potential predictors of success in reading intercomprehension. We investigate to what extent orthographic distances between Russian and other East Slavic (Ukrainian, Belarusian) and South Slavic (Bulgarian, Macedonian, Serbian) languages found by means of the Levenshtein algorithm and word adaptation surprisal correlate with comprehension of unknown Slavic languages on the basis of data obtained from Russian native speakers in online free translation task experiments. We try to find an answer to the following question: Can measuring methods such as Levenshtein distance and word adaptation surprisal be considered as a good approximation of orthographic intelligibility of unknown Slavic languages using the Cyrillic script?},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Marti, Roland

POLSKI TO JĘZYK NIESKOMPLIKOWANY? Theoretische und praktische Interkomprehension der 100 häufigsten polnischen Substantive Journal Article

Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkräfte, 4/2016, pp. 5-19, 2017.

@article{Jágrová2017,
title = {POLSKI TO JĘZYK NIESKOMPLIKOWANY? Theoretische und praktische Interkomprehension der 100 h{\"a}ufigsten polnischen Substantive},
author = {Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Roland Marti},
year = {2017},
date = {2017},
journal = {Polnisch in Deutschland. Zeitschrift der Bundesvereinigung der Polnischlehrkr{\"a}fte},
pages = {5-19},
volume = {4/2016},
pubstate = {published},
type = {article}
}

Project:   C4

Stenger, Irina

How reading intercomprehension works among Slavic languages with Cyrillic script Inproceedings

Köllner, Marisa; Ziai, Ramon (Ed.): ESSLLI 2016, pp. 30-42, 2016.

@inproceedings{Stenger2016,
title = {How reading intercomprehension works among Slavic languages with Cyrillic script},
author = {Irina Stenger},
editor = {Marisa K{\"o}llner and Ramon Ziai},
url = {https://esslli2016.unibz.it/wp-content/uploads/2016/09/esslli-stus-2016-proceedings.pdf},
year = {2016},
date = {2016},
pages = {30-42},
publisher = {ESSLLI 2016},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

Models for Mutual Intelligibility Inproceedings

Data Mining and its Use and Usability for Linguistic Analysis, Universität des Saarlandes, Saarbrücken, Germany, 2015.

@inproceedings{andrea2015models,
title = {Models for Mutual Intelligibility},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
year = {2015},
date = {2015},
booktitle = {Data Mining and its Use and Usability for Linguistic Analysis},
publisher = {Universit{\"a}t des Saarlandes},
address = {Saarbr{\"u}cken, Germany},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Fischer, Andrea; Jágrová, Klára; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich; Marti, Roland

An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets Inproceedings

Sharp, Bernadette; Lubaszewski, Wiesław; Delmonte, Rodolfo (Ed.): Natural Language Processing and Cognitive Science, Ca Foscarina Editrice, Venezia, pp. 115-126, 2015.

This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.
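
A rule-based orthographic transformation of the kind described above can be sketched as an ordered list of substring correspondences applied to a source word. The rules below are a small invented subset for the Czech-Polish direction, not the rule set used in the experiment:

# Ordered Czech -> Polish orthographic correspondences (more specific or longer
# contexts should come first so they are not pre-empted by shorter rules).
CS_TO_PL_RULES = [("ř", "rz"), ("ou", "ą"), ("h", "g"), ("í", "i")]

def transform(word, rules=CS_TO_PL_RULES):
    for source, target in rules:
        word = word.replace(source, target)
    return word

print(transform("řeka"))  # rzeka ('river')
print(transform("soud"))  # sąd ('court')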

@inproceedings{klara2015orthography,
title = {An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets},
author = {Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger and Tania Avgustinova and Dietrich Klakow and Roland Marti},
editor = {Bernadette Sharp and Wiesław Lubaszewski and Rodolfo Delmonte},
url = {https://www.bibsonomy.org/bibtex/231c7c8a9b94a872a7396d5b1a1ef7962/sfb1102},
year = {2015},
date = {2015},
booktitle = {Natural Language Processing and Cognitive Science},
pages = {115-126},
publisher = {Ca Foscarina Editrice, Venezia},
abstract = {This article presents the methods and findings of a computational transformation of orthography within two Slavic language pairs (Czech-Polish and Bulgarian-Russian) on different word sets. The experiment aimed at investigating to what extent these closely related languages are mutually intelligible, concentrating on their orthographies as linguistic interfaces to the written text. Besides analyzing orthographic similarity, the aim was to gain insights into the applicability of rules based on traditional linguistic assumptions for the purposes of language modelling.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Avgustinova, Tania; Fischer, Andrea; Jágrová, Klára; Stenger, Irina

The Empirical Basis of Slavic Intercomprehension Inproceedings

REMU, Joensuu, Finland, 2015.

The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).

@inproceedings{tania2015empirical,
title = {The Empirical Basis of Slavic Intercomprehension},
author = {Tania Avgustinova and Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a} and Irina Stenger},
url = {https://www.bibsonomy.org/bibtex/187b1c53b1bad76027e0a305d2a6e2cce/sfb1102},
year = {2015},
date = {2015},
booktitle = {REMU},
address = {Joensuu, Finland},
abstract = {The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatical information, ranging from extremely dense to highly redundant constructions, and their development is the result of various language contact scenarios (Balkansprachbund, German influence on West Slavic languages, Finno-Ugric substratum in East Slavic languages etc.).},
pubstate = {published},
type = {inproceedings}
}

Project:   C4

Fischer, Andrea; Demberg, Vera; Klakow, Dietrich

Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches Inproceedings

Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), Association for Computational Linguistics, pp. 105-108, Brighton, England, UK, 2015.

As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.

@inproceedings{fischer:demberg:klakow,
title = {Towards Flexible, Small-Domain Surface Generation: Combining Data-Driven and Grammatical Approaches},
author = {Andrea Fischer and Vera Demberg and Dietrich Klakow},
url = {https://www.aclweb.org/anthology/W15-4718/},
year = {2015},
date = {2015},
booktitle = {Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)},
pages = {105-108},
publisher = {Association for Computational Linguistics},
address = {Brighton, England, UK},
abstract = {As dialog systems are getting more and more ubiquitous, there is an increasing number of application domains for natural language generation, and generation objectives are getting more diverse (e.g., generating informationally dense vs. less complex utterances, as a function of target user and usage situation). Flexible generation is difficult and labour-intensive with traditional template-based generation systems, while fully data-driven approaches may lead to less grammatical output, particularly if the measures used for generation objectives are correlated with measures of grammaticality. We here explore the combination of a data-driven approach with two very simple automatic grammar induction methods, basing its implementation on OpenCCG.},
pubstate = {published},
type = {inproceedings}
}

Projects:   A4 C4

Klakow, Dietrich; Avgustinova, Tania; Stenger, Irina; Fischer, Andrea; Jágrová, Klára

The INCOMSLAV Project Inproceedings

Seminar in formal linguistics at ÚFAL, Charles University, Prague, 2014.

The human language processing mechanism shows a remarkable robustness with different kinds of imperfect linguistic signal. The INCOMSLAV project aims at gaining insights about human retrieval of information in the mode of intercomprehension, i.e. from texts in genetically related languages not acquired through language learning. Furthermore it adds to this synchronic approach a diachronic perspective which provides the vital common denominator in establishing the extent of linguistic proximity. The languages to be analysed are chosen from the group of Slavic languages (CZ, PL, RU, BG). Whereas the possibility of intercomprehension between related languages is a generally accepted fact and the ways it functions have been studied for certain language groups, such analyses have not yet been undertaken from a systematic point of view focusing on information en- and decoding at different linguistic levels. The research programme will bring together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages and will compare them with insights of comparative historical linguistics on the relationship between Slavic languages. The results should add a cross-linguistic perspective to the question of how language users master high degrees of surprisal (due to partial incomprehensibility) and extract information from “noisy” code.

@inproceedings{dietrich2014incomslav,
title = {The INCOMSLAV Project},
author = {Dietrich Klakow and Tania Avgustinova and Irina Stenger and Andrea Fischer and Kl{\'a}ra J{\'a}grov{\'a}},
url = {https://ufal.mff.cuni.cz/events/incomslav-project},
year = {2014},
date = {2014},
booktitle = {Seminar in formal linguistics at ÚFAL},
publisher = {Charles University},
address = {Prague},
abstract = {The human language processing mechanism shows a remarkable robustness with different kinds of imperfect linguistic signal. The INCOMSLAV project aims at gaining insights about human retrieval of information in the mode of intercomprehension, i.e. from texts in genetically related languages not acquired through language learning. Furthermore it adds to this synchronic approach a diachronic perspective which provides the vital common denominator in establishing the extent of linguistic proximity. The languages to be analysed are chosen from the group of Slavic languages (CZ, PL, RU, BG). Whereas the possibility of intercomprehension between related languages is a generally accepted fact and the ways it functions have been studied for certain language groups, such analyses have not yet been undertaken from a systematic point of view focusing on information en- and decoding at different linguistic levels. The research programme will bring together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages and will compare them with insights of comparative historical linguistics on the relationship between Slavic languages. The results should add a cross-linguistic perspective to the question of how language users master high degrees of surprisal (due to partial incomprehensibility) and extract information from “noisy” code.},
pubstate = {published},
type = {inproceedings}
}

Project:   C4
