Inaugural Colloquium Phase II, February 18, 2019
PROGRAM
We are pleased to welcome our guest researcher from Finland! Dr Tanja Säily from the Department of Languages at the University of Helsinki is visiting the Department of Language Science and Technology and the SFB (project B1) for two months (February to March 2019) to work with us on common research interests in the field of historical corpus linguistics. Dr Säily is part of the Helsinki Computational History Group (COMHIS). In addition, she is a member of two research projects on language change funded by the Academy of Finland. She is also developing corpus-linguistic tools and methods for historical sociolinguistics in the STRATAS project and was actively involved in the compilation of the Corpora of Early English Correspondence.
The German Research Foundation has accepted Michael Roth to the DFG Emmy Noether Programme, which gives outstanding early career researchers the opportunity to lead an independent junior research group. Michael’s research group will use computational methods to investigate linguistic factors behind misunderstandings.
A new article by the members of project C4 entitled “Language models, surprisal and fantasy in Slavic intercomprehension” has been published in Computer Speech & Language.
You can access the full article here.
A new article by the members of project C1 entitled “Dimensions of segmental variability: interaction of prosody and surprisal in six languages” has been published in Frontiers in Communication.
You can access the full article here.
In an interview with Saarbrücker Zeitung, Elke Teich gave an overview of the research conducted in the SFB 1102.
You can access the full article here.
The Collaborative Research Centre 1102 “Information Density and Linguistic Encoding”, funded by the German Research Foundation (DFG), has been granted approx. 11 million euros for a four-year extension starting in July 2018. The CRC, directed by Elke Teich, brings together around 60 scientists from the fields of Linguistics, Psychology and Computer Science to investigate the hypothesis that language variation and language use can be better understood in terms of a speaker’s desire to rationally distribute information across the linguistic signal.
The fourth PhD Day of SFB 1102 will take place December 8 at Saarland University’s Graduate Centre (C9.3).
Please click here for the full program of the PhD Day.
During the last decade, the availability of historical data has increased dramatically, giving rise to a large amount of research on historical corpora and archives. One of the main challenges with this type of data is processing: data is erroneous (e.g., OCR errors), heterogeneous (including text, figures, tables, etc.), sometimes multilingual within one document (e.g., Latin or French in historical scientific texts), not easily processed by standard NLP tools, and difficult to structure document-internally (e.g., titles, pages, paragraphs) and corpus/archive-wide (e.g., categorization into text types/genres/registers). Moreover, collecting metadata and making it available is moving into focus. Especially for (socio)linguistic analysis taking a variationist approach, considering metadata (e.g., text author, production time) is essential to generate valuable results. In this workshop, we aim to bring together specialists in building historical corpora and archives and researchers who conduct empirical research on historical data and are also interested in accounting for different variables, notably social variables. Each corpus/archive has its peculiarities. What are they, and how can we make the best use of them? Collections and corpora might be based on the same data sets, but interests in the kinds of metadata valuable for analysis vary. The workshop provides a forum to discuss and improve our understanding of building and using historical data as efficiently as possible, accounting for a wider range of variables based on the availability of metadata.
Please click here for the full program of the workshop. Abstracts are available here.
This workshop is kindly co-sponsored by the “Universitätsgesellschaft des Saarlandes e.V.”