News - SFB 1102

Inaugural Colloquium Phase II, February 18, 2019

PROGRAM

Welcome to Dr Tanja Säily (University of Helsinki)

We are pleased to welcome our guest researcher from Finland! Dr Tanja Säily from the Department of Languages at the University of Helsinki is visiting the Department of Language Science and Technology and the SFB (project B1) for two months (February to March 2019) to work with us on shared research interests in historical corpus linguistics. Dr Säily is part of the Helsinki Computational History Group (COMHIS). She is also a member of two research projects funded by the Academy of Finland that focus on language change. In addition, she is developing corpus-linguistic tools and methods for historical sociolinguistics in the STRATAS project and was actively involved in compiling the Corpora of Early English Correspondence.

SFB-Networking Workshop Köln – Potsdam – Saarbrücken

SFB-Networking Workshop Language Processing (SFB 1252, SFB 1102, SFB 1287)

February 14, 2019 – February 15, 2019

This series of network meetings brings together researchers from the CRCs in Cologne, Potsdam and Saarbrücken. The aim is to discuss shared themes and methodological approaches of CRC 1252 "Prominence in Language", CRC 1102 "Information Density and Linguistic Encoding" and CRC 1287 "Limits of Variability in Language". Thematic workshops are organized on topics shared by projects across the CRCs. PIs are welcome to initiate and organize meetings on specific topics.

Contact persons:

Petra B. Schumacher, Cologne (SFB 1252)

Maria Staudte, Saarbrücken (SFB 1102)

Isabell Wartenburger, Potsdam (SFB 1287)

The kick-off networking meeting will focus on language processing and will take place in Potsdam on February 14/15, 2019.

Please click here for the full program.

DFG Emmy Noether award to Michael Roth

The German Research Foundation has accepted Michael Roth to the DFG Emmy Noether Programme, which gives outstanding early career researchers the opportunity to lead an independent junior research group. Michael’s research group will use computational methods to investigate linguistic factors behind misunderstandings.

C4 publishes new article in Computer Speech & Language

A new article by the members of project C4 entitled “Language models, surprisal and fantasy in Slavic intercomprehension” has been published in Computer Speech & Language.

You can access the full article here.

C1 publishes new article in Frontiers in Communication

A new article by the members of project C1 entitled "Dimensions of segmental variability: interaction of prosody and surprisal in six languages" has been published in Frontiers in Communication.

You can access the full article here.

Saarbrücker Zeitung reports on SFB 1102 IDeaL

In an interview with Saarbrücker Zeitung, Elke Teich gave an overview of the research conducted in the SFB 1102.

You can access the full article here.

Second Phase of SFB 1102 IDeaL approved

The Collaborative Research Centre 1102 "Information Density and Linguistic Encoding", funded by the German Research Foundation (DFG), has been granted approximately 11 million euros for a four-year extension starting in July 2018. The CRC, directed by Elke Teich, brings together around 60 scientists from linguistics, psychology and computer science to investigate the hypothesis that language variation and language use can be better understood in terms of a speaker's desire to distribute information rationally across the linguistic signal.

PhD Day

The fourth PhD Day of SFB 1102 will take place on December 8 at Saarland University's Graduate Centre (C9.3).

Please click here for the full program of the PhD Day.

Workshop "Making effective use of metadata of historical texts and corpora"

During the last decade, the availability of historical data has increased dramatically, giving rise to a large body of research on historical corpora and archives. One of the main challenges with this type of data is processing: the data is erroneous (e.g., OCR errors), heterogeneous (including text, figures, tables, etc.), sometimes multilingual within one document (e.g., Latin or French in historical scientific texts), not easily handled by standard NLP tools, and difficult to structure both document-internally (e.g., titles, pages, paragraphs) and corpus- or archive-wide (e.g., categorization into text types, genres or registers). Moreover, collecting and providing metadata is moving into focus. Especially for (socio)linguistic analysis taking a variationist approach, taking metadata into account (e.g., text author, production time) is essential for obtaining valuable results. In this workshop, we aim to bring together specialists in building historical corpora and archives as well as researchers who conduct empirical research on historical data and are interested in accounting for different variables, notably social variables. Each corpus or archive has its peculiarities. What are they, and how can we make the best use of them? Collections and corpora may be based on the same data sets, but interests in the kinds of metadata valuable for analysis vary. The workshop provides a forum to discuss and improve our understanding of how to build and use historical data as efficiently as possible, accounting for a wider range of variables based on the availability of metadata.

Please click here for the full program of the workshop. Abstracts are available here.

This workshop is kindly co-sponsored by the "Universitätsgesellschaft des Saarlandes e.V."