News - SFB 1102

Workshop „Making effective use of metadata of historical texts and corpora“

During the last decade, the availability of historical data has increased dramatically, giving rise to a large amount of research on historical corpora and archives. One of the main challenges with this type of data is data processing: data is erroneous (e.g., OCR errors), heterogeneous (including text, figures, tables, etc.), sometimes multilingual within one document (e.g., Latin or French in historical scientific texts), not easily processed by standard NLP tools, and difficult to structure document-internally (e.g., titles, pages, paragraphs) and corpus/archive-wide (e.g., categorization into text types/genres/registers). Moreover, collecting and making available metadata is moving into focus. Especially for (socio)linguistic analysis taking a variationist approach, considering metadata information (e.g., text author, production time) is essential to generate valuable results. In this workshop, we aim to bring together specialists from the field of historical corpus/archive building as well as researchers conducting empirical research on historical data who are also interested in accounting for different variables, notably social variables. Each corpus/archive has its peculiarities. What are these, and how can we make the best use of them? Collections and corpora might be based on the same data sets, but interests in the kinds of metadata valuable for analysis vary. The workshop provides a forum to discuss and improve our understanding of building and using historical data as efficiently as possible, accounting for a wider range of variables based on the availability of metadata.

Please click here for the full program of the workshop. Abstracts are available here.

This workshop is kindly co-sponsored by the „Universitätsgesellschaft des Saarlandes e.V.“

Published article about Katrin Menzel’s research on the “Philosophical Transactions”

Saarland University’s online magazine „Campus“ has published an article about Katrin Menzel (Project B1: Information Density and Scientific Literacy in English – Synchronic and Diachronic Perspectives) and her group’s research on how scientific English has changed over the years, based on an analysis of the first 200 editions of the “Philosophical Transactions” of the Royal Society of London, the world’s first science journal.

The full article can be found here.

PhD Day

Program

ESSV 2017 – 28th Conference on Electronic Speech Signal Processing

Organized by Jürgen Trouvain, Ingmar Steiner, and Bernd Möbius

The goal of ESSV is to bring together everyone interested in the field of speech technology in research and applications. The Saarland University Campus is particularly well-suited for this, as it is home to a large number of institutions working on speech technology, including the Department of Computational Linguistics & Phonetics, the German Research Center for Artificial Intelligence (DFKI), the Cluster of Excellence “Multimodal Computing and Interaction”, and the Collaborative Research Center “Information Density and Linguistic Encoding” (SFB 1102).

39th Annual Meeting of the German Society of Linguistics (DGfS)

Organized by Ingo Reich & Augustin Speyer.

Note that the program includes, among others, a CL poster session organized by Vera Demberg and a session (AG1) organized by Elke Teich, Vera Demberg and Bernd Möbius.

For more information, please click here.

The 2nd SFB 1102 PhD Day

The Second PhD Day for SFB 1102 has six talks lasting half an hour each. Doctoral students present their work before the whole SFB to receive feedback from colleagues they don’t usually interact with, providing an opportunity to strengthen their work. The talks focus on the students’ work as it relates to their thesis, rather than framing the work solely in terms of a particular SFB project. We look forward to your stimulating questions and thoughtful comments!

Location: Graduate Center (Campus, C9.3)
Date: 18 November 2016

PROGRAM

09:45 – 10:15 Coffee + Breakfast available

10:15 – 10:30 Welcome

10:30 – 11:15 David M. Howcroft (A4)
Adaptive Generation: developing natural language generation systems with variation

11:15 – 12:00 Ekaterina Kravtchenko (A3)
Pragmatic interpretation of informationally redundant event mentions

12:00 – 13:00 Lunch

13:00 – 13:45 Mirjana Sekicki (A5)
Cognitive Load in the Visual World: the effect of gaze

13:45 – 14:30 Mittul Singh (B4)
Inducing Rare-Word Embeddings for NLP tasks

14:30 – 14:45 Coffee Break

14:45 – 15:30 Iliana Simova (B5)
Extracting relations across sentence boundaries

15:30 – 16:15 Simon Ostermann (A3)
Text-Script alignment based on textual similarity measures and ordering information

16:15 – 16:30 Concluding remarks

Followed by an evening out

Location: Iguana (Mainzer Str. 2, 66111 Saarbrücken)
Time: 19:00 – whenever
Feel free to bring your spouse, children and best friend with you!

Workshop on „Fragments“

This two-day workshop, hosted by project B3 of the SFB 1102 “Information Density and Linguistic Encoding” and taking place on 13-14 October 2016 at Saarland University, aims to bring together people working on the syntax, semantics, pragmatics and psycholinguistics of fragments / non-sentential utterances.

For more information, please click here.

Workshop „Historical Corpus Linguistics: Methods and Applications“

The creation and use of digital corpora have revolutionized the study of language, signaling a shift towards a more empirical mode of investigation. The aim of the workshop is to bring together researchers in historical linguistics and corpus-based methods to share ideas on the many aspects of language variation and change and to propose new methods and corpora to address potential research gaps in the field. The workshop organizers are based at the Department of Language Science and Technology at Saarland University – home to Project B1 in SFB 1102 – where the role of linguistic densification in the evolution of English scientific writing is investigated from the 17th century to the present.

For more information, please click here.

Mixed-Models Workshop

„Advanced topics in using mixed-effects models“

This workshop is an event of the SFB 1102 „Information Density and Linguistic Encoding“ and focuses on advanced topics and problems in mixed-effects modelling. We are happy to have two excellent tutors who will share their experience and expertise with the audience over two full workshop days. Specifically, Christoph Scheepers, from the University of Glasgow, will talk about confirmatory analyses of experimental data, while Stefan Gries, from the University of California, Santa Barbara, will talk about aspects of multilevel modeling in corpus analyses and visualization.

For more information, please click here.

Workshop „Perspectives on low-resource language varieties“

While most linguistic research deals with data from widespread and well-documented European languages, experts from an increasing variety of disciplines are now working with data from low-resource languages and language varieties across the globe. Many typologists hope that these data will allow for crucial insights into language variation and evolution (e.g. Trudgill 2011). In our workshop, we will bring together researchers with different backgrounds and perspectives on the specific challenges and the unique promises these data hold, to identify common ground and explore the most pressing problems and possible solutions.

Please click here for the program and the abstracts.