Language variation and change in social media: a computational perspective
Dong Nguyen
Utrecht University
Social media presents exciting opportunities to study language in a variety of social situations and on a very large scale. At the same time, language in social media presents challenges to the development of NLP tools. In this talk, I will discuss results from two recent studies. In the first study, we use word embeddings (which represent words as dense continuous vectors) to detect semantic change in a large Twitter corpus. In the second study, we look at how popular word embedding methods handle spelling variants, and I will ask how such variants should be represented in the embedding space. Finally, I will discuss the emerging interdisciplinary area of computational sociolinguistics and reflect on its challenges and opportunities.