Klakow, Dietrich; Trost, Thomas

Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings

Proceedings of TextGraphs-11: Graph-based Methods for Natural Language Processing (Workshop at ACL 2017), Association for Computational Linguistics, pp. 30-38, Vancouver, Canada, 2017.

Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. To address this, we introduce an unsupervised, parameter-free method for creating a hierarchical graphical clustering of the full ensemble of word vectors, and we show that this structure is a geometrically meaningful representation of the original relations between the words. The resulting representation can be used to better understand, and thus improve, the embedding algorithm. It also exhibits semantic meaning, so it can be utilized in a variety of language processing tasks such as categorization or measuring similarity.