Selected Publications

This paper presents a novel model that learns and exploits embeddings of phone n-grams for word segmentation in child language acquisition. Through extensive visualization, we show that the learned embeddings are informative for both word segmentation and phonology in general.
ACL 2016 CogACLL Workshop

This paper proposes an embedding matching approach to Chinese word segmentation, which generalizes the traditional sequence labeling framework and takes advantage of distributed representations. We develop a greedy segmenter based on the proposed model, and the evaluation shows that it improves on previous neural network-based word segmenters.
ACL 2015


A3: Corpus-based Disambiguation of Semantic Relations

The A3 project deals with the semantic analysis of noun-noun compounds (e.g. ‘Landhaus’/‘country house’) and the automatic prediction of the semantic property and the prepositional paraphrase for an unseen compound.

CLARA: Common Language Resources and their Applications — a Marie Curie ITN

CLARA trains a new generation of researchers who will be able to cooperate on the establishment of language resources and their exploitation for the construction of the next generation of language models.