Accurate Linear-Time Chinese Word Segmentation via Embedding Matching


This paper proposes an embedding matching approach to Chinese word segmentation, which generalizes the traditional sequence labeling framework and takes advantage of distributed representations. The training and prediction algorithms have linear-time complexity. Based on the proposed model, a greedy segmenter is developed and evaluated on benchmark corpora. Experiments show that our greedy segmenter achieves improved results over previous neural network-based word segmenters, and its performance is competitive with state-of-the-art methods, despite its simple feature set and the absence of external resources for training.
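The abstract does not spell out the model, so the following is only a toy sketch under our own assumptions. It does not describe the authors' architecture or trained parameters. Segmentation is cast as a left-to-right series of per-character decisions: attach the character to the current word ("combine") or start a new word ("separate"). Each decision is made greedily by matching a context vector against hypothetical per-character action embeddings with a dot product. The names `greedy_segment`, `context_vector`, `CHAR_EMB`, and `ACTION_EMB`, the window size, and every embedding value are illustrative inventions. Because each step inspects a fixed-size window, the loop runs in time linear in the sentence length.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product used as the matching score."""
    return sum(a * b for a, b in zip(u, v))


def context_vector(text: str, i: int, char_emb: Dict[str, Vector], window: int = 1) -> Vector:
    """Concatenate embeddings of characters text[i-window .. i+window].

    Positions outside the sentence (and unknown characters) are zero-padded.
    """
    dim = len(next(iter(char_emb.values())))
    vec: Vector = []
    for j in range(i - window, i + window + 1):
        if 0 <= j < len(text):
            vec.extend(char_emb.get(text[j], [0.0] * dim))
        else:
            vec.extend([0.0] * dim)
    return vec


def greedy_segment(
    text: str,
    char_emb: Dict[str, Vector],
    action_emb: Dict[Tuple[str, str], Vector],
    window: int = 1,
) -> List[str]:
    """Greedy left-to-right segmentation: one combine/separate decision per character.

    Hypothetical scoring: each action on character c has its own embedding,
    matched against the context vector; missing embeddings default to zeros,
    and ties are resolved as "separate".
    """
    if not text:
        return []
    words = [text[0]]  # the first character always starts a word
    for i in range(1, len(text)):
        ctx = context_vector(text, i, char_emb, window)
        zero = [0.0] * len(ctx)
        c = text[i]
        combine = dot(ctx, action_emb.get((c, "combine"), zero))
        separate = dot(ctx, action_emb.get((c, "separate"), zero))
        if combine > separate:
            words[-1] += c  # attach c to the current word
        else:
            words.append(c)  # c begins a new word
    return words


# Hand-set toy embeddings (2-dim characters, window 1, so 6-dim contexts).
CHAR_EMB: Dict[str, Vector] = {
    "我": [1.0, 0.0],
    "喜": [0.0, 1.0],
    "欢": [1.0, 1.0],
    "北": [0.0, 1.0],
    "京": [1.0, 0.0],
}
# These action embeddings only look at the previous character's slot.
ACTION_EMB: Dict[Tuple[str, str], Vector] = {
    ("喜", "combine"): [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    ("喜", "separate"): [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ("欢", "combine"): [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    ("欢", "separate"): [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ("北", "separate"): [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ("京", "combine"): [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    ("京", "separate"): [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

print(greedy_segment("我喜欢北京", CHAR_EMB, ACTION_EMB))  # prints ['我', '喜欢', '北京']
```

In a real system the embeddings would be learned from a segmented corpus rather than set by hand. The sketch only illustrates why a greedy, fixed-window decoder needs a single pass over the input.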

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL, Volume 1: Long Papers)