Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation

Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, Ting Liu

Abstract

This paper describes our system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We base our submission on Stanford’s winning system for the CoNLL 2017 shared task and make two effective extensions: 1) incorporating deep contextualized word embeddings into both the part-of-speech tagger and the parser; 2) ensembling parsers trained with different initializations. We also explore different ways of concatenating treebanks for further improvements. Experimental results on the development data show the effectiveness of our methods. In the final evaluation, our system was ranked first according to LAS (75.84%) and outperformed the other systems by a large margin.

Anthology ID: K18-2005
Volume: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Daniel Zeman, Jan Hajič
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 55–64
URL: https://aclanthology.org/K18-2005
DOI: 10.18653/v1/K18-2005
Cite (ACL): Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, and Ting Liu. 2018. Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 55–64, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation (Che et al., CoNLL 2018)
PDF: https://aclanthology.org/K18-2005.pdf
Code: HIT-SCIR/ELMoForManyLangs
Data: Universal Dependencies
