A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data

Abstract

In this study, we implemented a machine translation system that uses a convolutional neural network (CNN) with an attention mechanism to translate Mandarin into Sixan-accent Hakka. Because Northern and Southern Sixan-accent Hakka use different idioms and terms, we analyzed the corpus differences and lexicon definitions, separated the accent-specific word usages, and trained a dedicated model for each accent. In addition, since the collected Hakka corpora are relatively limited, unseen words occur frequently in real-world translation. Our system therefore uses a threshold for each model, selected through model verification, to reject unsuitable translated words. We then apply the proposed algorithm, which combines forced segmentation of Hakka idioms/terms with substitution of common Mandarin words, and this makes the translated sentences more intelligible. As a result, the proposed system achieved promising results despite the small size of the training data. The system could be used for Hakka language teaching and as the front end of Mandarin-Hakka code-switching speech synthesis systems.
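The abstract does not describe how the forced segmentation and the threshold-based rejection are implemented. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that forced segmentation is forward maximum matching against a Hakka idiom/term lexicon. It also assumes that the model returns a (translation, confidence) pair for each segment, and that a segment whose confidence falls below the accent-specific threshold is left as the Mandarin word, yielding code-switched output. `forced_segment`, `translate_with_fallback`, `toy_translate`, the toy lexicon entries, and the scores are all hypothetical and for illustration only; none of them come from the paper.

```python
from typing import Callable, List, Set, Tuple

def forced_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching (assumed technique): at each position, take the
    longest lexicon entry that matches, so multi-character idioms/terms stay
    whole; otherwise emit a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens

def translate_with_fallback(
    tokens: List[str],
    translate_fn: Callable[[str], Tuple[str, float]],
    threshold: float,
) -> List[str]:
    """Keep a translated segment only if its confidence reaches the threshold;
    otherwise fall back to the original Mandarin segment (hypothetical rule)."""
    output: List[str] = []
    for tok in tokens:
        hakka, score = translate_fn(tok)
        output.append(hakka if score >= threshold else tok)
    return output

# Hypothetical toy "model": a lookup table with made-up confidence scores.
TOY_TABLE = {"我們": ("𠊎兜", 0.9), "吃飯": ("食飯", 0.8)}

def toy_translate(tok: str) -> Tuple[str, float]:
    # Unseen words receive a low-confidence placeholder guess.
    return TOY_TABLE.get(tok, ("<unk>", 0.1))

if __name__ == "__main__":
    segs = forced_segment("我們吃飯了", set(TOY_TABLE))
    print(segs)
    print(translate_with_fallback(segs, toy_translate, threshold=0.5))
```

Under these assumptions, a separate lexicon and threshold would be kept for each accent model (Northern and Southern), matching the abstract's per-accent training. The threshold value of 0.5 here is a placeholder and was not selected through model verification.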

Anthology ID:: 2022.rocling-1.38
Volume:: Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:: November
Year:: 2022
Address:: Taipei, Taiwan
Editors:: Yung-Chun Chang, Yi-Chin Huang
Venue:: ROCLING
Publisher:: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:: 307–315
Language:: Chinese
URL:: https://aclanthology.org/2022.rocling-1.38
Cite (ACL):: Yi-Hsiang Hung and Yi-Chin Huang. 2022. A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 307–315, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):: A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data (Hung & Huang, ROCLING 2022)
PDF:: https://aclanthology.org/2022.rocling-1.38.pdf