Identifying Source Language Expressions for Pre-editing in Machine Translation

Norizo Sakaguchi, Yugo Murawaki, Chenhui Chu, Sadao Kurohashi

Abstract

Machine translation-mediated communication can benefit from pre-editing source language texts to ensure accurate transmission of intended meaning in the target language. The primary challenge lies in identifying source language expressions that pose difficulties in translation. In this paper, we hypothesize that such expressions tend to be distinctive features of texts originally written in the source language (native language) rather than translations generated from the target language into the source language (machine translation). To identify such expressions, we train a neural classifier to distinguish native language from machine translation, and subsequently isolate the expressions that contribute to the model’s prediction of native language. Our manual evaluation revealed that our method successfully identified characteristic expressions of the native language, despite the noise and the inherent nuances of the task. We also present case studies where we edit the identified expressions to improve translation quality.
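The two-step idea in the abstract (train a classifier to separate native text from machine translation, then find the expressions that push a prediction toward "native") can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: a smoothed bag-of-words log-odds model stands in for the neural classifier, leave-one-token-out occlusion stands in for whatever attribution method the paper uses, and the toy English sentences and the function names `train_log_odds`, `native_score`, and `occlusion_attributions` are invented.

```python
import math
from collections import Counter


def train_log_odds(native_docs, mt_docs, alpha=1.0):
    """Stand-in classifier (assumption): smoothed per-token log-odds.

    A positive weight means the token is more typical of native text
    than of machine-translated text in the training data.
    """
    native_counts = Counter(tok for doc in native_docs for tok in doc.split())
    mt_counts = Counter(tok for doc in mt_docs for tok in doc.split())
    vocab = set(native_counts) | set(mt_counts)
    native_total = sum(native_counts.values()) + alpha * len(vocab)
    mt_total = sum(mt_counts.values()) + alpha * len(vocab)
    return {
        tok: math.log((native_counts[tok] + alpha) / native_total)
        - math.log((mt_counts[tok] + alpha) / mt_total)
        for tok in vocab
    }


def native_score(tokens, weights):
    """Score a token list; higher means 'predicted native'. Unseen tokens count as 0."""
    return sum(weights.get(tok, 0.0) for tok in tokens)


def occlusion_attributions(sentence, weights):
    """Leave-one-token-out attribution (assumption, not the paper's method).

    Returns (token, score drop) pairs, largest drop first. A large positive
    drop marks a token that contributes to the 'native' prediction and is
    therefore a candidate for pre-editing.
    """
    tokens = sentence.split()
    full = native_score(tokens, weights)
    drops = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        drops.append((tok, full - native_score(reduced, weights)))
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


# Invented toy data: idiomatic "native" sentences vs. plainer "MT-like" ones.
native_docs = ["that plan is a piece of cake", "it rained cats and dogs today"]
mt_docs = ["that plan is very easy", "it rained very heavily today"]

weights = train_log_odds(native_docs, mt_docs)
for token, drop in occlusion_attributions("the exam is a piece of cake", weights):
    print(f"{token}\t{drop:+.3f}")
```

With a linear model like this one, the occlusion score of a token equals its weight, so the sketch only shows the shape of the pipeline. With a neural classifier the same loop would call the model once per occluded input, and the scores would reflect context.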

Anthology ID: 2024.lrec-main.755
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 8605–8616
URL: https://aclanthology.org/2024.lrec-main.755
Cite (ACL): Norizo Sakaguchi, Yugo Murawaki, Chenhui Chu, and Sadao Kurohashi. 2024. Identifying Source Language Expressions for Pre-editing in Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8605–8616, Torino, Italia. ELRA and ICCL.
Cite (Informal): Identifying Source Language Expressions for Pre-editing in Machine Translation (Sakaguchi et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.755.pdf
