An approach to user-centered translation quality assessment of machine translation output: the case of DeepL, Google Translate, and ChatGPT in Czech-to-Spanish translation outputs

Enrique Gutiérrez Rubio

doi:10.5817/ERB2024-4-4

Vol. 45, No. 4 (2024)

Stable URL (handle):
http://hdl.handle.net/11222.digilib/digilib.81314

PDF (Spanish)

Abstract

The widespread use of free Neural Machine Translation (NMT) systems calls for greater effort from the scientific community to evaluate their quality. This article reviews the state of the art and reports the results of a pilot analysis of how satisfied potential users are with these translations on three variables: fluency, grammar, and usability. In the experiment, twenty annotators who are native speakers of Spanish used a Likert scale to rate translations of three Czech texts of different types (one technical, one marketing, and one literary) produced by professional human translators and by DeepL, Google Translate, and ChatGPT. The results show that, although the human translations received the highest ratings, users were highly satisfied with the translations generated by NMT systems designed specifically for translation (DeepL and Google Translate), especially in terms of fluency and usability.
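As a rough illustration of how Likert ratings of this kind might be summarized per system and criterion, the following minimal Python sketch groups scores and reports the mean and median. It is not the author's analysis: the record layout, the `summarize` helper, the 1-5 scale, and all scores are assumptions invented for the example and are not data from the study.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical placeholder ratings (NOT data from the study).
# Record layout (assumed): annotator, system, text type, criterion, score.
# The 1-5 range is an assumption; the abstract does not state the scale width.
RATINGS = [
    ("a01", "Human", "technical", "fluency", 5),
    ("a02", "Human", "technical", "fluency", 4),
    ("a01", "DeepL", "technical", "fluency", 4),
    ("a02", "DeepL", "technical", "fluency", 4),
    ("a01", "ChatGPT", "technical", "fluency", 3),
    ("a02", "ChatGPT", "technical", "fluency", 2),
]


def summarize(ratings):
    """Group scores by (system, criterion); return count, mean and median."""
    groups = defaultdict(list)
    for _annotator, system, _text_type, criterion, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score outside the assumed 1-5 range: {score}")
        groups[(system, criterion)].append(score)
    return {
        key: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for key, scores in groups.items()
    }


SUMMARY = summarize(RATINGS)

if __name__ == "__main__":
    for (system, criterion), stats in sorted(SUMMARY.items()):
        print(f"{system:8s} {criterion:8s} n={stats['n']} "
              f"mean={stats['mean']:.2f} median={stats['median']}")
```

Because Likert scores are ordinal, the median is reported next to the mean; which summary is more appropriate depends on the design of the actual study.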

Keywords:
Neural Machine Translation; translation quality assessment; Czech-Spanish translation; DeepL; Google Translate; ChatGPT

Pages:
65–86
