Enhancing Neural Machine Translation with Direct Preference Optimization Using Human Feedback

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

This paper presents a study on improving the quality of neural machine translation (NMT) for the English-Romanian language pair using Reinforcement Learning from Human Feedback (RLHF) via Direct Preference Optimization (DPO). Despite advancements in NMT, challenges remain, particularly for low-resource languages and personalized translations. By incorporating human feedback, the proposed approach demonstrates improvements in translation accuracy and naturalness. Although traditional metrics, such as BLEU and chrF++, yielded slightly lower scores for the DPO-trained model, human assessments indicate that the DPO-trained model better aligns with human preferences, particularly in everyday conversational contexts.
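As an illustration of the objective the abstract names, the following is a minimal sketch of the standard DPO loss for a single preference pair (a preferred and a rejected translation of the same source sentence). It is not taken from the paper: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example, and the inputs are assumed to be summed token log-probabilities of each translation under the trained model and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) translation pair.

    All arguments are sequence log-probabilities (hypothetical inputs);
    beta scales how strongly the policy is pushed away from the reference.
    """
    # How much more (or less) likely each translation is under the
    # policy than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin: the policy favors the human-preferred translation
    # more than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the human-preferred translation.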
Original language: English
Title of host publication: Information and Communication Technology
Subtitle of host publication: 13th International Symposium, SOICT 2024, Danang, Vietnam, December 13–15, 2024, Proceedings, Part IV
Publisher: Springer Singapore
Pages: 394-405
Number of pages: 12
Edition: 1
ISBN (Electronic): 978-981-96-4291-5
ISBN (Print): 978-981-96-4290-8
Publication status: Published - 26 Apr 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2353

Fields of science

  • 102013 Human-computer interaction
  • 102002 Augmented reality
  • 102006 Computer supported cooperative work (CSCW)
  • 102027 Web engineering
  • 202038 Telecommunications
  • 102021 Pervasive computing
  • 102015 Information systems
  • 102025 Distributed systems
  • 102 Computer Sciences

JKU Focus areas

  • Digital Transformation
  • Sustainable Development: Responsible Technologies and Management
