Research Group Advances AI-Powered Translation of Jordanian Arabic to Modern Standard Arabic

The Machine Learning for Natural Language Processing and Computer Vision research group has made significant progress in its project, "Developing Applications to Correct Jordanian Spoken Arabic to Proper Language Using Machine Learning Techniques," funded by the Jordanian Scientific Research and Innovation Support Fund with a grant of JOD 30,100.

As part of this project, the group has developed an AI-based system that translates the Jordanian Arabic dialect into Modern Standard Arabic (MSA) while correcting common linguistic errors. The system is trained on a custom Jordanian Arabic dataset (JODA) of 59,135 sentences, together with Tashkeela data into which errors were synthetically injected to improve robustness to spelling and morphological variation. By fine-tuning the pre-trained ByT5 language model, the team improved both BLEU scores and Character Error Rate (CER): the model reduces CER to 4.64% on the Test-200 test set and 1.65% on the TSMTS test set.
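For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the system output and the reference text, divided by the number of characters in the reference. The sketch below illustrates that standard definition only; the article does not say how the group computed its scores (for example, whether diacritics were included or whether scores were pooled over the whole test set), so the function names and the pooling choice here are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate for one sentence pair."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER pooled over (reference, hypothesis) pairs.

    Assumption: total edits divided by total reference characters.
    Averaging per-sentence CER is another common convention.
    """
    total_edits = 0
    total_chars = 0
    for ref, hyp in pairs:
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical pair: an MSA reference and a system output that differs
    # in a single letter (hamza missing on the second word).
    reference = "ذهبت إلى المدرسة"
    output = "ذهبت الى المدرسة"
    print(f"CER: {cer(reference, output):.2%}")
```

Under this definition, a CER of 4.64% corresponds to roughly 4 to 5 character edits per 100 reference characters.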

The team will hold a workshop this summer to present its final results and showcase practical applications of the research, including a custom smartphone keyboard and a web portal that let users translate Jordanian dialect text into proper MSA.

The figures shown above illustrate statistics on the collected dataset (JODA) and examples of the system’s translation performance.