Dissertation
A multilingual translation model using named entity recognition and Indonesian as a pivot for local languages / Danang Arbian Sulistyo
Abstract
Translation of low-resource languages faces several challenges, including limited parallel datasets, loss of semantic meaning, and error propagation in machine translation models. Conventional neural machine translation (NMT) cannot handle these limitations optimally. Pivot-based neural machine translation (Pivot-NMT) has proven effective in overcoming dataset scarcity: it uses an intermediate language (here, Indonesian) to bridge translation between regional languages. However, errors propagate across the two translation stages, and the approach cannot accurately preserve named entities. This dissertation integrates Named Entity Recognition (NER) with Pivot-NMT (NER-Pivot-NMT) to address the limitations of previous translation models, particularly in preserving named entities and enhancing processing efficiency. The proposed NER-Pivot-NMT combines NER, for identifying and preserving named entities, with Pivot-NMT, for coping with the scarcity of parallel data in low-resource languages. NER is crucial in the translation process because it automatically identifies and categorizes named entities, including proper nouns, locations, and organizations, in the source language using supervised learning on annotated datasets. The accuracy of NER directly influences overall translation quality, since the recognized entities are carried through translation as placeholders. Once identified, entities are protected from alteration during translation, which averts semantic distortion in the output. Integrating NER with Pivot-NMT is thus designed to preserve named entities throughout the multi-step translation process.
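The placeholder mechanism described above can be illustrated with a minimal sketch. It is not the dissertation's implementation. The function names, the `__ENT0__` placeholder format, and the word-for-word lexicon "translators" are all hypothetical stand-ins for trained NER and NMT models, and the toy lexicons are not linguistically verified. The entity list is passed in by hand where a real system would obtain it from the NER component.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def mask_entities(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Swap each recognized entity for an opaque placeholder like __ENT0__."""
    mapping: Dict[str, str] = {}
    # Longest entities first, so "Pasar Baru" is masked before a shorter "Pasar".
    for ent in sorted(dict.fromkeys(entities), key=len, reverse=True):
        if ent in text:
            tag = f"__ENT{len(mapping)}__"
            text = text.replace(ent, tag)
            mapping[tag] = ent
    return text, mapping


def unmask_entities(text: str, mapping: Dict[str, str]) -> str:
    """Put the original entity strings back in place of their placeholders."""
    for tag, ent in mapping.items():
        text = text.replace(tag, ent)
    return text


def lexicon_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy word-for-word 'model'; unknown tokens and placeholders pass through."""
    def translate(sentence: str) -> str:
        return " ".join(lexicon.get(tok.lower(), tok) for tok in sentence.split())
    return translate


def pivot_translate(text: str, entities: List[str],
                    src_to_pivot: Translator, pivot_to_tgt: Translator) -> str:
    """Mask entities, translate source -> pivot -> target, then restore them."""
    masked, mapping = mask_entities(text, entities)
    pivot = src_to_pivot(masked)     # stage 1: source language -> Indonesian
    target = pivot_to_tgt(pivot)     # stage 2: Indonesian -> target language
    return unmask_entities(target, mapping)


# Toy lexicons for illustration only (assumed, not linguistically verified).
mad_to_ind = lexicon_translator({"sengkok": "saya", "entar": "pergi", "ka": "ke"})
ind_to_jav = lexicon_translator(
    {"saya": "aku", "pergi": "lunga", "ke": "menyang", "baru": "anyar"}
)

source = "sengkok entar ka Pasar Baru"
unprotected = ind_to_jav(mad_to_ind(source))   # the place name gets translated
protected = pivot_translate(source, ["Pasar Baru"], mad_to_ind, ind_to_jav)
print(unprotected)
print(protected)
```

In this toy run the unprotected pipeline rewrites part of the place name in the second stage, while the masked version carries the placeholder through both stages and restores the entity exactly as it appeared in the source. A trained NMT model may still corrupt or drop placeholders, so a real system would need a check for that.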
Preserving entities in this way not only keeps the translated text coherent but also reduces error propagation, which is particularly important in low-resource settings where parallel corpora are scarce and the potential for semantic loss is significant. Effective NER substantially improves the system's robustness in delivering high-quality translation. Experimental results indicate that NER-Pivot-NMT outperforms the baseline models NMT, Pivot-NMT, and NER-NMT on key evaluation metrics. It achieves a BLEU score of 33.7, which is 4.0 points higher than NER-NMT (29.7) and 4.8 points higher than Pivot-NMT (28.9), demonstrating superior translation accuracy. In Entity Preservation Rate (EPR), NER-Pivot-NMT reaches 92.1%, a significant improvement over NER-NMT (85.1%) and Pivot-NMT (78.4%). The model also improves NER precision (91.3%), recall (89.7%), and F1-score (90.5%), outperforming NER-NMT on all three metrics. These gains come at a cost: inference latency rises to 2.6 s/sentence, compared with NER-NMT (1.4 s/sentence), Pivot-NMT (2.1 s/sentence), and NMT (0.9 s/sentence). The model also has more parameters (200M) and a longer training time (22 h) than its counterparts, but the trade-off yields a more reliable and scalable solution for translating regional languages such as Madurese and Javanese. This research primarily contributes a more precise and efficient solution for NMT in low-resource language contexts. It also lays the groundwork for further work on pivot-free multilingual models, dataset optimization, and the integration of NMT with other NLP technologies such as speech-to-text. This approach may support language preservation and advancement in the digital era.
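The reported NER metrics can be sanity-checked with a short sketch. The F1-score is the standard harmonic mean of precision and recall. The abstract does not define how EPR is computed, so the `entity_preservation_rate` function below is an assumption: it counts a source entity as preserved when its exact string appears in the translated output. The example sentences are invented toy data.

```python
from typing import List


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_preservation_rate(source_entities: List[List[str]],
                             outputs: List[str]) -> float:
    """Assumed EPR: share of source entities found verbatim in the output."""
    total = 0
    preserved = 0
    for entities, output in zip(source_entities, outputs):
        for entity in entities:
            total += 1
            if entity in output:
                preserved += 1
    return preserved / total if total else 0.0


# Precision and recall as reported in the abstract, in percent.
reported_f1 = f1_score(91.3, 89.7)
print(round(reported_f1, 1))

# Invented toy data: three source entities, one altered in translation.
toy_entities = [["Surabaya"], ["Budi", "Pasar Baru"]]
toy_outputs = ["aku lunga menyang Surabaya", "Budi lunga menyang Pasar anyar"]
toy_epr = entity_preservation_rate(toy_entities, toy_outputs)
print(f"{toy_epr:.1%}")
```

The harmonic mean of the reported precision and recall rounds to the reported F1-score of 90.5%, so the three NER figures are mutually consistent. Exact string matching is the strictest reasonable reading of EPR; a definition that accepts transliterated or inflected entity forms would give higher values.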