The Natural Language Processing and Portuguese-Chinese Machine Translation Laboratory (NLP2CT) of the University of Macau (UM) Faculty of Science and Technology (FST) participated for the first time in the Metrics Shared Task at the Sixth Conference on Machine Translation (WMT 2021), jointly with the translation team from Alibaba DAMO Academy. The joint team won first place in five of the eight automatic translation quality evaluation tracks, as well as two second places and one fifth place, outperforming competitors such as Google and Unbabel.

With the rapid development of machine translation in recent years, automatic evaluation of translation quality has become an indispensable part of the field. Faced with enormous volumes of machine-translated text, researchers need to discover errors quickly, tune translation system parameters, evaluate system performance, and compare different translation systems. These needs have made automatic translation quality evaluation a highly active research direction.
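Classic automatic metrics estimate quality from surface overlap between a system output and a human reference. The sketch below is a deliberately simplified, sentence-level BLEU-style metric (not the official BLEU implementation, and far simpler than learned metrics such as RoBLEURT), shown only to illustrate the idea:

```python
from collections import Counter
import math

def ngram_precision(hyp, ref, n):
    """Fraction of hypothesis n-grams also found in the reference (clipped counts)."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

def simple_bleu(hyp, ref, max_n=4):
    """Geometric mean of 1..max_n n-gram precisions with a brevity penalty."""
    hyp, ref = hyp.split(), ref.split()
    precisions = [ngram_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_avg)

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical -> 1.0
```

Surface-overlap metrics like this are cheap but correlate imperfectly with human judgment, which is why the shared task also evaluates learned, neural metrics.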

RoBLEURT, the automatic translation quality evaluation model used in this competition, was jointly developed by UM and DAMO Academy. The model combines multi-stage pre-training to strengthen its learning capability, a robustness-enhancing training strategy based on pseudo-data construction, and innovative techniques such as multi-model, multi-fold joint cross-validation and reranking. RoBLEURT was evaluated in eight translation directions. It won first place in one spoken-language track (Chinese-English) and four news tracks (Chinese-English, Czech-English, German-English, and Japanese-English), outperforming rival models from Google and Unbabel. In addition, RoBLEURT won second place in the two low-resource news tracks (Hausa-English and Icelandic-English), for which no corresponding training resources were available. The competition showcased the university's cutting-edge research in automatic translation quality evaluation, strengthened the exchange of experience and knowledge with high-tech companies, and gave students more opportunities to engage with frontier research.
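The multi-fold joint cross-validation and reranking step can be pictured as averaging the quality scores predicted by models trained on different data folds, then reordering candidates by the averaged score. A minimal sketch under that assumption (the function and variable names are illustrative, not the actual RoBLEURT code):

```python
def ensemble_scores(fold_scores):
    """Average per-candidate quality scores across k fold models.

    fold_scores: list of k lists, each holding one fold model's score
    for every candidate (same candidate order in every list).
    """
    k = len(fold_scores)
    n = len(fold_scores[0])
    return [sum(scores[i] for scores in fold_scores) / k for i in range(n)]

def rerank(candidates, scores):
    """Return candidates sorted by descending ensemble score."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [cand for cand, _ in ranked]

# Three hypothetical fold models scoring the same two candidate translations:
avg = ensemble_scores([[0.7, 0.4], [0.6, 0.5], [0.8, 0.3]])
print(rerank(["candidate A", "candidate B"], avg))  # candidate A ranks first (mean ~0.7 vs ~0.4)
```

Averaging over fold models reduces the variance of any single model's judgment, which is the usual motivation for this kind of ensembling.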

NLP2CT mainly conducts research on machine learning and natural language processing, including deep learning, machine translation, dialogue systems, and natural language inference. It has established extensive cooperation with many well-known research companies. To date, it has published more than 100 research articles in leading international venues, including the Annual Meeting of the Association for Computational Linguistics (ACL), the International Conference on Learning Representations (ICLR), the AAAI Conference on Artificial Intelligence (AAAI), the International Joint Conference on Artificial Intelligence (IJCAI), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). The RoBLEURT model was funded by the Science and Technology Development Fund of the Macao SAR (file number: 0101/2019/A2) and UM (file number: MYRG2020-00054-FST).


Source: University of Macau (UM) and Faculty of Science and Technology (FST) news releases, available in English and Chinese.