Issue #14 - Neural MT: A Case Study
Introduction
As a machine translation provider, one of the questions we’ve been asking ourselves most frequently over the last 18 months is: “When should we switch an existing production deployment to Neural MT?”
While all new projects are built using Neural MT, a certain “if it ain’t broke, don’t fix it” mentality can creep in when it comes to replacing something that is already working well. Despite the promise of improved output, and the return on investment that improvement would yield, there is still some risk and uncertainty in changing an existing, stable workflow.
That said, when the chance arose in early 2017 to test Neural MT on a project that had previously been unsuccessful with Statistical MT, it was the perfect opportunity to put the new technology into practice.
Patent translation for CJK
In a large production workflow, we process roughly 22 million words of patent translation per month from Simplified Chinese and Japanese into English. The output is raw machine translation, which is published directly to end users for information and search.
Strict quality requirements must be met before a language goes into production and, once in production, output is regularly sampled and reviewed.
Meeting quality requirements
Patent documents are reviewed and scored from 1 to 5 for scientific accuracy and comprehensibility, i.e. whether the reader can understand the technical content of the patent document. Translations are rated as follows:
- Unusable: incomprehensible, cannot be published.
- Poor: cannot understand the key topic of the patent.
- Adequate: translation is sufficient for understanding and publishing.
- Good: high-quality translation, all key themes accurate.
- Excellent: human-quality translation.
In order for a language to be accepted for production, and to remain in production, at least 90% of all translations must be ranked Adequate or better. While this criterion was met for Simplified Chinese and Japanese with Statistical MT, it was never met for Korean. Not even close.
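To make the acceptance criterion concrete, here is a minimal sketch of how such a check could be run over a batch of sampled ratings. It assumes each sampled document carries a 1-5 score mapped to the rubric above; the names and sample numbers are purely illustrative, not our production tooling.

```python
# Minimal sketch of the acceptance check described above.
# Assumes each sampled document carries a 1-5 score mapped to the rubric
# (1 = Unusable ... 5 = Excellent); names and data are illustrative only.

RATING_LABELS = {1: "Unusable", 2: "Poor", 3: "Adequate", 4: "Good", 5: "Excellent"}
ADEQUATE_OR_BETTER = 3   # minimum score that counts toward the threshold
THRESHOLD = 0.90         # at least 90% must be Adequate or better

def meets_production_threshold(scores):
    """Return True if the sampled scores satisfy the 90% criterion."""
    if not scores:
        return False
    acceptable = sum(1 for s in scores if s >= ADEQUATE_OR_BETTER)
    return acceptable / len(scores) >= THRESHOLD

# Example: 18 of 20 sampled documents rated Adequate or better -> 90%, passes.
sample = [5, 4, 4, 3, 3, 3, 4, 5, 3, 4, 3, 3, 2, 4, 5, 3, 4, 3, 1, 4]
print(meets_production_threshold(sample))
```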
With the advent of Neural MT, this seemed like the perfect opportunity to try again and see if the technology could stand up to the demands of Korean to English, a notoriously challenging language pair.
Deploying Neural MT
Using the same training data as the SMT engines, we built a Korean-to-English Neural MT engine based on an early version of Recurrent Neural Network technology (think Q1 2017). The results, and how they compared to the SMT engines, can be seen in the graph below (Phases 1-4).
There was a marked improvement, as we might have expected, but we were still some way off the quality threshold. Over time, we tried further approaches as our technology matured and our R&D bore fruit. Techniques we have described in previous entries in this series, such as data cleaning and preparation, byte-pair encoding (which works well for character-based languages), and terminology handling, all made a positive impact, to the extent that, by evaluation Phase 6, we reached the quality threshold and were able to deploy the engine in production.
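For readers unfamiliar with byte-pair encoding, the sketch below illustrates the core idea using the standard textbook merge procedure (Sennrich et al.): the most frequent adjacent symbol pair is merged repeatedly, so common character sequences become single subword units. This is a toy illustration with made-up data, not the code or vocabulary used in the production engine.

```python
import re
from collections import Counter

# Toy illustration of byte-pair encoding (BPE): repeatedly merge the most
# frequent adjacent symbol pair so that frequent character sequences become
# single subword units. Textbook algorithm, not our production code.

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the space-separated vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the given symbol pair in the vocabulary."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words represented as space-separated characters with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):  # number of merge operations (the vocabulary-size knob)
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = pairs.most_common(1)[0][0]
    vocab = merge_pair(best, vocab)

print(vocab)  # frequent sequences such as "est</w>" end up as single symbols
```

The merged symbols become the vocabulary the MT engine actually works with; for character-based languages, the same mechanism keeps that vocabulary to a manageable size while preserving frequent multi-character units.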
This had the net effect of allowing our client to divert a further 1.5 million words per month from a slow, costly, human-led workflow to full automation.
In summary
This case study demonstrates two things. The first, and more obvious, is the out-of-the-box power of Neural MT to simply learn how to produce better translations.
The second is that it is opening doors to languages and use cases that were previously not viable for machine translation. New development and testing are still required for each scenario, but without doubt, Neural MT is creating new opportunities to disrupt enterprise localisation workflows.