Issue #29 - Improving Robustness in Neural MT

Dr. Rohit Gupta 28 Mar 2019
The topic of this blog post is robustness: how Neural MT handles noisy input, and how it can be improved.

Introduction

Despite the high level of performance of current Neural MT engines, robustness remains a significant issue when it comes to unexpected, noisy input: when the input is not clean, output quality drops drastically. In this issue, we take a look at the impact of various types of 'noise' on translation quality, and we discuss techniques proposed by Vaibhav et al. (2019) to improve the robustness of an NMT system.

Noise and its Impact on Translation Quality

Any piece of text can contain several different types of noise, or errors, depending on how it was written. These include, but are not limited to:
  • spelling or typographical errors (receive vs recieve)
  • word omission
  • word insertion
  • repetitions
  • grammatical errors (a ton of vs a tons of)
  • spoken language (want to vs wanna)
  • slang (to be honest vs tbh)
  • proper nouns
  • dialects
  • jargon
  • emojis
  • obfuscated profanities (f*ing)
  • OCR-related errors ([4] vs 14], study vs st ud y)
  • inconsistent capitalisation (change vs chaNGE)
Vaibhav et al. (2019) measured the effect of four of these error types on an NMT system. The figure below shows the drop in BLEU score as the percentage of introduced noise increases.
[Figure: The impact of varying the amount of Synthetic Noise Induction on BLEU]
As we can see, obfuscated profanity is the biggest cause of the quality drop, followed by emoticons, spelling, and grammar.

How to make NMT robust?

To improve the robustness of a system, we can fine-tune it on a small set of noisy data, where available. Beyond that, we can also generate synthetic noisy data and train the system on it. Vaibhav et al. (2019) proposed two noise induction techniques, described below.
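As a rough illustration of the fine-tuning step itself, here is a minimal sketch using the Hugging Face transformers library and a pretrained Marian fr-en model. Both the library and the model name are assumptions made for this example; the paper uses its own NMT training setup.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

# Assumed pretrained model for illustration; not the paper's system.
model_name = "Helsinki-NLP/opus-mt-fr-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# A toy noisy parallel set, standing in for ~20K noisy sentence pairs.
noisy_pairs = [("jai vu ca mdr", "i saw it lol")]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for src, tgt in noisy_pairs:
        batch = tokenizer(src, text_target=tgt, return_tensors="pt")
        loss = model(**batch).loss  # cross-entropy on the noisy pairs
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```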

Synthetic Noise Induction

For every token, they introduce one of four types of noise with some probability, on both the French and English sides of the corpus. The probabilities for each error type were as follows: spelling (0.04), profanity (0.007), grammar (0.015), and emoticons (0.002). To simulate spelling errors, they randomly add or drop a character in a given word. For grammar errors and profanity, they randomly select a stop word or an expletive and insert it on one side, with its translation inserted on the other. For emoticons, they randomly select an emoticon and insert it on both sides.
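To make the procedure concrete, here is a minimal Python sketch. The word lists are toy stand-ins for the paper's actual resources, and, for brevity, the insertion probabilities are applied once per sentence pair rather than per token, so treat this as an illustration rather than a faithful reimplementation.

```python
import random

# Toy resources; the paper uses its own lists of stop words, expletives,
# and emoticons, with translations for the paired French side.
STOP_WORDS = [("the", "le"), ("and", "et")]
EXPLETIVES = [("f*ing", "p*tain")]
EMOTICONS = [":)", ":(", ";)"]
P_SPELL, P_PROFANITY, P_GRAMMAR, P_EMOTICON = 0.04, 0.007, 0.015, 0.002

def spell_noise(word):
    """Randomly add or drop one character to simulate a typo."""
    i = random.randrange(len(word))
    if len(word) > 1 and random.random() < 0.5:
        return word[:i] + word[i + 1:]  # drop a character
    return word[:i] + random.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]

def add_noise(en, fr):
    """Return a noisy version of one English/French sentence pair."""
    en, fr = en.split(), fr.split()
    # Spelling noise: applied per token, independently on each side.
    en = [spell_noise(w) if random.random() < P_SPELL else w for w in en]
    fr = [spell_noise(w) if random.random() < P_SPELL else w for w in fr]
    # Paired insertions: an expletive or stop word goes into one side and
    # its translation into the other; emoticons are identical on both sides.
    for pairs, p in [(EXPLETIVES, P_PROFANITY), (STOP_WORDS, P_GRAMMAR)]:
        if random.random() < p:
            e, f = random.choice(pairs)
            en.insert(random.randrange(len(en) + 1), e)
            fr.insert(random.randrange(len(fr) + 1), f)
    if random.random() < P_EMOTICON:
        emo = random.choice(EMOTICONS)
        en.insert(random.randrange(len(en) + 1), emo)
        fr.insert(random.randrange(len(fr) + 1), emo)
    return " ".join(en), " ".join(fr)

# Example: induce noise on a toy sentence pair.
print(add_noise("i want to be honest", "je veux etre honnete"))
```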

Noise Generation through Back-Translation

To induce back-translation noise, they train MT systems in both directions (en-fr and fr-en) on data from a different domain. Clean text is then passed through these systems to produce a noisy version of it. They also added a subset of the noisy data from Michel & Neubig (2018) when training these intermediate systems; this brought the generated text much closer in style to the noisy text used for testing (a similar, but not identical, dataset). The figure below describes noisy text generation with back-translation, with English text as the input (French text can be processed in the same way).
[Figure: Noisy text generation with back-translation; English text as an input]
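In code, the generation step reduces to a round trip through the two intermediate systems. The sketch below is hypothetical: `en2fr` and `fr2en` stand in for the translate functions of whichever NMT toolkit is used, and the training of the intermediate systems is not shown.

```python
from typing import Callable, List

def round_trip(sentences: List[str],
               en2fr: Callable[[str], str],
               fr2en: Callable[[str], str]) -> List[str]:
    """Pass each English sentence through en->fr and then fr->en to obtain
    a noisy version of it. French input works symmetrically, with the two
    systems swapped."""
    return [fr2en(en2fr(s)) for s in sentences]

# Hypothetical usage, assuming a toolkit exposing translate functions:
# noisy_en = round_trip(clean_en, my_en_fr.translate, my_fr_en.translate)
```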

Does it work?

The testing and tuning were performed on data from Michel & Neubig (2018). Fine-tuning the NMT system on 20K noisy sentences improves the BLEU score from 14.42 to 23.74. On top of this, the synthetic noise induction described above improves results to 25.05 BLEU, and noise generation through back-translation improves them to 25.75 BLEU.
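For reference, corpus-level BLEU scores like these can be reproduced with a scorer such as sacrebleu; the post does not name the exact tooling used, so this choice is an assumption.

```python
import sacrebleu  # assumed tooling; the post does not name its BLEU scorer

hypotheses = ["i saw it to be honest"]  # system outputs, one per segment
references = [["i saw it tbh"]]         # one inner list per reference set
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```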

In summary

We can see that tuning and training on similar noisy data helps significantly, and a further improvement can be obtained by synthetically generating noisy data. As you can imagine, these types of noise are particularly prevalent in user-generated content such as emails, online reviews, and social media. The techniques described above should therefore have good impact and value for use cases that require translation of such content.
Tags: Robustness

Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist