How good can MT get?

Over at Slate Star Codex, Scott Alexander has a good post about the future of AI, but I need to nitpick these speculations about what a “future superintelligent Google Translate” could do:

For example, take Google Translate. A future superintelligent Google Translate would be able to translate texts faster and better than any human translator, capturing subtleties of language beyond what even a native speaker could pick up. It might be able to understand hundreds of languages, handle complicated multilingual puns with ease, do all sorts of amazing things.

This description raises interesting questions about what the best possible machine translation (MT) would be like. Let’s go through it point by point:

Is MT faster than any human translator? Yes.

Can it “understand hundreds of languages”? No. The problem with MT is that it doesn’t actually “understand” any languages in the sense humans do. It matches patterns. For more on this topic, see Scott Spires’ post “Machine translation and savant syndrome” or my posts “Senta spinnt” and “Easy for humans, hard for computers.” (As an update to that post, I should say MT programs are getting better at dealing with common typos. But I think my basic point still stands.) What would have to happen for MT to truly understand a language? It would have to be an entity as complex as Star Trek’s Data – something that moves through the world, interacts with people, has experiences (including experiences of real-world communication and miscommunication) and personal memories – and even he has trouble sometimes.

Could the best possible future MT “capture subtleties of language beyond what even a native speaker could pick up?”

Seems unlikely.

Now, there are aspects of language that machines measure more accurately than humans. For example, they can measure the resonant frequencies of the phonemes you produce, in cycles per second. An AI can store tons of vocab, which means it can make very precise matches very quickly. An AI with a huge data set of spoken language could analyze very subtle aspects of speech most humans would miss. It might be able to conclude from your speech that you’ll be diagnosed with a neurological disorder next year, or guess your age with almost perfect accuracy. There’s probably already an AI that does this kind of thing.
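
(A side note on what that kind of measurement looks like: here is a minimal sketch, assuming nothing more than a plain 16 kHz recording, of putting an exact figure in cycles per second on something a listener only hears as pitch. The 150 Hz test signal is just a stand-in for real speech.)

    import numpy as np

    def estimate_f0(samples, sample_rate, f_min=50.0, f_max=400.0):
        """Toy estimate of the fundamental frequency of a voiced sound, in hertz."""
        samples = samples - samples.mean()
        corr = np.correlate(samples, samples, mode="full")
        corr = corr[len(corr) // 2:]            # keep non-negative lags only
        lag_min = int(sample_rate / f_max)      # shortest plausible pitch period
        lag_max = int(sample_rate / f_min)      # longest plausible pitch period
        best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
        return sample_rate / best_lag

    sample_rate = 16000
    t = np.arange(0, 0.05, 1 / sample_rate)
    fake_voice = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
    print(estimate_f0(fake_voice, sample_rate))   # ~150, the pitch of the test signal

None of this is translation, of course; it’s the kind of low-level acoustic bookkeeping machines simply do better than ears.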

But what would it take for an artificially intelligent translator of written texts to pick up more from a text than a human could, or capture more of its subtleties? What would this look like, and what inputs would be needed? What would be in your training data set and what instructions would you give the AI?

To start with, you could feed it tons and tons of books. For German-to-English MT, you’d take every published book that exists in both English and German, and present them to your AI in pairs. For example, DeepL would not have made the error in “Senta spinnt” if its training data had included the original German libretto of The Flying Dutchman and a good English translation of same. An AI trained on all available pairs of translated literary classics would outperform human translators at identifying literary quotations, and if you gave it Schlegel’s German version of Hamlet, it would recognize that for what it is and give you Shakespeare’s Hamlet rather than this:

To be or not to be; that is the question:

Obs nobler in the mind, the arrow and spin

Endure the angry fate or,

Wielding against a sea of plagues,

By resistance they end? Dying – sleeping –

Nothing else! And to know that a sleep

The heartache and the thousand blows ends,

Our meat’s heritage, it’s a goal

To wish for the most intimate. Dying – sleeping –

Sleep! Maybe dream too! Yes, there is.
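
The quotation-recognition part, by the way, wouldn’t even require intelligence, just recall. Here’s a minimal sketch of what I mean, a translation-memory lookup; I’m assuming nothing about how DeepL actually works, and the two hard-coded pairs stand in for a full parallel corpus of literary classics:

    import difflib

    # Toy stand-in for "every published book that exists in both English and German":
    # a published German rendering paired with the English original.
    ALIGNED_PAIRS = [
        ("Sein oder Nichtsein, das ist hier die Frage:",
         "To be, or not to be, that is the question:"),
        ("Der Rest ist Schweigen.",
         "The rest is silence."),
    ]

    def recall_published_translation(german_sentence, threshold=0.85):
        """If the input closely matches a known source, quote its published translation."""
        best_english, best_score = None, 0.0
        for german, english in ALIGNED_PAIRS:
            score = difflib.SequenceMatcher(None, german_sentence, german).ratio()
            if score > best_score:
                best_english, best_score = english, score
        return best_english if best_score >= threshold else None   # None: translate it fresh

    print(recall_published_translation("Sein oder Nichtsein; das ist hier die Frage"))
    # -> To be, or not to be, that is the question:

A real system would index whole libraries and sit in front of the translation engine, but the principle is the same: recognize what you’ve already seen.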

So MT could get better at recognizing existing translations. But of course, a large set of training data also helps MT to create good new translations of its own. DeepL has access to masses of web content as training data, which is why it’s so good at translating boilerplate:

For Germany’s CDU, the following applies: we must resolutely combat climate change and implement the Paris Agreement consistently. Strong climate protection legislation is the foundation on which we can credibly achieve our goals. We take this seriously and clarify how, for example, a “CO2 cap” with a binding climate protection path in the form of a national certificate trade could be implemented in the near future, particularly in the areas of transport and buildings. [press release translated by DeepL.]

That’s decent, but it’s not better than what a human would do, and could it ever be? If DeepL analyzed all the press releases ever written, in what way would its output be consistently better than the best human translators? I think it would be a slightly improved version of what it is now: much faster, and almost as good.

What specific aspects of MT could get substantially better than they are now? One area with strong potential for improvement is matching styles to time periods. I can imagine a future MT where you could select a time period for the text you’re putting in, so that, say, the MT wouldn’t translate “Mama und Papa” from a nineteenth-century text as “Mom and Dad”. I can also imagine one that would convert German footnotes into MLA or APA style in English. Both of those are useful but they’re also things humans already do, so again, the MT wouldn’t be outperforming humans.
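
To show how modest that feature really is, here’s a hypothetical sketch of the time-period option; no existing MT service offers such a setting, and the override table is invented for illustration:

    # Hypothetical period-specific overrides applied to the default English draft.
    PERIOD_OVERRIDES = {
        "19th-century": {
            "Mom and Dad": "Mamma and Papa",   # the default rendering of "Mama und Papa"
            "okay": "very well",
        },
        "contemporary": {},                    # leave the defaults alone
    }

    def apply_period_style(english_draft, period):
        """Rewrite default renderings to suit the selected time period."""
        for default, period_form in PERIOD_OVERRIDES.get(period, {}).items():
            english_draft = english_draft.replace(default, period_form)
        return english_draft

    print(apply_period_style("Mom and Dad were waiting in the drawing room.",
                             "19th-century"))
    # -> Mamma and Papa were waiting in the drawing room.

In practice the setting would have to steer the model itself rather than patch its output, but either way it only automates a judgement human translators already make.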

As far as I can tell, there’s a ceiling for MT improvement set by the MT’s total lack of knowledge and experience. It doesn’t know anything and it’s never done anything, been anywhere, or met anyone. It never will. It doesn’t have a theory of mind enabling it to guess whether the average reader will find a given sentence easy or hard to understand and to adjust its phrasing accordingly. It doesn’t know that certain turns of phrase might annoy certain kinds of people. It doesn’t know whether its translation of an ad grabs people’s attention or not. It doesn’t know who is feeding it a text or what they plan to do with the translation. (It could have some information about those issues – e.g., a metric that says “NSFW” is an attention-grabbing term – but it wouldn’t have the understanding required for human-style judgement calls.) What it can do is analyze lots of data about which words and phrases in different languages correspond to each other under what textual circumstances and…isn’t that it? Apart from speed and stored vocab, in what specific way could it actually exceed human translating ability? That’s an honest question, so if you have an answer, please comment.
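
To make that last point concrete, here’s a toy version of the correspondence analysis, with four invented sentence pairs standing in for a real training set. The right word pairing falls out of the counts alone, with no understanding anywhere in the loop:

    from collections import Counter
    from itertools import product

    PARALLEL_SENTENCES = [
        ("der Hund schläft", "the dog sleeps"),
        ("die Katze schläft", "the cat sleeps"),
        ("der Hund frisst", "the dog eats"),
        ("sie schläft", "she sleeps"),
    ]

    # Count how often each German word appears in the same sentence pair
    # as each English word.
    cooccurrence = Counter()
    for german, english in PARALLEL_SENTENCES:
        for g, e in product(german.split(), english.split()):
            cooccurrence[(g, e)] += 1

    # The most frequent English partner of "schläft" emerges from the counts alone.
    candidates = {e: n for (g, e), n in cooccurrence.items() if g == "schläft"}
    print(max(candidates, key=candidates.get))   # -> sleeps

Neural systems are vastly more sophisticated than this, but the signal they exploit is still statistical correspondence in bilingual text.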

On to the last point: Could this future MT handle complicated multilingual puns with ease? This assertion is really interesting because puns are among the hardest aspects of translation. And they leave the translator a broad scope for action, ranging from essentially doing nothing to making really wild choices. I wish Mr. Alexander would come over here and explain how he imagines such a thing would be accomplished.

There’s a lot to say about puns and this post is already too long. So I’ll write a follow-up post about puns and jokes…I have plenty of material. It’ll be fun. Until then, please comment with your ideas about how good MT could really be.

5 comments

  1. One of the ways I see MT offering passable target-language translations is if the source text is written in such a way that it is intended to be translated and if rules are used. Until that happens there will be some contextual howlers. Whenever MT providers such as Google Translate and DeepL are discussed, it is worth bearing in mind that many government settings will not allow their use for translating texts in an official context. I would quite reasonably not be permitted to use any solution that takes the text outside of our IT environment. I’ve also recently rejected machine translations sent to me for a spot of PEMT (post-editing of machine translation) from other bodies, after commenting on the first 2-3 pages of, say, a 100-page document.

  2. “It doesn’t know that certain turns of phrase might annoy certain kinds of people”

    This is particularly important when you’re doing any sort of business translation (esp. in today’s hyper-sensitive corporate environment), and obviously in diplomacy or anything similar thereto.

    I was faced with this question yesterday. A text literally said that a well-known company had a high opinion of a country as a place to invest. I thought about saying it exactly that way, but instead of “high” I said “lofty.” That’s very literary and, in some contexts, could be perceived as pretentious.

    I thought it was worth a go, especially since the lawyer who wrote the original text can always change it if it gets too flowery.
