Winograd’s dilemma

Via the Dish, I read this hype-ridden piece on the “astonishing” progress of Google Translate, and this far more sober piece by Lance Ulanoff. Ulanoff writes,

“Initially… the translation was perfect, but when I started to speak in longer sentences, it basically fell apart and got a lot of it wrong. As I tested with others who spoke in Greek, German and French, we noticed the same thing. We could never completely rely on Google translate to get the words right.”

This is, I think, a constant dynamic in tech circles: astonishment, followed by growing dissatisfaction and frustration. So what makes machine translation difficult?

I could go on about these issues forever. (Put a beer in me and I’ll talk your ear off about why both the Chomskyan approach and the Peter Norvig approach are at an impasse when it comes to actually decoding language as a system.) Let’s pick out one particular problem for true machine translation: the dilemma put forth by Terry Winograd, professor of computer science at Stanford. (I first read about this in this fantastic piece on AI by Peter Kassan.) Winograd proposed two sentences:

The committee denied the group a parade permit because they advocated violence.

The committee denied the group a parade permit because they feared violence.

There’s one essential step to decoding these sentences that’s more important than any other: deciding what the “they” refers to. (In linguistics, we call this coindexing.) There are two potential within-sentence nouns that the pronoun could refer to, “the committee” and “the group.” (Note that both are singular and “they” is plural, so one thing machine translation has to overcome is the limits of formalist grammar!) These sentences are structurally identical, and the two verbs are grammatically as similar as they can be. The only difference between them is semantic. And semantics is a different field from syntax, right? After all, Chomsky teaches us that a sentence’s grammaticality is independent of its meaning. That’s why “colorless green ideas sleep furiously” is nonsensical but grammatical, while “gave Bob apples I two” is ungrammatical and yet fairly easily understood.

But there’s a problem here: the coindexing is different depending on the verb. In the first sentence, the vast majority of people will say that “they” refers to “the group.” In the second sentence, the vast majority of people will say that “they” refers to “the committee.” Why? Because of what we know about committees and parades and permitting in the real world. Because of semantics. A syntactician of the old school will simply say “the sentence is ambiguous.” But for the vast majority of native English speakers, the coindexing is not ambiguous. In fact, for most people it’s trivially obvious. And in order for a computer to truly understand language, it has to have the same certainty about the coindexing that your average human speaker has. For that to happen, it has to have a theory of the world, and that theory of the world has to include an understanding not only of committees and permits and parades, but of apples and honor and schadenfreude and love and ambiguity and paradox….
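To make the problem concrete, here is a toy sketch in Python of a purely syntactic “nearest antecedent” heuristic. (The candidate noun phrases are hand-listed and the heuristic is a deliberate oversimplification; this is not how any real system works.) Because the two sentences are structurally identical, it necessarily gives the same answer for both:

```python
# Toy illustration: a purely syntactic heuristic cannot tell
# Winograd's two sentences apart, because they differ only in meaning.
# The candidate list and the heuristic are hypothetical simplifications.

def nearest_antecedent(sentence, pronoun="they"):
    """Resolve a pronoun to the nearest preceding candidate noun phrase."""
    candidates = ["the committee", "the group"]  # hand-listed NPs
    before = sentence.lower().split(pronoun)[0]
    # Pick whichever candidate NP appears last before the pronoun.
    return max(candidates, key=lambda np: before.rfind(np))

s1 = "The committee denied the group a parade permit because they advocated violence."
s2 = "The committee denied the group a parade permit because they feared violence."

for s in (s1, s2):
    print(nearest_antecedent(s))  # prints "the group" both times
```

Human readers flip the answer for the second sentence; the heuristic, which never looks at what the verbs mean, cannot.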

Some might say that this is a particularly bad example to pick with Google Translate, because it is a probabilistic engine; rather than trying to parse the syntax–semantics interface for these sentences, it would merely see how these sentences or parts of sentences have been translated in the past, assign a certain probability to a given set of translations being correct, and act accordingly. (In terms of pure translation, anyway, it would only have to faithfully provide an equivalent language-specific reading of the English text to speakers of other languages, but I’m afraid in some languages that would entail having to coindex the pronoun itself.) That’s true — but it’s precisely that probabilistic nature, that reliance on chance, that leaves Ulanoff and his partners frequently unable to understand each other past a certain level of complexity. In order to get past that — in order to go from pretty good to legitimately astonishing — I believe machine translation would have to move beyond Bayesian probabilistic approaches and towards developing an actual theory of the world for its models, which would entail a functioning theory of mind. Outside of Doug Hofstadter, hardly anyone is even trying to do that. (As my friend Alex Waller says, “It’s okay to discuss the pros and cons of AI, but we need to admit actual AI will almost certainly not exist in our lifetimes.”) So for now we’ll have to settle for OK and recognize that there’s always going to be the odd WTF-awful translation popping up, because of what the human language capacity can do and computers can’t.
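For readers curious what a “probabilistic engine” amounts to: statistical MT is classically framed as a noisy-channel model, picking the translation e that maximizes P(f|e) · P(e). Here is a minimal sketch, with entirely invented toy probabilities standing in for tables a real system would estimate from millions of sentence pairs:

```python
# Minimal noisy-channel sketch of statistical MT.
# All probabilities below are invented toy numbers, not real model output.

# Toy translation model: P(French sentence | English candidate)
translation_model = {
    ("le chat dort", "the cat sleeps"): 0.6,
    ("le chat dort", "the cat is sleeping"): 0.3,
}

# Toy language model: P(English candidate), from monolingual frequency
language_model = {
    "the cat sleeps": 0.2,
    "the cat is sleeping": 0.5,
}

def decode(f, candidates):
    """Return the candidate e maximizing P(f|e) * P(e)."""
    return max(candidates,
               key=lambda e: translation_model.get((f, e), 0.0)
                             * language_model.get(e, 0.0))

best = decode("le chat dort", list(language_model))
print(best)  # "the cat is sleeping": 0.3 * 0.5 beats 0.6 * 0.2
```

Note that nothing in this computation involves understanding either sentence; it is arithmetic over counts, which is exactly the point above.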


  1. I never quite understand what people in AI are getting at when they do what they do. If we’re gonna make an artificial version of something, doesn’t that demand that we know what the ‘real thing’ is?

    There is one instance of ‘intelligence’ in the universe that we’re aware of: human intelligence. So when the AI people ignore what people like Chomsky are interested in (the nature of human intelligence), I can only see this as shooting themselves in the foot.

    Now, I’m of the opinion that we have a looong way to go before we figure human intelligence out, so it’s understandable that the AI people can’t be bothered with the particulars. But if that’s the case, don’t call it ‘AI’; call it ‘engineering’ or something.

    1. Most of the people working in what the popular press calls AI would actually describe their field as Machine Learning. AI as it was originally conceived (building machines that simulate human intelligence) is not a popular field of study, because the problem is blindingly difficult, and most of the obvious problems that true AI could be applied to can be more productively tackled by asking ‘how can we build a machine that can do X?’ than by asking ‘how can we build a machine that does what a human brain does when a person does X?’. At least up to the point where you run into problems like the one Freddie describes above.

      1. I think it’s a classic map-territory problem. People say they merely want a machine that models at some level of abstraction, but let them alone for a minute and they’re back to trumpeting the machine as a model of human intelligence.

        And then we poor linguists end up being forced to cram our models of language into a machine, with the machine as the frame of reference.

        It’s very frustrating…

  2. This is a good description of edge cases for machine translation (MT), and I agree with the general claim that “perfect” MT is going to require a representation of what we could loosely term commonsense knowledge and reasoning, but it’s worth pointing out a few things in response. First, there is a lot that can be done even in the absence of the kind of general underlying theories that would please someone like Chomsky, as the Google demo demonstrates. To begin with, sometimes the ambiguity exists in the target language, too, so you can just push it through and let the human figure it out. Even where that’s not possible (say, where divergent target-language morphology or syntax requires the ambiguity be resolved), this can often be done with heuristics or with statistical properties built from enough data. For coreference resolution, most of the time, pronouns corefer with the nearest noun phrase, and both of these can be identified. That doesn’t work for your example, but a parse of the input sentence along with some statistics about who-fears-what leading to which-action can get you a long way. And once you’ve resolved these ambiguities, you have the information you need, say, to encode some morphological feature in the target language.

    This is all to say that where the exact performance ceiling lies — beyond which you need this kind of special reasoning — is unclear. The solution used by places like Google is to train the systems on larger and larger pools of data, and that has been a crucial piece of the steady progress that MT has made over the last 15 years. Most everyone agrees a ceiling exists, but we don’t seem to have hit it.
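    (A hypothetical illustration of the who-fears-what idea above: resolve the pronoun by asking which candidate entity is more often the subject of the embedded verb in some parsed corpus. The counts below are invented, and a real system would smooth and back off rather than use raw lookups.)

```python
# Toy verb-argument statistics for pronoun resolution.
# All counts are invented; a real table would come from a parsed corpus.

# How often each entity type appears as the subject of each verb.
subject_counts = {
    ("committee", "feared"): 40,
    ("group", "feared"): 5,
    ("committee", "advocated"): 2,
    ("group", "advocated"): 30,
}

def resolve(verb, candidates):
    """Pick the candidate most often seen as subject of this verb."""
    return max(candidates,
               key=lambda c: subject_counts.get((c, verb), 0))

print(resolve("advocated", ["committee", "group"]))  # prints "group"
print(resolve("feared", ["committee", "group"]))     # prints "committee"
```

Unlike the nearest-NP heuristic, this gets Winograd’s pair right, but only because the corpus statistics happen to encode a shadow of the relevant world knowledge.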

    This is somewhat depressing from a scientific perspective, since it doesn’t help us understand or explain what people are doing in any way. But there is also a lot of work in using syntactic and even semantic representations to try to improve machine translation. The best-performing English–German translation system at last year’s Workshop on Statistical Machine Translation (which I co-organize) took plain English and built a German parse tree, and included features to encourage subject-verb agreement. The drawback is that such systems tend to be very, very slow, and thus can’t work in real-time settings like Google Translate. This has been the frustrating result of much of MT research as long as I’ve been a part of that community: come up with some new feature-rich model that better describes the world, get some small gain (at huge expense in computational complexity and running time), and then watch that gain get squashed by someone training their model on an order-of-magnitude-larger dataset.

    I think there definitely is a ceiling, and I’m also a huge AI skeptic. It seems to me that people’s answer to whether they think AI is going to happen is an axiom around which their communities form. At one extreme you have people like those at MIRI. These people spend all their time trying to maximize the chances that AI — whose development is taken as a given — will be friendly to us. They are extremely intelligent but also sort of crazy. On the other end, it seems to me, are academics and industry types building actual systems, who might share some of those beliefs, but whose familiarity with the inner workings and present shortcomings of actual AI tools makes such concerns seem very remote. Interestingly, to me, there is very little overlap between the two camps, which only fuels my skepticism about AI and about the long-term prospects for MT.

    In the end, what matters here is what people use this for. Is a computer ever going to translate Voltaire to a human’s satisfaction? Probably not. But it can help you communicate with your neighbor, and that’s worth something.

    1. It is indeed worth something. The difficulty, for me, is in getting people to recognize that the limits, while related to length and complexity, are not really, or not quite, or not entirely a matter of length and complexity. There are texts of great length and complexity that Google Translate could translate very close to perfectly; there are short, superficially simple structures that it would struggle mightily to translate effectively. And that, to me, is the real difficulty, speaking from my position as a humble applied linguist. The tough part is not in convincing people that there is a limit but in developing an intuitive understanding of where the limits lie. (Which I don’t have, myself.)

      Thanks for your insight and your comment.

      1. Really interesting post and comment. If one were using this to indeed communicate with their neighbor, would there be ways to write sentences to avoid ambiguity in the translation, or would this require such extensive previous knowledge of the other language that you’d basically have to be fluent already? (I figure avoiding pronouns and just repeating the nouns would help.)

        Also, are there any books on linguistics you’d recommend that a lay person could read? I read Pinker’s Language Instinct about a decade ago now – is that still accurate? Are there other books now?

        1. You’d have to be fluent in the other language in order to avoid unintended ambiguity, unfortunately. The mismatches between even things as simple as lexical meanings are completely specific to the two languages concerned.

          I guess, for a layman, Pinker’s book is still a good start. He’s fluffy, tho. A psychologist who likes to think about language, rather than a linguist. It’s a particular point of view you’re getting. From the POV of my field (formal linguistics, formal semantics), it’s quite vague and fluffy… But nobody would like to read a book written by us. 😉

    2. That was an amazing comment, Matt. Thanks for taking the time!

      It does seem to have been the case — for decades now — that the people who are the most excited about imminent singularity-type AI are (otherwise) very smart people who have absolutely no fucking idea what they’re talking about w/r/t machine learning and/or neurobiology.

  3. Machine translation claims have always been darkly hilarious to me, ever since I was trapped in one of Microsoft’s presentations about it… (I’m a linguist, btw.)

    Stochastic approaches, statistical modelling, machine learning – none of it has worked, none of it is going to work. You can’t model language by feeding it into a shredder and counting the bits of chaff.

    But the solution that computer people have always used is very effective, from a PR perspective: They announce success while quietly lowering the bar for success.

    1. None of it is going to work, if by work you mean putting you out of work. But they’re getting very good at doing stuff that people thought they wouldn’t be able to do.

      I once came into a French board-gaming forum through a link and read the discussion for a while. Halting English, but that’s not unexpected – wait! They’re not speaking halting English at all; it’s machine-translated French! Google Translate was on!

      It’s not perfect, it will never be perfect, it will never replace human translators… but it’s a hell of a lot more useful than pre-statistical approaches to translation (remember old Systran? AltaVista’s BabelFish?). You can use it to get a good idea of what a text is about, what people are talking about.

  4. But there’s a problem here: the coindexing is different depending on the verb.

    I work on programming languages rather than natural languages, but my impression is that formal semantics has not had a problem with the idea that parsing depends on meaning since the work of Richard Montague and Barbara Partee in the 1970s.

    If you’re thinking of machine translation specifically, see the recent work of Mehrnoosh Sadrzadeh and Bob Coecke, who have proposed “compositional distributional models of meaning,” which exploit the common mathematical structure of vector spaces and Lambek grammars to move LSA-style techniques past the bag-of-words model and into taking serious advantage of grammatical structure.

    It’s new research and computationally expensive, so there’s a lot of engineering before it can scale up to Google Translate-like services, but I think their work shows how to scale statistical methods up to at least sentence-level understanding. (Document-level understanding will remain open for a while longer, I think, because there are a lot of literary methods like ironic reversals and twist endings that are still beyond mathematical treatment.)
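    (For the curious, a deliberately simplified sketch of the distributional half of that idea: words as co-occurrence vectors, sentences composed pointwise and compared by cosine similarity. The vectors below are invented, and the actual Coecke–Sadrzadeh framework composes via grammar-driven tensor contraction rather than this naive pointwise product.)

```python
import math

# Toy distributional semantics: invented word vectors, pointwise
# composition, cosine comparison. A stand-in for the grammar-aware
# tensor machinery of real compositional distributional models.

vectors = {
    "committee": [0.9, 0.1, 0.4],
    "group":     [0.7, 0.3, 0.5],
    "feared":    [0.2, 0.8, 0.1],
    "advocated": [0.3, 0.2, 0.9],
}

def compose(words):
    """Pointwise product of word vectors (a simple compositional model)."""
    out = [1.0] * len(vectors[words[0]])
    for w in words:
        out = [a * b for a, b in zip(out, vectors[w])]
    return out

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

s1 = compose(["committee", "feared"])
s2 = compose(["group", "advocated"])
print(cosine(s1, s2))  # a similarity score between the two phrase vectors
```

Everything here is still structural relations between vectors, which is exactly the “relational, not understanding” point made in the reply below.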

    1. I’m thinking, actually, of the basic presumption of Chomsky’s generative syntax that the parsing of sentences is meaning-independent. As far as the latest semantic analysis goes, I think it’s most germane here only if we’re very optimistic about the possibility of understanding-independent translation; even the most enthusiastic proponents of LSA still define the systems as relational, right? I mean, I’m certainly not up on the latest LSA systems, but as far as I’m aware, everyone involved in vectorial semantics still sees those systems as being about structural relations between position and meaning. (I mean, that’s why they’re vectorial.) There’s no place where actual semantic understanding resides.

      Now maybe we should just be more optimistic about the potential powers of purely probabilistic, relational translation. As I suggest here, I’m a skeptic. But maybe in terms of pure machine translation, we can create systems that translate without real, theory- and understanding-laden semantic content knowledge.

  5. I think I probably agree with the gist of your argument. I’m not one of those who believes “strong AI” will suddenly spring from Google’s efforts, in translation or otherwise. Yet, I think you probably underestimate what’s possible with the mere statistical approaches they’re using.

    For many years until it closed, I used to follow Google Translate’s little-known official forum/group. It had few actual regulars besides me, but tons and tons of people who dropped in to complain about an especially bad or strange translation. As such, I’ve seen a lot of the weird things statistical translation does. Inspired by the weird translations I saw, I’ve also experimented a lot with it myself. I’ll say this: I know how statistical translation works, roughly. I know it’s not magic, arguably it isn’t even smart, but man it’s impressive sometimes. And weird.

    There was an Irishman who came into that forum furious because he had tried to translate the Irish national anthem to English, and after a few lines of halting literal translation it had inserted “God save the queen!” He was convinced this was some sort of deliberate easter egg to make fun of Irish republicans. Explaining that GT, through its statistical models, had built a hazy idea of what an anthem is and naïvely got a little too helpful in localizing it… that was not easy. I was surprised by it myself, although I had seen how it translated US airports into Norwegian airports, or English football teams into Italian ones.

    Other times I was impressed was when it translated a misspelled Spanish word into a creatively misspelled English word (“elado” to “Ais krihm”). Or when it helpfully interpreted news for you in its strange way (“Sarkozy loses” became “Sarkozy wins”. “Sarkozy Sarkozy Sarkozy” became “Bush defeats Blair”).

    I think Google has actually tuned Translate’s model away from excessive creativity, preferring bad but more-literal translations over grammatical, context-inspired, but possibly wildly off translations. There’s been less of this sort of weirdness in later years, both positive and negative.
