The Corner

Chomsky on Nate Silver and Jay Cost

Okay, not exactly. But close enough.

See, I’ve been reading this very long interview with Noam Chomsky, and the reason I’ve been reading it is that it’s about cognitive science and artificial intelligence, not politics. (I’ve argued in the Corner that when it comes to his day job as a linguistics professor, Chomsky is actually a conservative.)

The interview is largely about two competing scientific methodologies. To simplify, one methodology, which Chomsky finds great fault with, calls for generating massive amounts of data from observation (of brain processes, in this case) with sophisticated equipment, and applying statistical techniques to this data in a search for patterns and correlates that can form a basis for future predictions. In a neat passage, Chomsky contrasts this approach with his own preferred methodology, which he considers more classical. The masses-of-data approach to science, he says, is driven by a desire to approximate what is going on "outside the window" (that is, in the world), whereas the classical approach is driven by a desire to understand what is going on "outside the window."

. . . [T]ake an extreme case, suppose that somebody says he wants to eliminate the physics department and do it the right way. The “right” way is to take endless numbers of videotapes of what’s happening outside the [window], and feed them into the biggest and fastest computer, gigabytes of data, and do complex statistical analysis — you know, Bayesian this and that [Editor’s note: A modern approach to analysis of data which makes heavy use of probability theory.] — and you’ll get some kind of prediction about what’s gonna happen outside the window next. In fact, you get a much better prediction than the physics department will ever give. Well, if success is defined as getting a fair approximation to a mass of chaotic unanalyzed data, then it’s way better to do it this way than to do it the way the physicists do, you know, no thought experiments about frictionless planes and so on and so forth. But you won’t get the kind of understanding that the sciences have always been aimed at — what you’ll get at is an approximation to what’s happening.

Suppose you want to predict tomorrow’s weather. One way to do it is okay I’ll get my statistical priors, if you like, there’s a high probability that tomorrow’s weather here will be the same as it was yesterday in Cleveland, so I’ll stick that in, and where the sun is will have some effect, so I’ll stick that in, and you get a bunch of assumptions like that, you run the experiment, you look at it over and over again, you correct it by Bayesian [statistical] methods, you get better priors. You get a pretty good approximation of what tomorrow’s weather is going to be. That’s not what meteorologists do — they want to understand how it’s working. And these are just two different concepts of what success means, of what achievement is. In my own field, language fields, it’s all over the place. Like computational cognitive science applied to language, the concept of success that’s used is virtually always this. So if you get more and more data, and better and better statistics, you can get a better and better approximation to some immense corpus of text, like everything in The Wall Street Journal archives — but you learn nothing about the language.

A very different approach, which I think is the right approach, is to try to see if you can understand what the fundamental principles are that deal with the core properties, and recognize that in the actual usage, there’s going to be a thousand other variables intervening . . . and you’ll sort of tack those on later on if you want better approximations, that’s a different approach. These are just two different concepts of science. The second one is what science has been since Galileo, that’s modern science. The approximating unanalyzed data kind is sort of a new approach, not totally, there’s things like it in the past. It’s basically a new approach that has been accelerated by the existence of massive memories, very rapid processing, which enables you to do things like this that you couldn’t have done by hand.

It struck me, reading this, that Chomsky is unwittingly describing the difference between the Nate Silver and Jay Cost approaches to presidential prognostication. Silver is basically feeding masses of raw data into a model that he has fine-tuned to approximate, and predict, what will happen "outside the window." In the present case, Silver thinks what will happen is a comfortable Obama win. Cost explicitly rejects this approach in his prediction of a Romney victory:

When I started making election predictions eight years ago, I had a very different perspective than I do today. I knew relatively little about the history of presidential elections or the geography of American politics. I had a good background in political science and statistics. So, unsurprisingly in retrospect, I focused on drawing confidence intervals from poll averages.

Since then, I have learned substantially more history, soured somewhat on political science as an academic discipline, and have become much more skeptical of public opinion polls. Both political science and the political polls too often imply a scientific precision that I no longer think actually exists in American politics. I have slowly learned that politics is a lot more art than science than I once believed. . . .

Again, this is a different approach than the poll mavens will offer. They are taking data at face value, running simulations off it, and generating probability estimates. That is not what this is, and it should not be interpreted as such. I am not willing to take polls at face value anymore. I am more interested in connecting the polls to history and the long-run structure of American politics, and when I do that I see a Romney victory.
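For readers curious what "running simulations off" the polls looks like mechanically, here is a minimal Python sketch of the generic technique Cost is describing: a Monte Carlo simulation over state poll margins. It is not Silver's actual model. The states, margins, electoral-vote counts, the "safe" electoral votes, and the 3-point polling-error figure are all hypothetical placeholders, and simulate_win_probability is a made-up name for illustration.

```python
import random

# HYPOTHETICAL toy inputs, NOT real polling data:
# state -> (poll margin for candidate A in percentage points, electoral votes)
HYPOTHETICAL_STATES = {
    "State A": (2.0, 18),
    "State B": (-1.0, 29),
    "State C": (0.5, 13),
    "State D": (3.5, 10),
}
SAFE_EV_FOR_A = 220   # hypothetical electoral votes A is assumed to hold already
WIN_THRESHOLD = 270   # electoral votes needed to win
POLL_ERROR_SD = 3.0   # assumed standard deviation of polling error, in points


def simulate_win_probability(states, safe_ev, n_sims=10_000,
                             error_sd=POLL_ERROR_SD, seed=0):
    """Estimate P(candidate A wins) by taking each poll margin at face value,
    adding independent normal noise to it in every simulated election, and
    counting how often A reaches the winning electoral-vote threshold."""
    rng = random.Random(seed)  # seeded so results are reproducible
    wins = 0
    for _ in range(n_sims):
        ev = safe_ev
        for margin, votes in states.values():
            # A carries the state if the noisy margin is still positive
            if margin + rng.gauss(0.0, error_sd) > 0:
                ev += votes
        if ev >= WIN_THRESHOLD:
            wins += 1
    return wins / n_sims
```

Calling simulate_win_probability(HYPOTHETICAL_STATES, SAFE_EV_FOR_A) returns the fraction of simulated elections in which candidate A reaches 270 electoral votes, which is the kind of probability estimate Cost says he is no longer willing to produce. Notice that nothing in the code asks why a state leans one way; it simply propagates the numbers. Serious forecasting models also let polling errors move together across states; this sketch treats them as independent for simplicity.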

Cost's approach is closer to what Chomsky calls the Galilean approach (I'm not sure that's the best label for it, but whatever). Cost thinks the data generated by polls are not determinative in their own right, and are best used as part of a holistic attempt to understand the underlying dynamics of the race, one that incorporates historical and other general political principles. (The upshot is that Cost thinks Democrats can't repeat their post–New Deal, pre-Reagan feats of winning presidential elections without winning independents, which, FWIW, I fear is false.)

The limitation of Cost's way of doing things is that it is only quasi-empirical, relying quite a bit on "art" and "gut instinct," and is thus hard to falsify or reproduce. The limitation of Silver's way of doing things is that his predictions don't tell us much about the Why and How; that is, their assumed predictive power isn't premised on anything like a deep understanding of "what the numbers mean." And if you can't say why the model does a good job of approximating what's going on "outside the window," you're vulnerable to being caught off guard by the Black Swans of the world.
