The Agenda

The Selective Forces at Work Among Scholars

Betsey Stevenson and Justin Wolfers argue that there is a great deal of consensus concerning U.S. economic policy that is masked by conservative ideologues. Consider the following passage:

Let’s start with Obama’s stimulus. The standard Republican talking point is that it failed, meaning it didn’t reduce unemployment. Yet in a survey of leading economists conducted by the University of Chicago’s Booth School of Business, 92 percent agreed that the stimulus succeeded in reducing the jobless rate. On the harder question of whether the benefit exceeded the cost, more than half thought it did, one in three was uncertain, and fewer than one in six disagreed.

Note the importance of defining “the standard Republican talking point” as “it didn’t reduce unemployment.” Having encountered many criticisms of the stimulus, I can attest that while many people did indeed embrace this strong form of the critique, i.e., that it did not reduce the jobless rate relative to a counterfactual in which the stimulus law was not passed, others relied primarily on the notion that the benefit did not exceed the cost. For example, one could argue that while stimulus spending might temporarily reduce the jobless rate in certain sectors (e.g., via federal transfers to state and local governments explicitly tied to maintaining employment levels that could not be sustained by revenue bases that had suffered a severe negative shock, or that rested on assumptions formed during the housing boom), it would result in costly distortions in future years that might reduce employment by a commensurate, or even more-than-commensurate, amount.

What I find particularly striking about the result is that on the question of whether the benefit exceeded the cost, an extraordinary two in three respondents took a definite position rather than answering “Uncertain.” Stevenson and Wolfers make note of this certainty later in the column:

Let’s be clear about what the economists’ remarkable consensus means. They aren’t purporting to know all the right answers. Rather, they agree on the best reading of murky evidence. The folks running the survey understand this uncertainty, and have asked the economists to rate their confidence in their answers on a scale of 1 to 10. Strikingly, the consensus looks even stronger when the responses are weighted according to confidence.

This paragraph is odd in that it is self-undermining. One actually gets the strong impression that the respondents don’t give due regard to the murkiness of the evidence in question, hence the expressions of confidence.

In Uncontrolled, Jim Manzi, who advocated a targeted fiscal stimulus, as did many others on the center-right, reflects on the barriers to identifying useful, nonobvious, and reliable predictions in the social sciences:

First, causal density is very high, so sample size is critical, but many natural experiments have far too few data points. In fact, many of the most important questions (e.g., Should we execute a stimulus program?) concern interventions that can be executed only at the national level, so we have no control group at all unless we want to conduct cross-country analysis in which the differences between societies become so profound that control becomes impractical. Many other interventions occur at the state level, or are undertaken by a very small number of more local jurisdictions, so sample sizes are still very small, as in the abortion-crime example.

Second, a national society is holistically integrated; therefore, it is hard to get causal impermeability between the test and control groups. In the abortion-crime debate, for example, I indicated that a significant technical issue was how to account for the reality that people move between states. By illustration, the causal effects of abortion legalization in California in the early 1970s partially propagated to Washington, Arizona, and other states by the early 1990s as people moved. Approximate data exist on how many people of what ages moved between what states in what years, but we know neither whether within these groups people who moved had a greater or lesser inherent propensity to crime than those who did not, nor whether they were more or less influential in shaping attitudes and behaviors that affected the behavior of those who lived around them in their new state of residence. This problem of “causal pollution” becomes especially severe for evaluating long-term effects, which of course is often what is most important to us in evaluating social interventions.

Third is the possibility of systematic, unobserved bias between the individuals or places that are subject to the treatment in the natural experiment as compared to those that are not. Consider the abortion-crime example. All kinds of plausible differences in political culture, social evolution, rational expectation for future challenges, and so on could vary between the early legalization states and the rest of the country. As one hypothetical illustration, it might be that those states that legalized abortion early did so because the structure of the relationship between their state legislators and political interest groups tended to be systematically different than those states that did not, and this difference also caused criminal-justice behavior to change differently in these states than in other states over the ensuing decades, thereby resulting in different crime rates. Even now, we have no idea to what degree such differences are the true causal drivers of any difference in outcome we see for the early legalizers versus the rest of the country. This is the irreducible problem for any such social natural experiment that does not use strict randomization for assignment to the test population, no matter how large the sample size.

Given the complexity of the fiscal stimulus law and the many moving pieces that constitute an advanced economy, the confidence expressed by many of the respondents seems misplaced. And indeed, when we examine the responses to the two questions more closely, we notice a pattern: not all of the respondents are using the 1 to 10 scale in the same way; many of those who express the greatest confidence have served in government, under Democratic and Republican administrations, and partisan affiliation helps predict the reply; and a large majority of respondents offered no explanation of their vote, presumably because these are busy people who did not devote the care and attention to a survey question that they would to their scholarly work.

But the explanations are a treat. Take, for example, Darrell Duffie, who agreed, with a low confidence of 2, that the stimulus reduced joblessness:

Subsidizing employment leads employment to go up, other things equal. Adverse impacts through growth incentives might take time.

Not surprisingly, his answer to the question of whether the benefits exceeded the cost was an unexplained “Uncertain,” again with a low confidence of 2.

Judith Chevalier confidently (9) agreed that the stimulus reduced unemployment, yet on the question of whether the benefits exceeded the cost, she offered a very sensible “Uncertain”:

This is in part an empirical question and I think it would be difficult to answer this with any certainty.

Ray Fair, who specializes in modeling macroeconomic outcomes, expressed uncertainty as well. Anil Kashyap noted the role of “payoffs to special interests,” which in his view reduced the effectiveness of the program. Bengt Holmström agreed, yet he explicitly stated that the “feedback effects [were] too complicated to calculate” and that his “guess is that the program was marginally beneficial, but monetary easing helped more.” This answer strikes me as essentially indistinguishable from “uncertain,” and Holmström assigned a low confidence (2) to his reply. But of course different respondents treated the Agree, Disagree, and Uncertain answers very differently, which presumably skewed the results.

My favorite replies are from Nancy Stokey of the University of Chicago, who writes,

How can anyone imagine this question is answerable, given the current state of economic science?

And Caroline Hoxby, who, in a somewhat similar vein, writes:

High confidence on an issue like this would be foolish.

Indeed. 

Jim has a great take on how to interpret this kind of survey, which he illustrates with a parable about a president seeking advice from various scholars:

The president would be incredibly irresponsible to begin debating nuclear physics with his science adviser, even if the president happened to have trained as a physicist. Conversely, the president would be incredibly irresponsible not to begin a debate with the historian. This likely would include having several historians present different perspectives, querying them on their logic and evidence, combining this with introspection about human motivations, considering prior life experience, consulting with non-historians who might have useful perspectives, and so on.

Next an economist walks into the room. She predicts a certain amount of change in Iranian employment if the CIA were to successfully execute a proposed Iranian currency-counterfeiting scheme designed to create additional inflation in Iran for the next five years. Is this more like the historian’s prediction or the physicist’s prediction?

Superficially she might sound a lot more like the physicist. She would use lots of empirical data, equations, and technical language. Some parts of the prediction would have some firm foundation, for example, a buildup of alternative production capacity at all known manufacturing plants based on measurement of physical capacity. But lots of things would arguably remain outside the grasp of formal models. How would consumer psychology in Iran respond to this change, and how would this then translate to overall demand changes? How would the economy respond to this problem over time by shifting resources to new sectors, and what innovations would this create? How would political reactions to inflation in Iran lead to foreign policy changes, provoking other countries to war and other decisions, which would in turn lead to economic changes within Iran? And so on, ad infinitum.

How would the economist respond if challenged with respect to the reliability of her prediction with such questions? As far as I can see, with recourse to three kinds of evidence: (1) a priori beliefs about human nature, and conclusions that are believed to be logically derivable from them, (2) analysis of historical data, which is to say, data-driven theory-building, and (3) a review of the track record of prior predictions made using the predictive rule in question.

The problem, of course, is that the economist doesn’t have access to the same kind of experimental knowledge about macroeconomic events as the physicist does about nuclear explosions. And so Jim argues that we should treat economists as we would treat historians:

I’m not arguing that social science is valueless—I would no more advise a president to make a major economic decision without professional economic advice than I would suggest that he make a decision about war and peace without consulting relevant historians—but I am arguing that we should be extremely humble about our ability to make reliable, useful, and nonobvious predictions about the results of our policy interventions.

This strikes me as deeply sensible. With historians, we generally know that we are dealing with fallible human beings who suffer from all kinds of cognitive biases, and that intuition and judgment are at the heart of the discipline, particularly when it comes to drawing on a body of historical knowledge to reach conclusions about the present and future. A brief perusal of the actual substance of the IGM Forum poll makes clear that economists do much the same when they weigh policy questions.

So when we’re told that there is a semi-consensus among tenured economists at leading research universities, we should remember that there is also a semi-consensus among tenured scholars of Middle East and Islamic Studies on various questions. That scholars reproduce their own beliefs from one generation to the next — and that political sensibilities formed before one embraces scholarly work endure in the face of uncertainty — should hardly surprise us.

Once again, I’m reminded of Rafe Sagarin’s excellent Learning from the Octopus:

When we take a biological perspective on learning, we realize that we are biased toward learning from failure because of the selective forces at work. In nature, the selective agent acting on learning processes is anything that identifies one variant over another and helps it reproduce or kills it off — a violent storm that rips the weaker kelps off the rocks, a clever predator that lures deep-sea fish directly into its jaws with a glowing lantern, a picky mate that passes up the advances of any male companion whose claws or antlers or tail feathers are just a little too small.

When it comes to how we respond to big events in society, it is often news media that play the selective agent. After the Cosco Busan spill, images of hundreds of frustrated San Francisco volunteers waiting to clean up oiled birds, but held back by government bureaucrats, were disseminated by national media. Those kinds of images result in calls to Congress and demands for investigations. By contrast, the Coast Guard’s valiant attempts to clean up oil spills following Hurricane Katrina hardly made newsworthy footage relative to images of people stranded on the roofs of their flooded houses and Americans begging for deliverance from the overwhelmed refugee camp in the Superdome.

What exactly are the selective forces at work among economists, or in particular among economists in the public sphere? I would suggest that the kind of economist who expresses — I would say acknowledges — deep uncertainty, like Nancy Stokey, is likely to have less influence than an economist like Cecilia Rouse, who until recently served as a member of President Obama’s Council of Economic Advisers, or Paul Krugman, the celebrated Princeton economist and New York Times columnist, both of whom tend to express strong confidence in their judgments and assessments. (The same, I should stress, is also true of many right-leaning economists.) But does this mean we should listen more closely to Rouse and Krugman, or to Stokey and Hoxby, both of whom avoid the political fray?

Reihan Salam is president of the Manhattan Institute and a contributing editor of National Review.