The evidence seemed overwhelming. When Rutgers professor Steven Barnett gave a congressional briefing on preschool last spring, he touted higher test scores, improved graduation rates, reduced crime, less welfare use, and other benefits large and small. More than a hundred studies over 50 years had documented the positive impact of preschool, and the more rigorous ones had found larger effects. The benefits were shown to far outstrip the costs. High-quality preschool clearly works, according to the presentation, and now the challenge is to give as many children as possible access to its benefits.
Just by looking at the presentation slides, you would never have guessed that, only a few months earlier, the nation’s largest preschool program had been found to confer no lasting benefits on children.
That’s the strange thing about the preschool movement. It mixes scholarship and advocacy in a way that leads some academics to put themselves far out in front of what the evidence supports. Preschool advocates inundate their audiences with long and authoritative-looking lists of academic citations, but they rarely seem troubled by the rigorous gold-standard evaluations that generate contrary results. Healthy skepticism — which is crucial to a good scholarly mindset — is too often absent among the preschool movement’s most vocal leaders.
And skepticism is needed right now. Political interest in preschool is growing rapidly, with billions of dollars already spent and much more funding on the way. For the second straight year, President Obama used his State of the Union address to endorse federal spending on universal preschool, and states are hastening to create their own programs. In New York City, Mayor Bill de Blasio is so keen on preschool that he’s pushing for a special tax to pay for it. Once in place, public preschool is about as likely to be repealed as public kindergarten, so it’s essential that policymakers understand the limitations of the evidence that preschool advocates put forth.
The central limitation is self-selection bias. It’s a straightforward concept: The people who choose to sign up for a program have different characteristics from the people who do not sign up, and so what appears to be a program’s impact — say, higher test scores after attending preschool — may be just a reflection of preexisting differences between the two groups.
Why would self-selection bias affect preschool evaluations? Because higher-ability children also tend to have more involved parents. It’s the double whammy of genetic and environmental inheritance that accentuates achievement differences among young children. When parents choose to send their children to preschool, their decision is influenced by a variety of factors that also affect their children’s ability to succeed independent of preschool. Put more simply, the smarter and better-adjusted kids might be more likely to go to preschool in the first place.
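The selection story can be made concrete with a toy simulation (every number here is illustrative, not drawn from any real study): suppose preschool has no true effect at all, but a latent "parental involvement" factor drives both enrollment and test scores. A naive comparison of enrollees with non-enrollees will still report a sizable "preschool effect."

```python
import math
import random

random.seed(0)

# Toy simulation: preschool has NO true effect, but more-involved parents
# are both more likely to enroll their children and more likely to have
# higher-scoring children. (All parameters are made up for illustration.)
children = []
for _ in range(100_000):
    involvement = random.gauss(0, 1)                      # latent parent factor
    enrolled = random.random() < 1 / (1 + math.exp(-involvement))
    score = 100 + 10 * involvement + random.gauss(0, 5)   # preschool adds nothing
    children.append((enrolled, score))

def avg(xs):
    return sum(xs) / len(xs)

pre = avg([s for e, s in children if e])
no_pre = avg([s for e, s in children if not e])
# The naive comparison attributes the parenting gap to preschool.
print(f"apparent 'preschool effect': {pre - no_pre:.1f} points")
```

The printed gap is entirely an artifact of who signs up, which is precisely why a comparison of volunteers and non-volunteers cannot establish a program's impact.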
If children are assigned to preschool randomly, as part of a controlled experiment, then self-selection bias can be avoided. (A non-experimental technique called regression discontinuity design can also avoid the bias, but it is of limited usefulness when applied to preschool.) Yet most of the modern evidence cited by preschool advocates does not come from experiments. Instead, researchers have tried to eliminate the effect of self-selection non-experimentally, by controlling for other factors — that is, making sure that the preschool and non-preschool groups have similar demographic profiles.
A recent evaluation of preschool in New Jersey provides a basic example of this. After controlling for school district, age, ethnicity, gender, home language, and parents’ socioeconomic status, researchers found that fifth-graders who had attended the state’s preschool program had higher test scores than those who had not attended. But the researchers could not control for the crucial cognitive and parenting differences between those who seek out preschool and those who do not.
Some of these non-experimental designs are sophisticated and technically impressive. They no doubt succeed in reducing self-selection problems. But reduce is not the same as eliminate, and that is where scholar-advocates have jumped ahead of the science. They view suggestive evidence as definitive, portray second-best techniques as the equal of gold-standard experiments, and explain away contrary findings on speculative grounds.
Their confidence is not justified. A 2008 study in The American Statistician showed how unreliable statistical methods that purport to overcome self-selection bias can be. The study’s authors compared the actual experimental results of a multi-state job-promotion program with the results estimated by a popular non-experimental technique called propensity-score matching (PSM). In two of the three states, PSM indicated that the program had increased annual earnings for participants by well over $1,000, while the controlled experiment showed that participants actually lost money.
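Why matching falls short is easy to see in a toy exercise (a sketch with made-up data, not the design of the study above): matching treated units to controls with similar *observed* traits shrinks the bias, but any *unobserved* confounder keeps the estimate away from the truth.

```python
import bisect
import math
import random

random.seed(1)

# Toy matching exercise: the true program effect is ZERO. "income" is an
# observed, noisy proxy for the unobserved confounder ("involvement"), so
# matching on income reduces -- but cannot eliminate -- selection bias.
# (All parameters are made up for illustration.)
data = []
for _ in range(20_000):
    involvement = random.gauss(0, 1)            # unobserved confounder
    income = involvement + random.gauss(0, 1)   # observed proxy
    t = random.random() < 1 / (1 + math.exp(-involvement))
    outcome = 10 * involvement + random.gauss(0, 5)   # program adds nothing
    data.append((t, income, outcome))

treated = [(inc, y) for t, inc, y in data if t]
control = sorted((inc, y) for t, inc, y in data if not t)

def nearest_control_outcome(inc):
    """Outcome of the control unit whose income is closest to `inc`."""
    i = bisect.bisect_left(control, (inc,))
    cands = control[max(0, i - 1): i + 1]
    return min(cands, key=lambda c: abs(c[0] - inc))[1]

naive = (sum(y for _, y in treated) / len(treated)
         - sum(y for _, y in control) / len(control))
matched = sum(y - nearest_control_outcome(inc)
              for inc, y in treated) / len(treated)
print(f"naive estimate: {naive:.1f}, matched estimate: {matched:.1f} (truth: 0)")
```

The matched estimate lands closer to zero than the naive one, yet remains well above the true effect, because matching can only balance what researchers observe.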
Experiments have not been kind to early-education programs. Consider the largest such program, Head Start. The $8 billion program’s main lobbying outfit, the National Head Start Association, has on its website an impressive list of non-experimental studies purporting to document Head Start’s benefits. Some experts have always been skeptical, despite the lobby’s informal “Head Start works!” slogan. (As researcher Nicholas Zill once remarked, “You should know that if anybody wears pins that say whatever program ‘works,’ you can take it as an assumption that it doesn’t work. You don’t see military people saying, ‘Machine guns work!’”)
When the federal government finally conducted a large-scale, random-assignment evaluation of Head Start, the skeptics were vindicated. By first grade, children who had been randomly granted access to Head Start fared no better than the large control group of children who had applied for the program but were not given access due to space constraints. The evaluation was well designed, well implemented, and conclusive: Head Start has no lasting effect.
When Early Head Start, a related program focusing on toddlers, was recently subjected to experimental evaluation, the conclusion was the same: no lasting effect. As for Even Start, an intervention that combines preschool for children and job training for their parents, there was no effect even in the first year.
Perhaps most telling is a recent experimental evaluation of public preschool in Tennessee. With small class sizes, licensed teachers, and a rigorous curriculum, the state’s preschool program is in line with what advocates are pushing for nationwide. As with Head Start, participants outperformed a control group after the preschool year, but the advantage was gone by the end of first grade.
Preschool supporters like to point out that when children in experimental studies are randomly denied placement in government preschool, their parents sometimes just send them to a different preschool. The control group therefore is not exclusively in the no-preschool condition. That’s actually a feature, not a bug, since the relevant policy question is whether government preschool adds any additional value to the population being served. But even when restricted to children who have no other preschool options, Head Start’s impact appears to be zero.
Blame-shifting is another way that preschool advocates react to the experimental findings. Head Start lobbyists, for example, attribute the fade-out of its effects in part to the bad elementary schools that children go on to attend. This is a problematic defense on two levels. First, logically, improving elementary schools would likely raise the achievement levels of both Head Start participants and non-participants. So even with better elementary schools, Head Start’s value-added might be negligible. Second, as a policy matter, improving elementary schools is no easy task. In fact, one reason that preschool has generated so much interest is that interventions in the elementary years have been largely unsuccessful.
The claim that preschool would be useful if only the other parts of society worked better is speculation writ large. Equally speculative is the advocates’ belief in “sleeper effects” — preschool benefits that fade out but mysteriously reemerge later in life. Some sleeper effects were observed in the famous Perry Preschool Project, started during the Kennedy administration. But initial effects persisted far longer with Perry than with any conventional preschool program, and thinking that the tiny, half-century-old Perry experiment can tell us much about today’s large-scale interventions is itself a leap of faith.
This is the fundamental contradiction of preschool advocacy. Adherents are utterly confident that government preschool is effective, yet their arguments from the data depend on hope and speculation. It’s a lesson in what can happen when scholars abandon their innate skepticism in pursuit of a policy goal. The U.S. is about to make historically large public investments in preschool, motivated in large part by a “consensus” of scholars who will brook no disagreement. The truth is that we know very little about what kinds of early-education programs, if any, are cost-effective uses of taxpayer dollars. It’s not too late for policymakers to pull back, quiet the true believers, and take a hard look at the data.
– Mr. Richwine is a public-policy analyst in Washington, D.C.