I recently urged caution in interpreting the U.S.’s mediocre PISA scores. The post was meant to help preempt all the hand-wringing that was expected when the scores became public. But it turns out that the blogosphere didn’t need my infinite wisdom after all: Skepticism about the PISA abounds, and it’s coming from across the ideological spectrum, from Diane Ravitch to Rick Hess.
What’s especially encouraging is that the conversation has focused as much on the reliability of the PISA data as on the interpretation. Gone are the days, hopefully, when a major organization can release a report and expect as a matter of course that the media will treat the data and conclusions as authoritative.
I previously mentioned the inappropriately strong lessons the OECD attempts to draw from thin data. In addition, Hess points to the mixing of city and country data — “Comparing U.S. performance to that of Shanghai isn’t apples and oranges; it’s applesauce and Agent Orange” — the inexplicable fall of Finland, and the extreme sensitivity of the rankings to the choice of test questions.
What about the mechanics of PISA test administration? Did every country follow the same strict procedures? Almost certainly not. Consider sample drop-out, which is one of the most frequent problems we confront in program evaluation. Even an impeccable research design won’t produce meaningful results if a significant fraction of the sample goes unmeasured. This is an especially common issue in educational interventions, in which the least gifted students in the treatment group are mysteriously absent on test day.
PISA response rates vary widely from country to country. For instance, Finland tested 96 percent of its nationally representative sample, but Mexico somehow tested only 63 percent of its own. One need not be a cynic to suspect some gamesmanship there.
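To see how much selective absence can matter, here is a minimal simulation sketch. The numbers are hypothetical (scores drawn from a normal distribution, not real PISA data); it simply illustrates the worst-case version of the gamesmanship scenario, where a 63 percent response rate is achieved by losing the weakest students:

```python
import random

random.seed(0)

# Hypothetical full sample: 10,000 students, scores roughly normal
# with mean 500 and standard deviation 100 (PISA-like scale, made up here).
population = [random.gauss(500, 100) for _ in range(10000)]
full_mean = sum(population) / len(population)

# Worst case: only 63 percent get tested, and the absences fall
# entirely on the lowest-scoring students.
tested = sorted(population)[int(0.37 * len(population)):]
tested_mean = sum(tested) / len(tested)

print(f"Full-sample mean:   {full_mean:.0f}")
print(f"Tested-sample mean: {tested_mean:.0f}")
# The tested-sample mean comes out well above the full-sample mean,
# even though nothing about actual achievement has changed.
```

Real non-response is never this perfectly selective, but even a milder tilt in who shows up on test day moves a country’s average by more than the gaps separating many rungs of the rankings.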
This isn’t the first time that I’ve encountered questionable OECD data. Its earlier 2013 report on teachers contained some comparisons of relative pay and work time that were simply invalid. As I wrote at the time, “The truth is that large-scale international comparisons are almost inevitably plagued by inconsistent and unreliable data.” The PISA is no exception.