
Proficiency Illusion

Tests are getting easier, but student scores on them are stagnating.

The No Child Left Behind Act is up for reauthorization. Putting aside the philosophical problems that stem from increased federal interference in education, the law, as it currently stands, also has a ream of logistical troubles.

One of the biggest flaws with NCLB, for example, is its insistence that all students (100 percent) be proficient in reading and math by 2014. That won’t happen, of course. But no politician has the stomach to amend this irrational goal to a more manageable 70 or 80 percent, fearing that inevitable question: “Which 20 percent of children don’t you care about?”

Another problem: The federal government holds all states accountable for their schools’ performance, but it lets states design their own accountability measures. That means the tests that students take in Wisconsin, for example, might be far easier than the tests students take in Massachusetts (in fact, they are).

The disparities are laughable, especially when they’re used as the basis for a massive federal educational accountability system. Some states habitually report that upwards of 80 percent of their students score at the “proficient” level on state tests. But when those same students take the national assessment (the scores on which don’t count for anything), only 20 percent reach the “proficient” mark.

But even if the state tests are easier, the argument goes, they can still show whether students in each state are making academic progress. If the percentage of Illinois’s eighth graders who score “proficient” on the state test increases from one year to the next, then the state is doing a better job teaching its youngsters, right?

Wrong. A new study, The Proficiency Illusion, shows among other things that some state tests are simply getting even easier from one year to the next.

Researchers used data from schools in several states whose pupils participated both in state testing and in a nationally standardized assessment by the Northwest Evaluation Association (NWEA). Then they estimated proficiency cut scores, i.e., the level students needed to reach to pass the tests. What did they find?

- State tests vary greatly in difficulty. The extent to which the difficulty of tests varies from one state to the next is shocking. Cut scores on Colorado’s math test were at the 6th percentile on the NWEA scale; Massachusetts’ math test cut scores were at the 77th percentile.

- The tests of eight states (out of the 26 evaluated) have become significantly easier.

- Improvements in passing rates on state tests are largely the result of easier tests.

State tests vary immensely from one state to the next. According to The Proficiency Illusion, to achieve the reading proficiency cut score a fourth grader in Wisconsin might be asked to pick out which of four sentences is a fact. A child who scores proficient on Massachusetts’s grade-four reading assessment, however, must navigate questions of far greater complexity. On the NWEA test, a question of equivalent difficulty to Massachusetts’s grade-four cut score asks students to read a passage from Tolstoy’s short story “How Much Land Does a Man Need?”

But think, for a moment, about the last bulleted point: that test-score improvement is largely the result of easier tests. America is spending billions on a K-12 education system, one that is supposedly data-driven, any success of which may be largely an illusion.

The scary thing is that the United States has not been showing progress from one year to the next on many state tests. So if the tests are getting easier, but student scores on them are stagnating, that means this country’s students may actually be learning less today than they were several years ago.

Take Illinois, for example. Between 2003 and 2006, Illinois’s proficiency cut scores on its state math tests plummeted; that is, the 2003 assessment was significantly more difficult than the 2006 assessment. Thus, even if student test scores remained the same over that three-year period, the tests would show increases of 8 percentile points in grade 5 and a whopping 27 percentile points in grade 8. And sure enough, over the past three years Illinois has reported similar gains: 10 points for fifth graders and 25 points for eighth graders. (Illinois publicly lowered its grade-8 math cut score.)

In reading, declines in test rigor have occurred, too. Because of the downgrade of proficiency, even if Illinois students actually performed no worse and no better over the past three years on the state assessment, the test scores would show gains of 17 percentile points in third grade and 14 percentile points in eighth grade.

And since 2003, the state has reported score increases of 9 points for third graders and 16 points for eighth graders. The third-grade gain of 9 points falls well short of the 17 points that the easier test alone would have produced. Which means that, regardless of what state officials say, third-grade reading skills in Illinois would seem to be declining. After years of unprecedented federal intrusion into education, after billions of dollars and countless hours of effort, Illinois third graders may be reading worse today than they were in 2003.
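The Illinois arithmetic can be laid out in a few lines. What follows is only a rough sketch, not the study's method: it takes the figures quoted in this article and subtracts the gain attributable to the easier test from the gain the state reported. It assumes the two numbers sit on comparable scales, which is a simplification, and the names `ILLINOIS` and `residual_gain` are made up for illustration.

```python
# Figures quoted in this article for Illinois, 2003 to 2006.
# Each entry: (gain expected from the lower cut score alone, gain the state reported).
# Assumption: both numbers are treated as comparable points, a simplification.
ILLINOIS = {
    ("math", 5): (8, 10),
    ("math", 8): (27, 25),
    ("reading", 3): (17, 9),
    ("reading", 8): (14, 16),
}


def residual_gain(expected_from_easier_test, reported):
    """Reported gain left over once the easier test is accounted for."""
    return reported - expected_from_easier_test


# Hypothetical summary: a negative residual suggests real performance may have slipped.
residuals = {key: residual_gain(*figures) for key, figures in ILLINOIS.items()}

for (subject, grade), residual in residuals.items():
    print(f"{subject} grade {grade}: {residual:+d} points")
```

On these numbers, only third-grade reading shows a large gap between what the easier test would produce and what the state reported; the other three residuals are within two points of zero.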

What does this mean for educational policy and practice? What does it mean for standards-based reform in general and NCLB in particular? It means big trouble, and those who care about strengthening U.S. K-12 education should be furious. There’s all this testing (too much, surely), yet the testing enterprise is unbelievably slipshod. It’s not just that results vary but that they vary almost randomly, erratically, from place to place and grade to grade and year to year in ways that have little or nothing to do with true differences in pupil achievement.

America is awash in achievement “data,” yet the truth about our educational performance is far from transparent and trustworthy. It may be smoke and mirrors. Gains (and perhaps slippages) may be illusory. Comparisons may be misleading. Apparent problems may be nonexistent or, at least, misstated. The testing infrastructure on which so many school reform efforts rest, and in which so much confidence has been vested, is unreliable — at best.

Until we fix the basic parts of this NCLB system, its results cannot claim to be solid. And the accountability measures it brings to bear on schools and students must, therefore, be viewed as unjust and perhaps indefensible.

– Liam Julian is associate writer and editor at the Thomas B. Fordham Foundation and a research fellow at Stanford’s Hoover Institution.

