In what could prove a turning point for education reform, American Federation of Teachers president Randi Weingarten has come out in favor of considering student performance on standardized tests as one part of teacher evaluations. If Weingarten turns her words into real actions, and if the teachers’ unions follow her lead, it will improve teacher quality across the country.
Support for using student test scores to evaluate teachers is a departure for Weingarten. Two years ago, when New York City planned to start using test scores as part of its teacher evaluations, it was Weingarten, then head of the city’s teachers’ union, who pushed state lawmakers to ban the city from doing so. The legislature caved. New York’s education reformers are working to eliminate the ban.
Improving teacher evaluations is vital, because they are the foundation underlying many promising education reforms. We cannot hope to make informed decisions about which teachers deserve tenure, higher salaries, and greater responsibility without a reliable system for differentiating between effective and ineffective teachers. Unfortunately, current public-school evaluation systems come nowhere near that standard of reliability.
The nonprofit New Teacher Project recently analyzed teacher evaluations in twelve large school districts across four states. It found that in districts using a binary evaluation system, in which the only ratings are “satisfactory” and “not satisfactory,” more than 99 percent of teachers received the thumbs-up rating. Even districts that used broader evaluation distinctions rated 94 percent of teachers in one of the top two tiers and deemed just 1 percent “unsatisfactory.”
The consistently homogeneous results of teacher evaluations across the country not only run counter to a wide body of research showing large variations in teacher effectiveness; they also strain plausibility.
It’s particularly difficult to believe that so many teachers in struggling urban school systems are living up to expectations. In 2007, only 57 percent of fourth-graders in New York City and 44 percent of fourth-graders in Chicago could claim to be basically literate, according to a highly respected test administered to representative samples of students each year by the U.S. Department of Education. That same year, fewer than 2 percent of New York City’s 56,000 classroom teachers and fewer than 1 percent of Chicago’s 19,000 were deemed “unsatisfactory” in their official evaluations. The vast majority of teachers were rated well above par.
Not all teachers are created equal, and any evaluation system that suggests otherwise is worse than useless. Such a system is a lost opportunity to identify those teachers who are successful, those who need assistance, and those whom we should show the door. U.S. Education Secretary Arne Duncan had it right when he lamented to a group of education researchers that “in California, they have 300,000 teachers. If you took the top 10 percent, they have 30,000 of the best teachers in the world. If you took the bottom 10 percent, they have 30,000 teachers that should probably find another profession, yet no one in California can tell you which teacher is in which category. Something is wrong with that picture.”
What’s wrong with the picture is that teacher-evaluation systems rely on entirely subjective assessments with inflationary incentives. But if used properly, now-ubiquitous standardized testing of students in America’s public schools can provide an objective benchmark capable of improving — though not entirely replacing — the modern teacher-evaluation system.
Current teacher evaluations overemphasize classroom observation, which, while valuable, cannot tell us everything we need to know about a teacher’s effectiveness. Besides, current classroom observations are conducted too infrequently to be informative. More than half of the districts examined in a recent U.S. Department of Education study evaluated tenured teachers just once every three years. In Chicago, tenured teachers whose last rating was Excellent or Superior (a distinction awarded in 93 percent of evaluations in that district between 2003 and 2006) are evaluated once every two years.
A teacher’s job is far too complex to be evaluated with observations of a single class period or less, once every year or so. In the Miami-Dade school system, according to the collective-bargaining agreement, the required annual official evaluation need not last longer than 20 minutes.
Exacerbating the problem of inflated teacher evaluations is the fact that principals have neither the incentive nor, often, the power to correctly identify an ineffective teacher. For starters, the rarity with which teachers are identified as unsatisfactory itself tends to reduce principals’ willingness to use the designation, because it implies that the recipient is not only unsatisfactory but in fact egregiously incompetent — often a far stronger signal than the principal intends to send. The collective-bargaining agreements that govern many school systems give teachers powerful means to fight back if they do not agree with their evaluation, thus burying the principal in paperwork. And tenure ensures that the principal can’t remove an ineffective teacher, no matter how poor his or her rating. In sum, identifying an ineffective teacher brings a principal few benefits and many headaches.
So classroom observations need to be supplemented with other useful, objective information about a teacher’s classroom performance — both to give a more complete picture and to provide principals with empirical support for their decisions. Test scores are an obvious and accessible way to do that.
In the last decade, researchers have developed statistical tools capable of measuring teachers’ independent contribution to their students’ learning, as reflected in those students’ scores on standardized tests. When carefully applied, these measures can separate the influence of the teacher on a student’s test scores from the influences of other factors, such as the student’s background characteristics and even the quality of his or her home life.
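For readers curious about the mechanics, the idea behind such measures can be illustrated with a deliberately simplified sketch. It is not any district's or researcher's actual model. It assumes, purely for illustration, that the only control is each student's prior-year score: it predicts this year's score from last year's with an ordinary least-squares line fit across all students, then credits each teacher with the average amount by which his or her students beat or miss that prediction. The function name `value_added` and the record layout are hypothetical, and real analyses adjust for many more factors and for statistical noise.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Toy value-added estimate (illustrative assumption, not a real model).

    records: iterable of (teacher, prior_score, current_score) tuples.
    Returns {teacher: average residual of that teacher's students}.
    """
    records = list(records)
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    prior_mean, current_mean = mean(priors), mean(currents)

    # Ordinary least-squares fit of current score on prior score,
    # pooled across every student regardless of teacher.
    sxx = sum((p - prior_mean) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit a line")
    sxy = sum((p - prior_mean) * (c - current_mean)
              for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = current_mean - slope * prior_mean

    # A student's residual is the actual score minus the predicted score.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)

    # A teacher's estimate is the mean residual across his or her students.
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Made-up scores: both teachers start with identical students.
scores = [
    ("Teacher A", 40, 45), ("Teacher A", 60, 65), ("Teacher A", 80, 85),
    ("Teacher B", 40, 35), ("Teacher B", 60, 55), ("Teacher B", 80, 75),
]
estimates = value_added(scores)
```

In this invented example the two teachers inherit students with the same prior scores, so the gap between their estimates reflects only how far each class moved relative to expectation, which is the comparison a raw average of test scores cannot make.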
It’s true that test-score analysis, if used improperly, can do as much harm as good. And test scores alone are not broad enough to be used in isolation in evaluating teachers. Nonetheless, they are valuable tools for strengthening those evaluations and the employment decisions that might result.
Randi Weingarten should be applauded for her decision to support the use of test scores to evaluate teachers. However, she is likely to face an uphill climb. While Weingarten’s AFT has shown itself open to some common-sense reforms, the nation’s largest teachers’ union, the National Education Association, has so far pursued a scorched-earth policy not only against using test scores in teacher evaluations, but also against more widely praised reforms such as the expansion of charter schools. It’s also unclear whether local AFT affiliates will be willing to follow their president. For instance, Weingarten’s replacement as president of New York City’s teachers’ union, Michael Mulgrew, has actively fought to keep the ban on using test scores to evaluate teachers.
For the sake of America’s future, let’s hope that Randi Weingarten sticks to her brave words and that the broader teachers’-union community follows her lead.
– Marcus A. Winters is a senior fellow at the Manhattan Institute.