Grading on a Twisted Curve
Who do you see about giving the grading system a failing grade?
Oklahoma is one of the states that’s instituted an A-F grading system for schools as part of a push for school accountability. As with many school accountability systems, Oklahoma’s A-F grades depend heavily on test scores. That system, however, has now been roundly criticized in a report [PDF] by researchers at the University of Oklahoma and Oklahoma State University.
Their analysis pulled thousands of test scores from dozens of schools across the state. That analysis showed that, on the 50-question tests used as the basis for the grades, the difference between an “A” school and an “F” school was 3.67 correct answers, and that there were few meaningful differences between A, B, and C schools. As the report writers put it, “Many of the differences between letter grades were likely due to chance; even when they reached statistical significance they were of questionable practical utility....”
Oh, and the grades also hid achievement gaps by relying too much on aggregate measures. This is a problem Minnesotans should be familiar with, as our long-running high average educational performance has masked serious equity gaps.
The problems with the Oklahoma grades are different from the ones uncovered earlier in Indiana, where former state superintendent Tony Bennett (and several of his staff) had been found changing the grading system behind the scenes. Setting aside for the moment the fact that those changes were made in large part to benefit a school Bennett respected (and which was connected to a major campaign donor), the Indiana case showed the distinctly artificial nature of this kind of grading system. As comforting as it would be to have a system that could authoritatively tell us which schools deserved an A and which deserved an F, there’s not much of a promising track record to go on here.
Minnesota, as you may know, does not use an A-F grading scale as the basis of our accountability system. Instead, we use something called the Multiple Measurements Rating (MMR), devised as a replacement for No Child Left Behind’s Adequate Yearly Progress (AYP) requirement.
The MMR combines three different test score calculations: proficiency by student group, year-to-year student growth, and an achievement gap calculation that compares a school’s scores for generally low-scoring groups with the state’s scores for generally high-scoring groups. It also includes a graduation rate component. Those four components -- proficiency, growth, achievement gap, and graduation rates -- are assigned equal weight in calculating a numeric score for the school. This allows for more nuanced scores than the simplistic A-F scale, and it should more accurately represent how close or far apart schools’ scores are.
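To see why an equal-weight numeric score preserves distinctions that letter grades collapse, here is a minimal sketch. The function name, the 0-100 scale, and the sample numbers are all illustrative assumptions, not the state’s actual formula.

```python
# Hypothetical sketch of an equal-weight composite like the MMR.
# The 0-100 scale for each component and the sample values are
# assumptions for illustration, not Minnesota's actual formula.

def composite_score(proficiency, growth, gap_reduction, graduation):
    """Average four equally weighted components, each on a 0-100 scale."""
    components = [proficiency, growth, gap_reduction, graduation]
    return sum(components) / len(components)

# Two schools with modestly different component scores keep composite
# scores that show exactly how far apart they are, rather than being
# collapsed into adjacent letter grades.
school_a = composite_score(82.0, 74.5, 61.0, 90.0)  # 76.875
school_b = composite_score(78.0, 72.0, 58.5, 88.0)  # 74.125
```

On a five-bucket A-F scale, a 2.75-point gap like this one could either vanish inside a single letter or get exaggerated across a cutoff; the numeric score just reports it.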
In one sense, then, Minnesota’s system is “better” than the A-F systems in Oklahoma, Indiana, and elsewhere. Then again, those systems could be seen as simply going through the expected early hiccups that come with developing a new tool. Who’s to say this isn’t just a natural part of the trial-and-error process?
Well, the Oklahoma researchers weighed in on this, too. They argue that the entire exercise is built on a flawed concept: the idea that test scores are a reasonable measurement of school performance. Here I’ll quote again from the report, which says, “While basing the letter grade solely on student test performance and like indicators, the A-F policy ignores the fact that most achievement variation exists within schools not across schools.”
Also, “A primary assumption of the A-F accountability system, that student test scores can be dissected and manipulated into valid indicators of school performance, is simply false.” Or, in simpler terms, “Student test scores are not a trustworthy measure of school performance.”
Here it should be noted that the Oklahoma researchers do support the use of accountability systems for schools, and offer several recommendations for improving Oklahoma’s system. They strongly argue against the use of arbitrary cutpoints for defining “proficiency,” and instead argue for the use of averages and standard deviations in raw scores. They also argue for abandoning the A-F scale in favor of a multidimensional system that “includes indicators of standardized test performance, other outcome indicators, school process indicators, and school inputs.”
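The averages-and-standard-deviations idea can be made concrete with standard scores (z-scores), which report how far a school sits from the group mean rather than which side of a fixed cutpoint it lands on. The school means below are made up for illustration; this is one common way to implement the researchers’ suggestion, not their published method.

```python
# Illustrative sketch of scoring with averages and standard deviations
# (z-scores) instead of a fixed "proficient" cutpoint. The school-level
# data are invented for this example.
from statistics import mean, stdev

def z_score(value, values):
    """How many standard deviations a score sits from the group mean."""
    return (value - mean(values)) / stdev(values)

school_means = [31.2, 33.5, 34.1, 34.9, 35.0, 36.8]  # avg. correct answers per school

# A cutpoint (say, 34 of 50 correct) would split these schools into
# pass/fail even though several sit within a fraction of a standard
# deviation of one another; z-scores keep that closeness visible.
zs = [round(z_score(s, school_means), 2) for s in school_means]
```

The middle four schools here land within less than half a standard deviation of each other, which a z-score reports directly and a binary cutpoint hides.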
One final consideration is what the purpose of such systems should be. The report writers state plainly that, “It is a myth to think that using student test scores to punish or reward schools is a driver of improvement.” If we want to change school practices, we need to do something other than label schools and assign high stakes to those labels. This comes back to the theme that systems change is not a substitute for capacity building. If the problem is with the rules, incentives, and distribution of authority, changing those things can fix the problem. If the problem is a lack of capacity -- material, human, or skill -- giving a school an F isn’t going to help it get any better.