It’s a new year, but for public education it looks like we may be seeing more of the same old thing. Tonight the Pittsburgh School Board will be reviewing a new teacher evaluation plan developed by the District based on highly problematic data drawn from all those high-stakes tests our kids have been taking. Not only is the data bad, but the uses to which it is being put should be setting off alarm bells in every parent’s head, as it actually damages our schools, our teachers, and even our children’s education. To understand why, Yinzercation talked to Dr. Tim Slekar, an education researcher and Head of the Division of Education, Human Development and Social Sciences at Penn State Altoona.
Pittsburgh’s plan comes just as Pennsylvania has introduced a new law mandating that every school district in the state must implement a teacher evaluation system, basing half of the evaluation on classroom observation and the other half on “multiple measures of student achievement.” We’ll get to these halves in a moment, but let’s start with the very premise of this new evaluation system. Pennsylvania and many other states around the country have introduced these laws as part of the corporate reform movement, which rests on the idea that public schools are failing, and that we must measure students with tests that will then be used to hold teachers accountable and even close down or “turn around” low-performing schools (often by firing all the teachers). This seductive reasoning centers on the assumption that teachers are responsible for how their students perform on tests and that tests are an accurate measure of their teaching.
However, there are multiple problems with this logic. First, and perhaps most importantly, Dr. Slekar explains that abundant research demonstrates that out-of-school factors are far and away the largest contributors to student achievement. As much as 80% of student achievement can be directly attributed to issues such as family stability, number of books in the home, exposure to cultural resources, and whether or not a child had breakfast before school. Of the remaining 20%, teachers are certainly the most important in-school factor affecting student achievement, but by no means the only one.
The Organisation for Economic Co-operation and Development (OECD) did a major study a few years ago in collaboration with 25 countries around the world looking at effective teaching. It concluded, “The first and most solidly based finding is that the largest source of variation in student learning is attributable to differences in what students bring to school – their abilities and attitudes, and family and community background.” The report noted that, “Such factors are difficult for policy makers to influence.” [OECD “Teachers Matter” report, 2005]
Furthermore, while teacher quality definitely matters, the OECD report found that most measures of teaching effectiveness have concentrated on factors that can be easily quantified – usually correlating student test data with teaching credentials, years of experience, and such. While there is a positive relationship between these things, the research shows that these matter to a “lesser extent than may have been expected” and that the teacher characteristics “which can be vital to student learning” are “harder to measure.” These include the things we should all care about in our teachers, such as “the ability to convey ideas in clear and convincing ways; to create effective learning environments for different types of students; to foster productive teacher-student relationships; to be enthusiastic and creative; and to work effectively with colleagues and parents.” [OECD “Teachers Matter” report, 2005, p. 2]
Another huge problem with the current frenzy of teacher evaluation systems is that they are also premised on the idea that we have too many “bad” teachers and must get rid of them. As evidence, the “reformers” often cite the statistic that current evaluation methods result in 99% of all teachers in the state receiving a satisfactory, or “qualified,” grade. The logic seems to be that we couldn’t possibly have so many qualified teachers. Naturally, a sensible counter-argument would be, “Why not?” It’s not like we’re hiring people off the street: teachers have to go through many gates, including training, certification, and then hiring by a school district, before they get their own classrooms.
While there may be a handful of ineffective teachers in any given district, I’m not seeing a plague of bad teaching: if anything, the teaching I see on a regular basis is quite good, despite the immense odds stacked against our teachers. Sure, where there is ineffective teaching, I want an improvement plan put in place, proper supports offered to that teacher, and, if none of that works, the person guided out the door. This is why we have a three-year probationary period and why the best districts train principals in good observation and feedback techniques, then make sure they have time to perform this most critical function.
In fact, if you think about it, the very best teacher evaluation system would be highly qualitative, one in which the principal takes on a teaching mentor role, creating what Dr. Slekar refers to as “a framework to discuss the classroom experience.” If there is a problem with current evaluation systems based on observation, it’s not that too many teachers receive a satisfactory grade; it’s that the quality and quantity of observation is insufficient (frequently just a quick once-a-year visit) and often inconsistent across districts (varying from building to building). The solution to that problem has nothing to do with student achievement scores.
Ah, but therein lies the rub. The entire evaluation system depends on what Dr. Slekar calls “the mythology of objectivity.” This is the idea that we can quantify everything, come up with the perfect formula, and reduce all aspects of teaching to numbers that will not lie – after all, they are numbers. But this lure of objectivity masks the reality that every standardized test we give our kids – and then want to use to evaluate our teachers – is in fact subjective. Slekar explains that the PSSAs are not objective measures at all and actually contain a great deal of cultural bias, which continues to skew scores against our poorest students and students of color.
But even if we assume for the moment that those high-stakes tests our children are taking yield legitimate results, there are still serious problems with using those tests to evaluate teaching. First, they were only designed to measure student achievement – not how well our teachers are teaching. As any scientist will tell you, when you want to examine something, the measurements have to be designed to actually look at what you’re interested in. And second, they completely omit many of the most important elements of teaching – you know, those very things we as parents and concerned community members think about when we recall our very best teachers.
Now let’s get back to that new state mandate which requires districts to base at least half of our teachers’ evaluations on student test scores. This half is supposed to use “multiple measures of student achievement,” but what that really amounts to is using the PSSA scores and breaking them apart and putting them back together in different ways. Pittsburgh has been working on a system to get out ahead of the new law, and wants to use a slightly different breakdown of percentages within this half than the one dictated by the state: the district “proposes 5 percent for building-level results, 30 percent for teacher-specific data and 15 percent for elective data,” which in most cases means “student surveys of individual teachers.” [Post-Gazette, 12-31-12]
For the building-level and teacher-specific data, Pittsburgh wants to use what is known as Value Added Measurement (VAM), which takes into account how much a student has grown academically in a year, rather than taking a single snapshot of year-end performance on a test. While VAMs sound like a huge improvement, the reality is that VAM systems are still in the experimental phase and so far there is no evidence that any of them work. The National Education Policy Center (NEPC) reviewed VAM research funded by the Gates Foundation and found that “a teacher’s value-added for the state test is not strongly related to her effectiveness in a broader sense. Most notably…many teachers whose value-added for one test is low are in fact quite effective when judged by the other.” What’s more, the researchers warned, “there is every reason to think that the problems with value-added measures … would be worse in a high-stakes environment,” calling the results of the study “sobering about the value of student achievement data as a significant component of teacher evaluations.” [NEPC, Review of “Learning About Teaching,” 2011]
Dr. Slekar explains the problem with VAM quite simply: “Value Added Measurement systems will incorrectly rank teachers one out of every three times—at best.” [@theChalkface, 1-2-13] Just last week, education researcher Dr. Mercedes K. Schneider published an excellent investigation of the VAM system proposed by the state of Louisiana. Her careful analysis is worth reading in full if you are interested in the mathematics behind these measurements, but the crux of the problem comes down to this: VAM systems rank teachers, and in any ranking some will be at the top and some will be at the bottom (I find this itself a problematic underlying assumption). If you are going to use a tool to rank the teachers, then it ought to at least be stable, reliable, and consistent.
Dr. Schneider uses this analogy: “It is like standing on a bathroom scale; reading your weight; stepping off (no change in your weight); then, stepping on the scale again to determine how consistent the scale is at measuring your weight. Thus, if the standardized tests are stable (consistent) measures, they will reclassify teachers into their original rankings with a high level of accuracy. This high level of accuracy is critical if school systems are told they must use standardized tests to determine employment and merit pay decisions.” However, the VAM system frequently re-ranked teachers who had been at the top, down to the middle or the bottom, even when they had not changed a thing in their teaching. The bottom line? Dr. Schneider says, “I would discard the bathroom scale.” She concludes, “Yes, teachers should be evaluated. However, attempting to connect teacher performance to student standardized test scores cannot work and will not improve education in America. VAM does not work; it cannot work and needs to be discarded.” [“Value Added Modeling (VAM) and ‘Reform’: Under the Microscope,” 12-28-12]
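For readers who want to see the bathroom-scale problem in numbers, here is a toy simulation – not the district’s actual model, and the assumption that year-to-year noise is as large as the true teaching signal is an illustration, not a measured fact (though it is consistent with the research above on out-of-school factors). It gives every simulated teacher a fixed “true” effectiveness, adds random noise to produce two years of scores, and counts how many teachers land in a different third of the rankings despite not changing a thing:

```python
# Toy illustration of VAM instability (hypothetical numbers, not a real VAM model).
import random

random.seed(42)
n = 30000  # simulated teachers

# Each teacher has a fixed "true" effectiveness that never changes.
true_effect = [random.gauss(0, 1) for _ in range(n)]

def observed_scores():
    # Assumption for illustration: year-to-year noise as large as the true signal.
    return [t + random.gauss(0, 1) for t in true_effect]

def terciles(scores):
    # Rank teachers and assign each to the bottom, middle, or top third.
    order = sorted(range(n), key=lambda i: scores[i])
    tier = [0] * n
    for rank, i in enumerate(order):
        tier[i] = rank * 3 // n
    return tier

year1 = terciles(observed_scores())
year2 = terciles(observed_scores())

moved = sum(1 for a, b in zip(year1, year2) if a != b) / n
print(f"Teachers whose ranking tier changed with no change in teaching: {moved:.0%}")
```

Under these assumptions, well over a third of the simulated teachers switch tiers between years purely from noise – exactly the kind of re-ranking Dr. Schneider’s bathroom scale analogy describes.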
So if VAM is a sham, why are we wasting our time – and untold taxpayer dollars – on this stuff? Pittsburgh appears to be smitten with the idea that it can keep jiggling the numbers until it finds the magic formula: the district says it will adjust for variables like “free- or reduced-price lunch eligibility, the number of English language learners, the number of gifted students, and other characteristics.” [Post-Gazette, 12-31-12] But as Dr. Slekar remarked, “VAM is garbage in, garbage out. There’s no research that shows a way to account for out of school factors. This is all in the experimental phase. No one has done it. In two different years you get two different results.” He asks, “How can a teacher be successful one year and not the next? When researchers look at this over 3, 4, 5 years, the reliability is zero.” And he points out that those teachers getting bad VAM scores can be the very ones who get the highest ratings from parents, those who inspire kids and are most humane.
Dr. Slekar also points out the difficulty of combining this VAM and student test score data with the other half of the teacher’s evaluation, which is supposed to be classroom observation. In Pittsburgh, this half comes from a system the district developed called RISE (Research-Based Inclusive System of Evaluation), based on the work of education researcher Charlotte Danielson. But Slekar argues that RISE is a “distortion of [her] original work on quality teaching. Danielson’s qualitative system of evaluation was never meant to be merged with an invalid and unreliable quantitative evaluation system—Value Added Measures.” [@theChalkface, 1-2-13]
At this point your head may be spinning. What’s the big deal? Why should we care? The takeaway is this: we are wasting precious resources on a system that will not give us good results, resources that we know would be far better spent on early childhood education, or even textbooks for our schools. Pittsburgh may feel it has no choice other than to comply with the new state law, but it has been preparing this system for a while. I would like to see our elected school board have a real conversation about this at tonight’s meeting, find its backbone, and take a public stand. Enough is enough. These high-stakes tests – and the VAM sham they perpetuate – are damaging our schools, our kids, and our teachers.
Help grow our grassroots movement for public education: join other volunteer parents, students, educators, and concerned community members by subscribing to Yinzercation. Enter your email address and hit the “Sign me up” button to get these pieces delivered directly to your inbox and encourage your networks to do the same. Really. Can you get five of your friends to subscribe? Working together we can win this fight for our schools.