The VAM Sham

It’s a new year, but for public education it looks like we may be seeing more of the same old thing. Tonight the Pittsburgh School Board will be reviewing a new teacher evaluation plan developed by the District based on highly problematic data drawn from all those high-stakes tests our kids have been taking. Not only is the data bad, but the uses to which it is being put should be setting off alarm bells in every parent’s head, because they actually damage our schools, our teachers, and even our children’s education. To understand why, Yinzercation talked to Dr. Tim Slekar, an education researcher and Head of the Division of Education, Human Development and Social Sciences at Penn State Altoona.

Pittsburgh’s plan comes just as Pennsylvania has introduced a new law mandating that every school district in the state must implement a teacher evaluation system, basing half of the evaluation on classroom observation and the other half on “multiple measures of student achievement.” We’ll get to these halves in a moment, but let’s start with the very premise of this new evaluation system. Pennsylvania and many other states around the country have introduced these laws as part of the corporate reform movement, which rests on the idea that public schools are failing, and that we must measure students with tests that will then be used to hold teachers accountable and even close down or “turn around” low-performing schools (often by firing all the teachers). This seductive reasoning centers on the assumption that teachers are responsible for how their students perform on tests and that tests are an accurate measure of their teaching.

However, there are multiple problems with this logic. First, and perhaps most importantly, Dr. Slekar explains that abundant research demonstrates that out-of-school factors are far and away the largest contributors to student achievement. As much as 80% of student achievement can be directly attributed to issues such as family stability, number of books in the home, exposure to cultural resources, and whether or not a child had breakfast before school. Of the remaining 20%, teachers are certainly the most important in-school factor affecting student achievement, but by no means the only one.

The Organization for Economic Cooperation and Development (OECD) did a major study a few years ago in collaboration with 25 countries around the world looking at effective teaching. They concluded, “The first and most solidly based finding is that the largest source of variation in student learning is attributable to differences in what students bring to school – their abilities and attitudes, and family and community background.” The report noted that, “Such factors are difficult for policy makers to influence.” [OECD “Teachers Matter” report, 2005]

Furthermore, while teacher quality definitely matters, the OECD report found that most measures of teaching effectiveness have concentrated on factors that can be easily quantified – usually correlating student test data with teaching credentials, years of experience, and such. While there is a positive relationship between these things, the research shows that these matter to a “lesser extent than may have been expected” and that the teacher characteristics “which can be vital to student learning” are “harder to measure.” These include the things we should all care about in our teachers, such as “the ability to convey ideas in clear and convincing ways; to create effective learning environments for different types of students; to foster productive teacher-student relationships; to be enthusiastic and creative; and to work effectively with colleagues and parents.” [OECD “Teachers Matter” report, 2005, p. 2]

Another huge problem with the current frenzy of teacher evaluation systems is that they are also premised on the idea that we have too many “bad” teachers and must get rid of them. As evidence, the “reformers” often cite the statistic that current evaluation methods result in 99% of all teachers in the state receiving a satisfactory, or “qualified,” grade. The logic seems to be that we couldn’t possibly have so many qualified teachers. Naturally, a sensible counter-argument would be, “Why not?” It’s not like we’re hiring people off the street: teachers have to go through many gates, including training, certification, and then hiring by a school district, before they get their own classrooms.

While there may be a handful of ineffective teachers in any given district, I’m not seeing a plague of bad teaching: if anything, the teaching I see on a regular basis is quite good, despite the immense odds stacked against our teachers. Sure, where there is ineffective teaching, I want an improvement plan put in place, proper supports offered to that teacher, and then, if none of that works, the person guided out the door. This is why we have a three-year probationary period, and why the best districts train principals in good observation and feedback techniques, then make sure they have time to perform this most critical function.

In fact, if you think about it, the very best teaching evaluation system would be highly qualitative, one in which the principal takes on a teaching mentor role, creating what Dr. Slekar refers to as “a framework to discuss the classroom experience.” If there is a problem with current evaluation systems based on observation, it’s not that too many teachers receive a satisfactory grade; it’s that the quality and quantity of observation are insufficient (frequently just a quick once-a-year visit) and often inconsistent across districts (varying from building to building). The solution to that problem has nothing to do with student achievement scores.

Ah, but therein lies the rub. The entire evaluation system depends on what Dr. Slekar calls “the mythology of objectivity.” This is the idea that we can quantify everything, come up with the perfect formula, and reduce all aspects of teaching to numbers that will not lie – after all, they are numbers. But this lure of objectivity masks the reality that every standardized test we give our kids – and then want to use to evaluate our teachers – is in fact subjective. Slekar explains that the PSSAs are not objective measures at all and actually contain a great deal of cultural bias, which continues to skew scores against our poorest students and students of color.

But even if we assume for the moment that those high-stakes tests our children are taking yield legitimate results, there are still serious problems with using those tests to evaluate teaching. First, they were only designed to measure student achievement – not how well our teachers are teaching. As any scientist will tell you, when you want to examine something, the measurements have to be designed to actually look at what you’re interested in. And second, they completely omit many of the most important elements of teaching – you know, those very things we as parents and concerned community members think about when we recall our very best teachers.

Now let’s get back to that new state mandate which requires districts to base at least half of our teachers’ evaluations on student test scores. This half is supposed to use “multiple measures of student achievement,” but what that really amounts to is using the PSSA scores and breaking them apart and putting them back together in different ways. Pittsburgh has been working on a system to get out ahead of the new law, and wants to use a slightly different breakdown of percentages within this half than the one dictated by the state: the district “proposes 5 percent for building-level results, 30 percent for teacher-specific data and 15 percent for elective data,” which in most cases means “student surveys of individual teachers.” [Post-Gazette, 12-31-12]
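To make the arithmetic concrete, here is a minimal sketch of how those percentages might combine into a single rating. The weights come from the proposal as reported above (half for observation, plus the 5/30/15 split). Everything else is an assumption made for illustration: the 0-100 scale, the component names, the `composite_rating` function, and the example teacher's scores are all made up, and the District's actual calculation may look nothing like this.

```python
# Weights as whole percentages, taken from the proposal described above:
# 50 for classroom observation plus the district's 5 / 30 / 15 split.
WEIGHTS = {
    "observation": 50,
    "building_level": 5,
    "teacher_specific": 30,
    "elective": 15,
}


def composite_rating(scores):
    """Weighted average of component scores (hypothetical 0-100 scale)."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up scores for one imaginary teacher, for illustration only.
example = {
    "observation": 80,
    "building_level": 60,
    "teacher_specific": 40,
    "elective": 90,
}
print(composite_rating(example))  # 68.5
```

In this invented case, a teacher who scores 80 on observation ends up at 68.5 overall, and nearly all of that drop comes from the teacher-specific test number.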

For the building-level and teacher-specific data, Pittsburgh wants to use what is known as Value Added Measurements (VAM), which take into account how much a student has grown academically in a year, rather than taking a single snapshot of year-end performance on a test. While VAMs sound like a huge improvement, the reality is that VAM systems are still in the experimental phase and so far there is no evidence that any of them work. The National Education Policy Center (NEPC) reviewed VAM research funded by the Gates Foundation and found that “a teacher’s value-added for the state test is not strongly related to her effectiveness in a broader sense. Most notably…many teachers whose value-added for one test is low are in fact quite effective when judged by the other.” What’s more, the researchers warned, “there is every reason to think that the problems with value-added measures … would be worse in a high-stakes environment,” calling the results of the study “sobering about the value of student achievement data as a significant component of teacher evaluations.” [NEPC, Review of “Learning About Teaching,” 2011]

Dr. Slekar explains the problem with VAM quite simply: “Value Added Measurement systems will incorrectly rank teachers one out of every three times—at best.” [@theChalkface, 1-2-13] Just last week, education researcher Dr. Mercedes K. Schneider published an excellent investigation of the VAM system proposed by the state of Louisiana. Her careful analysis is worth reading in full if you are interested in the mathematics behind these measurements, but the crux of the problem comes down to this: VAM systems rank teachers, and in any ranking some will be at the top and some will be at the bottom (I find this itself a problematic underlying assumption). If you are going to use a tool to rank the teachers, then it ought to at least be stable, reliable, and consistent.

Dr. Schneider uses this analogy: “It is like standing on a bathroom scale; reading your weight; stepping off (no change in your weight); then, stepping on the scale again to determine how consistent the scale is at measuring your weight. Thus, if the standardized tests are stable (consistent) measures, they will reclassify teachers into their original rankings with a high level of accuracy. This high level of accuracy is critical if school systems are told they must use standardized tests to determine employment and merit pay decisions.” However, the VAM system frequently re-ranked teachers who had been at the top, down to the middle or the bottom, even when they had not changed a thing in their teaching. The bottom line? Dr. Schneider says, “I would discard the bathroom scale.” She concludes, “Yes, teachers should be evaluated. However, attempting to connect teacher performance to student standardized test scores cannot work and will not improve education in America. VAM does not work; it cannot work and needs to be discarded.” [“Value Added Modeling (VAM) and ‘Reform’: Under the Microscope,” 12-28-12]
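The bathroom-scale idea can be tried out in a toy simulation. To be clear, this is not Dr. Schneider's analysis and not any state's actual VAM formula. It is a sketch under invented assumptions: each imaginary teacher has a fixed "true" effect that never changes, each year's score is that effect plus random noise, and the function name `top_tier_retention`, the number of teachers, and the noise levels are all hypothetical. It simply asks how many teachers ranked in the top third one year are still in the top third the next.

```python
import random


def top_tier_retention(n_teachers=300, noise_sd=1.0, seed=0):
    """Share of year-1 top-third teachers who are still top-third in year 2.

    Each hypothetical teacher has a fixed "true" effect; a yearly score is
    that effect plus independent noise. Nothing about the teaching changes
    between the two years.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]

    def yearly_scores():
        # Same teachers, same true effects, fresh measurement noise.
        return [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    def top_third(scores):
        cutoff = n_teachers // 3
        ranked = sorted(range(n_teachers), key=lambda i: scores[i], reverse=True)
        return set(ranked[:cutoff])

    year1 = top_third(yearly_scores())
    year2 = top_third(yearly_scores())
    return len(year1 & year2) / len(year1)


# Compare a perfectly consistent "scale" (no noise) with noisier ones.
for sd in (0.0, 0.5, 1.0, 2.0):
    print(sd, round(top_tier_retention(noise_sd=sd), 2))
```

With zero noise the two rankings are identical, so every top-third teacher stays put. Once noise is added, any reshuffling of the top third comes from the measurement alone, since the simulated teachers did not change a thing.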

So if VAM is a sham, why are we wasting our time – and untold taxpayer dollars – on this stuff? Pittsburgh appears to be smitten with the idea that it can keep jiggling the numbers until it finds the magic formula: the district says it will adjust for variables like “free- or reduced-price lunch eligibility, the number of English language learners, the number of gifted students, and other characteristics.” [Post-Gazette, 12-31-12] But as Dr. Slekar remarked, “VAM is garbage in, garbage out. There’s no research that shows a way to account for out of school factors. This is all in the experimental phase. No one has done it. In two different years you get two different results.” He asks, “How can a teacher be successful one year and not the next? When researchers look at this over 3, 4, 5 years, the reliability is zero.” And he points out that those teachers getting bad VAM scores can be the very ones who get the highest ratings from parents, those who inspire kids and are most humane.

Dr. Slekar also points out the difficulty in combining this VAM and student test score data with the other half of the teacher’s evaluation, which is supposed to be classroom observation. In Pittsburgh, this half comes from a system it developed called RISE (Research-Based Inclusive System of Evaluation), based on the work of education researcher Charlotte Danielson. But Slekar argues that RISE is a “distortion of [her] original work on quality teaching. Danielson’s qualitative system of evaluation was never meant to be merged with an invalid and unreliable quantitative evaluation system – Value Added Measures.” [@theChalkface, 1-2-13]

At this point your head may be spinning. What’s the big deal? Why should we care? The takeaway is this: we are wasting precious resources on a system that will not give us good results, resources that we know would be far better spent on early childhood education, or even textbooks for our schools. Pittsburgh may feel it has no choice other than to comply with the new state law, but it has been preparing this system for a while. I would like to see our elected school board representatives have a real conversation about this at tonight’s meeting, find their backbone, and take a public stand. Enough is enough. These high-stakes tests – and the VAM sham they perpetuate – are damaging our schools, our kids, and our teachers.

 


35 thoughts on “The VAM Sham”

  1. This is a great explanation of the problem with VAM-type measures – particularly salient is the point that out-of-school factors are so very important.

    However, this is also why I had such a problem with Yinzercation’s alliance with A+ Schools last spring around the PPS administration’s desire to “keep all those great young teachers” and fire “all those bad old teachers.” That whole crisis was rigged and used by this administration as a quicker way than VAM or even accurate evaluations to clean house. The truth is that administrators (and not teachers and not their union) have and have always had the ability to get rid of bad teachers. It isn’t a quick process; there are steps to be taken which allow time for remediation of the teacher. But, an administrator doing his or her job can fire a teacher.

    • Check out the Montgomery County, Md. evaluation plan. It’s called PAR and is very successful because it’s done by a committee that includes teachers, and it also includes a year of mentoring for a struggling teacher. Under this system, which is approved by the union and allows due process, teachers either improve, leave on their own, or are fired. Joshua Starr, the superintendent, and his predecessor both turned down RTTT funds in order to keep this process over VAM. Test scores are never used as indicators of a teacher’s success or failure.

      It’s

      • Teachers should push for something like that in Pittsburgh. They have been very successful in Montgomery. Perhaps if the union proposed their own methods of accountability, some of the external pressure would dissipate.

      • Teachers or “the union”? No teacher with any common sense would agree to VAM. VAM is a faulty measure. However, unions across the country are agreeing to it, selling it (selling out), and spinning it as a good thing, along with rubrics that decide what good teaching is. All VAM does is create a climate of teaching to the test.

  2. Pingback: The VAM Sham | Yinzercation | THE CLOSED CAMPUS

  3. PPSparent,

    You hit the nail on the head – so many wonderful points!!! As a teacher and Union member (in a right to work for less state), this is a huge pet peeve of mine — saying there are so many bad teachers. That’s actually not true, as explained in the article.

    Additionally, if principals would do their jobs, those few bad teachers that exist would be OUT! Most principals cannot be bothered to follow the process and even fewer are willing to work with weak teachers to make them stronger. They prefer to just ignore them – so, so frustrating!!!

    Thank you so much for your comments; they are appreciated.

    Teacher who has been VAM-ed in FL

  4. Pingback: The VAM Mess in Pennsylvania « Diane Ravitch's blog

  5. Thank you. I enjoy your articles and their worth. I am a 2nd grade PPS teacher, and these concerns over data driving my instruction are never-ending. I know data is important, but at what point will the upper crust of PPS take a deeper look at the external reasons why these children aren’t performing where they “should”? They don’t see the growth these kids do make the way I do. I grow these kids, I grow them well!
    This Wednesday, in my post-conference of an informal observation, I was asked what data made me teach my phonics lesson the way I did (mind you, it was the first day BACK from winter break and the introduction to a new theme). I was floored because it was an introduction to the vowel sound! I luckily had my data from another unit 6 weeks prior and spewed out that 40% of those kids present today got the long vowel u wrong 6 themes ago.
    The reality remains: my job seems to depend on how the kids will do on the TerraNovas and then on their PSSAs. I am with these kids doing my hardest, day in and day out. However, I can’t go home with them. Talking with my colleagues, most feel the same. A chunk of the children in our classes have families that don’t value education. They avoid our calls, call us names, and don’t instill the value of education in their kids. We battle every day with those students and those external voices they hear. As a teacher, I don’t just teach the curriculum and beyond (taking into account my formative assessment data, past test data, DIBELS data, CBA data, pretest data, RTII data), I am teaching core values: compassion, love, tolerance, patience, personal care, discipline – just to name a few.
    I am aware that some of my kids spent the night in the ER because mom is going through early labor, or that the reason one child comes in late is because she has nightmares from finding her grandmother dead, or that their bus suspension kept them from school because they have no vehicle to get them to school, or that I have donated many a bag of clothes to families who need them far more than I do. It’s too bad the administration doesn’t see that and take that into account in my VAM scores.
    No, I just have to hope no mom goes into labor, no nightmares, no abuse, no bus suspensions happen the night before.

    • Very well put! I am also a PPS teacher and have many of the same issues with VAM and the way we are evaluated. This school year, we are only four months in, my students have been tested, from district level assessments, six times. Keep in mind that six times does not equate to six days, but more like two FULL months of testing. By this point they are burnt out and the data being collected is completely invalid. But do you think they take this into consideration when “VAMing” us?
      The district is constantly stating everything they do is to grow students, but most of the initiatives take teachers or students out of the classroom, thus leaving many students behind.
      In the end, the kids always lose out and are not ready for the real world. The decision makers need to let us do the job that we are trained for and we will move kids. We know them. We know how they learn best. We know how to care for them. Just let us.

  6. Pingback: VAM in PA « Network Schools – Wayne Gersen

  7. As stated in the article, garbage (and I prefer the more crude term of excrement) in, garbage out. Noel Wilson has proven that the many errors (he’s identified 13, any one of which invalidates the process) involved in the making of educational standards and standardized testing render any conclusions “vain and illusory”. See his “Educational Standards and the Problem of Error” found at:

    http://epaa.asu.edu/ojs/article/view/577/700

    Or any essay review of the testing bible: “A Little Less than Valid: An Essay Review” found at:

    http://www.edrev.info/essays/v10n5index.html

    http://www.edrev.info/essays/v10n5.pdf

    Also any usage of a standardized test score for anything other than what it is designed for is UNETHICAL. So that using a 5th grade math score to “evaluate” the teacher is dead wrong and UNETHICAL (yes I shout that out as no one seems to understand that simple fact).

  8. ‘and whether or not a child had breakfast before school.’ How many free breakfasts, lunches, snacks, and now, yes, even suppers do we have to offer? The fact that a mother doesn’t get up in the morning and fix a hot bowl of Cream of Wheat (21 servings @ $3.48 + ¼ cup of milk) is an indictment of her motherly ‘skill’. Even an animal innately provides for their children. I know, they’re all ‘working mothers’ and are not home when their child catches the bus at 6:30am. I won’t ask where the father is, but how about a bowl of Cheerios, then?

    And how many books are in the home? I guess that depends on whether the ‘mother’ has ever taken her children to the free library and checked any out. Surely, even a ‘working mother’ has one day off? What better way to teach your child to respect property (books), authority (librarian), rules (getting books back on time) and to learn civilized behavior skills such as being respectful, quiet and still.

    If 80% of the problem lies in the home (and I don’t refute that), the question arises what to do with those students who lack basic school skills before entering 1st grade. They’re six years behind and statistically never catch up. In the meantime even one youngster can disrupt an entire class. It’s time for Social Skills Remediation Boot Camp.

  9. (Pre-K teacher) My parish teacher evaluation included 1 formal 30-minute peek with a 5-point rubric and an administrator who never stepped foot in my classroom beforehand with no preschool experience. Teachers are being given a number of rank based on this!

  10. Yinzer, an honor to be quoted on your blog. Dr. Slekar has invited me to be a guest on his internet radio talk show, @the chalk face, to discuss VAM/teacher eval issues. I will post the specific date once scheduled. The show is on Sundays at 6 p.m. EST.

    • Dr. Schneider, that is terrific! Definitely let us know when the show is scheduled so we can spread the word.
      Yours,
      Jessie
      ———————
      Jessie B. Ramey, Ph.D.
      ACLS New Faculty Fellow
      Women’s Studies and History
      University of Pittsburgh

  11. Our theme is “VAMed if you do, VAMed if you don’t!” It would be great if all things outside of school were also taken into consideration, as well as those students who just don’t care or are timed out on ESL minutes and still need assistance. About 25% of my class falls in this category and I’m a first-year teacher who is scared to death that I won’t have a job next year because of these issues…

    • May I suggest that you get a copy of my book “Making Peasants into Kings.” (ISBN: 978-1-4490-0634-1). I discuss the strategies I used to get unmotivated students to think their way into knowledge. It worked for me.
      Jay C. Powell

      • Hi Jay,
        Thanks for the recommendation of the book for our teachers. My concern is that our teachers are not able to implement solutions like this in the classroom now — things have become so scripted, with bell-to-bell curriculum, and massive amounts of time spent on test-prep, that they are not able to do what we *know* is right. Forget real, evidence based strategies … forget human beings teaching human beings. What is happening in our classrooms is not OK — for our teachers or our kids.

      • Hi Dr. Schneider,
        The fact that profoundly informed students bring more than the expected level of understanding to a question, which can lead them to select “wrong” answers for justifiable reasons, and that this behavior lowers their total scores should be a strong enough discovery for a class-action lawsuit.
        At least thousands, if not hundreds of thousands, of students are impacted by this approach to assessment, which should be sufficient for a major class-action lawsuit against the Federal Department of Education for funding this miscreant behavior. A colleague and I developed an alternative approach that was published in a major journal (Educational and Psychological Measurement) in 1992.
        That is more than 20 years ago. I have two chapters in two monographs published by Springer-Verlag, the world’s leading science publisher. One in 2010 and the other in 2012. I am endeavoring to get a book published through them.
        This work has international recognition. People in ETS and ACT have had direct contact with this research. There is no excuse for this research not to be implemented, at least on a trial basis. Ignorance is not an excuse because these leading people already know about this work.
        I suggest that you get together with Students First, the Fordham Institute and the American Statistical Association, all of whom have been made aware of it, and see whether, collectively, you can raise the threat of such a suit. I am certain the American teachers’ Federation would follow, and maybe many other interested organizations, perhaps including the Bill and Melinda Gates Foundation.
        I am not interested in money for me, I am interested in breaking the strangle-hold by the assessment community and the corporate interests behind them upon public policy, in all areas, but most particularly education.
        Education, properly handled, is the most intimate of human activities because, for it to work, the Souls of two people must touch and embrace. This is how the deep potentials of children are found and released to open the opportunity for the contribution of these hidden talents.
        Placing children into intellectual cages and treating them like rats in a maze is a crime against humanity. It MUST be STOPPED!
        Please let me help you and please help me to get this message out!
        Jay Powell

      • Jay, Michelle Rhee’s Students First, the Fordham Institute, and Bill and Melinda Gates Foundation are promoting the reforms, including VAM. The only thing these folks wish to “touch and embrace” is the money, power, and prestige that come with wrecking American education.

      • Hi Mercedes,
        This is a shame and a sham. I will continue to express myself to all who will listen and together we may be able to build a network that will stand against the money and power interests.
        Children are our future. Anything that interferes with the full realization of their true potentials is a crime against humanity.
        Accountability for the quality of service delivery is critically important, but the accountability model being used must be just, equitable and founded upon human values.
        My personal story is a constant battle between doing my best for my students and meeting the contrary demands of the educational bureaucracy. What these people who have done well for themselves in the current system by scratch and scramble techniques do not understand is that kindness, gentleness and the empowerment of others is the route to human survival and to the rescuing of our planet from man-made disaster.
        Please give me your snail-mail address and I will send you an autographed copy of my book. You will then know from whence I am coming and we can begin to set up a counterforce.
        Time is short and the allies are scattered and often unaware of each other.
        Jay

      • Jay, some good news here is that there IS a counter-force that has been pulling together — in just the past couple years, we’ve seen the explosion of both local grassroots groups like Yinzercation, and national groups like Parents Across America, United Opt Out, and many others. Check out the “Resources” tab here on Yinzercation to see the impressive list! The grassroots IS getting organized — parents, students, teachers, and concerned community members are standing together and fighting back. Just look at what is happening in Seattle — Bill Gates’ own back yard — as entire schools full of teachers are refusing to administer needless tests. (Check out the Yinzercation Facebook page for the latest on this …)

  12. The use of total-correct scores is invalid for two reasons:
    First, multiple choice tests usually have four options from which to choose, but they are scored as if they are “true/false” items. The alternative possibilities are collapsed as if they are only one category (wrong), which destroys the integrity of the data in a mathematical sense.
    Second, the excuse for this scoring error is the assumption that these alternatives are selected thoughtlessly. This assumption is false. These answers are chosen based upon how the examinees interpreted the questions.
    This second fact means that tests are not measuring what students “know” but their interpretation skills.
    In fact, a very knowledgeable student can think more deeply than the question designer intended and “correctly” choose a “wrong” answer.
    To help students, teachers need to know which answers were selected and why these choices were made. This information is potentially available if the test-developers were required to provide it, but from the total scores, we don’t even know which answers were “right!”
    We cannot fix education with such poor quality information. This approach is as useless as trying to tell the quality of an egg from the color of its shell.

  13. Pingback: The VAM Sham | progressivenetwork

  14. Jessie, from Advocate for Public Schools: Are we applying the correct “Business Model” to American schools? Actually, I believe that searching for “quantifiable” measures of teacher effectiveness, counting test scores, and thinking of students as “widgets” rather than persons with rights is precisely the wrong way to go. Gurus pop up periodically. At mid-century, an American corporate philosopher and statistician, W. Edwards Deming, led Japanese manufacturers to economic highs with his advice: TQM: Total Quality Management. Steel, automobiles, electronics, appliances made in Japan were the new gold rush. In time, Westinghouse and other U.S. companies joined the bandwagon, training executives and employees to learn the Deming way. The idea that we could reduce the number of “production failures” trickled down to schools. A new reform movement in the cloak of a business model almost succeeded. While it didn’t totally succeed, the residual thinking is still lurking. This chase for single measures of learning, competitive awards for money, bids for private contracts, the hunt for the next new thing has been a diversion.

    Japan’s economy eventually took a nose dive despite the Deming reforms and awards. Other forces were in play. So, too, today. Credible voices are telling us that other forces are in play in schools: that we must not reduce education to a single index; not short-change our students. We owe it to ourselves to go farther in the right direction to rediscover human rights in teachers and learners; restore school budgets to align with what it takes to succeed. Let’s listen to those who know the total value of understanding our children one at a time. Evelyn Murrin

  15. Pingback: Frequently Asked Questions About Opting Out | More than a Score Chicago

  16. Pingback: A teacher’s troubling account of giving a 106-question standardized test to 11 year olds

  17. Pingback: WP: Troubling Test for 11yr Olds | The Next Deal
