Pointing Fingers

Mea culpa! I’m sorry! I messed up. Last week I posted a comment on an article in the Post-Gazette that caused quite a stir among some of the people whose opinions I care deeply about. I issued a public clarification, but I feel this episode could give us all an opportunity to think through some issues together.

In fact, from its inception, that is what this blog has been about – a place where I have essentially been “thinking out loud,” posing questions, and looking for answers. It has often been a place for public dialogue, with people posting responses and counter arguments. As posts have circulated in social media (and occasionally been picked up in the national media), they – and your reactions to them – have become a part of the public conversation around education justice.

But I should note that this is often a very uncomfortable form of writing for an academic: it’s vulnerable and offers much less shielding than peer review and the lengthy research and publication process. No matter the format, I believe scholars ought to approach their writing with humility, admit they don’t know everything, and be willing to ‘fess up when they step in it. So here goes.

Last week Tony Norman published a column on education, taking some black leaders to task, saying, “Their emphasis on teacher evaluations as the key to closing the education gap and spurring black academic achievement is misplaced….” And he noted, “The racial achievement gap and the academic mediocrity of far too many black students is not the creation of diabolical teachers unions determined to protect the jobs of unqualified teachers at the expense of children in urban schools.” [Post-Gazette, 2-18-14] He then turned and pointed the finger at black parents, suggesting they are the real problem in education today.

Now anyone who has read my work surely knows that I do not blame black parents: from my book on black and white families and the history of child welfare, to the more than 250 posts on this blog calling for equity and vociferously objecting to racism and to the disproportionate impact of education policies – school closures, discipline, resource decisions – on students and communities of color. But I did a poor job of explaining that when I posted my comment:

“Thank you, Tony, for moving the needle on this conversation. It’s time to think bigger about our persistent opportunity gap. Parents and families are a crucial part of the equation — but they, too, must often be supported. I am very excited about the Community Schools strategy put forward by Great Public Schools Pittsburgh, with proposals for working with the mayor’s office, community partners, teachers, parents, faith based groups, foundations, social services, and more. Community Schools can engage entire communities in supporting families and students, and put the resources we need back in our neighborhoods. This is a positive, attainable plan to re-energize our public schools as the hearts of our communities again. I encourage everyone to read:
http://www.gpspgh.org/storage/documents/GPS_Community_Schools_Education_Report.pdf”

In response to the public firestorm around Norman’s article (it seemed to be under discussion everywhere I went last week), and to several pointed emails from friends that landed in my inbox, I realized in horror that my comment could be interpreted as support for blaming black parents, and I posted this additional note:

“Several people have contacted me in regards to this comment, so I am offering this clarification. I am glad to see a piece that is not simply blaming teachers, which has become a very loud public narrative that I do not find helpful. And I am indeed very concerned about our achievement/opportunity gaps. But I disagree with Tony’s conclusion: I don’t want to blame parents, either. Substituting parent blame for teacher blame won’t work and plays on long-standing, troubling assumptions about race and families. My hope is that we can find bigger solutions that address our pernicious equity and resource issues. That is why I am hoping everyone will take the time to read the new GPS report, which offers both vision and solutions.”

Obviously I am excited about the new Great Public Schools Pittsburgh report and am eager for folks to read it, comment, and discuss. I do feel it offers a blueprint for community collaboration that will let us get past pointing fingers at teachers or parents, and focus on bigger answers to the complex problems that vex public education in our city. If I was overly eager to direct people to the report and missed an opportunity to shake my finger at Mr. Norman, I apologize.

Right now, however, I would like to put all those fingers down. I am concerned about the growing divisiveness I have seen over the past year in the education justice movement, especially around the issue of teacher evaluation. (And to the extent that I have contributed to that, I apologize, too.) I have largely resisted writing directly about Pittsburgh’s new evaluation system because it’s extremely complex and I still have more questions than answers. But perhaps now would be a useful time to sketch out what I see and pose some of those questions.

First and foremost, I think we all believe teachers should be evaluated. That’s not the issue. It comes down to the context in which it is done and the way that process impacts students and our schools (not to mention individual teachers).

Here’s my understanding of how the new system works. Most teachers will now receive a score based on 50% observation (a relatively new system called RISE), 15% student ratings (a new system called the Tripod survey), 30% teacher Value-Added Measure (VAM), and 5% school VAM. The value-added system is a complicated formula that attempts to predict, on the basis of test scores, how much individual students should learn in a year and then calculates how much they actually grow. The VAM system uses Curriculum Based Assessments (CBAs) and unit assessments, which are given throughout the year, as well as PSSA and Keystone results from the state exams, which are given just once a year. VAM scores are generated both for individual teachers and for the whole school.
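The weighting above is just arithmetic, and a small sketch may make it concrete. Everything here beyond the 50/15/30/5 weights – the component names, the 0–100 scale, and the sample scores – is a hypothetical assumption for illustration, not the district’s actual formula:

```python
# Illustrative sketch of combining the four evaluation components
# described above into one weighted composite score.
# Weights are from the post; the common 0-100 scale and the sample
# teacher's scores are assumptions, not real district data.

WEIGHTS = {
    "rise_observation": 0.50,  # principal and peer observation (RISE)
    "tripod_survey":    0.15,  # student survey ratings (Tripod)
    "teacher_vam":      0.30,  # individual value-added measure
    "school_vam":       0.05,  # school-wide value-added measure
}

def composite_score(components: dict) -> float:
    """Combine component scores (each on a 0-100 scale) into one weighted total."""
    if set(components) != set(WEIGHTS):
        raise ValueError("missing or unexpected components")
    return sum(WEIGHTS[name] * score for name, score in components.items())

# A hypothetical teacher:
example = {
    "rise_observation": 85.0,
    "tripod_survey":    70.0,
    "teacher_vam":      60.0,
    "school_vam":       75.0,
}
print(composite_score(example))  # 0.50*85 + 0.15*70 + 0.30*60 + 0.05*75 = 74.75
```

One thing the arithmetic makes plain: because the two VAM components together carry 35% of the weight, a swing in test-based scores can move the total by several points even when observation scores are strong.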

I have heard very good things from teachers and administrators alike about the new RISE system, which uses both principal and peer observation. I particularly appreciate that it is done in the spirit of practice improvement and to target professional development. It seems to me that multiple classroom visits by both peers and principal evaluators also means problems can be caught earlier (rather than relying, say, on state test data that is not available until six months later, in the next school year). Presumably, this kind of improved observation can help the district weed out unacceptable teachers more quickly.

I’ve been hearing more mixed reviews of the Tripod surveys, which students complete about their teachers, especially concerning poorly worded and confusing questions. I learned recently that the survey is made and administered by the Cambridge company, and that we can’t change these questions. But it’s “only” 15% of the total score, so maybe not enough to get too worried about. (If it’s fair to say that about numbers that affect real human beings.) I am curious whether the district is seeing a strong correlation between RISE and Tripod data. In other words, are principal and peer observers seeing the same thing that students are saying?
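The question above is an empirical one, and a minimal sketch shows the kind of check it implies: compute the correlation between paired RISE and Tripod scores for the same teachers. The Pearson formula is standard; the paired scores below are invented for illustration, and any real analysis would use the district’s actual data:

```python
# Sketch: do RISE observation scores and Tripod survey scores line up?
# A Pearson correlation near +1 would suggest observers and students
# see the same thing; near 0, that the two measures capture different things.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rise   = [78, 85, 62, 90, 70]   # hypothetical observation scores
tripod = [72, 88, 65, 86, 74]   # hypothetical student-survey scores
r = pearson_r(rise, tripod)     # roughly 0.91 for these invented numbers
```

Even a strong correlation would not settle which measure is more trustworthy, but a weak one would be a red flag that at least one of them is noisy.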

VAM is where I have the most questions. Again, I am glad that the majority of the teacher’s score is observation based. But with 35% of the score dependent on student test data, I am concerned about the impact on students as we continue to expand the number of tests (for example, so that we can get test data for all teachers, including those in music and art) and thereby change the test culture in our schools. For instance, we now see far more test prep and focus on the tests, with posters, morning announcements, pep rallies, and more. And I worry that Pennsylvania will see a lawsuit like the one in Florida this week, which forced the state to release individual teachers’ names and VAM scores to the media. [Tampa Bay Times, 2-24-14] When that happened in California a few years ago, a highly regarded teacher committed suicide. [LA Times, 9-28-10]

Yet even with my concerns about privacy and individual pieces of the new system, I have listened to and been reassured by Dr. Lane that our teachers will not be force stack-ranked (in other words, the system does not force a certain number of teachers into each category, thereby guaranteeing a particular “fail rate”). I have met with district officials who have described very positive professional development and support structures, some existing, some in the works. And just last week I heard the Pittsburgh Area Jewish Committee interview Pittsburgh Federation of Teachers president Nina Esposito-Visgitis, who reiterated the union’s support for the entire evaluation system and described the current quibble with the district as a disagreement over where to draw the “cut score” defining unsatisfactory teachers.

So if the district and the union both think this is a worthwhile system, it seems to me it’s time for them to get back together and figure those numbers out so we can move on to talking about much more important things. I have zero interest in the cut-score debate.

I would rather be talking about what kind of professional development and support our teachers need and what they’re getting. I would much rather be talking about how to reduce the overall number of high-stakes tests our kids are taking and how we might work together to change some of the stakes attached to those tests. (For instance, I just learned that students applying to district magnet schools get two extra weights for advanced PSSA scores, while low-income status adds only one extra weight: should good test taking count for more than poverty for kids trying to get into some of the district’s best schools?) I want to have an honest conversation about test culture. Heck, I want to talk about the Community Schools strategy and nurses and librarians and early childhood education.
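To make that magnet-school parenthetical concrete, here is a hypothetical sketch of the weighting. Only the relative weights (two extra for advanced PSSA scores, one extra for low-income status) come from what I learned; the lottery-entry mechanics, the base entry, and the function itself are illustrative assumptions, not the district’s actual admissions rules:

```python
# Hypothetical sketch of the magnet-lottery weighting described above.
# Assumption: each applicant starts with one entry, and each "extra
# weight" adds one more entry to the lottery.

BASE_ENTRIES = 1

def lottery_entries(advanced_pssa: bool, low_income: bool) -> int:
    """Entries a hypothetical applicant gets in a weighted lottery."""
    entries = BASE_ENTRIES
    if advanced_pssa:
        entries += 2   # advanced PSSA score: two extra weights
    if low_income:
        entries += 1   # low-income status: one extra weight
    return entries

# A high-scoring, higher-income applicant vs. a low-income applicant:
print(lottery_entries(advanced_pssa=True, low_income=False))  # 3
print(lottery_entries(advanced_pssa=False, low_income=True))  # 2
```

Under these assumptions, a strong test taker gets three chances for every two a low-income student gets, which is exactly the equity question the parenthetical raises.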

There’s room for disagreement in these conversations, but we could be working together. Here’s what I’ve committed to working on in the coming weeks in the spirit of collaboration and fostering dialogue:

  • I’m delighted to serve on Mayor Peduto’s newly appointed Task Force on Public Education.
  • I’m also thrilled to be working with the Heinz Endowments and several other key community leaders on a proposal to bring a major national education justice conference to Pittsburgh (we are one of two finalists in the running!).
  • On Friday I will be going to Austin along with two other Yinzercation activists, where I’ve been invited to speak at the first national conference of the new Network for Public Education. I look forward to learning more about the national education justice scene and reporting back.
  • On Tuesday, March 11, Yinzercation will be hosting a screening of the new movie, “Standardized,” followed by a community discussion of testing and student learning.
  • On Tuesday, April 8, Yinzercation and PIIN will be co-hosting a gubernatorial candidate debate focused exclusively on education (the only one of its kind in Pennsylvania!). The event will be co-sponsored by the League of Women Voters, and others are coming on board. I’ll have more details for you soon. We need help organizing, so let me know if you’re interested.

I’m inviting all of us to put away the fingers and participate in some good old-fashioned civil discourse.

Eight Reasons Why Scoring Schools Doesn’t Work

Pennsylvania has just released its new School Performance Profiles, or SPP. As I’ve said before, that acronym probably ought to stand for Stupid Public Policy. These profiles are essentially scores assigned to schools based on the results of student testing and replace the previous Adequate Yearly Progress (AYP) rankings. [See “From AYP to SPP”] It’s very trendy right now among corporate-style reformers to grade schools like this. But the whole idea should receive an “F” and here’s why:

1.  The stakes are too high. Assigning scores to schools adds the “high stakes” to high-stakes-testing. When student test data is being used to determine resource allocation and to shape public perceptions of schools, the system creates a perverse incentive for adults to cheat. [See “A Plague of Cheating”] Recall Tony Bennett, who was forced to resign as Florida’s education commissioner this summer after reporters discovered that, while serving as Indiana’s schools chief, he had helped increase the grades of several schools. In one case, he increased a C to an A grade for a charter school run by and named after one of his campaign donors. [Politico, 8-1-13]

And don’t forget the Atlanta superintendent who was indicted this spring along with 34 others, including teachers and principals, for widespread cheating on the state’s standardized tests. Investigators found 178 Atlanta educators had worked to change student answers, among other things, to increase the district’s performance. Eighty-two people have already confessed and the superintendent now faces up to 45 years in jail. [Washington Post, 3-30-13] This year we have confirmed cases of test score manipulation in at least 37 states plus the District of Columbia. [FairTest, 3-27-13]

Of course adult cheating is just one consequence of high-stakes-testing. Teachers are being demoralized by this system. Pittsburgh superintendent Dr. Linda Lane reports that when teachers received the results of the high-stakes-testing that formed the SPP scores “some were in tears.” [Post-Gazette, 10-5-13] But students suffer the most. The over-emphasis on testing results in lost class time, a school year spent on test preparation, the narrowing of curriculum, and the perpetuation of abusive practices that undermine actual learning. [See our piece “Testing Madness,” which was just republished in the Washington Post.]

2.  Scores actually reflect bad state policy making. The SPP scores are largely based on PSSA and Keystone test results, which are down for many students as the result of state decisions. Dr. Lane suggested the drop in Pittsburgh test scores resulted from, among other things, budget cuts, the elimination of modified testing for special education students, and the new Common Core standards, which are being taught in the classroom but not measured on the tests. [Post-Gazette, 10-5-13] With increased class sizes, school closings, and the loss of hundreds of educators again this year in Pittsburgh alone, our student test scores say more about poor state educational policy making than about actual teaching or learning.

3.  A single number is insufficient. The Pennsylvania Department of Education calls its new SPP system “comprehensive” and boasts it “brings together multiple academic indicators that are proven to provide a full overview of academic growth and achievement in our public schools.” [PDE, 10-4-13] I don’t know what evidence there is that these indicators “prove” academic growth in schools, but the idea of using multiple academic indicators sounds like a good idea. Too bad, then, that 26 of the 31 indicators listed for each school are actually based on high-stakes-test scores. [PA School Performance Profiles] While factors such as attendance and promotion rates are now being considered, these SPP scores are little more than a re-packaging of high-stakes-testing. Test scores don’t tell us much about what is actually happening at a school: the after-school mentoring program that parents started, the new playground or garden built and paid for by the local community, or all of the programs teachers volunteer to make happen, from directing the school chorus and plays, to coaching sports teams and the math club, mentoring student government, and collaborating with local artists. Where are those things on the profile?

What’s more, nearly everyone is fixated on the single “academic score” calculation – the grade – assigned to each school. The PDE can claim all it wants that these are robust profiles, but the media in every corner of the state has already demonstrated the way in which these profiles will be reported as single scores. For instance, the Post-Gazette reported, “Of those [Pittsburgh schools] that have academic scores, the highest is 82.6 at Pittsburgh Liberty K-5.” [Post-Gazette, 10-5-13; see also Post-Gazette 10-5-13 graphic.] Yay, Liberty! But honestly, what does that mean? The SPP scores effectively rank and sort schools.

4.  These systems are prone to error. The state has already bungled the release of SPP data. More than 600 schools (out of 3,000) do not have complete scores because scoring relied on students correctly filling in bubbles indicating whether their tests were “end of course” exams. Rather than hold up the promised roll-out of the new profiles, the PA Department of Education instead released only partial data on Friday, leading to more confusion. For instance, no Pittsburgh high school, nor any school containing eighth grade, currently has a score. The state is also delaying the release of the 2013 PSSA and Keystone results. West Mifflin Area Superintendent Daniel Castagna summed it up, saying, “This is a mess, an absolute mess.” [Post-Gazette, 10-5-13]

5.  Scores actually just measure poverty. It’s great that my former elementary school (Eisenhower in Upper St. Clair) got the highest score reported so far in the county, with a whopping 97.9. Not surprisingly, there’s my middle school, Fort Couch, at 96.8. And all of Mt. Lebanon’s reported schools so far are over 90. But did we need all these tests and this elaborate new system to tell us that upper-middle-class kids in predominantly white suburbs are doing better than those in the struggling Duquesne school district, which weighed in at 49.3, the lowest in the county? What standardized test scores are really good at showing is family income. For an excellent visualization of the correlation between test scores and poverty, take a look at last year’s SAT results broken out by family income.

6.  Scores don’t measure what matters. The Pittsburgh school district has conducted research on its own graduates and concluded that, “the most important predictors of post-secondary education success are grade point average and attendance, not state test scores.” [Post-Gazette, 10-5-13] If that’s the case, why are we spending so much time giving these high-stakes-tests to our students? Why are we giving 21, even 30, standardized tests each year to our kids? [PPS Assessment Calendar] Why aren’t we focusing on providing a rich, engaging curriculum with music, art, languages, and activities so that students want to be in school? We can’t even discuss these important questions because the School Performance Profile system forces districts to continue playing the game of ever-more-testing in the name of accountability. But if we really care about what matters – such as actual student learning or college success – policy makers must move away from systems that simply reinforce testing by assigning grades to our schools.

7.  Scoring schools wastes valuable resources. The SPP system cost us taxpayers $2.7 million to develop over the past three years. [Post-Gazette, 10-5-13] That’s $2.7 million at the exact same time that Governor Corbett and our legislature were telling us we did not have money to pay for our public schools. And it will cost an estimated $838,000 every year to maintain. That’s a lot of drumsticks for the Westinghouse Bulldogs Marching Band or library books for Pittsburgh Manchester K-8. Beyond the ridiculous price tag, grading our schools costs valuable staff time and wastes the attention of the public, media, and policy makers by forcing them to focus on the wrong thing.

8.  School scores don’t help students. SPP scores don’t give students what they really need: adequate, equitable, and sustainable state funding for their public schools. Public policies that support, rather than vilify, their teachers. Quality early childhood education. Pre-natal care. Healthcare. The stability of their community school remaining open. Smaller class sizes. It would be funny, if it weren’t so cruel, to hear the PA Department of Education proudly explaining that under the new SPP system, the lowest scoring Title I schools (those that serve a large proportion of low-income students) are now eligible for “access to intervention and support services.” [PDE, 10-4-13] How about access to their laid-off teachers and the state funding they desperately need?

Even worse, the highest performing Title I schools will now be rewarded by becoming “eligible to compete for collaboration and/or innovation grants.” Are you kidding me? This is right out of the Race-to-the-Top playbook, making schools compete for the resources they desperately need. Races and grant programs by definition have winners and losers. No student at a Title I school deserves to be a loser in this game invented by policy makers. Our kids don’t need “technical assistance,” they need state legislators to restore the budget cuts and reinstate a modern, fair funding formula. These SPP scores are only going to hurt our poorest students and communities of color more.

Testing Madness

We’re just a month into school and already the testing madness has begun. Many Pittsburgh Public School students have just taken their first round of standardized tests, and it’s time to ask some serious questions about their purpose, the ever-increasing number of tests, and the impact on our children.

Let’s start with this troubling account from a middle school language arts teacher, who gave the “GRADE” reading test to her students on Friday. This is a diagnostic assessment designed by the education corporate giant, Pearson, and the district is using grant money to pay for it. (More on both of these points after you have read this teacher’s story).

———–

Say you’re a teacher with a diverse and exciting group of students who have found learning together an exciting prospect. You have had ups and downs, but each day has ended with more students feeling positive about their ability to learn, and each day investing more in the process. Then, a couple of weeks into the school year, you have to make the first stop in this process. The first Pearson-created standardized test has landed on your desk. Teaching/learning has to stop. You hide your face from your students as you grit your teeth. You tell them, as always, not to worry. You tell them no one expects them to get all of the answers right, but you do expect them to do their very best. You know they will, as they will want to show everyone how smart they are, just as they’ve shown you in so many ways. But inside you cringe . . .

You stand in front of the class and read a sentence to the children. You are allowed to repeat the sentence only once. Then the students select one of four pictures that they think most reflects what the sentence says. The children look determined; they are ready; you begin.

The first question seems harmless enough. The students look ok. Then you get to the second question. Of 106 questions. The sentence you read says something like, “Luis draws a blank when he is asked to solve a math problem on the board.” The students have four drawings to choose from. In the second drawing, a student is drawing the kind of blank one would see on a paper on which students are directed to “fill in the blanks.” It is a blank. He is drawing it.

You start to feel stomach pangs as you look around the room at eleven-year-olds, many of whom come from non-English-speaking families, or families for whom this type of idiomatic expression is not common, and you realize that you have never come across this expression in any of the literature you have taught students over the years. You know it is unlikely that many of these children will recognize the puzzled expression on the face in one of these pictures as the “right” answer. For many of these kids, “blanks” have to do with guns.

But you go on, and hope this is an exception, this bad question. Then there is Question #4: “Roger told her that he would have changed the oil himself in a couple of weeks.” What? Changed what oil, you think, as you look around at your class of children for whom having a car in the family is far from a given. Children for whom having a parent who changes oil in a car is even less likely. Then you look at the drawings, and you see why the children are looking confused – but still trying hard.

All four drawings include a car. Three of them include a man doing something under a car. Two of them include a girl sitting in a wheelchair watching the man (this, perhaps, makes this test culturally responsive?). One of them shows a car driving down the road with smoke coming from under the hood. I – a long-time car owner and driver and oil-change customer – had to look at this set of pictures several times, over a couple of days, to figure out which answer they were looking for. (Really, if you waited a couple of weeks to change the oil, wouldn’t it be possible that the engine would smoke?) But the children had only a short period of time to figure out – or guess – what the answer was.

By now my students were getting a bit restless. The confidence with which they had gone into this testing situation was beginning to dissipate. Just a bit. There were still 102 questions left to answer.

We went on. Question number six referred to “a pair of drumsticks” and included as choices a boy eating two chicken-type drumsticks along with others of the musical kind. This is almost funny, but the students are supposed to choose the “right” answer. Number seven brought my stomach pangs back. The expression in this question was “brushed up on art history.” “Brushed up.” The first choice showed a man with a paint brush and an easel – the only one of the pictures clearly about art. The “correct” choice was a man looking through a stack of books, one of which had a tiny, crude and hard-to-see drawing of a female which one could interpret as the Mona Lisa, if one were familiar with her. I began to wonder, what was Pearson, this test maker, doing to our children?

Question number eight had two possible answers, each of which was equally justifiable. Oh, but our students never would have the chance to justify their answers on this type of test. Take that, you kids who are daring to think . . .

Question twelve put me over the top. But I continued my outward calm, even as I watched the kids squirm, and as some began to lose their focus and their positive demeanor. The mumbling had begun. The sentence I read to the class said something like “she realized she could store her belongings in the bureau.” “Bureau.” There were four pictures to choose from. One was a building that looked like a public “bureau” of the government to me, but I doubted my students would think of that. One was of a tractor. Scratch that. But I looked at my students whose families speak Spanish at home. And I looked at the burro in picture “C.”

Then I looked at the picture of what my family calls a chest-of-drawers. And I thought about how we have never used that word, “bureau,” for a piece of furniture. And I have never heard that word in the homes of my students’ families. And I thought, how crude, how cruel, how ignorant, how disrespectful of these children. What a set-up. Who would do that to kids?

Question 16 was . . . well, you decide. The sentence is something like “Carl approached a friendly wave as he walked onto the beach.” You guessed it. Only worse. One drawing has boys looking at what looks to me like a very friendly wave of water. Another has a boy walking toward two boys, one of whom gives a friendly hand wave. Another has two boys walking toward one boy who gives a friendly wave. Who is Carl? No one has told the student. And the last one has a boy riding a friendly-looking wave with a smile on his face. Wha . . .?

After the 17th “listening comprehension” question, the students went on to the rest of the 106 questions on their own. They still wanted to do well. Some, however, had already given up. Among them were the tell-tale signs of anger and frustration (broken pencil; slumped back in the chair; head down on the table; making eye contact across the room with another student and laughing; calling out “this is stupid!” – and other indications of labeling themselves as “stupid”). The work to build that community of self-confident learners had been undercut.

But the test went on. And the next section had students doing something all teachers know does not make sense. They were trying to guess among five choices the meaning of a word all by itself, out of context. This section was called “Vocabulary.” The words included such certain-to-be-missed-by-most-students words as “whimsical,” “supple,” “guile,” “resplendent,” “broach,” and on and on.

By the end of the Vocabulary section these children had been through 57 of the 106 questions. They were more than halfway done. But the double period was almost over. They were about to go home, having entered the classroom feeling strong and ready to learn, about to leave feeling, in their words, “stupid.” They had lost two full periods of real teaching/learning. What had they gained? Really, what? It was Friday. I would not be with them again for more than two days. I could not ease them back into knowing that they were smart and making progress.

Like them, I left for the weekend feeling defeated. What happens when our beautiful children face this kind of situation over, and over, and over again? The phrase, “first do no harm,” consumed me. I was leaving school for the weekend on the wrong side of that admonition.

Isn’t it time to stop this ever-increasing testing cabal, which puts our children, and their enthusiastic and devoted teachers, into these untenable situations? Can we remain compliant when our children and our teachers are judged by performance on such abominations parading (and being paid for) as “assessments?” Is this how we want our children, and our teachers, to spend the precious hours they have together in our schools? When does this situation become untenable enough for us to stand up, together, on their behalf?

————–

This teacher asks critical questions that we should all be trying to answer. This isn’t a rhetorical exercise. Really – when do we make it stop? To her list, I add the following for consideration:

What is the purpose of these tests? Some assessments such as the GRADE test are meant to be diagnostic tools, to help teachers figure out where students are in their learning. But if giving poorly designed tests actually interferes with students’ learning process, and takes away from actual instructional time in the classroom, are they helping or hurting overall teaching? If tests are poorly designed, are they really effective as diagnostic tools? Even if such tests are well designed, are they providing the kind of information that our teachers need to shape learning?

Are they culturally biased? A 2002 review of the research literature concluded that the GRADE assessment is developmentally appropriate, reliable, and valid. That’s reassuring, though this teacher’s personal experience would seem to challenge these findings and I would love to learn more from our educational research colleagues out there. However, that same study found that there was “no evidence” that the GRADE test was “sensitive and appropriate for differing cultures and needs.” [Collaborative Center for Literacy Development] That was in 2002, eleven years ago, and it still seems to be true today. How long does Pearson need to correct the obvious cultural biases in its tests?

Are they useful for teaching and learning? Pittsburgh parent Pam Harbin started looking into the GRADE assessment last year when it was introduced into the district and discovered that students do not have the opportunity to review and learn the material they got wrong. “For too long we have taken for granted that the tests our kids are taking are for their benefit,” Pam says. “I’m really having a hard time understanding why the District is requiring so many assessments where kids don’t have a chance to learn from their mistakes. … It doesn’t make any sense to test kids in this way.” Whether it’s formal District policy or not, it appears that many schools are working under the belief that teachers are not permitted to discuss anything on the GRADE test, in particular, with students before or after giving it. On other tests, such as the PSSAs and Keystones, teachers are explicitly forbidden to see the actual test questions or provide feedback to the students.

How has the frequency and quantity of testing increased? The GRADE test is given three different times during the year. That alone might not sound so bad, but consider that the District is now giving up to 17 different standardized tests to students each year, depending on grade level, and many of them are given more than once a year. For instance, my 7th grader will take 21 standardized tests this year. [PPS 2013-14 Assessment Calendar]

Does testing reduce learning opportunities? All of that test-taking robs students of real learning time. This teacher reports that her students lost four class periods alone taking the GRADE. Even worse, her students are about to take another round of tests, the CDT’s, which are given on computers. Because the school’s classes are too large to fit in the computer lab, and there are so many classes that need to schedule testing, the lab won’t be available for anything other than testing for quite some time. She explains, “My students [in another class] need that lab to do the research that is a part of our curriculum and can’t be locked out during this period.” She also worries about “giving this test to three of my classes, losing yet more instructional time for yet another non-curriculum-based test.”

How can testing harm students? For some students, taking a test such as GRADE is a minor annoyance. For others, it can leave them feeling “stupid,” frustrated, and ready to give up on learning. This seems particularly cruel when, as this teacher points out, it is “due in large part to the errors and problems with the test. Students do not need more of that in their lives.” Yet one reason districts might hold on to tests such as GRADE is that they can help to demonstrate “student growth” to state officials, sometimes more accurately than the PSSA results. But this is a misuse of student testing – ostensibly designed to help individual students – to evaluate schools and districts. This is yet another way in which the culture of high-stakes testing is hurting our kids.

How can testing harm teachers? We know that some tests, such as the PSSAs and Keystones, have very high stakes attached to them. [See “The VAM Sham”] But even lower-stakes tests such as the GRADE can harm teachers, as this teacher points out: “Giving the test makes teachers feel like they are abusing their children. We do not need more of this in our lives.”

Do we have to? The district is using grant money to pay for the GRADE test (which was a requirement of the grant) along with professional development for teachers and other worthy things. But what if we refused to accept grants with such strings attached? Imagine if we could use those dollars now going to line the pockets of international corporate giant Pearson to buy drumsticks for the Westinghouse Bulldogs marching band or books for the Pittsburgh Manchester K-8 library? Pennsylvania is spending hundreds of millions of taxpayer dollars to develop more high-stakes tests for students, and requiring local districts to spend hundreds of millions on top of that to get their students ready to take them. [Tribune Review, 6-2-13] (And guess who makes all the test prep materials?) What if we stopped this upward spiral of testing madness and focused on what actually helps students learn?

What if? We can do it, working together. If you are interested in discussing the impact of testing on our schools, kids, and teachers, please drop us a line so we can be in touch. We will be scheduling a session soon!

How to Read the PSSA Report

The PA Department of Education just distributed the results of last year’s PSSA testing. Those are the high-stakes tests that Pennsylvania students start taking in the 3rd grade. The fact that families are only getting these results now – six months after students took the actual tests – is the first big clue that these have nothing to do with actual student learning. Quality feedback must be timely, so that teachers can make adjustments to individual instruction and students can learn from their mistakes. But it’s September: students have started a whole new school year and don’t even have the same teachers they did in March.

These test results are largely meaningless. At least for students. Yet they are being used – inappropriately – to evaluate teachers and schools. To threaten, punish, and eliminate them. To justify mass school closures in our sister city of Philadelphia. To determine which schools are next on the chopping block in Pittsburgh.

Tests are only valid measurements of the things they were designed to measure. If they are designed to measure students’ mastery of a set of concepts, then they are measuring students. You cannot turn around and use them to measure how well teachers are teaching or how well schools are performing – that’s not what those particular tests were designed to evaluate. Education researchers and professionals know this, but it is education policy makers who are twisting student assessment to meet a set of ideological goals.

To help you see through the doublespeak, here’s our handy guide to reading your child’s PSSA report:

[Image: a handy guide to reading your child’s PSSA report]


Who’s In to Opt Out?

Opting out is taking off. Parents, teachers, and now even entire state legislatures are saying they’ve had enough of high-stakes testing and the damage it’s doing to education. I sat in a room with teachers here in Pittsburgh this week who told me that ten years ago they would have given one standardized test a year; now they are spending weeks upon weeks on test prep and test administration. But their students aren’t learning more. If anything, they are learning less, while the high stakes attached to the tests have radically changed what education looks like.

This radical shift was really brought home for me this week reading about Alan C. Jones, a former principal and teacher educator in Illinois, who accompanied his daughter in the search for a good public school for his grandson. After decades working in education, he reports that he was appalled at what high-stakes-testing had done to those schools he visited:

“Nothing could have prepared me for the mindlessness of the hallways, classrooms, and main offices I observed … I reviewed curriculum with no art or music and only sporadic attempts at teaching science. I followed a school schedule heavily focused on basic literacy skills. I found kindergarten programs with no recess. I observed classrooms where students were required to repeat state standards written on the chalkboard and spend hours completing mountains of worksheets designed to make children more test-savvy. … There were breaks in the day that amounted to forced marches to and from bathrooms. Following these brief breaks, students were led back to classrooms for timed tests, test-preparation games, and the distribution of awards for those who met the state standard for the day.” [Education Week, 1-22-13]

Teachers here in Southwest Pennsylvania will tell you what testing has done to their schools and their students. Ask them. Really. Go ahead and have a quiet conversation with the teachers in your local school. Most are not able to speak out publicly, for fear of losing jobs that feed their families. But ask a veteran teacher who was in the classroom ten or fifteen years ago to describe how the national obsession with testing has put handcuffs on real learning, narrowed the curriculum to math and reading, cut music, art, and library, labeled teachers and entire schools as failures, served as cover to close “failing” neighborhood schools, and cut budgets.

As you may recall, brave teachers at Seattle’s Garfield High School voted two weeks ago, without a single ‘no’ vote, to refuse to administer a high-stakes test. [See “What Education Activism Looks Like”] The student government and parent association also voted to support the action. As Garfield teacher Jesse Hagopian explains, “We at Garfield are not against accountability or demonstrating student progress. We do insist on a form of assessment relevant to what we’re teaching in the classroom.” [Seattle Times, 1-17-13] The district superintendent warned that the administration expects all teachers to administer the test. [Seattle Times, 1-14-13] Yet by the district’s own admission the test results are not valid for high-schoolers, and the former superintendent purchased the test for $4 million while sitting on the board of the company that makes it.

The threat to those teachers has led to a groundswell of support from leading educators all over the country. This week Brian Jones, a New York City teacher and doctoral student, drafted a statement supporting the teachers’ opt-out movement, saying that “High stakes standardized tests are overused and overrated.” University of Washington professor Wayne Au helped reach out to education researchers and says, “We contacted leading scholars in the field of education and nearly every single one said ‘Yes, I’ll sign.’ The emerging consensus among researchers is clear: high stakes standardized tests are highly problematic, to say the least.” [BrianPJones blog, 1-21-13]

Over the past few days, more than 230 educators have signed the fully researched and documented statement that demonstrates the ways in which high-stakes testing actually hurts students. Among the signers are some of the most respected names in the field of education, including former U.S. Assistant Secretary of Education and education historian Diane Ravitch, Chicago Teachers Union President Karen Lewis, author Jonathan Kozol, professor Nancy Carlsson-Paige, and MIT professor and writer Noam Chomsky. Also on the list is urban sociologist Pedro Noguera, an education professor at New York University who has been assisting the Pittsburgh Public Schools with their equity plan this year. Dr. Noguera was just in town last week speaking with African American male teens about becoming “promise ready” to qualify for a Pittsburgh Promise college scholarship. [Post-Gazette, 1-18-13]

If these are the voices supporting opt-out, we need to be listening and thinking about what they are saying. Yinzercator Stacy Bodow, a Pittsburgh Public School parent, pointed us to a terrific letter written by parents Will and Wendy Richardson in New Jersey last year, opting their son out of that state’s high-stakes tests. The Richardsons explain, “we are basing this decision on our serious concerns about what the test itself is doing to our son’s opportunity to receive a well-rounded, relevant education, and because of the intention of state policy makers to use the test in ways it was never intended to be used.” They added, “These concerns should be shared by every parent and community member who wants our children to be fully prepared for the much more complex and connected world in which they will live.” [The Daily Riff, 4-18-12]

And if that’s not enough, consider what happened in Texas this week: the Texas House actually proposed cutting all state funding for standardized tests! Speaker Joe Straus explained, “To parents and educators concerned about excessive testing, the Texas House has heard you.” The Dallas Morning News is reporting that the proposed budget is not likely to stand, since it would have to be reconciled with the state Senate’s, which already includes money for testing. However, Valerie Strauss of the Washington Post points out that this action alone “underscores growing discontent with high-stakes testing in the state where it was born when George W. Bush, as governor, implemented the precursor to No Child Left Behind, which he took national when he became president.” What’s more, “Last year about this time school districts in Texas started passing resolutions saying that high-stakes standardized tests were ‘strangling’ public schools, and hundreds of districts representing nearly 90 percent of the state’s K-12 students have followed suit.” [Washington Post, 1-24-13]

It’s clearly time to think seriously about opting out. Who’s in?

The VAM Sham

It’s a new year, but for public education it looks like we may be seeing more of the same old thing. Tonight the Pittsburgh School Board will be reviewing a new teacher evaluation plan developed by the District based on highly problematic data drawn from all those high-stakes-tests our kids have been taking. Not only is the data bad, but the uses to which it is being put should be setting off alarm bells in every parent’s head as it actually damages our schools, our teachers, and even our children’s education. To understand why, Yinzercation talked to Dr. Tim Slekar, an education researcher and Head of the Division of Education, Human Development and Social Sciences at Penn State Altoona.

Pittsburgh’s plan comes just as Pennsylvania has introduced a new law mandating that every school district in the state must implement a teacher evaluation system, basing half of the evaluation on classroom observation and the other half on “multiple measures of student achievement.” We’ll get to these halves in a moment, but let’s start with the very premise of this new evaluation system. Pennsylvania and many other states around the country have introduced these laws as part of the corporate reform movement, which rests on the idea that public schools are failing, and that we must measure students with tests that will then be used to hold teachers accountable and even close down or “turn around” low performing schools (often by firing all the teachers). This seductive reasoning centers on the assumption that teachers are responsible for how their students perform on tests and that tests are an accurate measure of their teaching.

However, there are multiple problems with this logic. First, and perhaps most importantly, Dr. Slekar explains that abundant research demonstrates that out-of-school factors are far and away the largest contributors to student achievement. As much as 80% of student achievement can be directly attributed to issues such as family stability, number of books in the home, exposure to cultural resources, and whether or not a child had breakfast before school. Of the remaining 20%, teachers are certainly the most important in-school factor affecting student achievement, but by no means the only one.

The Organization for Economic Cooperation and Development (OECD) did a major study a few years ago in collaboration with 25 countries around the world looking at effective teaching. They concluded, “The first and most solidly based finding is that the largest source of variation in student learning is attributable to differences in what students bring to school – their abilities and attitudes, and family and community background.” The report noted that, “Such factors are difficult for policy makers to influence.” [OECD “Teachers Matter” report, 2005]

Furthermore, while teacher quality definitely matters, the OECD report found that most measures of teaching effectiveness have concentrated on factors that can be easily quantified – usually correlating student test data with teaching credentials, years of experience, and such. While there is a positive relationship between these things, the research shows that these matter to a “lesser extent than may have been expected” and that the teacher characteristics “which can be vital to student learning” are “harder to measure.” These include the things we should all care about in our teachers, such as “the ability to convey ideas in clear and convincing ways; to create effective learning environments for different types of students; to foster productive teacher-student relationships; to be enthusiastic and creative; and to work effectively with colleagues and parents.” [OECD “Teachers Matter” report, 2005, p. 2]

Another huge problem with the current frenzy of teacher evaluation systems is that they are also premised on the idea that we have too many “bad” teachers and must get rid of them. As evidence, the “reformers” often cite the statistic that current evaluation methods result in 99% of all teachers in the state receiving a satisfactory, or “qualified,” grade. The logic seems to be that we couldn’t possibly have so many qualified teachers. Naturally, a sensible counter-argument would be, “Why not?” It’s not like we’re hiring people off the street: teachers have to go through many gates, including training, certification, and then hiring by a school district, before they get their own classrooms.

While there may be a handful of ineffective teachers in any given district, I’m not seeing a plague of bad teaching: if anything, the teaching I see on a regular basis is quite good, despite the immense odds stacked against our teachers. Sure, where there is ineffective teaching, I want an improvement plan put in place, proper supports offered to that teacher, and, if none of that works, the person guided out the door. This is why we have a three-year probationary period, and why the best districts train principals in good observation and feedback techniques, then make sure they have time to perform this most critical function.

In fact, if you think about it, the very best teaching evaluation system would be highly qualitative, one in which the principal takes on a teaching mentor role, creating what Dr. Slekar refers to as “a framework to discuss the classroom experience.” If there is a problem with current evaluation systems based on observation, it’s not that too many teachers receive a satisfactory grade; it’s that the quality and quantity of observation is insufficient (frequently just one quick visit a year) and often inconsistent (varying from building to building). The solution to that problem has nothing to do with student achievement scores.

Ah, but therein lies the rub. The entire evaluation system depends on what Dr. Slekar calls “the mythology of objectivity.” This is the idea that we can quantify everything, come up with the perfect formula, and reduce all aspects of teaching to numbers that will not lie – after all, they are numbers. But this lure of objectivity masks the reality that every standardized test we give our kids – and then want to use to evaluate our teachers – is in fact subjective. Slekar explains that the PSSAs are not objective measures at all and actually contain a great deal of cultural bias, which continues to skew scores against our poorest students and students of color.

But even if we assume for the moment that those high-stakes-tests our children are taking yield legitimate results, there are still serious problems with using those tests to evaluate teaching. First, they were only designed to measure student achievement – not how well our teachers are teaching. As any scientist will tell you, when you want to examine something, the measurements have to be designed to actually look at what you’re interested in. And second, they completely omit many of the most important elements of teaching – you know, those very things we as parents and concerned community members think about when we recall our very best teachers.

Now let’s get back to that new state mandate which requires districts to base at least half of our teachers’ evaluations on student test scores. This half is supposed to use “multiple measures of student achievement,” but what that really amounts to is using the PSSA scores and breaking them apart and putting them back together in different ways. Pittsburgh has been working on a system to get out ahead of the new law, and wants to use a slightly different breakdown of percentages within this half than the one dictated by the state: the district “proposes 5 percent for building-level results, 30 percent for teacher-specific data and 15 percent for elective data,” which in most cases means “student surveys of individual teachers.” [Post-Gazette, 12-31-12]

For the building-level and teacher-specific data, Pittsburgh wants to use what is known as Value Added Measurements (VAM), which take into account how much a student has grown academically in a year, rather than taking a single snapshot of year-end performance on a test. While VAMs sound like a huge improvement, the reality is that VAM systems are still in the experimental phase and so far there is no evidence that any of them work. The National Education Policy Center (NEPC) reviewed VAM research funded by the Gates Foundation and found that “a teacher’s value-added for the state test is not strongly related to her effectiveness in a broader sense. Most notably…many teachers whose value-added for one test is low are in fact quite effective when judged by the other.” What’s more, the researchers warned, “there is every reason to think that the problems with value-added measures … would be worse in a high-stakes environment,” calling the results of the study “sobering about the value of student achievement data as a significant component of teacher evaluations.” [NEPC, Review of “Learning About Teaching,” 2011]

Dr. Slekar explains the problem with VAM quite simply: “Value Added Measurement systems will incorrectly rank teachers one out of every three times—at best.” [@theChalkface, 1-2-13] Just last week, education researcher Dr. Mercedes K. Schneider published an excellent investigation of the VAM system proposed by the state of Louisiana. Her careful analysis is worth reading in full if you are interested in the mathematics behind these measurements, but the crux of the problem comes down to this: VAM systems rank teachers, and in any ranking some will be at the top and some will be at the bottom (I find this itself a problematic underlying assumption). If you are going to use a tool to rank the teachers, then it ought to at least be stable, reliable, and consistent.

Dr. Schneider uses this analogy: “It is like standing on a bathroom scale; reading your weight; stepping off (no change in your weight); then, stepping on the scale again to determine how consistent the scale is at measuring your weight. Thus, if the standardized tests are stable (consistent) measures, they will reclassify teachers into their original rankings with a high level of accuracy. This high level of accuracy is critical if school systems are told they must use standardized tests to determine employment and merit pay decisions.” However, the VAM system frequently re-ranked teachers who had been at the top, down to the middle or the bottom, even when they had not changed a thing in their teaching. The bottom line? Dr. Schneider says, “I would discard the bathroom scale.” She concludes, “Yes, teachers should be evaluated. However, attempting to connect teacher performance to student standardized test scores cannot work and will not improve education in America. VAM does not work; it cannot work and needs to be discarded.” [“Value Added Modeling (VAM) and ‘Reform’: Under the Microscope,” 12-28-12]
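The instability Dr. Schneider describes is easy to see in a toy simulation. The sketch below uses entirely made-up numbers – 100 hypothetical teachers, and measurement noise on the same order as the true signal, an illustrative assumption consistent with the low reliability the researchers above report, not a fitted model of any real VAM. Even though nothing about the teaching changes between the two measurements, a large share of teachers land in a different quintile the second time around:

```python
import random

random.seed(1)

N = 100  # a hypothetical district with 100 teachers

# True effectiveness is held constant -- the teaching never changes.
true_effect = [random.gauss(0, 1) for _ in range(N)]

def vam_score(effect):
    """A noisy VAM estimate: measurement error on the same order as
    the signal (an illustrative assumption, not a fitted model)."""
    return [e + random.gauss(0, 1.5) for e in effect]

def quintiles(scores):
    """Assign each teacher a quintile, 0 (bottom) through 4 (top)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = [0] * len(scores)
    for pos, i in enumerate(order):
        rank[i] = pos * 5 // len(scores)
    return rank

# Step on the bathroom scale twice, with no change in "weight."
year1 = quintiles(vam_score(true_effect))
year2 = quintiles(vam_score(true_effect))

moved = sum(1 for a, b in zip(year1, year2) if a != b)
print(f"{moved} of {N} teachers changed quintile with no change in teaching")
```

With noise at that level, well over half of these imaginary teachers get re-ranked between the two measurements – exactly the bathroom-scale problem: the scale, not the weight, is what moved.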

So if VAM is a sham, why are we wasting our time – and untold taxpayer dollars – on this stuff? Pittsburgh appears to be smitten with the idea that it can keep jiggling the numbers until it finds the magic formula: the district says it will adjust for variables like “free- or reduced-price lunch eligibility, the number of English language learners, the number of gifted students, and other characteristics.” [Post-Gazette, 12-31-12] But as Dr. Slekar remarked, “VAM is garbage in, garbage out. There’s no research that shows a way to account for out of school factors. This is all in the experimental phase. No one has done it. In two different years you get two different results.” He asks, “How can a teacher be successful one year and not the next? When researchers look at this over 3, 4, 5 years, the reliability is zero.” And he points out that those teachers getting bad VAM scores can be the very ones who get the highest ratings from parents, those who inspire kids and are most humane.

Dr. Slekar also points out the difficulty in combining this VAM and student test score data with the other half of the teacher’s evaluation, which is supposed to be classroom observation. In Pittsburgh, this half comes from a system it developed called RISE (Research-Based Inclusive System of Evaluation), based on the work of education researcher Charlotte Danielson. But Slekar argues that RISE is a “distortion of [her] original work on quality teaching. Danielson’s qualitative system of evaluation was never meant to be merged with an invalid and unreliable quantitative evaluation system—Valued Added Measures.” [@theChalkface, 1-2-13]

At this point your head may be spinning. What’s the big deal? Why should we care? The takeaway is this: we are wasting precious resources on a system that will not give us good results, resources that we know would be far better spent on early childhood education, or even textbooks for our schools. Pittsburgh may feel it has no choice other than to comply with the new state law, but it has been preparing this system for a while. I would like to see our elected school board representatives have a real conversation about this at tonight’s meeting, find their backbone, and take a public stand. Enough is enough. These high-stakes tests – and the VAM sham they perpetuate – are damaging our schools, our kids, and our teachers.


Help grow our grassroots movement for public education: join other volunteer parents, students, educators, and concerned community members by subscribing to Yinzercation. Enter your email address and hit the “Sign me up” button to get these pieces delivered directly to your inbox and encourage your networks to do the same. Really. Can you get five of your friends to subscribe? Working together we can win this fight for our schools.

International Test Panic

Stay calm and don’t panic. You’re about to start seeing a whole new wave of alarmist rhetoric over the state of U.S. public education with the release yesterday of two new international tests. The Progress in International Reading Literacy Study (PIRLS, conducted every 5 years) and the Trends in International Mathematics and Science Study (TIMSS, conducted every 4 years) both just announced their 2011 results. [TIMSS & PIRLS International Study Center]

This is where headlines, such as the one in today’s Post-Gazette, start to scream things like “U.S. Students Still Lag Globally in Math and Science, Tests Show.” Then the hand-wringing commences over the fact that the U.S. ranks behind South Korea, Singapore and Taiwan (in fact, these tests put us 11th in fourth-grade math, 9th in eighth-grade math, 7th in fourth-grade science and 10th in eighth-grade science). [Post-Gazette, 12-12-12] But the headlines and articles inevitably fail to mention several key points.

First, the U.S. has never been at the top of these comparative tests. In fact, in the 1960s when the first international math and science tests were conducted, U.S. students scored at the bottom in nearly every category. Over the past fifty years, U.S. students have actually improved – not declined as so many of the pundits would have you believe. [For an excellent summary and analysis, see Yong Zhao, 12-11-12] Rather than falling behind our international peers, U.S. students have been making slow gains. We may not be where we want to be, but the “falling” metaphor implies the exact opposite direction of where we are headed.

Second, these tests are often comparing apples and oranges. For example, some countries do not test all of their students, particularly in older grades as they siphon off those who will not go on to college. In essence, this leaves just their university-bound students to take the exams compared to all U.S. students, college-bound or not. [Dave F. Brown, Why America’s Public Schools are the Best Place for Kids: Reality vs. Negative Perceptions, Rowman & Littlefield, 2012, p. 42.]
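That selection effect is worth seeing with numbers. Here is a minimal sketch using invented figures – two imaginary countries with identical score distributions, where one tests everyone and the other tests only its top-scoring, university-bound half. No real country's data is being modeled:

```python
import random
from statistics import mean

random.seed(0)

# Two hypothetical countries with identical score distributions.
# (All numbers here are invented purely for illustration.)
country_a = [random.gauss(500, 100) for _ in range(10_000)]
country_b = [random.gauss(500, 100) for _ in range(10_000)]

# Country A tests all of its students; Country B tests only the
# top-scoring half -- say, its university-bound students.
tested_a = country_a
tested_b = sorted(country_b)[len(country_b) // 2:]

print(f"A (all students):  {mean(tested_a):.0f}")
print(f"B (top half only): {mean(tested_b):.0f}")
```

In this toy example Country B "outscores" Country A by a wide margin even though the underlying student populations are exactly the same – the gap comes entirely from who sat for the test.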

Third, what these international tests really seem to report is the effect of the United States’ unbelievably high child poverty rate. When you look just at students from our well-resourced schools taking these tests, they actually score at the top of the heap. [For an excellent analysis, the National Association of Secondary School Principals, 12-15-10] But a whopping 26% of our country’s children from birth to age five live in poverty – yes, 26 percent – and over 23% of our kids under the age of 18 live in poverty. Our child poverty rate is the second-worst in the entire developed world – only Romania’s is higher. [See “Poverty and Public Education”]

Valerie Strauss, education writer for the Washington Post, said the real headline we ought to be seeing is, “U.S. low-poverty schools do much better than high-poverty schools in international tests.” She points out that this holds true for all standardized tests and “that continues to be the real story in U.S. education, not how American students’ scores stack up against Singapore or the South Koreans.” [Valerie Strauss, Answer Sheet, Washington Post, 12-11-12]

The fourth point we ought to remember is the way in which the hype over these international tests has reinforced the notion that we need ever more testing to measure our children. I am not opposed to student assessment – I want our teachers to be able to assess student learning using valid tools. Bring on the weekly spelling quiz or end of unit test. But I am opposed to high-stakes testing in which our children are subjected to mountains of high-pressure, poorly designed tests, which are then used to label and punish our kids, our teachers, and our schools. Yinzercation’s intrepid librarian Sheila May-Stein has written a heartfelt description of what it’s like to be a teacher forced to give these high-stakes, standardized tests in our schools. I encourage everyone to read her piece, “Outside the Lines,” as we start a discussion around opting out of this testing madness.

Rather than wringing our hands over how far we rank below Taiwan, we ought to be fretting over how we will address child poverty and get to the business of how we will adequately and equitably fund our public schools.