How will erroneous state tests affect teacher evaluations, school funding?

Did you hear the one about the talking pineapple?

How about the one about the math question with no right answer?

These aren’t riddles. They’re questions from the New York state math and English Language Arts exams given to third- through eighth-graders statewide from April 16 to 27. The tests are designed by Pearson, Inc., which has a $32 million contract from the state to provide the tests, and vetted by a team of psychometricians (test experts) before they are distributed to school districts.

The pineapple question refers to a much-maligned reading passage on the eighth-grade ELA exam describing a race between a pineapple and a hare, a twist on the old fable of the tortoise and the hare. The entire passage and related questions can be read here: usny.nysed.gov/docs/the-hare-and-the-pineapple.pdf. The passage confused so many students that the state had to pull the questions; they won’t be counted toward the final assessments.

There were errors in the math sections as well. Due to a typographical error on the eighth-grade math exam, one question had no correct answer. On the sixth-grade exam, one question had two correct answers.

Representatives from State Ed were quick to defend the questions.

“A lot of items are taken from a pool of items that have been field tested,” said Dennis Tompkins from the State Education Department’s communications office. “For example, the pineapple question is eight years old. It was first used on the SAT-10, which is kind of like the IOWAs — a local assessment test. It’s been used in several other states.”

Tompkins said that the pineapple question and the two math questions that were pulled will not impact student scores.

“The psychometricians, the experts in testing, they’ll tell you those questions have no impact on that final score,” he said. “Because of the way the tests are constructed, they’re weighted so that adjustments can be made very easily.”

Tompkins noted that the questions that were pulled were not what he called “operational items” – they were not in line with the state’s new Common Core standards and were not going to be used for assessment purposes anyway.

“They were field test items [questions upon which future tests will be built] or normative items, which basically means they’re used to compare us to other states,” Tompkins said.

In a release after the pineapple debacle, State Ed Commissioner John King acknowledged that the tests do have to be tighter in order to better measure student growth.

“The accuracy and efficacy of our state assessments are crucial to our reform efforts and measuring student academic growth,” King said. “We will, as always, review and analyze all questions on every assessment we administer.”

But the average scores districts receive on these exams do more than measure academic growth. They also determine which districts are named Districts in Need of Improvement under the federal No Child Left Behind Act; they determine which districts are eligible for competitive grants under Gov. Andrew Cuomo’s plan, developed as part of his 2011-12 budget; and they make up part of the formula determining the capability of the state’s teachers and principals under the agreement reached by State Ed, the New York State United Teachers (NYSUT) and the Board of Regents this past February.

Fortunately, the errors on the exam, because they won’t count against scoring, also won’t count against these measures, Tompkins said.

But the fact that there were such glaring errors on the exam has teachers and administrators questioning the state’s reliance on the exams for such important decisions.

“Yes, tests tell you something,” said Liverpool Superintendent Dr. Richard Johns. “But they use it for such a wide array of things and put so much importance on it that the validity is always in question. You can’t make a test that tight.”

Johns has long been critical of the state’s over-reliance on standardized assessments, particularly as a factor in evaluating teachers.

“I think it’s invalid,” he said. “It’s a ridiculous way to assess teacher performance. How do you control for kids that had a bad test day? For more troubled students in one classroom than another? For special ed kids? For kids out in one class? How do you do that and get an absolute empirical rating out of teachers?”

Johns isn’t alone in his criticisms. Last week, more than 2,000 delegates at the NYSUT Representative Assembly in Buffalo voted to adopt a resolution calling upon State Ed to “reduce the focus on questionable standardized tests in favor of other measures of student learning that are more ‘accurate, fair and appropriate,’” according to a news release.

“It extends beyond math questions without answers and talking pineapples to the inappropriate over-reliance on state tests,” said Carl Korn, communications director for NYSUT. “They’re being used for purposes they were never designed for.”

Korn said teachers told stories of children breaking down in tears, unable to sit through three hours of exams.

“We had a teacher speak on the floor of the meeting, a special education teacher, who was working with a student with special needs,” he said. “His IEP [individualized education plan, which is required for all students identified as special education students] requires him to work at a fifth grade level in math. He had to sit for three hours to take a seventh-grade test. He just laid his head on the desk and said he couldn’t do it. The teacher could only comfort him and say, ‘Do your best.’”

Korn was especially critical of the errors in the exams.

“If we are going to use state tests for important decisions like closing schools, labeling schools as in need of improvement, determining teacher and principal effectiveness, deciding which districts get competitive grants — it’s doubly important that the tests are reliable, accurate and fair,” he said.

Both Korn and NYSUT President Richard Iannuzzi said that teachers are more than willing to be evaluated; they’re just looking for a fair system not based on a flawed testing battery.

“New York’s new teacher evaluations were created with teachers,” Korn pointed out. “NYSUT worked with State Ed and the Board of Regents to develop a comprehensive, rigorous and fair teacher evaluation law.”

“Teachers embrace accountability, including New York’s new teacher/principal evaluation law. It sets the framework for a comprehensive, rigorous and fair system that uses tests appropriately as one of multiple measures of student achievement,” Iannuzzi said. “If tests are going to be used to evaluate teachers and administrators, close schools and award competitive grants to school districts, there must be public confidence that tests are accurate, reliable and fair. The State Education Department has a lot of work to do to restore the public’s trust.”