What measures the best teacher? More than scores, study shows

Tue Jan 8, 2013 5:26pm EST

The campus grounds of the Bill and Melinda Gates Foundation in Seattle are pictured in this file photo from November 4, 2011. A three-year, $50 million study, funded by the Bill & Melinda Gates Foundation and released January 8, 2013, found that effective teachers can be identified by observing them at work, measuring their students' progress on state standardized tests -- and asking those students directly how much they're learning.

Credit: Reuters/Anthony Bolante/Files

(Reuters) - Effective teachers can be identified by observing them at work, measuring their students' progress on standardized tests - and asking those students directly what goes on in the classroom, according to a comprehensive study released Tuesday.

The three-year, $50 million Measures of Effective Teaching study, funded by the Bill & Melinda Gates Foundation, found it was difficult to predict how much students would achieve in a school year based on their teacher's years of experience or knowledge of pedagogical technique.

But researchers found they could pick out the best teachers in a school and even predict roughly how much their students would learn if they rated the educators through a formula that put equal weight on student input, test scores and detailed classroom observations by principals and peers.

Taken alone, each of those measures was fairly volatile. Judging teachers primarily by student performance on state tests, for instance, turned out to be highly unreliable, with little consistency from year to year. Judging them chiefly by a principal's observations failed to identify those teachers who could be counted on to boost student proficiency on state math and reading tests.

Combining all three measures into a properly weighted index, however, produced a result "teachers can trust," said Vicki Phillips, a director in the education program at the Gates Foundation.
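As a rough illustration of what an equally weighted index of the three measures might look like, here is a minimal Python sketch. It is not the study's actual formula: the function names, the z-score standardization step and the sample figures are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores so measures on different scales can be combined."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All teachers scored the same on this measure, so it cannot separate them.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(surveys, test_gains, observations, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three standardized measures, one result per teacher.

    The default weights are equal; the standardization is an assumption,
    not a detail reported in the article.
    """
    columns = [standardize(surveys), standardize(test_gains), standardize(observations)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(surveys))
    ]


# Hypothetical data for three teachers in one school (invented for illustration).
surveys = [3.0, 4.0, 5.0]        # average student survey rating
test_gains = [0.0, 0.1, 0.2]     # value-added estimate from state tests
observations = [2.0, 3.0, 4.0]   # classroom observation rating

for teacher, score in enumerate(composite_scores(surveys, test_gains, observations)):
    print(f"teacher {teacher}: composite {score:+.2f}")
```

Changing the `weights` argument would show how a system that leans more heavily on test scores ranks the same teachers differently.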

The study comes at a time of bitter political wrangling over teacher evaluations in cities including New York, Los Angeles and Chicago - and provides ammunition for all sides.

Education reformers who have been pressing to dismantle tenure systems, which protect veteran teachers from layoffs, could take heart in the finding that seniority doesn't predict success in the classroom.

Yet the report also bolstered union leaders who have argued that teacher evaluations should not be tied so heavily to trendy "value-added measures," or VAM - complex algorithms that aim to gauge whether students do better or worse than expected on state tests after several months in a given teacher's classroom.

The Obama administration has pushed states to give heavy weight to quantitative measures such as test scores in designing teacher evaluations. More than a dozen states have moved in that direction, in some cases making it impossible for a teacher to earn a good review if her VAM score is low, no matter how well she performs on other measures. States including Florida, Louisiana, Colorado, Michigan and Ohio have been particularly aggressive in tying teacher ratings to test scores.

The Gates study concluded that student performance should ideally make up one-third to one-half of a teacher's evaluation.

STUDENT SURVEYS HELP PREDICT LEARNING

"This should be a very big red flag to all those policy makers who think they can have test-based accountability be half or more of a teacher's evaluation," said Randi Weingarten, the president of the American Federation of Teachers.

In fact, some states that relied heavily on those value-added measures are already rethinking that approach.

Louisiana is poised to announce a dramatic overhaul of its evaluation system, which took effect just last July and was hailed by education reformers as pioneering for its reliance on student test scores to rate teachers.

The revised system, which Superintendent of Education John White will unveil this week, still uses student test scores but in a far more nuanced way.

Teachers who score in the bottom 10 percent on the value-added metrics will be automatically deemed ineffective and can be fired, White said. Those in the top 20 percent will be deemed highly effective. Those who score in the middle, however, won't be pitted against one another in a ranking of best to worst. Instead, their principals will be urged to use other measures of quality, including watching the teacher at work and evaluating student progress toward classroom goals, White said.
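The tiering White describes can be sketched in a few lines of Python. This is an illustration only, not Louisiana's actual procedure: the function name, the labels, the rounding of the cutoffs and the handling of tied scores are all assumptions.

```python
def classify_by_value_added(scores, bottom_share=0.10, top_share=0.20):
    """Label each teacher by where their value-added score falls in the ranking.

    Bottom 10 percent: "ineffective". Top 20 percent: "highly effective".
    Everyone else: "middle", left to the principal's judgment on other measures.
    Cutoff counts are rounded down, and ties are broken by list position;
    both choices are assumptions.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # teacher indices, lowest score first
    n_bottom = int(n * bottom_share)
    n_top = int(n * top_share)

    labels = ["middle"] * n
    for i in order[:n_bottom]:
        labels[i] = "ineffective"
    for i in order[n - n_top:]:
        labels[i] = "highly effective"
    return labels


# Hypothetical value-added scores for ten teachers (invented for illustration).
scores = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
for score, label in zip(scores, classify_by_value_added(scores)):
    print(score, label)
```

The point of the design is that only the two ends of the distribution get an automatic label; teachers in the middle are not ranked against one another.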

"The system that had been put in place in Louisiana assumed ... greater statistical precision" for the value-added measures than they could realistically deliver, White said, adding that he developed the new plan after consulting with researchers from the Gates Foundation and elsewhere.

The Washington, D.C., public school system also recently revamped its teacher evaluation formula so that value-added measures based on student test scores now account for 35 percent of a teacher's rating rather than 50 percent.

Weighting the test scores so heavily caused undue anxiety for teachers, said Jason Kamras, chief of human capital for the D.C. public schools. The district also feared it led to an unfortunate narrowing of the curriculum, as some elementary school teachers focused intently on math and reading, which are tested frequently, while spending less time on social studies and science, which are not, Kamras said.

The Gates report was notable as well for its emphasis on student evaluations of teachers. Researchers found that evaluations were most reliable when student surveys made up as much as a third of a teacher's rating.

Researchers used a survey known as Tripod, developed by a Harvard researcher and a British consulting company, Cambridge Education. Children as young as five are asked to respond to statements such as "This class is a happy place for me to be," or "In this class, we learn to fix our mistakes." Older children answer questions such as whether the teacher has firm control over the class and whether she explains new concepts clearly.

Few districts use student surveys as part of their formal teacher evaluations. Any effort to change that could stir up opposition from teachers, who fear putting their jobs in the hands of sometimes immature students.

The Gates study examined 3,000 teachers in several cities, including Dallas, Denver, New York and Charlotte, N.C. The first year, teachers were evaluated by multiple measures, including student test scores and classroom observations. The researchers then randomly assigned students to each participating teacher. The following year they checked to see whether teachers rated as highly effective did indeed produce better results for students - not only on the state standardized tests but also on other measures, including open-ended math and reading assessments requiring sophisticated critical thinking.

Sure enough, they said their predictions about which teachers would produce the best results proved correct.

The research relied on teachers who volunteered to have their work scrutinized, so they may not have been a representative sample. And since researchers could not randomly assign students to a classroom across town, they were only able to study the relative strengths of teachers within a given school.

Despite those limitations, Thomas Kane, a professor of education and economics at Harvard University and a lead researcher on the Gates team, said the study achieved a landmark goal: "We identified groups of teachers who caused students to learn more."

(Reporting by Stephanie Simon; editing by Lee Aitken and Prudence Crowther)

Comments (4)
Azza9 wrote:
How can a country with such an inflated sense of the importance of individuality set up a system that totally ignores individual merits of both teacher and student?

There is always a chance that very intelligent students learn and perform well with an inept teacher, inflating that teacher's worth.
But there is the reverse instance, where a really competent teacher is demerited due to having an abundance of unintelligent, unmotivated students.
Hopefully these evaluations take these things into account, or those instances are rare.

Teachers have great influence on child development but they are not as pivotal as people think. You also need to give kids more credit, just a little bit mind you. If they have what it takes to excel in scholastic pursuits they will teach themselves (to a point), rendering their teacher merely another knowledge base out of many to draw on.

Then again school isn’t just about knowledge is it.

Jan 08, 2013 6:21pm EST
tmc wrote:
Thank you Mr. Gates. A little reality is good for the system. I don’t think it will be adopted across the board though. Too many powerful interests with other agendas.

Jan 08, 2013 6:23pm EST
Mott wrote:
Half of the derivation is common-sense while the rest is disputable.

$50M over 3 years makes it a nice vacation though.

It’s not watching the teacher nor the scores – just ask the few top rank students and it’ll correlate well with the quality of teaching.

Scores are another story – perhaps worth another $50M over 3 years?

Jan 08, 2013 7:33pm EST