Putting Tests to the Test: Assessing High-Stakes Exams for Second Language Learners


While earning undergraduate degrees in French and Philosophy, Paula Winke was inspired to blend her interests in language learning, research, and social justice. Today, she’s an applied linguist researching the ethics and effectiveness of foreign language testing as Professor in the Department of Linguistics and Germanic, Slavic, Asian, and African Languages and Director of the Second Language Studies Program.

Winke works in foreign language education, where she assesses tools that measure the proficiency of people learning second languages. Her research examines the effectiveness of high-stakes language tests, those used for citizenship, immigration, university entrance, and certification. She also evaluates standardized tests that assess the reading and writing development of third-grade English language learners in Michigan.

Underpinning it all is Winke’s belief that no high-stakes decision about a child or adult should be tied completely to one measure of their ability, in this case language proficiency.

“Understanding that the language tests we administer are not perfect allows us to accept that test scores are not the be-all and end-all of any decision,” she said. “There’s an ethics to educational measurement that’s rooted in social justice — meaning what’s right in terms of the law, but also in terms of culture and how those laws came about.”

When Testing Reigns

Language assessment, Winke observed, is becoming increasingly important as society attaches higher and higher stakes to results.

One instance where test scores dictate lifelong opportunities involves children in K-12 schools. Michigan was one of 16 states, plus the District of Columbia, that were set to tie grade advancement for third graders to a standardized English language arts test before COVID-19 put the spring 2020 implementation on hold. Winke’s research determined that 70 percent of third graders in Michigan’s English language learning population would not be able to pass the current test.

“That could be devastating for English language learning communities, many of which are refugee communities centered in and around Detroit,” she said, noting that the children who did not pass would be highly concentrated within English language learning communities, especially Spanish-speaking and Arabic-speaking ones.

Dr. Paula Winke talks with College of Arts & Letters Dean Christopher P. Long as part of the Liberal Arts Endeavor Podcast about foreign and second language testing.

Winke and her team are investigating theories about how English language learners develop reading and writing skills, using reading-test data from the Michigan Department of Education. The goal, she said, is to look into more equitable and fair alternatives to high-stakes tests for measuring ability. Her team is also gathering additional qualitative data on tests and fairness by having particular populations take simulated tests through the MSU Second Language Eye-Tracking Labs.

“By doing so, we can see what parts of the test they are looking at and using, be they picture prompts or directions,” she said. “It helps us to see what they are able to read and process and assess their ability to take the test.”

Questioning Results

While keenly focused on language test-taking in school-age populations, Winke also zeros in on critical language tests taken by millions of adults each year. Among the biggest is the Test of English as a Foreign Language, known as TOEFL, which many U.S. colleges and universities require of applicants whose first language is not English. Winke has also researched the U.S. Foreign Service Institute’s language tests taken by career diplomats, as well as the civics component of the U.S. Citizenship Test.

“We’ve looked at the reliability of the citizenship test, and in particular, if people are simply memorizing facts,” she said. “We questioned what the test measured, and if taking it resulted in any lasting value. We found, too, there were questions that many U.S. citizens couldn’t answer.”


Among other research initiatives, Winke was co-principal investigator on a five-year, $1.4 million grant-supported project that examined the proficiency of foreign language learners at MSU and whether graduates went on to work in professions using a second language. Her research also compared the language competencies of university students in the United States with those of military personnel who receive foreign language training.

While foreign language training is extremely important for national security, cultural exchange, business, and almost any international endeavor, Winke says that measuring proficiency nearly always comes down to taking tests. Her goal is to help ensure that high-stakes tests are reliable, equitable, and fair, and that their scores motivate learners rather than label them or diminish their desire to learn.

“One of the things we hope for in language assessment is that the public at large comes to understand more how test scores are imperfect,” she said. “When someone receives a score, is it an accurate measure of their ability? Or is there some wiggle room around the score? The more people who understand that scores can be imprecise, the more it will allow society to be just a little more fair and flexible.”