Washback

Definition:

Washback (also called backwash) refers to the influence that an assessment or examination has on the teaching and learning that precedes it. Because teachers and learners naturally orient their activity toward what will be evaluated, the content, format, and emphasis of a test shape, sometimes dramatically, the content and methods of instruction. Washback can be positive (the test encourages genuine language skill development) or negative (the test distorts instruction toward low-level test-taking strategies divorced from authentic communicative competence). The term is most closely associated with Alderson and Wall (1993), who published a foundational critique of the washback hypothesis and proposed 15 more specific washback hypotheses.

Why Washback Happens

Assessments — especially high-stakes summative assessments — serve as de facto curriculum gatekeepers. Learners and teachers rationally respond to incentive structures:

What is tested is practiced
What is not tested is often deprioritized
How an item is tested (multiple choice vs. production) shapes how it is practiced

This is not inherently problematic. In fact, washback is unavoidable — the question is whether it is positive or negative.

Positive Washback

Positive washback occurs when test preparation aligns with and encourages genuine language development:

A speaking test that requires real conversational interaction → learners practice actual conversation
A writing test that requires argument construction → learners practice paragraph organization and discourse-level coherence
A proficiency test that rewards broad vocabulary → learners engage with extensive reading

When a test is well designed, meaning it tests communicatively authentic abilities, preparation for it is itself language learning.

Negative Washback

Negative washback occurs when test preparation substitutes for or undermines genuine language development:

Multiple-choice grammar tests → learners memorize grammar rules for recognition on paper, not for production
Tests with predictable item formats → learners drill test-taking strategies (“answer B if unsure”) rather than language
Tests that reward speed on isolated sentences → learners do not practice reading continuous discourse

“Teaching to the test” is the colloquial expression for negative washback in educational systems.

The JLPT Case Study

The Japanese Language Proficiency Test (JLPT) is a frequently cited example of negative washback:

The JLPT has no speaking or writing components (all items are selected-response)
Heavy emphasis on memorized grammar patterns and vocabulary recognition
Learners can pass high JLPT levels while still being unable to hold basic conversations
Teachers and textbook publishers design curricula around JLPT grammar lists rather than communicative needs

This has generated criticism that JLPT preparation produces “paper competence” — grammatical form knowledge without functional communicative ability.

Alderson and Wall (1993) — The Washback Hypotheses

Alderson and Wall argued that washback is more complex than simple cause-and-effect (“test changes teaching”). Their 15 hypotheses include:

A test will influence teaching
A test will influence learning
A test will influence what teachers teach
A test will influence how teachers teach
A test will influence what learners learn
A test will influence the rate and sequence of teaching
A test will influence the degree and depth of teaching
A test will have washback on all learners and teachers
Tests that have important consequences will have greater washback
Tests will have washback effects for some learners and some teachers, but not for others

Importantly, their empirical work showed that washback effects are not universal or automatic: some teachers change content but not methods, some learners resist test prep, and institutional factors mediate washback.

Washback and Test Design

Designing for positive washback is an explicit goal of modern communicative language assessment:

Construct tests that sample authentic tasks (conversations, writing for real audiences, comprehension of real texts)
Use rubrics that reward communicative effectiveness, not just accuracy
Avoid item formats that can be “cracked” by test strategy without language ability

The principle: if you design a test worth teaching to, teaching to the test is teaching language.

Washback as a Validity Concern

Messick (1989) and subsequent validity theorists include consequential validity — the social consequences of test use, including washback — as a component of overall test validity. A test that has demonstrably negative washback on curricula has a validity problem regardless of its psychometric properties.

History

The concept of washback was discussed informally in language education for decades before being systematically studied. Alderson and Wall (1993) published the foundational paper that established washback as a research topic, proposing the “Washback Hypothesis”: a set of testable claims about how tests influence curriculum content, teaching methodology, and learning behavior. Their work challenged the prevailing assumption that washback was always and automatically beneficial (“test what you want to teach, and teaching will follow”). Research in the 1990s and 2000s revealed that washback is far more complex than assumed: tests can narrow curriculum, encourage superficial test preparation, and have different effects on different teachers. The concept was later expanded to include “impact”, the broader social effects of tests on individuals, educational systems, and society (Bachman & Palmer, 1996; McNamara, 2000).

Common Misconceptions

“Washback is always negative.”

Washback can be positive or negative. A well-designed communicative test can produce positive washback by encouraging communicative teaching methods. The IELTS speaking test, for example, has encouraged speaking practice in preparation courses. Negative washback occurs when tests lead to narrow, mechanical test preparation at the expense of broader learning.

“Better tests automatically produce better teaching.”

This is the “strong” washback hypothesis, which research has not supported. Test design is one factor, but teacher beliefs, training, institutional constraints, and resources mediate washback. A communicative test in a system with untrained teachers and large classes may not produce communicative teaching.

“Washback only affects test preparation courses.”

Washback extends throughout educational systems: high-stakes tests influence what textbooks include, how teachers plan curricula, what students study independently, and even what aspects of language are valued. The effects are systemic, not limited to test-prep contexts.

“Removing high-stakes tests would eliminate negative washback.”

Some degree of assessment is necessary for educational accountability, certification, and motivation. The goal is designing assessments that produce positive washback — encouraging beneficial learning behaviors — rather than eliminating assessment entirely.

Criticisms

Washback research has been criticized for methodological challenges: establishing causal connections between test characteristics and teaching/learning behavior is difficult because many other factors (institutional culture, teacher training, resource availability) simultaneously influence classroom practice. Most washback studies are observational and cannot definitively attribute teaching changes to test effects.

The concept itself has been critiqued as too broad: “washback” encompasses everything from major curricular shifts to individual student study choices, making it difficult to develop precise, testable hypotheses. Some researchers argue that washback should be decomposed into more specific mechanisms (teacher content selection, student strategy adaptation, institutional resource allocation) rather than treated as a single phenomenon. Additionally, the prescription that tests should be designed for positive washback has been questioned — improving tests is important, but relying on testing to drive teaching quality addresses a symptom rather than the systemic causes of poor instruction.

Social Media Sentiment

Washback effects are visible throughout language learning communities, even when the term is not used. On r/LearnJapanese, the JLPT’s lack of speaking and writing sections is criticized for encouraging passive study (reading and listening only) — a washback concern. IELTS and TOEFL preparation discussions on their respective subreddits reflect washback: learners adapt their study behavior to match specific test formats and scoring criteria.

The most common washback discussion involves “studying for the test vs. studying for real ability” — the tension between optimizing for test performance and developing genuine communicative competence.

Practical Application

Be aware of how tests shape your study — If you’re preparing for JLPT, notice whether your study narrows to test-style content (reading/listening) at the expense of speaking and writing. Consciously maintain a balanced study plan.
Use test preparation productively — Test preparation can be positive washback if approached correctly: preparing for a speaking test by actually practicing speaking develops real skill. Preparing by memorizing template responses does not.
Don’t confuse test mastery with language mastery — Test scores indicate performance on a specific assessment under specific conditions. Language mastery involves using language flexibly across diverse real-world situations.
For teachers: design assessments that encourage desired learning — Test what you want students to learn, in the way you want them to learn it. If communicative ability is the goal, assess communicative performance.

Research

Alderson and Wall (1993) established washback as a research area with their Washback Hypothesis. The companion Sri Lankan impact study (Wall and Alderson, 1993) found that a new communicative exam did not produce the expected positive washback in teaching methodology, demonstrating that test design alone does not determine washback.

Watanabe (2004) studied washback from university entrance exams in Japan, finding that teacher beliefs and training mediated test influence: the same test produced different washback effects in different classrooms. Cheng (2005) conducted longitudinal washback research in Hong Kong, documenting both intended and unintended effects of educational assessment reform. Green (2007) provided a framework for understanding washback intensity, proposing that washback is stronger when tests are high-stakes, used for important decisions, and closely linked to instruction. These conditions apply to major language proficiency tests, including the JLPT.