The Silent Period: Should Japanese Learners Avoid Speaking Early?

For years, part of the AJATT (All Japanese All The Time) advice for new learners of Japanese was blunt: don’t speak yet. The logic came from Stephen Krashen’s concept of the silent period, borrowed from child language acquisition research and applied wholesale to adult L2 learners. The idea was that output before sufficient input was counterproductive at best, harmful at worst. Spend 18 months immersing. Let the language settle. The speaking would come naturally. But is there any research support for delaying speaking in adult learners of Japanese, and what does the evidence actually show?

What People Are Saying

The silent-period debate hasn’t gone away. On r/LearnJapanese, threads about speaking early vs. waiting appear regularly. Positions divide broadly between immersion-focused learners, who advocate a listening-heavy early phase, and output advocates, who argue that never getting reps out of your mouth is exactly how you end up with strong reading comprehension and zero ability to hold a conversation.

The AJATT community has historically been the most explicit about the “no speaking” phase. Khatzumoto’s original site recommended not speaking Japanese for the first year or more, framing early output as stressful and potentially harmful to acquisition. More recent iterations — Matt vs Japan’s methodology, various AJATT-adjacent creators — have softened this position over time, acknowledging that tutor-based conversation practice (via iTalki) earlier in the process doesn’t seem to damage acquisition. But the core skepticism of early output still circulates in immersion communities as received wisdom.

Language exchange advocates argue the opposite: that native speaker interaction is itself a rich source of comprehensible input — harder to get from media alone — and that the effort of production forces the learner to notice exactly what they don’t know how to say.

The Research: What the Silent Period Actually Claimed

Krashen’s “silent period” concept came from observations of children, not of adults learning a second language. Krashen (1982) describes it mainly in children picking up a second language in natural settings, who often say very little for months while their comprehension grows; L1 acquisition involves a similar receptive-dominant early phase in which children understand before they produce. Krashen extended this pattern to L2 learning in general, arguing that comprehensible input alone was sufficient for acquisition and that output was a product of acquisition, not a cause of it. The input hypothesis holds that learners acquire language when they understand messages just beyond their current level, not by producing language.

But Krashen’s framework has long had a notable critic: Merrill Swain. Swain’s output hypothesis, developed from Canadian immersion data in the 1980s, was a direct challenge. French immersion students in Canada received enormous amounts of comprehensible input (nearly all instruction was conducted in French), yet their productive French remained clearly non-native in ways their receptive French did not. Swain argued that output serves distinct functions: it forces learners to move from semantic to syntactic processing, it triggers noticing of gaps between intention and production, and it generates feedback that input alone cannot provide.

The noticing hypothesis (Schmidt, 1990) adds a related mechanism: conscious attention to form is necessary for acquisition, and output — especially output that fails to communicate intended meaning — creates exactly the conditions where learners notice the mismatch.

The Nuance: Output Isn’t Just “Practice”

The AJATT critique of early output is right about one thing: low-quality output without feedback (speaking to yourself for a year in imperfect Japanese, cementing bad habits) is probably worse than no output. The concern about “fossilization” is real: producing the same errors repeatedly without correction can entrench interlanguage forms.

But the research doesn’t bear out a blanket case for silence. Studies on interaction, including Muranoi (2000) on interaction and form-focused instruction with Japanese EFL learners, show that structured output tasks that push learners to reformulate produce measurable gains in accuracy. The interaction framework developed by Gass and Mackey (2007) identifies negotiation of meaning as a key mechanism by which acquisition proceeds: when a learner’s output is not understood, the correction and reformulation that follow are high-signal input.

The research also distinguishes between types of output. Speaking without feedback (monologuing at a partner who won’t push back) is very different from pushed output (being asked “what do you mean?” and having to reformulate), which in turn is different from written output (where you have time to process form and search for words). Blanket advice to avoid “output” collapses these into one thing when they are not.

What This Means for Japanese Learners Specifically

Japanese presents some specific considerations that make the silent-period debate more nuanced than the same debate in, say, Spanish or French.

Japanese phonology is relatively accessible for English speakers: there are no tones, the syllable structure is simple (mostly consonant-vowel), and the main phonemic challenge, the distinction between long and short vowels and consonants, is trainable by ear. This means early output is unlikely to cement the kind of catastrophic pronunciation errors that can be hard to undo. Early production won’t ruin your Japanese accent in the way that would concern someone learning Mandarin.

Pitch accent is a partial exception, and it’s relevant to the silent-period debate in a way AJATT discussions rarely acknowledge. Pitch accent habits do form through speaking practice, but the problematic patterns seem to emerge from habitual speaking without feedback rather than from speaking as such. Getting pitch accent feedback early, through a tutor or native speaker, is probably better than avoiding output and then finding that your default pitch patterns are wrong.

The register issue is the bigger practical challenge. Japanese has keigo and multiple formality registers. Reading and listening practice doesn’t prepare you to produce appropriately in professional or high-formality contexts. If you spend two years consuming only anime and casual YouTuber content, your output — when it comes — will sound like it. The way to acquire register is through exposure combined with attempts at production in contexts where the register matters.

For most learners, the practical takeaway from the research is not “speak from day one” and not “wait 18 months” — it’s that structured speaking with feedback (a tutor conversation, a language exchange with pushback, a language class where errors are addressed) probably provides both input and noticing mechanisms that pure consumption does not. Early, structured output is not a replacement for immersion — but it’s not the enemy of it either.

Social Media Sentiment

The silent-period debate has become less heated in recent years, partly because prominent immersion-method creators have moved toward more pragmatic positions that allow for early structured speaking. On r/LearnJapanese, the consensus has shifted fairly clearly: speaking early with a tutor is widely accepted, while avoiding all output for a year or more is now a minority position associated with a more ideological reading of AJATT. The immersion community occasionally relitigates the Swain vs. Krashen divide, but the dominant view now seems to be that input-heavy early phases make sense while completely silent phases don’t. Critics of the immersion method argue that even the softened version underweights speaking and that many self-identified “immersion learners” plateau at comprehension while remaining unable to produce fluent Japanese.

Last updated: 2026-04

Sources

Krashen, S. (1982). Principles and Practice in Second Language Acquisition. Pergamon Press. Original articulation of the silent period and input hypothesis.
Swain, M. (1985). Communicative competence: some roles of comprehensible input and comprehensible output in its development. Input in Second Language Acquisition, pp. 235–253. Swain’s foundational output hypothesis paper, drawing on Canadian immersion data.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158. doi:10.1093/applin/11.2.129. The noticing hypothesis, foundational for understanding why output matters.
Muranoi, H. (2000). Focus on form through interaction enhancement: Integrating formal instruction into a communicative task in EFL classrooms. Language Learning, 50(4), 617–673. Interaction and form-focused instruction in a Japanese EFL context; output-involved conditions showed accuracy gains.
Gass, S. M., & Mackey, A. (2007). Input, Interaction, and the Second Language Learner. Routledge. Comprehensive account of the interaction framework and how output drives acquisition.
r/LearnJapanese. “Silent period — is it real or a cope?” ~1.2k upvotes, 2024. Community sentiment thread on silence vs. speaking.

Mikey Does