Definition:
Lexical coverage is the percentage of word tokens in a given text that a language user knows, that is, the proportion of running words whose meaning is accessible to the reader or listener. If a learner knows 95 of every 100 words encountered in a text, their lexical coverage for that text is 95%. Research by Paul Nation, Batia Laufer, and others suggests that adequate comprehension requires coverage of at least 95%, with 98% often cited as the minimum for comfortable, unassisted reading and 95% as the minimum in an instructional context with significant support. Lexical coverage is determined by the interaction of the learner’s vocabulary size and the frequency profile of the specific text.
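The definition above can be sketched as a short calculation. This is a minimal illustration, not a standard tool: the function name `lexical_coverage`, the simple regex tokenizer, and the sample word list are all assumptions made for the example, and it matches exact word forms rather than word families.

```python
import re

def lexical_coverage(text, known_words):
    """Return the share of word tokens in `text` that appear in `known_words` (0.0 to 1.0)."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical learner vocabulary and sample sentence.
sample = "the cat sat on the mat near the quay"
known = {"the", "cat", "sat", "on", "mat", "near"}
print(f"{lexical_coverage(sample, known):.1%}")  # 8 of 9 tokens known -> 88.9%
```

Coverage is counted over tokens, not distinct words, so the three occurrences of "the" each count toward the total.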
In-Depth Explanation
The Coverage-Comprehension Relationship
Understanding a text requires knowing enough of its words to reconstruct meaning from context, not just recognizing individual words. Research on the lexical coverage threshold suggests the following bands:
- 80% coverage: 1 in 5 words unknown; too many gaps to construct coherent meaning, and reading would be extremely laborious
- 90% coverage: 1 in 10 words unknown; still very disruptive, with limited context for inference
- 95% coverage: 1 in 20 words unknown; the commonly cited threshold for “unassisted reading” in early research, now considered insufficient for most purposes
- 98% coverage: 1 in 50 words unknown; most researchers now consider this the realistic minimum for comfortable independent reading
- 99%+ coverage: 1 in 100 words or fewer unknown; necessary for relaxed, pleasurable reading without frequent interruptions
Liu and Nation (1985), Laufer (1989), and others have demonstrated empirically that below 95% coverage, comprehension breaks down sharply.
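To make these bands concrete, the arithmetic can be sketched as the number of unknown words a reader would meet on one page. The helper name `unknown_words` and the figure of 300 running words per page are assumptions chosen for illustration only.

```python
def unknown_words(coverage, text_length):
    """Expected number of unknown tokens in a text of `text_length` running words."""
    return round((1 - coverage) * text_length)

# Assumed page length of 300 running words (illustrative, not from the research).
for coverage in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{coverage:.0%} coverage: {unknown_words(coverage, 300)} unknown words per page")
```

Under that assumption, moving from 95% to 98% coverage cuts the interruptions on a page from 15 to 6.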
Vocabulary Size Requirements
Coverage thresholds translate into vocabulary size requirements depending on the text:
- Everyday written English: the most frequent 2,000 word families (Nation’s BNC/COCA lists) cover roughly 80–90% of running words, which is still short of the 95% threshold
- Academic texts: the first 2,000 word families plus the Academic Word List (~570 families) for ~90% coverage
- Newspapers: first 5,000–6,000 word families for ~95%
- Native-speaker novels: approximately 8,000–9,000 word families for 98% coverage
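The link between a coverage target and a vocabulary size can be sketched by walking down a frequency-ranked word list until the cumulative share of tokens reaches the target. This is a simplified illustration: `words_needed_for_coverage` is a hypothetical helper, the toy counts are invented, and it counts distinct word forms rather than the word families used in the figures above.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target):
    """Return how many of the most frequent word types are needed to reach `target` coverage."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # Walk from the most frequent type downward, accumulating token counts.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)

# Toy corpus of 100 tokens with a steep frequency drop-off (invented counts).
tokens = (["the"] * 50 + ["of"] * 25 + ["reader"] * 15
          + ["lexical"] * 6 + ["threshold"] * 3 + ["quay"] * 1)
print(words_needed_for_coverage(tokens, 0.95))  # 4 types reach 96% of tokens
print(words_needed_for_coverage(tokens, 0.98))  # 5 types reach 99% of tokens
```

Because frequency falls off steeply, each additional point of coverage near the top of the range costs disproportionately more vocabulary, which is why the step from 95% to 98% requires thousands of extra word families in real corpora.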
For Japanese:
- The 2,136 jōyō kanji are the baseline for educated literacy, although this is a count of characters rather than a measure of vocabulary coverage
- NHK-style news: approximately 6,000–8,000 word families
- Manga/casual speech: the highest-frequency 3,000–5,000 items cover much of the vocabulary, but the density of new vocabulary in native material remains high
Implications for L2 Learners
The lexical coverage framework has direct practical implications:
- Graded readers are engineered to provide 98%+ coverage at a specific vocabulary level
- Input-based methods (Krashen, immersion) require learners to access material within their vocabulary range
- The “i+1” principle requires that most input be comprehensible — lexical coverage must be high enough for meaning to be accessible
- Extensive reading research shows accelerated vocabulary growth occurs at coverage levels where learners can tolerate and infer from unknown words (~98%)
Incidental Vocabulary Learning
At high coverage levels (around 98%), learners can use context to infer the meaning of unknown words. Estimates suggest that roughly 10–20 repetitions in varied contexts are needed for incidental acquisition of a new word. As a result, high-frequency words (those appearing in many texts) are the first to be acquired through extensive reading.
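The repetition estimate can be sketched as a count of how often each unknown word recurs across a learner's reading. This is only a rough illustration: `acquisition_candidates` is a hypothetical helper, the default of 10 repetitions is the lower bound of the estimate above, and the code does not check whether the repetitions occur in varied contexts.

```python
from collections import Counter

def acquisition_candidates(tokens, known_words, min_repetitions=10):
    """List unknown words that recur at least `min_repetitions` times in `tokens`."""
    counts = Counter(token for token in tokens if token not in known_words)
    return sorted(word for word, n in counts.items() if n >= min_repetitions)

# Invented reading log: one unknown word recurs often, another only rarely.
reading = ["the"] * 40 + ["harbour"] * 12 + ["quay"] * 3
print(acquisition_candidates(reading, known_words={"the"}))  # ['harbour']
```

In this toy log, "quay" appears too rarely to meet the threshold, mirroring how low-frequency words need far more reading volume before they are picked up incidentally.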
Common Misconceptions
“If I understand the gist of a text, I have sufficient coverage.” Gist understanding can occur at lower coverage levels through schema knowledge and prediction, but this is not the same as accurate, word-by-word reading comprehension. Low-coverage gist reading does not lead to efficient vocabulary growth because the unknown words do not receive sufficient contextual support for inference.