Lexical Coverage

Definition:

Lexical coverage is the percentage of word tokens in a given text that a language user knows, that is, the proportion of running words whose meaning is accessible to the reader or listener. If a learner knows 95 of every 100 running words in a text, their lexical coverage for that text is 95%. Research by Paul Nation, Batia Laufer, and others suggests that adequate comprehension requires coverage of at least 95%, with 98% often cited as the minimum for comfortable, unassisted reading and 95% as the minimum for an instructional context with significant support. Lexical coverage is determined by the interaction of the learner’s vocabulary size and the frequency profile of the specific text.
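
As a rough illustration of the definition, the sketch below counts the share of running words in a text that appear in a learner's known-word set. The function name, the simple regular-expression tokenizer, and the toy word set are all assumptions made for this example; a real coverage analysis would normally group inflected and derived forms into word families rather than match surface forms.

```python
import re

def lexical_coverage(text, known_words):
    """Return the fraction of running words (tokens) in `text` found in `known_words`."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Count tokens, not types: a repeated word counts every time it occurs.
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical learner vocabulary and sample sentence.
known = {"the", "cat", "sat", "on", "mat"}
coverage = lexical_coverage("The cat sat on the velvet mat", known)
print(f"{coverage:.1%}")  # 6 of 7 running words are known
```

Because coverage is computed over tokens, "the" is counted twice here; a single unknown word in a seven-word sentence already pulls coverage well below the 95% level.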


In-Depth Explanation

The Coverage-Comprehension Relationship

Understanding a text requires knowing not just individual words but enough words to reconstruct meaning from context. Research on lexical coverage thresholds suggests the following picture:

  • 80% coverage: Approximately 1 in 5 words is unknown — too many gaps to construct coherent meaning; reading would be extremely laborious
  • 90% coverage: 1 in 10 words unknown — still very disruptive; limited context for inference
  • 95% coverage: 1 in 20 words unknown — the commonly cited threshold for “unassisted reading” in early research; now considered insufficient for most purposes
  • 98% coverage: 1 in 50 words unknown — most researchers now consider this the realistic minimum for comfortable independent reading
  • 99%+ coverage: at most 1 in 100 words unknown; necessary for relaxed, pleasurable reading without frequent interruptions

Liu and Nation (1985), Laufer (1989), and others report empirical evidence that comprehension falls off sharply below 95% coverage.
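
The "1 in N words unknown" figures in the list above follow directly from the coverage percentage. The short sketch below makes the arithmetic explicit and also estimates unknown words per page; the function name and the figure of 300 words per page are assumptions chosen only for illustration.

```python
def unknown_word_stats(coverage, words_per_page=300):
    """Return (running words per unknown word, unknown words per page)."""
    unknown_rate = 1.0 - coverage
    # At 100% coverage there are no unknown words, so the interval is unbounded.
    interval = 1.0 / unknown_rate if unknown_rate > 0 else float("inf")
    return interval, unknown_rate * words_per_page

for coverage in (0.80, 0.90, 0.95, 0.98, 0.99):
    interval, per_page = unknown_word_stats(coverage)
    print(f"{coverage:.0%}: 1 unknown word in {interval:.0f}, "
          f"about {per_page:.0f} per {300}-word page")
```

Under that page-length assumption, moving from 95% to 98% coverage cuts the number of unknown words per page from about 15 to about 6.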

Vocabulary Size Requirements

Coverage thresholds translate into vocabulary size requirements that vary with the type of text. For English:

  • Everyday written English: the most frequent 2,000 word families (Nation’s BNC/COCA lists) cover most running words, roughly 80–90% depending on genre
  • Academic texts: the first 2,000 word families plus the Academic Word List (~570 families) for ~90% coverage
  • Newspapers: first 5,000–6,000 word families for ~95%
  • Native-speaker novels: approximately 8,000–9,000 word families for 98% coverage
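
One way to see how a coverage target turns into a vocabulary size is to accumulate token counts down a frequency-ranked list until the target share of running words is reached. The sketch below does this for a small invented frequency profile; the function name and the counts are hypothetical and are not taken from the BNC/COCA lists or any real corpus.

```python
from itertools import accumulate

def families_needed(ranked_counts, target):
    """Return how many top-ranked word families are needed to reach `target` coverage.

    `ranked_counts` holds token counts per word family, most frequent first.
    Returns None if the list is empty or the target cannot be reached.
    """
    total = sum(ranked_counts)
    if total == 0:
        return None
    for n, running_total in enumerate(accumulate(ranked_counts), start=1):
        if running_total / total >= target:
            return n
    return None

# Invented frequency profile: 12 word families, 1,000 running words in total.
counts = [500, 200, 100, 80, 50, 30, 15, 10, 8, 4, 2, 1]
print(families_needed(counts, 0.95))  # 6 families
print(families_needed(counts, 0.98))  # 8 families
```

Even in this toy profile, the last few percentage points of coverage require disproportionately more word families, which mirrors the jump from a few thousand families at 95% to 8,000–9,000 at 98% for novels.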

For Japanese:

  • The 2,136 jōyō kanji are the baseline for educated literacy, although this is a count of characters rather than of vocabulary items
  • Vocabulary coverage for NHK-style news: approximately 6,000–8,000 word families
  • Manga/casual speech: the 3,000–5,000 highest-frequency items cover much of the running text, but the density of unfamiliar vocabulary in native material remains high

Implications for L2 Learners

The lexical coverage framework has direct practical implications:

  1. Graded readers are engineered to provide 98%+ coverage at a specific vocabulary level
  2. Input-based methods (Krashen, immersion) depend on learners having access to material within their vocabulary range
  3. The “i+1” principle requires that most input be comprehensible — lexical coverage must be high enough for meaning to be accessible
  4. Extensive reading research shows accelerated vocabulary growth occurs at coverage levels where learners can tolerate and infer from unknown words (~98%)

Incidental Vocabulary Learning

At high coverage levels (98%), learners can use context to infer the meaning of unknown words. Estimates suggest that ~10–20 repetitions in varied contexts are needed for incidental acquisition of a new word. Because high-frequency words (those appearing in many texts) reach that number of encounters soonest, they tend to be acquired first through extensive reading.
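
The link between word frequency and repetition can be made concrete with a back-of-the-envelope calculation: how much text must be read before a word has been met the required number of times? The sketch below assumes a word occurs at a constant rate per million running words, which is an idealization, and the example frequencies are hypothetical.

```python
def tokens_to_read(freq_per_million, repetitions=10):
    """Expected running words to read before meeting a word `repetitions` times.

    Assumes the word occurs at a constant rate of `freq_per_million`
    occurrences per million running words (an idealization).
    """
    if freq_per_million <= 0:
        raise ValueError("freq_per_million must be positive")
    return repetitions * 1_000_000 / freq_per_million

# A fairly common word (hypothetical: 100 occurrences per million), 10 encounters.
print(f"{tokens_to_read(100, 10):,.0f} running words")  # 100,000
# A rarer word (hypothetical: 5 occurrences per million), 20 encounters.
print(f"{tokens_to_read(5, 20):,.0f} running words")    # 4,000,000
```

On these assumptions a rare word needs dozens of times more reading than a common one to reach the same number of encounters, which is why low-frequency vocabulary accumulates slowly through reading alone.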


Common Misconceptions

“If I understand the gist of a text, I have sufficient coverage.” Gist understanding can occur at lower coverage levels through schema knowledge and prediction, but this is not the same as accurate, word-by-word reading comprehension. Low-coverage gist reading does not lead to efficient vocabulary growth because the unknown words do not receive sufficient contextual support for inference.


See Also