Frequency List

Definition:

A frequency list is a ranked inventory of the words in a language, ordered by how often they occur in a large text corpus. Language learners and teachers use it to decide which vocabulary items to prioritize, on the principle that the most frequent words appear most often in real language use and therefore give the highest comprehension return on study time. The foundational insight behind frequency lists is that lexical frequency is steeply distributed: in English, the most frequent 2,000 word families cover approximately 80–90% of everyday text, the next 8,000 add roughly another 10%, and low-frequency vocabulary accounts for the remaining fraction. Learning the highest-frequency words first therefore produces disproportionate comprehension gains across all skill areas.


How Frequency Lists Work

A frequency list is built from a corpus — a large, representative collection of texts in the target language. A corpus might include:

  • Books, news articles, academic writing, websites (written language)
  • Film and TV subtitles, interviews, conversation transcripts (spoken language)

Words or word families are counted across the corpus, and items are ranked from most common (the, and, is, in…) to least common. The learner can then use this ranking to systematically prioritize high-frequency items in vocabulary study.
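The counting and ranking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any published list was built: the three-sentence `toy_corpus`, the simple regular-expression tokenizer, and the function names are all hypothetical, and a real list would use a far larger corpus and usually group inflected forms into lemmas or word families.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_frequency_list(texts: list[str]) -> list[tuple[str, int]]:
    """Count every token across all texts and rank from most to least common."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-sentence corpus, for illustration only.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog and the cat are in the garden.",
    "A dog is in the house and the cat is on the roof.",
]

frequency_list = build_frequency_list(toy_corpus)
for rank, (word, count) in enumerate(frequency_list[:5], start=1):
    print(rank, word, count)
```

Even in this tiny sample the function word "the" takes rank 1 by a wide margin, which mirrors the pattern seen at the top of real lists.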

Lexical Coverage and Comprehension

Research by Paul Nation and others establishes approximate coverage thresholds:

Vocabulary Size            | Text Coverage    | Comprehension Level
Top 1,000 word families    | ~72–76% of text  | Significant comprehension, but many gaps
Top 2,000 word families    | ~80–86% of text  | Partial comprehension
Top 5,000 word families    | ~88–92% of text  | Functional comprehension
Top 10,000 word families   | ~95–97% of text  | Near-comfortable reading
Top 20,000+ word families  | ~98%+ of text    | Comfortable independent reading

The steep initial curve means that the first 2,000 word families are the highest priority: they account for the large majority of the running words in most texts. Beyond that point, each additional band of vocabulary adds progressively less coverage.
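Text coverage is simply the share of running words in a text that belong to the top N items of the list. The sketch below shows the arithmetic under stated assumptions: the counts in `toy_counts` are invented so that they sum to 100, and `coverage_at` is a hypothetical helper name. Real coverage figures depend on the corpus and on how word families are defined.

```python
from collections import Counter


def coverage_at(counts: Counter, top_n: int) -> float:
    """Share of all running tokens accounted for by the top_n most frequent items."""
    total = sum(counts.values())
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / total


# Hypothetical counts for a tiny corpus of 100 running words.
toy_counts = Counter({"the": 40, "of": 20, "cat": 15, "garden": 10,
                      "roof": 8, "ocelot": 4, "gazebo": 3})

for n in (1, 2, 4, 7):
    print(f"top {n}: {coverage_at(toy_counts, n):.0%}")
```

In this toy data the top four items already cover 85% of the tokens, while the last three add only 15%, which is the same diminishing-returns shape the table describes.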

Major Frequency Lists by Language

English:

  • General Service List (GSL, West 1953): 2,000 most common English words; foundational for EFL vocabulary pedagogy
  • Academic Word List (AWL, Coxhead 2000): 570 word families most common in academic English — high priority for students
  • New General Service List (NGSL, 2013): Updated list of about 2,800 words based on a 273-million-word corpus
  • BNC/COCA frequency lists: Based on the British National Corpus (100M words) and Corpus of Contemporary American English (600M+ words)

Japanese:

  • JLPT vocabulary lists (N5–N1): Japanese Language Proficiency Test vocabulary by level
  • Core 2K/6K/10K decks: Community frequency lists derived from a subtitle corpus; popular in the immersion learning community

Spanish, French, German, etc.:

  • Subtitle-derived frequency lists (based on the OpenSubtitles corpus), widely used in language learning communities

Frequency Lists vs. Topical/Thematic Vocabulary

Some learners organize vocabulary study around topics (food vocabulary, travel vocabulary) rather than frequency. The tradeoff:

  • Topical study: Immediately useful for specific situations; may not include high-frequency general vocabulary
  • Frequency study: Systematically builds the vocabulary with highest text coverage; may include abstract/functional words that are less intuitive to study but more useful overall

Best practice is generally frequency-first study for core vocabulary and topic-based study for domain-specific needs.

Limitations of Frequency Lists

  • Different corpora produce different frequency rankings — spoken language and written language have different profiles
  • Word families group morphologically related words (run, running, runner, ran) in ways that may not reflect how learners encounter them
  • Stop words (the, a, was) top all lists but need little study; cleaned “headword” lists remove function words for pedagogical use
  • A word's frequency in a general corpus matters less than its frequency in the content the learner actually consumes; genre and domain effects can shift priorities
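Two of the limitations above, corpus dependence and the dominance of function words, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `spoken` and `written` counts are invented, the `FUNCTION_WORDS` set is a small hand-picked sample rather than a standard stop-word list, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical token counts; real lists come from much larger corpora.
spoken = Counter({"you": 50, "the": 40, "yeah": 30, "know": 20, "therefore": 1})
written = Counter({"the": 60, "of": 45, "therefore": 12, "you": 10, "know": 5})

# A small hand-picked function-word set, used here only for illustration.
FUNCTION_WORDS = {"the", "of", "you", "a", "was"}


def ranks(counts: Counter) -> dict[str, int]:
    """Map each word to its 1-based rank by descending count."""
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def content_words(counts: Counter) -> list[str]:
    """Ranked words with the function words filtered out."""
    return [word for word, _ in counts.most_common() if word not in FUNCTION_WORDS]


spoken_ranks, written_ranks = ranks(spoken), ranks(written)
for word in ("therefore", "know"):
    print(word, "spoken rank:", spoken_ranks[word], "written rank:", written_ranks[word])

print(content_words(spoken))
```

The same word can sit near the top of one ranking and near the bottom of the other, so a learner aiming at conversation and a learner aiming at academic reading may reasonably work from different lists.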

History

1944 — Thorndike and Lorge, “The Teacher’s Word Book of 30,000 Words.” An early English frequency ranking based on a corpus of written text.

1953 — Michael West, “A General Service List of English Words.” The foundational 2,000-word frequency list for EFL vocabulary pedagogy; shaped decades of vocabulary teaching and textbook design.

1990 — Nation, “Teaching and Learning Vocabulary.” Popularizes frequency-based vocabulary learning for EFL learners; articulates coverage threshold research.

2000 — Coxhead, Academic Word List. Corpus study identifying 570 academic word families as a distinct high-return list for academic English learners.

2013 — New General Service List. A modern update to West’s GSL based on contemporary corpus data.


Common Misconceptions

“The most frequent words are the most important to learn first.” Frequency lists rank by raw text frequency, which is dominated by function words (the, a, is, in) that L2 learners largely acquire incidentally at an early stage. The high-frequency lexical words, the content vocabulary that contributes most directly to communicative competence, sit below the function words in raw frequency rankings. Lists designed for vocabulary teaching address this in different ways: Nation’s BNC/COCA lists are organized into 1,000-word-family frequency bands, and Coxhead’s AWL excludes the 2,000 general-service words that learners are assumed to know already.

“Learning from a frequency list gives you the most important vocabulary.” Frequency is a necessary but not sufficient criterion for vocabulary prioritization. Register coverage (a word very frequent in one register may be irrelevant to a learner’s target domain), learner L1 background (cognates that require no learning time), and word difficulty all affect the actual learning value of a frequency-list item for any given learner.


Criticisms

Frequency lists have been criticized for corpus selection bias: the frequency profile depends entirely on the corpus from which the list was calculated. Corpora that overrepresent written text, formal registers, or specific domains produce lists that may not reflect the vocabulary most needed by learners targeting conversational or informal language. The BNC- and COCA-based lists, while large and well-balanced, still reflect English usage patterns that may not generalize to specific learner needs. There is also a persistent gap between corpus-derived frequency rankings and learner intuitions about vocabulary importance, which creates selection conflicts in curriculum design.


Social Media Sentiment

Frequency lists are a practical staple in language learning communities. Learners regularly seek frequency-ordered vocabulary lists (especially for Japanese, Chinese, Korean, and Spanish) to guide their early study priorities, and the “learn the most common X words first” strategy is widely recommended as an efficient starting point. Community discussions focus on which list to use (for Japanese: the JLPT vocabulary lists vs. anime-corpus frequency lists; for Spanish: the Davies list), how large the “core vocabulary” should be, and when to move beyond frequency lists to domain-specific vocabulary.

Last updated: 2026-04


Practical Application

  1. Learn the top 2,000 frequency words of your target language before anything else. The coverage return is unmatched. Most established courses (A1–B1) cover this range systematically.
  2. Use pre-made frequency decks in your SRS. Frequency-based decks are available for Anki; studying frequency-ranked vocabulary covers the highest-return items first.
  3. Switch to mining after the core frequency range. Once you have the top 2,000 word families, mining vocabulary from the content you consume is more efficient than extending frequency-list study.
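The mining step in the list above can be sketched as follows: take a piece of content, drop the words already known, and rank what remains by how often it appears in that content. This is a minimal sketch under stated assumptions. The `known` set stands in for a learner's core list, the `episode` text is invented, `mine_unknown_words` is a hypothetical name, and inflected forms such as "nap" and "naps" are deliberately left ungrouped.

```python
import re
from collections import Counter


def mine_unknown_words(text: str, known_words: set[str]) -> list[tuple[str, int]]:
    """Return words in `text` that are not yet known, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    unknown = Counter(token for token in tokens if token not in known_words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(unknown.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stand-in for a learner's core frequency list.
known = {"the", "a", "is", "in", "and", "cat", "dog", "on"}

# Invented sample content, for illustration only.
episode = (
    "The ocelot is in the gazebo. "
    "The ocelot and the cat nap on a gazebo bench. "
    "The ocelot naps."
)

mined = mine_unknown_words(episode, known)
print(mined)
```

Ranking the unknown words by their frequency within the content itself reflects the point made under Limitations: what matters most is frequency in the material the learner actually consumes.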

See Also

  • Word Frequency — The underlying linguistic phenomenon that frequency lists measure and encode
  • Vocab Mining — The complementary approach for vocabulary beyond the core frequency list range
  • Vocabulary Learning — Overview of vocabulary acquisition and study methods
  • Sakubo

Research

Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge University Press.

The comprehensive treatment of vocabulary learning research, covering frequency-based vocabulary selection, vocabulary load analysis, and the pedagogical use of frequency lists — the authoritative reference for frequency-list-based vocabulary teaching methodology.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.

The foundational paper presenting the Academic Word List, a corpus-derived high-frequency vocabulary list for academic English — demonstrating how frequency analysis applied to a domain-specific corpus produces a more useful teaching tool than general frequency lists.

West, M. (1953). A General Service List of English Words. Longman.

The pioneering frequency-list project establishing the methodology of corpus-based vocabulary selection for language teaching — the original frequency list that all subsequent L2 vocabulary frequency research has built upon and refined.