Type-Token Ratio

Definition:

Type-token ratio (TTR) is a measure of lexical diversity in a spoken or written language sample, calculated by dividing the number of distinct word forms (types) by the total number of words (tokens): TTR = types / tokens. A higher TTR indicates greater lexical variety: the speaker or writer is using more distinct vocabulary rather than repeating the same words. In SLA and applied linguistics, TTR has been used as a measure of L2 vocabulary development, writing quality, and oral fluency. However, TTR has a well-documented sensitivity to text length. It falls systematically as a sample grows, because new word types appear less and less often, and this problem has driven the development of length-corrected alternatives: D (vocd), MTLD, and HD-D.

In-Depth Explanation

Basic mechanics:

A text with 100 words in which 60 are unique types has TTR = 60/100 = 0.60.

A text with 200 words in which 80 are unique has TTR = 80/200 = 0.40.

Even if the second writer has similar or greater vocabulary knowledge, the longer text yields a lower TTR — demonstrating that TTR is not directly comparable across texts of different lengths.
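The computation itself is just a ratio of two counts. The minimal Python sketch below assumes a deliberately naive tokenizer (lowercase everything, keep runs of letters and apostrophes); that tokenizer is an illustrative assumption, not a standard, and real studies must decide explicitly how to treat punctuation, contractions, and capitalization.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive English tokenizer (illustrative assumption): lowercase,
    # then keep runs of letters and apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: number of distinct word forms / number of words."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty sample")
    return len(set(tokens)) / len(tokens)


sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
print(len(tokens), len(set(tokens)), round(ttr(tokens), 2))  # prints 13 8 0.62
```

Here 13 tokens contain 8 types (*the* alone accounts for four tokens), giving TTR = 8/13 ≈ 0.62.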

Why TTR is problematic:

Length dependency: As a text grows, new word types appear at a decreasing rate; beyond a certain point, most additional tokens repeat types that have already been used. TTR therefore falls systematically as text length increases (it can rise briefly when a new type appears, but the overall trend is downward).
Incomparability: Any comparison of TTR across speakers or learners is only valid if their sample sizes are equal — a condition rarely met in natural language data.
Genre sensitivity: Highly repetitive genres (e.g., scripted formulaic speech, procedural texts) repeat tokens heavily by design, and literary texts may use repetition strategically for effect. TTR therefore varies with genre independently of vocabulary knowledge.
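Length dependency is easy to reproduce. The sketch below assumes a synthetic text of 5,000 tokens drawn from a hypothetical 1,000-word vocabulary with Zipf-like frequencies (weight proportional to 1/rank), a rough stand-in for natural word frequencies, and computes TTR on progressively longer prefixes of that single text.

```python
import random


def ttr(tokens):
    return len(set(tokens)) / len(tokens)


# Synthetic "text" (assumption): 5,000 tokens sampled with replacement from a
# 1,000-word vocabulary, where word r has weight 1/r (Zipf-like frequencies).
rng = random.Random(42)
vocab = [f"w{r}" for r in range(1, 1001)]
weights = [1 / r for r in range(1, 1001)]
text = rng.choices(vocab, weights=weights, k=5000)

# TTR of successively longer prefixes of the same text.
for n in (100, 500, 1000, 5000):
    print(n, round(ttr(text[:n]), 3))
```

The underlying "vocabulary" is identical at every length, yet TTR drops steadily; with at most 1,000 possible types, the full 5,000-token text cannot exceed a TTR of 0.2.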

Corrected measures:

Three main alternatives have been developed:

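D-value (vocd; Malvern & Richards 2002): D is estimated by drawing many random subsamples of the text at a range of sizes (typically 35 to 50 tokens), computing the mean TTR at each size, and finding the value of D whose theoretical TTR curve best fits those means. D is theoretically independent of text length and provides a single numeric value (higher D = greater diversity). The vocd program, distributed with the CLAN tools of the CHILDES project, implements this procedure.
MTLD (Measure of Textual Lexical Diversity; McCarthy & Jarvis 2010): MTLD calculates the mean length of sequential word strings within which TTR stays above a threshold (0.72); each string ends, and a new one begins, when its running TTR falls to 0.72. The result is averaged across a forward and a backward pass through the text. It is considerably less sensitive to text length than raw TTR.
HD-D (Hypergeometric Distribution D; McCarthy & Jarvis 2007): Based on the hypergeometric distribution; for each word type, it calculates the probability of drawing that type at least once in a random sample of fixed size (commonly 42 tokens) and combines these probabilities into a single diversity score. It correlates strongly with D.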
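The three corrected measures can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a reference implementation. MTLD uses the 0.72 threshold described above. HD-D assumes the commonly used sample size of 42 tokens and reports the expected TTR of such a sample (per-type probabilities summed, then divided by 42). The vocd-D function is a single-run approximation: it averages TTRs over 100 random subsamples at each size from 35 to 50 tokens and picks D by least-squares grid search over an assumed range of 0.1 to 300, using the curve TTR = (D/N)((1 + 2N/D)^(1/2) − 1). Published tools differ in details, so values will not match them exactly.

```python
import random
from collections import Counter
from math import comb, sqrt


def _mtld_pass(tokens, threshold=0.72):
    """One directional pass: count 'factors' (segments whose running TTR has
    fallen to the threshold), plus a partial factor for the leftover segment."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:
            factors += 1
            types, count = set(), 0  # start a new segment
    if count > 0:
        # Partial factor: how far the leftover TTR has fallen from 1.0
        # toward the threshold.
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors > 0 else float("inf")


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


def hdd(tokens, sample_size=42):
    """Expected TTR of a random sample of `sample_size` tokens drawn without
    replacement, from hypergeometric probabilities."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total = comb(n, sample_size)
    expected_types = 0.0
    for freq in Counter(tokens).values():
        p_absent = comb(n - freq, sample_size) / total  # P(type drawn 0 times)
        expected_types += 1 - p_absent
    return expected_types / sample_size


def _vocd_model(n, d):
    # Theoretical curve: TTR = (D/N) * (sqrt(1 + 2N/D) - 1)
    return (d / n) * (sqrt(1 + 2 * n / d) - 1)


def vocd_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Simplified D: fit the curve to mean subsample TTRs by grid search."""
    if len(tokens) < max(sizes):
        raise ValueError("text is shorter than the largest subsample")
    rng = random.Random(seed)
    observed = {}
    for n in sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n for _ in range(trials)]
        observed[n] = sum(ttrs) / trials

    def sse(d):
        return sum((_vocd_model(n, d) - y) ** 2 for n, y in observed.items())

    grid = [i / 10 for i in range(1, 3001)]  # D from 0.1 to 300.0 (assumed range)
    return min(grid, key=sse)


repetitive = [f"w{i % 10}" for i in range(200)]  # 10 words cycled 20 times
diverse = [f"w{i}" for i in range(200)]          # 200 distinct words
print(round(mtld(repetitive), 2), round(hdd(repetitive), 3), vocd_d(repetitive))
```

For the repetitive toy text, each MTLD factor closes after 14 tokens (10 distinct words followed by 4 repeats, at which point TTR = 10/14 ≈ 0.714), so MTLD = 200/14 ≈ 14.3 in both directions. A text with no repetition at all never completes a factor, which this sketch reports as infinity.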

TTR in SLA research:

Despite its limitations, TTR is still widely used in SLA research as:

Written output measure: Higher TTR in L2 compositions has been found to correlate with teacher proficiency ratings (Arnaud 1992), and lexical diversity measures are used in automated essay scoring systems.
Oral production measure: TTR in oral narratives has been used to track vocabulary development in longitudinal L2 studies (Ellis & Barkhuizen 2005).
Fluency and planning research: Planned vs. unplanned oral speech differs in TTR — planned speech tends to show higher lexical diversity.
Proficiency levels: Nation (2001) and others include lexical coverage and diversity measures (among them TTR-family measures) in vocabulary assessment frameworks.

TTR in Japanese:

TTR application in Japanese requires careful consideration of:

Morphological complexity: Japanese can be analyzed at the word level or morpheme level. Agglutinative Japanese verbal morphology means that 食べる, 食べた, 食べたい, 食べられる, 食べてもらう are morphological variants of the same base — whether these count as one type (lemmatized) or five types (unlemmatized) affects TTR substantially.
Script mixing: Japanese uses multiple scripts, so the same word may appear in kanji in one context and in hiragana or katakana in another; analysts must decide whether such script variants count as one type or several.
Particle repetition: Japanese uses grammatical particles extremely frequently (は, が, を, に, で appear in virtually every sentence) — their high token frequency systematically depresses TTR.
JLPT vocabulary assessment: JLPT vocabulary lists are sometimes analyzed via coverage and diversity measures related to TTR, but these face the same length-dependency issues as raw TTR.
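The effect of the counting unit can be shown with a hand-annotated toy sample. The tokens and their lemma and part-of-speech labels below are hypothetical, written to resemble what a morphological analyzer might produce once Japanese text has been segmented into words. In particular, mapping the hiragana spelling たべる to the lemma 食べる assumes an analyzer that normalizes script variants.

```python
# Hypothetical, hand-annotated tokens: (surface form, lemma, part of speech).
# A real pipeline would obtain these from a morphological analyzer.
TOKENS = [
    ("私", "私", "pronoun"), ("は", "は", "particle"),
    ("昨日", "昨日", "noun"), ("ケーキ", "ケーキ", "noun"),
    ("を", "を", "particle"), ("食べた", "食べる", "verb"),
    ("妹", "妹", "noun"), ("は", "は", "particle"),
    ("パン", "パン", "noun"), ("を", "を", "particle"),
    ("食べたい", "食べる", "verb"),
    ("弟", "弟", "noun"), ("は", "は", "particle"),
    ("ごはん", "ごはん", "noun"), ("を", "を", "particle"),
    ("たべる", "食べる", "verb"),  # hiragana spelling, assumed normalized to 食べる
]


def ttr(units):
    return len(set(units)) / len(units)


surface = [s for s, _, _ in TOKENS]                              # unlemmatized forms
lemmas = [lem for _, lem, _ in TOKENS]                           # lemmatized
content = [lem for _, lem, pos in TOKENS if pos != "particle"]   # lemmas, no particles

print(ttr(surface), ttr(lemmas), ttr(content))  # prints 0.75 0.625 0.8
```

Lemmatizing merges 食べた, 食べたい, and たべる into one type and lowers TTR from 0.75 to 0.625, while dropping the six particle tokens (は ×3, を ×3) raises it to 0.8. The same 16-token sample thus yields three different TTRs depending only on the counting decisions.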

Relationship to vocabulary breadth and depth:

TTR and related diversity measures assess productive vocabulary breadth: the range of vocabulary a learner activates in production. They do not directly measure:

Vocabulary depth (whether the learner knows multiple meanings, collocations, and contexts of use for the words produced).
Vocabulary size (passive recognition vocabulary may be much larger than active production vocabulary, independent of TTR).
Accuracy or appropriateness of vocabulary use.

Nation (2001) and Read (2000) treat TTR and diversity measures as one component in a multi-faceted vocabulary assessment framework.

History

Early 20th century: TTR used in stylometric studies of literary authorship.
1950s–1960s: Carroll applied TTR to language development in psychology.
Mid-1980s–1990s: Wide adoption in SLA writing research; limitations recognized.
2002: Malvern & Richards introduce D-value; vocd software released.
2007–2010: McCarthy & Jarvis develop HD-D and MTLD as alternatives.
2010s–present: Automated essay scoring increasingly uses diversity measures; MTLD and D replace raw TTR in best-practice research.

Common Misconceptions

“Higher TTR = better vocabulary.” TTR is confounded with text length, so comparing TTRs across texts of different lengths is not valid. A learner with a short but high-TTR sample may not have greater vocabulary than a learner with a long, lower-TTR sample.

“TTR measures all aspects of vocabulary knowledge.” TTR measures productive diversity in a given sample — it does not measure depth, accuracy, collocational knowledge, pragmatic appropriateness, or passive vocabulary size.

“Corrected measures solve all problems.” D, MTLD, and HD-D reduce length-dependency substantially but do not solve confounds with genre, topic, or register. All lexical diversity measures require thoughtful interpretation in context.

Criticisms

All TTR-family measures depend on the particular word-type unit chosen (word form, lemma, word family), which affects results substantially and is rarely standardized across studies.
Validity of diversity measures as assessments of vocabulary knowledge (rather than vocabulary use in a particular output context) is contested.
Lexical diversity is one dimension of vocabulary; its relationship to communicative vocabulary competence is indirect.

Social Media Sentiment

Automated writing feedback tools (Turnitin, Grammarly features, some L2 writing apps) include lexical diversity as a metric — making TTR-family measures visible to learners even without their knowing the theoretical construct. Language learners sometimes discuss “word variety” in output as an experience they notice: “I keep using the same 100 words in Japanese.” This intuition corresponds to low lexical diversity in production, and deliberate vocabulary expansion through reading and vocabulary study directly targets it.

Last updated: 2026-04

Practical Application

Check your own diversity: Run a few hundred words of your written Japanese through a TTR calculator and compare time points using samples of the same length (the same number of words each time), because raw TTR is only comparable across equal-length samples.
Use D-value or MTLD: For more rigorous self-assessment, use tools that implement vocd (such as the vocd command in CLAN) or MTLD rather than raw TTR.
Target productive vocabulary gaps: If lexical diversity is low, identify which content words you repeat most often in your output and systematically expand synonyms, near-synonyms, and contextually varied alternatives.
Distinguish passive and active vocabulary: A broad recognition vocabulary in reading combined with low lexical diversity in writing is a common L2 asymmetry; deliberate production practice (writing, speaking) is the bridge from passive knowledge to active diversity.
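The equal-length comparison and the search for over-used words can be automated in a few lines. This sketch assumes your text has already been segmented into words (written Japanese has no spaces, so a morphological analyzer or manual segmentation is needed first); the two learner samples and the list of excluded particles are hypothetical. Truncating every sample to its first `n_tokens` tokens is one simple way to equalize length, not the only one.

```python
from collections import Counter


def equal_length_ttrs(samples, n_tokens):
    """TTR of the first n_tokens of each pre-segmented sample, so that every
    TTR is computed over the same number of tokens."""
    results = {}
    for label, tokens in samples.items():
        if len(tokens) < n_tokens:
            raise ValueError(f"{label!r} has only {len(tokens)} tokens")
        window = tokens[:n_tokens]
        results[label] = len(set(window)) / n_tokens
    return results


def most_repeated(tokens, top=5, exclude=frozenset()):
    """Most frequent tokens, skipping any listed in `exclude` (e.g. particles)."""
    return Counter(t for t in tokens if t not in exclude).most_common(top)


# Hypothetical learner samples, already segmented (space-separated here).
samples = {
    "month 1": "私 は 本 を 読む 私 は 本 が 好き だ 私 は 本 を 買う".split(),
    "month 6": "週末 は 図書館 で 小説 を 借り て 電車 で 短編 を 読ん だ".split(),
}
print(equal_length_ttrs(samples, n_tokens=12))
print(most_repeated(samples["month 1"], top=2, exclude={"は", "を", "が", "だ"}))
# second line prints [('私', 3), ('本', 3)]
```

Both samples are cut to 12 tokens before comparison, so the rise from 8/12 to 10/12 is not an artifact of the month-6 sample being a different length; the repeat list then points to 私 and 本 as candidates for varied alternatives.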

Research

Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Palgrave Macmillan. [Summary: Foundational monograph for D-value and vocd methodology; demonstrates length-independence of D; provides theoretical and empirical basis for TTR alternatives; comprehensive reference for lexical diversity measurement.]

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. [Summary: Comparative validation of MTLD, vocd-D, and HD-D; MTLD performs well across text length conditions; recommended as current best practice; key reference for choosing between diversity measures.]

Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge University Press. [Summary: Comprehensive vocabulary framework including lexical diversity; TTR and coverage measures in context of vocabulary assessment; places diversity within broader vocabulary learning theory; essential reference.]

Daller, H., Milton, J., & Treffers-Daller, J. (Eds.). (2007). Modelling and Assessing Vocabulary Knowledge. Cambridge University Press. [Summary: Edited volume on vocabulary assessment approaches; includes TTR, D, and other diversity measures; discusses limitations and comparisons; review of research on vocabulary assessment in L2 contexts.]

Read, J. (2000). Assessing Vocabulary. Cambridge University Press. [Summary: Comprehensive vocabulary assessment reference; covers TTR and diversity measures in assessment context; distinguishes vocabulary size, depth, and accessibility; places lexical diversity in broader assessment framework.]