Definition:
Lexical density is the proportion of content (lexical) words (nouns, main verbs, adjectives, and content adverbs) among all the words in a text. It is used as an indicator of information density and stylistic complexity, and of register differences between spoken and written language. Ure (1971) first investigated it systematically, and Halliday (1985) later theorized it as a key differentiator between the two modes, with written language characteristically showing higher lexical density than spoken language.
Calculating Lexical Density
Formula: Lexical Density = (Number of content words / Total number of words) × 100
Every occurrence of a content word (noun, main verb, adjective, content adverb) is counted, so the count is over tokens, not types. Counting distinct types instead is the basis of the type-token ratio, a related but distinct measure.
Example:
> “The exhausted student carefully revised the complex essay before submission.”
> Content words: exhausted, student, carefully, revised, complex, essay, submission = 7
> Total words: 10
> Lexical density = 7 / 10 × 100 = 70%
A density this high is typical of written language. Spoken language typically has a lexical density of 30–50%, and written language 50–70% or more, with academic writing at the high end.
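The formula can be tried out on the example sentence with a short script. This is a minimal sketch, not a standard implementation: the `FUNCTION_WORDS` set and the `tokenize` and `lexical_density` helpers are hypothetical names introduced here, and the word list is deliberately tiny. A real analysis would normally identify content words with a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption for this
# sketch); anything not listed here is treated as a content word.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "before", "after", "is", "are", "was", "were",
    "be", "been", "has", "have", "had", "do", "does", "did", "will", "would",
    "can", "could", "i", "you", "he", "she", "it", "we", "they", "this", "that",
    "not",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_density(text):
    """Return content-word tokens as a percentage of all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return 100 * len(content) / len(tokens)


sentence = ("The exhausted student carefully revised the complex essay "
            "before submission.")
# 7 content-word tokens out of 10 tokens in total
print(lexical_density(sentence))  # 70.0
```

Because the result depends entirely on which words are treated as function words, figures from different tools are only comparable when they share the same classification.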
Halliday on Spoken vs. Written Language
Halliday (1985) argued that spoken and written language are not simply different realizations of the same grammar — they are fundamentally different modes:
- Written language: Lexically dense, grammatically simple (long nominal groups, few clauses)
- Spoken language: Grammatically intricate (many clauses), lexically sparse (more function words, deixis, pronouns)
This is the “lexical density” vs. “grammatical intricacy” distinction.
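The contrast can be made concrete by counting clauses. The sketch below assumes one common operationalization of grammatical intricacy, the mean number of clauses per sentence, and it expects clauses that have already been segmented by hand. The function name and both sample texts are invented for illustration and are not taken from Halliday.

```python
def grammatical_intricacy(sentences):
    """Mean number of clauses per sentence.

    `sentences` is a list of sentences, each given as a list of clause
    strings that were segmented by hand beforehand.
    """
    if not sentences:
        return 0.0
    return sum(len(clauses) for clauses in sentences) / len(sentences)


# Invented written-style sentence: one clause built around long nominal groups.
written = [["The student's careful revision of the essay improved its clarity."]]

# Invented spoken-style paraphrase: similar content spread over three clauses.
spoken = [[
    "the student revised the essay carefully",
    "and so it was clearer",
    "when she handed it in",
]]

print(grammatical_intricacy(written))  # 1.0
print(grammatical_intricacy(spoken))   # 3.0
```

The written version packs its content into a single clause, while the spoken version distributes it across several clauses linked by function words.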
Lexical Density in SLA
In second language acquisition (SLA) research, lexical density is used as a measure of:
- L2 writing development: higher lexical density is associated with more advanced proficiency
- Register appropriateness: learners must develop sensitivity to appropriate densities for different contexts
- Vocabulary breadth: lexical density reflects access to content word vocabulary
Studies of L2 learners’ writing development track lexical density alongside other measures (syntactic complexity, lexical variety).
Type-Token Ratio (TTR)
Related to lexical density, Type-Token Ratio measures lexical variety:
TTR = (Number of unique word types / Total tokens) × 100
High TTR indicates greater lexical diversity. Because TTR tends to fall as a text gets longer, adjusted measures such as MSTTR (mean segmental TTR) and vocd correct for the effect of text length.
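The TTR formula and a segment-based adjustment can be sketched as follows. The helper names are hypothetical, the default segment size of 50 tokens is an assumption, and the `msttr` function is a simplified reading of mean segmental TTR that averages TTR over consecutive full segments and drops any leftover tokens.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ttr(tokens):
    """Type-token ratio as a percentage."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


def msttr(tokens, segment_size=50):
    """Mean TTR over consecutive full segments of `segment_size` tokens."""
    segments = [
        tokens[i:i + segment_size]
        for i in range(0, len(tokens) - segment_size + 1, segment_size)
    ]
    if not segments:
        # Text shorter than one segment: fall back to plain TTR.
        return ttr(tokens)
    return sum(ttr(s) for s in segments) / len(segments)


tokens = tokenize("The cat saw the dog and the dog saw the cat.")
# 5 types (the, cat, saw, dog, and) over 11 tokens
print(round(ttr(tokens), 1))  # 45.5
```

Averaging over fixed-size segments keeps the figure comparable across texts of different lengths, at the cost of ignoring repetition that spans segment boundaries.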
Lexical Density as a Register Marker
Lexical density is a reliable marker of:
- Academic vs. conversational register
- Written vs. spoken production
- Formality level
Corpus analyses consistently show higher lexical density in academic and scientific prose than in spontaneous spoken dialogue.
History
Ure (1971) first proposed lexical density as a measurable property distinguishing spoken from written language. Halliday (1985) theorized the spoken-written distinction in terms of lexical density vs. grammatical intricacy, making it a central concept in text analysis within systemic functional linguistics (SFL). Computer-assisted corpus analysis has since made large-scale lexical density calculation routine.
Common Misconceptions
- “Higher lexical density always means better writing.” Very high density can impede comprehension; genre-appropriate density is the goal, not maximum density.
- “Lexical density is the same as vocabulary richness.” Lexical density measures the proportion of content words; vocabulary richness (diversity, sophistication) is captured by TTR and other measures.
Criticisms
Lexical density is criticized for oversimplifying text complexity: it does not capture how sophisticated the content words are, how syntactically complex the text is, or how much texts vary within a genre. Indices of lexical sophistication, such as the Lexical Frequency Profile and Academic Word List coverage, give a fuller picture when used alongside it.
Social Media Sentiment
Lexical density rarely appears explicitly in social media discussions but is implicated in debates about academic writing difficulty, readability, and the “plain language” movement. Applied linguistics communities and English for Academic Purposes (EAP) practitioners discuss it in professional contexts.
Last updated: 2025-07
Practical Application
Writing teachers use lexical density as a diagnostic tool. Student academic texts with very low lexical density may rely too heavily on function words and repetition; instruction in nominalization, academic vocabulary, and sentence combining can raise density to a level appropriate for the genre.
Research
Halliday, M. A. K. (1985). Spoken and Written Language. Deakin University Press.
The foundational SFL account contrasting the lexical density of written language with the grammatical intricacy of spoken language — establishing the theoretical framework within which lexical density is most commonly applied.
Ure, J. (1971). Lexical density and register differentiation. In G. Perren & J. L. M. Trim (Eds.), Applications of Linguistics (pp. 443–452). Cambridge University Press.
The original proposal of lexical density as a measurable text property distinguishing registers — foundational for the construct.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge University Press.
The standard comprehensive reference for vocabulary research in SLA — covering word frequency, lexical richness measures, and learning burden in ways directly relevant to lexical density and vocabulary development.