Definition:
A multi-word expression (MWE) is any sequence of two or more words that is stored, processed, and used as a conventional unit. The category includes idioms, phrasal verbs, lexical chunks, collocations, fixed phrases, compound nouns, lexical bundles, and conversational formulae. MWEs are not a single linguistically defined category but rather a broad family of phenomena united by the property of being conventional multi-word pairings of form and meaning. They differ from freely composed phrases in that their components are (to varying degrees) fixed, their meaning is (to varying degrees) non-compositional or conventionalized, and they must be acquired as wholes rather than derived from rules alone. Corpus-based estimates, which vary with how MWEs are defined and counted, suggest that roughly 40–60% of English language production consists of multi-word units, making MWEs arguably more central to language competence than grammatical rules.
Taxonomy of Multi-Word Expressions
Idioms: Fully or largely non-compositional — “kick the bucket,” “red herring,” “bite the bullet.” Meaning cannot be derived from components.
Phrasal verbs: Verb + particle combinations, often idiomatic — “give up,” “look into,” “come across.”
Collocations: Statistically strong but semantically transparent word partnerships — “make a decision,” “blond hair,” “heavy drinker.”
Fixed formulae / social phrases: Conventionalized pragmatic expressions — “How are you?”, “Long time no see,” “Better late than never.”
Compound nouns: “Bus stop,” “air conditioning,” “machine learning” — lexicalized noun-noun or adjective-noun combinations.
Lexical bundles: High-frequency multi-word sequences identified statistically in corpora — “as a result of,” “I would like to,” “in the case of.” Not necessarily idiomatic; just highly co-occurring as a chunk.
Binomials: Fixed paired expressions — “bread and butter,” “black and white,” “trial and error.”
Semi-fixed frames: Partially open templates — “the fact that ___,” “it is important to ___,” “as far as ___ is concerned.”
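Lexical bundles are the one category above that is defined purely by frequency, so a short sketch can show how candidates might be pulled out of a text. The function name lexical_bundles, the threshold min_count, and the sample sentences are illustrative assumptions, not a standard tool or real corpus data. Corpus studies typically use much larger corpora and normalized frequency cutoffs.

```python
from collections import Counter
import re


def lexical_bundles(text, n=3, min_count=2):
    """Return (sequence, count) pairs for n-word sequences seen at least min_count times."""
    # Lowercase and keep only word-like tokens (apostrophes kept for contractions).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens over the text and count each sequence.
    # Note: this simple version lets windows cross sentence boundaries.
    counts = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Invented sample text, for illustration only.
sample = (
    "As a result of the delay, the launch slipped. "
    "Costs rose as a result of the delay. "
    "As a result of the storm, flights were cancelled."
)

for gram, count in lexical_bundles(sample, n=4, min_count=3):
    print(count, gram)
```

On this sample, both "as a result of" and the overlapping fragment "a result of the" pass the threshold. That illustrates why frequency alone yields candidates that still need filtering or human judgment.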
Why MWEs Matter
Frequency: MWEs are not marginal features of language; they constitute the fabric of ordinary communication. A learner who knows 20,000 individual word forms but knows few MWEs will produce grammatically correct but non-nativelike language that is perceived as “foreign.”
Comprehension: In both reading and listening, failure to recognize MWE boundaries leads to compositional misinterpretation: a learner who parses “The project finally took off” word by word reads “took” and “off” literally and misses the idiomatic meaning (“became successful”).
Fluency enablement: Psycholinguistic research (Pawley & Syder 1983; Wray 2002) demonstrates that native-speaker fluency depends in large part on prefabricated multi-word sequences that reduce online processing load. Learners who rely exclusively on word-by-word construction produce slower, more disfluent speech.
Cultural competence: Many MWEs (especially idioms and fixed social expressions) encode cultural reference points, humor, and group identity signals that cannot be acquired without extensive sociocultural exposure.
MWE Acquisition
MWE acquisition is incremental and heavily dependent on input frequency. Learners typically:
- First encounter an MWE as a whole without decomposing it (holistic phase)
- Later analyze its components and internal structure (analytic phase)
- Eventually re-automatize it as a retrieval unit with rapid access (re-chunking phase)
Explicit attention and vocabulary logging accelerate the first (holistic) phase. Extensive reading and listening accelerate all phases.
History
Bolinger (1976): Called for treating the “fixed phrase” as a first-class linguistic object.
Pawley & Syder (1983): “Two Puzzles for Linguistic Theory” — proposed prefabricated chunks as the mechanism for nativelike fluency.
Sinclair (1991): Corpus, Concordance, Collocation — introduced the “idiom principle” (language is largely a series of multi-word choices) alongside the “open choice principle” (grammar rules); corpus-based foundations.
Wray (2002): Formulaic Language and the Lexicon — definitive psycholinguistic treatment of formulaic sequences in L1 and L2 processing.
Baldwin & Kim (2010): “Multiword Expressions” — contemporary computational NLP treatment; MWE identification as a core NLP task.
Practical Application
- Treat MWEs as vocabulary items — each new MWE you encounter is a vocabulary learning opportunity, not a grammar exception. Log it as a whole unit with meaning, example, and register note.
- Build a phrase notebook or digital card deck: dedicated tracking of MWEs is one of the highest-return vocabulary learning practices available to intermediate and advanced learners, for whom the gains from single-word learning diminish.
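For learners who keep a digital notebook, the record described above (whole unit, meaning, example, register note) might be sketched as a small data structure. The names MWEEntry and log_mwe and the default register value are hypothetical choices for illustration, not part of any particular flashcard application.

```python
from dataclasses import dataclass


@dataclass
class MWEEntry:
    """One notebook record: the expression is stored whole, not word by word."""
    expression: str
    meaning: str
    example: str
    register: str = "neutral"  # assumed default; e.g. "informal", "formal"


def log_mwe(notebook, entry):
    """Add an entry keyed by its lowercased expression; return True if it was new."""
    key = entry.expression.lower()
    is_new = key not in notebook
    # setdefault keeps the first logged version if the expression already exists.
    notebook.setdefault(key, entry)
    return is_new


notebook = {}
log_mwe(notebook, MWEEntry(
    expression="give up",
    meaning="stop trying",
    example="She never gave up on the project.",
    register="informal",
))
print(notebook["give up"].meaning)
```

Keying on the whole expression reflects the advice to treat each MWE as a single vocabulary item. A real deck would probably also need review scheduling, which this sketch leaves out.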
Common Misconceptions
“Multiword expressions are just idioms.”
Idioms are one type of multiword expression, but the category also includes collocations (make a decision), phrasal verbs (pick up), fixed phrases (as a matter of fact), binomials (bread and butter), and lexical bundles (on the other hand). Most multiword expressions are at least partly compositional, so fully opaque idioms are only one end of the range.
“Learning words individually is sufficient for fluency.”
Research on formulaic language (Pawley & Syder 1983; Wray 2002) indicates that a large proportion of fluent speech consists of multiword expressions retrieved as wholes rather than assembled word by word. Without that knowledge, a learner's speech tends to be grammatically correct but unnatural.
Criticisms
Multiword expression research has been critiqued for definitional inconsistency — the boundary between “multiword expression,” “collocation,” “chunk,” and “formulaic sequence” is disputed, with different researchers using different criteria. Computational approaches to identifying multiword expressions often produce conflicting results depending on the statistical measure used. The pedagogical implications remain unclear: should multiword expressions be taught explicitly, or are they best acquired through extensive input?
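The point about conflicting statistical measures can be made concrete with two widely used association scores, pointwise mutual information (PMI) and the t-score. The sketch below is a minimal illustration: the corpus size and all counts are invented, and the helper names pmi, t_score, and rank are assumptions rather than any library's API.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information: log2 of observed over expected co-occurrence."""
    expected = x_count * y_count / total
    return math.log2(pair_count / expected)


def t_score(pair_count, x_count, y_count, total):
    """t-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = x_count * y_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


# Invented counts for a hypothetical 1,000,000-token corpus:
# pair -> (pair frequency, first-word frequency, second-word frequency)
TOTAL = 1_000_000
counts = {
    ("of", "the"): (5000, 30000, 60000),
    ("make", "decision"): (120, 4000, 600),
    ("red", "herring"): (8, 900, 10),
}


def rank(measure):
    """Order the word pairs from strongest to weakest under the given measure."""
    return sorted(counts, key=lambda pair: measure(*counts[pair], TOTAL), reverse=True)


print("PMI ranking:    ", rank(pmi))
print("t-score ranking:", rank(t_score))
```

With these toy counts the two measures produce opposite orderings. PMI rewards the rare but exclusive pairing ("red herring"), while the t-score rewards the very frequent one ("of the"). This is one reason extraction results depend on the measure chosen.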
Social Media Sentiment
Multiword expressions are discussed in language learning communities primarily under the label “collocations” — learners recognize that knowing which words “go together” is essential for natural-sounding speech. Advanced learners frequently recommend collocation dictionaries and extensive reading as strategies for acquiring multiword expressions. The concept is central to discussions about what separates intermediate from advanced proficiency.
Last updated: 2026-04
Related Terms
- Idiom
- Phrasal Verb
- Lexical Chunk
- Collocational Competence
- Vocabulary Breadth
- Active Vocabulary
- Fluency vs. Accuracy
See Also
- Lexical Chunk — Overlapping concept; chunks and MWEs are closely related terms from different research traditions
- Idiom — MWE subtype with non-compositional meaning
- Phrasal Verb — MWE subtype; verb-particle units
- Collocational Competence — Competence that includes mastery of collocational MWEs
- Sakubo
Research
1. Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
The foundational work on formulaic language — argues that multiword expressions are stored holistically in the mental lexicon and play a central role in fluent native-speaker language production.
2. Nattinger, J.R., & DeCarrico, J.S. (1992). Lexical Phrases and Language Teaching. Oxford University Press.
Pioneering work applying multiword expression research to language pedagogy — demonstrates that teaching lexical phrases improves fluency and argues for their central role in language instruction.