Multiword Unit

Definition:

A multiword unit (MWU) is a sequence of two or more words that conventionally behaves as a single unit of meaning, grammar, or pragmatic function. Multiword units include idioms (kick the bucket), collocations (make a decision), phrasal verbs (look up), fixed expressions (by the way, how are you?), sentence frames (the reason is that…), and lexical bundles (as a result of, in terms of). Also called formulaic sequences, chunks, or multi-word expressions, MWUs are a fundamental feature of natural, fluent language: a large proportion of everyday speech consists of conventional sequences retrieved from memory rather than novel word-by-word construction. Acquisition of MWUs is increasingly recognized as a central component of communicative competence.


Why Multiword Units Matter for Fluency

The fluency argument: Fluent language production relies heavily on retrieving stored chunks rather than on real-time rule-based construction. When a native speaker says “by the way…” or “make a decision,” they retrieve the sequence as a unit, not word by word. L2 learners who rely only on rule application, constructing every sentence from scratch, tend to produce output that is grammatically correct but slow and unnatural.

Wray (2002) argues that a large portion of native speaker language is formulaic — pre-stored and retrieved holistically:

> “A considerable proportion of the language we produce may consist of ready-made units of varying length.”

Types of Multiword Units

Idioms:

Meaning is non-compositional (cannot be predicted from individual word meanings).

  • kick the bucket (die)
  • bite the bullet (endure pain stoically)
  • once in a blue moon (rarely)

Collocations:

Words that co-occur more often than chance would predict while retaining compositional meaning.

  • make a mistake (not do a mistake)
  • heavy rain (not strong rain)
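
The statistical side of collocation is usually operationalized with an association measure. The following is a minimal sketch, assuming pointwise mutual information (PMI) over adjacent word pairs as the measure. The function name `bigram_pmi`, the whitespace tokenization, and the toy corpus are illustrative assumptions, not a method taken from the sources cited in this entry.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in tokens.

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated from raw counts in the given token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (invented for illustration only).
corpus = (
    "heavy rain fell today and heavy rain fell again "
    "while strong wind blew and strong tea cooled"
).split()
scores = bigram_pmi(corpus)

# "heavy rain" recurs as a pair, so it scores higher than the
# incidental sequence "and heavy".
print(scores[("heavy", "rain")] > scores[("and", "heavy")])
```

PMI is known to inflate the scores of rare pairs, so it is usually combined with a minimum frequency cut-off or replaced by measures such as the t-score or log-likelihood.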

Phrasal verbs:

Verb + particle combinations with idiomatic or extended meaning.

  • look up (search for; improve)
  • give up (quit)
  • turn down (reject; lower volume)

Fixed expressions and social formulae:

Pragmatically fixed phrases used in specific social contexts.

  • How do you do? / Nice to meet you.
  • I’m sorry to hear that.
  • Thanks for having me.

Lexical bundles:

Frequently recurring multi-word sequences in corpora (Biber et al.).

  • as a result of, at the same time, on the other hand
  • Research shows academic writing relies on a limited set of recurring lexical bundles
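
Identifying bundles by recurrence can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and two hypothetical cut-offs: `min_freq` (total occurrences) and `min_texts` (number of different texts). The function name `lexical_bundles` and the sample texts are invented for the example. Corpus studies work with far larger corpora and frequency thresholds normalized per million words.

```python
from collections import Counter


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-grams that occur at least min_freq times in total
    and appear in at least min_texts different texts."""
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts containing each n-gram
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(ngrams)
        spread.update(set(ngrams))  # count each text once per n-gram
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and spread[gram] >= min_texts
    }


# Sample texts (invented for illustration only).
texts = [
    "prices rose as a result of the drought",
    "sales fell as a result of the strike",
    "the delay was not as a result of weather",
]
bundles = lexical_bundles(texts, n=4)
print(bundles)
```

Requiring a sequence to appear in several texts, not just several times, keeps one author's repeated phrasing from being counted as a bundle.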

Multiword Units in SLA

Acquisition path: Children and early L2 learners often first acquire formulaic chunks holistically before analyzing them grammatically. “What’s this?” is learned as a chunk before the learner understands the grammar of what + is + this.

The analysis-synthesis problem: At some point, learners must analyze chunks into components to generate novel sentences. Over-reliance on unanalyzed chunks limits productive competence; under-reliance produces grammatical but unnatural output.

Processing: MWUs are processed faster than novel sequences of equal length. Because they impose a lower cognitive load, they contribute to fluency.

Multiword Units in Japanese

Japanese MWUs include:

  • Compound verbs: 〜始める (~hajimeru, “start doing”), 〜続ける (~tsuzukeru, “continue doing”), 〜終わる (~owaru, “finish doing”) — productive patterns that function as multiword constructions
  • Verb + て-form chains: Complex event sequences expressed as formulaic chains
  • Set expressions (決まり文句): 暑いですね (“It’s hot, isn’t it?”), よろしくお願いします (conventionalized social ritual)
  • Grammar patterns: Learners of Japanese study patterns like Nにとって (“for N / from N’s perspective”) and 〜に違いない (“must be ~”), which function as formulaic units
  • Keigo formulae: Highly formulaic sequences in polite/formal contexts, often learned as unanalyzed chunks

History

The study of multiword units gained prominence through corpus linguistics in the 1980s and 1990s, when large-scale computer analyses revealed that a substantial portion of natural language consists of semi-fixed word combinations rather than novel constructions. Pawley and Syder (1983) identified the “puzzle of nativelike selection”: native speakers routinely choose specific multiword combinations over equally grammatical alternatives, suggesting that phrasal competence is central to fluency. Sinclair (1991) articulated the corpus findings through the “idiom principle,” the observation that speakers have available a large number of semi-preconstructed phrases. The lexical approach (Lewis, 1993) translated these findings into pedagogical proposals for teaching vocabulary as chunks, and Wray (2002) provided a comprehensive review of formulaic language.


Common Misconceptions

“Multiword units are just idioms.”

Idioms are one subcategory, but multiword units also include collocations (heavy rain, make a decision), phrasal verbs (give up, look into), fixed expressions (by the way, on the other hand), and semi-fixed frames (the ___ is that…). Most multiword units are compositional (their meaning is predictable from parts), unlike idioms.

“You can always generate multiword combinations from vocabulary + grammar knowledge.”

This assumption is the “open choice principle” that Sinclair contrasted with the idiom principle. In practice, native speakers select specific word combinations (make a decision, not do a decision; strong coffee, not powerful coffee) that cannot be predicted from grammar or word meaning alone.

“Multiword units don’t need explicit study — they’ll come naturally.”

Research shows that L2 learners underuse and misuse multiword units even at advanced levels. Formulaic competence requires explicit attention because the specific combinations chosen by native speakers are arbitrary from the learner’s perspective.

“Learning individual words is more efficient than learning multiword units.”

Multiword units provide more communicative power per learning unit: “by the way” learned as a chunk is more useful than learning “by,” “the,” and “way” separately. Phrases function as single processing units, reducing working memory load in real-time communication.


Criticisms

Multiword unit research has been criticized for classification inconsistency — the terminology (formulaic sequence, lexical bundle, chunk, collocation, multiword expression, phraseme) is fragmented across subfields, making it unclear whether researchers are studying the same or different phenomena. The boundaries between multiword units and free combinations are gradient rather than categorical, complicating both research and instruction.

Pedagogically, the challenge is scalability: native speakers control thousands of multiword units, but identifying which to teach and in what order requires frequency data that is not available for all languages and registers. Additionally, explicit teaching of multiword units shows inconsistent retention gains — learners can recognize taught chunks but may not produce them spontaneously in conversation, suggesting that input exposure is needed alongside explicit instruction.


Social Media Sentiment

Multiword units are discussed in language learning communities through practical terms: “phrases,” “expressions,” “set phrases,” and “chunks.” The advice to “learn phrases, not just individual words” appears constantly in r/languagelearning and is among the most common study tips shared in the community.

Japanese learning communities discuss specific multiword patterns: ~ことがある, ~ようにする, ~というのは — grammatical chunks that function as multiword units. The community generally supports phrase-based learning, particularly at intermediate levels where individual word knowledge is no longer the primary bottleneck.


Practical Application

  1. Learn words in their common combinations: When acquiring a new word, note which other words it typically appears with. Don’t just learn “make”; learn “make a decision,” “make progress,” and “make sense.”
  2. Use sentence mining: Mining full sentences captures the multiword units that isolated vocabulary cards miss.
  3. Identify formulaic frames: Patterns like “the thing is…,” “it depends on…,” or Japanese ~ということは can be learned as productive frames with variable slots.
  4. Read extensively: Repeated encounters with multiword units in natural context build the implicit knowledge that makes them available for production, and extensive reading provides the exposure volume needed.
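
One way to combine sentence mining with a list of target units is to keep only the sentences that contain a unit still to be learned. The sketch below assumes case-insensitive, whole-word matching of the exact surface form, so it misses inflected or interrupted variants such as “made a quick decision.” The function name `mine_sentences` and the sample data are hypothetical.

```python
import re


def mine_sentences(sentences, target_units):
    """Map each target unit to the sentences that contain it verbatim
    (case-insensitive, matched at word boundaries)."""
    mined = {unit: [] for unit in target_units}
    for sentence in sentences:
        for unit in target_units:
            pattern = r"\b" + re.escape(unit) + r"\b"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                mined[unit].append(sentence)
    return mined


# Sample data (invented for illustration only).
sentences = [
    "We need to make a decision soon.",
    "By the way, did it make sense?",
    "She made up her mind.",
]
targets = ["make a decision", "make sense", "by the way"]
mined = mine_sentences(sentences, targets)

for unit, hits in mined.items():
    print(unit, "->", hits)
```

A sentence containing two target units is listed under both, which is usually what a learner wants when building review cards.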

Research

Wray (2002) provided the comprehensive review establishing formulaic language as a central phenomenon in both L1 and L2 processing. Pawley and Syder (1983) identified the “puzzle of nativelike selection” — why native speakers prefer specific word combinations — as a fundamental challenge for SLA.

Conklin and Schmitt (2012) demonstrated processing advantages for multiword units: both native speakers and L2 learners read formulaic sequences faster than matched novel combinations, suggesting that multiword units are stored and processed as holistic units. For pedagogical application, Boers and Lindstromberg (2012) found that explicit instruction on formulaic sequences improved both recognition and production, particularly when accompanied by form-focused activities that highlighted the specific word combinations.