Definition:
Formulaic sequences are strings of two or more words that are stored in and retrieved from the mental lexicon as prefabricated units, rather than being constructed afresh each time from individual words and grammatical rules. They include fixed idioms (kick the bucket), conventional phrases (as a result of, on the other hand), collocations (heavy rain), lexical bundles (it is important to note that), and formulaic social routines (how are you doing?).
Also known as: formulaic language, prefabricated language, multi-word units, lexical chunks, chunks, prefabs (informal)
In-Depth Explanation
Formulaic sequences play a central role in both L1 and L2 language use. Research suggests that a large proportion of everyday speech consists of formulaic rather than freshly generated language, although published estimates vary widely (roughly 30–70%) depending on how formulaicity is defined and counted. Wray (2002) defines them as “a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar.”
Why They Matter for Language Learning
For L2 learners, formulaic sequences offer a processing shortcut. Instead of spending cognitive resources assembling I’d be happy to help word by word, a learner who has stored the phrase can produce it fluently without consciously retrieving happy, to, and help and applying the rules that link them. The working memory saved can go to planning content, which is one reason researchers closely link phrasal knowledge to fluency.
Native speakers have stored thousands of formulaic sequences. L2 learners who rely on rules alone often produce grammatical but non-native-sounding output (I have a great pleasure instead of it’s a great pleasure): the grammar licenses many well-formed alternatives, but native speakers select the conventional one, which they have stored as a whole.
Types of Formulaic Sequences
Formulaic language covers a wide and overlapping range of phenomena:
| Type | Example | Notes |
|---|---|---|
| Fixed idiom | spill the beans, kick the bucket | Semantically opaque; meaning ≠ sum of parts |
| Semi-fixed idiom | burn one’s bridges | Core fixed; some variation allowed |
| Collocation | strong coffee, heavy rain | Preferred co-occurrence; not always predictable |
| Lexical bundle | it is important to, in the case of | High-frequency multi-word units in academic/written registers |
| Social formula | how are you?, nice to meet you | Conventional pragmatic routines |
| Sentence stem | what I want to say is, the reason why | Slot-and-filler patterns with substitutable elements |
| Idiom blend (speech error) | it’s not rocket surgery (rocket science + brain surgery) | Errors that fuse two formulas; taken as evidence for formula-level storage |
The boundary between “formulaic” and “generated” is fuzzy. An item like on the other hand may be formulaically stored for proficient users but assembled grammatically by learners.
The Dual-Route Debate
A significant theoretical debate concerns whether formulaic language is processed by a distinct memory system (the memory route) or by the same grammar as rule-generated language (the grammar route). Dual-route (or dual-mechanism) models (Pinker, 1999; Ullman’s Declarative/Procedural model) propose a genuine architectural split: stored forms such as irregular past tenses (went, broke), and by extension formulaic phrases, reside in declarative memory, while rule-based construction relies on procedural memory. Critics working in connectionist and usage-based frameworks argue that the distinction is gradient rather than binary: frequency of use can shift any item toward holistic storage.
History
- 1970s: Pawley & Syder’s ideas circulate before formal publication, tying nativelike selection and nativelike fluency to stored multi-word units; this is among the earliest systematic linguistic treatments of the problem.
- 1983: Pawley & Syder publish “Two Puzzles for Linguistic Theory,” a foundational paper on formulaic sequences. They conclude that fluent native speech requires a memorized phrasal lexicon far beyond what grammar-only models can explain.
- 1990s: Nattinger & DeCarrico (1992) introduce lexical phrases into ESL pedagogy. Lewis (1993) argues for a Lexical Approach centered on teaching chunks rather than treating grammar and vocabulary as separate components.
- 2002: Wray publishes Formulaic Language and the Lexicon, widely regarded as the most comprehensive theoretical treatment to date and the source of the definition quoted above.
- 2000s–present: Corpus linguistics enables systematic, frequency-based identification of lexical bundles (Biber et al., 2004). Second language acquisition research examines formulaic and collocational knowledge as predictors of L2 fluency.
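Frequency-based bundle identification can be sketched in a few lines. The Python below is a minimal illustration, not the procedure of Biber et al. (2004): it counts every contiguous n-word sequence, normalizes its frequency per million words, and keeps only sequences that also occur in a minimum number of different texts, so that one writer’s pet phrase does not qualify. Lexical bundle studies typically combine a frequency cutoff with such a dispersion requirement, but the function names and default thresholds here are hypothetical toy values chosen for a three-sentence corpus.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_bundles(texts: list[str], n: int = 4,
                    min_per_million: float = 10.0,
                    min_texts: int = 2) -> dict[tuple[str, ...], float]:
    """Return n-grams passing both a frequency and a dispersion threshold.

    Frequency is normalized per million words of the whole corpus;
    dispersion is the number of distinct texts an n-gram occurs in.
    Both default thresholds are illustrative toy values (an assumption).
    """
    counts: Counter = Counter()
    texts_containing: defaultdict = defaultdict(set)
    total_words = 0
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        # Slide an n-word window over each text; windows never span two texts.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            texts_containing[gram].add(text_id)
    if total_words == 0:
        return {}
    bundles = {}
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_words
        if per_million >= min_per_million and len(texts_containing[gram]) >= min_texts:
            bundles[gram] = per_million
    return bundles


corpus = [
    "It is important to note that the results vary.",
    "In the case of small samples it is important to note that error grows.",
    "We found nothing unusual in the case of the control group.",
]
for gram in sorted(lexical_bundles(corpus)):
    print(" ".join(gram))
```

Overlapping sequences such as it is important to and is important to note both pass, because each is a fragment of a longer recurrent string; real studies have to decide how to treat such overlaps.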
Practical Application
For learners: Explicitly learning common collocations and conventional phrases, rather than only rules, can speed the development of native-like production. This doesn’t mean memorizing lists; it means noticing co-occurrence patterns while reading and using a spaced repetition system (SRS) to consolidate chunks encountered in natural input.
For Japanese learners specifically: Japanese has abundant formulaic structures at multiple levels, including fixed greeting formulas (yoroshiku onegaishimasu, otsukaresama desu), conventional sentence-final patterns, and pragmatic routines whose meaning and conditions of use are non-compositional. Because these cannot be derived from grammar alone, they are best learned as wholes.
For teachers: The Lexical Approach (Lewis, 1993) recommends structuring lessons around frequent multi-word chunks. Concordance tools run over a reference corpus can show which collocations are most frequent in a given register, and learner corpus data can show which of them learners underuse or misuse.
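As a rough illustration of what a concordancer does, here is a minimal sketch of its two core outputs: keyword-in-context (KWIC) lines and a count of the words that occur near a node word. The helper names and the toy text are hypothetical, and the counts are raw window frequencies; established tools usually also offer association measures such as mutual information to rank collocates, which this sketch does not implement. Each sentence is tokenized separately so that collocate windows never cross a sentence boundary.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def sentences(text: str) -> list[list[str]]:
    """Split on sentence-final punctuation and tokenize each sentence separately."""
    tokenized = (tokenize(s) for s in re.split(r"[.!?]+", text))
    return [toks for toks in tokenized if toks]


def kwic(sents: list[list[str]], node: str, width: int = 3):
    """Yield (left context, node, right context) for every occurrence of node."""
    for toks in sents:
        for i, tok in enumerate(toks):
            if tok == node:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                yield left, tok, right


def collocates(sents: list[list[str]], node: str, span: int = 2) -> Counter:
    """Count words within `span` positions of node (raw counts, no association measure)."""
    counts: Counter = Counter()
    for toks in sents:
        for i, tok in enumerate(toks):
            if tok != node:
                continue
            for j in range(max(0, i - span), min(len(toks), i + span + 1)):
                if j != i:
                    counts[toks[j]] += 1
    return counts


text = ("She drinks strong tea. Heavy rain fell all night. "
        "They made strong tea again after the heavy rain stopped.")
sents = sentences(text)
for left, node, right in kwic(sents, "tea"):
    print(f"{left:>20} [{node}] {right}")
print(collocates(sents, "tea", span=1).most_common())  # [('strong', 2), ('again', 1)]
```

Raw neighbor counts favor very frequent words such as the, which is why practical tools rank collocates by association strength rather than by frequency alone.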
Common Misconceptions
“Formulaic sequences are just idioms.”
Idioms are a subset. The broader category includes collocations, conventional phrases, lexical bundles, and grammatical routines, the vast majority of which are neither semantically opaque nor exotic.
“Learning vocabulary means learning words.”
Vocabulary knowledge includes knowing which words co-occur with which: strong tea, not powerful tea; heavy rain, not strong rain. Collocational knowledge, a form of formulaic knowledge, is a distinct, learnable component of vocabulary.
Criticisms
- Identifying formulaic sequences is methodologically contested. There are no agreed criteria for what counts as formulaic rather than generated, which makes findings difficult to compare across studies.
- The pedagogical Lexical Approach has limited empirical support. Despite its intuitive appeal, controlled studies showing that explicitly teaching chunks improves L2 production over grammar-focused instruction are relatively few.
- Formulaic storage may be more gradient than the theory implies. Connectionist accounts suggest that frequency-driven neural networks can develop chunk-like representations without a separate formula storage system.
Social Media Sentiment
- r/languagelearning: “Learn chunks, not rules” is recurring advice and generally well-received. Collocations and formulaic phrases come up frequently in vocabulary discussions.
- r/LearnJapanese: The concept is implicit in debates about sentence mining vs. grammar study — sentence mining inherently captures formulaic language in context.
- Twitter/X: Work by vocabulary researchers such as Paul Nation and Michael McCarthy is widely cited in ELT discussions of chunk learning, and the concept has broad teacher uptake.
Last updated: 2026-04
See Also
- Sakubo – Study Japanese
Research
- Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
Summary: The most comprehensive theoretical treatment of formulaic sequences in linguistics, providing the widely adopted definition and reviewing evidence for holistic storage from psycholinguistics and language pathology.
- Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and Communication (pp. 191–226). Longman.
Summary: The foundational paper identifying that native-like fluency requires access to a memorized phrasal lexicon far beyond what rule-based grammar models predict.
- Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405. https://doi.org/10.1093/applin/25.3.371
Summary: Corpus-based study identifying frequent multi-word lexical bundles in academic registers, establishing their pedagogical importance for English for Academic Purposes (EAP).
- Lewis, M. (1993). The Lexical Approach: The State of ELT and a Way Forward. Language Teaching Publications.
Summary: The pedagogical manifesto for chunk-based language teaching; influenced curriculum design significantly in the 1990s–2010s despite limited controlled empirical confirmation.