A chunk is a multi-word sequence that is stored and processed as a single holistic unit in memory, rather than being assembled word-by-word from grammatical rules. Examples range from fixed expressions (by the way, at the end of the day, nice to meet you) to partially fixed frames (can you ___?, it’s a matter of ___) to common collocations (make a decision, take a shower). Chunks form the backbone of formulaic language and are fundamental to fluency in both first and second language use.
In-Depth Explanation
What makes something a chunk?
Chunks are characterized by three properties:
- Frequency: They occur together in language far more often than chance would predict
- Holistic storage: They are stored and retrieved as single units, bypassing compositional processing
- Functional coherence: They perform recognizable communicative functions in context
Not all chunks are idioms, that is, expressions whose meaning cannot be derived from their parts. Many chunks are semantically transparent but still stored as units: of course, I don’t know, and have a look are fully interpretable but habitually chunked by native speakers.
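The frequency property above ("far more often than chance would predict") is commonly operationalized in corpus work with an association measure. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs, which is one such measure among several. The function name `bigram_pmi` and the toy corpus are invented for illustration, and the code ignores sentence boundaries for simplicity.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for each adjacent word pair in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (w1, w2), pair_count in bigrams.items():
        p_pair = pair_count / n_pairs      # observed probability of the pair
        p_w1 = unigrams[w1] / n_tokens     # probability of each word alone
        p_w2 = unigrams[w2] / n_tokens
        # PMI > 0 means the pair co-occurs more often than independence predicts
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus, invented for illustration only
corpus = (
    "of course the plan changed . "
    "of course the team agreed . "
    "the course of the river changed ."
).split()

scores = bigram_pmi(corpus)
print(scores[("of", "course")] > scores[("the", "course")])
```

In this toy corpus "of course" scores higher than "the course" because "the" also appears in many other contexts. PMI is known to overrate pairs of rare words, so real studies typically combine it with a minimum frequency threshold or use a different measure.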
Types of formulaic sequences
Research (Wray 2002; Erman & Warren 2000) identifies several subtypes:
| Type | Example | Notes |
|---|---|---|
| Fixed idiom | kick the bucket | Opaque meaning, fixed form |
| Proverb | time flies | Fixed, figurative |
| Collocation | make a decision (not do a decision) | Preferred co-occurrence |
| Frame with slot | the ___ of the matter is | Partially fixed, slot-fillable |
| Discourse marker | having said that, by the way | Functional, conversational |
| Greeting/routine | how are you? / fine, thanks | Social ritual sequence |
Chunks and fluency
Native speakers retrieve chunks as single units without compositional analysis. This is the standard explanation for how fluent speech can proceed at speeds that would be impossible if every word were assembled rule-by-rule. For L2 learners, a large chunk inventory provides pre-packaged utterances that reduce online processing load, freeing cognitive resources for meaning and communication. This is why learners with a rich chunk repertoire tend to sound more fluent than those relying entirely on rule-governed production.
Chunks vs. grammar rules
The relationship between chunks and grammar is a key theoretical question. Traditional generative approaches treat grammar as the foundation and chunks as lexical exceptions to be listed. Usage-based approaches (Tomasello 2003; Ellis 2002) argue the opposite: chunks come first in acquisition, and abstract grammatical rules are induced from patterns in chunks over time. Children acquire grammar not by learning rules and applying them to words, but by gradually extracting patterns from formulaic sequences — I want + X, gimme + X, where’s + X — and schematizing them into abstract categories.
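The schematization step described above can be illustrated with a toy sketch: utterances that share the same opening words but differ in their final word are grouped into a frame with an open slot. This is only an illustration of the idea, not a model from Tomasello or Ellis. The function name `slot_frames`, the single-word final slot, and the sample utterances are all assumptions made for the example.

```python
from collections import defaultdict


def slot_frames(utterances, min_fillers=2):
    """Group utterances into 'frame ___' patterns with a final one-word slot.

    A frame is kept only if at least `min_fillers` different words fill the slot.
    """
    fillers = defaultdict(set)
    for utterance in utterances:
        words = utterance.lower().split()
        if len(words) < 2:
            continue  # a one-word utterance has no frame plus slot
        frame = " ".join(words[:-1]) + " ___"
        fillers[frame].add(words[-1])
    return {
        frame: sorted(words)
        for frame, words in fillers.items()
        if len(words) >= min_fillers
    }


# Invented child-style utterances, for illustration only
utterances = [
    "I want juice", "I want ball",
    "gimme ball", "gimme juice",
    "where's daddy", "where's ball",
    "bye bye",
]

frames = slot_frames(utterances)
for frame, words in frames.items():
    print(frame, words)
```

Here "bye bye" yields no productive frame because its slot is only ever filled by one word, while "I want ___", "gimme ___", and "where's ___" each emerge with several fillers.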
History
The concept of chunks and formulaic language has a long history in linguistics. Firth (1957) introduced the concept of collocation. Bolinger (1976) argued that language is “much more a matter of prefabricated chunks” than rule application. The idea gained systematic research momentum with Pawley and Syder’s (1983) seminal paper on “nativelike selection” — arguing that what makes native speakers sound native is not just grammaticality but their selection of conventional formulaic sequences. Nattinger and DeCarrico (1992) applied the concept directly to language teaching in Lexical Phrases and Language Teaching. Lewis (1993) built the Lexical Approach around chunks, arguing that “language consists of grammaticalized lexis, not lexicalized grammar.” Wray (2002) provided the most comprehensive theoretical synthesis in Formulaic Language and the Lexicon. Since then, corpus linguistics has provided empirical tools to identify chunks quantitatively, driving a large body of research on L2 chunk acquisition and instruction.
Common Misconceptions
- “Chunks are just idioms or fixed expressions.” Most chunks are not opaque idioms. The majority are high-frequency sequences (in terms of, as a result of, I think that) that are transparent in meaning but stored holistically.
- “Memorizing chunks is rote learning, not real language.” Retrieving stored chunks is central to how native speakers achieve fluency. On the usage-based view, the distinction between “real grammar” and “mere chunks” is artificial: chunks encode grammatical patterns and are the raw material from which grammar is abstracted.
- “Teaching chunks undermines communicative competence.” The reverse is argued by many researchers: a rich chunk inventory is essential for communicative fluency, as it reduces the processing burden of online production.
Social Media Sentiment
The concept of chunks and formulaic language is popular in language-learning communities, particularly those focused on immersion and comprehensible input. The idea that fluency comes from absorbing thousands of whole expressions (not mastering grammar rules) resonates with immersionists and AJATT/MIA practitioners. The “sentence mining” practice in SRS communities directly targets chunk acquisition: learners extract and review whole sentences, not isolated words, building a chunk repertoire through repeated exposure. Terms like “set phrases,” “collocations,” and “fixed expressions” are used interchangeably in online discussion.
Last updated: 2026-04
Practical Application
- Mine sentences, not words: When adding vocabulary to an SRS (Anki), learn words in example sentences — this naturally builds chunk knowledge alongside vocabulary.
- Notice formulaic sequences: During immersion in Japanese, notice recurring phrases (そういえば、ところで、なるほど, etc.) and add them as whole units rather than analyzing them component-by-component.
- Japanese-specific chunks: Japanese has a rich inventory of sentence-ending patterns (~んだけど、~じゃないかな、~かもしれない), set conversational routines, and keigo formulae. These are better acquired as chunks than derived from grammatical rules.
- Teach phrases before rules: In production practice, using a memorized chunk correctly is more communicatively effective than constructing a grammatically perfect but unnatural sentence from scratch.
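The sentence-mining advice above can be sketched as a small filter that keeps only sentences containing a chunk you want to learn and pairs each sentence with that chunk. This is a minimal sketch and not Anki's card or import format. The function name `mine_sentences`, the `sentence` and `chunk` field names, and the example sentences are hypothetical.

```python
def mine_sentences(sentences, chunks):
    """Return one card per (sentence, chunk) pair where the chunk occurs in the sentence."""
    cards = []
    for sentence in sentences:
        for chunk in chunks:
            if chunk in sentence:
                # Keep the whole sentence so the chunk is reviewed in context
                cards.append({"sentence": sentence, "chunk": chunk})
    return cards


# Target chunks taken from the examples above; sentences invented for illustration
chunks = ["そういえば", "かもしれない"]
sentences = [
    "明日は雨が降るかもしれない。",
    "そういえば、昨日の映画はどうだった?",
    "これはペンです。",
]

cards = mine_sentences(sentences, chunks)
for card in cards:
    print(card["chunk"], "->", card["sentence"])
```

Plain substring matching is enough here because each chunk is stored as an unanalyzed string, which mirrors the advice to add chunks as whole units. The third sentence is skipped because it contains none of the target chunks.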
See Also
- Sakubo – Japanese SRS App: sentence-level review in SRS directly supports chunk acquisition, which is central to achieving natural fluency.
Sources
- Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press. — the comprehensive theoretical synthesis of formulaic language research; defines chunks, identifies types, and reviews acquisition evidence.
- Pawley, A. & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and Communication. — foundational paper arguing that native-speaker fluency is primarily a matter of formulaic sequence selection, not rule application.
- Ellis, N. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 143–188. — reviews how frequency-based chunk learning drives L2 acquisition; key paper connecting corpus linguistics to SLA chunk theory.