Chunk

A chunk is a multi-word sequence that is stored and processed as a single holistic unit in memory, rather than being assembled word-by-word from grammatical rules. Examples range from fixed expressions (by the way, at the end of the day, nice to meet you) to partially fixed frames (can you ___?, it’s a matter of ___) to common collocations (make a decision, take a shower). Chunks form the backbone of formulaic language and are fundamental to fluency in both first and second language use.


In-Depth Explanation

What makes something a chunk?

Chunks are characterized by three properties:

  1. Frequency: Their component words co-occur far more often than chance would predict
  2. Holistic storage: They are stored and retrieved as single units, bypassing compositional processing
  3. Functional coherence: They perform recognizable communicative functions in context
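The frequency criterion can be made concrete with an association score. The sketch below uses pointwise mutual information (PMI), one common measure in corpus linguistics; it is not named in the text above and is only one of several options. The function name bigram_pmi, the min_count threshold, and the toy corpus are all assumptions made up for illustration, not real corpus data.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Return {(w1, w2): PMI} for adjacent word pairs seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable scores
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair occurs more often than independence would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus, invented for illustration only.
corpus = (
    "by the way i need to make a decision today . "
    "by the way did you make a decision ? "
    "the decision was hard , by the way ."
).split()

scores = bigram_pmi(corpus)
for pair, pmi in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(pmi, 2))
```

A positive score only addresses the first property (frequency); it says nothing about holistic storage or communicative function, which require psycholinguistic and discourse evidence.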

Chunks and idioms are not the same thing. Not all chunks are idioms (expressions whose meaning is fixed and non-literal): many chunks are semantically transparent but still stored as units. Of course, I don’t know, and have a look are fully interpretable but habitually chunked by native speakers.

Types of formulaic sequences

Research (Wray 2002; Erman & Warren 2000) identifies several subtypes:

Type             | Example                             | Notes
Fixed idiom      | kick the bucket                     | Opaque meaning, fixed form
Proverb          | time flies                          | Fixed, figurative
Collocation      | make a decision (not do a decision) | Preferred co-occurrence
Frame with slot  | the ___ of the matter is            | Partially fixed, slot-fillable
Discourse marker | having said that, by the way        | Functional, conversational
Greeting/routine | how are you? / fine, thanks         | Social ritual sequence

Chunks and fluency

Native speakers retrieve chunks as single units without compositional analysis. This is thought to explain how fluent speech can proceed at speeds that would be impossible if every word were assembled rule-by-rule. For L2 learners, a large chunk inventory provides pre-packaged utterances that reduce online processing load, freeing cognitive resources for meaning and communication. This is why learners with a rich chunk repertoire tend to sound more fluent than those relying entirely on rule-governed production.

Chunks vs. grammar rules

The relationship between chunks and grammar is a key theoretical question. Traditional generative approaches treat grammar as the foundation and chunks as lexical exceptions to be listed. Usage-based approaches (Tomasello 2003; Ellis 2002) argue the opposite: chunks come first in acquisition, and abstract grammatical rules are induced from patterns in chunks over time. Children acquire grammar not by learning rules and applying them to words, but by gradually extracting patterns from formulaic sequences — I want + X, gimme + X, where’s + X — and schematizing them into abstract categories.
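The schematization step described above can be illustrated with a deliberately simplified sketch. It assumes the slot is always the final word of an utterance and that a frame counts as productive once it has been seen with at least two different fillers; both are simplifying assumptions for illustration, not claims from the usage-based literature. The function name induce_frames and the sample utterances are hypothetical.

```python
from collections import defaultdict

def induce_frames(utterances, min_fillers=2):
    """Group utterances by everything except the last word and
    keep the frames that appear with at least min_fillers different fillers."""
    fillers_by_frame = defaultdict(set)
    for utterance in utterances:
        words = utterance.lower().split()
        if len(words) < 2:
            continue  # a single word has no frame plus slot
        frame = " ".join(words[:-1]) + " ___"
        fillers_by_frame[frame].add(words[-1])
    return {
        frame: sorted(fillers)
        for frame, fillers in fillers_by_frame.items()
        if len(fillers) >= min_fillers
    }

# Invented child-like utterances, for illustration only.
utterances = [
    "I want juice", "I want ball", "I want cookie",
    "gimme juice", "gimme ball",
    "where's daddy", "where's ball",
    "more milk",
]
frames = induce_frames(utterances)
for frame, fillers in frames.items():
    print(frame, fillers)
```

Here "more milk" yields no frame because it has only been seen with one filler, which loosely mirrors the idea that a pattern becomes abstract only after it has been encountered with varied material.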


History

The concept of chunks and formulaic language has a long history in linguistics. Firth (1957) introduced the concept of collocation. Bolinger (1976) argued that language is “much more a matter of prefabricated chunks” than rule application. The idea gained systematic research momentum with Pawley and Syder’s (1983) seminal paper on “nativelike selection” — arguing that what makes native speakers sound native is not just grammaticality but their selection of conventional formulaic sequences. Nattinger and DeCarrico (1992) applied the concept directly to language teaching in Lexical Phrases and Language Teaching. Lewis (1993) built the Lexical Approach around chunks, arguing that “language consists of grammaticalized lexis, not lexicalized grammar.” Wray (2002) provided the most comprehensive theoretical synthesis in Formulaic Language and the Lexicon. Since then, corpus linguistics has provided empirical tools to identify chunks quantitatively, driving a large body of research on L2 chunk acquisition and instruction.


Common Misconceptions

  • “Chunks are just idioms or fixed expressions.” Most chunks are not opaque idioms. The majority are high-frequency sequences (in terms of, as a result of, I think that) that are transparent in meaning but still stored holistically.
  • “Memorizing chunks is rote learning, not real language.” Chunk-based learning is how native speaker fluency works. The distinction between “real grammar” and “mere chunks” is artificial — chunks encode grammatical patterns and are the raw material from which grammar is abstracted.
  • “Teaching chunks undermines communicative competence.” The reverse is argued by many researchers: a rich chunk inventory is essential for communicative fluency, as it reduces the processing burden of online production.

Social Media Sentiment

The concept of chunks and formulaic language is popular in language-learning communities, particularly those focused on immersion and comprehensible input. The idea that fluency comes from absorbing thousands of whole expressions (not mastering grammar rules) resonates with immersionists and AJATT/MIA practitioners. The “sentence mining” practice in SRS communities directly targets chunk acquisition: learners extract and review whole sentences, not isolated words, building a chunk repertoire through repeated exposure. Terms like “set phrases,” “collocations,” and “fixed expressions” are often used interchangeably in online discussion.

Last updated: 2026-04


Practical Application

  • Mine sentences, not words: When adding vocabulary to an SRS (e.g., Anki), learn words in example sentences. This naturally builds chunk knowledge alongside vocabulary.
  • Notice formulaic sequences: During immersion in Japanese, notice recurring phrases (そういえば、ところで、なるほど, etc.) and add them as whole units rather than analyzing them component-by-component.
  • Japanese-specific chunks: Japanese has a rich inventory of sentence-ending patterns (~んだけど、~じゃないかな、~かもしれない), set conversational routines, and keigo formulae. These are better acquired as chunks than derived from grammatical rules.
  • Teach phrases before rules: In production practice, using a memorized chunk correctly is more communicatively effective than constructing a grammatically perfect but unnatural sentence from scratch.
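One way to review a chunk as a whole unit is to hide the entire phrase in a mined sentence rather than a single word. The sketch below wraps a chunk in Anki's cloze-deletion markup ({{c1::...}}). The helper name make_cloze and the example sentence are assumptions made up for illustration, and it presumes a cloze-type note is used for the card.

```python
def make_cloze(sentence, chunk):
    """Wrap the first occurrence of chunk in Anki-style cloze markup."""
    if chunk not in sentence:
        raise ValueError(f"chunk {chunk!r} not found in sentence")
    # Replace only the first occurrence so a repeated phrase is hidden once.
    return sentence.replace(chunk, "{{c1::" + chunk + "}}", 1)

# Invented example sentence; the chunk is hidden as one unit.
card = make_cloze("そういえば、明日は休みだね。", "そういえば")
print(card)  # {{c1::そういえば}}、明日は休みだね。
```

Hiding the full phrase keeps the review focused on recalling the sequence as a unit instead of reconstructing it piece by piece.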


See Also

  • Sakubo – Japanese SRS App: sentence-level review in SRS directly supports chunk acquisition, which is central to achieving natural fluency.

Sources