Short-term Memory

Definition:

Short-term memory (STM) is the memory system that temporarily holds a small amount of information in an active, immediately accessible state. Without rehearsal, that information lasts only briefly, typically a matter of seconds. It is the most limited memory store in the classical modal model of memory, characterized by rapid decay and a strict capacity constraint. In modern cognitive science, short-term memory has largely been superseded by the more precise concept of working memory, which includes both storage and active processing.

Also known as: STM, primary memory, immediate memory, short-term store

In-Depth Explanation

The classical distinction between short-term and long-term memory was formalized by Atkinson and Shiffrin’s “modal model” (1968), which described memory as a three-stage system: sensory memory (very brief, preconscious sensory traces), short-term memory (limited capacity, seconds duration), and long-term memory (effectively unlimited capacity and duration). In this model, information flows from sensory memory into STM, and from STM into long-term memory through rehearsal.

Short-term memory has two defining characteristics that distinguish it from long-term memory:

Capacity: George Miller’s famous 1956 paper established that short-term memory holds approximately 7 ± 2 “chunks” of information — typically 5 to 9 items, regardless of whether those items are letters, words, numbers, or larger meaningful units. (Contemporary research tends toward a lower estimate of 4 ± 1 chunks for most tasks.) The chunk size depends on existing long-term memory structure: experts can hold more information because they have larger, more integrated chunks.
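
As a rough illustration of why chunking matters, the sketch below groups a digit string into fixed-size groups and compares the raw item count with the chunk count. The function name `chunk_digits`, the group size of 3, and the capacity value of 4 are assumptions made for this example; they are not taken from Miller's or Cowan's procedures.

```python
def chunk_digits(digits: str, size: int = 3) -> list[str]:
    """Split a digit string into consecutive groups of at most `size` digits."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


# Illustrative capacity only; the text cites estimates of 4 ± 1 and 7 ± 2 chunks.
ASSUMED_CAPACITY = 4

number = "4155550132"          # ten separate digits if held one by one
chunks = chunk_digits(number)  # the same digits held as four groups

print(len(number), "items unchunked")
print(len(chunks), "chunks:", chunks)
print("fits assumed capacity:", len(chunks) <= ASSUMED_CAPACITY)
```

Fixed-size grouping is only a stand-in here. In real chunking, the groups are units that already have meaning in long-term memory, which is why experts can form larger chunks than novices.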

Duration: Without rehearsal, short-term memory traces fade within approximately 15–30 seconds. Peterson and Peterson (1959) demonstrated this with the Brown-Peterson paradigm: when prevented from rehearsing, participants recalled a three-letter item with about 90% accuracy after 3 seconds but less than 10% accuracy after 18 seconds. This rapid decay is one of the hallmarks distinguishing STM from long-term storage.
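
To make the two recall figures quoted above more concrete, the sketch below draws a simple exponential curve through them. The exponential form is an assumption chosen for illustration, not a claim from the study, and the 18-second value is treated as exactly 10% even though the text gives it as an upper bound.

```python
import math

# Recall proportions quoted in the text, treated as exact for illustration.
T1, R1 = 3.0, 0.90    # delay in seconds, proportion recalled
T2, R2 = 18.0, 0.10   # "less than 10%" is used here as 0.10

# Assume R(t) = a * exp(-k * t) and solve for k and a from the two points.
k = math.log(R1 / R2) / (T2 - T1)
a = R1 * math.exp(k * T1)


def recall(t: float) -> float:
    """Predicted recall proportion at delay t under the assumed curve, capped at 1."""
    return min(1.0, a * math.exp(-k * t))


# Time for predicted recall to halve under the assumed curve.
half_life = math.log(2) / k

for t in (3, 6, 9, 12, 15, 18):
    print(f"{t:2d} s: {recall(t):.2f}")
print(f"half-life under the assumed curve: {half_life:.1f} s")
```

Under these assumptions, predicted recall halves roughly every five seconds, which is consistent with the 15–30 second window given above for near-complete loss.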

The STM → LTM transfer mechanism is central to understanding why spaced repetition systems (SRS) work. Information temporarily held in STM can be transferred to long-term memory through:

Maintenance rehearsal: Repeating information to keep it in STM (effective for short-term retention, poor for long-term encoding)
Elaborative rehearsal: Connecting new information to existing long-term memories, attaching meaning, and engaging deeper processing (the foundation of effective encoding)
Retrieval practice: Re-activating a long-term memory trace through effortful recall, which strengthens the trace (also called active recall)

The limitations of short-term memory explain why cramming and massed practice fail for long-term retention. During a cramming session, information is maintained in STM through rehearsal but not deeply encoded into long-term memory. When the rehearsal loop is disrupted, the traces decay rapidly. SRS solves this by distributing retrieval across time — each spaced retrieval forces long-term memory reconstruction rather than STM maintenance.
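
The contrast between massed and spaced practice can be sketched as two review schedules. The example below is a minimal illustration that assumes a hypothetical rule in which the gap between reviews doubles each time. It is not the algorithm of any particular SRS tool, and the function names and interval values are invented for the example.

```python
from datetime import date, timedelta


def spaced_schedule(start: date, reviews: int, first_gap_days: int = 1) -> list[date]:
    """Return review dates whose gaps double after each review (hypothetical rule)."""
    dates = []
    gap = first_gap_days
    current = start
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= 2
    return dates


def massed_schedule(start: date, reviews: int) -> list[date]:
    """Cramming: every review happens on the same day as the first exposure."""
    return [start] * reviews


start = date(2026, 1, 1)
print("massed:", [d.isoformat() for d in massed_schedule(start, 5)])
print("spaced:", [d.isoformat() for d in spaced_schedule(start, 5)])
```

In the massed schedule every repetition falls within a single session, so the item can be kept alive by rehearsal alone. In the spaced schedule each review arrives long after the STM trace has faded, so answering requires retrieval from long-term memory.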

The modern framework, working memory (Baddeley and Hitch, 1974), replaces the simple STM concept with a more detailed architecture: the phonological loop (verbal/acoustic rehearsal), the visuospatial sketchpad (visual and spatial information), and the central executive (attentional control), together with the episodic buffer (integrating information across systems), which Baddeley added to the model in 2000. Working memory subsumes STM’s storage function and adds active manipulation, making it the more theoretically precise construct for understanding cognitive processing in learning.

Common Misconceptions

“Short-term memory and working memory are the same thing.”

Short-term memory refers specifically to temporary storage; working memory includes both storage and active manipulation. Working memory is the broader, more recent concept that has largely replaced STM in cognitive research — but STM remains the correct term when discussing the original Atkinson-Shiffrin model or pure storage capacity limits.

“Improving your short-term memory will improve your long-term learning.”

Short-term memory capacity is largely fixed: it is around 4–7 chunks for most adults and does not substantially improve with training. What does improve with expertise is the size of meaningful chunks. Long-term memory knowledge enables larger encoding units, which effectively expands functional working memory capacity. Learning more vocabulary improves how efficiently new vocabulary can be processed, not the raw capacity of STM.

“Information in short-term memory is automatically saved to long-term memory after enough repetition.”

Maintenance rehearsal (repeating items to prevent decay) does not reliably produce long-term encoding. Repeating a phone number to yourself, for example, keeps it available in STM but does not by itself create durable long-term storage. Elaborative encoding and spaced retrieval are necessary for durable long-term memory formation.

“The 7 ± 2 rule means you should never present more than 7 items at once.”

Miller’s 7 ± 2 was a measurement of short-term storage capacity under specific experimental conditions, not a pedagogical prescription. Contemporary research suggests 4 ± 1 is more accurate for most tasks. More importantly, chunking — grouping items into meaningful units — can dramatically reduce the effective number of items that must be held in STM simultaneously. The practical limit depends on chunk size, not raw item count.

Criticisms

Short-term memory research in second language acquisition (SLA) has been critiqued for the difficulty of isolating STM from working memory. Most language tasks engage both storage and processing, so pure STM measures (like digit span) have limited ecological validity for predicting language learning outcomes. The classic “7 ± 2” capacity limit has been revised downward by later research (Cowan, 2001, suggests 4 ± 1 chunks), and the notion of fixed capacity has been challenged by evidence that capacity varies by material type and individual expertise.

Social Media Sentiment

Short-term memory is discussed in language learning communities when learners report difficulty holding L2 sentences in mind during conversation or while trying to understand complex speech. The concept surfaces in discussions about why beginners feel “cognitive overload” and why practice increases the apparent capacity for processing L2 input. Learners intuitively understand that what can be held in STM increases as proficiency develops.

Last updated: 2026-04

History

1885: Hermann Ebbinghaus implicitly documents the short-term / long-term memory distinction through his finding that very recent items are better recalled than older items — an early empirical demonstration of temporally graded memory.

1890: William James introduces the terms “primary memory” (immediately available, part of conscious present) and “secondary memory” (older knowledge requiring retrieval) in Principles of Psychology. The first explicit theoretical distinction between short-term and long-term storage.

1956: George Miller publishes “The Magical Number Seven, Plus or Minus Two,” establishing the 7 ± 2 capacity limit and the concept of “chunking.” One of the most-cited papers in psychology and the foundation of STM capacity research. [Miller, 1956]

1959: Lloyd and Margaret Peterson publish the Brown-Peterson paradigm, demonstrating rapid STM decay without rehearsal (90% recall at 3 seconds → <10% at 18 seconds). The first direct experimental demonstration of STM’s temporal limitations. [Peterson & Peterson, 1959]

1968: Atkinson and Shiffrin publish the “modal model” in The Psychology of Learning and Motivation, formalizing the three-stage memory model (sensory → STM → LTM) with transfer via rehearsal. For decades, this becomes the dominant model of human memory. [Atkinson & Shiffrin, 1968]

1974: Alan Baddeley and Graham Hitch publish the working memory model, replacing the unitary STM with a more detailed multi-component architecture. The term “short-term memory” becomes less common in technical cognitive science, with working memory preferred for most purposes.

2001: Cowan’s review proposes a capacity limit of 4 ± 1 chunks (lower than Miller’s 7 ± 2), accounting for methodological differences in earlier studies. Contemporary research generally supports the lower estimate.

Practical Application

Accept that limited short-term memory capacity is normal — as L2 processing becomes more automatic, effective capacity increases
Break long sentences into smaller chunks when reading or listening to complex material
Use note-taking during listening to offload short-term memory demands
Practice with gradually increasing sentence length to build your ability to hold and process L2 input
Spaced repetition helps transfer vocabulary from fragile short-term memory to stable long-term memory through systematic review scheduling

Research

Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158
Summary: The foundational paper establishing the 7 ± 2 short-term memory capacity limit and introducing the concept of chunking. One of the most cited papers in all of psychology, and the origin of the quantified understanding of STM constraints.

Peterson, L.R., & Peterson, M.J. (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58(3), 193–198.
Summary: The Brown-Peterson paradigm demonstrating rapid STM decay without rehearsal. Established the temporal limitation of short-term storage experimentally and provided the empirical basis for distinguishing STM from LTM by duration.

Atkinson, R.C., & Shiffrin, R.M. (1968). Human memory: A proposed system and its control processes. In K.W. Spence & J.T. Spence (Eds.), The Psychology of Learning and Motivation (Vol. 2, pp. 89–195). Academic Press.
Summary: The modal model paper formalizing short-term memory within a three-stage memory architecture. Provides the theoretical framework within which STM, LTM, and the rehearsal transfer mechanism operate. The starting point for virtually all subsequent memory systems research.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114.
Summary: Proposes a revised STM capacity of 4 ± 1 chunks, reanalyzing prior studies including Miller’s to account for methodological confounds. The primary reference for the modern consensus that STM capacity is lower than Miller’s 1956 estimate.

Baddeley, A.D., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47–89). Academic Press.
Summary: Introduces working memory as a replacement for the unitary STM concept, proposing the phonological loop, visuospatial sketchpad, and central executive. Shows that short-term storage is not a single system but a multi-component architecture. One of the most influential papers in the history of short-term/working memory research.