Perspectives on Psychological Science
2015, Vol. 10(2) 176–199
© The Author(s) 20
DOI: 10.1177/174569161556900

Learning Versus Performance: An
Integrative Review

Nicholas C. Soderstrom and Robert A. Bjork
Department of Psychology, University of California, Los Angeles

Abstract

The primary goal of instruction should be to facilitate long-term learning, that is, to create relatively permanent
changes in comprehension, understanding, and skills of the types that will support long-term retention and transfer.
During the instruction or training process, however, what we can observe and measure is performance, which is often
an unreliable index of whether the relatively long-term changes that constitute learning have taken place. The
time-honored distinction between learning and performance dates back decades, spurred by early animal and motor-skills
research that revealed that learning can occur even when no discernible changes in performance are observed. More
recently, the converse has also been shown: specifically, that improvements in performance can fail to yield significant
learning and, in fact, that certain manipulations can have opposite effects on learning and performance. We review
the extant literature in the motor- and verbal-learning domains that necessitates the distinction between learning and
performance. In addition, we examine research in metacognition that suggests that people often mistakenly interpret
their performance during acquisition as a reliable guide to long-term learning. These and other considerations suggest
that the learning–performance distinction is critical and has vast practical and theoretical implications.

Keywords

learning, performance, memory, instruction, training, motor learning, verbal learning

Corresponding Author:
Nicholas C. Soderstrom, Department of Psychology, University of
California, Los Angeles, 1285 Franz Hall, Los Angeles, CA 90095-1563

Whether in the classroom or on the field, the major goal
of instruction is, or at least should be, to equip learners
with knowledge or skills that are both durable and flexible.
We want knowledge and skills to be durable in the
sense of remaining accessible across periods of disuse
and to be flexible in the sense of being accessible in the
various contexts in which they are relevant, not simply in
contexts that match those experienced during instruction.
In other words, instruction should endeavor to facilitate
learning, which refers to the relatively permanent
changes in behavior or knowledge that support long-term
retention and transfer. Paradoxically, however, such
learning needs to be distinguished from performance,
which refers to the temporary fluctuations in behavior or
knowledge that can be observed and measured during or
immediately after the acquisition process.

The distinction between learning and performance is
crucial because there now exists overwhelming empirical
evidence showing that considerable learning can occur
in the absence of any performance gains and, conversely,
that substantial changes in performance often fail to
translate into corresponding changes in learning. Perhaps
even more compelling, certain experimental manipulations
have been shown to confer opposite effects on
learning and performance, such that the conditions that
produce the most errors during acquisition are often the
very conditions that produce the most learning. Such
results are regularly met with incredulity, whether in the
context of metacognitive research in which people are
asked to make judgments about their own learning or
during informal conversations with researchers, educators,
and students. It is, however, the counterintuitive
nature of the learning–performance distinction that
makes it so interesting and important from both practical
and theoretical perspectives.

We provide the first integrative review of the evidence
that bears on the critical distinction between learning, as
measured by long-term retention or transfer, and performance,
as measured during acquisition. We attempt to
synthesize research from both the motor- and verbal-
learning domains, as well as relevant work in metacogni-
tion. We note, however, that a number of other articles
provide an introduction to the learning versus perfor-
mance distinction and summarize key findings that illus-
trate the distinction (e.g., R. A. Bjork, 1999; Christina &
Bjork, 1991; Jacoby, Bjork, & Kelley, 1994; Kantak &
Winstein, 2012; Lee, 2012; Lee & Genovese, 1988; Schmidt
& Bjork, 1992; Schmidt & Lee, 2011; Wulf & Shea, 2002).
In addition, we (Soderstrom & Bjork, 2013) have published
an annotated bibliography that is slated to be updated
annually so as to keep researchers, educators, and others
abreast of the newest research relevant to the topic.

Our review begins by presenting the foundational
research on which the learning–performance distinction
rests—specifically, the early work on latent learning,
overlearning, and fatigue—and we then highlight the
corresponding conceptual distinctions made by learning
theorists at that time. Next, we discuss various experi-
mental manipulations from both the motor- and verbal-
learning domains that have resulted in dissociations
between learning and performance. We then summarize
research findings in the domain of metacognition that
demonstrate that learners are prone to interpreting per-
formance during acquisition as a valid index of learning,
which can lead not only to misassessments of the degree
to which learning has happened but also to learners pre-
ferring poorer conditions of learning over better condi-
tions of learning. Finally, we present several current
theoretical perspectives that can accommodate the differ-
ence between learning and performance.

Foundational Studies

Studies conducted decades ago necessitated the distinc-
tion between learning and performance by showing that
considerable learning could occur in the absence of
changes in performance. For example, rats’ learning of a
maze could be enhanced by permitting a period of free
exploration in which their behavior seemed aimless (i.e.,
performance was irregular); additional practice trials pro-
vided after performance was at asymptote (“overlearn-
ing”) resulted in slowed forgetting and more rapid
relearning; and when fatigue stalled performance of to-
be-learned motor tasks, learning could still transpire. This
section reviews these foundational studies.

Latent learning

Latent learning is defined as learning that occurs in the
absence of any obvious reinforcement or noticeable
behavioral changes. Learning is said to be “latent,” or hid-
den, because it is not exhibited unless a reinforcement of
some kind is introduced to reveal it. Consider, for
example, a person who recently moved to a new city
and, apprehensive about driving, decides to ride the city
bus each day to work. Riding the bus day after day, the
route would be learned through observation, but such
learning would only be evident if an incentive was pres-
ent that required it—say, when it was necessary for the
person to drive to work on his or her own. The early
findings of latent learning were intriguing and controver-
sial because they challenged the widely held assumption
that learning could occur only in the presence of rein-
forcement. For a classic review of the early latent learn-
ing studies, we recommend Tolman (1948), in which the
concept of “cognitive maps” was introduced, a term that
refers to the mental representation of one’s spatial
environment.

Although latent learning was first demonstrated by Blodgett
(1929), Tolman and Honzik (1930) are credited with providing
what is now
considered the classic experiment on latent learning, the
results of which are reported in most textbooks on learn-
ing and memory. In their experiment, which is essentially
a replication of Blodgett’s, three groups of rats were
placed in a complex T-maze every day for a total of 17
days. One group of rats was never reinforced for reaching
the goal box—they were simply taken out of the maze
when they found it—whereas another group was rein-
forced with food every time the goal box was reached. A
third group was not rewarded for reaching the goal box
until Day 11, after which time they were regularly
rewarded. The results of this experiment are presented in
Figure 1. Unsurprisingly, the group that made the fewest
errors in finding the goal box over the 17-day period was
the regularly reinforced group, and the group that was
never reinforced made the most errors. Consistent with
the notion of latent learning, the delayed-reinforcement
group showed the same number of errors as the never-
reinforced group until Day 11—the day the food was
introduced—when an immediate improvement occurred,
dropping their error rate to a level comparable to that of
the regularly reinforced group. Thus, delaying reinforce-
ment revealed that the rats did, indeed, learn the maze
while no reinforcement was provided and their behavior
seemed rather aimless. In other words, learning occurred
when performance was stagnant.

The studies by Blodgett (1929) and Tolman and
Honzik (1930) spurred numerous follow-up experiments
on latent learning in rats, further refining our understand-
ing of this phenomenon (see Buxton, 1940; Spence &
Lippitt, 1946). Seward (1949), for example, showed that
latent learning could occur after just 30 min of free explo-
ration and, furthermore, that the amount of time spent in
the maze with no reinforcement—and thus during a time
when no changes in performance were discernible—was
positively related to learning the maze (see also Bendig,
1952; Reynolds, 1945).


It was also made clear decades ago that latent learning
is not limited to rats. In their influential studies, Postman
and Tuma (1954) and Stevenson (1954) showed that
latent learning is also empirically demonstrable in
humans. In Stevenson’s experiment, children—some as
young as 3 years old—explored a series of objects to find
a key that would open a box. Critically, the explored
environment also contained nonkey objects, or those that
were irrelevant to the task. The question was whether the
children would learn the locations of these peripheral
objects during the exploration of the key-relevant
objects—that is, whether the children would show latent
learning. Indeed, when the children were asked to find
the irrelevant, nonkey objects, they were relatively faster
in doing so when those objects had been contained in
the explored environment. Stevenson also found that the
amount of latent learning observed in the children
increased with age.

Overlearning and fatigue

Consider a violinist who continues to practice a musical
piece despite already being able to perform it—that is,
after acquisition performance is already at asymptote.
Such continued practice on a task after some criterion of
mastery on that task has been achieved is referred to as
“overlearning” and can be expressed by the number of
postmastery trials divided by the number of trials needed
to reach mastery. For example, if the violinist practiced a
piece 5 additional times after needing 10 practice trials to
master it, then the degree of overlearning would be 50%.
Many early studies of overlearning—starting with
Ebbinghaus’s (1885/1964) famous study using nonsense
syllables—demonstrated the power of overlearning as a
method for enhancing the long-term learning of informa-
tion and skills. Referencing these findings, Fitts (1965)
stated, “The importance of continuing practice beyond
the point in time where some . . . criterion is reached
cannot be overemphasized” (p. 195).
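
The overlearning ratio described above (postmastery trials divided by the trials needed to reach mastery) can be written as a short calculation. What follows is a minimal sketch only; the function name `degree_of_overlearning` and its parameter names are illustrative assumptions and do not come from the studies reviewed here.

```python
def degree_of_overlearning(postmastery_trials: int, trials_to_mastery: int) -> float:
    """Return overlearning as a percentage of the trials needed to reach mastery."""
    if trials_to_mastery <= 0:
        raise ValueError("trials_to_mastery must be positive")
    if postmastery_trials < 0:
        raise ValueError("postmastery_trials cannot be negative")
    return 100.0 * postmastery_trials / trials_to_mastery


# The violinist example from the text: 5 additional trials after
# 10 trials were needed to master the piece.
print(degree_of_overlearning(5, 10))  # 50.0
```

Under this definition, doubling the number of trials needed for mastery (for instance, 10 postmastery trials after 10 trials to mastery) corresponds to 100% overlearning.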

Krueger (1929) carried out the most frequently cited
study on overlearning. In his seminal experiment, two
groups of participants repeatedly studied lists of words
until all of the words could be recalled. At that point, the
control group was finished with the study phase, whereas
participants in the overlearning group continued study-
ing the material—in fact, they overlearned the material
by 100%, meaning that they were exposed to twice as
many study trials as the control group. On a retention test
administered up to 28 days later, the participants in the
overlearning group recalled more items than participants
in the control group, who had mastered the material dur-
ing the study phase but had not overlearned it.
Additionally, retention increased as a function of the
degree of overlearning. Subsequent research showed that
overlearning aids in the retention of more complex verbal
materials, such as prose passages, and accelerates the
rate of relearning, that is, the amount of time required to
learn the material again after some delay (e.g., Gilbert,
1957; Postman, 1962; see also Ebbinghaus, 1885/1964).

[Figure 1 plot: x-axis, Days 1 to 17; series: No Reinforcement, Regular Reinforcement, Delayed Reinforcement.]

Fig. 1. Average number of errors rats committed while trying to find the goal box as a
function of time and reinforcement group. The arrow above Day 11 denotes when reinforcement
(food) was introduced to the delayed-reinforcement group. (Note that lower
scores represent better performance and learning.) Data are adapted and approximated
from Tolman and Honzik (1930).

Overlearning also benefits the learning of motor skills.
The year after Krueger (1929) demonstrated overlearning
for words, he (Krueger, 1930) showed similar benefits for
a maze-tracing task. Participants first performed the task
until they reached 100% accuracy, after which they over-
learned it by 50%, 100%, or 200%. As with the verbal
materials, the amount of overlearning was positively
related to long-term retention. Later work replicated the
benefits of overlearning for simple and more complex
motor skills (e.g., Chasey & Knowles, 1973; Melnick,
1971; Melnick, Lersten, & Lockhart, 1972), including the
assembly and disassembly of an M60 machine gun
(Schendel & Hagman, 1982). Overlearning seems to be
an effective learning tool for a wide range of tasks (for a
meta-analytic review, see Driskell, Willis, & Cooper,

Similar to research on overlearning, early work on
fatigue suggested that learning could occur even after
fatigue prevented any further gains in performance dur-
ing acquisition. Adams and Reynolds (1954), for example,
had basic trainees from the Air Force learn a rotary pur-
suit task, which requires one to manually track a target
on a revolving wheel with a wand. Varying the length of
rest intervals between trials showed that when fatigue
limited or eliminated gains in performance, learning
nonetheless occurred, as revealed by a subsequent test
on the task after the fatigue had dissipated. Fifteen years
later, Stelmach (1969) examined how different training
schedules affect learning and performance on a ladder-
climbing task. One group of participants practiced more
than they rested; another group rested more than they
practiced. Performance during training, which was
defined as the number of rungs climbed on a given trial,
favored the group that was permitted more interpolated
rest. This finding is not surprising given that the other
group, as a result of receiving little rest between trials,
became increasingly fatigued during the training. After a
delay, however, a retention test revealed that the group
that received little rest caught up to the well-rested group,
ostensibly demonstrating that substantial learning had
occurred when fatigue had stifled any gains in short-term
performance.

Corresponding conceptual distinctions

The early experiments on latent learning, overlearning,
and fatigue, plus other considerations, led early learning
theorists (e.g., Estes, 1955a; Guthrie, 1952; Hull, 1943;
Skinner, 1938; Tolman, 1932) to distinguish between
behaviors that can be observed during training, or acquisition
(i.e., performance), and the relatively permanent
changes that occur in the capability for exhibiting those
behaviors in the future (i.e., learning). Hull used the
terms habit strength of a response and the momentary
reaction potential of that response; Estes, in his fluctua-
tion model, referred to habit strength and response
strength; and Skinner differentiated between reflex
reserve and reflex strength. Empirically, habit strength, or
reflex reserve (i.e., learning), was assumed to be indexed
by resistance to extinction or forgetting, or by the rapidity
of relearning, whereas momentary reaction potential,
response strength, or reflex strength (i.e., performance)
was assumed to be indexed by the current probability,
rate, or latency of a response.

In the domain of human verbal learning, Tulving and
Pearlstone’s (1966) distinction between “availability” (i.e.,
what is stored in memory) and “accessibility” (i.e., what
is retrievable at any given time) also maps, albeit not
perfectly, onto learning and performance, respectively.
Finally, R. A. Bjork and Bjork (1992), in an effort to
account for a wide range of findings in research on
human verbal and motor learning, formulated a new the-
ory of learning in which the distinction between learning
and performance is indexed by storage strength and
retrieval strength, respectively. This account, as well as
other contemporary theoretical perspectives regarding
the learning–performance distinction, is discussed later.


Summary

The learning versus performance distinction can be
traced back decades, to when researchers of latent learning,
overlearning, and fatigue demonstrated that long-lasting
learning could occur while training or acquisition perfor-
mance provided no indication that learning was actually
taking place. The results of latent learning studies, in par-
ticular, were both compelling and controversial at the
time because they verified that, although reinforcement is
necessary to reveal learning, it is not required to induce
learning. In sum, this early work showed learning with-
out performance. In the next several sections, we review
more recent evidence showing that the converse is also
true: specifically, that conditions producing gains in performance
often impede posttraining learning relative to conditions
that induce more performance errors.

Distribution of Practice

The dissociation between learning and performance has
been repeatedly found by manipulating the study schedules
of to-be-learned skills or information. Massing practice
or study sessions (that is, practicing or studying the
same thing over and over again) usually benefits short-term
performance, whereas distributing practice or
study (that is, separating practice or study sessions with
time or other activities) usually facilitates long-term
learning. This section presents, in turn, experiments from
the motor- and verbal-learning domains in which the dis-
tribution of practice was shown to have differential influ-
ences on learning and performance.

Motor learning

Suppose a swimmer wishes to improve his or her front,
back, and butterfly strokes. Suppose further that the
swimmer’s training is restricted to 1 hr per day. One train-
ing option would be to mass (or block) the different
strokes by practicing each for 20 min before moving on
to the next, never returning to the previously practiced
strokes during that training session. Alternatively, he or
she might distribute (or randomize) the practice schedule
such that each stroke is practiced for 10 min before mov-
ing on to the next stroke. This schedule would permit
each stroke to be revisited one more time during the
training session. In this section, we review research that
suggests that, whereas massing practice might promote
rapid performance gains during training, distributing
practice facilitates long-term retention of that skill.

Baddeley and Longman (1978) and J. B. Shea and
Morgan (1979) published two classic studies that showed
that distributing practice has differential effects on learn-
ing and performance of a simple motor skill.
Commissioned by the British Postal Service, Baddeley
and Longman investigated how to optimize postal work-
ers’ ability to type newly introduced postcodes on the
keyboard. The question was whether the postal workers
should learn the new system as rapidly as possible, prac-
ticing several hours per day, or whether learning would
profit most if practice was more distributed. When the
amount of practice per day and the number of days on
which practice occurred were varied, more distributed practice
fostered more effective learning of the typewriter keystrokes;
however, the opposite was true in regard to the efficiency
with which the skill was acquired when measured by the number
of days, rather than the number of hours, needed to reach
criterion. That is, the distributed group required
more days to reach any given level of performance relative
to the massed group. In sum, massed practice supported
quicker acquisition of the keystrokes, but
distributed practice led to better long-term retention of
the skill (see also Simon & Bjork, 2001).

J. B. Shea and Morgan (1979) also showed that distrib-
uting practice benefits the long-term retention of a motor
skill. In their seminal experiment, participants learned
three different movement patterns, each of which
involved knocking over three (of six) small wooden bar-
riers in a prescribed order. Two different practice sched-
ules were implemented: blocked and random. In the
blocked-practice condition, each of the three movement
patterns was practiced for 18 trials in succession, whereas
in the random-practice condition, the 18 trials of each
pattern were intermingled among the trials on the other
patterns in a way that was unpredictable from a partici-
pant’s standpoint. Importantly, therefore, practice time
for the three tasks was equated across the two different
practice conditions. Of interest was how the different
practice schedules affected the rapidity with which the arm
movements were executed.
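
The two practice schedules described above can be illustrated with a brief sketch. It assumes hypothetical pattern labels ("A", "B", "C") and uses a simple unconstrained shuffle for the random condition; the original study may have applied additional constraints on trial order that are not described here, so this is an illustration of the design rather than a reconstruction of the procedure.

```python
import random
from collections import Counter

PATTERNS = ["A", "B", "C"]   # hypothetical labels for the three movement patterns
TRIALS_PER_PATTERN = 18      # trials per pattern, as described in the text


def blocked_schedule(patterns=PATTERNS, n=TRIALS_PER_PATTERN):
    """All n trials of one pattern are completed before the next pattern begins."""
    return [p for p in patterns for _ in range(n)]


def random_schedule(patterns=PATTERNS, n=TRIALS_PER_PATTERN, seed=None):
    """The same trials, shuffled so the upcoming pattern is unpredictable.

    Assumption: a plain shuffle with no constraints on runs of the same pattern.
    """
    trials = blocked_schedule(patterns, n)
    random.Random(seed).shuffle(trials)
    return trials


blocked = blocked_schedule()
mixed = random_schedule(seed=0)

# Both schedules contain exactly 18 trials of each pattern, so the amount
# of practice is equated; only the ordering differs.
assert Counter(blocked) == Counter(mixed)
print(blocked[:6])
print(mixed[:6])
```

Because the two lists hold identical trials, any difference in later retention would be attributable to the order of practice rather than to its amount, which is the logic of the comparison.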

The results of J. B. Shea and Morgan (1979) are shown
in Figure 2. First, it is clear that during acquisition, partici-
pants assigned to the blocked-practice condition per-
formed better than those in the random-practice
condition, as evidenced by shorter times required to per-
form the arm movements. But how well would each
group retain the acquired skills? On retention tests given
after 10 min and 10 days, participants were tested on
each skill in either a blocked (B) or random (R) fashion,
which produced four subgroups of participants: B-B,
B-R, R-B, and R-R. The first letter in the pair denotes how
practice was scheduled during acquisition; likewise, the
second letter denotes how each group was tested. As can
be seen in Figure 2, the advantage of blocked practice
during acquisition was no longer evident after a delay. In
fact, the pattern reversed when learning was assessed
after 10 min and 10 days—that is, overall, those who ini-
tially practiced the skills in a random order exhibited the
most learning. Comparing the groups that were tested in
a blocked fashion (B-B vs. R-B), the study showed that
random practice during acquisition was better than
blocked practice during acquisition. This pattern was
dramatically demonstrated when researchers compared
the groups that were tested in a random order (B-R vs.
R-R) on the delayed test. Thus, similar to Baddeley and
Longman (1978), Shea and Morgan showed that blocking
practice of several to-be-learned movement patterns
facilitated acquisition performance, whereas interleaving
practice of those same movements promoted long-term
retention (see Lee & Magill, 1983, for a replication and
extension of these findings). Although not shown in
Figure 2, Shea and Morgan also found a transfer advan-
tage of interleaving, such that participants who initially
practiced the skill in a random fashion were relatively
better in executing a new response pattern—that is, one
that had not been practiced.

J. B. Shea and Morgan’s (1979) results, together with
the multiple subsequent demonstrations that interleaving
separate to-be-learned tasks can enhance long-term
retention, serve as one example of a broader finding,
referred to by Battig (1979) as contextual interference
effects. Primarily on the basis of findings from verbal
paired-associate learning tasks (see Battig, 1962, 1972),
Battig proposed that conditions during acquisition that
act to increase the possible interference between
separate to-be-learned tasks can enhance long-term
retention and transfer, despite their depressing effects on
performance during the acquisition process. Randomly
intermixing the trials on separate to-be-learned tasks,
such as, say, the forehand, backhand, and serve strokes
in tennis, increases the interference between the compo-
nents of those strokes but then can enhance long-term
retention of those skills.

The results of J. B. Shea and Morgan (1979) spurred
numerous follow-up studies, many of which were field based
and examined more complex motor skills. In one such
study, badminton players learned three different types of
serves from one side of the court under blocked or ran-
domly interleaved practice schedules. After a retention
interval, the players were tested on the serves from both
the same and opposite side of the court from which the
serves were practiced. The blocked group performed
better during training, but the interleaved group showed
better long-term retention, whether tested on the same or
opposite side of the court (S. Goode & Magill, 1986).
Thus, not only does distributing practice enhance the
retention of the specific skill that is practiced, but it also
fosters better transfer of that skill—that is, the application
of the skill in a different context. The learning benefits
promoted by distributed practice have also been demon-
strated for learning to hit pitches of different types in
baseball (Hall, Domingues, & Cavazos, 1994) and piano
pieces (Abushanab & Bishara, 2013) and for both chil-
dren (e.g., Ste-Marie, Clark, Findlay, & Latimer, 2004) and
older adults (e.g., Lin, Wu, Udompholkul, & Knowlton,
2010). For reviews of the effects of distributed practice on
motor skills, both simple and complex, we recommend
Lee (2012) and Merbah and Meulemans (2011).

Verbal learning

As in the motor domain, empirical evidence from verbal
tasks suggests that distributing (or spacing) study oppor-
tunities benefits learning relative to massing them, a find-
ing in the verbal literature termed the spacing effect. The
first to demonstrate the spacing effect, Ebbinghaus
(1885/1964) showed that spacing study opportunities, as
opposed to massing them, rendered the material more
resistant to forgetting. Decades later, now-classic articles
were published on the topic (e.g., Battig, 1966; Madigan,
1969; Melton, 1970). For example, and particularly rele-
vant to the current review, Peterson, Wampler, Kirkpatrick,
and Saltzman (1963) were the first to observe that massed
items are often retained better in the short term (i.e.,
spacing impairs performance), whereas spaced items are
retained better over the long term (i.e., spacing enhances
learning; see also Glenberg, 1977). Since then, hundreds
of experiments have demonstrated the spacing effect to
be highly robust and reliable (for reviews, see Cepeda,
Pashler, Vul, Wixted, & Rohrer, 2006; Dempster, 1988).
We now selectively review evidence of the spacing effect
and how this experimental manipulation bears on the
learning–performance distinction. We note that we have
grouped together situations in which spacing is achieved
in two different ways: (a) by inserting periods of rest or
unrelated activity between repetitions of to-be-learned
information or procedures; and (b) by interleaving the
study or practice trials of several different (and possibly
interfering) to-be-learned tasks or verbal materials. A
currently active issue, however, is whether the benefits of
interleaving go beyond the benefits of the spacing such
interleaving introduces (see, e.g., Birnbaum, Kornell,
Bjork, & Bjork, 2013; Kang & Pashler, 2012).

[Figure 2 plot: x-axis, Acquisition (1 to 6) and Retention (10 min, 10 day).]

Fig. 2. Mean movement time as a function of practice schedule during acquisition and on the
retention tests administered 10 min and 10 days later. For the retention data, the first letter in
the pair denotes how practice was scheduled during acquisition (B or R, for blocked or random,
respectively); the second letter denotes how each group was tested. (Note that lower scores
represent better performance and learning.) Data are adapted and approximated from J. B. Shea
and Morgan (1979).

The majority of studies examining the spacing effect
have done so using relatively simple to-be-learned mate-
rials, such as single words or paired associates. In one
study, for example, high school students learned French–
English vocabulary pairs (e.g., l’avocat—lawyer) under
conditions of either massed practice, in which the pairs
were studied for 30 consecutive minutes on one day, or
spaced (distributed) practice, in which the pairs were
studied for 10 min on each of three consecutive days. On
an initial test that was administered immediately follow-
ing each practice schedule—after the 30-min study ses-
sion for the massed group and after the third 10-min
study session for the spaced group—virtually identical
short-term performance was observed. However, on a
long-term retention test administered 7 days later, partici-
pants who had spaced their study recalled more pairs
than participants who had massed their study (Bloom &
Shuell, 1981). Similarly, spacing study sessions, relative to
massing them, can actually slow down the acquisition of
foreign language vocabulary pairs but can still lead to
superior retention—even over a span of several years
(Bahrick, Bahrick, Bahrick, & Bahrick, 1993).

Age-related differences in the spacing effect have also
been examined. For example, both younger (ages 18–25)
and older (ages 61–76) adults studied unrelated paired
associates (e.g., kitten–dime) multiple times according to
either a massed or spaced presentation schedule. In a
continuous cued-recall paradigm, each item was tested
after either 2 (short retention) or 20 (long retention)
intervening items were presented following the item’s last
presentation. An unsurprising finding was that older
adults performed worse, overall, compared with their
younger counterparts. More interesting, and relevant to
the learning–performance distinction, both age groups
exhibited a spacing-by-retention-interval interaction—
that is, short-term retention (i.e., performance) favored
the massed items, whereas long-term retention (i.e.,
learning) favored the spaced items (Balota, Duchek, &
Paullin, 1989).

In addition to fostering better retention of simple mate-
rials, spacing also improves the learning of more complex
materials, such as prose passages (e.g., Rawson & Kintsch,
2005), and the learning of higher-level concepts, such as
logic (Carlson & Yaure, 1990) and inductive reasoning
(e.g., Kang & Pashler, 2012; Kornell & Bjork, 2008; Kornell,
Castel, Eich, & Bjork, 2010). A particularly striking example
showed that spacing various types of math problems, as
opposed to massing them, facilitates learning. Participants’
task was to learn how to find the geometric volume of
four differently shaped objects. One group worked
through the practice problems according to a blocked
schedule, such that four problems for one object were
attempted before moving on to four problems for the next
object, and so on; the other group worked through the
problems for various shapes in a randomly mixed order.
Participants were then tested on the problems 1 week
later. During the practice phase, participants were able to
solve more of the problems if those problems were prac-
ticed in a blocked fashion—that is, massing improved
performance. This pattern reversed, however, on the long-
term retention test: Participants better retained the ability
to solve the problems if those problems were practiced 1
week earlier in a mixed format—that is, spacing enhanced
learning (Rohrer & Taylor, 2007). These results exemplify
the distinction between learning and performance (see
also Rohrer, Dedrick, & Burgess, 2014; Taylor & Rohrer,


Evidence from the motor- and verbal-learning domains
demonstrates that long-term learning profits from distrib-
uting (spacing) the practice of to-be-learned skills or
information with time or other intervening activities. In
the short term, however, massed practice is often better.
Thus, whether one wishes to learn how to type, play
badminton, speak a foreign language, or solve geometry
problems, one should consider implementing a distrib-
uted practice schedule, even if such a schedule might
induce more errors during practice or acquisition.
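
As a rough illustration of the two practice schedules in the geometry study described earlier, the sketch below builds a blocked order and a randomly mixed order for four shapes. The shape labels, the function names, the fixed seed, and the assumption that the mixed group saw the same 16 problems are all hypothetical; only the four shapes, the four problems per shape in the blocked schedule, and the random mixing come from the study description.

```python
import random

def blocked_schedule(shapes, problems_per_shape):
    """All problems for one shape are attempted before moving to the next shape."""
    return [shape for shape in shapes for _ in range(problems_per_shape)]

def mixed_schedule(shapes, problems_per_shape, seed=0):
    """The same set of problems, presented in a randomly mixed order."""
    schedule = blocked_schedule(shapes, problems_per_shape)
    # A seeded generator keeps the illustrative order reproducible.
    random.Random(seed).shuffle(schedule)
    return schedule

shapes = ["shape_A", "shape_B", "shape_C", "shape_D"]  # placeholder labels
blocked = blocked_schedule(shapes, problems_per_shape=4)
mixed = mixed_schedule(shapes, problems_per_shape=4)

print(blocked)
print(mixed)
```

Both schedules contain the same problems; only the order differs, which is the manipulation that separated practice performance from 1-week retention.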

Variability of Practice

Similar to distributing practice, varying the conditions of
practice or study sessions—for example, by having a
trainee practice skills related to but different from the
target skill—can also have detrimental effects on perfor-
mance during acquisition but then foster long-term learn-
ing and transfer. Most of the research in this vein has
focused on motor learning, although a handful of studies
on verbal learning have also demonstrated the long-term
benefits of practice variability. We now review research
from both the motor- and verbal-learning traditions that
has shown dissociable effects of practice variability on
learning and performance.

Motor learning

Research on motor learning and practice variability sug-
gests that if a basketball player, for example, wants to
shoot accurate free throws, he or she should not only
practice from the foul line itself but also from various
positions neighboring the foul line. Such variable prac-
tice might not appear to be effective during practice—
specifically, more performance errors would likely be
induced relative to shooting only from the foul line—but
would facilitate long-term learning. As discussed in more
detail later, varying the conditions of practice seems to be
effective for learning because it enables one to become
familiar with, and learn to manipulate the parameters of,
the general motor program underlying some skill, like
shooting a basketball (Schmidt, 1975). We now discuss
several findings from the motor-learning domain suggest-
ing that increasing practice variability, while potentially
inducing more errors during training, or acquisition, also
has the potential to confer long-term learning benefits
(for a review, see Guadagnoli & Lee, 2004).

In their important article, Kerr and Booth (1978) pro-
vided compelling evidence that varying the conditions of
practice, as opposed to keeping them fixed, can boost
long-term learning of a motor skill. In their study, chil-
dren tossed beanbags at a target on the floor from dis-
tances of 2 and 4 feet (varied practice) or only 3 feet
(fixed practice). After a delay, all participants were tested
from a distance of 3 feet, the sole distance practiced by
participants in the fixed-practice group. Intuition would
suggest that participants in the fixed-practice group, who
exclusively practiced from the tested distance, would do
better than those in the varied-practice group, who never
practiced at the tested distance. The results, however,
showed the opposite pattern: Varying the practice dis-
tances led to more accurate tosses from 3 feet away on
the final test, showcasing the benefits of variable practice
in producing transfer of a motor skill, a result that has
been replicated and extended (e.g., Pigott & Shapiro,
1984; Roller, Cohen, Kimball, & Bloomberg, 2001; Wulf,

Subsequent research found that variable practice can
foster the learning of other complex motor skills, such as
shooting a basketball (Landin, Hebert, & Fairweather,
1993) and mastering a forehand racket skill (Green,
Whitehead, & Sugden, 1995). In the basketball study, two
groups of participants practiced shooting basketball free
throws over a period of 3 days. In the fixed-practice con-
dition, participants shot the free throws exclusively from
the criterion distance of 12 feet, whereas participants in
the variable-practice condition shot from the criterion
distance as well as from two other distances (8 feet and
15 feet). It is important to note that the total number of
free throws (120) practiced by both groups was equated.
The retention test, administered 72 hr after the practice
phase, consisted of participants shooting 10 free throws
from the criterion distance (12 feet). Again, the common-
sense prediction would be that participants in the
fixed-practice condition would make more free throws
on the final test because they practiced more free throws
from that distance compared with participants in the vari-
able-practice condition. The counterintuitive finding,
however, was that participants in the variable-practice
condition made more free throws on the delayed-reten-
tion test, suggesting that practicing from multiple loca-
tions engendered more familiarity with the general motor
program underlying the skill.

The learning of simpler motor skills has also been
repeatedly shown to benefit from varying the conditions
of practice, even in cases when such practice has detri-
mental effects on acquisition performance. Many of the
studies that have produced this learning–performance
interaction effect have examined timing skills (e.g.,
Catalano & Kleiner, 1984; Hall & Magill, 1995; Lee, Magill,
& Weeks, 1985; Wrisberg & Mead, 1983; Wulf & Schmidt,
1988). For example, in one study, participants attempted
to knock over a barrier with their arm from a given start-
ing point, with the goal of doing so in precisely 200 ms.
A variable group practiced from four different starting
points (15, 35, 60, and 65 cm), whereas a constant group
always practiced from the same starting point (e.g., 60
cm). As displayed in Figure 3, the variable group per-
formed worse than the constant group during acquisi-
tion, producing more absolute errors when attempting to
execute the arm movement in the target time of 200 ms,
yet showed better learning on subsequent immediate and
delayed (1 day) transfer tests in which a new starting
point (50 cm) was tested (McCracken & Stelmach, 1977).
A similar learning–performance interaction was shown in
a study that examined the effects of variable practice in
learning a criterion handgrip force. Compared with those
who practiced solely to reach the criterion force, partici-
pants who practiced additional handgrip forces per-
formed worse during acquisition at reaching the criterion
force but were more accurate in producing the criterion
force after a delay (C. H. Shea & Kohl, 1991; see also C.
H. Shea & Kohl, 1990).

Verbal learning

One long-standing and widespread piece of advice regu-
larly given to students is to find a quiet location—say, a
favorite corner of the library—and to study there on a
consistent basis. Keeping study conditions constant, it is
thought, benefits learning. However, analogous to find-
ings in the motor domain, studies have shown that induc-
ing variation during study sessions—for example, by
varying the environmental context in which to-be-
remembered material is studied or increasing the varia-
tion of to-be-solved problems—can also benefit verbal
learning. Inducing such variation often has negligible
effects on acquisition performance, or may even impede
it, but it can enhance long-term learning because the
material becomes associated with a greater range of
memory cues that serve to facilitate access to that mate-
rial later. Several studies in the verbal-learning tradition
have demonstrated this empirically.

One study examined whether varying the physical
context, or environment, in which material is studied can
bolster learning when that material is tested in a new con-
text, an issue that remains relevant given that modern
standardized tests (e.g., SAT, GRE) are often administered
in unfamiliar locations. The participants first studied a list
of 40 words. Half the participants studied the list in Room
A, a particular location on the University of Michigan cam-
pus; the other participants studied the list in Room B, a
different location on the Michigan campus. Three hours
later, half of the participants in each group restudied the
words again in the same room, whereas the other partici-
pants studied the list again in the other location. On the
final test, administered 3 hr after the second study session,
all participants were tested on the words in a neutral loca-
tion, Room C. Strikingly, participants who studied in dif-
ferent rooms recalled approximately 21% more of the
words than participants who studied in the same room,
demonstrating the mnemonic benefit of variable practice
(Smith, Glenberg, & Bjork, 1978). Thus, if participants
were tested in a novel location, varying the physical study
environments bolstered learning, a finding that was later
replicated using the same or similar materials (Glenberg,
1979; Smith, 1982). Another study replicated this finding
with more complex learning material by showing that
participants’ 5-day retention of statistical concepts was
better when it occurred after four successive lectures
given in four different locations as opposed to when all of
the lectures were given in the same location. Performance
on short-term retention tests administered immediately
after each statistics lecture, however, was similar for both
groups (Smith & Rothkopf, 1984).

In addition to increasing the variation of the environ-
mental context, long-term learning and transfer but not
necessarily short-term performance can also profit from
increasing the variation of problems during an acquisi-
tion phase. For example, in a study that examined the
effects of variable practice on a task that involved trou-
bleshooting a computer-based simulation of a chemical
process plant, participants produced a pattern of results
indicative of a “transfer paradox.” Specifically, highly vari-
able practice problems, relative to low-variability prob-
lems, induced more performance errors during practice
but had positive effects on learning, as evidenced by the
number of new problems solved on a later test (Van
Merrienboer, de Croock, & Jelsma, 1997). Such encoding
variability has also been shown to enhance analogical
reasoning (Gick & Holyoak, 1983) and geometrical prob-
lem solving (Paas & Van Merrienboer, 1994), as well as
the retention of text material (Mannes & Kintsch, 1987)
and face–name pairs (Smith & Handy, 2014).

[Figure 3 plot omitted. Phases shown: Last 30 Acquisition, Immediate Transfer, Delayed Transfer. Groups: Constant Group, Variable Group.]
Fig. 3. Absolute timing errors in performing a ballistic timing task during the acquisition
phase and on immediate and delayed transfer tests as a function of practice condition
(constant vs. variable). (Note that lower scores represent better performance and learning.)
Data are adapted and approximated from McCracken and Stelmach (1977).

In yet another study that demonstrated learning bene-
fits of variable practice with verbal materials, participants
practiced solving anagrams by either repeatedly solving
the anagram that was tested later (e.g., LDOOF was solved
three times during the practice phase and appeared on
the test) or solving multiple versions of the anagram that
was tested later (e.g., DOLOF, FOLOD, and OOFLD were
practiced and LDOOF appeared on the test). Of interest
was whether solving multiple variants of the anagram—
that is, increasing the variability of the problems—would
enhance participants’ ability to solve the anagrams later.
Indeed, despite participants in the variable practice condi-
tion taking relatively longer to solve the anagrams during
the practice phase, revealing a short-term performance
decrement, they solved relatively more of the anagrams
on a later test (M. K. Goode, Geraci, & Roediger, 2008).
Like several of the results reviewed in the previous sec-
tion on motor learning, this result is counterintuitive
because the variable practice group never attempted to
solve the specific anagram that was later tested, whereas
the other group solved it three times during the practice
phase. Thus, these results conceptually replicate the out-
come of Kerr and Booth’s (1978) motor-learning experi-
ment, in which tossing beanbags at a target from various
nontested distances was better for learning than practic-
ing those tosses from the tested distance.


The long-term retention and transfer of motor skills—
both simple and complex—often profit from the type of
practice that requires one to perform multiple iterations,
rather than a single iteration, of those motor skills, despite
such practice potentially having negligible or even nega-
tive effects on performance during training. The same
has been revealed in verbal-learning experiments that
have increased the variation of to-be-solved problems or
varied the environmental context in which to-be-remem-
bered material was studied. Variable practice, it seems,
broadens one’s familiarity with the general underlying
motor skill or knowledge base needed to successfully
perform a task.

Retrieval Practice

Decades of research suggest that the retrieval processes
triggered by testing actually change the retrieved infor-
mation in important ways. That is, tests act not only as
passive assessments of what is stored in memory (as is
often the traditional perspective in education) but also
as vehicles that modify what is stored in memory. This
section reviews evidence from both the motor- and ver-
bal-learning domains that leads to such a conclusion. In
the motor-skills literature, for example, to-be-learned
movements that are self-produced are typically better
learned than those that are externally guided or simply
observed. Likewise, testing one’s memory for verbal
information, or having participants generate the informa-
tion themselves, enhances long-term retention of that
material compared with reading it over and over, even in
cases when corrective feedback is not provided. A criti-
cal finding, relevant to the learning–performance distinc-
tion, is that conditions of retrieval practice that often
facilitate long-term retention frequently may appear
unhelpful in the short term compared with their counter-
part conditions.

Motor learning

When teaching a motor skill, such as a gymnastics flip or
a golf swing, it is commonplace for instructors to physi-
cally guide the learner through the desired motions.
Intuition suggests that this type of instruction should be
beneficial; indeed, research has shown that guiding
learners reduces performance errors during acquisition
compared with when learners attempt to produce the
skill without guidance (i.e., are encouraged to retrieve
the skill on their own). The problem is that on assess-
ments of long-term learning when guidance can no lon-
ger be relied on, the reverse is often true—that is,
practicing a skill without guidance frequently produces
better learning than does being guided during acquisition
(for a review on guidance research in motor-related tasks,
see Hodges & Campagnaro, 2012). The long-term learn-
ing of motor skills, but not necessarily short-term perfor-
mance, also profits from a test (as opposed to a restudy
opportunity) and when learners are permitted to gener-
ate their own to-be-remembered motor skills (as opposed
to when the skills are chosen for them).

Early research on the effects of guidance (e.g., Melcher,
1934; Waters, 1930) showed that providing physical assis-
tance during the acquisition of simple to-be-learned
movements had positive effects when participants were
subsequently asked to perform those movements on
their own, suggesting that learning profits from initial
guidance. However, the retention intervals in these stud-
ies were particularly short, and thus any claims of long-
term learning were tenuous. It was not until decades later
that the first studies to examine the long-term effects of
guidance emerged. In one such study, participants prac-
ticed a joystick pursuit-tracking task while either being
physically guided by another person or not. The guided
group outperformed the unguided group during training
and on initial short-term performance tests, but on a
retention test administered 6 weeks later, the unguided
group demonstrated better learning than the guided
group. Furthermore, the guided group failed to show bet-
ter retention than a group of participants who had never
performed the task but simply watched (Baker, 1968; see
also Armstrong, 1970).

Subsequent research has replicated and extended the
learning and performance effects of guidance. For exam-
ple, on a task that involved manipulating a lever to vari-
ous positions, a physically guided group performed
better during acquisition (i.e., made fewer performance
errors) but worse after a retention interval, relative to an
unguided group (Winstein, Pohl, & Lewthwaite, 1994).
Likewise, during training of a bimanual coordination task
that involved arm extensions, guided practice prevented
performance errors; however, it also yielded less long-
term learning compared with conditions in which partial
guidance or no guidance was provided (Feijen, Hodges,
& Beek, 2010; see also Tsutsui & Imanaka, 2003). Finally,
in a study that examined whether a harness could serve
as an aid to properly modify the bowling technique
involved in the sport of cricket, it was found that the
restriction applied by the harness improved techniques
in the short term but failed to yield any long-term learn-
ing benefits, compared with when no harness was used
(Wallis, Elliot, & Koh, 2002). Clearly, guidance during
training can have differential effects on learning and
performance.

Another, rather simple way to examine the effects of
retrieval practice on learning and performance is to allow
learners to first observe the to-be-learned skill and then
either test the learners (i.e., require them to reproduce, or
retrieve, the skill on their own) or present the skill again
without the requirement to reproduce it. A subsequently
administered test could then reveal whether retrieval
practice, relative to re-presentation trials, enhances learn-
ing. It is surprising that scant empirical work in the
motor-learning domain has used this sort of method to
better understand the potential benefits of retrieval prac-
tice. Representing a notable exception, one study exam-
ined the effects of retrieval practice on learning an
arm-positioning task. After an initial presentation of to-
be-learned positions, participants either were tested on
the positions several times or were simply re-presented
with them over and over without being tested. Participants
who engaged in retrieval practice showed better long-
term retention of the arm positions than those who sim-
ply observed the positions multiple times. The opposite
was true, however, when performance was assessed dur-
ing acquisition (Hagman, 1983). Subsequent motor-skills
research replicated the long-term learning benefits con-
ferred by this type of retrieval practice (i.e., testing vs.
restudying; Boutin et al., 2012; Boutin, Panzer, & Blandin,

Finally, the learning of motor skills profits from
another form of retrieval practice—namely, permitting
learners to generate their own to-be-learned movements
as opposed to the movements being selected for them.
In one of the earliest and most convincing demonstra-
tions of this preselection effect, participants reproduced
rapid arm movements that were either previously
selected by themselves or imposed by the experimenter.
Retention of the arm movements—in terms of both
rapidity and precision—favored the selection group,
even though no indicators of such long-term learning
could be gleaned from the acquisition phase (Stelmach,
Kelso, & Wallace, 1975). The preselection effect quickly
emerged as one of the most robust and reliable effects in
the motor-learning literature (see also Marteniuk, 1973;
for an early review, see Kelso & Wallace, 1978).

Verbal learning

Similar to the research in the motor-learning domain,
empirical work investigating retrieval practice (or testing)
of verbal material dates back decades, out of which has
emerged the consensus that retrieving information from
memory does more than simply reveal that the informa-
tion exists in memory. In fact, the act of retrieval is a
“memory modifier” (R. A. Bjork, 1975) in the sense that it
renders the successfully retrieved information more
recallable in the future than it would have been other-
wise, a finding that has been termed the testing effect,
which has been demonstrated across the life span using
a wide range of materials and outcome measures (for
reviews, see Carpenter, 2012; Roediger & Butler, 2011;
Roediger & Karpicke, 2006a). In other words, retrieval
practice is itself a potent learning event. In the short term,
however, retrieval practice often appears to fail to confer
any mnemonic benefits compared with conditions in
which the material is restudied instead of tested. We now
consider work on retrieval practice in the verbal-learning
domain that has necessitated the distinction between
learning and performance.

Although the first large-scale studies on the testing
effect can be traced back to Gates (1917) and Spitzer
(1939), it was not until the 1970s that researchers pro-
vided compelling evidence that retrieval practice can
have differential effects on learning and performance. In
one study, for example, participants studied 40 single
words either three times before taking a free-recall test
(SSST) or once before taking three free-recall tests (STTT).
During the fourth phase of this procedure in which both
groups were tested, participants assigned to the SSST
condition showed greater short-term recall performance
than those in the STTT condition—in other words,
repeated studying was better than repeated testing. Long-
term recall assessed 2 days later, however, favored the
STTT condition (Hogan & Kintsch, 1971). These findings
were later replicated and extended in a study that found
that repeated studying led to better recall than repeated
testing after 5 min (50% vs. 28%) but that repeated testing
trumped repeated studying, albeit only slightly, on a
delayed recall test administered 2 days later (25% vs.
23%; Thompson, Wenger, & Bartling, 1978). That same
year, expanded-interval testing schedules were found to
produce better recall of to-be-learned names than equal-
interval testing schedules, but both of these conditions
led to better long-term learning than did a massed-testing
condition, in which several tests were administered in
succession immediately after the presentation of a given
name, a condition that showed nearly errorless perfor-
mance during the acquisition phase (Landauer & Bjork,

In another study that provided a convincing demon-
stration of a learning–performance interaction as it relates
to retrieval practice, one group of participants (repeated
study) studied a 40-word list five consecutive times
(SSSSS), whereas another group (repeated test) studied
the list once before four consecutive recall tests (STTTT).
Final recall tests were then administered to different
groups of participants (from each group) after 5 min or 1
week. The repeated-study group outperformed the
repeated-test group by a large margin on the immediate
(5 min) test, but on the delayed (1 week) test, the oppo-
site pattern was observed—specifically, repeated testing
led to better long-term retention than did repeated study-
ing. It was also clear that testing helped stabilize memory,
as forgetting over time was far more pronounced in the
repeated-study group than the repeated-test group. When
specifically considering 1-week recall as a percentage of
5-min recall, researchers found that repeated studying
and repeated testing were associated with approximately
75% and 30% forgetting, respectively (Wheeler, Ewers, &
Buonanno, 2003).
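
The forgetting measure used above, delayed recall expressed as a percentage of immediate recall, can be made concrete with a small sketch. Because the raw recall scores for this study are not given here, the example instead uses the 5-min and 2-day percentages reported earlier for Thompson, Wenger, and Bartling (1978); the function name is an assumption introduced only for illustration.

```python
def proportional_forgetting(immediate_recall, delayed_recall):
    """Fraction of initially recalled material that is lost by the delayed test."""
    if immediate_recall <= 0:
        raise ValueError("immediate recall must be positive")
    return 1.0 - delayed_recall / immediate_recall

# Recall percentages reported earlier for Thompson, Wenger, & Bartling (1978):
# repeated study: 50% after 5 min, 23% after 2 days
# repeated test:  28% after 5 min, 25% after 2 days
repeated_study = proportional_forgetting(immediate_recall=50, delayed_recall=23)
repeated_test = proportional_forgetting(immediate_recall=28, delayed_recall=25)

print(f"repeated study: {repeated_study:.0%} forgotten")  # prints 54%
print(f"repeated test:  {repeated_test:.0%} forgotten")   # prints 11%
```

On these figures, repeated studying starts higher but loses roughly half of what was initially recalled, whereas repeated testing starts lower and loses comparatively little, which is the same qualitative pattern described for the 1-week data.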

Thus far, we have reviewed studies on the testing
effect that have used relatively simple learning materials
(e.g., single words, word pairs); however, it is also clear
that retrieval practice can have differential effects on
learning and performance when more educationally rel-
evant materials are used. One such study involved par-
ticipants first studying prose passages covering general
topics, such as the sun and sea otters. In one condition,
participants then restudied the passage in its entirety,
whereas in another condition, participants were tested,
without feedback, for their ability to recall the studied
material. Final recall tests were then administered to dif-
ferent groups of participants from each condition after 5
min, 2 days, or 1 week. The results, which are shown in
Figure 4, are clear. After 5 min, participants who restud-
ied the passage showed better recall performance than
did participants who took an intervening test without
feedback. On the delayed retention tests, however, there
was a significant reversal such that the tested group
recalled more of the material after 2 days and 1 week
than the restudy group, a finding that was subsequently
replicated and extended in the study’s second experi-
ment (Roediger & Karpicke, 2006b). What makes the
results of this study (and others like it) particularly
impressive is that no feedback was given to participants
in the tested condition during the initial test, which
means that participants in the test condition were reex-
posed only to the material they were initially able to
recall—approximately 70% of the passage—whereas par-
ticipants in the restudy condition were reexposed to the
entire passage before the final retention tests. Despite
this disadvantage, participants in the tested group
retained more information over the long term.

Research on the generation effect, a closely related
phenomenon to the testing effect, also points to the long-
term learning benefits of retrieval practice (for important
differences between the generation effect and the testing
effect, see Karpicke & Zaromb, 2010). In a typical genera-
tion experiment, participants are asked to either generate
the to-be-learned items themselves—for example, by
producing opposites when presented with a word (e.g.,
hot–???)—or to simply read the items (e.g., long–short). A
later retention test is then administered, which usually
consists of presenting the cues (hot–???, long–???) and
asking participants to recall their corresponding targets
(cold, short). Slamecka and Graf (1978; see also Jacoby,
1978) are often credited as the first to demonstrate that
generating items from semantic memory is better for
learning than simply reading them, a finding that has
been replicated hundreds of times using various materi-
als, procedures, and outcome measures (for a review, see
Bertsch, Pesta, Wiscott, & McDaniel, 2007). For current
purposes, it is important to note that unless participants
can successfully generate every to-be-generated item
during the study phase (which almost never happens),
generated items will always be associated with worse
acquisition performance than read items if a test was
given immediately after each item. This is because, simi-
lar to unsuccessful retrieval attempts in testing-effect
studies, unsuccessful generation attempts prevent expo-
sure to the material that will be tested later. Despite this
short-term performance hindrance, generation still
enhances long-term learning.

[Figure 4 plot omitted. Retention Interval: 5 Minutes, 2 Days, 1 Week. Conditions: Study, Study; Study, Test.]
Fig. 4. Proportion of idea units correctly recalled on immediate (5
min) and delayed (2 days and 1 week) retention tests after participants
studied the passages either twice or once before taking an initial test.
(Note that higher scores represent better performance and learning.)
Error bars represent standard errors of the means. Data are adapted
from Roediger and Karpicke (2006b).

Even more compelling evidence in favor of the learn-
ing–performance distinction comes from research that
has revealed that learning can profit from generation
attempts that are assured to be incorrect during acquisi-
tion, a phenomenon that was demonstrated some time
ago (Kane & Anderson, 1978; Slamecka & Fevreiski,
1983) and is now garnering considerable empirical atten-
tion once again (Grimaldi & Karpicke, 2012; Hays,
Kornell, & Bjork, 2013; Huelser & Metcalfe, 2012; Knight,
Ball, Brewer, DeWitt, & Marsh, 2012; Kornell, Hays, &
Bjork, 2009; Potts & Shanks, 2014; Yan, Yu, Garcia, &
Bjork, 2014).

This resurgence in interest in the potential benefits of
failed generation was spurred by research using a para-
digm in which participants study weakly related word
pairs, some of which are presented intact (e.g., whale–
mammal) for study, whereas the others require that the
participants, on the basis of the cue by itself (e.g., whale–
???), first try to predict the to-be-learned response.
Critically, by choosing weakly related pairs as the materi-
als, experimenters can ensure that participants almost
always fail to guess the correct target. That is, when pre-
sented with whale, participants will almost always gener-
ate something other than mammal (e.g., big, ocean,
blue). Nevertheless, across multiple experimental designs
using this paradigm, failed retrieval attempts prior to
encoding were found to enhance learning (Kornell et al.,
2009). One possible explanation for this effect is that
attempting to predict the to-be-learned response acti-
vates the broad semantic network associated with the
cue word, which, in turn, may facilitate associating the
response to the cue (Grimaldi & Karpicke, 2012; Hays
et  al., 2013; Huelser & Metcalfe, 2012; but see Potts &
Shanks, 2014). More generally, this research indicates—as
counterintuitive as it may seem—that the production of
errors during acquisition can, under some circumstances,
actually boost long-term retention.


Evidence from both the motor- and verbal-learning
domains shows that retrieval practice can have opposing
effects on learning and performance. Motor-learning
studies have revealed that, on the whole, physical guid-
ance often reduces performance errors during training
but that unguided, active involvement promotes better
long-term retention of skills. Likewise, practicing retrieval
of verbal materials may appear unhelpful during acquisi-
tion and on immediate memory tests, but it provides sub-
stantial benefits in preserving or stabilizing long-term
memory. It would seem prudent, therefore, that trainers
and instructors incorporate retrieval practice into their
curriculum and that students test themselves as a means
to optimize their own learning.


To what extent are educators and students aware of what
activities are beneficial for long-lasting learning? In par-
ticular, what does a learner need to know to manage his
or her own self-regulated learning in an optimal way?
These important questions concern metacognition,
which, broadly construed, refers to thinking about think-
ing (see Nelson, 1996). In the domain of learning and
memory, it denotes more specifically (a) one’s knowl-
edge and understanding of how learning and memory
operate and (b) the interplay between the monitoring
and controlling of one’s own ongoing learning and mem-
ory or that of others (for reviews, see R. A. Bjork,
Dunlosky, & Kornell, 2013; Soderstrom, Yue, & Bjork, in
press). Elucidating how people think about and monitor
their own learning is paramount because subjective
experience plays a causal role in determining subsequent
behavior (e.g., deciding what material should be restud-
ied and for what duration), and thus the appropriateness
of such behavior will necessarily depend on the meaning
and validity of learners’ subjective experiences (see
Nelson & Narens, 1990).

Although there is overwhelming empirical evidence
that learning and performance are dissociable, there
appears to be a lack of understanding on the part of
instructors and learners alike that performance during
acquisition is a highly imperfect index of long-term learn-
ing. As a consequence, what is effective for learning is
often misaligned with our metacognitive assessments of
what we think is effective for learning (for a review, see R.
A. Bjork, 1999). This disconnect has been clearly demon-
strated in surveys of students’ beliefs about learning. For
example, one study investigated undergraduates’ aware-
ness of six empirically supported learning strategies, three
of which—spacing versus massing, testing versus restudy-
ing, and generating versus reading—we have discussed in
earlier sections of this review. Overwhelmingly, students
endorsed as most effective those strategies known to
enhance short-term performance, a pattern that was strik-
ingly evident when students were confronted with choos-
ing between massed or spaced study: 93.33% of surveyed
students incorrectly endorsed massed study as being
more effective for learning than spaced study (J. McCabe,
2011). From a research perspective, this is quite remark-
able (and alarming) considering that the spacing effect
has been demonstrated hundreds of times in the past cen-
tury and has emerged as one of the most robust and reli-
able effects in all of memory research. Fortunately, this
same study also found that educational interventions—for
example, a cognition course or targeted instruction on
effective learning techniques—helped ameliorate these
misconceptions.

Other surveys have investigated how students study on
their own. For example, a survey of 472 college students
found that most students reported using a rereading strat-
egy. Additionally, although 90% of students reported using
self-testing, most of them reported doing so in order to identify
gaps in their knowledge, rather than because they
believed that self-testing conferred a direct learning ben-
efit. Moreover, 64% of students reported not revisiting
material once they felt like they had mastered it, while
only 36% of students reported that they would restudy or
test themselves later on that information (Kornell & Bjork,
2007; for similar results, see Hartwig & Dunlosky, 2012).
Another survey showed that students are generally
unaware of the benefits of retrieval practice compared
with rereading. When asked to report and rank their own
study strategies, 84% of students ranked rereading as one
of their strategies of choice, whereas only 11% of students
reported using retrieval practice at all (Karpicke, Butler, &
Roediger, 2009). It was argued that students prefer reread-
ing because it produces a heightened sense of fluency or
familiarity with the material, which students misinterpret
as an index of learning. In other words, students seem to
favor rereading because it leads to relatively greater per-
ceived gains in performance.

Similar illusions of competence have been demon-
strated in research that has examined how and to what
degree of accuracy people can monitor or evaluate their
own learning. Such experimental research has used both
retrospective and prospective judgments. With respect to
retrospective judgments—that is, subjective evaluations of
learning that require the learner to assess some past expe-
rience—people often erroneously endorse relatively inef-
fective conditions of learning. In Baddeley and Longman’s
(1978) study involving British postal workers, for exam-
ple, distributing practice was better than massing it for the
long-term retention of data entry (i.e., keystroke) skills;
however, learners in the distributed-practice group
reported being relatively less satisfied with their training
because they felt they were falling behind the massed-
practice group, which, in fact, was true during the acquisi-
tion phase. Thus, learners appear to interpret short-term
performance as a reliable guide to long-term learning.

Such biased retrospective judgments have also been
shown after tests of inductive learning. In one study, for
example, participants learned artists’ painting styles
according to a study schedule that was either massed
(blocked)—that is, every painting from an artist was pre-
sented successively before moving on to the paintings
from a new artist—or spaced (interleaved)—that is, paint-
ings from several artists were mixed together. On a final
induction test, participants were presented with new
paintings and were asked to identify which of the previ-
ously studied artists painted them. Such a test is consid-
ered a test of inductive learning because success on such
a test requires one to have extracted the artists’ general
painting styles from sets of exemplars. The results clearly
showed that inductive learning was enhanced by the
spaced study schedule compared with the massed sched-
ule. It is interesting to note, however, that when asked
after the induction test which study schedule helped them
learn better, an overwhelming majority of participants
endorsed massing (Kornell & Bjork, 2008). This finding is
all the more remarkable given that participants had
already experienced the test in which their learning prof-
ited from interleaving. Subsequent research on inductive
learning has demonstrated that learners, when permitted
to choose their own study schedule, also prefer massing
(Tauber, Dunlosky, Rawson, Wahlheim, & Jacoby, 2013).
Thus, not only does massing produce a sense of fluency
during acquisition performance that is misinterpreted as
learning, but learners also seem to hold the misguided the-
ory that massing one’s study is an effective way to learn.

Prospective metacognitive judgments, like retrospec-
tive ones, can also be heavily influenced by short-term
performance. For current purposes, the most relevant
prospective judgment—and the one that garners the most
empirical attention in contemporary metacognitive
research—is the judgment of learning (JOL), which is
typically solicited during an acquisition, or encoding,
phase. Here, learners are asked to predict—usually on a
0%–100% scale—the likelihood that some information
will be remembered later. In other words, JOLs involve
learners predicting their own learning. Collecting such
predictions permits an examination of how learners
decide which information has been learned and which
has not and how well those predictions correspond to
actual learning on a later test. Although some early work
on verbal learning showed that JOLs predicted actual
learning relatively well (e.g., Arbuckle & Cuddy, 1969),
the more recent JOL literature in this domain is rife with
examples in which people’s immediate JOLs are not diag-
nostic of future learning (e.g., Benjamin, Bjork, &
Schwartz, 1998; Castel, McCabe, & Roediger, 2007; Koriat
& Bjork, 2005; Koriat, Bjork, Sheffer, & Bar, 2004; Mazzoni
& Nelson, 1995; Rhodes & Castel, 2008; Roediger &
Karpicke, 2006b; Soderstrom & McCabe, 2011; Yue,
Castel, & Bjork, 2013), revealing striking illusions of com-
petence and compelling evidence that JOLs are inferen-
tial in nature, based on cues rather than on memory
strength (see Koriat, 1997). We now discuss several
examples in which learners’ JOLs have exposed miscon-
ceptions about learning (for a more comprehensive
review, see Schwartz & Efklides, 2012).
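
The correspondence between JOLs and later recall described above can be examined in two ways: whether predictions are too high or too low on average, and whether higher predictions go to the items that are in fact remembered. The following is a minimal sketch of both, not the analysis used in any of the studies cited. The function names and the example values are hypothetical, and the Goodman-Kruskal gamma is used here on the assumption that a simple rank-based measure of relative accuracy is wanted.

```python
from itertools import combinations


def calibration_bias(jols, recalled):
    """Mean JOL (rescaled to 0-1) minus the proportion actually recalled.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    mean_jol = sum(jols) / len(jols) / 100.0
    proportion_recalled = sum(recalled) / len(recalled)
    return mean_jol - proportion_recalled


def goodman_kruskal_gamma(jols, recalled):
    """Relative accuracy: do higher JOLs go to the items that are later recalled?

    Returns a value from -1 to +1, or None when every pair of items is tied.
    """
    concordant = discordant = 0
    for (jol_a, rec_a), (jol_b, rec_b) in combinations(zip(jols, recalled), 2):
        direction = (jol_a - jol_b) * (rec_a - rec_b)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical learner (not data from any study): JOLs on a 0-100 scale,
# later recall scored 1 (recalled) or 0 (not recalled).
example_jols = [90, 80, 60, 40]
example_recalled = [0, 0, 1, 1]

print(round(calibration_bias(example_jols, example_recalled), 3))
print(goodman_kruskal_gamma(example_jols, example_recalled))
```

For this hypothetical learner the bias is 0.175 (overconfidence) and gamma is -1.0, the kind of pattern that would arise if the items judged easiest were the ones later forgotten.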

Generally speaking, learners tend to be overconfident
in predicting their own learning and regularly exhibit
what has been termed the stability bias, which refers to
the tendency to believe that current accessibility of
retrieved information (i.e., performance) will remain sta-
ble across time, rather than appreciating those factors
that may impair or enhance later learning (Kornell &
Bjork, 2009; see also Ariel, Hines, & Hertzog, 2014;
Kornell, 2011). In a study that demonstrated a particularly
striking example of a stability bias, participants studied
related and unrelated word pairs, making JOLs after each
item. Separate groups of participants were asked to base
their predictions on how well they would remember the
pairs on an immediate test, a test after 1 day, or a test
after 1 week. As illustrated in Figure 5, participants pro-
duced a pattern of results demonstrating apparent insen-
sitivity to retention interval—specifically, equivalent JOLs
were given across the three retention intervals. However,
and as expected, actual recall decreased as a function of
retention interval. Also evident in Figure 5 is that JOLs
were highly sensitive to the relatedness of the word
pairs, which led the authors to conclude that encoding
fluency, or how easily information is processed during
study, can largely drive JOLs (Koriat et al., 2004; see also
Begg, Duft, Lalonde, Melnick, & Sanvito, 1989; Koriat,
2008; Undorf & Erdfelder, 2011; for an alternative account,
see Mueller, Tauber, & Dunlosky, 2013), even at the
expense of other extremely relevant information—in this
case, retention interval.

Other research supports the conjecture that retrieval
fluency (i.e., how easily to-be-remembered information
is retrieved during an acquisition phase), like encoding
fluency, can also influence JOL magnitude (see, e.g.,
Benjamin et al., 1998; Hertzog, Dunlosky, Robinson, &
Kidder, 2003). In one demonstration of this, participants
answered relatively easy general-knowledge questions,
after which time they predicted, on an item-by-item basis,
the likelihood that they would be able to recall a given
answer on a later free-recall test—that is, the likelihood
they would be able to recall having given a particular
answer without the question being provided again. The
results indicated that answers that took the shortest time
to generate were given higher JOLs compared with those
answers that were generated slowly. In other words, par-
ticipants based their JOLs on short-term performance—in
this case, retrieval latency. Later recall, however, showed
the opposite pattern: Answers that took a longer time to
generate were recalled at a higher rate than were answers
generated more quickly, presumably because the effort
involved in generating an answer is positively related
to its subsequent recall (Benjamin et al., 1998). Thus,
while retrieval fluency was related to both JOLs and later
recall, the direction of this relationship differed depending on whether
it was assessed subjectively (via JOLs) or objectively (via
final recall), demonstrating, among other things, that
learners are captured by gains in short-term performance
and can mistakenly conflate such gains with long-term learning.

The benefits of retrieval, more generally, are not
appreciated by learners either (see, e.g., Karpicke, 2009;
Kornell & Son, 2009). As discussed previously, the testing
effect refers to the finding that retrieval practice acts as a
learning event, rendering retrieved information more
recallable in the future than it would have been other-
wise (see Roediger & Karpicke, 2006a). In Roediger and
Karpicke’s (2006b) study in which participants studied
prose passages and then were either tested on those pas-
sages or restudied them, long-term retention, measured 1
week after the study phase, increased as a function of
testing opportunities during acquisition. However, par-
ticipants predicted the opposite pattern—specifically,
that learning after 1 week would be best when the pas-
sages were studied multiple times without being tested, a
pattern of performance that was, in fact, demonstrated in
the short term (after 5 min). Again, learners seem to
assume that whatever boosts performance will also profit
long-term retention.

[Figure 5: x-axis, Retention Interval (Immed, Day, Week); legend, Related Pairs and Unrelated Pairs.]
Fig. 5. Mean judgment of learning (JOL) and recall as a function of
retention interval (immediate [Immed], 1 day, or 1 week) for related
and unrelated word pairs. (Note that higher scores represent elevated
predictions [JOLs] and better performance and learning [recall].) Error
bars represent 95% confidence intervals. Data are adapted from Koriat,
Bjork, Sheffer, and Bar (2004).

Finally, as is the case with retrospective judgments,
higher JOLs are given to material or skills that are studied
or practiced in a massed (blocked) schedule compared
with a spaced (distributed) schedule. In one example, par-
ticipants were presented with a list of to-be-remembered
words. Within the list, a second repetition of each item
occurred either immediately after its first presentation
(massed) or following a number of other items (spaced).
Participants predicted that the massed items would be bet-
ter remembered than the spaced items, whereas actual
recall showed the opposite pattern (Zechmeister &
Shaughnessy, 1980). An analogous result was subsequently
found for the learning of simple keystroke patterns: Despite
the fact that distributed practice led to relatively greater
long-term gains in learning the keystrokes, participants pre-
dicted that blocked practice—which did boost short-term
performance—would be better after a delay (Simon &
Bjork, 2001). This mismatch between JOLs and actual learn-
ing—that people’s JOLs favor massed practice, whereas
actual learning profits more from distributed practice—has
also been replicated for the learning of piano melodies
(Abushanab & Bishara, 2013).

Given that people are generally unaware of what
activities are beneficial for long-term retention and that
learners, by and large, have trouble accurately monitor-
ing their own ongoing learning, it is important to identify
ways to foster metacognitive sophistication in order to
optimize self-regulated learning (see R. A. Bjork et al.,
2013). Instructors and students, for example, need to
become familiar with the types of learning strategies that
promote long-term learning—some of which we have
already discussed in the present review—before we can
expect the use of such strategies to be encouraged by
teachers or adopted by their pupils (for a review of the
utility of various study strategies, see Dunlosky, Rawson,
Marsh, Nathan, & Willingham, 2013). As well, research in
metacognition should endeavor to find methods of
improving people’s monitoring capabilities such that
learners become accurate forecasters and, as a result,
effective managers, of their own ongoing learning.
Fortunately, the number of studies on this topic is mount-
ing (e.g., Castel, 2008; DeWinstanley & Bjork, 2004; Koriat
& Bjork, 2006; D. P. McCabe & Soderstrom, 2011; Nelson
& Dunlosky, 1991; Soderstrom & Bjork, 2014; Soderstrom
& Rhodes, 2014; Thiede & Anderson, 2003; Tullis, Finley,
& Benjamin, 2013), a trend that we hope continues given
the importance of such work.


Both survey and experimental research in metacognition
have revealed that learners often mistakenly conflate
short-term performance with long-term learning, ostensi-
bly thinking, “If it’s helping me now, it will help me later.”
The extant survey literature on beliefs about learning
suggests that students, by and large, endorse and use
strategies that may confer short-term performance gains
but do not foster long-term learning. Likewise, research
that has examined how people monitor their own ongo-
ing learning has revealed that both retrospective and pro-
spective judgments are heavily influenced by acquisition
factors, a bias that often produces striking illusions of
competence. Given that people act on their subjective
experiences, it is imperative that people learn how to
learn by becoming knowledgeable about what effective
learning entails. It is important, too, that such metacogni-
tive sophistication is fostered early on in one’s education.

Contemporary Theoretical Perspectives

As discussed earlier, learning theorists from decades ago
(e.g., Estes, 1955a; Guthrie, 1952; Hull, 1943; Skinner,
1938; Tolman, 1932; Tulving & Pearlstone, 1966) used
terms in their own theories that distinguished between
learning and performance. Given the early research on
latent learning, overlearning, and fatigue, this distinction
was necessary. To account for more recent empirical
work in the motor- and verbal-learning domains, contem-
porary learning theorists also differentiate between the
relatively permanent changes in behavior and knowledge
that characterize long-term learning and the temporary
fluctuations in performance that occur across the training
or acquisition process. Although we briefly mentioned
possible explanations of several of the various empirical
findings reported in previous sections of this review, we
now discuss in more detail the dominant contemporary
learning theories that address the distinction between
learning and performance.

R. A. Bjork and Bjork (1992), in an attempt to formu-
late an account of a wide range of fundamental human
learning phenomena, resurrected the learning–perfor-
mance distinction in their new theory of disuse by intro-
ducing the terms storage strength and retrieval strength.
Storage strength refers to the degree to which memory
representations (i.e., knowledge and procedures) are
integrated or entrenched with other memory representa-
tions, whereas retrieval strength represents the current
ease of access or activation of those memory representa-
tions given current mental and environmental cues.
Current performance, which can be observed, is indexed
by retrieval strength, whereas long-term learning is
indexed by storage strength, which acts as a latent vari-
able by enhancing the gain of retrieval strength during
opportunities for study or practice and impeding the loss
of retrieval strength across time and intervening or inter-
fering events. Furthermore, storage capacity, unlike
retrieval capacity, is assumed to be limitless and, once
accumulated, never lost. This latter assumption—that the
storage strength of memories is permanent—distinguishes
Bjork and Bjork’s new theory of disuse from
Thorndike’s (1914) original law of disuse, which asserted
that memories, without continued use, will decay over
time and can eventually disappear entirely.

According to the new theory of disuse, gains in stor-
age strength are expressed as a negatively accelerated
function of current retrieval strength—that is, the more
accessible representations are in memory when study or
test events occur, the smaller the gains in storage strength that can
be achieved for those representations. Consequently,
conditions that increase current retrieval strength might
benefit performance in the short term but will fail to
produce the type of permanent changes that character-
ize long-term learning. In contrast, situations that reduce
current retrieval strength (i.e., produce forgetting)—for
example, distributing study or practice sessions (as
opposed to massing them), varying the conditions of
learning (as opposed to keeping them constant), and
encouraging retrieval practice (as opposed to restudy)—
yield relatively greater gains in storage strength and thus
lead to enhanced long-term retention and transfer. As
argued by R. A. Bjork (2011), this interplay between
retrieval strength and storage strength—namely, that for-
getting can foster learning—is adaptive, yet counterintui-
tive, and has broad implications for treatment (see R. A.
Bjork & Bjork, 2006; Lang, Craske, & Bjork, 1999) as well
as training.
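
The interplay between the two strengths can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the formal model of R. A. Bjork and Bjork (1992): the update rules, the parameter values, and the function names are all hypothetical and were chosen only to respect the qualitative claims above (storage gains shrink as current retrieval strength rises, storage strength is never lost, and storage strength slows the loss of retrieval strength).

```python
import math


def study(storage, retrieval):
    """One study or practice event (assumed toy update rules)."""
    # Assumption: the storage gain shrinks as current retrieval strength grows.
    storage += 1.0 - retrieval
    # Assumption: the retrieval gain grows with accumulated storage strength.
    retrieval += (1.0 - retrieval) * storage / (1.0 + storage)
    return storage, retrieval


def forget(storage, retrieval, elapsed, decay=0.5):
    """Retrieval strength fades with time, more slowly when storage is high.

    Storage strength is left untouched because it is assumed never to be lost.
    """
    return retrieval * math.exp(-decay * elapsed / (1.0 + storage))


def simulate(gap, n_events=4, retention_interval=10.0):
    """Run n_events study events separated by `gap` time units."""
    storage, retrieval = 0.0, 0.0
    for event in range(n_events):
        if event > 0:
            retrieval = forget(storage, retrieval, gap)
        storage, retrieval = study(storage, retrieval)
    return {
        "storage": storage,                # latent index of learning
        "immediate": retrieval,            # performance at end of acquisition
        "delayed": forget(storage, retrieval, retention_interval),
    }


massed = simulate(gap=0.0)   # events back to back
spaced = simulate(gap=4.0)   # forgetting allowed between events

print("massed:", massed)
print("spaced:", spaced)
```

With these assumed parameters, the massed schedule ends acquisition with the higher retrieval strength, whereas the spaced schedule accumulates more storage strength and shows the higher retrieval strength after the retention interval. Other parameter choices would change the magnitudes, and the sketch makes no claim about the sizes of real effects.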

From a formal-modeling standpoint, the new theory of
disuse shares a number of properties with contextual-
fluctuation models (see, e.g., Mensink & Raaijmakers,
1988, whose model traces back to the influential stimu-
lus-fluctuation model proposed by Estes, 1955a, 1955b).
The basic idea is that the performing–learning organism
is heavily influenced by current cues, which gradually
change or fluctuate as time and events go on and differ-
ent aspects of the external and internal environments are
“sampled.” When cues are not changing, or are changing
slowly, as in massed practice, for example, performance
will increase rapidly, but forgetting will be rapid as well,
as cues change across a retention interval. As contextual
variation across acquisition trials either is introduced or
occurs spontaneously, performance will improve more
slowly, but more total cues will become associated with
to-be-learned responses, which will enhance learning, as
measured after a delay or in an altered context. Basically,
to borrow Estes’s initial language, response strength (performance)
is indexed by how strongly some target
response is associated with the current cues, whereas habit strength
(learning) is indexed by how strongly some target response
is associated with the whole range of cues that characterize
some task and situation.
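
The logic of cue fluctuation can be sketched with a deterministic, expected-value simplification. This is not the model of Estes (1955a, 1955b) or of Mensink and Raaijmakers (1988); the mixing rule, the parameter values, and the function names are assumptions made for illustration. The quantity `active` is the proportion of currently available cues that are associated with the target response, and `inactive` is the same proportion among cues that are currently unavailable.

```python
import math


def study_trial(active, rate=0.5):
    """Associate a fixed share of the not-yet-associated active cues."""
    return active + rate * (1.0 - active)


def fluctuate(active, inactive, elapsed, share_active=0.2, drift=0.5):
    """Let cues drift between the active and inactive sets over `elapsed` time.

    Both sets move toward the overall proportion of associated cues,
    which the exchange itself leaves unchanged.
    """
    overall = share_active * active + (1.0 - share_active) * inactive
    mix = 1.0 - math.exp(-drift * elapsed)
    return (active + mix * (overall - active),
            inactive + mix * (overall - inactive))


def run_schedule(gap, n_trials=4, retention_interval=20.0, share_active=0.2):
    """Run n_trials study trials separated by `gap` time units."""
    active, inactive = 0.0, 0.0
    for trial in range(n_trials):
        if trial > 0:
            active, inactive = fluctuate(active, inactive, gap, share_active)
        active = study_trial(active)
    habit = share_active * active + (1.0 - share_active) * inactive
    delayed, _ = fluctuate(active, inactive, retention_interval, share_active)
    return {
        "response_strength": active,           # association to current cues
        "habit_strength": habit,               # association to all cues
        "delayed_response_strength": delayed,  # after cues have drifted
    }


massed = run_schedule(gap=0.0)   # cues do not change between trials
spaced = run_schedule(gap=2.0)   # cues drift between trials

print("massed:", massed)
print("spaced:", spaced)
```

Under these assumptions, massed trials drive response strength up quickly because the same cues are sampled repeatedly, but few cues overall become associated, so response strength falls sharply once the cues have drifted. Spaced trials raise response strength more slowly while associating a larger share of the whole cue population.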

In the motor skills literature, specifically, the schema
theory of motor control and the reloading hypothesis
offer highly cited explanations for the learning and per-
formance effects produced by variable and distributed
practice, respectively. Originally postulated by Schmidt
(1975), the schema theory of motor control claims that
variable practice—that is, practicing iterations of a skill
that are related to but different from the target skill—fos-
ters long-term learning because it sensitizes one to the
general motor program, or schema, underlying a skill
(see also Schmidt, 2003). To flesh out this notion, con-
sider that discrete motor skills (e.g., shooting a basket-
ball, serving a tennis ball, swinging a golf club) involve
the coordination and implementation of classes of sim-
pler movements, each associated with unique parame-
ters, such as its timing, speed, and force. In order to
successfully reconstruct the parameters of the move-
ments required to execute a given skill, learners need to
become familiar with how the various rules that govern
one class of movements are related to the rules that gov-
ern the other relevant classes of movements and how
such interdependencies affect outcomes. An effective
way of doing this, according to schema theory, is to
increase the variation of the practiced skill such that one
is required to learn how to adjust the necessary move-
ment parameters to achieve desired goals. As we have
already discussed in this review, practice variability, while
having the potential to induce more errors during acqui-
sition compared with fixed practice conditions, often
leads to substantial gains in long-term retention and transfer.

In terms of the learning benefits associated with distrib-
uted practice, the reloading hypothesis asserts that spacing
out practice sessions with time or other activities encour-
ages the “reloading,” or reproducing, of the motor pro-
grams needed to execute to-be-learned skills (Lee & Magill,
1983, 1985). This is because the spacing inserted between
practice sessions results in a temporary loss of access to
the relevant motor commands. The effortful processing
required to reload the commands during distributed prac-
tice appears to facilitate learning but impede short-term
performance, compared with blocked (massed) practice in
which skills are performed over and over again.

Last, the general idea that what can hurt performance
can help learning is captured in the desirable difficulties
framework (R. A. Bjork, 1994; see also E. L. Bjork & Bjork,
2011; R. A. Bjork, 2013). Manipulations such as distributed
practice, variable practice, and retrieval practice are “desir-
able” because they support better long-term retention and
transfer compared with their counterpart conditions. Such
effective learning manipulations are also “difficult,” how-
ever, in the sense that they can degrade performance dur-
ing acquisition or training and, consequently, are likely to
be interpreted as ineffective by instructors and students
alike. As unintuitive as it may seem, the active cognitive
processes engendered by confronting and resolving difficulties
during acquisition serve to effectively link or
entrench new information with knowledge that already
exists in memory. Furthermore, given that these active
processes are also likely recruited during later assess-
ments of long-term retention, the notion of desirable
difficulties generally accords with the principle of trans-
fer-appropriate processing (Morris, Bransford, & Franks,
1977) and the related encoding specificity principle
(Tulving & Thomson, 1973), both of which contend that
memory will improve to the extent that the engaged study
and test processes overlap. It is important to note, how-
ever, that when the difficulties cannot be overcome by the
learner—for example, when previously encountered infor-
mation cannot be successfully retrieved during retrieval
practice—they become undesirable (see McDaniel &
Butler, 2010). Thus, an ongoing challenge for researchers
has been to identify when difficulties are desirable for
learning and when they are not, so as to appropriately
inform the instructional practices of instructors and the
study behaviors of students.


Several current theoretical perspectives make the crucial
distinction between short-term performance and long-
term learning. According to the new theory of disuse,
dissociable effects of learning and performance arise as a
result of the adaptive interplay between storage strength—
the extent to which new and prior knowledge is
integrated—and retrieval strength—the relative ease with
which information can be accessed. The schema theory
of motor control claims that variable practice promotes
long-term retention and transfer by familiarizing learners
with the general motor programs that underlie motor
skills. Likewise, the reloading hypothesis asserts that dis-
tributed practice encourages learners to reload or repro-
duce the to-be-learned motor skills during acquisition,
which is a potent learning event, despite appearing not
to be during acquisition. Finally, the desirable difficulties
framework proposes that manipulations that appear to
be difficult—both objectively and subjectively—during
acquisition or training can be desirable for long-term
retention and transfer because they engender active
encoding processes.

Concluding Comments

We have provided the first integrative review of the over-
whelming empirical evidence that necessitates the critical
distinction between learning—the relatively permanent
changes in behavior or knowledge that support long-term
retention and transfer—and performance—the temporary
fluctuations in behavior or knowledge that are observed
and measured during training or instruction or immediately
thereafter. Dating back nearly a century, early
research on latent learning, overlearning, and fatigue pro-
vided the first insights into the learning–performance dis-
tinction by showing that substantial learning could occur
in the absence of any discernible changes in performance.
This work—conducted with both humans and nonhuman
animals—compelled learning theorists at that time to
make corresponding conceptual distinctions in their own
theories of learning and memory. More recent research in
the motor-skills and verbal-learning literatures has demonstrated
the converse to also be true—specifically, that
changes in short-term performance often bear no rela-
tionship to long-term learning. In fact, the results of vari-
ous studies on distributed practice, variable practice, and
retrieval practice suggest that learning and performance
can be at odds, such that conditions that appear to
degrade acquisition performance are often the very con-
ditions that yield the most durable and flexible learning.
Finally, research in metacognition suggests that fleeting
gains during acquisition are likely to fool instructors and
students into thinking that permanent learning has taken
place, creating powerful illusions of competence.

That learning and performance are dissociable has
widespread implications for theory, research, and prac-
tice. Any present (or future) comprehensive theory of
learning and memory needs to distinguish, in some way,
between the relatively permanent changes in behavior
and knowledge that characterize long-term learning and
transfer and the momentary changes in performance that
occur during the acquisition of such behavior and knowl-
edge. Likewise, researchers interested in elucidating fac-
tors that optimize learning should be cognizant of the
possibility that the effects of manipulating a given vari-
able might very well interact with retention interval—in
other words, the variable might have differential influ-
ences on learning and performance. As such, we recom-
mend that experimenters include both short- and
long-term measures in their studies.

Finally, given that the goal of instruction and practice—
whether in the classroom or on the field—should be to
facilitate learning, instructors and students need to appre-
ciate the distinction between learning and performance
and understand that expediting acquisition performance
today does not necessarily translate into the type of learn-
ing that will be evident tomorrow. On the contrary, condi-
tions that slow or induce more errors during instruction
often lead to better long-term learning outcomes, and
thus instructors and students, however disinclined to do
so, should consider abandoning the path of least resis-
tance with respect to their own teaching and study strate-
gies. After all, educational interventions should be based
on evidence, not on historical use or intuition.

James S. McDonnell Foundation Grant 29192G supported the
writing of this review. We thank Henry L. Roediger III for his
insightful comments on an earlier version of this article.

Declaration of Conflicting Interests

The authors declared no conflicts of interest with respect to the
authorship or the publication of this article.


