Literature review on behavior analysis
I need a literature review for a capstone project. I will provide articles and an outline with the purpose and hypothesis to work with. Feel free to add any more sources that you think will fit with this literature review.
Valeria Castano
Literature Review Outline
Introduction
· Extension to Carnett et al. (2014) study
· Study the effects of perseverative interest-based token economies on on-task behavior
· Research is warranted to find the most effective method for increasing on-task behavior.
· Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.
Body
Token Economies
· Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The token economy: A recent review and evaluation. International Journal of Basic and Applied Science, 2(1), 131-149.
· Hine, J. F., Ardoin, S. P., & Call, N. A. (2018). Token economies: Using basic experimental research to guide practical applications. Journal of Contemporary Psychotherapy, 48(3), 145-154.
· Williamson, R. L., & McFadzen, C. (2020). Evaluating the Impact of Token Economy Methods on Student On-task Behaviour within an Inclusive Canadian Classroom. International Journal of Technology and Inclusive Education (IJTIE), 9(1), 1531-1541.
· Boniecki, K. A., & Moore, S. (2003). Breaking the Silence: Using a Token Economy to Reinforce Classroom Participation. Teaching of Psychology, 30(3), 224.
Interests
· Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.
· Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for children with autism. Journal of Autism and Developmental Disorders, 28(3), 189-198.
· Hirst, E. S. J., Dozier, C. L., & Payne, S. W. (2016). Efficacy of and preference for reinforcement and response cost in token economies. Journal of Applied Behavior Analysis, 49(2), 329.
· Soares, D. A., Harrison, J. R., Vannest, K. J., & McClelland, S. S. (2016). Effect Size for Token Economy Use in Contemporary Classroom Settings: A Meta-Analysis of Single-Case Research. School Psychology Review, 45(4), 379–399.
Conclusion
· Main findings: Studies have shown that including objects of interest in a client's token board increases engagement.
· This study replicates past research conducted by Carnett et al. (2014) but also extends it by using multiple participants.
· The purpose of this capstone is to extend the work of Carnett et al. (2014) and Charlop-Christy and Haymes (1998) by comparing the effects on on-task behavior of a token economy that includes a child's perseverative interest with one that does not.
· I hypothesize that incorporating the client's perseverative interest into the token economy will increase on-task behavior.
References:
Boniecki, K. A., & Moore, S. (2003). Breaking the Silence: Using a Token Economy to Reinforce Classroom Participation. Teaching of Psychology, 30(3), 224.
Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.
Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for children with autism. Journal of Autism and Developmental Disorders, 28(3), 189-198.
Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The token economy: A recent review and evaluation. International Journal of Basic and Applied Science, 2(1), 131-149.
Hine, J. F., Ardoin, S. P., & Call, N. A. (2018). Token economies: Using basic experimental research to guide practical applications. Journal of Contemporary Psychotherapy, 48(3), 145-154.
Hirst, E. S. J., Dozier, C. L., & Payne, S. W. (2016). Efficacy of and preference for reinforcement and response cost in token economies. Journal of Applied Behavior Analysis, 49(2), 329.
Soares, D. A., Harrison, J. R., Vannest, K. J., & McClelland, S. S. (2016). Effect Size for Token Economy Use in Contemporary Classroom Settings: A Meta-Analysis of Single-Case Research. School Psychology Review, 45(4), 379–399.
Williamson, R. L., & McFadzen, C. (2020). Evaluating the Impact of Token Economy Methods on Student On-task Behaviour within an Inclusive Canadian Classroom. International Journal of Technology and Inclusive Education (IJTIE), 9(1), 1531-1541.
Breaking the Silence: Using a Token Economy
to Reinforce Classroom Participation
Kurt A. Boniecki
Stacy Moore
University of Central Arkansas
We propose a procedure for increasing student participation, par-
ticularly in large classes. The procedure establishes a token econ-
omy in which students earn tokens for participation and then
exchange those tokens for extra credit. We evaluated the effective-
ness of the procedure by recording the degree of participation in an
introductory psychology class before, during, and after implemen-
tation of the token economy. Results revealed that the amount of di-
rected and nondirected participation increased during the token
economy and returned to baseline after removal of the token econ-
omy. Furthermore, students responded faster to questions from the
instructor during the token economy than during baseline, and this
decrease in response latency continued even after removal of the to-
ken economy.
A considerable literature attests to the importance of ac-
tive learning in which students engage and process course
material rather than passively receive it (e.g., Benjamin,
1991; Bligh, 2000; Bonwell & Eison, 1991). One way instruc-
tors can facilitate active learning is to challenge the class pe-
riodically with relevant questions and encourage students to
offer questions and comments. However, instructors may
avoid this form of classroom interaction because of a phe-
nomenon we call “the silence,” the uncomfortable time fol-
lowing the instructor’s question when no one responds. The
silence is a particular problem in large classes in which stu-
dents feel relatively anonymous and are reluctant to partici-
pate (McKeachie, 2002). Instructors can use a variety of
techniques to combat the silence, such as waiting out the si-
lence (Kendall, 1994), calling on students by name (Gurung,
2002), or initiating small group discussions (McKeachie,
2002). In this article, we present another method for break-
ing the silence that is effective and easy to use, particularly in
large classes.
Our method relies on extra credit to reinforce participation.
Other faculty have used extra credit as an incentive to improve
exam performance (Junn, 1995; Nation & Bourgeois, 1978),
read journal articles (Carkenord, 1994), seek writing assis-
tance (Oley, 1992), demonstrate critical thinking (Junn,
1994), improve behavior modification projects (Barton,
1982), and avoid procrastination (Lloyd & Zylla, 1981;
Powers, Edwards, & Hoehle, 1973). Our method creates a to-
ken economy in which students earn tokens for participation.
Immediately following participation, the instructor presents a
token to the student. At the end of class, students exchange
their tokens for extra credit toward their course grades.
Hodge and Nelson (1991) also used reinforcement to
shape classroom participation. In their study, the instructor
wrote students’ initials on the board and placed plus marks
next to the initials of students who exhibited the desired
amount of participation. Although similar to our method,
Hodge and Nelson’s procedure differs from ours in several
ways. For instance, their procedure is feasible only in small
classes, whereas our method is relatively easy to use in classes
of almost any size. Indeed, the first author has successfully
used our method in classrooms that seat as many as 200 stu-
dents. Also, Hodge and Nelson evaluated the effectiveness of
their technique based on students’ self-reported participa-
tion. In contrast, we evaluated the effectiveness of our
method more objectively by having a research assistant ob-
serve the degree of student participation prior, during, and af-
ter the token economy.
Method
Participants
Sixty-three undergraduate students enrolled in an intro-
ductory psychology course at the University of Central Ar-
kansas participated in the study.
Procedure
The class met 75 min twice weekly for 16 weeks. We con-
ducted the study over the final 11 class meetings of the term.
During each of these 11 class meetings, the instructor period-
ically directed relevant questions to the class, and students
who wanted to answer the questions raised their hands. The
instructor then called on students in the order in which they
raised their hands until a student answered the question cor-
rectly. If no one raised a hand within 60 sec following a ques-
tion, the instructor announced the answer and continued
with the lecture.
The first 4 of the 11 class meetings served as the baseline
period. During this time, students did not receive any explicit
reward for answering a question correctly. Over the next 4
class meetings, the instructor implemented the token econ-
omy. The instructor announced that the first person to an-
swer a question correctly would receive a token. The tokens
were wooden checker pieces purchased from a local hobby
store. The pieces were heavy enough to throw, but light
224 Teaching of Psychology
enough not to cause injury if they missed their target. At the
end of each class meeting, students could exchange each to-
ken for one point added to their next exam grade. Each exam
point was worth 0.25% of the course grade. If students did
not turn in their tokens at the end of the class meeting, those
tokens were void, and students could not exchange them for
extra credit in the future. This rule ensured that the instruc-
tor had to keep a supply of tokens for only one class meeting
and avoided claims of lost tokens. During the final 3 class
meetings, the instructor discontinued the token economy
and informed students that they could no longer earn tokens
for correct answers. As required by our university’s institu-
tional review board, the instructor also provided students
who had not earned extra credit during the token economy
with alternative extra credit opportunities during the re-
moval period. After the removal period, the instructor fully
debriefed students about the study.
During each of the final 11 class meetings, a research assis-
tant sat in the last row of the classroom where she had an un-
obstructed view of all students and posed as a student in the
class (e.g., by pretending to take notes). The research assis-
tant recorded the amount of directed participation (number
of students who raised their hands in response to a question
from the instructor), latency to participation (amount of time
following each question until the first hand was raised), and
amount of nondirected participation (number of times any
student spontaneously asked the instructor a question or en-
gaged the instructor in discussion). The research assistant
measured latency using a hand-operated digital stopwatch,
which she kept hidden at all times.
Results
The instructor asked 16 questions during baseline, 14 dur-
ing the token economy, and 16 during removal. Overall, the
instructor asked a mean of 4.18 questions per class meeting.
Only once did no student raise a hand following a question
from the instructor. We recorded and analyzed this question,
which occurred during baseline, as zero directed participa-
tion, but removed it from the analysis of latency to participa-
tion. Table 1 presents a summary of all three dependent
measures across the three phases.
Directed Participation
We analyzed amount of directed participation using fo-
cused chi-square tests. We adjusted the expected frequencies
to control for the different number of questions across the
three phases. Compared to baseline, significantly more students
raised their hands in response to the instructor’s questions
during the token economy, χ²(1, N = 77) = 11.85, p < .001.
Furthermore, students raised significantly fewer hands
during removal than during the token economy,
χ²(1, N = 77) = 11.85, p < .001, but the number of hands raised
during removal was not significantly different from baseline,
χ²(1, N = 52) = 0.00.
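The adjustment of expected frequencies described in this section can be illustrated with a short calculation. The sketch below is a reconstruction, not the authors' analysis: the hand-raise counts are assumptions obtained by multiplying the Table 1 means by the number of questions asked in each phase and rounding, and the function name `focused_chi_square` is hypothetical.

```python
def focused_chi_square(observed, exposure):
    """Chi-square for a two-phase contrast.

    observed: counts per phase (here, hands raised)
    exposure: opportunities per phase (here, questions asked)
    Expected counts are proportional to exposure, which is one way to
    "adjust the expected frequencies" for unequal numbers of questions.
    """
    total_observed = sum(observed)
    total_exposure = sum(exposure)
    statistic = 0.0
    for count, opportunities in zip(observed, exposure):
        expected = total_observed * opportunities / total_exposure
        statistic += (count - expected) ** 2 / expected
    return statistic


# Assumed counts: mean hands per question x questions asked, rounded.
baseline_hands = round(1.63 * 16)
token_hands = round(3.64 * 14)
removal_hands = round(1.63 * 16)

baseline_vs_token = focused_chi_square([baseline_hands, token_hands], [16, 14])
baseline_vs_removal = focused_chi_square([baseline_hands, removal_hands], [16, 16])

print(f"baseline vs. token economy: {baseline_vs_token:.2f}")
print(f"baseline vs. removal: {baseline_vs_removal:.2f}")
```

Under these assumed counts, the two statistics agree with the reported values to two decimal places, which suggests (but does not confirm) that the expected frequencies were set in proportion to the number of questions per phase.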
Latency to Participation
We conducted a one-way ANOVA of the latency data.
Each question from the instructor, rather than each student
in the class, constituted the unit of analysis. The ANOVA re-
vealed a significant difference between the mean latencies of
the three phases, F(2, 42) = 8.23, p = .001, η = .53. Tukey’s
honestly significant difference (HSD) test indicated that stu-
dents raised their hands significantly faster during the token
economy than during baseline (p = .001). However, Tukey’s
HSD tests showed that latency to participation during re-
moval was not significantly slower than during the token
economy (p > .20), but was significantly faster than during
baseline (p = .05).
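As a consistency check on the reported effect size, eta can be recovered from the F statistic and its degrees of freedom. The sketch below assumes the reported η is the square root of η² for a one-way between-groups design, using the standard identity η² = (F × df_between) / (F × df_between + df_within); the function name `eta_from_f` is hypothetical.

```python
import math


def eta_from_f(f_value, df_between, df_within):
    """Return eta (the square root of eta squared) for a one-way ANOVA."""
    eta_squared = (f_value * df_between) / (f_value * df_between + df_within)
    return math.sqrt(eta_squared)


# Values taken from the reported test, F(2, 42) = 8.23.
eta = eta_from_f(8.23, 2, 42)
print(f"eta = {eta:.2f}, eta squared = {eta ** 2:.2f}")
```

The result rounds to the reported η of .53, which corresponds to roughly 28% of the variance in latency being associated with phase.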
Nondirected Participation
We analyzed amount of nondirected participation using
focused chi-square tests. We adjusted the expected frequen-
cies to control for the different number of class meetings
across the three phases. Compared to baseline, students
spontaneously participated significantly more during the
token economy, χ²(1, N = 125) = 19.21, p < .001. However,
during removal students spontaneously participated
significantly less than during the token economy,
χ²(1, N = 120) = 11.56, p < .001. Furthermore, nondirected
participation did not significantly differ between baseline
and removal, χ²(1, N = 71) = 0.38, p > .44.
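The p values attached to these statistics can be checked directly, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below uses only the reported statistics; the helper name `chi_square_p_value_df1` is hypothetical, and the identity it relies on holds only for one degree of freedom.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) equals the two-tailed normal probability for
    z = sqrt(x), which can be written as erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics as reported for nondirected participation.
reported = {
    "baseline vs. token economy": 19.21,
    "token economy vs. removal": 11.56,
    "baseline vs. removal": 0.38,
}

for comparison, statistic in reported.items():
    p_value = chi_square_p_value_df1(statistic)
    print(f"{comparison}: chi-square = {statistic}, p = {p_value:.4f}")
```

The first two comparisons fall below .001 and the third is well above .44, in line with the inequalities reported above.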
Discussion
As we hoped, the amount of directed and nondirected par-
ticipation dramatically increased following the implementa-
tion of the token economy. Students were more than twice as
likely to raise their hands following a question during the to-
ken economy than during baseline. Likewise, students were
more than twice as likely to ask questions and to make com-
ments spontaneously during the token economy than during
baseline, even though the instructor did not directly rein-
force this form of participation with tokens. Thus, in general,
students appeared more willing to contribute to the class
during the token economy. Once the instructor removed the
token economy, both directed and nondirected participation
fell back to baseline levels, but not below them. This result
suggests that the token economy did not reduce students’
intrinsic motivation to participate.

Table 1. Means for the Dependent Measures Across the Three Phases

Dependent Measure                        Baseline   Token Economy   Removal
Directed participation/question          1.63a      3.64b           1.63a
Latency to participation/question*       6.16a      0.56b           2.93b
Nondirected participation/class period   9.50a      21.75b          11.00a

Note. Values within a row not sharing a subscript (a, b) are significantly
different (p ≤ .05).
*Time latencies are reported in seconds.

We also were impressed by the shorter amount of time it
took students to respond to a question during the token econ-
omy compared to baseline. During baseline, an average of 6 sec
passed before a student raised a hand, but during the token
economy, this latency dropped to less than 1 sec. A person may
question whether a student can formulate a thoughtful answer
in less than 1 sec. Although we collected no data to address
this concern directly, the instructor and research assistant no-
ticed little change in the quality of students’ responses across
the phases of the study. Furthermore, we believe that, during
the token economy, students often raised their hands not be-
cause they had an answer, but because they wanted to be the
first to answer. Reder (1987) showed that students can quickly
assess whether they know an answer before actually recalling
the answer from memory. Indeed, during the token economy,
many students took a few seconds to formulate their response
after being called on by the instructor. In contrast, during re-
moval, when there was no competition for tokens, students ap-
peared to wait until they formulated an answer before raising
their hands—nearly 3 sec, on average, after the instructor
asked the question. However, this latency during removal was
still half the latency of the baseline phase, which suggests that
the token economy may have a lasting effect on the speed of
participation.
The contingency between the presence of the token
economy and the amount and speed of participation
strongly suggests that the tokens were responsible for in-
creasing participation. However, we are aware that the de-
sign of this study does not allow a definitive causal
conclusion. A comparable control group and random as-
signment would have provided a stricter test of the token
economy’s effectiveness, but these methodological luxuries
were not possible. Thus, alternative explanations abound.
For example, the instructor covered different topics across
the three periods—developmental psychology during base-
line, personality and psychological disorders during the to-
ken economy, and therapies and social psychology during
removal. Perhaps the topics covered during the token econ-
omy facilitated more participation than the topics covered
during baseline and removal. Nonetheless, we have confi-
dence in the token economy for two reasons beyond these
results. First, a large body of research attests to the effec-
tiveness of token economies and other operant techniques
to modify human behavior (Glynn, 1990; Kazdin, 1982;
Miltenberger, 1997). Second, the instructor in this study
(the first author) has used the token economy effectively
across the entire terms of several courses.
In all the classes in which the instructor has used the token
economy, only one student has complained of being unable
to earn tokens. One way of avoiding this complaint is to pro-
vide alternative extra credit opportunities, although too
many opportunities may reduce the token economy’s effec-
tiveness. Another way is to set a maximum limit on the num-
ber of tokens that can be earned. The “faster” students reach
the limit early, thereby increasing the chance of other stu-
dents earning tokens.
We believe the token economy procedure is a simple and
effective means of breaking the silence, especially in large
classes. In addition, the procedure serves as an excellent
demonstration of operant conditioning and the utility of to-
ken economies. Indeed, during the removal period, while
the instructor described token economies, one student
spontaneously noted that the instructor had used a token
economy to increase students’ participation. We believe
this sudden connection promotes an “a-ha” experience for
the class and a deeper understanding of the material. Fur-
thermore, the first author has noticed an increase in stu-
dent attendance, enthusiasm, and preparation when he has
used the token economy. Students have commented that
they enjoy the procedure because it makes class more excit-
ing and interactive.
Finally, the token economy system described in this study
is flexible and easily adapted to an instructor’s teaching style.
We understand that some instructors do not like to use extra
credit in their courses. However, instead of extra credit to-
ward the students’ course grades, tokens could be worth
credit toward “purchasing” desirable options, such as drop-
ping a quiz or being excused from the final exam (see Komaki,
1975). Alternatively, instructors could replace tokens with
other easily delivered rewards, such as candy. As long as stu-
dents perceive a contingency between some positive rein-
forcer and their participation, instructors may develop
variations to suit their teaching style.
References
Barton, E. J. (1982). Facilitating student veracity: Instructor applica-
tion of behavioral technology to self modification projects.
Teaching of Psychology, 9, 99–101.
Benjamin, L. T., Jr. (1991). Personalization and active learning in
the large introductory psychology class. Teaching of Psychology, 18,
68–74.
Bligh, D. A. (2000). What’s the use of lectures? San Francisco:
Jossey-Bass.
Bonwell, C. C., & Eison, J. A. (1991). Active learning: Creating excite-
ment in the classroom (Rep. No. ISBN–1–878380–08–7). Washing-
ton, DC: School of Education and Human Development, George
Washington University. (ERIC Document Reproduction Service
No. ED 336049)
Carkenord, D. M. (1994). Motivating students to read journal arti-
cles. Teaching of Psychology, 21, 162–164.
Glynn, S. M. (1990). Token economy approaches for psychiatric pa-
tients: Progress and pitfalls over 25 years. Behavior Modification,
14, 383–407.
Gurung, R. (2002, June). Sleeping students don’t talk (or learn): En-
hancing active learning via class participation. In P. Price (Chair),
Active learning in the classroom: Overview and methods. Symposium
conducted at the 14th annual meeting of the American Psycho-
logical Society, New Orleans, LA.
Hodge, G. K., & Nelson, N. H. (1991). Demonstrating differential
reinforcement by shaping classroom participation. Teaching of Psy-
chology, 18, 239–241.
Junn, E. (1994). “Pearls of wisdom”: Enhancing student class partici-
pation with an innovative exercise. Journal of Instructional Psychol-
ogy, 21, 385–387.
Junn, E. N. (1995). Empowering the marginal student: A skills-based
extra-credit assignment. Teaching of Psychology, 22, 189–192.
Kazdin, A. E. (1982). The token economy: A decade later. Journal of
Applied Behavior Analysis, 15, 431–445.
Kendall, B. (1994). Moment of silence. In E. Bender, M. Dunn, B.
Kendall, C. Larson, & P. Wilkes (Eds.), Quick hits: Successful strat-
egies by award winning teachers (p. 18). Bloomington: Indiana Uni-
versity Press.
Komaki, J. (1975). Neglected reinforcers in the college classroom.
Journal of Higher Education, 46, 63–74.
Lloyd, M. E., & Zylla, T. M. (1981). Self-pacing: Helping students
establish and fulfill individualized plans for pacing unit tests.
Teaching of Psychology, 8, 100–103.
McKeachie, W. J. (2002). McKeachie’s teaching tips: Strategies, re-
search, and theory for college and university teachers (11th ed.).
Boston: Houghton Mifflin.
Miltenberger, R. G. (1997). Behavior modification: Principles and pro-
cedures. Pacific Grove, CA: Brooks/Cole.
Nation, J. R., & Bourgeois, A. E. (1978). PASS, an alternative
method of teaching introductory psychology. Research in Higher
Education, 8, 273–282.
Oley, N. (1992). Extra credit and peer tutoring: Impact on the qual-
ity of writing in introductory psychology in an open admissions col-
lege. Teaching of Psychology, 19, 78–81.
Powers, R. B., Edwards, K. A., & Hoehle, W. F. (1973). Bonus
points in a self-paced course facilitates exam-taking. Psychologi-
cal Record, 23, 533–538.
Reder, L. M. (1987). Strategy selection in question answering. Cog-
nitive Psychology, 19, 90–138.
Notes
1. We thank Bill Lammers and Timothy Johnston for their helpful
comments on an earlier draft of this article.
2. Send correspondence to Kurt A. Boniecki, University of Central
Arkansas, Department of Psychology and Counseling, 201
Donaghey Avenue, UCA Box 4915, Conway, AR 72035; e-mail:
kurtb@mail.uca.edu.
Effects on Content Acquisition of Signaling Key Concepts
in Text Material
Jeffrey S. Nevid
Jodi L. Lampmann
St. John’s University
Eighty college students read textbook passages that either included
marginal inserts to signal key concepts or did not include these in-
serts. Signaling key concepts enhanced performance on content quiz-
zes overall and on subsets of items assessing signaled material.
Performance was not affected on subsets of items for nonsignaled
content. Students reported preferring the signaled format and found
it both clearer and easier to understand than the nonsignaled format.
Signaling key concepts by extracting and highlighting them in mar-
ginal inserts may facilitate encoding and retention of these concepts.
Even in this day of multimedia enhancements in the class-
room, textbooks remain very much at the core of the learning
process. In recent years, increasing concerns about declining
student competencies in mastering basic subject matter have
led to the incorporation of numerous pedagogical aids
(Weiten & Wight, 1992), including the SQ3R study method,
marginal running glossaries, pronunciation guides, built-in or
accompanying study guides, self-scoring quizzes, chap-
ter-by-chapter learning objectives, and interactive laboratory
demonstrations on CD–ROMs and companion Web sites.
Publishers are spending increasing amounts of money pro-
ducing textbooks, and this increase is passed along to con-
sumers via higher prices (Weiten & Wight, 1992). Despite
these changes, it remains unclear whether the benefits of
learning enhancements are worth the additional costs. Sur-
prisingly, there is little research on the use of pedagogical fea-
tures as learning devices.
Most reported studies on textbook pedagogy are limited to
student surveys. In one survey, Weiten, Guadagno, and Beck
(1996) assessed student familiarity with pedagogical devices,
their likelihood of using them, and their perceptions of the de-
vices’ value. Students were generally familiar with most peda-
gogical aids, but reported they rarely used some of the aids,
such as outlines and discussion questions. Among the most
highly valued and widely used pedagogical aids were boldfaced
technical terms, chapter summaries, and running or chapter
glossaries.
Other investigators reported similar findings, with students
generally endorsing the value of boldfaced technical terms,
running or chapter glossaries, chapter summaries, and
self-tests (Marek, Griggs, & Christopher, 1999; Weiten,
Deguara, Rehmke, & Sewell, 1999). Students also tend both
to value and make greater use of pedagogical devices that take
little time to read and those that they perceive as relevant in
helping them prepare for course examinations (Marek et al.,
1999; Weiten et al., 1996). Students appear to be more con-
cerned with meeting course demands and less concerned with
developing more elaborate study patterns (Marek et al., 1999).
Weiten and his colleagues (1999) reported small, but sig-
nificant positive correlations between grade point averages
and students’ ratings of how likely they were to use pedagogi-
cal devices. Although correlational links between academic
success and use of pedagogical aids may be encouraging, they
cannot be used as a basis for drawing cause–effect relations.
Effect Size for Token Economy Use in Contemporary
Classroom Settings: A Meta-Analysis of
Single-Case Research
Denise A. Soares
University of Mississippi
Judith R. Harrison
Rutgers University
Kimberly J. Vannest
Texas A&M University
Susan S. McClelland
University of Mississippi
Abstract. Recent meta-analyses of the effectiveness of token economies (TEs)
report insufficient quality in the research or mixed effects in the results. This study
examines the contemporary (post-Public Law 94-142) peer-reviewed published
single-case research evaluating the effectiveness of TEs. The results are stratified
across quality of demonstrated functional relationship using a nonparametric
effect size (ES) that controls for undesirable baseline trends in the analysis. In
addition, moderators (i.e., classroom setting, age of participant, outcomes, use of
response cost, and use of verbal cueing) were analyzed. Eighty-eight AB phase
contrasts were calculated from 28 studies (1980–2014) representing 90
participants and produced a weighted mean ES of 0.82 (SE = 0.03, 95% CI [0.77, 0.88]).
Strong quality produced a combined weighted mean ES of 0.85 (SE = 0.642, 95%
CI [0.74, 0.97]). Moderator analyses revealed that a TE was slightly more
effective for youth between the ages of 6 and 15 years than for children between
the ages of 3 and 5 years or when used with behavioral goals in comparison to
academic goals. However, no difference was found when implemented in general
or special education settings or with the inclusion of response cost or verbal
cueing.
Correspondence concerning this paper should be addressed to Denise A. Soares, University of Mississippi,
P.O. Box 1848, 49 Guyton Drive, University, MS 38677; e-mail: dasoares@olemiss.edu
Copyright 2016 by the National Association of School Psychologists, ISSN 0279-6015, eISSN 2372-966x
School Psychology Review, 2016, Volume 45, No. 4, pp. 379–399
A token economy (TE) is one of a handful of
interventions found in classroom settings.
Based on the well-established principles
of reinforcement described by Skinner (1931),
a TE is a secondary reinforcement system
(Alberto & Troutman, 2003), whereby
inherently neutral items (i.e., tokens) are awarded
for the demonstration of targeted behaviors.
Tokens are accumulated and exchanged for
backup reinforcers valued by the student (Ka-
zdin, 1971; Simonsen, Fairbanks, Briesch,
Myers, & Sugai, 2008). A TE has historically
been considered a best-practices behavior
management strategy for use in schools (Fil-
check, McNeil, Greco, & Bernard, 2004; Mat-
son & Boisjoli, 2009) and is one intervention
frequently implemented within the positive
behavioral interventions and supports frame-
work. However, the emphasis on meta-ana-
lytic thinking (see Maggin, Chafouleas, God-
dard, & Johnson, 2011) that evolved after the
passage of the Individuals with Disabilities
Education Improvement Act (2004) and No
Child Left Behind Act (2001) has provoked
questions regarding its effectiveness (Maggin
et al., 2011). In the following sections, we
describe the historical and current research on
the use of TEs and gaps in the literature that
are addressed by this meta-analysis.
TOKEN ECONOMY RESEARCH
Numerous individual studies have
demonstrated successful application of TEs
across populations and settings. Specifi-
cally, TEs have produced positive effects
for students with emotional and behavioral
problems (Cavalier, Ferretti, & Hodges,
1997), intellectual disabilities (Millersmith,
Weber, & McLaughlin, 2013), attention def-
icit hyperactivity disorder (DuPaul & Wey-
andt, 2006), learning disabilities (Higgins,
Williams, & McLaughlin, 2001), and schizo-
phrenia (Ulmer, 1976). The use of TEs has
been effective not only in schools (Filcheck et
al., 2004) but also in residential treatment cen-
ters (Murray & Sefchik, 1992), mental health
hospitals (Hopko, Lejuez, Lepage, Hopko, &
McNeil, 2003), prisons or detention centers
(Bippes, McLaughlin, & Williams, 1986), and
colleges (Stilitz, 2009).
A TE has been deemed an effective in-
tervention by two seminal reviews (Kazdin &
Bootzin, 1972; Kazdin, 1982), two systematic
reviews (Dickerson, Tenhual, & Green-Paden,
2005; Matson & Boisjoli, 2009), and one
meta-analysis (Maggin et al., 2011). Kazdin
and Bootzin (1972) and Kazdin (1982) evalu-
ated benefits of using a TE, such as immediate
reinforcement of behavior to maintain perfor-
mance across time. They also identified obsta-
cles to its effective implementation, such as
inadequate staff training, client resistance, cir-
cumvention of contingencies, and lack of re-
sponse. In 1982, Kazdin updated the original
review, evaluating the progress in the field
since 1972. The authors found research had
uncovered solutions to previously identified
obstacles such as individualizing tokens
and backup reinforcers (i.e., frequency, value)
to enhance responsiveness, revising methods
of staff training, decreasing resistance, and
emphasizing the need to maintain effects
across time. However, the authors did not
complete systematic literature reviews, as
their goal was to identify obstacles and meth-
ods of overcoming them and not to synthesize
the literature. Thus, these two articles do not
identify or appraise the state of the literature.
Two systematic literature reviews
(Dickerson et al., 2005; Matson & Boisjoli,
2009) synthesized and reported information
from source articles. Dickerson et al. (2005)
evaluated the use of a TE to improve socially
appropriate behaviors of individuals with
mental health disorders in hospital settings.
They reviewed 13 studies (group and single-
case experimental design [SCED]) with 1,074
participants ranging from 18 to 55 years old;
29% were diagnosed with schizophrenia, 13%
with psychotic disorder, and 57% with other
mental illnesses. Results indicated that a TE
was effective for increasing adaptive behav-
iors such as work performance, social interac-
tion, and the daily care skills of these patients.
Similarly, Matson and Boisjoli (2009) evalu-
ated the effects of a TE on the behaviors of
individuals with autism and/or developmental
disabilities. They reviewed 16 group and
SCED studies conducted in multiple settings,
such as schools, homes, summer camps, group
homes, state hospitals, and a developmental
center. The 164 participants ranged in age
from 4 to 18 years; approximately 91% were
children with intellectual disabilities and 8%
were children with autism. Results indicated
School Psychology Review, 2016, Volume 45, No. 4
that a TE was associated with positive out-
comes in social, behavioral, and academic ar-
eas. Although these studies systematically re-
viewed the literature, neither quantified the
effect associated with the use of a TE in
schools through meta-analytic procedures.
Maggin et al. (2011) raised questions
regarding a TE as an evidence-based interven-
tion in the only meta-analysis to date. They
conducted a meta-analysis of SCED studies to
extend findings from earlier reviews. Maggin
et al. targeted behavioral outcomes and coded
the studies utilizing the Protocol for Assessing
Single-Subject Research Quality (PASS-RQ;
Maggin & Chafouleas, 2010) developed by
the authors, with indicators from the guide-
lines of Horner et al. (2005) for quality re-
search and the What Works Clearinghouse
(WWC) standards (Kratochwill et al., 2010).
In addition, Maggin et al. calculated four
different meta-analytic effect sizes (ESs): percent
of nonoverlapping data (PND; Scruggs &
Mastropieri, 2001) = 78.49%; improvement
rate difference (IRD; Parker, Vannest, &
Brown, 2009) = 51.47%; standardized mean
difference (SMD; Busk & Serlin, 1992) = 8.02;
and raw-data multilevel ES (RMD; Van den
Noortgate & Onghena, 2003, 2008) = 8.74.
Maggin et al. stated, “Three of the four effect
size (ES) measures found a significant improve-
ment” (p. 550). The authors reported that the
three significant ESs were PND, SMD, and
RMD (D. Maggin, personal communication,
March 2, 2016). However, the authors con-
cluded design quality did not meet WWC stan-
dards for 70% of the included studies because of
fewer than three opportunities to demonstrate an
effect (n = 7) or fewer than three data points per
phase (n = 10). Weaknesses were found in the
description of measurement procedures used; the
number of data points per phase; and the report-
ing of treatment fidelity, interobserver agree-
ment (IOA), and social validity. The message
from these findings is that rigorous research with
sufficient methodological quality is needed to
support a TE as an evidence-based intervention.
Additional information can be contributed to the
findings of this meta-analysis through the inclu-
sion of studies that evaluate both academic and
behavioral outcomes and stratification of results
across design quality.
Building on earlier studies, these re-
searchers provide valuable information, which
represents the historical research in TE litera-
ture. The studies (i.e., Kazdin, 1982; Kazdin &
Bootzin, 1972) were characterized as literature
reviews emphasizing many strengths of TEs
and identifying obstacles; however, the au-
thors provided summary findings and not spe-
cific data from the source articles. The more
recent reviews (Dickerson et al., 2005; Matson
& Boisjoli, 2009) indicated a TE was associated
with an increased effect on social, behavioral,
and academic outcomes; however, neither re-
view reported an ES, confidence intervals (CIs),
or design quality. In addition, Dickerson et al.
(2005) searched only one database, the National
Library of Medicine’s PubMed, and Matson and
Boisjoli (2009) reviewed “representative litera-
ture.” Finally, Maggin et al. (2011) found large
effects of a TE on behavioral outcomes; how-
ever, the researchers contended that a TE could
not be considered an evidence-based interven-
tion because of the quality of the research. Ad-
ditional research is needed so that a TE can
potentially be considered an evidence-based
strategy.
SCED AND QUALITY
SCED has a long history of use in the
applied fields of education and human behav-
ior and is particularly suited to school-based
practices allowing the single subject to serve
as his or her own control (Horner et al., 2005).
As such, SCED is especially relevant to re-
views of school-based use of TEs. In recent
years, experts have developed quality indica-
tors and standards for individual SCED studies
that can be utilized to synthesize the method-
ological rigor of a group of studies (Horner et
al., 2005; Kratochwill et al., 2010). Horner et
al. (2005) identified the criteria necessary to
evaluate the quality of (a) research reporting
(e.g., description of participants and settings,
social validity, research questions) and (b) de-
sign (e.g., dependent variable, independent
variable, baseline, experimental control or in-
ternal validity, external validity). In addition,
Main and Moderator Effects for Token Economies
Kratochwill et al. (2010) outlined specific cri-
teria for a study to meet standards or meet
standards with reservations based on (a) the
number of phases per design, (b) the number
of data points per phase, and (c) the percent of
IOA that must be measured. On the basis of
recommendations from experts in the field
(i.e., Maggin, Briesch, & Chafouleas, 2013),
combining conceptually relevant constructs,
such as operational definitions of participants
and settings (see Horner et al., 2005) and
sufficient evidence to support a functional re-
lationship between the dependent and inde-
pendent variable (Kratochwill et al., 2010),
results in a rigorous evaluation of SCED qual-
ity. Thus, design quality can be evaluated
based on these criteria, and outcomes can be
stratified by quality.
EFFECT SIZES
With the emphasis on evidence-based
interventions, the need for quantifying inter-
vention effects achieved through SCED stud-
ies has come to the forefront. In 2007, Parker
and Hagan-Burke found over 40 ESs; how-
ever, the field has not come to consensus for
which is best (Manolov, Solanas, Sierra, &
Evans, 2011). Tau-U (an ES from the frequently
used Kendall’s τ and Mann-Whitney
U) is summarized by Parker, Vannest, Davis,
and Sauber (2011) as “having statistical power
that is flexible and can calculate trend only,
nonoverlap between phases only, or a combi-
nation of the two” (p. 291). Tau-U is a con-
servative measure that offers important bene-
fits of a “bottom-up” approach (Parker &
Vannest, 2012), designed to explain the im-
pact of changes at the individual phase con-
trast on the overall effect. Benefits of Tau-U’s
nonparametric bottom-up approach include (a)
consistency with visual analysis; (b) applica-
bility to short data series and simple designs;
(c) appropriateness with any design; (d) char-
acterization by strong statistical power (i.e.,
one of the strongest parametric tests; Parker et
al., 2011, p. 288); (e) control in Phase A trend;
and (f) usefulness at three levels—nonaggre-
gated data from a single client, aggregated
data from a complex design, and meta-analy-
ses (Parker et al., 2011). Tau-U allows for the
calculation of CIs and p values. All data are
used, reflecting the interventionist experimen-
tal perspective that each data point reflects
performance.
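As a rough illustration of the nonoverlap idea behind Tau-U, the sketch below compares every baseline (A) data point with every intervention (B) data point and then subtracts the baseline trend. The function names and sample data are hypothetical, and the formula (S_AB - S_A)/(n_A x n_B) is one common reading of Parker et al. (2011); it is not the code of the online calculator used in this study.

```python
from itertools import combinations


def pair_score(earlier, later):
    """Return +1 if the later value is higher, -1 if lower, 0 if tied."""
    return (later > earlier) - (later < earlier)


def tau_u(baseline, intervention, correct_baseline_trend=True):
    """Illustrative Tau-U for a single AB contrast (assumed formulation)."""
    # S_AB: each baseline point compared with each intervention point.
    s_ab = sum(pair_score(a, b) for a in baseline for b in intervention)
    # S_A: monotonic trend inside the baseline phase (Kendall-style S).
    s_a = sum(pair_score(x, y) for x, y in combinations(baseline, 2))
    n_pairs = len(baseline) * len(intervention)
    numerator = s_ab - s_a if correct_baseline_trend else s_ab
    return numerator / n_pairs


# Hypothetical percent-on-task data, not taken from any reviewed study.
phase_a = [20, 25, 22, 24]
phase_b = [60, 70, 65, 80, 75]
print(tau_u(phase_a, phase_b, correct_baseline_trend=False))
print(tau_u(phase_a, phase_b))
```

With these made-up data every intervention point exceeds every baseline point, so the uncorrected value is at its ceiling, and the slight upward baseline trend pulls the corrected value down.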
MODERATORS
Although a TE has been found to be
effective, a comparison has never been made
to determine in which environment, with
which participants, and with which outcome
measures (i.e., academic and/or behavioral) a
TE is most successful. Moderator variables
can account for variations across studies (e.g.,
characteristics of setting, participant, outcome,
or implementation). Identifying the impact of
these variables has the potential to increase the
effectiveness and efficiency of a TE for edu-
cators. Although numerous moderators could
be hypothesized, to avoid Type I error (Fairch-
ild & MacKinnon, 2009) in analyses (i.e.,
finding a moderator effect when none exists)
and minimize the chances of inaccurate re-
sults, we identified five potential moderators
for which we have strong hypotheses: (a)
classroom setting; (b) age of participant; (c)
type of outcome, academic or behavioral; (d)
use of response cost (RC); and (e) use of
verbal cueing. Next, we describe our hypoth-
eses and rationale for those hypotheses.
Some authors contend that a TE is most
effective in settings with small teacher-to-stu-
dent ratios, such as self-contained special ed-
ucation classrooms (Center & Wascom, 1984;
Kazdin & Geesey, 1980), with younger stu-
dents (Filcheck et al., 2004), and with only
behavioral outcomes (Himle, Woods, & Bu-
naciu, 2008; Jones, Weber, & McLaughlin,
2013). These findings have the potential to
limit its use in different types of classroom
settings, with certain students, and to address
specific behaviors. However, we hypothesize
that setting does not change the effectiveness
of a TE, as individual studies seem to indicate
otherwise. A TE has been found to be effec-
tive (a) in multiple instructional settings such
as inclusive, general education, special educa-
tion, and alternative settings (De Martini-
Scully, Bray, & Kehle, 2000; Rhode, Jenson,
& Reavis, 1993); (b) with differing age groups
such as elementary-age students (Akin-Little
& Little, 2004; Christensen, Young, & March-
ant, 2004; Filcheck et al., 2004), junior high
students (Carlson, Pelham, Milich, & Dixon,
1992; Cavalier et al., 1997; Feindler, Marriott,
& Iwata, 1984; Heaton & Safer, 1982), and
high school students (Schellenberg, Skok, &
McLaughlin, 1991); and (c) to increase aca-
demic (Klimas & McLaughlin, 2007; Salend,
Tintle, & Balber, 1988; Sran & Borrero, 2010)
as well as behavioral outcomes (Center &
Wascom, 1984; De Martini-Scully et al.,
2000). Thus, on the basis of these studies, we
hypothesize that these potential variables do
not moderate the effects of a TE.
Similarly, some authors contend that a
TE can be a very complex intervention, which
leads to lack of adoption in schools (Milten-
berger, 2001; Rosen, Taylor, O’Leary, &
Sanderson, 1990; Skinner, Cashwell, & Bunn,
1996). Adding procedures, such as RC or ver-
bal cueing, increases the complexity of the
intervention. RC is a procedure designed to
decrease behavior by contingently withdraw-
ing a specific amount of reinforcement follow-
ing an inappropriate behavior or response (Ka-
zdin, 1972). The impact of RC on a TE’s
effectiveness is not clear. Some previous re-
search has supported RC procedures in TE
systems (Gresham, 1979; Rapport, Murphy, &
Bailey, 1980; Witt & Elliot, 1982). Other stud-
ies (e.g., Phillips, Phillips, Fixsen, & Wolf,
1971) found RC has harmful side effects, such
as the opportunity for the implementer to over-
penalize and the possibility of decreasing the
incentive of demonstrating the target behavior.
Verbal cueing (i.e., prompting) is another pro-
cedure frequently added that increases the
complexity of a TE. Similarly, some research
findings regarding the effect of adding verbal
cueing to a TE have suggested the procedure
might increase effectiveness (Latham &
Locke, 1991) and others suggested it might
not (Balcazar, Hopkins, & Suarez, 1985;
Kluger & DeNisi, 1996). We hypothesized
that RC and verbal cueing are not necessary
for a TE to be effective. This hypothesis is
founded on some research suggesting that neither
RC nor verbal cueing is necessary and on the
goal of reducing implementation complexity
that might decrease adoption and use.
Although we have strong hypotheses re-
garding the moderating nature of the previ-
ously discussed variables, no meta-analysis of
TEs has reported moderator analysis results.
Maggin et al. (2011) conducted moderator
analyses with participant characteristics and
intervention features. However, they did not
report results because of “the likely presence
of family-wise error in these findings” (p.
547). Thus, we conducted a moderator anal-
ysis for which we had strong a priori hy-
potheses to decrease the risk of Type I error
(McKillup, 2011).
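The concern about family-wise error can be made concrete with a small calculation. The sketch below assumes independent tests and a per-test alpha of .05; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Compare a single test, five moderators, and a larger exploratory set.
for k in (1, 5, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

Under these assumptions the risk grows quickly with the number of moderators tested, which is the rationale for limiting the analysis to a few a priori hypotheses.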
PURPOSE
The current study addresses limitations
to the prior literature and provides additional
information to the field by (a) evaluating re-
search design quality, (b) calculating ESs and
CIs, (c) stratifying results by design quality,
and (d) evaluating moderator analyses of peer-
reviewed literature through 2014, with a focus
on both academic and behavioral outcomes.
Thus, this meta-analysis addressed the follow-
ing research questions:
1. What is the research design quality of
studies of the effectiveness of TEs across
SCED studies?
2. What is the overall effect of TEs in
public school classrooms?
3. Do the effects of TEs differ by design
quality?
4. What are the effects of potential
moderators?
METHOD
We conducted this study in four phases
and organized the methods accordingly. Meth-
ods are adapted from Bowman-Perrott, Burke,
Zhang, and Zaini (2014). Details are included
below for literature review and study selection,
data extraction, ES and CI calculations, stratifi-
cation across methodological quality, and mod-
erator analyses. Procedures for each phase are
described in detail.
Phase 1: Literature Review and Study
Selection
We applied standard methods identified
by Cooper and Hedges (1994) to search the
EBSCO Research Complete, Education Full
Text, PsycINFO, and Education Resource In-
formation Center (ERIC) electronic databases.
Key words, Boolean strings, and truncated
words used to conduct the search included
“token economy,” “intervention,” “reinforce-
ment,” “contingency management,” “system-
atic positive reinforcement,” “tokens,” “oper-
ant conditioning,” “applied behavior analysis,”
“backup reinforcers,” “behavior therapy,”
“points,” and/or “response cost.” In addition to
the search of databases, we conducted a hand
search for titles related to TEs or secondary
reinforcement by reviewing the tables of con-
tents for the years 1980 to 2014 in 21 journals on
special education, school psychology, and be-
havioral psychology (e.g., Journal of Positive
Behavior Interventions, Behavior Therapy, Be-
havioral Interventions, and The Journal of Spe-
cial Education). We selected the year 1980 as a
delimiter based on a desire to use classroom
settings that were more likely to be inclusive of
students with and without disabilities than class-
rooms before or immediately after the passage of
Public Law 94-142 in 1975. We conducted his-
torical searches with all of the resulting screened
articles, and the process was repeated to include
studies with potential to meet the inclusion cri-
teria. The initial search conducted by the first
author yielded 1,833 results. Article titles and
abstracts were screened based on inclusion cri-
teria (described in the following subsection) re-
sulting in the elimination of 1,436 studies. We
eliminated 278 of the remaining 397 articles
based on information in the article indicating the
study was not conducted in a school setting, was
exclusively a descriptive study, or was without
peer review. Fifty-five articles remained. During
the gathering of articles, reliability checks were
conducted and assessed using simple percent of
agreement (Sum of agreements/[Total number of
agreements + disagreements] × 100; House,
House, & Campbell, 1981). Initial agreement for
article inclusion was 100%.
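The simple percent-of-agreement calculation can be sketched as follows. The function name and the counts passed to it are hypothetical and serve only to show the arithmetic.

```python
def percent_agreement(agreements, disagreements):
    """Simple percent of agreement between two raters."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one rating decision is required")
    return agreements / total * 100


# Hypothetical counts: 19 matching decisions and 1 mismatch.
print(percent_agreement(19, 1))
```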
Inclusion Criteria
We examined the full text of 55 studies
for potential inclusion in this meta-analysis. A
TE was operationally defined as a program in
which students earned tokens for identified
academic skills (e.g., task engagement, accu-
racy, and completion) or behaviors (e.g., dis-
ruptive, out of seat) and then exchanged the
earned tokens for backup reinforcers (Alberto
& Troutman, 2003; Martin & Pear, 2003). The
first author and a doctoral student indepen-
dently coded the 55 articles for inclusion cri-
teria in separate spreadsheets. We included
studies if they (a) were published between the
years 1980 and 2014, (b) occurred in U.S.
public school classroom settings, (c) included
school-age children (i.e., 3 to 21 years old), (d)
were published in peer-reviewed journals, and
(e) included SCED with published data in
readable graphs. We elected to only include
studies published in peer-reviewed journals because
such studies are reviewed and filtered for quality
to maintain standards, scientific merit, and va-
lidity (Voight & Hoogenboom, 2012). We as-
sessed for publication bias statistically (see
Publication Bias subsection).
Exclusion Criteria
We excluded 27 studies: 6 addressed
multicomponent interventions for which the
intervention data could not be disaggregated, 5
did not include a visual graph of data from
which raw data could be digitized, 4 were not
intervention studies, 4 were graduate theses, 3
included participants other than school-age
children, 2 were set in international class-
rooms, 1 was set in a classroom within a
residential treatment center, 1 focused on the
implementation of the TE by paraprofession-
als and did not include intervention data for
the children in the study, and 1 included the
intervention in the baseline data. After exclu-
sion of these studies, the literature search re-
sulted in 28 SCED studies in which a TE was
the intervention in a classroom setting with
school-age children.
Publication Bias
We examined the final set of selected
studies for publication bias (i.e., tendency of
studies with null effects to not be published;
Rosenthal & DiMatteo, 2001) using the Egg-
er’s test (Egger, Smith, Schneider, & Minder,
1997). The Egger’s test evaluates Y intercept = 0
using linear regression of the effect
against precision. The intercept for the Egger’s
test (1.04; 90% CI [−0.03, 2.11]; p = .52)
indicated that statistically significant publication
bias was not found in this sample. Heterogeneity
was measured using H and I² (Higgins
& Thompson, 2002), where H = 1.5
(95% CI [1.2, 1.8]) and I² = 54.0% (95% CI
[29.5, 70.0]), indicating less than notable
heterogeneity in the sample, with H values
above 1.5 considered notable (Abramson,
2011).
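For readers unfamiliar with H and I², the sketch below shows how both are commonly derived from Cochran's Q (Higgins & Thompson, 2002). The effect sizes and variances are hypothetical, the function name is an assumption, and this is not the WinPEPI routine itself.

```python
import math


def heterogeneity(effects, variances):
    """Return Cochran's Q, H, and I-squared (percent) for a set of studies."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    # I-squared is truncated at zero when Q falls below its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, h, i_squared


# Hypothetical effect sizes with equal variances.
q, h, i_squared = heterogeneity([0.2, 0.5, 0.8], [0.01, 0.01, 0.01])
print(round(q, 2), round(h, 2), round(i_squared, 1))
```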
Coding
We coded for four purposes: (a) inclu-
sion and exclusion in the review, (b) dem-
onstration of a functional relationship
between the independent and dependent
variables according to Horner et al. (2005)
and the WWC standards for SCED (design
quality; Kratochwill et al., 2010; see Table 1
for specific criteria), (c) moderator vari-
ables, and (d) descriptive reporting of pro-
cedural integrity or fidelity. Specific codes
are listed in Tables 1 and 2. Each of the
coded variables provided the basic data for
the reliability analyses.
We assessed reliability of data coding
by IOA checks between the two independent
raters (the first author and a doctoral student).
Coding-sheet training and trial coding were
performed for agreement between the two rat-
ers before reliability was calculated. Within a
discussion format, the two raters identified one
example and one nonexample of each coding
variable. If difficulties in this task arose be-
cause of lack of code clarity, the pair deliber-
ated until clarity was achieved with 100%
agreement. Official coding began when a minimum
acceptable value of IOA (≥80%) was
met (Hartmann, Barrios, & Wood, 2004).
Each rater coded every variable in all articles
independently as is commonly done in com-
puting intercoder reliability in meta-analyses
(Yeaton & Wortman, 1993).
We calculated Cohen’s κ reliability
agreement for coding using NCSS (Hintze,
2004) by entering the agreement–disagreement
matrix for analysis. Kappa is a conservative
measure of reliability and perhaps even
underestimates agreement (Ary & Suen, 1989;
Strijbos, Martens, Prins, & Jochems, 2006).
Kappa was 0.96.
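A minimal sketch of how Cohen's kappa is obtained from an agreement-disagreement matrix is given below. The matrix values are hypothetical (they are not the coding data from this study), and the function is an illustration, not the NCSS procedure.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square matrix of rater-by-rater counts."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 matrix: rows are rater 1, columns are rater 2.
example = [[45, 5], [5, 45]]
print(round(cohens_kappa(example), 2))
```

Because chance agreement is subtracted, kappa is lower than the raw percent agreement for the same matrix, which is why it is described as conservative.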
Phase 2: Design Quality Evaluation
We evaluated design quality based on
published guidelines (Horner et al., 2005;
Kratochwill et al., 2010) with a goal of strat-
ifying results across study quality. Two raters
(first and second authors) independently as-
signed ratings of weak, medium, and strong to
each included study. Weak ratings are equiv-
alent to the “does not meet standards” cate-
gory and include fewer than three opportuni-
ties for demonstration of an effect and fewer
than three data points per phase. Medium rat-
ings are equivalent to the WWC “meets stan-
dards with reservations” and include at least
three opportunities for demonstration of an
effect with at least three data points per phase
for reversal or multiple-baseline designs and
four for multielement designs and reporting
of IOA. Strong ratings are equivalent to the
WWC design standards that “meet criteria”
and include at least three opportunities for
demonstration of an effect with five or more
data points per phase and reporting of IOA.
Studies without three demonstrations of
effect do not meet criteria (e.g., ABCD,
ABBCC) as they do not provide three oppor-
tunities for demonstration of an effect. Simul-
taneous, multiple-probe, alternating-treatment
(e.g., ABCBD), and multielement designs
were coded as their underlying design (e.g.,
ABA, ABC). Horner et al. (2005) described
reporting the level of treatment integrity as
“highly desirable” (p. 174), but it was not a
tenet described by Kratochwill et al. (2010) to
meet the standards for design quality for
SCED. Thus, we did not include it in our
ratings of design quality, but we did report the
number of studies that stated the percent of
treatment integrity.
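The rating rules described above can be sketched as a small decision function. The parameter names are hypothetical, and treating every study that fails the medium criteria as weak is an interpretive assumption, since the text defines weak only by example.

```python
def rate_design_quality(opportunities, min_points_per_phase, reports_ioa,
                        multielement=False):
    """Assign weak, medium, or strong from the stated design criteria."""
    # Multielement designs need four points per phase to reach medium.
    medium_floor = 4 if multielement else 3
    meets_medium = (
        opportunities >= 3
        and min_points_per_phase >= medium_floor
        and reports_ioa
    )
    if not meets_medium:
        return "weak"
    if min_points_per_phase >= 5:
        return "strong"
    return "medium"


# Hypothetical studies: (opportunities, fewest points in any phase, IOA).
print(rate_design_quality(3, 5, True))
print(rate_design_quality(3, 3, True))
print(rate_design_quality(2, 6, True))
```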
Table 1. Tau-U Effect Sizes for Each Study by Quality and Author

| Design Quality^a | Authors and Year | Outcome: Acad or Behav | Outcome: Type | Design | Phase | AB Contrasts, n | Participants, n | Tau-U | 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| Weak | Filcheck et al., 2004 | Behav | Disruptive | ABACC | M | 1 | 17 | 0.67 | [0.60, 1.00] |
| Weak | Himle et al., 2008 | Behav | Disruptive | Multielement | N | 4 | 4 | 0.65 | [0.34, 1.00] |
| Weak | Kazdin & Geesey, 1980 | Acad | Task engagement | Simultaneous | N | 2 | 2 | 1.00 | [0.78, 1.00] |
| Weak | Kazdin & Mascitelli, 1980 | Acad | Task engagement | Simultaneous | N | 2 | 2 | 0.99 | [0.73, 1.00] |
| Weak | Klimas & McLaughlin, 2007 | Acad | Assignment completion | ABC | M | 3 | 1 | 1.00 | [0.76, 1.00] |
| Weak | Millersmith et al., 2013 | Acad | Assignment accuracy | Reversal | N | 2 | 1 | 0.97 | [0.36, 1.00] |
| Weak | Rosenberg, 1986 | Behav | Disruptive | ABC | N | 5 | 5 | 0.99 | [0.69, 1.00] |
| Weak | Salend & Allen, 1985 | Behav | Disruptive | ABCBC | N | 2 | 2 | 1.00 | [0.62, 1.00] |
| Weak | Sran & Borrero, 2010 | Acad | Task accuracy | ABCD | N | 12 | 4 | 0.40 | [0.22, 0.55] |
| Weak | Stevens, Sidener, Reeve, & Sidener, 2011 | Acad | Task accuracy | Multiprobe design | M | 4 | 2 | 1.00 | [0.60, 1.00] |
| Weak | Sullivan & O’Leary, 1990 | Behav | Disruptive | ABBCC | N | 2 | 1 | 1.00 | [0.57, 1.00] |
| Medium | Carnett et al., 2014 | Behav | Disruptive | Alternating-treatment design | G | 2 | 1 | 1.00 | [0.45, 1.00] |
| Medium | De Martini-Scully et al., 2000 | Behav | Disruptive | MBD | N | 2 | 2 | 0.95 | [0.63, 1.00] |
| Medium | Higgins et al., 2001 | Behav | Disruptive | MBD | M | 3 | 1 | 0.98 | [0.63, 1.00] |
| Medium | Jones et al., 2013 | Behav | Disruptive | ABAB | N | 2 | 2 | 0.72 | [0.22, 1.00] |
| Medium | McGoey & DuPaul, 2000 | Behav | Disruptive | Reversal | M | 4 | 4 | 0.75 | [0.42, 1.00] |
| Medium | Simon, Ayllon, & Milan, 1982 | Behav | Disruptive | ABCB | N | 3 | 1 | 0.78 | [0.38, 1.00] |
| Medium | Smith & Fowler, 1984 | Behav | Disruptive | MBD | N | 6 | 6 | 0.92 | [0.65, 1.00] |
| Medium | Thompson, McLaughlin, & Derby, 2011 | Behav | Disruptive | MBD | N | 3 | 1 | 0.95 | [0.58, 1.00] |
| Medium | Truchlicka, McLaughlin, & Swain, 1998 | Acad | Task accuracy | MBD | M | 3 | 3 | 0.35 | [0.12, 0.58] |
| Strong | Center & Wascom, 1984 | Acad | Task accuracy | Reversal | N | 5 | 5 | 0.71 | [0.48, 0.94] |
| Strong | Conyers et al., 2004 | Behav | Disruptive | Alternating-treatment design | N | 2 | 2 | 0.83 | [0.52, 1.00] |
| Strong | Maglio & McLaughlin, 1981 | Behav | Disruptive | Reversal | M | 1 | 1 | 1.00 | [0.79, 1.00] |
| Strong | Mottram, Bray, Kehle, Broudy, & Jenson, 2002 | Behav | Disruptive | MBD | M | 3 | 3 | 0.99 | [0.83, 1.00] |
| Strong | Musser, Bray, Kehle, & Jenson, 2001 | Behav | Disruptive | MBD | M | 3 | 3 | 0.88 | [0.53, 1.00] |
| Strong | Reitman et al., 2004 | Behav | Disruptive | Alternating-treatment design | N | 3 | 3 | 0.69 | [0.40, 0.95] |
| Strong | Salend & Lamb, 1986 | Behav | Disruptive | Reversal | M | 2 | 9 | 1.00 | [0.53, 1.00] |
| Strong | Salend, Tintle, & Balber, 1988 | Acad | Task engagement | Reversal | N | 2 | 2 | 1.00 | [0.53, 1.00] |
| Overall | | | | | | 88 | 90 | 0.82 | [0.77, 0.88] |

Note. AB contrasts were used in effect size calculations. The possible range for CI is 0 to 1. A = baseline; Acad = academic; B = first intervention; C = second intervention; D = third intervention; Behav = behavioral; CI = confidence interval; G = authors included generalization phase; M = authors included maintenance phase; MBD = multiple-baseline design; N = authors included neither generalization phase nor maintenance phase.
^a Strong indicates at least three demonstrations of an effect with five or more data points per phase and reporting of interobserver agreement; medium, at least three demonstrations of effect with three to four data points per phase and reporting of interobserver agreement; and weak, fewer than three demonstrations of an effect and three to four data points per phase.
Phase 3: ES, CIs, and Stratification
Calculation
To calculate the Tau-U, we extracted
raw data from the graphs and figures of the
included studies. We used a computer scanner
and software program to view and assign
data values electronically. Digitizing the
data resulted in an exact reconstruction of
the original graphic data providing numeric
raw data to enable proper comparisons
(Glass, 1976). All SCED graphs of included
articles were digitized with GetData Graph
Digitizer (Version 2.21).
ES and Visual Analysis
We calculated Tau-U for each individ-
ual contrast between the baseline (e.g., A1)
and the adjacent intervention phase (e.g.,
B1) for each unit of analysis (i.e., student,
class, behavior). Tau-U is derived from
Kendall’s τ and Mann-Whitney U (see Parker et
al., 2011) and is calculated by merging trend
and nonoverlap data. We completed calcula-
tions using the online Tau-U calculator (Van-
nest, Parker, & Gonen, 2011), selecting the
analysis to “control for baseline trend,” for
individual ES (i.e., an ES for each participant,
behavior, or setting). The individual ESs were
entered into the statistical program WinPEPI
for analysis to produce the “combined” omni-
bus ES and 95% CI (Abramson, 2011). The
algorithm for WinPEPI to calculate the overall
ES is the weighted average of all individual
ESs, with weights equaling the inverse of the
variance (i.e., not standard error). We coded
the Tau-U effects as small (0–0.65), medium
(0.66–0.92), and large (0.93–1.00), which are
equivalent to ranges recommended for non-
overlap of all pairs (Parker et al., 2009) to
compare Tau-U to effects garnered through
visual analysis, our next step.
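The inverse-variance weighting and the magnitude labels described above can be sketched as follows. The function names and input values are hypothetical, and the normal-approximation 95% CI (1.96 standard errors) is an assumption about how such an interval is typically formed; it is not confirmed to be WinPEPI's exact procedure.

```python
def combine_effects(effects, standard_errors):
    """Inverse-variance weighted mean ES with an approximate 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


def label_tau_u(value):
    """Map a Tau-U value onto the small, medium, and large ranges."""
    value = round(value, 2)  # the ranges are stated to two decimals
    if value <= 0.65:
        return "small"
    if value <= 0.92:
        return "medium"
    return "large"


# Hypothetical individual effect sizes and standard errors.
pooled, pooled_se, ci = combine_effects([0.6, 0.9], [0.1, 0.1])
print(round(pooled, 2), label_tau_u(pooled))
```

More precise estimates (smaller standard errors) receive larger weights, so a single well-measured contrast can dominate several noisy ones.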
We visually analyzed data from each
study to determine whether a functional rela-
tionship existed between the independent and
dependent variables based on recommenda-
tions from Kratochwill et al. (2010). Compar-
ing the ES to visual analysis in SCED studies
increases the credibility of the ES (Parker &
Vannest, 2012). This is in part because (a)
visual analysis is the traditionally accepted
approach to SCED analysis (Kratochwill et al.,
2010) and (b) the documentation of concurrent
validity between a visual analysis and an ES
is a component of a bottom-up approach
(Ninci et al., 2015; Parker & Vannest, 2012).
The first and second authors visually analyzed
the data following procedures recommended
by Kratochwill et al. (2010) with additional
Table 2. Summary of Moderators

| Study Characteristic | Category | Studies, n | AB Contrasts, n^a | Participants, n | Tau-U | SE | 95% CI | z score^b | p value^b |
|---|---|---|---|---|---|---|---|---|---|
| Setting | Special | 16 | 47 | 47 | 0.89 | 0.05 | [0.80, 0.98] | | |
| | General | 12 | 41 | 43 | 0.86 | 0.07 | [0.74, 0.98] | 0.04 | .70 |
| Age | 3–5 years | 6 | 29 | 22 | 0.64 | 0.06 | [0.53, 0.75] | | |
| | 6–15 years | 22 | 64 | 68 | 0.91 | 0.04 | [0.84, 0.98] | 2.33 | .02* |
| Outcome | Academic | 8 | 29 | 21 | 0.89 | 0.08 | [0.73, 1.00] | | |
| | Behavioral | 20 | 59 | 73 | 0.93 | 0.07 | [0.75, 1.00] | 0.04 | .69 |
| Response cost | Yes | 16 | 47 | 47 | 0.84 | 0.06 | [0.75, 0.95] | | |
| | No | 12 | 41 | 43 | 0.91 | 0.05 | [0.81, 1.00] | 0.99 | .32 |
| Verbal cue | Yes | 9 | 25 | 20 | 0.83 | 0.06 | [0.74, 0.92] | | |
| | No | 19 | 63 | 70 | 0.88 | 0.28 | [0.33, 1.00] | 0.20 | .84 |

Note. A = baseline; B = intervention; CI = confidence interval.
^a Number of AB contrasts used in effect size calculation.
^b Reliable difference z-test scores and corresponding p values are reported for the moderators.
* p < .05.
guidance from Lane and Gast (2014). First, we
analyzed within-phase data by visually in-
specting the (a) level (i.e., mean, median); (b)
trend (i.e., baseline trend, slope of the best
fitting line); and (c) variability (i.e., bounce) of
the data around the line within each phase.
Second, we analyzed between-phase data by
visually inspecting the (a) immediacy of the
effect of a TE by observing the level change
between the data at the end of a phase (last
three data points) and the beginning of the
next phase (first three data points), (b) fre-
quency of overlap between two phases by
determining the number of data points in one
phase that overlapped with the adjacent phase,
and (c) consistency of the data across similar
phases.
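Two of the between-phase checks can be approximated numerically. The sketch below is only an aid to visual analysis, not a substitute for it: the function names and data are hypothetical, and counting intervention points that fail to exceed the highest baseline point is one assumed way to operationalize overlap for a behavior expected to increase.

```python
from statistics import mean


def immediacy(previous_phase, next_phase, window=3):
    """Level change between the last and first few points of adjacent phases."""
    return mean(next_phase[:window]) - mean(previous_phase[-window:])


def overlap_count(baseline, intervention, higher_is_better=True):
    """Count intervention points that fall within the baseline range."""
    if higher_is_better:
        ceiling = max(baseline)
        return sum(1 for x in intervention if x <= ceiling)
    floor = min(baseline)
    return sum(1 for x in intervention if x >= floor)


# Hypothetical percent-on-task data for one AB contrast.
phase_a = [20, 25, 22, 24]
phase_b = [60, 70, 65, 80, 75]
print(round(immediacy(phase_a, phase_b), 1))
print(overlap_count(phase_a, phase_b))
```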
From these analyses, cumulatively, we
conceptualized the effect of a TE derived from
each study as follows: (a) no effect, with no
evidence of a functional relation between the
independent and dependent variables; (b)
weak effect, with some evidence of a func-
tional relation with latency between the intro-
duction of the independent variable, variabil-
ity of data in the baseline and/or intervention
phases, overlap between adjacent phases, and
variability of data in similar phases; (c) me-
dium effect, with mixed evidence of a func-
tional relation with either latency between the
introduction of the independent variable, vari-
ability of data in the baseline and/or interven-
tion phases, overlap between adjacent phases,
or variability of data in similar phases; or (d)
strong effect, with clear evidence of a func-
tional relationship demonstrated by a mean
consistent level in the baseline indicating the
need for intervention, evidence of an immedi-
ate effect between phases with a positive
trend, minimal variability in all phases, no
overlap between phases, and clear consistency
between similar data phases.
We individually completed these analy-
ses, and initial agreement between observers
was 95%. When there was disagreement be-
tween the two, discussion occurred until a
consensus was reached with 100% agreement.
We then compared the magnitude of effect
from visual analysis (i.e., no, weak, medium,
or strong) with the magnitude of effect from
Tau-U (i.e., small, medium, or large).
Phase 4: Moderator Analyses
We coded moderator data for setting
(special education, general education), age of
participant (preschool children ages 3 to 5
years and school-age children ages 6 to 15
years), outcome (academic, such as task accu-
racy or task engagement, or behavioral, such
as out of seat or disruptive), and two proce-
dural differences (i.e., RC and verbal cueing).
Following the procedures of Bowman-Perrott
et al. (2014), we analyzed moderator effects
by dichotomously coding the moderator vari-
ables within the studies and examining statis-
tically significant differences between the ES
(Tau-U) of studies within each category.
We calculated a reliable difference (i.e.,
difference that cannot be accounted for solely
by chance) for each moderator pair to deter-
mine if the differences were statistically
significant using the following formula:
$(L_1 - L_2)/\sqrt{SE_{Tau_1}^2 + SE_{Tau_2}^2}$, where
L1 is the first level of the moderator (e.g.,
academic outcome) and L2 is the second level
of the moderator (e.g., behavioral outcome).
Specifically, we compared effects for a TE in
general and special education settings, for
children ages 3 to 5 years (preschool age)
and 6 to 15 years (school age), for academic
and behavioral outcomes, for use or nonuse of
RC, and for use or nonuse of verbal cueing.
Reliable-difference z-test scores and p values
are reported in the Results section.
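As a minimal sketch of the reliable-difference calculation, the Python below divides the difference between two moderator levels by the square root of the summed squared standard errors and converts the result to a p value. The function name, the two-tailed normal p value, and the input values are assumptions for illustration. Because published estimates are rounded, values computed this way may not reproduce the reported z scores exactly.

```python
import math

def reliable_difference(es1, se1, es2, se2):
    """Return (z, p) for the difference between two effect sizes.

    z = (es1 - es2) / sqrt(se1^2 + se2^2); p is two-tailed under a
    standard normal distribution (an assumption of this sketch).
    """
    z = (es1 - es2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical Tau-U estimates and standard errors for two moderator levels.
z, p = reliable_difference(0.90, 0.04, 0.70, 0.06)
print(f"z = {z:.2f}, p = {p:.3f}")
```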
RESULTS
We conducted this study in four phases
and organized the results accordingly. Results
are included below for the literature review and
study selection, design quality, effect size and
stratification, and the moderator analysis.
Phase 1: Literature Review and Study
Selection
Twenty-eight studies met the inclusion
criteria and included 90 students and 88
opportunities for demonstrations of effect
(A1B1). Seventy-nine percent of the studies
were implemented prior to 2005 when the
quality indicators were published, and of those
studies, 50% were published between 1980
and 1989. Forty-three percent of the studies
were implemented in general education class-
rooms, and 57% were implemented in special
education classrooms. Seventy-nine percent of
the studies included children ages 6 to 15
years, and 21% included children ranging
from 3 to 5 years old. Seventy-one percent of
the studies used behavioral outcome measures
(e.g., disruptive, talking out, out of seat), and
29% used outcome measures of academic be-
haviors (e.g., task accuracy, completion, en-
gagement). RC was used in 16 studies (57%).
Verbal cueing was used in nine studies (31%).
Six designs (series phase; multiple baseline;
alternating intervention design; reversal de-
sign; simultaneous; multiple probe) were uti-
lized in the studies (see Table 1). Ten studies
(36%) reported using a follow-up or mainte-
nance phase, and one (3%) reported on gener-
alizing a TE to another setting. Seventeen
studies (61%) did not include maintenance to
monitor the use of a TE on the outcome mea-
sure (see Table 1).
Phase 2: Design Quality
Each of the 28 studies was visually an-
alyzed for quality using the rubric of internal
validity described in the Method section.
Eleven studies were rated as weak quality,
nine as medium quality, and eight as strong
quality. Of the 20 weak- and medium-quality
studies, 16 had insufficient demonstrations of
effect, 8 had at least three to four data points
per phase, and 2 did not report IOA. Integrity
data were collected in 10 studies. Integrity
ranged from 30% to 100% (M = 86.14%,
SD = 23.36%).
Phase 3: ES, CIs, and Stratification
Tau-U was calculated for 88 baseline
versus intervention contrasts (A1 versus B1)
controlling for baseline trend for the 28 stud-
ies. The weighted mean Tau-U of the TE
was 0.82 (SE = 0.03; 95% CI [0.77, 0.88])
and ranged from 0.35 to 1.00. Figure 1 illus-
trates the range of ESs and 95% CIs for each
study. Categorically, 19 of the studies had a
Tau-U of 0.80 or above, 7 studies were be-
tween 0.50 and 0.79, and 2 studies fell be-
low 0.50 (i.e., 0.20 and 0.49).
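To make the trend-controlled contrast concrete, the sketch below computes Tau-U for one baseline versus intervention contrast. It assumes one common formulation, (S_AB - S_A) / (n_A x n_B), where S_AB is the sum of signs over all baseline-intervention pairs and S_A is the sum of signs over ordered baseline pairs (the baseline trend). That formula, the function names, and the data are assumptions for illustration; this section does not state the exact computation used in the review.

```python
from itertools import combinations

def sign_sum(pairs):
    """Sum of signs of (later - earlier) over (earlier, later) pairs."""
    return sum((later > earlier) - (later < earlier) for earlier, later in pairs)

def tau_u(baseline, intervention):
    """Tau-U for an A versus B contrast, corrected for baseline trend."""
    n_a, n_b = len(baseline), len(intervention)
    s_ab = sign_sum((a, b) for a in baseline for b in intervention)
    s_a = sign_sum(combinations(baseline, 2))  # pairs keep their time order
    return (s_ab - s_a) / (n_a * n_b)

# Hypothetical data: an improving baseline lowers the corrected estimate.
print(tau_u([2, 2, 2], [5, 6, 7]))        # flat baseline
print(tau_u([2, 3, 2, 3], [6, 7, 8, 9]))  # slight baseline trend
print(tau_u([1, 2, 3], [4, 5, 6]))        # steady baseline trend
```

In all three hypothetical contrasts the phases do not overlap, so the differences among the estimates come only from the baseline trend correction.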
ES Stratified by Design Quality
Stratifying the ESs by quality level pro-
duced the following scores: Eleven studies in
the weak-quality range had a combined Tau-U
of 0.77 (SE = 0.05; 95% CI [0.67, 0.87]). The
nine studies with a medium-quality rating had
a combined Tau-U of 0.84 (SE = 0.04; 95%
CI [0.76, 0.93]). The eight studies categorized
as strong quality had a combined Tau-U
of 0.85 (SE = 0.06; 95% CI [0.74, 0.97]).
Medium- and strong-quality studies produced
a large Tau-U of 0.84 (SE = 0.03; 95% CI
[0.78, 0.91]).
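The relation between an estimate, its SE, and a 95% CI can be sketched as follows. This assumes a normal approximation (estimate plus or minus 1.96 SE) and truncation at the bounds of Tau-U; both are assumptions of this example, suggested by the reported intervals that stop at 1.00, and the function name is hypothetical. Because the published estimates and SEs are rounded, not every reported interval will be reproduced to the second decimal.

```python
def ci95(estimate, se, lower_bound=-1.0, upper_bound=1.0):
    """Normal-approximation 95% CI, truncated to the possible range."""
    half_width = 1.96 * se
    return (max(lower_bound, estimate - half_width),
            min(upper_bound, estimate + half_width))

# Weak-quality studies: Tau-U = 0.77, SE = 0.05.
low, high = ci95(0.77, 0.05)
print(round(low, 2), round(high, 2))

# A large SE pushes the upper limit past 1.00, so it is truncated.
low, high = ci95(0.88, 0.28)
print(round(low, 2), round(high, 2))
```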
Visual Analyses
We visually analyzed 88 baseline and
intervention phases and contrasts. Results in-
dicated baseline trend in 82% of baseline
phases (n = 72), variability in 73% of baseline
and intervention phases (n = 65), a strong
immediate effect in 75% of phase changes
(n = 66), a weak to medium immediate effect
in 15% of phase changes (n = 13), and no
immediate effect in 10% of phase changes
(n = 9). In addition, 40% (n = 35) included
overlapping data in adjacent phases.
From these visual analyses, we determined
that 58% of phase contrasts (n = 51)
represented strong effects, 13% (n = 11) represented
medium effects, 22% (n = 19) represented
weak effects, and 8% (n = 7) represented
no effect. These results showed a 75%
agreement (n = 66) with Tau-U. In comparison
to visual analysis, Tau-U slightly overrated
the effect on 14 occasions, in which the
author team judged the data to represent (a) no
effect compared to small effect for Tau-U on
six occasions, (b) a weak effect compared to a
medium Tau-U on three occasions, and (c) a
medium effect compared to a large Tau-U on
five occasions. In comparison to visual analy-
sis, Tau-U slightly underrated the effect on
five occasions, in which the author team
judged the data to represent (a) a medium
effect compared to a small Tau-U on two occasions and
(b) a strong effect compared to a medium Tau-U on three
occasions.
Phase 4: Moderator Analyses
The results of five potential moderator
analyses are presented as follows. (Also see
Table 2.)
Setting
Twelve studies with 43 participants
and 41 phase contrasts were coded for general
education and 16 studies with 47 participants
and 47 phase contrasts were coded for special
education. Results indicated that studies in
general education settings had a lower ES
(0.86; SE = 0.07; 95% CI [0.74, 0.98]) than
special education settings (ES = 0.89; SE =
0.05; 95% CI [0.80, 0.98]). When the two categories
were compared for parameter estimates,
overlapping CIs indicated there may not be a
statistically significant difference, which was
confirmed by the values from the reliable-difference
formula: z = 0.04, p = .70.
Figure 1. Forest Plot of ESs for 28 Included Studies and Overall ES
Note. ES = effect size; Min CI = minimum confidence interval; Max CI = maximum confidence interval.
Age
The preschool category of children
ages 3 to 5 years contained six studies, 22
participants, and 29 phase contrasts. The
school-age category for ages 6 to 15 years
contained 22 studies, 68 participants, and 64
phase contrasts. Results indicated that studies
for ages 3 to 5 years had a lower ES (0.64;
SE = 0.06; 95% CI [0.53, 0.75]) than for
ages 6 to 15 years (ES = 0.91; SE = 0.04;
95% CI [0.84, 0.98]). When the two categories
were compared for parameter estimates, non-
overlapping CIs were observed indicating that
the moderator might be statistically signifi-
cantly different. The values from the reliable-
difference formula were z = 2.33 and p = .02,
confirming statistically significant results.
Outcome Measures (Academic, Behavioral)
Eight studies with 21 participants and 29
phase contrasts were coded as academic out-
comes (e.g., task accuracy, task engagement;
see Table 1). Twenty studies with 73 partici-
pants and 59 phase contrasts were coded as
behavioral outcomes (e.g., disruptive behav-
ior, noncompliance). Results indicated that
studies that included academic outcomes had a
lower ES (0.89; SE = 0.08; 95% CI
[0.73, 1.00]) than studies that included behavioral
outcomes (ES = 0.93; SE = 0.07; 95%
CI [0.75, 1.00]). When the two categories
were compared for parameter estimates, over-
lapping CIs indicated there may not be a sta-
tistically significant difference, which was
confirmed by the values from the reliable-
difference formula: z = 0.04, p = .69.
Response Cost
Sixteen studies that used RC with 47
participants and 47 phase contrasts were ag-
gregated and compared with the 12 studies
with 43 participants and 41 phase contrasts
without RC. Results indicated that studies that
included RC had a lower ES (0.84; SE = 0.06;
95% CI [0.75, 0.95]) than studies that did not
include RC (ES = 0.91; SE = 0.05; 95% CI
[0.81, 1.00]). When the two categories were
compared for parameter estimates, overlap-
ping CIs indicated there may not be a sta-
tistically significant difference, which was
confirmed by the values from the reliable-
difference formula: z = 0.99, p = .32.
Verbal Cueing
Nine studies that included verbal cueing
with 20 participants and 25 phase contrasts
were aggregated and compared with 19 studies
including 70 participants and 63 phase
changes without verbal cueing. Results indicated
that studies using verbal cueing produced
a lower ES (0.83; SE = 0.06; 95% CI
[0.74, 0.92]) than studies without verbal cueing
(ES = 0.88; SE = 0.28; 95% CI
[0.33, 1.00]). When the two categories were
compared for parameter estimates, overlapping
CIs indicated there may not be a statistically
significant difference, which was confirmed
by the values from the reliable-difference
formula: z = 0.20, p = .84.
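The CI comparisons reported for the five moderators can be checked with a short sketch. The interval values below are the 95% CIs reported in this section; the function name and data structure are assumptions for illustration. Overlap is only a screening heuristic, which is why each comparison was followed by the reliable-difference test.

```python
def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# 95% CIs for the two levels of each moderator, as reported above.
moderators = {
    "setting": ((0.74, 0.98), (0.80, 0.98)),
    "age": ((0.53, 0.75), (0.84, 0.98)),
    "outcome": ((0.73, 1.00), (0.75, 1.00)),
    "response cost": ((0.75, 0.95), (0.81, 1.00)),
    "verbal cueing": ((0.74, 0.92), (0.33, 1.00)),
}

for name, (ci_a, ci_b) in moderators.items():
    status = "overlapping" if intervals_overlap(ci_a, ci_b) else "nonoverlapping"
    print(f"{name}: {status}")
```

Only the age intervals fail to overlap, which matches the single statistically significant moderator.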
DISCUSSION
This meta-analysis synthesized and an-
alyzed the findings from SCED studies to
evaluate the effectiveness of TEs in public
schools from 1980 to 2014. We specifically set
out to (a) evaluate the quality of the design, (b)
calculate ESs and CIs, (c) stratify the results
across quality, and (d) evaluate moderator
variables of peer-reviewed literature, with a
focus on both academic outcomes (e.g., task
accuracy, task engagement) and behavioral
outcomes (e.g., disruptive behavior, noncom-
pliance). Supporting and contributing infor-
mation to previous findings (Dickerson et al.,
2005; Matson & Boisjoli, 2009), these results
suggest that a TE is an effective intervention,
specifically for use in classroom settings.
We first sought to evaluate the quality of
the design in SCED on TEs. We evaluated the
design quality, and contrary to Maggin et al.
(2011), who found 30% of studies with de-
signs of medium to strong quality, we found
64% rated as medium to strong. The remainder
of the studies were rated as weak primarily
because of insufficient demonstration of ef-
fects, lack of reporting of sufficient informa-
tion (e.g., IOA), and the number of data points
per phase. We found a large majority of in-
cluded studies reported IOA. However, only a
third of the studies reported treatment fidelity.
The difference between our design quality ratings
and those of Maggin et al. (2011) can, at least
partially, be explained by the different methodologies
utilized, which resulted in inclusion of unique
studies. Maggin et al. included studies that had
not been through a peer-review process in an
attempt to eliminate publication bias. We
elected to statistically address publication
bias. Thus, Maggin et al. included 10 studies
published prior to 1980, two dissertations, and
one study presented at a conference, and our
review did not include studies published prior
to 1980 and only included studies published
after the peer-review process. Thus, there were
only four overlapping studies between this
meta-analysis and the Maggin et al. meta-
analysis. Therefore, it appears that the quality
of the research is increasing with time and
through the peer-review process.
The second question addressed by this
study was the overall effectiveness of TEs.
The overall Tau-U, calculated from 88 individual
ESs, was large. Similar to Maggin et al. (2011), our
results provide preliminary evidence that a TE is
an effective intervention. However, it is im-
portant to note that we chose to evaluate the
effectiveness of a TE with Tau-U to increase
the trustworthiness of our results as 82% of the
included studies showed a positive baseline
trend, which is a threat to internal validity
addressed by Tau-U. These results provide
further analytic evidence that a TE is effective
at reducing challenging behaviors. In addition,
results imply that a TE is an intervention that
can be utilized to increase academic readiness
skills in school settings.
The third question addressed by this
meta-analysis was whether effects differed
across design quality. When stratified, the ES
for studies rated as medium or strong quality
was large. Given the results, methodological
quality did not appear to explain differences in
the effectiveness of the intervention; rather,
methodological quality seemed to only explain
the extent to which the study could be repli-
cated. From these results, it is apparent that
although some design quality issues are evi-
dent, a majority of the data and designs are
sufficient for a TE to be preliminarily consid-
ered an evidence-based intervention for imple-
mentation in classrooms.
The fourth question addressed was po-
tential moderators. Results supported our a
priori hypotheses that neither setting nor out-
come nor addition of RC or verbal cueing
moderated the effects of a TE. A TE appeared
to be equally effective in general and special
education classes, for academic and
behavioral outcomes, and with and without
RC or verbal cueing. However, it does seem
that a TE is slightly more effective with older
children than with younger children.
The finding that setting did not moderate
the effects of a TE is important. If the TE is an
intervention needed by a student who receives
special education services, it appears that re-
sults provide initial support for use of a TE in
general or special education settings. How-
ever, this finding is not consistent with the
report of DuPaul, Eckert, and McGoey (1997),
who found that interventions had a greater
impact on behavior when they were imple-
mented in special education classrooms as op-
posed to implementation in general education
classrooms. Thus, more research is needed to
determine if a TE can be implemented effec-
tively in both general and special education
classrooms.
Data suggest the only statistically signif-
icant moderator was the age of the participant.
Results indicated a TE was more effective for
children age 6 years and above. Nonoverlap-
ping CIs were observed in this moderator,
indicating the difference was statistically sig-
nificant. A TE was effective with both groups
of children; however, it appears to be most
effective with older children. This finding
might be attributed to the probability of
older students understanding the procedures
and being able to identify items or activities
that would be motivating as backup reinforc-
ers. This finding is consistent with previous
research (Brumfield & Roberts, 1998; Shriver
& Allen, 1997) and suggests that the age of the
child may be predictive of compliance with
this intervention. The implication for this
seems to be that implementers need to spend
additional time on training, modeling, and
practice prior to use with younger children to
increase the likelihood of the child under-
standing the concept of backup reinforcers.
Furthermore, the finding that outcomes
did not moderate the effect of a TE is mean-
ingful. The equality of a TE for both academic
(e.g., task accuracy, task engagement) and be-
havioral outcomes implies a single interven-
tion may be incorporated for dual targets,
which is likely to simplify the procedures for
increased efficiency. Thus, practitioners can
use TEs to target various outcomes.
Past studies have evaluated the effects
across procedural differences, and our findings
contribute to mixed results in this area. Spe-
cifically, our findings suggest that a TE is
effective with or without the use of RC. This
finding differs from that of researchers who have found that RC
increases effectiveness (Center & Wascom,
1984; De Martini-Scully et al., 2000). Further-
more, while our findings indicate a TE with
and without verbal cues is equally effective,
Kluger and DeNisi (1996) found that to min-
imize the negative effects of the use of verbal
cues (e.g., decreases internal motivation,
draws attention to negative), cueing should
only be used during goal setting. In addition,
the researchers found if more verbal cueing
was needed, there was a lack of understanding
of or confusion about the expected behaviors.
These findings may be affected by the strength
of the token for expected behaviors and the
exchange for the secondary reinforcer. As
such, practitioners are advised to consider the
importance of the secondary reinforcer to the
student and change the secondary reinforcer
when it is no longer motivating. Further study
is needed to clarify these findings; however,
we are encouraged that practitioners might be
able to simplify a TE by eliminating RC and
verbal cueing.
Several findings of the visual analysis
are noteworthy and slightly temper the confi-
dence with which we can say definitively that
the studies strongly support TEs. However,
Tau-U addresses some of the concerns. First, a
majority of the studies had a positive baseline
trend. This finding could be problematic be-
cause of potential issues with internal validity.
However, while this trend influenced out-
comes of visual analyses, it did not influence
Tau-U, as we controlled for baseline trend
within the analyses. Second, 67% of the
phases had variable data, and 40% of adjacent
phases had overlapping data points. These
findings indicated fluctuation in participant per-
formance and affected our decisions regarding
the functional relationship between a TE and
the outcomes. However, 78.70% of calculated
Tau-U and decisions made through visual
analysis were in agreement, providing stron-
ger support for our findings.
Although results should be interpreted
with caution because of low design quality,
the current meta-analysis extends the empiri-
cal literature suggesting the potential for a TE
to be effective in classroom settings. Findings
indicate medium to large effects for a TE
overall when used to achieve academic and
behavioral outcomes and when used with or
without RC or verbal cues. Aggregated results
across design quality reveal preliminary sup-
port for use of TEs in special and general
education classrooms. Professionals serving
children in classroom settings may be encour-
aged by the finding that a TE does not have to
include additional components (e.g., verbal
cueing and RC), thus minimizing the com-
plexity of the intervention.
Limitations
Limitations of the current meta-analysis
should be mentioned. First, only studies published
in peer-reviewed journals were included,
as we were interested in the design
quality of published studies. We conducted the
Egger’s test to evaluate the sample for publi-
cation bias; however, caution should be given
to interpretations because of the small number
of included studies. Second, although Wolery
(2013) and others suggested treatment integ-
rity be included as a component of design
quality, we elected not to include it in our
measure to align coding with the current stan-
dards for demonstrating a functional relation-
ship between the dependent and independent
variables (i.e., Horner et al., 2005; Kratochwill
et al., 2010). Third, as with any nonparametric
measure of effect, Tau-U has some limitations
including ceiling effects, as demonstrated here
with 9 studies hitting the upper ceiling and 23
studies with CIs that hit the upper ceiling.
In this study, the benefits of using a nonparametric ES
outweighed these limitations. Fourth, age groupings
may potentially have limited this study.
Ages 6 to 15 years is a broad range, and results
may vary within the category. Finally, al-
though design quality was rated as medium to
strong for a majority of our studies, results
should be interpreted in light of the design
quality weaknesses of the remaining studies.
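For readers unfamiliar with the Egger's test mentioned in the first limitation, a minimal sketch of its regression step follows. It regresses each study's standardized effect (ES divided by SE) on its precision (1 divided by SE); an intercept far from zero suggests funnel plot asymmetry (Egger et al., 1997). The function name and the data are hypothetical, and the significance test on the intercept, which requires a t distribution, is omitted from this sketch.

```python
def egger_regression(effects, std_errors):
    """Ordinary least squares of ES/SE on 1/SE; returns (intercept, slope)."""
    y = [es / se for es, se in zip(effects, std_errors)]
    x = [1.0 / se for se in std_errors]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical effect sizes and standard errors for five studies.
effects = [0.95, 0.90, 0.85, 0.80, 0.78]
std_errors = [0.20, 0.15, 0.10, 0.06, 0.04]
intercept, slope = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With few studies the intercept is estimated imprecisely, which is the reason for the caution stated above.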
Implications for Practice and Future
Research
Educators and professionals who work
with children in schools struggle to find interventions
that are economical (in terms of time, training, money, personnel,
and expertise). Within school-wide
behavior programs, students frequently re-
ceive tokens, often in the form of “bucks” that
can be traded in at a school-level store. In
addition, a similar system could be put in
place in a classroom setting. A TE can be built
within an individual student behavior contract
(see Soares, Cegelka, & Payne, 2016). Stu-
dents can earn tokens and work toward having
a sufficient number of tokens to purchase a
desired item. A TE is effective with and with-
out RC. Therefore, the procedure can be 100%
positive with no removal of reinforcers.
More and varied research on the effects
of a TE is needed. Future studies should cal-
culate more than one ES (Kratochwill et al.,
2010) and examine the level of treatment in-
tegrity correlated with intervention effective-
ness. We found that only 10 studies out of 28
reported levels of treatment integrity, and
Maggin et al. (2011) found that 2 of 24 studies
reported treatment fidelity. When authors do
not report treatment integrity, implementation
is assumed. Low levels of treatment integrity
in schools are concerning. However, reports of
low treatment fidelity in educational settings
are frequent (Becker & Domitrovich, 2011;
Riley-Tillman & Eckert, 2001) and potentially
decrease the likelihood of effectiveness. Our
finding that a TE is effective rests on the
assumption that the intervention was implemented as
designed. Researchers are only beginning to
evaluate the level of integrity that is needed to
achieve effects with evidence-based interven-
tions in educational settings outside of the
tight controls of research (see Owens et al.,
2014). Thus, this is an area of caution and one
in which much research is needed. In addition,
research is needed to identify the levels of
professional development and ongoing coach-
ing required to maintain reliable implementa-
tion and generalization across behaviors and
settings. Furthermore, researchers need to re-
fer to the current quality standards when de-
signing and implementing SCED.
REFERENCES
References marked with an asterisk indicate
studies included in the meta-analysis.
Abramson, J. H. (2011). WINPEPI updated: Computer
programs for epidemiologists, and their teaching
potential. Epidemiologic Perspectives & Innovations,
8, 1–9. doi:10.1186/1742-5573-8-1. Retrieved
from http://archive.biomedcentral.com/1742-5573/
content/8/1/1
Akin-Little, K. A., & Little, S. G. (2004). Re-examining
the overjustification effect: A case study. Journal of
Behavioral Education, 13, 179–192. doi:10.1023/
B:JOBE.0000037628.81867.69
Alberto, P. A., & Troutman, A. C. (2003). Applied behavior
analysis for teachers (6th ed.). Upper Saddle
River, NJ: Merrill Prentice Hall.
Ary, D., & Suen, H. K. (1989). Analyzing qualitative
behavioral observational data. Mahwah, NJ: Law-
rence Erlbaum Associates.
Balcazar, F., Hopkins, B. L., & Suarez, Y. (1985). A
critical, objective review of performance feedback.
Journal of Organizational Behavior Management,
7(3–4), 65–89. doi:10.1300/J075v07n03_05
Becker, K. D., & Domitrovich, C. E. (2011). The concep-
tualization, integration, and support of evidence-based
interventions in the schools. School Psychology Review,
40(4), 582–589. Retrieved from http://eric.ed.gov/?id=EJ962068
Bippes, R., McLaughlin, T. F., & Williams, R. L. (1986).
A classroom token system in a detention center: Ef-
fects for academic and social behavior. Techniques: A
Journal for Remedial Education and Counseling, 2,
126–132. Retrieved from http://psycnet.apa.org/
psycinfo/1987-14003-001
Bowman-Perrott, L., Burke, M., Zhang, N., & Zaini, S.
(2014). Direct and collateral benefits of peer tutoring
on social and behavioral outcomes: A meta-analysis of
single-case studies. School Psychology Review, 43,
260–285. Retrieved from http://bmo.sagepub.com/
content/early/2014/09/25/0145445514551383.abstract
Brumfield, B. D., & Roberts, M. W. (1998). A comparison of
two measurements of child compliance with normal pre-
school children. Journal of Clinical Child Psychology,
27(1), 109–116. doi:10.1207/s15374424jccp2701_12
Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for
single-case research. In T. R. Kratochwill & J. R.
Levin (Eds.), Single-case research design and analy-
sis: New directions for psychology and education (pp.
187–212). Hillsdale, NJ: Erlbaum.
Carlson, C. L., Pelham, W. E., Milich, R., & Dixon, J.
(1992). Single and combined effects of methylpheni-
date and behavior therapy on the classroom perfor-
mance of children with attention-deficit hyperactivity
disorder. Journal of Abnormal Child Psychology, 20,
213–232. doi:10.1007/BF00916549
*Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee,
A., Sigafoos, J., & Machalicek, W. (2014). Effects of a
perseverative interest-based token economy on chal-
lenging and on-task behavior in a child with autism.
Journal of Behavioral Education, 23(3), 368–377. doi:
10.1007/s10864-014-9195-7
Cavalier, A. R., Ferretti, R. P., & Hodges, A. E. (1997).
Self-management within a classroom token economy
for students with learning disabilities. Research in De-
velopmental Disabilities, 18, 167–178. doi:10.1016/
S0891-4222(96)00045-5
*Center, D. B., & Wascom, A. (1984). Transfer of
reinforcers: A procedure for enhancing response
cost. Educational and Psychological Research, 4,
19–27. Retrieved from http://davidcenter.com/
documents/Publications/39
Christensen, L., Young, K. R., & Marchant, M. (2004).
The effects of a peer-mediated positive behavior support
program on socially appropriate classroom behavior.
Education & Treatment of Children, 27, 199–234.
Retrieved from http://www.jstor.org/stable/42900544
*Conyers, C., Miltenberger, R. G., Gubin, A., Barenz,
R., Jurgens, M., Sailer, A., . . . Kopp, B. (2004). A
comparison of response cost and differential rein-
forcement of other behavior to reduce disruptive
behavior in a preschool classroom. Journal of Applied
Behavior Analysis, 37, 411–415. doi:10.1901/
jaba.2004.37-411
Cooper, H. M., & Hedges, L. V. (Eds.). (1994). The
handbook of research synthesis. New York, NY: Rus-
sell Sage Foundation.
*De Martini-Scully, D., Bray, M. A., & Kehle, T. J.
(2000). A packaged intervention to reduce disruptive
behaviors in general education students. Psychology in
the Schools, 37, 149–156. doi:10.1002/(SICI)1520-6807(200003)37:2<149::AID-PITS6>3.0.CO;2-K
Dickerson, F. B., Tenhula, W. N., & Green-Paden, L. D.
(2005). The token economy for schizophrenia: Review
of the literature and recommendations for future research.
Schizophrenia Research, 75, 405–416. doi:
10.1016/j.schres.2004.08.026
DuPaul, G. J., Eckert, T. L., & McGoey, K. E. (1997).
Interventions for students with attention-deficit/hyper-
activity disorder: One size does not fit all. School
Psychology Review, 26, 369–381. Retrieved from
http://www.nasponline.org/publications/periodicals/
spr/volume-26/volume-26-issue-3/interventions-for-
students-with-attention-deficit/hyperactivity-disorder-
one-size-does-not-fit-all
DuPaul, G. J., & Weyandt, L. L. (2006). School-based
intervention for children with Attention Deficit Hyper-
activity Disorder: Effects on academic, social, and
behavioural functioning. International Journal of Dis-
ability, Development and Education, 53(2), 161–176.
doi:10.1080/10349120600716141
Egger, M., Smith, G. D., Schneider, M., & Minder, C.
(1997). Bias in meta-analysis detected by a simple,
graphical test. BMJ, 315(7109), 629–634. doi:
10.1136/bmj.315.7109.629
Fairchild, A. J., & MacKinnon, D. P. (2009). A general
model for testing mediation and moderation effects.
Prevention Science: The Official Journal of the Society
for Prevention Research, 10(2), 87–99. doi:10.1007/
s11121-008-0109-6
Feindler, E. L., Marriott, S. A., & Iwata, M. (1984). Group
anger control training for junior high school delinquents.
Cognitive Therapy and Research, 8(3), 299–311.
doi:10.1007/BF01173000
*Filcheck, H. A., McNeil, C. B., Greco, L. A., & Bernard,
R. S. (2004). Using a whole-class token economy and
coaching of teacher skills in a preschool classroom to
manage disruptive behavior. Psychology in the
Schools, 41, 351–361. doi:10.1002/pits.10168
GetData Graph Digitizer (Version 2.21) [Software]. Re-
trieved from http://www.getdata-graph-digitizer.com/
download.php
Glass, G. V. (1976). Primary, secondary, and meta-analysis
of research. Educational Researcher, 5, 3–8. doi:
10.3102/0013189X005010003
Gresham, F. M. (1979). Comparison of response cost
and timeout in a special education setting. Journal of
Special Education, 13, 199–208. doi:10.1177/
002246697901300211
Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004).
Principles of behavioral observation. In S. N. Haynes
& E. M. Heiby (Eds.), Comprehensive handbook of
psychological assessment: Behavioral assessment
(Vol. 3, pp. 108–127). Hoboken, NJ: John Wiley and
Sons.
Heaton, R. C., & Safer, D. J. (1982). Secondary school
outcome following a junior high school behavioral
program. Behavior Therapy, 13, 226–231. doi:
10.1016/S0005-7894(82)80066-X
Higgins, J. P., & Thompson, S. G. (2002). Quantifying
heterogeneity in a meta-analysis. Statistics in Medicine,
21(11), 1539–1558. doi:10.1002/sim.1186
*Higgins, J. W., Williams, R. L., & McLaughlin, T. F.
(2001). The effects of a token economy employing
instructional consequences for a third-grade student
with learning disabilities: A data-based case study.
Education & Treatment of Children, 24, 99–106. Retrieved
from http://www.jstor.org/stable/42899646
*Himle, M. B., Woods, D. W., & Bunaciu, L. (2008).
Evaluating the role of contingency in differentially
reinforced tic suppression. Journal of Applied Behav-
ior Analysis, 41, 285–289. doi:10.1901/jaba.2008.41-
285
Hintze, J. (2004). NCSS 2004 [Computer software]. Re-
trieved from http://www.ncss.com
Hopko, D. R., Lejuez, C. W., Lepage, J. P., Hopko, S. D., &
McNeil, D. W. (2003). A brief behavioral activation
treatment for depression: A randomized pilot trial within
an inpatient psychiatric hospital. Behavior Modification,
27(4), 458–469. doi:10.1177/0145445503255489
Horner, R. H., Carr, E. G., Halle, J., McGee, J., Odom, S.,
& Wolery, M. (2005). The use of single-subject re-
search to identify evidence-based practice in special
education. Exceptional Children, 71, 165–179. doi:
10.1177/001440290507100203
House, A. E., House, B. J., & Campbell, M. B. (1981).
Measures of interobserver agreement: Calculation for-
mulas and distribution effects. Journal of Behavioral
Assessment, 3, 37–57. doi:10.1007/BF01321350
Individuals with Disabilities Education Improvement Act,
H.R. 1350, 108th Cong. (2004).
*Jones, M. N., Weber, K. P., & McLaughlin, T. F.
(2013). No teacher left behind: Educating students
with ASD and ADHD in the inclusion classroom.
The Journal of Special Education Apprenticeship,
2(2), 1–22. Retrieved from http://josea.info/
archives/vol2no2/vol2no2-5-FT
Kazdin, A. E. (1971). The effect of response cost in sup-
pressing behavior in a pre-psychotic retardate. Journal of
Behavior Therapy and Experimental Psychiatry, 2, 137–
140. doi:10.1016/0005-7916(71)90029-2
Kazdin, A. E. (1972). Response cost: The removal of
conditioned reinforcers for therapeutic change. Be-
havior Therapy, 3, 533–546. doi:10.1016/S0005-
7894(72)80001-7
Kazdin, A. E. (1982). Single-case research designs: Meth-
ods for clinical and applied settings. New York, NY:
Oxford University Press.
Kazdin, A. E., & Bootzin, R. R. (1972). The token econ-
omy: An evaluative review. Journal of Applied Behav-
ior Analysis, 5, 343–372. doi:10.1901/jaba.1972.5-343
*Kazdin, A. E., & Geesey, S. (1980). Enhancing classroom
attentiveness by preselection of back-up reinforcers
in a token economy. Behavior Modification, 4, 98 –
114. doi:10.1177/014544558041006
*Kazdin, A. E., & Mascitelli, S. (1980). The opportunity
to earn oneself off a token system as a reinforcer for
attentive behavior. Behavior Therapy, 11, 68 –78. doi:
10.1016/s0005-7894(80)80037-2
*Klimas, A., & McLaughlin, T. F. (2007). The effects of
a token economy system to improve social and aca-
demic behavior with a rural primary aged child with
disabilities. International Journal of Special Educa-
tion, 22, 72–77. Retrieved from http://eric.ed.gov/
?id=EJ814513
Kluger, A. N., & DeNisi, A. (1996). The effects of feed-
back interventions on performance: A historical re-
view, a meta-analysis, and a preliminary feedback
intervention theory. Psychological Bulletin, 119, 254 –
284. doi:10.1037/0033-2909.119.2.254
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin,
J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R.
(2010). Single-case designs technical documentation.
Retrieved from http://ies.ed.gov/ncee/wwc/pdf/wwc_
scd
Lane, J. D., & Gast, D. L. (2014). Visual analysis in single-
case experimental design studies: Brief review and guide-
lines. Neuropsychological Rehabilitation, 24(3– 4), 445–
463. doi:10.1080/09602011.2013.815636
Latham, G. P., & Locke, E. A. (1991). Self-regulation
through goal setting. Organizational Behavior and Hu-
man Decision Processes, 50, 212–247. doi:10.1016/
0749-5978(91)90021-K
Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013).
An application of the What Works Clearinghouse stan-
dards for evaluating single-subject research: Self-man-
agement interventions. Remedial and Special Education,
34, 44 –58. doi:10.1177/0741932511435176
Maggin, D. M., & Chafouleas, S. M. (2010). PASS-RQ:
Protocol for assessing single-subject research quality.
Unpublished research instrument.
Maggin, D. M., Chafouleas, S. M., Goddard, K. M., &
Johnson, A. H. (2011). A systematic evaluation of token
economies as a classroom management tool for students
with challenging behavior. Journal of School Psychology,
49, 529 –554. doi:10.1016/j.jsp.2011.05.001
*Maglio, C., & McLaughlin, T. F. (1981). Effects of a
token reinforcement system and teacher attention in
reducing inappropriate verbalization with a junior high
school student. Corrective and Social Psychiatry, 27,
140 –145. Retrieved from http://psycnet.apa.org/
psycinfo/1982-24369-001
Manolov, R., Solanas, A., Sierra, V., & Evans, J. J.
(2011). Choosing among techniques for quantifying
single-case intervention effectiveness. Behavior Ther-
apy, 42(3), 533–545.
Martin, G., & Pear, J. (2003). Behavior modification:
What it is and how to do it? (7th ed.). Upper Saddle
River, NJ: Simon & Schuster.
Matson, J. L., & Boisjoli, J. A. (2009). The token econ-
omy for children with intellectual disability and/or
autism: A review. Research on Developmental Dis-
abilities, 30, 240 –248. doi:10.1016/j.ridd.2008.04.001
*McGoey, K. E., & DuPaul, G. J. (2000). Token rein-
forcement and response cost procedures: Reducing
the disruptive behavior of preschool children. School
Psychology Quarterly, 15, 330 –343. doi:10.1037/
h0088790
McKillup, S. (2011). Statistics explained: An introductory
guide for life scientists. Cambridge, UK: Cambridge
University Press.
*Millersmith, T., Weber, K. P., & McLaughlin, T. F.
(2013). The use of token economy and a math manip-
ulative for a child with moderate intellectual disabili-
ties. International Journal of Basics and Applied Sci-
ences, 1(3), 634 – 640. Retrieved from http://www.
insikapub.com/
Miltenberger, R. G. (2001). Behavior modification: Prin-
ciples and procedures (2nd ed.). Pacific Grove, CA:
Brooks/Cole.
*Mottram, L. M., Bray, M. A., Kehle, T. J., Broudy,
M., & Jenson, W. R. (2002). A classroom-based
intervention to reduce disruptive behaviors. Journal
of Applied School Psychology, 19, 65–74. doi:
10.1300/j370v19n01_05
Murray, L., & Sefchik, G. (1992). Regulating behavior
management practices in residential treatment facili-
ties. Children and Youth Services Review, 14(6), 519 –
539. doi:10.1016/0190-7409(92)90004-F
*Musser, E. H., Bray, M. A., Kehle, T. J., & Jenson, W. R.
(2001). Reducing disruptive behaviors in students with
serious emotional disturbance. School Psychology Re-
view, 30, 294 –304. Retrieved from http://www.
nasponline.org/publications/spr/abstract.aspx?ID=1590
Owens, J. S., Lyon, A. R., Brant, N. E., Masia-Warner, C.,
Nadeem, E., Spiel, C., & Wagner, M. (2014). Imple-
mentation science in school mental health: Key con-
structs in a developing research agenda. School Mental
Health, 6(2), 99 –111. doi:10.1007/s12310-013-9115-3
Ninci, J., Neely, L. C., Hong, E. R., Boles, M. B., Gilli-
land, W. D., Ganz, J. B., . . . Vannest, K. J. (2015).
Meta-analysis of interventions to improve functional
living skills for people with autism spectrum disorder.
Review Journal of Autism and Developmental Disorders,
2, 184–198. doi:10.1007/s40489-014-0046-1
No Child Left Behind Act of 2001, 20 U.S.C. 70 § 6301
et seq (2001).
Parker, R., & Hagan-Burke, S. (2007). Useful effect size
interpretations for single-case research. Behavior Ther-
apy, 38, 95–105. doi:10.1016/j.beth.2006.05.002
Parker, R. I., & Vannest, K. J. (2012). Bottom up analysis
of single-case research designs. Journal of Behavioral
Education, 17(1), 254 –265. doi:10.1007/s10864-012-
9153-1
Parker, R. I., Vannest, K. J., & Brown, L. (2009). The
“improvement rate difference” for single-case re-
search. Exceptional Children, 75, 135–150. Retrieved
from http://eric.ed.gov/?id=EJ842529
Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B.
(2011). Combining non-overlap and trend for single-
case research: Tau-U. Behavior Therapy, 42, 284 –299.
doi:10.1016/j.beth.2010.08.006
Phillips, E. L., Phillips, E. A., Fixsen, D. L., & Wolf,
M. M. (1971). Achievement place: Modification of the
behaviors of pre-delinquent boys within a token econ-
omy. Journal of Applied Behavior Analysis, 4, 45–50.
doi:10.1901/jaba.1971.4-45
Rapport, M. D., Murphy, A., & Bailey, J. S. (1980). The
effects of a response cost treatment tactic on hyperac-
tive children. Journal of School Psychology, 18, 98 –
111. doi:10.1016/0022-4405(80)90025-4
*Reitman, D., Murphy, M. A., Hupp, S. D. A., &
O’Callaghan, P. M. (2004). Behavior change and per-
ceptions of change: Evaluating the effectiveness of a
token economy. Child & Family Behavior Therapy,
26(2), 17–36. doi:10.1300/J019v26n02_02
Rhode, G., Jenson, W. R., & Reavis, H. K. (1993). The
tough kid book: Practical classroom management
strategies. Longmont, CO: Sopris West.
Riley-Tillman, T. C., & Eckert, T. L. (2001). Generaliza-
tion programming and school based consultation: An
examination of consultees’ generalization of consulta-
tion-related skills. Journal of Educational and Psycho-
logical Consultation, 12, 217–241. doi:10.1207/
s1532768xjepc1203_03
Rosen, L. A., Taylor, S. A., O’Leary, S. G., & Sanderson,
W. (1990). A survey of classroom management prac-
tices. Journal of School Psychology, 28(3), 257–269.
doi:10.1016/0022-4405(90)90016-Z
*Rosenberg, M. S. (1986). Maximizing the effectiveness
of structured classroom management programs: Imple-
menting rule-review procedures with disruptive and
distractible students. Behavioral Disorders, 11, 239 –
248. Retrieved from http://www.jstor.org/stable/
23882205
Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis:
Recent developments in quantitative methods for liter-
ature reviews. Annual Review of Psychology, 52, 59 –
82. doi:10.1146/annurev.psych.52.1.59
*Salend, S. J., & Allen, E. M. (1985). Comparative effects of
externally-managed response cost systems on inappropri-
ate classroom behavior. Journal of School Psychology,
23, 59 – 67. doi:10.1016/0022-4405(85)90035-4
*Salend, S. J., & Lamb, E. A. (1986). Effectiveness of a
group-managed interdependent contingency system.
Learning Disability Quarterly, 9, 268 –273. doi:
10.2307/1510380
*Salend, S. J., Tintle, L., & Balber, H. (1988). Effects of
a student-managed response cost system on the behav-
ior of two mainstreamed students. The Elementary
School Journal, 89, 89 –97. doi:10.1086/461564
Schellenberg, T., Skok, R., & McLaughlin, T. F. (1991).
The effects of contingent free time on homework com-
pletion in English with high school English students.
Child & Family Behavior Therapy, 13(3), 1–12. doi:
10.1300/J019v13n03_01
Scruggs, T. E., & Mastropieri, M. A. (2001). How to
summarize single-participant research: Ideas and ap-
plication. Exceptionality, 9, 227–244. doi:10.1207/
S15327035EX0904_5
Shriver, M. D., & Allen, K. D. (1997). Defining child
noncompliance: An examination of temporal parame-
ters. Journal of Applied Behavior Analysis, 30(1), 173–
176. doi:10.1901/jaba.1997.30-173
*Simon, S. J., Ayllon, T., & Milan, M. A. (1982). Behav-
ioral compensation: Contrast like effects in the class-
room. Behavior Modification, 6, 407– 420. doi:
10.1177/014544558263006
Simonsen, B., Fairbanks, S., Briesch, A., Myers, D., &
Sugai, G. (2008). Evidence-based practices in class-
room management: Considerations for research to
practice. Education and Treatment of Children, 31,
351–380. doi:10.1353/etc.0.0007
Skinner, B. F. (1931). The concept of the reflex in the
description of behavior. Journal of General Psychology,
5, 427– 458. doi:10.1080/00221309.1931.9918416
Skinner, C. H., Cashwell, C. S., & Bunn, M. S. (1996).
Independent and interdependent group contingencies:
Smoothing the rough waters. Special Services in the
Schools, 12, 61–78. doi:10.1300/J008v12n01_04
*Smith, L. K., & Fowler, S. A. (1984). Positive peer
pressure: The effects of peer monitoring on children’s
disruptive behavior. Journal of Applied Behavior Anal-
ysis, 17, 213–227. doi:10.1901/jaba.1984.17-213
Soares, D. A., Cegelka, W. J., & Payne, J. S. (2016). The
token economy playbook: The ultimate guide to pro-
moting superior performance and personal growth.
San Diego, CA: University Readers.
*Sran, S. K., & Borrero, J. C. (2010). Assessing the
value of choice in a token system. Journal of Ap-
plied Behavior Analysis, 43, 553–557. doi:10.1901/
jaba.2010.43-553
*Stevens, C., Sidener, T. M., Reeve, S. A., & Sidener, D. W.
(2011). Effects of behavior-specific and general praise on
acquisition of tacts in children with pervasive develop-
mental disorders. Research in Autism Spectrum Disor-
ders, 5, 666 – 669. doi:10.1016/j.rasd.2010.08.003
Stilitz, I. (2009). A token economy of the early 19th
century. Journal of Applied Behavior Analysis, 42(4),
925–926. doi:10.1901/jaba.2009.42-925
Strijbos, J., Martens, R., Prins, F., & Jochems, W.
(2006). Content analysis: What are they talking
about? Computers & Education, 46, 29 – 48. doi:
10.1016/j.compedu.2005.04.002
*Sullivan, M. A., & O’Leary, S. G. (1990). Maintenance
following reward and cost token programs. Behavior
Therapy, 21, 139 –149. doi:10.1016/s0005-7894(05)
80195-9
*Thompson, M. J., McLaughlin, T. F., & Derby, K. M.
(2011). The use of differential reinforcement to de-
crease the inappropriate verbalizations of a nine-year
old girl with autism. Electronic Journal of Research in
Educational Psychology, 9(1), 183–196. Retrieved
from http://eric.ed.gov/?id=EJ926483
*Truchlicka, M., McLaughlin, T. F., & Swain, J. C.
(1998). Effects of token reinforcement and response
cost on the accuracy of spelling performance with
middle-school special education students with behav-
ior disorders. Behavioral Interventions, 13, 1–10.
doi:10.1002/(SICI)1099-078X(199802)13:1<1::AID-BIN1>3.0.CO;2-Z
Ulmer, R. A. (1976). On the development of a token
economy mental hospital treatment program. Wash-
ington, DC: Hemisphere.
Witt, J. C., & Elliot, S. N. (1982). The response cost
lottery: A time efficient and effective classroom inter-
vention. Journal of School Psychology, 20(2), 155–
161. doi:10.1016/0022-4405(82)90009-7
School Psychology Review, 2016, Volume 45, No. 4
Wolery, M. (2013). A commentary: Single-case design
technical document of the What Works Clearinghouse.
Remedial and Special Education, 34(1), 39 – 43. doi:
10.1177/0741932512468038
Van den Noortgate, W., & Onghena, P. (2003). Hierar-
chical linear models for the quantitative integration of
effect sizes in single-case research. Behavior Research
Methods, Instruments, & Computers, 35, 1–10. doi:
10.3758/bf03195492
Van den Noortgate, W., & Onghena, P. (2008). A multi-
level meta-analysis of single-subject experimental
design studies. Evidence-Based Communication As-
sessment & Intervention, 2, 142–151. doi:10.1080/
17489530802505362
Vannest, K.J., Parker, R.I., & Gonen, O. (2011). Single
case research: Web based calculators for SCR analysis
(Version 1.0) [Web-based application]. College Sta-
tion, TX: Texas A&M University. Retrieved from
singlecaseresearch.org
Voight, M. L., & Hoogenboom, B. J. (2012). Publishing
your work in a journal: Understanding the peer review
process. International Journal of Sports Physical Ther-
apy, 7(5), 452– 460. Retrieved from http://www.ncbi.
nlm.nih.gov/pmc/articles/PMC3474310/
Yeaton, W. H., & Wortman, P. M. (1993). On the reli-
ability of meta-analytic reviews: The role of intercoder
agreement. Evaluation Review, 17(3), 292–309. doi:
10.1177/0193841X9301700303
Date Received: April 4, 2015
Date Accepted: August 26, 2015
Associate Editor: Lisa Bowman-Perrott
Denise A. Soares, PhD, is the Assistant Department Chair of Teacher Education, an
assistant professor of special education, and Special Education Program Coordinator at
the University of Mississippi. Her research interests include applied and practical expe-
riences in academic and behavior interventions for at-risk students, as well as examining
the efficacy of those interventions in classroom settings where teachers have competing
time demands.
Judith R. Harrison, PhD, is an assistant professor in the Department of Educational
Psychology–Special Education at Rutgers University in New Brunswick, New Jersey. Her
research interests include the effectiveness, acceptability, and feasibility of assessment,
interventions, and other services for youth with emotional and behavioral disorders and
attention deficit hyperactivity disorder.
Kimberly J. Vannest, PhD, is a professor in the Department of Educational Psychology–
Special Education at Texas A&M University. Her research interests are in determining
effective interventions for children and youth with or at risk for emotional and behavioral
disorders, including teacher behaviors and measurement.
Susan S. McClelland, PhD, is an associate professor of educational leadership and Chair
of the Department of Teacher Education at the University of Mississippi. Her research
interests include leadership for students with disabilities, literacy, school and organiza-
tional culture, and issues relating to rural education.
Main and Moderator Effects for Token Economies
Copyright of School Psychology Review is the property of National Association of School
Psychologists and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder’s express written permission. However, users may print,
download, or email articles for individual use.
Evaluating the Impact of Token Economy Methods on Student
On-task Behaviour within an Inclusive Canadian Classroom
Robert L. Williamson, Chelsea McFadzen
Simon Fraser University, Canada
Abstract
A token economy is a common classroom positive
behaviour support method whereby ‘tokens’ are
delivered to students contingent on exhibiting specific
behaviours. Students later exchange earned tokens for
items of interest. This project developed a prototype,
iPad-based tool that enabled teachers to deliver and
track tokens virtually. The virtual token economy
system was then compared to implementation using a
typical, physically delivered token economy method.
Both methods were evaluated for their impact on the
on-task behaviour of grade four and five students
within one inclusive Canadian classroom using a
multielement design. Individual impacts and group
effects were analyzed using an analysis of variance
with planned contrasts, and visually using single
case methods, to assess the efficacy of each
implementation method. Results
indicated that only one significant difference for one
individual subject was found between baseline (no
token economy) and both token economy systems. No
other significant differences were found between
individual or group on-task behaviours nor between
the baseline, physical and virtual methodologies
overall. Implications regarding evidence that TEs
represent evidence-based practice and suggestions for
future research are discussed.
1. Introduction
A token economy (TE) is a secondary
reinforcement system of positive behaviour support
whereby tokens (i.e., conditioned reinforcers) are
delivered to students for exhibiting specific
behaviours [1, 2, 3, 4, 5]. These tokens represent a
medium of exchange to be used by recipients to
purchase desired goods or privileges from a menu of
items [6, 2]. TE systems have been used in a variety of
settings and over many decades within an academic
environment [1, 2, 7]. Over a decade ago, TEs were
identified by Simonsen and colleagues (2008) as
meeting criteria for evidence-based practice and by the
American Psychological Association’s Task Force
on Promotion and Dissemination of Psychological
Procedures (1993) as a well-established psychological
procedure.
With a long history of use within academic and
other settings, the TE has enjoyed a reputation as an
evidence-based classroom behaviour management
tool and has widely been considered effective in
decreasing non-desired behaviours and increasing pro-
academic behaviours in students [8, 9, 4, 7]. Studies
have also shown that token economy systems have
been used to increase on task behaviours and decrease
non-desired behaviours [10, 11].
Some, however, have questioned the assertion that
the TE should be considered an evidence-based
practice. Maggin, Chafouleas, Goddard and Johnson
[12] conducted a systematic evaluation of research
involving TEs as classroom management tools for
students with challenging behaviours and found that
the “…extant research on token economies (did) not
provide sufficient evidence to be deemed best practice
based on the WWC (What Works Clearinghouse)
criteria” [12]. The authors suggested that this finding was
largely due to inadequate research designs in the
literature they reviewed. Specifically, the authors cited a
lack of information within studies regarding treatment
fidelity and social validity as among the
methodological problems found in the literature
available at the time of their systematic evaluation
[12]. This finding presents a profound concern, as
without sufficient description of the exact procedures
used within literature that finds positive or negative
results, the aggregate effectiveness of any given
implementation method of a TE cannot be
appropriately assessed.
In a more recent meta-analysis conducted five years
after Maggin et al., Soares, Harrison, Vannest and
McClelland [7] reported results that “…suggest that a TE is
an effective intervention, specifically for use in the
classroom setting” [7]. Unlike Maggin et al.’s finding
that only 30% of studies were rated as achieving a
medium to strong quality design based on WWC
standards, Soares et al. found that 64% met those same
WWC criteria as either medium or strong and
suggested that the quality of research regarding the
effectiveness of TEs is improving. Still, Soares et al.
commented that “…only a third of the studies reported
treatment fidelity.” [7]. Clearly, differences remain in
evaluating this intervention.
International Journal of Technology and Inclusive Education (IJTIE), Volume 9, Issue 1, 2020
Copyright © 2020, Infonomics Society
Ivy et al. [2] conducted a systematic literature
review regarding the quality of procedural
descriptions within TE research and found that “…of
the 96 articles reviewed, only 18 (19%) included
procedural descriptions of each component to a degree
sufficient to guide replication” [2]. This finding would
seem to support the previous assertions of Maggin et
al. [12] in noting a lack of adequate implementation
information regarding the specific TE methodologies
applied within published studies. These differing findings
lead to questions as to exactly how teachers
implemented the TEs evaluated in previous studies and
whether any specific attributes of implementation are more
or less impactful upon specific student behaviours.
There are six essential components of a TE system
[13, 2, 14]. These six components include (1) the
target behavior that is the focus of the intervention, (2)
the tokens themselves, which must have been
conditioned to function as reinforcers, (3) the backup
reinforcers that may be purchased with a token, (4) the
method by which tokens are earned, (5) the method by
which tokens may be exchanged for backup
reinforcers, and (6) the cost of the backup reinforcers
[2].
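The six components listed above can be pictured as a small data structure. The following is only an illustrative sketch: the class name, field names, and example values are assumptions made for this example and do not come from the study or from Ivy et al. [2].

```python
from dataclasses import dataclass


@dataclass
class TokenEconomy:
    """Illustrative container for the six TE components (names are hypothetical)."""
    target_behaviour: str      # (1) behaviour that is the focus of the intervention
    token: str                 # (2) conditioned reinforcer, e.g. a poker chip
    backup_reinforcers: dict   # (3) menu items, each mapped to (6) its cost in tokens
    earning_rule: str          # (4) method by which tokens are earned
    exchange_rule: str         # (5) method by which tokens are exchanged

    def affordable(self, balance: int) -> list:
        # Backup reinforcers a student could purchase with the given token balance.
        return sorted(item for item, cost in self.backup_reinforcers.items()
                      if cost <= balance)


# Made-up example values, for illustration only.
te = TokenEconomy(
    target_behaviour="on-task behaviour",
    token="poker chip",
    backup_reinforcers={"sticker": 2, "pencil": 5, "extra recess": 10},
    earning_rule="token delivered when the target behaviour is observed",
    exchange_rule="tokens exchanged at a set time on implementation days",
)
print(te.affordable(5))
```

Storing the cost alongside each backup reinforcer keeps components (3) and (6) together, which is one possible design choice rather than a requirement of the TE model.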
Traditionally, TEs have been implemented within
classrooms using physical tokens that are delivered to
students. Teachers carry tokens (often fake money,
poker chips or a similar type of token) on their person as
they teach. When a student displays the desired
behaviour, the teacher delivers a token to the student.
This physical method requires the teacher to be in
physical proximity to the student and to deliver the
token in person. This requirement may interrupt the
teacher’s instructional leadership.
The goal of this research was to examine any
relative impacts concerning student on task behaviour
between three separate conditions. Condition one
consisted of baseline (no TE implementation) while
the remaining two consisted of 1) a prototype virtual
iPad-based TE methodology known as ‘CARS’
(Class-wide Augmented Reward System) and 2) a
traditional, physically implemented TE system.
Within both methods (iPad and physical), specific
implementation methods were defined and followed.
Both utilized a variable ratio reward system of token
delivery, and the behaviour of interest was student
on-task behaviour.
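A variable ratio schedule delivers a token after an unpredictable number of target responses that averages out to a chosen value. The sketch below is a hypothetical simulation of that idea. The study does not report the ratio used, so the mean of 4, the uniform draw, and all names are assumptions made for this example.

```python
import random
from typing import Optional


class VariableRatioSchedule:
    """Hypothetical VR schedule: reinforce after a random number of responses."""

    def __init__(self, mean_ratio: int, seed: Optional[int] = None):
        self.mean_ratio = mean_ratio
        self._rng = random.Random(seed)
        self._count = 0
        self._required = self._draw()

    def _draw(self) -> int:
        # Uniform on 1..(2 * mean - 1), so the long-run average equals mean_ratio.
        return self._rng.randint(1, 2 * self.mean_ratio - 1)

    def record_response(self) -> bool:
        """Register one target response; return True if a token is due."""
        self._count += 1
        if self._count >= self._required:
            self._count = 0
            self._required = self._draw()
            return True
        return False


# Assumed VR-4 schedule: about one token per four responses on average.
schedule = VariableRatioSchedule(mean_ratio=4, seed=0)
tokens = sum(schedule.record_response() for _ in range(4000))
print(tokens)
```

Over many responses the number of tokens delivered approaches the number of responses divided by the mean ratio, while the requirement for any single token stays unpredictable.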
2. Method
2.1. Setting
Participants were recruited from one typical
inclusive elementary school classroom located in a
non-urban area within the lower mainland of British
Columbia (BC), Canada. A grade 4-5 combined
inclusive classroom was chosen by the district based
on teacher and school interest in the study. The
classroom was located on the second floor of a single
school building, at the end of a long hallway. Inside
the classroom, tables were generally oriented in the
first two-thirds of room space while the front third
contained a crescent table (for group work) on the left,
a carpet just under a smart board in the center, and a
teacher desk in the front right of the room. The back
of the room contained cabinets over the length of that
wall. The wall adjacent to the back wall and opposite
the door was fully windowed. Supplies were placed on
lower cabinets along the windowed wall and iPads
were stored and charged in the corner on the lower
cabinets between the back and windowed walls. No
TE system had been in use within the selected
classroom prior to initiation of this
study.
2.2. Participants
This mixed grade 4-5 class consisted of 23 total
students within grades four (n=13) and five (n=10).
The class was taught by one Caucasian female, BC
licensed professional teacher with four years total
teaching experience. The classroom employed one full
time, one-on-one education assistant. Two students
were designated by the district as having chronic
health conditions and seven students were designated
as having behavioural difficulties using BC Ministry
of Education disability determination guidelines. All
students gave assent to participate in the study and
parent/guardian permissions per ethics review board
protocols were obtained. The teacher likewise gave
consent to participate in the research.
Three students were selected by the teacher for
individual observation. Bob (pseudonym) was in grade
four. Bob was a Caucasian male and was designated
by school personnel as having a behaviour disability
and a chronic health condition as defined by the BC
Ministry of Education. Hellen (pseudonym) was a
grade five Caucasian female and was designated by
school personnel as having a behaviour disability as
defined by the BC Ministry of Education. Mark
(pseudonym) was a grade four Caucasian male and
was also designated by school personnel as having a
behaviour disability as defined by the BC Ministry of
Education. All students were native English speakers
and participated in the typical BC standard curriculum
for 100% of the school day. For each of the three
students of specific interest, the behaviour of concern
was identified by the teacher as time on task and thus
this was the behaviour of observation in the present
study.
2.3. Intervention Agent and Training
The teacher represented the intervention agent for
this work. The role of the teacher as intervention agent
consisted of 1) learning how to implement a TE, 2)
teaching students how to engage in a TE, 3) initiating
both the physical and virtual TE methods on preset
days and phases of data collection and 4) adhering to
the academic schedule during implementation and
providing for reward redemption on days of
implementation. The teacher was trained by the first
and second authors regarding the six principles of a TE
via individual one-on-one training regarding specific
implementation factors in the classroom setting.
Efficacy of the teacher training was assessed by the
first and second authors, who observed the teacher’s
instruction regarding the TE to her students (as
implemented in the classroom). The teacher covered all
six aspects in a functional way during the introduction
of the TE to the students. The teacher was then
observed during trial runs of both the physical and
virtual implementation methods in her classroom.
Implementation fidelity was observed via an
implementation fidelity requirements list (see
appendix A). Trial runs showed that the teacher
understood and implemented both TE methods as
required by the six components of a TE noted above,
adhering 100% to the implementation fidelity
checklist as noted independently by the authors. No
additional qualification or training was deemed
necessary or provided prior to data collection.
Follow-up training after initiation of the TE research
protocols was likewise not required, as implementation
during the implementation phases did not deviate
from the implementation requirements.
2.4. Materials
Each student was provided with one 9.5-inch iPad
containing the student version of the CARS app.
The teacher was provided with a 12.9-inch iPad Pro
containing the teacher version of the CARS app.
The CARS system consisted of two interconnected
iPad apps. The teacher ‘signed up’ each student in an
online class portal. Students were then able to securely
log in to their individual student app on their
individual student iPad. The teacher likewise securely
signed in on the teacher app from the teacher iPad. The
student app allowed students to view tokens already
obtained (a bank), prizes available and token price of
each, as well as showed when a token was awarded via
a ‘pop up’, push-type individual text notification
message. The pop-up notification worked similarly to all
text messages on the iPad and thus the student app did
not need to be active in order for the pop-up
notification to appear. The student could also be
working in a different app on their iPad and the pop-up
notification of an awarded token would still appear.
The CARS prototype virtual application was
designed to mitigate possible struggles related to
physically delivering tokens and gathering data by
utilizing a specially designed, prototype iPad software
tool. Specifically, the prototype software tool was
designed to mitigate two difficulties teachers may face
when implementing a token economy: 1) The iPad-
based tool eliminated the need to physically deliver a
token to a student. Instead, the teacher delivered
tokens virtually by tapping on the picture of a student
on the teacher iPad. Alternately, the teacher could
award the whole class tokens via a ‘whole class’
button on the teacher app. It was hypothesized that this
would improve temporal contingency relating
behaviour to receipt of a token, save instructional time
and minimize modest disruption when a teacher using
a traditional physical TE might have been required to
disengage from an instructional activity to deliver a
token physically. Virtual token delivery also became a
private rather than a public event when the student’s
iPad software recorded delivery and delivered the
individual pop-up text message to the student’s iPad
confirming token receipt. 2) The iPad-based tool
automatically recorded token delivery time and
amount as data that was then available to the teacher
on the system’s web-based portal. The iPad also
recorded tokens exchanged by the student, what they
were exchanged for, and when the exchanges occurred
within the same portal. Although outside the focus of
this present study, such data could then be analyzed at
a later time by the teacher to validly adjust token
exchange intervals or reward choices for individual
students at the time and discretion of the teacher.
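The record keeping described above (individual and whole-class token delivery, a per-student bank, and a time-stamped log of awards and exchanges) can be sketched as a small in-memory ledger. This is a hypothetical stand-in and not the CARS software: the class, method names, and log layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TokenLedger:
    """Hypothetical in-memory stand-in for the records described (not the real app)."""
    balances: dict = field(default_factory=dict)  # student -> tokens in the bank
    log: list = field(default_factory=list)       # (timestamp, student, event, amount)

    def award(self, student: str, amount: int = 1,
              when: Optional[datetime] = None) -> None:
        # Corresponds to the teacher tapping one student's picture.
        self.balances[student] = self.balances.get(student, 0) + amount
        self.log.append((when or datetime.now(), student, "award", amount))

    def award_class(self, students: list, amount: int = 1,
                    when: Optional[datetime] = None) -> None:
        # Corresponds to the 'whole class' button: same timestamp for everyone.
        stamp = when or datetime.now()
        for student in students:
            self.award(student, amount, stamp)

    def exchange(self, student: str, item: str, cost: int,
                 when: Optional[datetime] = None) -> bool:
        # Refuse the exchange if the student's bank cannot cover the price.
        if self.balances.get(student, 0) < cost:
            return False
        self.balances[student] -= cost
        self.log.append((when or datetime.now(), student, f"exchange:{item}", cost))
        return True


ledger = TokenLedger()
ledger.award("Bob", 2)
ledger.award_class(["Bob", "Hellen"])
print(ledger.balances)
```

Because every award and exchange is appended to the log with its time and amount, a teacher could later review delivery patterns per student, which is the kind of analysis the text notes as possible but outside the focus of the study.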
3. Methodology
This research utilized a multi-element design. A
single case alternating treatment (ABCBC) design was
used to visually investigate the efficacy of two
different versions of the token economy classroom
management strategy upon baseline student on task
behaviours. Baseline data (A) were taken in the absence of
any TE system of behaviour support. Then the
token economy method was implemented under two
conditions: B) traditional (physical) token delivery
and C) the prototype iPad-based virtual token delivery.
On task behaviour was defined, based on a related
definition from Lee, Sugai and Horner [15], as a
student exhibiting engagement of his/her senses and
focus on the activity of instruction indicated by the
teacher at the moment of each momentary time
sample. Student actions such as pausing, sleeping,
prolonged gaze in a non-relevant direction, engaging
in or remaining disengaged from communication
(depending on the instructional activity) and/or
engaging in any non-relevant activity were taken as
an indication that the student was
not reasonably attending to the instructional task.
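With momentary time sampling, each observation is a yes/no judgement of whether the student is on task at the sampled moment, and a session is summarized as the percentage of samples scored on task. The sketch below shows that arithmetic and groups sessions by condition in the ABCBC order described above. All data values are made up for illustration, and the function names are assumptions rather than the authors' analysis code.

```python
from statistics import mean


def percent_on_task(samples: list) -> float:
    """Percentage of momentary time samples scored as on task (True)."""
    return 100.0 * sum(samples) / len(samples)


def condition_means(sessions: list) -> dict:
    """Average session percentage for each condition label (A, B, C)."""
    by_condition = {}
    for condition, samples in sessions:
        by_condition.setdefault(condition, []).append(percent_on_task(samples))
    return {condition: mean(values) for condition, values in by_condition.items()}


# Made-up observations in ABCBC order: A = baseline, B = physical TE, C = virtual TE.
sessions = [
    ("A", [True, False, True, False]),
    ("B", [True, True, True, False]),
    ("C", [True, True, False, False]),
    ("B", [True, True, True, True]),
    ("C", [True, True, True, False]),
]
print(condition_means(sessions))
```

These per-condition summaries are the values that would be plotted for visual analysis. They do not replace the analysis of variance with planned contrasts mentioned in the abstract.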
3.1. Procedures
Data collection and research protocol
implementation were scheduled for, and took place during,
morning academic activities. Each morning, students
first engaged in whole group (class-wide) instruction
led by the teacher. During whole group instruction, the
teacher frequently sat on a stool located on a small
carpet and within easy access to a classroom smart
board. During this time, students were able to choose
to sit in chairs or on a carpet during the whole group
instructional activities. Whole group attention to task
data collection was conducted during whole group
instruction activities and took place at this same pre-
determined and routine timeframe of the classroom
schedule. No individual student data was collected
during the whole group activities.
Following whole group instruction, students
attended recess for approximately 15 minutes. Upon
returning to the classroom, students engaged in
stations-based instruction. The stations were located
within the classroom (and one station sometimes
located just outside the open classroom door at a
hallway table). During stations work, the teacher led
instruction in reading development activities from one
of the stations (typically 3 or 4 total stations in
operation during the stations activities) by sitting
behind the crescent shaped table with her orientation
out toward the class and students from the forward
left-hand corner of the room. It was possible for the
teacher to see all students during stations instruction
(except any student engaged in activities using the
hallway table just outside the classroom door). Other
stations not led by the teacher were structured as
independent learning activities for students at those
stations. The classroom educational assistant generally
monitored student activities at the stations as well as
individual students during station activities in the
classroom. Students rotated as cued by the teacher
from station to station throughout the hour. During the
stations-based instruction, data was collected
regarding the three individual students of interest and
not regarding the whole group. Both hours of
instruction focused on language arts and reading
related activities. This identical schedule of activities
was followed each day that data was collected.
3.2. Data Collection
A momentary time sampling methodology [16]
was implemented by designating multiple 15-minute
periods within the approximately two hours of data
collection each day. During the 15-minute intervals
within the first hour of whole class instruction,
observations concerning the on- or off-task behaviour
of the entire group of students were obtained using a
timed camera snapshot of the students at the end of
each minute of the 15-minute interval. Two cameras
(for accuracy of angle and inter-rater review purposes)
were placed high up in the front right and left corner
of the rectangular room in a way as to capture the
activities of all students within the room when the
picture was snapped. Pictures were snapped
automatically and without human interaction with the
cameras via a commercially purchased app designed
for that purpose. The snapping of pictures did not
make a sound, nor did it make any visually observable
action so as not to divert any student attention from the
lesson/task being taught. Additionally, an independent
data recorder (one or both authors) observed the group
directly and noted the activities in which the students
were to be engaged during that time. Following the
15-minute period, pictures were analyzed to count how
many students were focused on the instruction or
engaged in a directed activity for that sample captured
in the photo based on the previously described
definition of on-task behaviour. Data points for group
on-task behaviour were then calculated by dividing the
number of students on task for each sample picture (15
pictures in 15 minutes) by the total number of students
within each picture frame. One data point was then
calculated as percent on task over the entire 15-minute
period by averaging the individual picture data points
taken in the 15-minute period for the group of
students. The unit of analysis was the average on-task
percent over one 15-minute period.
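The group data point described above can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis script: the function name `group_on_task_percent` and the sample counts are invented, and each photo is assumed to be summarized as a pair (students on task, students in frame).

```python
from typing import List, Tuple


def group_on_task_percent(snapshots: List[Tuple[int, int]]) -> float:
    """Average percent on-task over one observation period.

    Each snapshot is (students_on_task, students_in_frame) for one photo.
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    per_photo = []
    for on_task, in_frame in snapshots:
        if in_frame <= 0 or not 0 <= on_task <= in_frame:
            raise ValueError(f"invalid snapshot: {(on_task, in_frame)}")
        # Percent on task for this single photo.
        per_photo.append(100.0 * on_task / in_frame)
    # One data point per period: the mean of the per-photo percentages.
    return sum(per_photo) / len(per_photo)


# Invented example: 15 photos of a 20-student class.
photos = [(18, 20)] * 10 + [(15, 20)] * 5
print(round(group_on_task_percent(photos), 1))  # 85.0
```

Because the per-photo percentages are averaged rather than the raw counts pooled, a photo with fewer students in frame carries the same weight as a photo of the full class.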
During physical TE implementation and during
whole group instruction, the teacher utilized variable
ratio (slot machine) reward schedules to deliver tokens
by physically handing ‘toy dollar bills’ as tokens to
students paying attention. A variable ratio schedule of
token delivery is generally accepted as effective
regarding the reinforcement of on task behaviours [17,
18]. At times, the teacher would hand a bill to all
students by walking around the room as students
engaged in a directed activity related to the whole
group instruction. The use of variable ratio reward
(token) distribution was requested by the teacher in
order that any interruptions to the flow and pace of the
intended instruction would be minimized. Students
would be asked to keep the bills in an envelope until
their personal items (such as backpacks or
notebooks) were accessible.
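For readers unfamiliar with the term, a textbook variable ratio schedule delivers a token after a varying number of target responses that averages out to a set value. The sketch below illustrates that general idea only. It is not a model of the teacher's actual delivery, which depended on breaks in instruction; the uniform draw, the function names and the parameter `mean_ratio` are all assumptions.

```python
import random
from typing import Iterator


def variable_ratio_requirements(mean_ratio: int, rng: random.Random) -> Iterator[int]:
    """Yield the number of on-task observations required before each token.

    Requirements are drawn uniformly from 1..(2 * mean_ratio - 1), so their
    expected value equals mean_ratio. The uniform draw is an assumption.
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    while True:
        yield rng.randint(1, 2 * mean_ratio - 1)


def tokens_earned(on_task_observations: int, mean_ratio: int, seed: int = 0) -> int:
    """Count tokens delivered over a run of on-task observations."""
    rng = random.Random(seed)
    schedule = variable_ratio_requirements(mean_ratio, rng)
    tokens, remaining = 0, next(schedule)
    for _ in range(on_task_observations):
        remaining -= 1
        if remaining == 0:
            # Requirement met: deliver a token and draw the next requirement.
            tokens += 1
            remaining = next(schedule)
    return tokens


# With mean_ratio=4, roughly one token is delivered per four on-task observations.
print(tokens_earned(4000, mean_ratio=4))
```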
During virtual implementation, the teacher would
award tokens through tapping a picture of an
individual student or tapping the group button on the
teacher iPad during whole group instruction while
maintaining a variable ratio reward schedule. All
students were located within visual proximity to their
individually assigned iPads during virtual
implementation, as the teacher announced the start of
the virtual TE implementation prior to whole group
teaching by asking all students to retrieve their
individually assigned iPads and sign in prior to lesson
initiation.
During the second hour of instruction
(stations/small group and independent activities), each
of three pre-selected students were observed during a
15-minute period using a momentary time sampling
methodology using the same definition of on-task
behaviour. Data was collected by the first and/or
second authors by observing each of the three students
at the end of each minute of each 15-minute period and
noting if the student was on or off task relative to the
educational activity assigned. This was then converted
to a percentage on task by dividing the number of
points of on task observations by the total number of
observations in the period (15) and multiplying by
100. A single percent on task data point was recorded
that represented one student over the entire 15-minute
period of observation per student. The unit of measure
for individual student on-task behaviour was one data
point representing the average on-task percentage of
the student over one 15-minute period.
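The individual data point is the same arithmetic applied to one student's momentary samples. A minimal sketch follows, with an invented function name and invented observations:

```python
from typing import Sequence


def individual_on_task_percent(observations: Sequence[bool]) -> float:
    """Percent of momentary samples in which the student was on task."""
    if len(observations) == 0:
        raise ValueError("at least one observation is required")
    # True counts as 1 and False as 0, so sum() gives the on-task count.
    return 100.0 * sum(observations) / len(observations)


# Invented example: one 15-minute period, one sample at the end of each minute.
period = [True] * 11 + [False] * 4
print(round(individual_on_task_percent(period), 1))  # 73.3
```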
During stations work, physical implementation of
the TE method was conducted by the teacher through
assigning individualized tasks to students at the station
in which the teacher was leading instruction and then
physically ‘roaming’ the room handing out bills to
those students on task. The teacher also awarded bills
occasionally to the students assigned to her own
station. During virtual implementation, the teacher
remained at the station in which she was directing
instruction and gave tokens electronically to students
on task by visually (and not physically) observing
students in the room. A variable ratio token delivery
schedule was used for both the physical and the iPad-
based methodologies. Additionally, at least one token
was delivered to one student (as a minimum
requirement) over each group and individual 15-
minute observation time period.
3.3. Inter-rater reliability
Inter-rater reliability (IRR) was conducted on 13 of
31 (32.2%) individual data collection sessions (each
containing 15 separate data points to compare) by
collecting data on the three individual students of
interest by both the first and second authors
simultaneously. After independent collection, data
points were compared point by point within each
15-minute period for each of the three targeted
students, and a percent agreement was calculated.
IRR achieved an
average of 89.96% agreement (range: 82.2%-97.7%)
for observations of the three individual students in
total. To conduct IRR on the whole group attention to
task data, the pictures were analyzed independently by
the first and second authors. Group IRR was
conducted on 5 of 14 sets of 15 pictures each or 35.7%
of total observed data points and achieved 93.72%
agreement.
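Point-by-point percent agreement of the kind reported above can be sketched as follows. The function name and the two observers' records are invented for illustration; the sketch assumes each record is a list of on-task/off-task judgments for the same samples in the same order.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Point-by-point agreement between two observers, as a percentage."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of data points")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


# Invented example: two observers coding the same 15 momentary samples.
first = [True, True, False, True, True, True, False, True,
         True, True, True, False, True, True, True]
second = [True, True, False, True, False, True, False, True,
          True, True, True, True, True, True, True]
print(round(percent_agreement(first, second), 1))  # 86.7
```

Simple percent agreement does not correct for agreement expected by chance; a chance-corrected index such as Cohen's kappa would be a different calculation.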
3.4. Implementation Fidelity
Prior to implementation of the physical and virtual
token economy systems, the participating teacher was
trained in how to implement both forms of the TE
systems. Practice with each form (physical and virtual)
was conducted with feedback given to the teacher by
the authors. Following teacher training and practice,
the teacher relayed the method to the students in the
class and was observed as accurate in describing the
process to the students by study authors. The teacher
explained that ‘paying attention’ to lessons and
activities was the desired behaviour. The teacher
role-played attention vs. inattention with specific
reference to where one’s eyes were looking vis-à-vis
lesson involvement. Regarding tokens, the teacher explained
that when she noticed students paying attention, she
would award a token (individually) and if she noticed
the group paying attention, she would award all of
them a token. This was exemplified by asking students
to perform an activity as the teacher went to each
student noting if and how the student was paying
attention providing specific feedback to each as she
handed the student a token. The students were
surveyed by the teacher to obtain reasonable ‘prizes’
that students could redeem tokens to obtain. The
students brainstormed prizes and together with the
teacher, listed those that would be available and at
what price. This menu was posted in back of the room
on a corner cabinet door that was used for prize
redemption and within individual CARS student iPad
apps. Tokens were to be redeemed at recess periods,
lunch or after school. The recess period occurred
directly in between the two hours of data
collection/method implementation. Students
understood that these were the only times in which
prizes could be redeemed.
These initial preparations adhered to the six vital
components of TE methodologies noted by Ivy et al.
[2] in the following ways. 1) Students were trained by
the teacher and under the observation of study authors
as to exactly what behaviour constituted reward of a
token. 2) Students knew the value of each token by
participating in the development of the different
rewards and the costs related to each. According to the
teacher, all students cognitively understood relative
value and participated in choosing items/activities
they valued to be placed on the menu of rewards. 3)
Students understood that both ‘real’ and ‘virtual’
tokens could be combined to purchase items from the
menu. All students were capable of independent
mathematics required to add the physical and virtual
tokens together. 4) Students understood that tokens
were being given only during the two hours of
observation in the mornings in which the data
recorders (first and/or second author) were in the room
observing and taking data. The tokens were not given
on a specific schedule but instead were given
according to a variable ratio method by the teacher as
time and teaching methodology permitted her to note
the attentive behaviour of individual or groups of
students. Tokens were given no less than once during
each 15-minute period to at least one student. 5)
Students understood that recess, lunch and after school
were designated as times that tokens could be
redeemed based on teacher availability. Rewards were
available during at least one of the times each day of
study observation/implementation. 6) The menu of
rewards contained the costs for each. Last, the
implementation fidelity checklist was used to ensure
the teacher adhered to these mandates of
implementation each day of data collection achieving
100% adherence.
3.5. Data analysis
Regarding whole group attention to task, percent on
task for all students was compared across the three
conditions. A one-way, independent samples analysis
of variance (ANOVA) was used to test for differences
in percent time on-task among phases. Similarly, a one-way,
independent samples ANOVA was conducted for each
of the three students that were the focus of individual
behaviour support to examine any difference between
each student’s on-task performance among the three
conditions. Last, each condition and set of student data
was examined using single case visual analysis
techniques.
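The one-way, independent samples ANOVA described above reduces to a ratio of between-condition to within-condition variance. The following is a minimal sketch under stated assumptions: the session values are invented, sessions from repeated phases of the same condition are assumed to be pooled, and the function name `one_way_anova` is hypothetical. It is not the authors' analysis script.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for independent samples."""
    if len(groups) < 2 or any(len(v) == 0 for v in groups.values()):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(v) for v in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within <= 0:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {name: sum(v) / len(v) for name, v in groups.items()}

    # Variation of the condition means around the grand mean.
    ss_between = sum(len(v) * (means[name] - grand_mean) ** 2
                     for name, v in groups.items())
    # Variation of individual sessions around their own condition mean.
    ss_within = sum((x - means[name]) ** 2
                    for name, v in groups.items() for x in v)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented percent on-task values (A = baseline, B = physical, C = virtual).
sessions = {
    "A": [40.0, 50.0, 60.0],
    "B": [60.0, 70.0, 80.0],
    "C": [65.0, 75.0, 85.0],
}
f_stat, df_b, df_w = one_way_anova(sessions)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")  # F(2,6) = 5.25
```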
4. Results
Data was collected across eight total days between
April 30, 2018 and June 4, 2018. Some differences
exist concerning total number of 15-minute sessions
(data points) between whole group data and each of
the three individual student observations. This is due
to absences from the class for any given student thus
impacting total available time to observe and collect
data for that student. The following results step
through each planned comparison of means and visual
inspection process.
Regarding any differences in whole group student
on-task behaviours between the three conditions (no
intervention, physical method, virtual method), data
included all picture-based analysis of whole group
activities. No significant difference was detected
[F(2,15)=2.211, p>.05] between any of the phases.
Planned contrasts showed that the implementation of
either of the two methods (physical and virtual) did not
significantly differ from baseline (no TE)
[t(15)=-1.413, p>.05 (one tailed)] and that virtual
implementation did not significantly differ from the
physical implementation [t(15)=-1.557, p>.05 (two
tailed)] regarding whole group on-task behaviour.
Visual single case analysis of whole group data
similarly did not reveal notable trends between phases
(see Figure 1).
Figure 1. All Students. Percent on-task (y-axis, 0-100) by trial number (x-axis) across Phases A, B, C, B, C.
Regarding Bob (pseudonym), a one-way analysis
of variance was calculated to test if the mean instances
of on-task observations differed significantly between
any of the three phases at the p<.05 level. Results
indicated that there was a significant and large effect
of the TE (not any specific version) on the target on-
task behaviour of Bob [F(2,20)=4.375, p<.05, ω=.48].
Using a Tukey HSD post hoc analysis, the difference
in means between baseline and the virtual
implementation was significant (p<.05). Further
analysis using planned contrasts revealed that the
significant effect was shown between baseline and the
TE implementation (both physical and virtual
combined) [t(20)=2.954, p<.01 (two tailed)] but did not
indicate a significant difference between virtual and
physical implementation methods [t(20)=.029, p>.05].
Visual single case plot analysis confirmed a positive
difference between baseline and TE implementation
phases but did not exhibit trends between the two
implementation phases themselves (see Figure 2).
For Helen (pseudonym), a one-way analysis of
variance was used to test whether the mean instances
of on-task observations differed significantly between
phases at the p<.05 level. No significant differences
were found in the means of on-task data between any
phase conditions [F(2,21)=1.81, p=.188]. Within the
planned contrast examinations, no significant effect
was shown between baseline and the TE implementation
(both physical and virtual combined) [t(19)=1.718,
p=.051 (one tailed)], nor was any difference found
between physical and virtual implementation methods
[t(19)=.102, p>.05 (two tailed)]. While the one-tailed
contrast of the combined TE phases against baseline
(p=.051) was not statistically significant, it is
worth noting that this test barely missed the required
level. Overall single case plot visual analysis did
not indicate any discernible patterns across
implementation phases (see Figure 3). The single case
visual analysis thus provided further support for
treating as non-significant the baseline vs. combined
TE planned contrast that fell so close to the p=.05
cutoff point.
Mark (see Figure 4) showed an overall decrease in
time on task relative to baseline. A one-way
analysis of variance was conducted to test if the mean
instances of on-task observations differed
significantly at the p<.05 level. Results indicated that
no significant differences existed in the on-task
instances data between any conditions [F(2,19)=1.122,
p>.05]. Further analysis using planned contrasts
indicated that no significant effect was shown between
baseline and the TE implementation (both physical
and virtual combined) [t(19)=-1.415, p>.05 (one
tailed)], nor was any difference found between physical
and virtual implementation methods [t(19)=-.545,
p>.05 (two tailed)]. Note that because Mark’s visual
mean plot data indicated a negative slope, the planned
contrast between the combined TE methods and baseline
was also examined at the two-tailed level and was
likewise not significant at p<.05. Overall visual analysis of
single case data did not indicate any observable trends
in data between phases of TE implementation (see
Figure 4).
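Because every omnibus test reported in this section has two numerator degrees of freedom, its p-value can be checked by hand: for F(2, d) the upper-tail probability has the closed form (1 + 2F/d)^(-d/2). The sketch below applies that identity to the F statistics reported above. The function name is hypothetical, and the shortcut holds only when the numerator degrees of freedom equal two.

```python
def p_value_f_two_numerator_df(f_stat: float, df_within: int) -> float:
    """Upper-tail p-value for an F statistic with 2 numerator df.

    For F(2, df_within) the survival function reduces to
    (1 + 2 * F / df_within) ** (-df_within / 2).
    """
    if f_stat < 0 or df_within <= 0:
        raise ValueError("f_stat must be >= 0 and df_within must be > 0")
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# F statistics and error degrees of freedom as reported in the text above.
reported = {
    "Whole group": (2.211, 15),
    "Bob": (4.375, 20),
    "Helen": (1.81, 21),
    "Mark": (1.122, 19),
}
for label, (f_stat, df_w) in reported.items():
    p = p_value_f_two_numerator_df(f_stat, df_w)
    print(f"{label}: F(2,{df_w})={f_stat}, p={p:.3f}")
```

Evaluated this way, the reported F(2,21)=1.81 for Helen gives p of about .188, which agrees with the value stated in the text; the whole group and Mark values fall above .05, and Bob's falls below it.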
Figure 2. Bob. Percent on-task (y-axis, 0-100) by trial number (x-axis) across Phases A, B, C, B, C.
Figure 3. Helen. Percent on-task (y-axis, 0-100) by trial number (x-axis) across Phases A, B, C, B, C.
Figure 4. Mark. Percent on-task (y-axis, 0-100) by trial number (x-axis) across Phases A, B, C, B, C.
4.1. Social Validity
Social validity was assessed via student and
teacher questionnaires administered following
the data collection periods. Student questionnaires
contained three questions. First, students were asked if
they preferred the paper or iPad token delivery system.
Twenty students responded to this question and 16
indicated a preference for the iPad delivery. One
student stated that “…it was cool to see the points pop
up”, while another noted that the iPad was preferred
because “…you don’t have to count the points”. Two
students preferred the paper token delivery, stating that
such a system allowed them to “…share it (paper
tokens) with Friends [SIC] if you want to save up for
something like Raptor room.” Two students stated that
they had equal preference for paper or iPad-based
tokens.
Question two asked students to rate, on a scale of
one to ten, how much they focused on obtaining tokens
through being ‘on task’ during periods of using either
of the two systems. Results indicated that 7 of 16
(43.8%) rated their attention to obtaining tokens as a 1
(did not focus on obtaining tokens at all). One student
stated, “After a while, the thought of getting a reward
wore down”. Another three students (18.7%) gave the
score of 3 and three more gave a score of 5. One of the
students that indicated a 3 stated that they only focused
on being on task to obtain tokens about a quarter of the
time “…because your [SIC] so busy working.” Two
students indicated a 10 in response to question 2 and
stated that they focused on receiving tokens “…all the
time.”
Asked in question three which method (iPad,
Physical, Both, None) they would recommend
teachers use to help students focus on their work, one
student indicated paper, five indicated both and ten indicated
iPad. One student that had indicated that they would
recommend both systems to teachers stated that they
did so “…cuz [SIC] then there would be two ways of
getting rich!” One student that indicated they would
recommend the iPad method stated they did so “…
because it (tokens) can’t be stolen.” and “Because its
[SIC] fun”. It should be noted that early in the
implementation, one instance of theft of physical
tokens (bills) occurred (and was rectified by the
teacher). This incident likely relates directly to this
student’s reference to such possible issues on the
anonymous survey.
The teacher participant also provided social
validity feedback data through a separate
questionnaire. Overall, the teacher participant
indicated that the paper methodology was more
effective in helping keep students on task. The teacher
indicated that the paper method provided “…instant
gratification… students knew why they earned the
token… It caused a ripple effect around the student
who earned the token, that others (would) see what
happened and learn that if they did the same thing, they
too could earn a token.” As a corollary, the teacher
stated that “…(using) the iPad system, students did not
see when someone (else) earned a token because it
only showed up on the individual who earned the
(token on their) iPad.” Further, the teacher noted the
iPad app was difficult and time consuming to use.
5. Discussion
It is interesting to note that prior to the results of
the present research being presented to the subject
participants, the teacher indicated an overall
satisfaction with the TE as a classroom management
method. The teacher indicated that she felt the overall
attention to task for students increased during times in
which she implemented the TE methods. The results
seemed to be surprising to the teacher when revealed at
a classroom pizza party following the study.
5.1. Token delivery
A variable ratio schedule of reinforcement (token
delivery) is generally accepted as effective regarding
the tracking and reinforcement of on task behaviours
[17, 18]. The teacher in the present study also
requested this reinforcement schedule so that time to
deliver tokens, both virtual and physical, could occur
when breaks in her teaching flow allowed and so that
instruction would not be interrupted based on a fixed
interval reinforcement methodology. One possible
explanation for the overall ineffectiveness of both
token economy systems in the present study may be
related to the variable ratio schedule of reinforcement.
The reinforcement schedule that resulted from relying
on breaks in lesson flow may have been sub-optimal
for some students.
It is therefore possible that students may require a
more defined schedule of interval reinforcement
before a variable ratio methodology is applied. Future
researchers should consider this possibility as well as
the equally possible reality that such alterations in
delivery schedule may be impractical for a teacher to
administer alone. Further study is required to address
this hypothesis.
Another area of interest was the non-public nature
of token delivery during the iPad based TE phases. It
may be that when students noticed delivery of tokens,
they made an effort to display the desired on-task
behaviour but the behaviour might have dissipated
when students noticed the teacher otherwise engaged.
If this had been the case, we would likely have
expected to see a difference in impact between the
private iPad delivery and the public physical delivery of
tokens. This was not the case in the present study.
Despite the teacher’s best intentions, within the
current study framework, she was unable to attend to
the on-task behaviour of the group 100% of the time
while teaching either group or station-based lessons.
This would seem to indicate that the need to physically
deliver tokens versus being able to do so from a
distance did not impact the teacher’s ability to attend
to the behaviours for which tokens were to be
delivered. The teacher seemed to confirm this
suspicion by stating in the follow-up questionnaire that
“…allow(ing) the EA (educational assistant) to hand
out the tokens instead of the teacher” for paper
delivery would be helpful, and that simplifying the
finding of specific students within the app’s interface
would reduce the difficulty in delivering tokens to
individuals and/or small groups of individuals. These assertions
by the teacher seem to indicate difficulty with being
able to teach while simultaneously attending to the
observation of student on-task behaviours.
As with the previous assertion concerning public
vs private token delivery, physically walking over to
students (physical) vs tapping an iPad (iPad) to deliver
tokens did not seem to affect the impact of either
methodology on student on-task behaviour. We would
have expected to see a difference in impact between
physical and iPad-based methods if delivery method
had been an important aspect of the method; however,
this was not observed.
It is likely that the need to simultaneously focus on
the fluid needs of instruction while teaching allows for
limited attention to matters of observation regarding
individual or group behaviours. Indeed, the teacher’s
token delivery occurred during times within lessons
that did not require her direct involvement with a
student. This hypothesis would seem consistent with recent
research regarding a teacher’s ability to multi-task. As
cognitive tasks are divided between two or more
pressing needs, the quality and efficiency of results is
generally reduced [19, 20, 21]. Such a finding
regarding teacher abilities to multi-task would seem to
point to one possible reason for the overall failure of
the TE system in the present study.
5.3. Token redemption
Token redemption took place at least one time per
day at one or more pre-determined redemption
periods; however, the students were required to ask the
teacher for redemption during the noted times.
Sometimes the teacher was otherwise engaged during
these times, speaking with other faculty members
while children were at play or preparing stations for
when children would return. Occasionally, the teacher
was required to serve as a recess monitor and was
unavailable to deliver tokens during recess. Overall,
this resulted in a less predictable token redemption
time period during both phases of TE implementation.
Students may have been discouraged if they had
intended on receiving a prize at a specific time period
in which the teacher was unable to comply with a
purchase request. While students had been told that not
all the redemption periods would be available due to
the teacher’s multiple commitments, and that one
would be available at minimum per day, the lack of a
solid, repetitive daily redemption schedule may have
negatively impacted the students’ motivation to
remain on task.
Again, future researchers should address this
redemption hypothesis in more detail to examine any
impact a more predictable redemption schedule may
have upon the overall time on task behaviours of
students. Like the delivery hypothesis, researchers
must also seek to understand if a predictable
redemption schedule is reasonable to maintain when
the teacher alone implements the TE system. It may
be the case that additional help may be required if
predictable delivery of tokens and predictable
redemption periods other than the one time per day in
the present study are to be achieved.
5.4. Analysis of efficacy
Results indicated that the virtual delivery TE
system and the combined data from virtual and
physical methods were significantly effective over
baseline (no TE) for Bob only. No other individual or
whole group analysis showed a significant difference
between baseline and the two TE approaches nor
between the two TE approaches themselves. This may
indicate that in spite of statistical indications, the
delivery of tokens to Bob was optimal or effective by
sheer chance alone (within the 5% error range). Also,
Bob’s data included an outlier in data point five (score
of 0). No obvious reason for Bob’s inattention during
that data observation period was noted and thus for
official analysis, the point remained within the data
set. It is important to note, however, that this possible
outlier influenced the magnitude of significant results.
Adding visual assessment of raw data, it seems that at
best, we can describe the results for Bob as
inconclusive.
While the current findings indicated support for the
findings of Maggin et al. [12] and Ivy et al. [2], the
current work would seem to contradict some other
available research regarding the effectiveness of TE
systems within an inclusive classroom setting. The
negative results of the present work in relation to
previous studies suggest that clarity in the
implementation of studied TEs is critical to
understanding conclusions drawn from any findings.
In the present work, implementation fidelity was
strictly noted and adhered to a pre-defined set of
standards. Given those standards, results showed the
method as implemented not to be an effective support
regarding on task behaviours within the student
population studied. When one considers the wide
variety of ways in which a TE can be
implemented (i.e., multiple human intervention agents,
the behaviour/s of focus, the diversity of students’
individual characteristics, the token delivery and
redemption schedules), it is likely not possible to
assert that any ‘generic’ TE method should be the
focus for analysis leading to the categorization of
evidence based practice. Instead, specific versions of
the TE, strictly defined, may be a more proper unit of
analysis.
6. Limitations
This work is limited to that observed within the
contexts of the participants within the location chosen
for the study. Results should not be used to justify
broader meaning outside of this context, as individual
circumstances exist in any defined population and
context of study.
Additionally, time on task represents a difficult
variable of measure. Specifically, data collectors were
required to identify the direction of each subject’s
attention or activity toward a direction, activity or
object that was relevant to the instruction being
provided at that time while simultaneously excluding
indicators of non-attention to task as defined by Lee,
Sugai and Horner [15]. The relevance of the direction
of attention or activity based on instruction can be
somewhat subjective to the person judging the data
point. For example, if a student is looking at his/her
shoes while the teacher is working mathematics on a
white board, the data recorder would likely mark the
data point as ‘not on task’; however, if the teacher were
using eyelets of shoes as an example to count pairs of
objects, the same gaze would be recorded as ‘on task’.
IRR was used to indicate the breadth of subjectivity,
with reasonable findings; however, it is important to
acknowledge this subjectivity as a limitation to the
results of the present work.
7. Acknowledgements
This research was supported by the Social Sciences
and Humanities Research Council of Canada.
8. References
[1] Hackenberg, T.D. (2018). Token reinforcement:
Translational Research and Application. Journal of Applied
Behavior Analysis, 51, 393-435. doi: 10.1002/jaba.439
[2] Ivy, J.W., Meindl, J.N., Overley, E. & Robson, K.M.
(2017). Token Economy: A Systematic Review of
Procedural Descriptions, Behavior Modification, 00(0), 1-
30. doi: 10.1177/0145445517699559
[3] Kazdin, A.E. (1977). The token economy: A review and
evaluation, New York: Plenum Press.
[4] Robacker, C.M., Rivera, C.J. & Warren, S.H. (2016). A
Token Economy Made Easy Through ClassDojo,
Intervention in School and Clinic, 52(1), 39-43. doi:
10.1177/1053451216630279
[5] Simonsen, B., Fairbanks, S., Briesch, A., Myers, D., &
Sugai, G. (2008). Evidence-based practices in classroom
management: Considerations for research to practice.
Education and Treatment of Children, 31, 351–380.
[6] Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee,
A., Sigafoos, J. & Machalicek, W. (2014). Effects of a
Perseverative Interest-Based Token Economy on
Challenging and On-Task Behavior in a Child with Autism,
Journal of Behavioral Education, 23, 368-377. doi:
10.1007/s10864-014-9195-7
[7] Soares, D.A., Harrison, J.R., Vannest, K.J. &
McClelland, S.S. (2016). Effect Size for Token Economy
Use in Contemporary Classroom Settings: A Meta-Analysis
of Single-Case Research, School Psychology Review, 45(4),
379-399
[8] Kazdin, A.E. (1982). The token economy: A decade
later. Journal of Applied Behavior Analysis, 15, 431-445.
[9] Matson, J.L. & Boisjoli, J.A. (2009). The token
economy for children with intellectual disability and/or
autism: A review, Research in Developmental Disabilities,
30, 240-248. doi: 10.1016/j.ridd.2008.04.001
[10] Alter, P. (2012). Helping Students with Emotional and
Behavioral Disorders Solve Mathematics Word Problems,
Preventing School Failure, 56(1), 55-64, doi:
10.1080/1045988X.2011.565283
[11] Ogasahara, K., Hirono, M. & Kato, S. (2013). Support
for on-task behavior through a token economy system:
Autistic youth who shows challenging behavior, Japanese
Journal of Special Education, 51(1), 41-49
[12] Maggin, D. M., Chafouleas, S. M., Goddard, K. M., &
Johnson, A. H. (2011). A systematic evaluation of token
economies as a classroom management tool for students with
challenging behavior. Journal of School Psychology, 49,
529-554.
[13] Hackenberg, T.D. (2009). Token reinforcement: A
review and analysis. Journal of the Experimental Analysis of
Behavior, 91, 257-286. doi: 10.1901/jeab.2009.91-257
[14] Kazdin, A.E., & Bootzin, R.R. (1972). The token
economy: An evaluative review. Journal of Applied
Behavior Analysis, 5, 343-372.
[15] Lee, Y.Y., Sugai, G., & Horner, R.H. (1999). Using an
instructional intervention to reduce problem and off-task
behaviors. Journal of Positive Behavior Interventions, 1(4),
195-204
[16] Alberto, P. & Troutman, A. (2003). Applied Behavior
Analysis for Teachers: 6th Ed., Merrill Prentice Hall, Upper
Saddle River, New Jersey, Columbus Ohio
[17] Martens, B.K., Lochner, D.G., & Kelly, S.Q. (1992).
The effects of variable-interval reinforcement on academic
engagement: A demonstration of matching theory. Journal
of Applied Behavior Analysis, 25, 143-151
[18] Hulac, D., Benson, N., Nesmith, M.C. & Shervey, S.W.
(2016). Using Variable Interval Reinforcement Schedules to
Support Students in the Classroom: An Introduction With
Illustrative Examples, Journal of Educational Research and
Practice, 6(1), 90-96, doi:10.5590/JERAP.2016.06.1.06
[19] Bowman, L.L., Levine, L.E., Waite, B.M., & Gendron,
M. (2010). Can students really multitask? An experimental
study of instant messaging while reading. Computers &
Education, 54(4), 927-931, doi:
10.1016/j.compedu.2009.09.024
[20] Logie, R.H., Gilhooly, K.J., & Wynn, V. (1994).
Counting on working memory in arithmetic problem
solving. Memory & Cognition, 22(4), 395-410, doi:
10.3758/BF03200866
[21] Wieth, M.B. & Burns, B.D. (2014). Rewarding
Multitasking Negative Effects of an Incentive on Problem
Solving Under Divided Attention, Journal of Problem
Solving, 7, doi: 10.7771/1932-6246.1163
Journal of Contemporary Psychotherapy (2018) 48:145–154
https://doi.org/10.1007/s10879-017-9376-5
ORIGINAL PAPER
Token Economies: Using Basic Experimental Research to Guide
Practical Applications
Jeffrey F. Hine1 · Scott P. Ardoin2 · Nathan A. Call3
Published online: 12 December 2017
© Springer Science+Business Media, LLC, part of Springer Nature 2017
Abstract
This paper highlights the applicability of patterns seen within basic experimental research in relation to contemporary appli-
cation of token economies. Token economies are one of the most widely used interventions to promote behavior change,
and this procedure has evolved to be effective across many settings, behaviors, and individuals. Due to this widespread use,
casual implementation of the token economy might result in inconsistencies in responding and therefore an overall skepti-
cism in the procedure itself. We present multiple barriers that encumber practical application of token economies, including
insufficient conditioning and pairing of tokens, determining quality of backup reinforcers, unforeseen effects of motivating
operations, teaching the token exchange, effects of higher-order reinforcement schedules, ratio strain, and use of response
cost procedures. To assist practitioners in implementing more effective treatments, for each barrier we revisit the often
overlooked basic research involving features of conditioned reinforcement and reinforcement schedules. It is important to
translate the often complex implications of basic research so that practitioners can use this information to improve their own
practice as well as their confidence in disseminating use of this evidence-based treatment. To further guide practitioners
in using this knowledge in everyday settings, we also provide recommendations specific to each barrier as well as relevant
applied research and practical examples.
Keywords Token economy · Conditioned reinforcement · Applied behavior analysis
Introduction
Since first proposed by Ayllon and Azrin (1968) and subse-
quently refined by Kazdin (1977), the use of token econo-
mies has become one of the most venerable and widespread
applied interventions for producing behavior change (Kazdin
1982; Matson and Boisjoli 2009). Given this widespread
use and effectiveness across settings, many practitioners
(i.e., psychologists, teachers, and applied behavior analysts)
may have a general understanding of the procedures behind
establishing a token economy. Although there may be some
differences in the specifics of establishing a token economy
(e.g., Drabman and Tucker 1974; Miltenberger 2008), there
seems to be general consensus that establishing an effec-
tive token economy should at least include: (1) identifying
and operationally defining appropriate target behaviors; (2)
selecting appropriate tokens (e.g., durable, engaging, indi-
vidualized); (3) identifying backup reinforcers (e.g., primary
reinforcers, other conditioned reinforcers); (4) determining
values of tokens and exchange rates for backup reinforc-
ers; (5) determining methods of exchange; (6) determin-
ing how individuals can earn or lose tokens; (7) accurately
monitoring the program’s effects on the target behaviors; and
(8) adjusting the program to meet the long-term goals and
addressing barriers to success. If practitioners implement
these steps in a consistent and systematic manner, positive
behavior changes are likely to occur. In general, practitioners
can use the above framework as a “base” behavior manage-
ment system with which they can enact multiple options.
* Jeffrey F. Hine
jeffrey.hine@vanderbilt.edu
1 Vanderbilt University Medical Center Department
of Pediatrics, Vanderbilt Kennedy Center/Treatment
and Research Institute for Autism Spectrum Disorders
(TRIAD), 1211 21st Ave S, #110, Nashville, TN 37212,
USA
2 University of Georgia Department of Educational
Psychology, Center for Autism and Behavioral Education
Research, Athens, GA, USA
3 Emory University School of Medicine, Marcus Autism
Center, Atlanta, GA, USA
However, during implementation of a token economy,
practitioners may encounter complex barriers to behavior
change and may struggle to enact solutions targeting those
barriers (Bailey et al. 2011; Kazdin 1982). Revisiting foun-
dational basic research on the underlying mechanisms of
token economies can assist practitioners in overcoming such
difficulties when they are encountered in practice.
Widespread implementation of token economies without an
understanding of these mechanisms likely has an impact on
the progress of the individuals with whom such procedures
are adopted and the reputation of this strategy.
Much basic research exists demonstrating the effects
of the underlying mechanisms of token economies (Foster
et al. 2001; Hackenberg 2009). Ideally, practitioners would
consult this literature when faced with a practical dilemma.
One variable that often interferes with this venture, however,
is the considerable effort required for practitioners
to assimilate and apply information gained from reading
basic experimental research. Practitioners might view this
research as inapplicable to everyday practice; yet, general
patterns of performance found with animals and humans in
the laboratory consistently emerge in applied research (Mace
and Critchfield 2010). We will highlight the applicability
of these patterns that may assist practitioners in identify-
ing potential barriers to individual success and implement-
ing a more fundamentally sound and thus effective token
economy.
Conditioning Tokens as Effective Reinforcers
A conditioned reinforcer is defined as an initially neutral
event or stimulus that acquires value through its relation to
primary reinforcers and can subsequently serve as an effective
independent reinforcer (Skinner 1974; Williams 1994).
Comprehensive research programs such as Fantino (1977),
Kelleher (1966), and Williams (1994) collectively demon-
strate that several species’ response rates increase if respond-
ing produces conditioned reinforcers. Perhaps the most
widely cited laboratory investigations of the effects of tokens
as conditioned reinforcers are the classic primate studies of
Wolfe (1936) and Cowles (1937). Contingent presentation of
tokens maintained responding across multiple experiments
even when subjects were not allowed to exchange the tokens
until the end of an experimental session. Malagodi (1967a, b,
c) added to this research by demonstrating that rats acquired
new responses through use of token reinforcement alone and
that token-specific response rates were similar to those seen
under primary reinforcement. Given the substantial body
of research demonstrating that findings from the basic ani-
mal research can be generalized to applied use of the same
behavioral mechanisms, it would seem that the principles
that govern the effectiveness of token economies in animal
studies have value when troubleshooting an ineffective token
economy. Thus, when token economies are not as effective
as projected, practitioners need first to investigate a number
of general factors relating to the effectiveness of the token.
Barrier: Insufficient Quality of Backup Reinforcers
One barrier to effective use of a token economy is when
the token has not been established as an effective condi-
tioned reinforcer. This problem may become evident when
the individual ceases to readily exchange tokens for previ-
ously accessed backup reinforcers. If this occurs, decreased
responding will likely ensue and the individual may discard
tokens instead of exchanging them. An initial issue that
practitioners need to investigate is the quality of the backup
stimuli.
Basic Experimental Research
Some manipulable dimensions of reinforcement found to
increase the likelihood of responding within token econo-
mies include reinforcer rate, magnitude, and quality of rein-
forcement (Mace and Roberts 1993). Quality of reinforce-
ment is often described as involving reinforcer potency or
efficacy, and can be quantified in terms of an individual’s
preferences. One way to measure preference is to consider
stimuli that are reliably selected as highly preferred in stimu-
lus preference assessments (Neef et al. 1994). An individu-
al’s preference for the to-be-paired stimuli will undoubtedly
influence a token’s effectiveness as a conditioned reinforcer.
In the classic Wolfe (1936) studies, researchers demonstrated
primates’ proclivity to select tokens that had been paired
with food (as opposed to nothing) and tokens that had been
paired with two pieces of food (rather than one). Additional
animal studies demonstrate that with all other dimensions of
reinforcement held constant (e.g., amount and rate) subjects
will bias responding toward reinforcers of higher quality.
Thus, if manipulating the quality of primary reinforcement
is an effective method of biasing responding, doing so will
likely impact the reinforcing effectiveness of tokens paired
with the primary (or backup) reinforcers.
In Applied Settings
Tokens will not become effective reinforcers if they are
paired with stimuli that are not of sufficient quality or that
have not been established as effective reinforcers themselves.
A large body of applied literature has identified numer-
ous strategies for selecting stimuli most likely to function
as effective backup reinforcers including single stimulus,
free operant, paired stimulus, and multiple stimulus with-
out replacement formats (DeLeon and Iwata 1996; Roane
et al. 1998). Given that preferences may fluctuate over
time, practitioners might consider having an assortment of
backup stimuli and institute periodic assessment of prefer-
ences. Systematically rotating preferred items can main-
tain the reinforcing properties of the backup stimuli. For
example, DeLeon et al. (2000) demonstrated that providing
access to only a single set of toys limited the effectiveness
of their intervention due to satiation effects. Instead, when
providing access to a rotating set of toys as reinforcement
for competing responses, automatically maintained self-
injurious behavior was reduced. Thus, tokens can maintain
their reinforcing properties despite fluctuating preferences if
they can be exchanged for a variety of high quality reinforc-
ers and can become much more flexible as a reinforcer in
treatment programs. Additionally, if tokens can effectively
be exchanged for many different backup reinforcers, the con-
venience and social validity of the program increases by not
requiring practitioners to keep a wide range of reinforcers
constantly and immediately available.
Barrier: Insufficient or Inconsistent Pairing
After ensuring quality backup reinforcers, another factor that
might impede consistent responding is the association of the
token with the backup reinforcer. These associations arise
through the original token-backup pairing and how often
this pairing occurs.
Basic Experimental Research
Foundational basic research by Wolfe (1936) and Cowles
(1937) demonstrated the importance of pairing tokens with
primary reinforcers by teaching primates to respond differ-
entially to tokens with exchange value as opposed to those
without. A token will likely not have a reinforcing influ-
ence over an organism’s behavior if the token is not paired
with the backup stimuli a sufficient number of times or close
enough in time. Williams and Dunn (1991) provided some
evidence for the necessity of token-backup pairing through a
series of experiments examining conditioned reinforcement
in pigeons. Overall, the effectiveness of conditioned rein-
forcers depended on the frequency with which the stimulus
was paired with the primary reinforcer as well as how often
the stimulus was followed by reinforcement. Kelleher and
Gollub (1962) also noted the significance of the number of
pairings between the eventual conditioned stimuli (tokens)
and primary reinforcers.
Research investigating respondent conditioning further
supports that, with repeated pairing, the token should retain
the reinforcing properties of the backup reinforcer even
without the individual engaging in any behaviors outside of
accepting and consuming the reinforcer (Williams 1994).
Shahan (2010) equates this circumstance to the principles
of respondent conditioning that result in stimuli acquiring
the capacity to act as conditioned stimuli when paired with
unconditioned stimuli. In this case, neutral stimuli (tokens)
acquire the capacity to function as reinforcers when paired
with primary reinforcers. Classic basic research demon-
strating application of conditioned reinforcers to shape new
responses (Malagodi 1967a, b, c; Kelleher and Gollub 1962)
provides further supporting evidence for the importance of
the foundational relationship between tokens and backups.
In Applied Settings
If the token does not seem to function as a conditioned rein-
forcer, practitioners will most likely need to pair the two
stimuli more frequently, more consistently, or temporally
closer. Before ever requiring the individual to engage in a
behavior to gain access to the token, practitioners would
benefit from repeatedly and contiguously pairing the token
with a backup reinforcer. For instance, a practitioner could
fill up 9 spaces on a 10-space token board and then noncon-
tingently deliver the 10th token while immediately allowing
access to a preferred item. This process would be repeated
until the individual accepts both the token and backup rein-
forcer a majority of the time. In an applied study, Moher,
Gould, Hegg, and Mahoney (2008) successfully established
tokens as conditioned reinforcers by pairing tokens with
backup reinforcers in two stages. The first stage involved
the experimenter delivering a backup reinforcer within 0.5 s
of delivering a token noncontingently. In the second stage,
the participant was encouraged to physically exchange the
token for the backup reinforcer. When evaluated in a pref-
erence assessment, tokens contingently paired with highly
preferred edible items became preferred stimuli themselves.
Barrier: Overcoming Problematic Effects
of Motivating Operations
Even after ensuring a strong pairing between tokens and
backup reinforcers, inconsistent responding may still occur.
One potential cause is the variable effectiveness of backup
stimuli from moment-to-moment. In an effort to prevent
inconsistent effectiveness of backup reinforcers and thus
responding, practitioners can investigate the effects of moti-
vating operations. By definition, motivating operations can
alter the reinforcing effectiveness of tokens either by increas-
ing (establishing) or decreasing (abolishing) the effective-
ness of a given consequence (Laraway et al. 2003; Vollmer
and Iwata 1991). It might be the case that motivating opera-
tions are affecting responding in unforeseen ways.
Basic Experimental Research
Wolfe (1936) first demonstrated this fact by exposing pri-
mates to various states of food deprivation. Specifically,
subjects were given choices between tokens (some exchange-
able for food and some exchangeable for water) while under
alternating deprivation conditions. All subjects preferred the
tokens corresponding to the current deprivation conditions;
however, the preference was not exclusive due to relatively
modest deprivation states. Therefore, two additional sub-
jects were given the same choices between tokens under
longer deprivation conditions and the researchers allowed
access to the alternate reinforcer prior to each session. The
subjects under the more stringent conditions preferred the
deprivation-specific reinforcer to a higher degree. Thus, rate
of token-exchange was consistent with the state of depri-
vation specific to each backup reinforcer and lessening the
motivation for one reinforcer strengthened the motivation
for the other.
Another motivating operation factor studied within basic
research and applicable to practical application involves the
degree to which the reinforcer is available outside of the
experimental session. Hursh (1984) described closed economies
as those in which reinforcers are only available through
an organism’s interaction with the experimental environ-
ment, and open economies as those in which consumption of
the reinforcer is not completely dependent on within-session
performance. For example, two classic studies (Felton and
Lyon 1966; Catania and Reynolds 1968) performed experi-
ments in which pigeons were given supplemental (noncon-
tingent) feedings outside of the experimental session (open
economy). Relative rates of responding were markedly less
under these conditions than in conditions where subjects
could only access reinforcers through responding accord-
ing to in-session schedules of reinforcement (Collier et al.
1972).
In Applied Settings
Motivating operations can be seen as an advantage to prac-
titioners who can regulate the amount of access to a single
backup reinforcer. This can be achieved by ensuring the indi-
vidual does not have access to the preferred backup item
outside of the token economy. For instance, Roane, Call,
and Falcomata (2005) demonstrated more responding during
closed economies in which participants were only able to
obtain reinforcement through interaction with progressive-
ratio schedules of reinforcement during session. This was
in contrast to open-economy sessions during which partici-
pants demonstrated decreased responding while obtaining
both within-session reinforcers and supplemental access to
reinforcers outside of session.
Depending on the nature of the primary backup rein-
forcer, applied research has shown that the effectiveness
of tokens decreases during periods in which participants
are satiated on backup reinforcers; however, rotation and
choice across multiple backup reinforcers may guard against
these effects (Moher et al. 2008; Sran and Borrero 2010).
Additionally, inconsistent responding due to the effects of
motivating operations can also be neutralized by creation of
generalized conditioned reinforcers. A token becomes a gen-
eralized conditioned reinforcer when it can be exchanged for
a variety of backup reinforcers and is less sensitive to moti-
vating operations (Ferster and Culbertson 1982). Increasing
the number of backup reinforcers with which the token is
paired should also result in the maintenance of responding
even when individuals are satiated on the most preferred
backup reinforcer (Moher et al. 2008). Thus, efforts can be
made to decrease the potential negative effects of abolish-
ing operations by having a menu of options from which an
individual can select when exchanging tokens for backup
reinforcers.
Barrier: Difficulty Shaping the Exchange Response
Given that most practitioners themselves have had a long
history of operating within a token economy (e.g., receiv-
ing and cashing paychecks), there may be some inclination
to assume an individual can exchange tokens for backup
reinforcers spontaneously. For ease and efficiency of token-
backup exchange, there may be some benefit in removing
the exchange response altogether. That is, by exchang-
ing the tokens for someone who is struggling to learn the
exchange response, the reinforcing properties of the token
might stay intact with practitioner-mediated token-backup
pairing. However, explicit teaching of token exchange may
be necessary and beneficial if the practitioner intends on the
individual eventually determining components such as the
magnitude and rate of reinforcement.
Basic Experimental Research
Laboratory studies often rely on multiple stages to shape the
exchange response. After magazine training, in which ani-
mals are taught to approach the food receptacle and consume
primary reinforcers, a shaping procedure is used to rein-
force approximations of lever pressing. Experimenters then
focus on teaching the animal the token deposit response.
For instance, Malagodi (1967a) distributed 80 marbles on
the floor of an operant chamber and reinforced successive
approximations of rats depositing the marbles into a recep-
tacle. Stimulus control over the response was established by
reinforcing deposit responses on a continuous reinforcement
schedule in the presence of a discriminative stimulus (recep-
tacle light and clicker). In the foundational Wolfe (1936) and
Cowles (1937) studies, exchange opportunities were freely
available for primates and depositing of the token was ini-
tially modeled by the experimenter. Each token deposited by
the subject was reinforced immediately with food.
In Applied Settings
Even when tokens have acquired the properties of effec-
tive conditioned reinforcers, not all individuals will imme-
diately have mastery over the response chain necessary to
physically exchange the token and consume the backup
reinforcer. Numerous empirically validated methods for
teaching response chains are possible, including graduated
guidance, errorless learning, constant-time delay, and video
modeling. Initially, the act of exchanging the token should
be the primary task in which the individual must engage in
order to gain access to the reinforcer. It could also be the
case that the response effort of the act of exchanging is too
great (e.g., walking to a different area, locating the practi-
tioner, making choices between backup reinforcers, engag-
ing in a communicative response, etc.); thus, individuals
sometimes save tokens to increase the amount of reinforc-
ment they receive per exchange (Yankelevitz et al. 2008).
For instance, an establishing operation for saving tokens
might be in effect during low-effort tasks if the exchange
response is too demanding; at least until a sufficient number
of tokens have been accumulated to overcome the effort of
the exchange response. Thus, practitioners must consider the
overall effort of the exchange itself, as it may influence the
effectiveness of the token program.
Acknowledging and Investigating
First‑ and Second‑Order Schedules
of Reinforcement
Reinforcement schedules do not operate in isolation; instead,
one schedule (a first-order schedule) can be a unit of behav-
ior upon which another schedule operates (higher- or sec-
ond-order schedules). In other words, completion of the first-
order schedule (e.g., fixed-ratio [FR]-5) is a behavioral unit
that is reinforced according to a second schedule (e.g., varia-
ble-interval [VI]-25). An oversimplified view of responding
within token economies would include practitioners viewing
responding as vulnerable to only the “local” contingencies
available through first-order reinforcement schedules. If this
were the case, behavior patterns under token economies
would only mimic those seen under programs using pri-
mary reinforcement, which most often is not the case. Fixed-
ratio schedules, for instance, produce post-reinforcement
pauses—also referred to as “pre-run” pauses—in which
responding briefly ceases following reinforcement deliv-
ery. This momentary lag in responding is often followed
by an increase in response rate until the organism meets
the requirement for reinforcement. Conversely, a variable-
ratio (VR) schedule produces relatively higher and steadier
rates of responding (Ferster and Skinner 1957). Patterns
of behavior within an extended token economy, however,
should instead be considered as unitary responses influenced
and reinforced according to two other higher-order sched-
ules. Kelleher (1958, 1966) described token economies as
involving three interconnected schedules of reinforcement
and behavior that is responsive to a token economy will
be jointly determined by both the first- and second-order
reinforcement schedules. The three schedules include: (1)
the token-production schedule: the first-order schedule of
reinforcement under which the behavior targeted for change
will result in tokens (e.g., FR5: the individual must emit 5
responses to receive one token); (2) the exchange-production
schedule: the schedule that determines when the opportunity
to exchange tokens for backup reinforcers is available (e.g.,
fixed-time [FT]-5: the individual can exchange tokens for
backups every 5 min); and (3) the token-exchange schedule:
the rate of exchange or “cost” for backup reinforcers (e.g.,
FR5: the individual must exchange 5 tokens for a certain
backup reinforcer).
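To show how the three schedules described above interact, the following is a minimal simulation sketch using the example values from the text (FR5 token production, FT-5 exchange production, FR5 token exchange). The class, function, and field names are hypothetical, times are expressed in whole seconds as a simplifying assumption, and responses that fall on an exchange instant are assumed to be counted before the exchange.

```python
from dataclasses import dataclass


@dataclass
class TokenEconomy:
    token_production_fr: int  # responses required per token (e.g., FR5)
    exchange_period_s: int    # fixed time between exchange opportunities (FT-5 = 300 s)
    backup_cost: int          # tokens required per backup reinforcer (e.g., FR5)


def run_session(economy, response_times_s, session_length_s):
    """Return (tokens_earned, backups_obtained, tokens_left) for one session."""
    # Exchange opportunities arrive at each multiple of the fixed-time period.
    n_exchanges = session_length_s // economy.exchange_period_s
    exchange_times = [economy.exchange_period_s * (k + 1) for k in range(n_exchanges)]

    # Tag events: 0 = target response, 1 = exchange opportunity.
    events = [(t, 0) for t in response_times_s if t <= session_length_s]
    events += [(t, 1) for t in exchange_times]

    responses = tokens_held = tokens_earned = backups = 0
    for _, kind in sorted(events):
        if kind == 0:
            responses += 1
            # Token-production schedule: every Nth response earns a token.
            if responses % economy.token_production_fr == 0:
                tokens_held += 1
                tokens_earned += 1
        else:
            # Token-exchange schedule: spend tokens in units of the backup cost.
            bought = tokens_held // economy.backup_cost
            backups += bought
            tokens_held -= bought * economy.backup_cost
    return tokens_earned, backups, tokens_held


economy = TokenEconomy(token_production_fr=5, exchange_period_s=300, backup_cost=5)
fast = run_session(economy, list(range(6, 601, 6)), 600)    # one response every 6 s
slow = run_session(economy, list(range(10, 601, 10)), 600)  # one response every 10 s
print(fast, slow)
```

In this sketch the slower responder earns 12 tokens but obtains only 2 backup reinforcers, with 2 tokens left over. The number of backups depends on all three schedules, not on the token-production schedule alone.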
Practitioners might view ongoing patterns of behavior
as being reinforced by the local contingencies available
through the immediate token-production schedule; however,
responding is not under the sole control of any of these three
schedules at any given time. Given research supporting the
separate and combined effects of these schedules (Hacken-
berg 2009), practitioners are advised to take notice of the
three specific schedules of reinforcement and need to inves-
tigate both the “local” contingencies of the token-production
schedule as well as the often superseding contingencies of
the second-order schedules.
Barrier: Appropriately Adjusting Local
Contingencies of the Token‑Production Schedule
Beginning with the token-production schedule, practitioners
may struggle in deciding whether to provide tokens after a
fixed or variable number of responses (FR or VR); after a
fixed or variable duration of engaging in the targeted behav-
ior (FRD or VRD); or after the first response following a
fixed or variable amount of time (FI or VI). Most token
economies implemented in applied settings run on fixed
schedules. There are obvious logistical benefits of using
fixed token-production schedules (i.e., ease of implementa-
tion, predictability) and FR schedules can often result in
high, steady responding and are especially beneficial when
teaching new behaviors. One important property of fixed
schedules, however, is that they can introduce discrimina-
ble periods during which reinforcement does not occur, and
responding can appear erratic such as scalloped respond-
ing under an FI schedule in which responding is slow in
the beginning of an interval and increases just before token
delivery. Furthermore, FI schedules can become problematic
when practitioners attempt to reinforce longer durations of
continuous appropriate behavior such as increasing time on
task.
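The difference between ratio and interval token-production schedules can be sketched as follows. This is an illustration only, with hypothetical function names and response times. It covers the fixed versions (FR and FI); the variable versions (VR and VI) would draw the ratio or interval at random around a mean instead of holding it constant.

```python
def tokens_under_fr(response_times, ratio):
    """Fixed ratio: a token follows every `ratio`-th response."""
    return [t for i, t in enumerate(response_times, start=1) if i % ratio == 0]


def tokens_under_fi(response_times, interval):
    """Fixed interval: a token follows the first response made once
    `interval` time units have passed since the last token (or session start)."""
    deliveries = []
    last_token_time = 0
    for t in response_times:
        if t - last_token_time >= interval:
            deliveries.append(t)
            last_token_time = t
    return deliveries


# Hypothetical response times in seconds from the start of a session.
responses = [2, 5, 9, 14, 20, 21, 22, 31, 40, 41]
print(tokens_under_fr(responses, 3))   # times of the 3rd, 6th, and 9th responses
print(tokens_under_fi(responses, 10))  # first response after each 10 s has elapsed
```

Under the FI rule, responses made early in an interval never produce a token. This is the discriminable period without reinforcement that the paragraph above describes.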
Basic Experimental Research
As stated previously, the token-production schedule exerts
some control over response patterns in token economies
in a manner resembling those obtained under schedules of
primary reinforcement (Kelleher 1956; Malagodi 1967b).
For example, Kelleher (1958) taught primates to press
a lever that produced poker chips according to an FR30
schedule in which every 30 responses produced one poker
chip. Exchange periods were scheduled following an FR50
exchange-production schedule. Consistent with basic
experimental research with simple FR schedules using primary
reinforcement, responding occurred at a high, steady
rate, with short pauses prior to each ratio run. The research-
ers then increased the token-production schedule to an
FR125 while keeping the exchange-production schedule
constant. Again, emulating effects seen with FR schedules
of primary reinforcement, overall response rates decreased
and post-reinforcement pausing increased. In addition to
this study, basic research also shows that by switching the
token-reinforcement schedule to variable schedules, one can
expect high and steady responding under VR token-produc-
tion schedules, and slow and steady responding under VI
schedules (Ferster and Skinner 1957).
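The size of the schedule change in the Kelleher (1958) example can be made explicit with a short calculation. The sketch assumes that the FR50 exchange-production schedule counts tokens earned, so that an exchange period follows every 50th token. The function name is hypothetical.

```python
def responses_per_exchange(token_production_fr, exchange_production_fr):
    """Responses needed to reach one exchange period when both
    schedules are fixed ratio (tokens counted by the second schedule)."""
    return token_production_fr * exchange_production_fr


print(responses_per_exchange(30, 50))   # FR30 tokens, FR50 exchange: 1500 responses
print(responses_per_exchange(125, 50))  # FR125 tokens, FR50 exchange: 6250 responses
```

Under this assumption, moving from FR30 to FR125 more than quadruples the work required before any backup reinforcer becomes available, which is consistent with the reported drop in response rate and longer pausing.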
In Applied Settings
During acquisition phases, practitioners have the option to
begin with an FR1 token production schedule so that every
occurrence of the new behavior produces reinforcement. As
the individual gains experience with the token economy,
practitioners can systematically adjust the production sched-
ule to include intermittent schedules using procedures such
as thinning the schedule to an FR5. Alternatively, moving to
a VR token-production schedule will result in maintenance
of the replacement behavior over longer periods of time and
may guard against post-reinforcement pauses and satiation
of backup reinforcers. Basic research suggests that under FI
schedules practitioners will observe long periods of inactiv-
ity with slight increases in responding towards the interval’s
end (scalloped responding; Kelleher 1956). To modify this
pattern, practitioners can change to either a VI token-produc-
tion schedule or a response duration schedule. Under a VI
schedule, responding should be steadier and more moderate
due to the unpredictability of the interval’s end (Malagodi
1967c). With an FRD or a VRD schedule, practitioners have
the option to only reinforce behaviors that are of a specific
duration. An example of effectively using an FRD token-
production schedule would involve an individual receiving
a token after 3 min of continuous appropriate conversation,
and not receiving a token if the duration of the conversation
was less than 3 min.
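The token-production rules described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure taken from the cited studies: the function names (`fr_tokens`, `vr_requirements`, `frd_tokens`), the uniform sampling used for the VR schedule, and the rule of one token per qualifying bout under FRD are all assumptions made for the example.

```python
import random

def fr_tokens(responses: int, ratio: int) -> int:
    """Tokens earned under a fixed-ratio (FR) token-production schedule."""
    return responses // ratio

def vr_requirements(mean_ratio: int, n: int, seed: int = 0) -> list[int]:
    """Draw n response requirements whose expected value is mean_ratio.

    Assumption: requirements are uniform on 1..(2 * mean_ratio - 1);
    a real VR schedule may use a different distribution.
    """
    rng = random.Random(seed)
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(n)]

def frd_tokens(durations_min: list[float], required_min: float) -> int:
    """Tokens under a fixed response-duration (FRD) schedule.

    Assumption: one token per continuous bout of the target behavior that
    lasts at least required_min minutes; shorter bouts earn nothing.
    """
    return sum(1 for d in durations_min if d >= required_min)

print(fr_tokens(12, 1))  # FR1: every response earns a token
print(fr_tokens(12, 5))  # FR5: two tokens, two responses left over
print(frd_tokens([1.5, 3.0, 4.2, 2.9], 3.0))  # two bouts meet the 3-min rule
```

The sketch only counts token deliveries; it says nothing about how response rates change under each schedule, which is the empirical question the basic research addresses.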
Barrier: Unforeseen Effects
of the Exchange‑Production Schedule
Whereas the token-production schedule refers to the num-
ber of target responses that must be emitted by individu-
als to receive a token, the exchange-production schedule
refers to how often an individual is given the opportunity
to exchange tokens for the backup reinforcers. Practitioners
may tend to focus too narrowly on the more local effects of
the token-production schedule and, as a result, encounter
decreased responding even with a dense token-production
schedule. One consideration that needs to be emphasized
in this instance is the influence of the exchange-production
schedule.
Basic Experimental Research
Basic research has suggested that the exchange-production
schedule may in fact have greater control over responding
patterns than the local contingencies operating within the
token-production schedule (Webbe and Malagodi 1978).
Foster et al. (2001) highlighted the relatively greater influence
of the exchange-production schedule over token-production
schedules by comparing one condition with a VR
token-production schedule and an FR exchange-production
schedule with another condition involving an FR
token-production schedule and a VR exchange-production schedule
(VR-token/FR-exchange vs. FR-token/VR-exchange).
Because pause durations were longer in the VR token-pro-
duction schedule (a schedule known to produce relatively
pause-free, constant rates of responding) relative to the FR
token-production schedule (a schedule known to produce
break-run patterns), the researchers concluded that overall
rates of behavior were primarily organized by the exchange-
production schedule requirements. Bullock and Hackenberg
(2006) extended these results by showing more pronounced
effects of the exchange-production schedule when the token-
production ratios were higher and when more responses per
token were required. In trials that included lower token-pro-
duction schedules (e.g., FR2), response rates varied much
less with the exchange-production schedule. That is, response
patterns under schedules that allowed more frequent access to tokens
mimicked those under primary reinforcement schedules, whereas
when more responses were required per token, the frequency
of exchange periods had more influence over responding.
In Applied Settings
A practical example of the token-production schedule acting
as a unitary response, and thereby producing reinforcement
according to the exchange-production schedule, includes
a student earning tokens for completing math problems.
For every two problems completed, he will receive a token
(FR2-token-production schedule). Given a relatively fre-
quent exchange period (FR8-exchange-production sched-
ule), one could expect responding to adhere to patterning
seen under simple schedules with primary reinforcement:
the student would complete problems rapidly and accumu-
late more tokens within a given time period. The frequency
of exchange periods becomes more influential once teach-
ers thin the token-production schedule. For instance, if the
teacher now requires the student to complete 15 problems
for one token (FR15-token-production schedule), these clus-
ters of behaviors are now more vulnerable to the effects of
the higher-order exchange-production schedule. The teacher
could now allow the student to exchange all the tokens for a
backup reinforcer either (a) after completing a fixed number
of problems (e.g., FR45-exchange-production schedule), or
(b) after an average number of problems (e.g., VR45). Under
the FR45 exchange-production schedule, the student is likely
to engage in post-reinforcement pausing with a quick transi-
tion to rapid responding until reinforcement is received (i.e.,
break-run patterning). Under the VR45 exchange-production
schedule, however, the student could be expected to com-
plete problems more quickly while pausing for shorter peri-
ods. Thus, even though the token-production schedule is the
same in either scenario (FR15), it is the exchange-production
schedule that disproportionately controls the overall rate of
responding.
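To make the arithmetic of the math-problem example concrete, the sketch below counts how many tokens a student holds when a fixed exchange opportunity arrives. It assumes that the exchange requirement is counted in completed problems, as in the FR45 case of the example; the function name `tokens_at_exchange` is hypothetical, and the code does bookkeeping only, without modeling behavior.

```python
def tokens_at_exchange(token_ratio: int, exchange_after_problems: int) -> int:
    """Tokens accumulated when a fixed exchange opportunity is reached.

    token_ratio: problems required per token (FR token-production schedule).
    exchange_after_problems: problems required per exchange opportunity
        (assumption: the exchange requirement is counted in problems).
    """
    return exchange_after_problems // token_ratio

# Dense token production: FR2 tokens, exchange after 8 problems
print(tokens_at_exchange(2, 8))    # 4 tokens per exchange
# Thinned token production: FR15 tokens, exchange after 45 problems
print(tokens_at_exchange(15, 45))  # 3 tokens per exchange
```

Under the thinned schedule each token represents a much larger block of work, which is one way to see why the timing of exchange opportunities would carry more weight.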
Barrier: Overcoming Ratio Strain
Within token economies, ratio strain occurs with abrupt
increases in ratio requirements, resulting in decreases in
behavior similar to those seen during extinction (Ferster
and Skinner 1957). Ratio strain can unexpectedly occur
through the interaction of the token- and exchange-produc-
tion schedule. Sifting through why ratio strain is occurring
and which schedule is influencing responding the most can
be a complex task. Long pauses in ratio performance could
occur when response requirements within the token-produc-
tion schedule are too high, or when too much time elapses
between exchange opportunities.
Basic Experimental Research
Specific to token-production, organisms whose schedules
of reinforcement are “thinned” are required to engage
in an increased amount of responding before reinforce-
ment. Decreased responding is often a function of increas-
ing ratio requirements too quickly; however, ratio strain
can also occur through the interaction of the token- and
exchange-production schedules whereby organisms
might earn tokens at an appropriate rate, yet would show
decreased responding if the requirement for exchange-
production was too stringent (i.e., exchange opportunities
are not frequent enough). For instance, Bullock and Hackenberg
(2006) demonstrated ratio strain in pigeons by showing
an inverse relation between responding and the token-
production ratio. High response requirements decreased
responding due mainly to long pauses and low response
rates in early segments. Researchers also demonstrated,
however, an inverse relation between response rate and
exchange-production ratios when token-production ratios
were kept constant.
In Applied Settings
Ratio strain can occur within applied environments when
practitioners delay opportunities to exchange tokens until
the end of the day (FT- or VT-exchange-production sched-
ules) or do not restore the exchange-production schedules
to the ratio that previously maintained adequate rates of
responding. For example, if an individual receives tokens
on an FR5-token-production schedule, and can exchange
tokens after accumulating an average of 5 tokens (VR5-
exchange-production schedule), all other things accounted
for, the practitioner can expect relatively rapid responding
with short post-reinforcement pauses. If, however, the practi-
tioner abruptly requires the individual to either (a) engage in
50 responses for 1 token (FR50-token-production schedule),
(b) exchange tokens only after accumulating an average of
100 tokens (VR100-exchange-production schedule), or (c)
both, one would expect decreased responding due to ratio
strain, and potentially complete extinction of responding
before reinforcement at the new schedule can occur. Ratio
strain can be avoided by increasing response requirements
gradually, temporarily reducing ratio requirements, or by
increasing backup magnitude or quality (Roane et al. 2007).
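One way to picture increasing response requirements gradually is as a stepped progression between the old and new ratios. The sketch below is only an illustration: the 50% cap on each increase and the function name `thinning_steps` are assumptions, not values recommended by the cited research.

```python
import math

def thinning_steps(start: int, target: int, max_increase: float = 0.5) -> list[int]:
    """Ratio requirements from start to target.

    Each step grows by about max_increase (50% by default) of the previous
    requirement, and always by at least one response so the loop terminates.
    """
    if start < 1 or target < start or max_increase <= 0:
        raise ValueError("need 1 <= start <= target and max_increase > 0")
    steps = [start]
    while steps[-1] < target:
        nxt = max(steps[-1] + 1, math.floor(steps[-1] * (1 + max_increase)))
        steps.append(min(nxt, target))
    return steps

print(thinning_steps(5, 50))  # [5, 7, 10, 15, 22, 33, 49, 50]
```

In practice a practitioner would advance to the next step only after responding is stable at the current one, and would step back if responding deteriorates; the list is a plan, not a timetable.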
Barrier: Adjusting Prices of Backup Reinforcers
The token-exchange schedule refers to how many tokens
are required for a specific backup reinforcer, or the token-
specific “price” of the backup reinforcer. It is not always the
case that once an exchange opportunity is earned, the indi-
vidual can exchange only one token for one unit of a backup
reinforcer. What is more likely to occur in applied settings
is the option to “purchase” a variety of backup reinforcers
that vary in price. Token-exchange schedules often may be
chosen arbitrarily or might even be based on the actual retail
price of the item. However, to ensure predictable respond-
ing, practitioners should consider a number of different fac-
tors when determining this schedule.
Basic Experimental Research
Basic research does not provide extensive information about
the specific influence of token-exchange schedules other
than that token-exchange influence is similar to exchange-
production influence. Malagodi et al. (1975) demonstrated
decreased responding in rats’ lever pressing given increased
token-exchange schedules: the more demanding the token-
exchange schedule, the longer post-reinforcement pausing.
Basic research on “unit price,” or the ratio of responses
to every unit of reinforcer, includes a more sophisticated
description of price that details the characteristics of cost-
benefit tradeoffs (e.g., Delmendo et al. 2009; Foster and
Hackenberg 2004). That is, price does not necessarily refer
just to the actual token-exchange schedule of the backup
reinforcer; rather, it reflects the interaction of the token-exchange
schedule with the costs of producing tokens (the token-production
schedule) and the costs of producing exchange
opportunities (the exchange-production schedule; Bullock
and Hackenberg 2006). Basic research on unit price demon-
strates that decreases in response rates are associated with
increases in unit price (more stringent token-production,
exchange-production, and token-exchange requirements).
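As a rough worked illustration of unit price in a token system, the sketch below multiplies the responses required per token by the tokens required per backup reinforcer and divides by the amount of reinforcer obtained. This is a simplification: it leaves out the exchange-production schedule, and the function name `unit_price` and the example numbers are assumptions chosen for illustration.

```python
def unit_price(responses_per_token: int, tokens_per_reinforcer: int,
               reinforcer_units: float = 1.0) -> float:
    """Responses required per unit of backup reinforcer."""
    if reinforcer_units <= 0:
        raise ValueError("reinforcer_units must be positive")
    return responses_per_token * tokens_per_reinforcer / reinforcer_units

print(unit_price(5, 2))       # 10.0 responses per unit
print(unit_price(5, 2, 2.0))  # 5.0: doubling the magnitude halves the price
print(unit_price(10, 2))      # 20.0: thinning token production doubles it
```

The same unit price can be reached through different combinations of cost and benefit, which is why a change to any one schedule can alter the effective price of a backup reinforcer.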
In Applied Settings
Practitioners can expect decreased overall responding if
the labeled price of the backup reinforcer is relatively too
high (i.e., too demanding of a token-exchange schedule).
In applied settings, however, the actual price of the backup
reinforcer is better determined by the combined effects of
how often responses result in tokens, how often the indi-
vidual can exchange tokens, and the number of tokens
required to exchange for specific reinforcers (Bullock and
Hackenberg 2006; Malagodi et al. 1975). In this sense,
Hackenberg (2009) likened the token-production schedule
to a worker’s wage, the exchange-production schedule to
the effort required to purchase the item (e.g., driving to the
store, getting cash out of the bank), and the token-exchange
schedule to the number listed on the price tag. The basic
premise of unit price can be applied to practical contexts
by using reinforcer assessments to identify a hierarchy of
backup reinforcers. Delmendo et al. (2009) suggested that
this information could be used to differentially program con-
tingencies for task completion based on the subjective effort
associated with the task. Within token economies, those
tasks that require more effort can be associated with more
preferred reinforcers. Conversely, reinforcers that are less
preferred may be more suitable for maintaining less effort-
ful responses. An example of this situation could involve an
individual being able to use tokens to purchase high quality
rewards that are not available at other times after a period
of effortful tasks (e.g., difficult homework). Less preferred
reinforcers, therefore, would be available for the periods that
involve less effortful tasks (e.g., sitting at the table while
eating).
Barrier: Response Cost and Reducing Inappropriate
Behavior
Common barriers not only to token economies, but also
to applied practice as a whole, may include the individual
engaging in problem behavior. Practitioners most likely are
well versed in the use of differential reinforcement proce-
dures to reinforce an appropriate behavior in place of an
inappropriate one, but they may struggle to enact appropri-
ate measures to respond to inappropriate behavior using the
token economy. Modifying the token-production schedule
to include response cost procedures could be an effective
addition to target problem behavior if reinforcement alone
is ineffective in reducing problem behavior.
Basic Experimental Research
Response cost is conceptualized as a punishment procedure
in which reinforcers are removed contingent upon some
response. For example, Pietras and Hackenberg (2005) used
LED lights as tokens to reinforce pecking in pigeons. Use
of these lights allowed researchers to easily remove tokens
by turning the light off. Key pecking was maintained on
two separate schedules and when FR schedules of response-
cost were introduced in one of the schedules, response rates
under only that schedule decreased. It is interesting to note
that in this experiment, response rates for the response-cost
schedule were not completely suppressed; rather, only dur-
ing extinction did response rates decrease to near-zero levels.
In a basic experiment with humans (Weiner 1962), responses
produced brief stimuli (lights) signaling availability of rein-
forcers/points according to either VI or FI schedules. By
subtracting one point from a counter during response cost
conditions, response rates were suppressed and did not
recover with continued exposure to the response-cost con-
tingency. Thus, basic experimental research has consistently
demonstrated decreased response rates of target behaviors
using contingent removal of conditioned reinforcers.
In Applied Settings
A response cost procedure can be effective within any type
of token-production schedule as long as the tokens are act-
ing as conditioned reinforcers. Much of the applied token
economy research implementing response cost involves the
individual being given a number of tokens at the begin-
ning of an interval and losing tokens for each inappro-
priate response (e.g., Conyers et al. 2004; McGoey and
DuPaul 2000). If the individual has enough tokens at the
end of the set interval, he or she can exchange them for a
backup reinforcer. Conyers et al. (2004) found that both
a response-cost and differential reinforcement of other
behavior (DRO) procedure were effective in reducing
problem behavior when implemented in isolation; how-
ever, they recommended implementing them together to
increase treatment acceptability. Thus, a response cost pro-
cedure will be more valuable if implemented in conjunc-
tion with a token-production differential reinforcement
schedule.
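The response-cost arrangement described in this section, in which tokens are given at the start of an interval and removed for each inappropriate response, amounts to simple bookkeeping. The sketch below is hypothetical: the function name, the one-token fine, and the exchange threshold are assumptions chosen for illustration, not parameters from Conyers et al. (2004) or McGoey and DuPaul (2000).

```python
def response_cost_interval(starting_tokens: int, inappropriate_responses: int,
                           tokens_needed: int, fine: int = 1) -> tuple[int, bool]:
    """Return (tokens remaining, whether the exchange criterion is met)."""
    remaining = max(0, starting_tokens - fine * inappropriate_responses)
    return remaining, remaining >= tokens_needed

print(response_cost_interval(5, 2, 3))  # (3, True)
print(response_cost_interval(5, 4, 3))  # (1, False)
print(response_cost_interval(5, 7, 3))  # (0, False): tokens cannot go negative
```

The last case corresponds to an individual who has lost every token before the exchange period, so that no programmed consequence remains for the rest of the interval.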
A practical limitation of using response cost can involve
an instance in which an individual loses all tokens before
an exchange period and has a long wait before an oppor-
tunity to earn them back. This instance might produce
a segment of time in which contingencies for appropri-
ate behavior are vague, perhaps creating an establishing
operation for problem behavior. Some other practical lim-
itations are encompassed by the potential negative side
effects of punishment procedures in general. Specifically,
punishment procedures such as response cost can produce
negative side effects that include collateral increases in
punishment-elicited aggression, escape behaviors, and
emotional reactions (Lerman and Vorndran 2002). Lastly,
using response cost in isolation might be disadvantageous
considering exchange opportunities depend on respond-
ing (FR or VR exchange-production schedules). That is,
if response-cost conditions result in low response rates,
infrequent pairings of tokens with backup reinforcers may
reduce the reinforcing value of the tokens.
Conclusion
As robust as the literature surrounding token economies is,
practitioners may not make regular contact with the basic
research that has defined many of the practices in use today.
However, when faced with practical problems or barriers to
success, it is likely beneficial for practitioners to revisit this
literature and examine these underlying principles. This is
especially true if the practitioner responsible for training
others in the implementation of the token economy has a
less sophisticated understanding of how and why certain
procedures function as they do. It is true that typical and
even substandard implementation of token economies can
have positive effects on behavior; however, many practition-
ers are called upon to consult for complex cases. Complex
cases likely require a deeper understanding of the underlying
mechanisms actuating the seemingly everyday practices that
we use. Thus, our objective was to make this research more
accessible and to translate it for applied settings,
in the hope of assisting practitioners in implementing a more
fundamentally sound and thus effective treatment.
Compliance with Ethical Standards
Conflict of interest: All authors declare no conflicts of interest. This
article does not contain any studies with human participants or animals
performed by any of the authors.
References
Ayllon, T., & Azrin, N. H. (1968). The token economy: A moti-
vational system for therapy and rehabilitation. New York:
Appleton-Century-Crofts.
Bailey, J. R., Gross, A. M., & Cotton, C. R. (2011). Challenges asso-
ciated with establishing a token economy in a residential care
facility. Clinical Case Studies, 10(4), 278–290.
Bullock, C. E., & Hackenberg, T. D. (2006). Second-order schedules
of token reinforcement with pigeons: Implications for unit price.
Journal of the Experimental Analysis of Behavior, 85, 95–106.
Catania, A. C., & Reynolds, G. A. (1968). Quantitative analysis of the
responding maintained by interval schedules of reinforcement.
Journal of the Experimental Analysis of Behavior, 11, 327–383.
Collier, G. H., Hirsch, E., & Hamlin, P. H. (1972). The ecological
determinants of reinforcement in the rat. Physiology and Behav-
ior, 9, 705–716.
Conyers, C., Miltenberger, R., Maki, A., Barenz, R., Jurgens, M.,
Sailer, A., & Kopp, B. (2004). A comparison of response cost and
differential reinforcement of other behavior to reduce disruptive
behavior in a preschool classroom. Journal of Applied Behavior
Analysis, 37, 411–415.
Cowles, J. T. (1937). Food-tokens as incentives for learning by chimpanzees.
Comparative Psychology Monographs, 12, 1–96.
DeLeon, I. G., Anders, B. M., Rodriguez-Catter, V., & Neidert, P. L.
(2000). The effects of noncontingent access to single- versus mul-
tiple-stimulus sets on self-injurious behavior. Journal of Applied
Behavior Analysis, 33(4), 623–626.
DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of multiple-stimulus
presentation format for assessing reinforcer preference. Journal
of Applied Behavior Analysis, 29, 519–532.
Delmendo, X., Borrero, J. C., Beauchamp, K. L., & Francisco, M. T.
(2009). Consumption and response output as a function of unit
price: Manipulation of cost and benefit components. Journal of
Applied Behavior Analysis, 42, 609–625.
Drabman, R. S., & Tucker, R. D. (1974). Why token economies fail.
Journal of School Psychology, 12(3), 178–188.
Fantino, E. (1977). Conditioned reinforcement: Choice and informa-
tion. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of
operant behavior (pp. 313–339). Englewood Cliffs: Prentice-Hall.
Felton, M., & Lyon, D. (1966). The post-reinforcement pause. Journal
of the Experimental Analysis of Behavior, 9, 131–134.
Ferster, C. B., & Culbertson, S. A. (1982). Behavior Principles
(3rd edn.). Englewood Cliffs: Prentice Hall.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement.
New York: Appleton-Century-Crofts.
Foster, T. A., & Hackenberg, T. D. (2004). Unit price and choice in a
token-reinforcement context. Journal of the Experimental Analy-
sis of Behavior, 81(1), 5–25.
Foster, T. A., Hackenberg, T. D., & Vaidya, M. (2001). Second-order
schedules of token reinforcement with pigeons: Effects of fixed-
and variable-ratio exchange schedules. Journal of the Experimental
Analysis of Behavior, 76, 159–178.
Hackenberg, T. D. (2009). Token reinforcement: A review and analysis.
Journal of the Experimental Analysis of Behavior, 91, 257–286.
Hursh, S. R. (1984). Behavioral economics. Journal of the Experimen-
tal Analysis of Behavior, 42, 435–452.
Kazdin, A. E. (1977). The token economy: A review and evaluation.
New York: Plenum.
Kazdin, A. E. (1982). The token economy: A decade later. Journal of
Applied Behavior Analysis, 15, 431–445.
Kelleher, R. T. (1956). Intermittent conditioned reinforcement in chim-
panzees. Science, 124, 679–680.
Kelleher, R. T. (1958). Fixed-ratio schedules of conditioned reinforce-
ment with chimpanzees. Journal of the Experimental Analysis of
Behavior, 1, 281–289.
Kelleher, R. T. (1966). Conditioned reinforcement in second-order
schedules. Journal of the Experimental Analysis of Behavior, 9,
475–485.
Kelleher, R. T., & Gollub, L. R. (1962). A review of positive condi-
tioned reinforcement. Journal of the Experimental Analysis of
Behavior, 5(4), 543–597.
Laraway, S., Snycerski, S., Michael, J., & Poling, A. (2003). Motivating
operations and terms to describe them: Some further refinements.
Journal of Applied Behavior Analysis, 36(3), 407–414.
Lerman, D. C., & Vorndran, C. M. (2002). On the status of knowledge
for using punishment: Implications for treating behavior disorders.
Journal of Applied Behavior Analysis, 35(4), 431–464.
Mace, F. C., & Critchfield, T. S. (2010). Translational research in
behavior analysis: Historical traditions and imperative for the
future. Journal of the Experimental Analysis of Behavior, 93(3),
293–312.
Mace, F. C., & Roberts, M. L. (1993). Factors affecting selection of
behavioral interventions. In J. Reichle & D. P. Wacker (Eds.),
Communicative alternatives to challenging behavior (pp. 113–
134). Baltimore: Brookes.
Malagodi, E. F. (1967a). Acquisition of the token-reward habit in the
rat. Psychological Reports, 20, 1335–1342.
Malagodi, E. F. (1967b). Fixed-ratio schedules of token reinforcement.
Psychonomic Science, 8, 469–470.
Malagodi, E. F. (1967c). Variable-interval schedules of token reinforce-
ment. Psychonomic Science, 8, 471–472.
Malagodi, E. F., Webbe, F. M., & Waddell, T. R. (1975). Second-order
schedules of token reinforcement: Effects of varying the sched-
ule of food presentation. Journal of the Experimental Analysis of
Behavior, 24, 173–181.
Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children
with intellectual disability and/or autism: A review. Research in
Developmental Disabilities, 30(2), 240–248.
McGoey, K. E., & DuPaul, G. J. (2000). Token reinforcement and
response cost procedures: Reducing the disruptive behavior of
preschool children with attention-deficit/hyperactivity disorder.
School Psychology Quarterly, 15(3), 330–343.
Miltenberger, R. (2008). Behavior modification. Belmont: Wadsworth
Publishing.
Moher, C. A., Gould, D. D., Hegg, E., & Mahoney, A. M. (2008).
Non-generalized and generalized conditioned reinforcers: Estab-
lishment and validation. Behavioral Interventions, 23(1), 13–38.
Neef, N. A., Shade, D., & Miller, M. S. (1994). Assessing influen-
tial dimensions of reinforcers on choice in students with serious
emotional disturbance. Journal of Applied Behavior Analysis, 27,
575–583.
Pietras, C. J., & Hackenberg, T. D. (2005). Response-cost punishment
via token loss with pigeons. Behavioural Processes, 69(3),
343–356.
Roane, H. S., Call, N. A., & Falcomata, T. S. (2005). A preliminary
analysis of adaptive responding under open and closed economies.
Journal of Applied Behavior Analysis, 38(3), 335–348.
Roane, H. S., Falcomata, T. S., & Fisher, W. W. (2007). Applying the
behavioral economics principle of unit price to DRO schedule
thinning. Journal of Applied Behavior Analysis, 40(3), 529–534.
Roane, H. S., Vollmer, T. R., Ringdahl, J. E., & Marcus, B. A. (1998).
Evaluation of a brief stimulus preference assessment. Journal of
Applied Behavior Analysis, 31(4), 605–620.
Shahan, T. A. (2010). Conditioned reinforcement and response
strength. Journal of the Experimental Analysis of Behavior, 93,
269–289.
Skinner, B. F. (1974). About behaviorism. New York: Knopf.
Sran, S. K., & Borrero, J. C. (2010). Assessing the value of choice
in a token system. Journal of Applied Behavior Analysis, 43(3),
553–557.
Vollmer, T. R., & Iwata, B. A. (1991). Establishing operations and rein-
forcement effects. Journal of Applied Behavior Analysis, 24(2),
279–291.
Webbe, F. M., & Malagodi, E. F. (1978). Second-order schedules of
token reinforcement: Comparisons of performance under fixed-
ratio and variable-ratio exchange schedules. Journal of the
Experimental Analysis of Behavior, 30, 219–224.
Weiner, H. (1962). Some effects of response cost upon human oper-
ant behavior. Journal of the Experimental Analysis of Behavior,
5(2), 201–208.
Williams, B. A. (1994). Conditioned reinforcement: Experimental and
theoretical issues. The Behavior Analyst, 17, 261–285.
Williams, B. A., & Dunn, R. (1991). Preference for conditioned rein-
forcement. Journal of the Experimental Analysis of Behavior, 55,
37–46.
Wolfe, J. B. (1936). Effectiveness of token-rewards for chimpanzees.
Comparative Psychology Monographs, 12, 1–72.
Yankelevitz, R. L., Bullock, C. E., & Hackenberg, T. D. (2008). Rein-
forcer accumulation in a token reinforcement context. Journal of
the Experimental Analysis of Behavior, 90, 283–299.
ORIGINAL PAPER
Effects of a Perseverative Interest-Based Token
Economy on Challenging and On-Task Behavior
in a Child with Autism
Amarie Carnett • Tracy Raulston • Russell Lang •
Amy Tostanoski • Allyson Lee • Jeff Sigafoos •
Wendy Machalicek
Published online: 19 March 2014
© Springer Science+Business Media New York 2014
Abstract We compared the effects of a token economy intervention that either did
or did not include the perseverative interests of a 7-year-old boy with autism. An
alternating treatment design revealed that the perseverative interest-based tokens
were more effective at decreasing challenging behavior and increasing on-task
behavior than tokens absent the perseverative interest during an early literacy
activity. The beneficial effects were then replicated in the child’s classroom. The
results suggest that perseverative interest-based tokens might enhance the effec-
tiveness of interventions based on token economies.
Keywords Autism · Perseverative interest · Token economy · Challenging
behavior · Alternating treatment design
A. Carnett (✉) · J. Sigafoos
Victoria University of Wellington, Karori Campus, PO Box 17-310, Wellington, New Zealand
e-mail: Amarie.Carnett@vuw.ac.nz
T. Raulston · W. Machalicek
University of Oregon, Eugene, OR, USA
R. Lang · A. Lee
Clinic for Autism Research Evaluation and Support, Texas State University, San Marcos,
TX, USA
R. Lang
Meadows Center for the Prevention of Educational Risk, The University of Texas at Austin, Austin,
TX, USA
A. Tostanoski
Vanderbilt University, Nashville, TN, USA
J Behav Educ (2014) 23:368–377
DOI 10.1007/s10864-014-9195-7
Introduction
Token economy interventions involve delivering small tangibles (e.g., tokens)
contingent on the presence or absence of target behaviors and then providing an
opportunity to exchange a preset number of these tokens for backup reinforcers.
Previous research has demonstrated that behaviors can be established, decreased,
and/or maintained using token economy systems (Hackenberg 2009; Matson and
Boisjoli 2009). Research has also investigated several variations of this intervention
including the use of a response cost (i.e., losing tokens for inappropriate behavior),
pairing tokens with praise, and delivering tokens on a variety of intermittent
reinforcement schedules. These variables have been shown to influence the
effectiveness of token economy interventions in some cases (Maggin et al. 2011;
Matson and Boisjoli 2009; Mottram and Berger-Gross 2004).
One aspect of the token economy that has received relatively little attention is the
token itself. Traditionally, tokens are considered to be neutral stimuli (e.g., tickets)
that gain reinforcing power by being paired with the backup reinforcers. Charlop-
Christy and Haymes (1998) investigated the effectiveness of incorporating the
idiosyncratic perseverative interests of children with autism within tokens in an
effort to increase the reinforcing power of the token. Charlop-Christy and Haymes
(1998) defined such intense interests as preoccupations or obsessions that an
individual continually seeks. Results from that study indicated that making use of
tokens that reflected the child’s perseverative interests (e.g., using a small picture of
a train as a token for a child who had a perseverative interest in trains) improved
intervention outcomes. To date, this appears to be the only study to have
demonstrated the potential value of individualizing tokens based on a child’s
perseverative interest.
The purpose of this current study was to replicate and extend the work of
Charlop-Christy and Haymes (1998). Specifically, we compared the effects of a
token economy intervention that either did or did not make use of tokens that
reflected a child’s perseverative interest. We examined the effects of this
manipulation on the challenging and on-task behavior of a 7-year-old boy with
autism during an early literacy activity in a public school special education
classroom and an inclusion classroom.
Method
Participant, Setting, and Materials
Troy was a 7-year-old boy who had been diagnosed with autism. He resided at home
with his father, mother, and three older siblings and attended a local public school.
He scored a 31 on the Childhood Autism Rating Scale (CARS; Schopler et al.
1980), which is indicative of mild-moderate autistic symptoms, and a 99 on the
Behavior Assessment System for Children-II, which indicates an overall clinically
significant range (BASC-II; Reynolds and Kamphaus 2004). Troy spent the majority
of his school day in a special education life skills classroom with four to eight other
children with developmental disabilities, a special education teacher, and a teaching
assistant. Troy’s individualized education plan (IEP) called for him to spend 1 h of
his school day included in activities with students without disabilities. However,
Troy’s challenging behavior (i.e., screaming, falling, and/or lying on the floor)
occurred too frequently to be acceptable in an inclusion classroom (i.e., a classroom
with a combination of students with and without disability).
The Questions About Behavior Function (QABF) Scale (Matson et al. 2012)
suggested that Troy’s challenging behavior was maintained by escape from
demands. As a result, the inclusion time specified in his IEP was met by
nonacademic activities with fewer demands (e.g., lunch, recess). Troy’s school
counselor referred him to this study in an effort to identify a strategy that could be
used to increase Troy’s inclusion during academic instruction in the general
education classroom. Additionally, Troy had previous experience in using a
traditional token economy within a discrete-trial format, and thus did not require
additional training to use the token economy system for this study.
The baseline and intervention sessions were conducted in Troy’s life skills
classroom and in his inclusion classroom. A video camera on a tripod was used to
record the participant and the researcher during all sessions. The inclusion
classroom included one teacher, 14 students without disabilities, two students with
learning disabilities who spent 100 % of their time in that classroom, and two
students with developmental disabilities who divided their time between the
inclusion and life skills classrooms. Both classrooms had a regularly scheduled early
literacy activity, which lasted 10–12 min and occurred three or four times per week.
During the activity, the teacher sat in a chair and read a story to the children as they
sat on the carpet with a teacher assistant and researcher observing. The children
were expected to sit quietly, look at the teacher or book, listen, and answer
occasional reading comprehension questions.
Response Measurement and Interobserver Agreement
Data were collected on Troy’s challenging behavior and on-task behavior.
Challenging behavior was defined as screaming (i.e., loud vocalizations lasting
3 s or more that were considered disruptive in the classroom), falling, and/or lying
on the ground (i.e., collapsing head and body to the ground). Screaming and falling
often occurred in tandem, and the QABF suggested both were maintained by escape
from demands, so these two topographies, whether they occurred alone or in
combination, were recorded as challenging behavior. On-task behavior was defined
as sitting with buttocks on the ground, head oriented toward the teacher, and having
an absence of challenging behavior. Challenging behavior was scored using 10-s
partial interval recording, and on-task behavior was scored using 10-s whole-
interval recording (Kennedy 2005). The on-task behaviors were selected due to their
incompatibility with Troy’s challenging behavior; thus, challenging behavior and
on-task behavior could not be scored in the same interval. Interval data were
converted to a percentage by dividing the number of intervals with each dependent
variable by the total number of intervals, then multiplying by 100 to convert into a
percentage.
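The scoring rules and percentage conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example record are hypothetical, and the per-second boolean format is an assumption, not the authors' data sheet.

```python
from typing import List

SESSION_SECONDS = 600   # 10-min session, as reported
INTERVAL_SECONDS = 10   # 10-s recording intervals, as reported
N_INTERVALS = SESSION_SECONDS // INTERVAL_SECONDS  # 60 intervals per session


def score_whole_interval(seconds: List[bool]) -> bool:
    """Whole-interval rule: scored only if the behavior lasted the entire interval."""
    return all(seconds)


def score_partial_interval(seconds: List[bool]) -> bool:
    """Partial-interval rule: scored if the behavior occurred at any point."""
    return any(seconds)


def percent_of_intervals(scored: List[bool]) -> float:
    """Intervals scored, divided by total intervals, multiplied by 100."""
    return 100.0 * sum(scored) / len(scored)


# Hypothetical session: on-task scored in 15 of the 60 intervals.
example_on_task = [True] * 15 + [False] * 45
print(percent_of_intervals(example_on_task))
```

Because on-task behavior requires a full interval and challenging behavior only a single instance, the two rules cannot both score the same interval under the definitions given above.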
Data on interobserver agreement (IOA) were collected from videos for both
dependent variables during 30 % of the baseline and intervention sessions by two
trained independent coders. IOA was calculated by dividing the number of intervals
with agreement (i.e., both data collectors scored the presence or absence of
challenging behavior/on-task behavior for the interval) by the total number of
intervals (i.e., agreements plus disagreements), then multiplying by 100 to convert
into a percentage. Mean agreement for both dependent variables was 98.5 % (range
95–100 %).
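The interval-by-interval agreement calculation described above can be written as a short Python sketch. The function name and the two observer records are hypothetical and serve only to show the arithmetic.

```python
from typing import List


def interval_ioa(observer_a: List[bool], observer_b: List[bool]) -> float:
    """Agreements divided by (agreements + disagreements), times 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records for one 60-interval session with a single disagreement.
coder_1 = [True] * 20 + [False] * 40
coder_2 = [True] * 19 + [False] * 41
print(round(interval_ioa(coder_1, coder_2), 1))
```

An interval counts as an agreement whether both coders scored the behavior as present or both scored it as absent, which matches the definition in the text.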
Treatment integrity was assessed for 30 % of the sessions. A procedural checklist
of intervention procedures (available upon request) was used to record the accuracy
of intervention implementation. The mean of treatment integrity was 96.9 % (range
84.6–100 %).
Procedure
Research Design
The two token economy interventions (i.e., with and without embedded persever-
ative interests) were compared using alternating treatments with an initial baseline
design (Gast 2010). The alternating treatments phase was conducted in the life skills
classroom, and the intervention was implemented by the researcher. Generalization
from the life skills classroom to the inclusion classroom was assessed by conducting
a probe in the inclusion classroom during baseline and by adding a third phase, best-
treatment phase, in which the intervention associated with less challenging behavior
and more on-task behavior was implemented in the inclusion classroom (Gast
2010). Across all phases of the study, the following conditions were held constant:
(a) session duration (10 min), (b) time of day when sessions were conducted, (c) the
types of backup reinforcers that were available, (d) the number and timing of
opportunities to exchange tokens for the backup reinforcers, and (e) the reading
level of the stories. During the intervention and generalization phases, the reading
activity was led by the classroom teacher. A teaching assistant was also present, and
the researcher implemented the intervention.
Baseline
Four of the five baseline sessions were conducted in the life skills classroom. Due to
high rates of challenging behavior, only one baseline session was conducted in the
inclusion classroom. The duration of the reading activity was always between 10
and 12 min. To keep session duration constant, data were recorded during the first
10 min only. During baseline, all teachers and assistants were told to conduct the
reading activity as they would normally. During baseline, the teachers in both
classrooms verbally prompted on-task behavior (e.g., ‘‘Troy, please be quiet and sit
up.’’), provided praise contingent upon on-task behavior, and occasionally ignored
challenging behavior or delivered a mild reprimand (e.g., ‘‘Troy, stop that.’’).
However, none of these components were consistently implemented, and despite
this effort, the participant’s challenging behavior had persisted for over 6 months.
Preference and Backup Reinforcers
Backup reinforcers were selected by first asking Troy’s teachers to identify potential
reinforcers that would be appropriate in their classrooms. The teachers suggested
small edibles (e.g., bite-sized candy or cracker) because they were inexpensive and
could be consumed quickly without causing distraction. A pairwise preference
assessment was then conducted to identify preference of bite-sized edibles (Fisher
et al. 1992). Prior to each session, Troy selected a backup reinforcer from his top
three preferences (i.e., M&M, fruit snack, and chip). The researcher reviewed on-
task behaviors with Troy using a visual support that included pictures and words of
targeted on-task behaviors (i.e., sitting down, staying quiet, and looking at the
teacher) prior to the start of all sessions. The visual support remained present and
was used to redirect challenging behavior if it occurred, at the end of each 20-s
interval (i.e., the researcher pointed to the picture that represented the desired
behavior instead of delivering a token) throughout each session.
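A pairwise preference assessment of the kind cited above ranks items by how often each is chosen when presented in pairs. The Python sketch below shows that ranking step only. The item list borrows the edibles named in the text, but every recorded choice is invented for illustration, and presenting each pair once is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical item pool; the choices below are made up, not Troy's data.
items = ["M&M", "fruit snack", "chip", "cracker"]
choices: Dict[Tuple[str, str], str] = {
    ("M&M", "fruit snack"): "M&M",
    ("M&M", "chip"): "M&M",
    ("M&M", "cracker"): "M&M",
    ("fruit snack", "chip"): "fruit snack",
    ("fruit snack", "cracker"): "fruit snack",
    ("chip", "cracker"): "chip",
}


def rank_by_selection(
    items: List[str], choices: Dict[Tuple[str, str], str]
) -> List[Tuple[str, float]]:
    """Rank items by the percentage of presentations on which each was chosen."""
    presented: Counter = Counter()
    selected: Counter = Counter()
    for pair in combinations(items, 2):  # every item paired with every other item
        for item in pair:
            presented[item] += 1
        selected[choices[pair]] += 1
    percentages = {i: 100.0 * selected[i] / presented[i] for i in items}
    return sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_selection(items, choices)
top_three = [name for name, _ in ranking[:3]]
print(top_three)
```

The top-ranked items would then form the menu from which a backup reinforcer is chosen before each session.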
Token Economy without Perseverative Interest
The token economy system that did not include Troy’s perseverative interest used
pennies with a small patch of Velcro on the back that could be fastened to a token
board. Penny tokens were delivered by the researcher sitting near Troy, contingent
on 20 s of consecutive on-task behavior. A maximum of 30 tokens per 10-min
session could be earned. Backup reinforcers (i.e., bite-sized candy) could be
obtained for every 10 tokens earned, and an opportunity to exchange was presented
within sessions at each moment in which Troy had earned 10 tokens. For data
collection purposes, the exchanges were coded as on-task behavior. The token board
included circles drawn in groups of 10 as a visual representation of the number of
tokens needed to earn a backup reinforcer. Upon earning a token for targeted on-
task behaviors, Troy was handed a token to place on the board (also coded as on-
task).
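The token schedule described above implies some simple arithmetic: a 10-min session holds thirty 20-s windows, so at most 30 tokens and three exchanges are possible. The Python sketch below works this through. Treating the session as fixed, back-to-back 20-s windows is an assumption made for illustration, and the function name and example records are hypothetical.

```python
from typing import List, Tuple

SESSION_SECONDS = 600        # 10-min session
TOKEN_WINDOW_SECONDS = 20    # 20 s of consecutive on-task behavior per token
TOKENS_PER_EXCHANGE = 10     # 10 tokens buy one backup reinforcer

MAX_TOKENS = SESSION_SECONDS // TOKEN_WINDOW_SECONDS   # 30
MAX_EXCHANGES = MAX_TOKENS // TOKENS_PER_EXCHANGE      # 3


def run_session(on_task_windows: List[bool]) -> Tuple[int, int]:
    """Return (tokens earned, exchanges offered) for one session.

    Each entry is True when the child stayed on-task for the whole 20-s window
    (assumed fixed windows; the study's exact timing rule may differ).
    """
    tokens = 0
    exchanges = 0
    for on_task in on_task_windows:
        if on_task:
            tokens += 1
            # An exchange is offered the moment a tenth token is earned.
            if tokens % TOKENS_PER_EXCHANGE == 0:
                exchanges += 1
    return tokens, exchanges


print(MAX_TOKENS, MAX_EXCHANGES)
print(run_session([True] * 12 + [False] * 18))
```

The same function applies to both token conditions, since the response requirement and exchange rate were identical and only the physical tokens and board differed.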
Token Economy with Perseverative Interests
The token economy system used in this condition differed from the previously
described condition in that the pennies and token board were replaced by tokens and
a board related to Troy’s perseverative interest in jigsaw puzzles. Specifically, the
tokens were small foam puzzle pieces, and the token board was a thin cardstock
frame into which the pieces fit. This token board mirrored the traditional token
economy, in that it included 10 outlined locations for each puzzle piece. The same
procedures, response requirements, exchange rate, and backup reinforcers were used
in both token economy conditions (i.e., with and without perseverative interests).
Troy’s perseverative interest in puzzles was determined by interviews with teachers
and a free operant preference assessment, in which a puzzle was made available
alongside other toys and activity options (Roane et al. 1998). All of Troy’s teachers
agreed that he perseverated on a specific puzzle, and he devoted 100 % of his time
in the free operant preference assessment touching, holding, and manipulating the
puzzle pieces. Further, Troy always selected this specific puzzle when other puzzles
were available.
Generalization
Troy’s behavior during the group reading activity in the inclusion classroom was
measured in one baseline session using the same procedures as the other four
baseline sessions conducted in the life skills classroom. In the final best-treatment
phase of the study, three sessions were conducted in the inclusion classroom using
the perseverative interest token economy system.
Results
The top panel of Fig. 1 displays the percentage of intervals during which Troy
engaged in on-task behavior during the entire 10-s interval. During baseline, Troy
was on-task in the life skills classroom for a mean of 11 % of the intervals (range
8–18 %). In the inclusion classroom, he was on-task during 13 % of the intervals.
During the alternating treatment phase, both token economy interventions resulted
in an increase in on-task behavior relative to baseline. However, Troy was on-task
more often during the perseverative interest token economy condition
(M = 59.7 %, range 48–70 %) than in the token economy condition that did not
involve tokens reflecting his perseverative interest (M = 45 %, range 32–55 %).
The increase in on-task behavior in the perseverative interest condition was then
replicated during the final best-treatment phase in the inclusion classroom
(M = 64 %, range 52–72 %).
The bottom panel of Fig. 1 displays the percentage of 10-s intervals during which
at least one instance of challenging behavior occurred. During baseline conditions in
the life skills classroom, challenging behavior occurred during a mean of 89 % of
intervals (range 82–92 %) and in 87 % of intervals in the inclusion classroom.
Challenging behavior decreased from baseline levels in both token economy
conditions; however, a lower percentage of intervals had challenging behavior in the
perseverative interest condition (M = 40 %, range 30–52 %) compared with the
condition where the token did not coincide with Troy’s perseverative interests
(M = 55 %, range 45–68 %). The reduction in challenging behavior in the
perseverative interest condition in the life skills classroom was replicated during the
final best-treatment phase in the inclusion classroom (M = 36 %, range 28–48 %).
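The condition means reported above can be compared as percentage-point differences. The Python sketch below uses only the means stated in the text; the dictionary keys and helper name are hypothetical labels. It is descriptive arithmetic, not a statistical test.

```python
# Mean percentage of intervals, copied from the reported results.
reported_means = {
    "on_task": {
        "baseline_life_skills": 11.0,
        "baseline_inclusion": 13.0,
        "without_pi": 45.0,
        "with_pi": 59.7,
        "with_pi_inclusion": 64.0,
    },
    "challenging": {
        "baseline_life_skills": 89.0,
        "baseline_inclusion": 87.0,
        "without_pi": 55.0,
        "with_pi": 40.0,
        "with_pi_inclusion": 36.0,
    },
}


def point_change(measure: str, condition: str, reference: str) -> float:
    """Percentage-point difference between a condition and a reference condition."""
    means = reported_means[measure]
    return round(means[condition] - means[reference], 1)


# Perseverative-interest (PI) tokens versus penny tokens in the life skills classroom.
print(point_change("on_task", "with_pi", "without_pi"))
print(point_change("challenging", "with_pi", "without_pi"))
# Best-treatment phase versus the single baseline probe in the inclusion classroom.
print(point_change("on_task", "with_pi_inclusion", "baseline_inclusion"))
print(point_change("challenging", "with_pi_inclusion", "baseline_inclusion"))
```

Each setting is compared with its own baseline, because the life skills and inclusion classrooms had separate baseline measurements.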
Discussion
The results of this study replicate previous research demonstrating the utility of
token economy interventions for children with autism (Matson and Boisjoli 2009)
because both token economy interventions (i.e., with and without the perseverative
interest) resulted in decreased challenging behavior and increased on-task behavior.
Further, the superiority of the condition involving tokens reflecting Troy’s
perseverative interest is consistent with the findings of Charlop-Christy and Haymes
(1998). Finally, these data extend previous research by demonstrating the benefit of
interest-based tokens in a special education classroom with generalization to an
inclusion classroom.
The perseverative interests inherent to an autism spectrum disorder (ASD)
diagnosis often impede appropriate classroom behavior and learning (e.g., Rispoli
et al. 2011; Lang et al. 2010) and can be associated with serious challenging
behavior (e.g., Hausman et al. 2009; Matson et al. 2009). Thus, interventions have
primarily sought to address challenging behavior associated with such restricted and
repetitive behaviors and interests (RRBI) with antecedent manipulations to enrich
[Fig. 1 graphic: two line-graph panels over Sessions 1–18. Top panel y-axis: Percentage of 10-s Whole Intervals with On-Task Behavior (0–100). Bottom panel y-axis: Percentage of 10-s Partial Intervals with Challenging Behavior (0–100). Phases: Baseline (with Inclusion Classroom Probe); Token Economy Comparison in Life Skills Classroom (With PI, Without); PI Token Economy in Inclusion.]
Fig. 1 The top panel displays the percentage of 10-s whole interval during which Troy was on-task, and
the bottom panel displays the percentage of 10-s partial interval during which Troy engaged in
challenging behavior. The closed circles represent baseline in the life skills classroom, triangles represent
the inclusion classroom, open diamonds represent the token economy without the perseverative interest
(PI), and closed squares represent the token economy with the PI
the environment and prevent challenging behaviors, and consequence-based
interventions that involve interrupting the repetitive behavior (see Boyd et al.
2012 for a recent review). However, other researchers have demonstrated the utility
of capitalizing on perseverative interests by incorporating them into the intervention
procedures or making access to RRBI contingent on targeted appropriate behavior
or the absence of target challenging behavior (Baker et al. 1998; Charlop-Christy
and Haymes 1996, 1998; Vismara and Lyons 2011). This study, considered in
tandem with Charlop-Christy and Haymes (1998), suggests idiosyncratic persev-
erative interests can be utilized to improve intervention efficiency and effectiveness.
The putative mechanism of action responsible for the enhanced effectiveness is
likely the increased reinforcing value of the token itself. Compared with the use of
neutral stimuli as tokens, the reinforcement from perseverative interest-based tokens
may be more immediate, and thus more efficient, than relying only on the
reinforcing power of the backup edibles that were available only after a number of
tokens had been earned and exchanged. Although Troy was always willing to
exchange 10 tokens for the backup reinforcers, it is possible that some children may
value the perseverative interest-based tokens more than backup reinforcers. In such
cases, challenging behavior maintained by continued access to preferred tangibles
might be occasioned when the child is asked to exchange the high preferred token
for a less preferred item. Practitioners using this approach are therefore cautioned to
consider the reinforcing value of the perseverative interest token relative to the
backup reinforcers. If challenging behavior is observed during the exchange, it may
be preferable to use neutral stimuli as tokens or to merely use the perseverative
interest tokens alone without additional backup reinforcers. As part of a larger effort
to better incorporate the characteristics of children with autism into intervention
approaches with the goal of improving educational outcomes, future research
designed to elucidate and then potentially address such a limitation remains
warranted.
These findings buttress the evidence supporting the use of token economy
systems with this population and align with the perspective that circumscribed
interests can be a unique strength of individuals with high-functioning ASD
(Mercier et al. 2000). Nevertheless, when children with ASD perseverate to the
exclusion of other activities, such RRBIs significantly restrict their social and
learning opportunities (Pierce and Courchesne 2001; Koegel et al. 1974; Lovaas
et al. 1971). Research could continue to investigate the effects of embedding
perseverative interests into other interventions, such as video modeling. However, it
is possible that the use of perseverative interests in this way may inadvertently lead
to a counterproductive increase in fascination with the perseverative interest.
Although we are not aware of this issue having been reported in previous research, it
would seem a plausible potential limitation that should be investigated as research
in this area continues.
The results of this current study should be considered in light of a few limitations.
First, we selected an alternating treatment design because teachers expressed
concern regarding a reversal to baseline conditions. Although this design facilitated
implementation in an applied setting, the lack of a reversal phase introduced the
potential of carryover effects. Second, to identify Troy’s perseverative interest, we
utilized teacher reports and a free operant preference assessment, which did not
capture a hierarchy of reinforcers (Roane et al. 1998). Further, there is not a well-
established procedure for distinguishing between high preferred stimuli and the
level of fascination indicative of a true perseverative interest, and our assertion that
puzzles were indeed a perseverative interest should be considered with caution. It is
possible that puzzles were merely highly preferred. Future research should further
investigate reinforcement hierarchies to determine more precise ways of identifying
perseverative interests. Third, the visual cues utilized to prompt on-task behavior
and redirect challenging behavior, although held constant in all intervention and
generalization sessions, were not evaluated as a separate intervention component.
Thus, we are uncertain to what degree they may have contributed to the effects on
the dependent variables. Finally, because on-task behavior increased and challeng-
ing behavior decreased with the use of both token systems, assessing the value of
the perseverative interest token proved difficult. It is possible that the effectiveness
of both systems might approach equivalence over time if challenging behavior
continued to decrease. Thus, future research should investigate the effects of
extended use of the two systems, as well as the effects of systematic fading
procedures of the embedded token system.
References
Baker, M. J., Koegel, R. L., & Koegel, L. K. (1998). Increasing the social behavior of young children with
autism using their obsessive behaviors. The Journal of the Association for Persons with Severe
Handicaps, 23, 300–308. doi:10.2511/rpsd.23.4.300.
Boyd, B. A., McDonough, S. G., & Bodfish, J. W. (2012). Evidence-based behavioral interventions for
repetitive behaviors in autism. Journal of Autism and Developmental Disorders, 42, 1236–1248.
doi:10.1007/s10803-011-1284-z.
Charlop-Christy, M. H., & Haymes, L. K. (1996). Using obsessions as reinforcers with and without mild
reductive procedures to decrease inappropriate behaviors of children with autism. Journal of Autism
and Developmental Disorders, 26, 527–545.
Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for
children with autism. Journal of Autism and Developmental Disorders, 28, 189–198. doi:10.1023/A:1026061220171.
Fisher, W. W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A
comparison of two approaches for identifying reinforcers for persons with severe and profound
disabilities. Journal of Applied Behavior Analysis, 25, 491–498. doi:10.1901/jaba.1992.25-491.
Gast, D. L. (2010). Single subject research methodology in behavioral sciences. New York: Routledge.
Hackenberg, T. D. (2009). Token reinforcement: A review and analysis. Journal of the Experimental
Analysis of Behavior, 91, 257–286. doi:10.1901/jeab.2009.91-257.
Hausman, N., Kahng, S., Farrell, E., & Mongeon, C. (2009). Idiosyncratic functions: Severe problem
behavior maintained by access to ritualistic behaviors. Education and Treatment of Children, 32,
77–87.
Kennedy, C. (2005). Single-case designs for educational research. Boston: Pearson Education Inc.
Koegel, R. L., Firestone, P. B., Kramme, K. W., & Dunlap, G. (1974). Increasing spontaneous play by
suppressing self-stimulation in autistic children. Journal of Applied Behavior Analysis, 7, 521–528.
Lang, R., Regester, A., Rispoli, M., & Camargo, S. H. (2010). Rehabilitation issues for children with
autism spectrum disorders. Developmental Neurorehabilitation, 13, 153–155.
Lovaas, O. I., Schreibman, L., Koegel, R., & Rehm, R. (1971). Selective responding by autistic children
to multiple sensory input. Journal of Abnormal Psychology, 77(3), 211–222.
Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation of
token economies as a classroom management tool for students with challenging behavior. Journal of
School Psychology, 49, 529–554. doi:10.1016/j.jsp.2011.05.001.
Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children with intellectual disability and/or
autism: A review. Research in Developmental Disabilities, 30, 240–248. doi:10.1016/j.ridd.2008.04.001.
Matson, J. L., Dempsey, T., & Fodstad, J. C. (2009). Stereotypies and repetitive/restrictive behaviours in
infants with autism and pervasive developmental disorder. Developmental Neurorehabilitation, 12,
122–127.
Matson, J. L., Tureck, K., & Rieske, R. (2012). The questions about behavioral function (QABF): Current
status as a method of functional assessment. Research in Developmental Disabilities, 33, 630–634.
doi:10.1016/j.ridd.2011.11.006.
Mercier, C., Mottron, L., & Belleville, S. (2000). A psychosocial study on restricted interests in high-
functioning persons with pervasive developmental disorders. Autism, 4(4), 29–46.
Mottram, L., & Berger-Gross, P. (2004). An intervention to reduce disruptive behaviours in children with
brain injury. Developmental Neurorehabilitation, 7, 133–143.
Pierce, K., & Courchesne, E. (2001). Evidence for a cerebellar role in reduced exploration and
stereotyped behavior in autism. Biological Psychiatry, 49, 655–664.
Reynolds, C. R., & Kamphaus, R. W. (2004). BASC-II: Behavior assessment system for children (2nd
ed.). Bloomington, MN: Pearson Assessments.
Rispoli, M. J., O’Reilly, M. F., Lang, R., Machalicek, W., Davis, T., Lancioni, G., et al. (2011). Effects of
motivating operations on aberrant behavior and academic engagement for two students with autism.
Journal of Applied Behavior Analysis, 44, 187–192.
Roane, H. S., Vollmer, T. R., Ringdahl, J. E., & Marcus, B. A. (1998). Evaluation of a brief stimulus
preference assessment. Journal of Applied Behavior Analysis, 31, 605–620. doi:10.1901/jaba.1998.31-605.
Schopler, E., Reichler, R. J., Devellis, R. F., & Daly, K. (1980). Toward an objective classification of
childhood autism: Childhood autism rating scale (CARS). Journal of Autism and Developmental
Disabilities, 10, 91–103. doi:10.1007/BF02408436.
Vismara, L. A., & Lyons, G. L. (2011). Using perseverative interest to elicit joint attention behaviors in
young children with autism: Theoretical and clinical implications for understanding motivation.
Journal of Positive Behavioral Interventions, 9(4), 214–228. doi:10.1177/10983007070090040401.
Copyright of Journal of Behavioral Education is the property of Springer Science & Business
Media B.V. and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder’s express written permission. However, users may print,
download, or email articles for individual use.
The Token Economy: A Recent Review and Evaluation
Christopher Doll¹; T. F. McLaughlin²; Anjali Barretto³
¹ Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA
cdoll2@zagmail.gonzaga.edu
² Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA
mclaughlin@gonzaga.edu
³ Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA
barretto@gonzage.edu
Abstract – This article presents a recent and inclusive review of the use of token
economies in various environments (schools, home, etc.). Digital and manual
searches were carried out using the following databases: Google Scholar, Psych Info
(EBSCO), and The Web of Knowledge. The search terms included: token economy,
token systems, token reinforcement, behavior modification, classroom management,
operant conditioning, animal behavior, token literature reviews, and token
economy concerns. The criteria for inclusion were studies that implemented token
economies in settings where academics were assessed. Token economies have been
extensively implemented and evaluated in the past. Few recently published articles
were found in the peer-reviewed literature. While token economy
reviews have occurred historically (Kazdin, 1972, 1977, 1982), there has been no
recent overview of the research. During the previous several years, token
economies in relation to certain disorders have been analyzed and reviewed;
however, a recent review of token economies as a field of study has not been
carried out. The purpose of this literature review was to produce a recent review
and evaluation on the research of token economies across settings.
Key Words – Digital Search; Future Research; Literature Review; Research;
Token Programs
1 Introduction
This article presents a recent and inclusive review of the use of token economies in various settings.
Digital and manual searches were carried out using the following databases: Google Scholar, Psych Info
(EBSCO), and The Web of Knowledge. The search terms included: token economy, token systems,
token reinforcement, behavior modification, classroom management, operant conditioning, animal
behavior, token literature reviews, and token economy concerns. The criteria for inclusion were studies
that implemented token economies in settings where academics were assessed.
International Journal of Basic and Applied Science,
Vol. 02, No. 01, July 2013, pp. 131-149
Doll, et. al.
132 Insan Akademika Publications
2 History of Token Systems
Token systems, in one form or another, have been used for centuries and have evolved notably into
the systems used today. Clay coins, which people could earn and exchange for goods and services in
early agricultural societies, were part of the transition from simple barter systems to more complex
economies (Schmandt-Besserat, 1992). Before that, however, incentive-based structures were
created and sustained in a variety of cultures and as part of many institutions within those cultures.
Governments used the influence of rewards to shape behaviors in battle and throughout
society. Rewards have ranged from tangible prizes to socially significant titles (Doolittle, 1865;
Duran, 1964; Grant, 1967). Grant (1967) explained that, during the first century, the accomplishments of
gladiators were rewarded with property, prizes, and crowns. Carcopino (1940) described charioteers
in Rome during that same time being rewarded with their freedom after repeated victories. In ancient
China, soldiers received colored peacock feathers for bravery in battle (Doolittle, 1865). Several
military institutions in ancient civilizations utilized these systems of merit and rewards to incentivize
behavior. From the Aztecs in the 15th century (Duran, 1964) to the militaries of modern times,
titles of distinction and medals have commonly been used to reward actions and promote certain
types of behavior, or responses. Modern research peaked in the 1970s, when there was substantial
study in the fields of psychiatry, clinical psychology, education, and mental health
(Kazdin, 1977).
Token economy systems have also been employed to modify animal behavior (Addessi, Mancini,
Crescimbene, & Visalberghi, 2011; Malagodi, 1967; Sousa & Matsuzawa, 2001). Malagodi's (1967)
study involving rats established a mechanism of exchange between marbles, which the rats earned
through a dispenser, and an edible primary reinforcer. In that study, token reinforcement under fixed
and variable interval schedules was shown to be as effective as the edible primary reinforcer in
increasing lever pressing. In another study, Wolf (1936) compared the effectiveness of exchangeable
tokens, nonexchangeable tokens, and food and found that exchangeable tokens and food were comparable
in reinforcing ability. These studies show that tokens, when paired with a primary reinforcer,
are effective at modifying certain behaviors in animal subjects. Cowles (1937) found similar results
with exchangeable tokens when he taught chimpanzees new learning tasks. In Sousa and Matsuzawa's
(2001) study, not only did chimpanzees perform similarly with tokens as they did with direct food
rewards, but the researchers found that chimpanzees were able to collect and save several tokens
before exchanging them.
The military as well as mental health and educational facilities have increased their use of incentives
to shape behavior. Tangible items given as rewards evolved to tokens which could be exchanged for
certain privileges and rewards. This evolution of the token economy was a catalyst for increasingly
novel and diverse utilization of token-reinforcement systems. One example of how token systems
have been applied in an institutional setting was Alexander Maconochie's “Mark System”
implemented with a prison population during the 1840s (Kazdin, 1977). This token-based system
improved the conditions under which many prisoners lived; furthermore, it attempted to create an
incentive-driven system to reward positive behavior rather than give aversive consequences to
prisoners. Within this “Mark System,” sentences were converted to “marks” and the prisoners sought
to reduce these “marks,” or tokens, through good behavior within the prison system. Upon reaching a
certain level of tokens, the prisoner could then be released. The prisoners exchanged their tokens for
necessary items such as food, shelter, and clothes (Kazdin, 1977). A variation of the token economy
under Maconochie was the inclusion of a response cost component where negative or
institutionally-labeled aberrant behaviors resulted in the withdrawal of “marks.” Unique approaches such as the
Mark System have helped evolve the reward and cost structures resulting in “serious achievements in
reform, rehabilitation, and token economies” (Kazdin, 1977).
3 Early History of Token Systems in the Schools
3.1 Token, tracking, exchange
Educational systems have employed token economies as a means to manage students for several
decades (Kazdin, 1982). The need to educate large numbers of children and the demand for
meaningful education helped to evolve the application of these token-based systems. As noted
previously, titles of distinction as well as tangible property have all been used to incentivize
individuals and their behavior. In schools, a variety of incentives have acted and continue to serve as
the rewards earned for certain defined target behaviors (Boniecki & Moore, 2003; Lolich,
McLaughlin, & Weber, 2012; McLaughlin & Malaby, 1975). As early as the 7th century, a monk in
Southern Europe gave biscuits of leftover dough, also known as “petriolas” or “little rewards,” to
children who learned their prayers (Kazdin, 1977). Birnbaum (1962) noted that later, in the 1100s,
rewards such as nuts, figs, and honey were commonly used by educators as
incentives for learning. Skinner (1966) described instances in the 16th century where fruit and cake
were advocated by Erasmus to help children learn Greek and Latin.
Within the past several centuries, the modern forms of the token economy have been increasingly used
in the education of society. Two of those systems came to the United States during the 1800s. Joseph
Lancaster's “Monitorial System” originated in England in the early part of the century and came to
New York in 1805. This system, when implemented in New York schools, contained a more explicit
use of tokens and of response cost. More-able peers were “Monitors” for less-able peers and each
skill-group was awarded different sets of privileges and prizes, based on level. The Monitorial System
created helper teachers, which made it possible to teach large numbers of
students. This solution to the problem of larger classes helped to spread the program across the
nation. A second system, Excelsior, established itself during the latter part of the 1800s when the
United States was experiencing significant growth in the use of token economies (Kazdin, 1977). This
system consisted of giving out “Excellent(s)” and “Perfect(s)” designations to students for pro-social
and pro-academic behaviors. These “Excellents” and “Perfects” were exchanged for “Merits,” which
in turn were saved and exchanged for a special certificate from the teacher attesting to great
performance. In both of these systems, prizes and rewards acted to make the token more powerful in
affecting behavior. Furthermore, in both of these token-reinforcement systems, back-up reinforcers
and prizes were integral in their setups and sustainment.
3.2 Definition of a Token System
Token economies have been extensively researched throughout the last several decades and applied in
a variety of settings. Teachers and caretakers have used these systems in general education, special
education, and community-based settings. Because of the variety of token-based systems and the ease
with which teachers can implement them, token economies are widely used across the nation.
The behavioral principles employed in token systems are based primarily upon the concept of operant
conditioning (Kazdin, 1977; McLaughlin & Williams, 1988). Within a token economy, tokens are
most often a neutral stimulus in the form of “points” or tangible items that are awarded to economy
participants for target behaviors. In a token-reinforcement system, the neutral token is repeatedly
presented alongside or immediately before the reinforcing stimulus. That stimulus may be a variation
of edibles, privileges, or other incentives. By performing this process of repeating presentations of
neutral tokens before the reinforcing stimulus, the neutral token becomes the reinforcing entity. As the
participants in the token economy experience the pairing of the token and a previously reinforcing item, the token
itself may acquire reinforcing properties as a result. The token economy gains its utility and power to
modify behavior when the neutral tokens become secondary reinforcers. The effectiveness of this
process has been noted by Miller and Drennen (1970). They demonstrated that when praise is a
neutral stimulus, it could become a conditioned reinforcer through pairing it with another reinforcing
event.
3.2.1 Target behaviors of token economies
A token economy is often implemented because there are target behaviors that teachers would like to
increase or reduce. These behaviors must be identified by those who work in such classrooms.
Changes in these target behaviors often improve the classroom-learning environment or the needs for
that specific institution. Token economies can be used to minimize disruptions in a classroom as well
as increase student academic responding. This can depend on the classroom and the priorities of the
teacher. However, most teachers employ a token system to manage both academic and social
behaviors (McLaughlin & Williams, 1988).
In a token economy it is important to clearly outline the target behaviors for the students as well as the
teacher (Kazdin, 1977). When a teacher is first implementing a token-reinforcement system it has
been recommended that desired behaviors are orally communicated, written down, or otherwise
clearly explained or modeled to the participants (Alberto & Troutman, 2012; McLaughlin & Williams,
1988). This communication with the participants is crucial and directly related to the effectiveness and
efficiency of the system (Alberto & Troutman, 2012; Cooper, Heron, & Heward, 2007).
3.2.2 Tokens
In order to establish and sustain a token economy system, there need to be tokens. These tokens then
serve as a way to provide consequences. Tokens can be tangible gaming-style chips, tickets, coins,
fake money, marbles, stickers, or stamps (McLaughlin & Williams, 1988). They can also take the
form of more abstract items such as points or checkmarks given by the teacher or the economy's
“manager.” The choice of tokens can depend on the setting, population, manager's or teacher's
preference, and cost, among other considerations. Population and setting considerations are related to
what type of tokens are going to be applicable for certain participants. A younger group, or students
with developmental or cognitive delays, may well benefit from more tangible items like coins or cards,
than more abstract items in the form of points or checkmarks (McLaughlin & Williams, 1988;
Stainback, Payne, Stainback, & Payne, 1973). Tangible tokens provide a concrete representation of
the number of tokens earned which can then be exchanged for rewards (B. Williams, R. Williams, &
McLaughlin, 1989). When choosing tokens, the teacher's preference, especially in relation to cost,
must be considered. Also, the choice of token should take into account how difficult the token is to
duplicate, so that the classroom is not flooded with tokens outside the control of the
teacher. These factors should inform the types of tokens used within the system, the
frequency at which they are delivered, and ultimately the back-up rewards that are available to give
value to the tokens.
3.2.3 Back-up rewards
Back-up rewards are the items that the students or persons have indicated they are willing to work for.
Their desirability has been used to assign the number of tokens that are needed to purchase or take part
in this reward (Kazdin, 1977). Without these back-up rewards, the tokens have no exchangeable
value. Also, tokens without value can negatively alter an individual's motivation (Wolf, 1936). The
more back-up rewards in the token system, the more substantial the reinforcing strength becomes
through pairing of tokens and rewards (B. Williams, R. Williams, & McLaughlin, 1989). Back-up
rewards have also been used in home settings, where they have included ski trips, video games,
movies, or lunch at a chosen restaurant (Rustab & McLaughlin, 1988). Even with this variety of back-up
rewards, the monetary reward has been used very effectively (Jordan, McLaughlin, & Hunsaker,
1980). This is likely due to money's exchangeability and its ability to act as one of the ultimate
generalized conditioned reinforcers.
3.2.4 The exchange
An important part of the token economy is the exchange of tokens for back-up rewards chosen
by the economy's manager or the students, guided in part by the needs and preferences of the participants.
The value of the token is a function of the reinforcers that back up its value (Kazdin,
1977). At the end of the period in which tokens have been given, the teacher will decide to begin the
exchange process.
When a conditioned reinforcer like a token is exchanged for a variety of privileges and rewards, the
token is referred to as a generalized conditioned reinforcer (Kazdin, 1977). Generalized tangible
conditioned reinforcers, which can be exchanged for a variety of items, are used very frequently in
behavior modification programs (Kazdin, 1977). Tokens or generalized conditioned reinforcers also
come in the form of money used in society. The more items or rewards you can exchange for the
token, the more powerful the token becomes. Money and other generalized conditioned reinforcers are
more valuable than any single reinforcer because they can purchase a variety of back-up reinforcers
(Kazdin, 1977). The power of generalized conditioned reinforcers was assessed when Sran and
Borrero (2010) compared behaviors reinforced by tokens which could be exchanged for a single
highly preferred item with tokens which could be exchanged for a variety of preferred items. They
found, while degrees of preference varied, all participants were shown to deliver higher rates of
responding during sessions where tokens could be exchanged for a variety of preferred items.
During the early implementation of the token economy, especially for lower-functioning persons, it is
important to have frequent exchange periods where participants can be quickly reinforced and target
behaviors can increase (O'Leary & Drabman, 1971). Infrequent exchange periods at the beginning of
a token economy's implementation may prevent this type of system from working effectively. It is
important to determine and adapt the exchange period based on classroom needs (Kazdin, 1977;
McLaughlin & Williams, 1988). For some participants, especially those with Attention Deficit
Hyperactivity Disorder (ADHD), the immediacy with which a back-up reinforcer is received will be the
most influential dimension of a token economy, making the time between token and exchange crucially
important (Neef, Bicard, & Endo, 2001; Reed & Martens, 2011). One of the important considerations
when carrying out a token economy is its impact on the classroom environment or setting. The
exchange period should be quick to complete and not significantly impact the ability of the teacher to
manage the classroom or particular setting. Based on these considerations, it is important to schedule
exchange periods at the end of the class period, during a naturally occurring transition, or possibly at
the end of the day or week.
There are many different ways in which a token exchange can take place. Many types of exchange
systems have been implemented (Kazdin, 1977; McLaughlin, 1975). Tokens may be exchanged as
soon as they are earned (Bushell, 1978), at the end of a certain time period (McLaughlin & Malaby,
1972), or after a variable time period (McLaughlin & Williams, 1988). At the end of the token-reward
period, there may be a catalog of items and privileges, a “store” where the participant is able to
exchange tokens or a predetermined back-up reinforcer. Additionally, free time itself may function as
its own generalized conditioned reinforcer, as it gives the participants access to a variety of back-up
rewards.
When the system is in place, teachers may choose an exchange time based on classroom schedule or
student needs. Token economy exchange periods could take place at the end of each 50-minute class
throughout the day, daily, weekly, or biweekly. The effectiveness of the token economy may decrease
as more time passes between presentation of the token and exchange for the back-up reinforcers
(Kazdin, 1977; Neef et al., 2001; Reed & Martens, 2011). Variable exchange times, as
opposed to fixed time periods in which tokens are traded for back-up rewards, have been shown to
increase response rates as well as maintenance of the behavior (McLaughlin & Malaby, 1976).
According to McLaughlin and Malaby (1976), variable exchange times within a token
economy are effective and an important consideration for any teacher or economy manager.
3.3 Variations of Token Economies
3.3.1 Response cost
In a response cost system, tokens are taken away when students engage in certain pre-defined
behaviors. The tokens taken from the student are the cost of the behavior. In this variation of
the token economy, each unwanted behavior has a cost, which results in the confiscation of a
determined amount of tokens. Response cost is very commonly used to suppress behavior (Kazdin,
1977). The most commonly used form of response cost is the withdrawal of tokens or fines. Token
economies are unique because tokens can be presented or removed (Kazdin, 1977; McLaughlin &
Malaby, 1977a). Hall et al. (1972) employed response cost to reduce whining in a young child. The
researchers used slips of paper given to the boy with his name printed on them. The slips were taken
away for negative behaviors. Even when these slips had no apparent value, this response cost system
drastically reduced negative behaviors. Iwata and Bailey (1974) compared token reinforcement and
response cost in a special education classroom. Both were equally effective at improving behaviors.
However, the teacher was more negative with the students when response cost was used in the
classroom. In McLaughlin and Malaby (1977a), a combined token reinforcement and response cost system was
found to be more effective at increasing target behavior than token reinforcement alone. Achievement
Place (Kirigan, Braukman, Atwater, & Wolf, 1982), where at-risk youth are often sent to learn
important social and academic skills so they can be placed back into mainstream society, effectively
implements a token reinforcement system with response cost to reduce severe behaviors while
increasing pro-social and academic behaviors (Ayllon & Azrin, 1968; Bailey, Wolf, & Phillips, 1970;
McLaughlin & Malaby, 1977a). In general, token economies with and without a response cost
component have been effective in different settings. It is important to note, however, that a program
solely reliant on response cost and punishment-oriented management is less likely to
create pro-social behaviors in the participants (Iwata & Bailey, 1974; Kazdin, 1977). This is
interesting considering that, in some studies, there seems to be a preference by the teachers of response
cost when compared to a token reinforcement only system (McGoey & DuPaul, 2000). McGoey
and DuPaul (2000) compared, in a preschool class, stickers awarded to students with stickers
removed for off-task behavior. They found the two procedures to be equally effective. This finding replicates Iwata
and Bailey (1974). However, it is important to consider that reinforcement for specific target behaviors is
more likely to develop pro-social responses as alternatives to the behaviors being suppressed
(Kazdin, 1977).
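The earn-and-fine bookkeeping described in this section can be illustrated with a minimal sketch. It is not drawn from any of the cited studies: the class name, behavior labels, token amounts, and the choice to floor the balance at zero are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Tracks one participant's tokens; all names and values are illustrative."""
    balance: int = 0
    history: list = field(default_factory=list)

    def earn(self, behavior: str, tokens: int) -> None:
        # Token reinforcement: a target behavior adds tokens.
        self.balance += tokens
        self.history.append((behavior, tokens))

    def fine(self, behavior: str, cost: int) -> None:
        # Response cost: an unwanted behavior removes tokens.
        # Assumption: the balance is floored at zero rather than going negative.
        removed = min(cost, self.balance)
        self.balance -= removed
        self.history.append((behavior, -removed))

    def exchange(self, price: int) -> bool:
        # Trade tokens for a back-up reward if the participant can afford it.
        if price > self.balance:
            return False
        self.balance -= price
        self.history.append(("exchange", -price))
        return True


ledger = TokenLedger()
ledger.earn("on-task for 5 minutes", 2)
ledger.earn("assignment completed", 3)
ledger.fine("talking out", 1)
print(ledger.balance)       # 4
print(ledger.exchange(5))   # False: not enough tokens
print(ledger.exchange(4))   # True
```

Keeping a history of each award and fine is one possible way to let a teacher review how often response cost is being applied relative to reinforcement.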
3.3.2 Lottery systems
Unlike a token economy in which behaviors earn tokens to be exchanged at a later period, lottery-based
systems add a component to the exchange period. In this type of economy, target
behaviors are rewarded with a token, or ticket, and at the end of the reward period there is a lottery to
determine which individuals earn a back-up reward. This can minimize the number of back-up rewards
delivered in the token economy by choosing only a select number of tokens, or tickets, to exchange. A
weakness of this type of system is that some ages and populations may be difficult to affect without
a direct correspondence between tokens and back-up rewards (McLaughlin & Williams, 1988).
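A lottery drawing of this kind might be sketched as follows. The function name, the participant names, and the rule that each participant can win at most once per drawing are assumptions for illustration, not details reported in the source.

```python
import random


def run_lottery(tickets, n_winners, rng=None):
    """Draw winners from earned tickets; more tickets mean a better chance.

    tickets: dict mapping participant name -> number of tickets earned.
    n_winners: number of back-up rewards available.
    Assumption: each participant can win at most once per drawing.
    """
    rng = rng or random.Random()
    # Participants with no tickets are not entered in the drawing.
    pool = {name: count for name, count in tickets.items() if count > 0}
    winners = []
    while pool and len(winners) < n_winners:
        names = list(pool)
        weights = [pool[name] for name in names]
        # Pick one participant at random, weighted by ticket count.
        winner = rng.choices(names, weights=weights, k=1)[0]
        winners.append(winner)
        del pool[winner]
    return winners


tickets = {"Ana": 5, "Ben": 2, "Cam": 0, "Dee": 3}
winners = run_lottery(tickets, n_winners=2, rng=random.Random(0))
print(winners)
```

Because only `n_winners` rewards are handed out regardless of how many tickets were earned, the sketch reflects the cost-saving property noted above, along with its weakness: a participant can earn tickets and still receive nothing.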
3.3.3 Individual vs whole class
It will be up to the teacher or manager of the economy to determine whether tokens will be awarded to
entire groups or to individuals within the group. The advantage of developing a group-oriented token
economy is the ease with which teachers may implement and track tokens and rewards (Kazdin, 1977).
These class-wide systems have also been well documented and seem to be useful in reducing
unwanted behavior (Bushell, Wrobel, & Michaelis, 1968; Packard, 1970). Consequences in these
class-wide economies can be group or individually administered, depending on the system chosen.
Packard (1970) evaluated a token economy under a group contingency in four elementary school
classes where off-task behavior was a concern. In Packard's study, certain class periods were chosen
for each grade and a class goal was assigned to raise on-task behavior. When the class met the criteria
for on-task behavior, they were given points which could then be exchanged for group or individually
assigned rewards (Packard, 1970). The results in that study showed baseline levels of below 10% on-
task behavior rise to between 70-100% on-task behaviors during class periods once the group-
contingent token economy was implemented (Packard, 1970).
3.3.4 Level systems
Level systems are a variation of token economy. In these systems, different levels correspond to
different degrees of participant behavior. For example, increasing preferred target behaviors may
result in higher levels which then translate to higher rates of reinforcement and privilege while
unwanted behaviors may result in a decreased rate of reinforcement or loss of privileges. In one level
system, each participant was assigned a shape or character and every 2-4 hours, would be moved up or
down the six-level system (Filcheck, McNeil, & Greco, 2004). Each system can be monitored
differently; however, in each, movement from one level to another is based on participant behavior and
results in varying levels of reinforcement. Filcheck et al. (2004) evaluated a system where efficiency
was a priority and all rewards could be dispensed within three minutes. The researchers found
this efficient exchange to be beneficial during class times. The ability to efficiently dispense rewards
and levels makes these systems easily customized based on the needs of the setting.
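The movement between levels can be sketched in a few lines. This is a hypothetical illustration only: the number of levels, the one-step movement rule, and the rule that only levels above the starting point earn extra rewards are assumptions, not the procedure of any cited study.

```python
from dataclasses import dataclass

# Assumed layout: three levels below and three above a neutral starting level.
MIN_LEVEL, MAX_LEVEL = -3, 3


@dataclass
class LevelTracker:
    level: int = 0  # participants start at the neutral center level

    def review(self, met_expectations: bool) -> int:
        # At each review point, move up one level for target behavior
        # or down one level for unwanted behavior, staying within bounds.
        step = 1 if met_expectations else -1
        self.level = max(MIN_LEVEL, min(MAX_LEVEL, self.level + step))
        return self.level

    def earns_reward(self) -> bool:
        # Assumption: only participants above the center level get extra rewards.
        return self.level > 0


child = LevelTracker()
for outcome in [True, True, False, True, True, True]:
    child.review(outcome)
print(child.level, child.earns_reward())  # 3 True
```

Because each review is a single comparison and update, a scheme like this is quick to administer, which fits the emphasis on efficient exchange noted above.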
3.4 Efficacy of Token Systems
3.4.1 General Outcomes
Research with individuals in classroom settings using token economies has firmly established the
efficacy of token reinforcement in altering a wide range of responses (Kazdin, 1977). There is a
significant need for effective behavior management systems. Lavigne (1998) notes that children's
behavior problems are increasing, with estimates ranging from 2 to 17% of the population. This rate
of children with behavior problems highlights the demand for behavior management systems
that are data-based and effective. Token economy systems are able to have a profound impact on
schools, classrooms, and community-based settings. One variation of the token economy, a response
cost system, is known to have produced higher levels of on-task behavior than
medication (Rapport, Murphy, & Bailey, 1982). The structure and implementation of the token
economy are important, as noted by Kazdin (1977), who states that the effectiveness of
reinforcement depends on the delay between performance of the response and delivery of reinforcement,
the magnitude and quality of the reinforcer, and the schedule of reinforcement. Many factors are
important in the consideration of a token economy. Whether or not reinforcement takes place on a
continuous or intermittent basis can impact the likelihood of maintenance (Kazdin, 1977).
3.4.2 Preschool
Token economies in the preschool setting have been utilized with a variety of modifications to this
behavior-management system (Filcheck et al., 2004; McGoey & DuPaul, 2000). As the need for
behavioral interventions increases, it is important for preschool teachers to be aware of these token-oriented
procedures, and using these systems classroom-wide may be a great pro-active benefit
(Filcheck et al., 2004).
Filcheck et al. (2004) compared the effectiveness of a class-wide token economy level system with
parent-training techniques in managing aberrant behaviors. These authors note that class-wide
application of the token economy has not been previously analyzed. However, group and individual
application of token systems have effectively reduced disruptive behavior in other settings (Bushell,
Wrobel, & Michaelis, 1968; Packard, 1970). The classroom in Filcheck et al. was described as “out of
control” and was chosen for behavioral intervention. The token economy used was a level system
where the top three levels showed sunny faces that became increasingly happy, the center level was the
starting point and was blank and white, and the bottom three levels showed cloudy faces that became
increasingly grey and sad (Filcheck et al., 2004). In this system, promotion to different levels
within the preschool class allowed participants to complete certain activities while other children, who
were not promoted, were continuing with the pre-determined class schedule. Furthermore, at the end
of certain activities, all participants with “positive” behavior levels receive additional rewards like
stickers or activities with the teacher. In this system, the level system was found to decrease rates of
inappropriate behaviors; additionally, when the parent training was implemented further decreases
occurred (Filcheck et al., 2004). It is important to consider the training time
necessary for each of the two behavior management tools in this study. The Level System took 4
hours and 30 minutes to train staff on, including all consultation and feedback time; however, the
parent training took 11 hours and 30 minutes (Filcheck et al., 2004). In terms of effectiveness and time
efficiency, the level system seemed to have the greatest rate of positive return.
Additional studies have shown rapid behavioral improvement when a token economy is implemented.
In McGoey and DuPaul (2000), a sticker chart was managed by teachers placing
stickers on a classroom board when they “caught” students being on-task. When a student earned a
certain number of small stickers, they were rewarded with a big sticker (McGoey & DuPaul, 2000).
For the response cost portion of this study, stickers were removed contingent on being off-task and
when the session ended, the big sticker was kept or removed from the chart. These token economy
and response cost systems resulted in large decreases of aberrant behavior (McGoey & DuPaul, 2000).
Implementing token economies in a preschool setting, Sran and Borrero (2010) compared two
variations of this behavior management system. In this study, tokens that were exchanged for a variety
of preferred items were shown to be more effective than tokens that could only be exchanged for one
highly preferred item. These results are consistent with previous research which shows generalized
conditioned reinforcers are more reinforcing than a single reinforcer (Kazdin, 1977).
3.4.3 Elementary school
Elementary school classrooms, based on research study volume, seem to be one of the most common
settings in which token economy systems are used (Coupland & McLaughlin, 1981; Ruesch &
McLaughlin, 1981; Thompson, McLaughlin, & Derby, 2011). Many studies exist which show the
effectiveness of this type of behavior management tool. One of these studies employed a free-time
reward when five tokens had been earned (Ruesch & McLaughlin, 1981). The rationale was that free time
would consist of a variety of reinforcers, making it unlikely that satiation would occur (Kazdin, 1977). In
Ruesch and McLaughlin (1981), a clear increase in student assignment completion took place. Token
economies have also proven effective when used to decrease inappropriate behavior by rewarding
on-task behavior (Coupland & McLaughlin, 1981). Under
a token economy with sixth grade participants, points were given and subtracted for appropriate and
inappropriate behavior respectively (McLaughlin & Malaby, 1976).
McLaughlin and Malaby (1977a) compared token reinforcement with and without response cost in a
special education elementary classroom. In McLaughlin and Malaby's (1977a) study, ten participants
were asked to write letters for a session of several minutes, in which they earned no token reinforcement
during baseline, token reinforcement during the next phase, and token reinforcement plus response
cost during the final phase. The overall results were such that, in this elementary classroom, token
reinforcement plus response cost resulted in higher rates of target behavior (McLaughlin & Malaby,
1977a). In another study, McLaughlin and Malaby (1976) analyzed assignment completion under
different schedules of token exchange. During that study involving a fifth and sixth grade class, points
were earned or taken away depending on whether children displayed appropriate or inappropriate
behavior. The results showed that participants had higher rates of appropriate behavior, as measured
through assignment completion, when there were a variable number of days between token award and
exchange (McLaughlin & Malaby, 1976). McLaughlin and Malaby (1976)
note that a system with variable exchange days should be considered by any
teacher or economy manager interested in impacting the rates of assignment completion.
3.4.4 Middle school
Middle school classrooms have seen many instances of positive behavioral outcomes as part of a token
economy (Flaman & McLaughlin, 1986; Maglio & McLaughlin, 1981; Swain & McLaughlin, 1998;
Truchlicka, McLaughlin, & Swain, 1998). Maglio and McLaughlin (1981) note the importance of a
teacher's ability to manage the token system in their study, where a student's partial self-management
of points, with teacher supervision, along with back-up reinforcers resulted in a significant decrease in
inappropriate behaviors. Besides social behavior, academic improvement has also been seen during
token reinforcement (Flaman & McLaughlin, 1986). Flaman and McLaughlin's study took place in a
junior high school drop-out prevention program where the subject rarely completed an assignment
unless given one-on-one assistance. In that study, correct answers on a worksheet resulted in 1-2
points per problem that could be exchanged for free time on a classroom microcomputer. In this study,
the rate of correct answers increased from 34% to 69% during the first phase, and to 79%
during the second phase of token reinforcement (Flaman & McLaughlin, 1986). A second system
where assignment accuracy was a concern included bonus points (Swain & McLaughlin, 1998). In
that study, four middle school special education students who were previously being managed by a
token reinforcement system were offered fifty extra bonus tokens or points for assignment scores
greater than 80% (Swain & McLaughlin, 1998). This bonus contingency resulted in an increase in
math accuracy. When response cost is implemented in a middle school setting, positive results are
possible (Truchlicka, McLaughlin, & Swain, 1998). Truchlicka et al. (1998) added a response
cost to an already functioning token reinforcement system. In this system, an accuracy goal of 85%
was required to earn token reinforcers; however, if that accuracy level was not reached, tokens were
removed or privileges were denied. This study concluded that the response cost phase resulted in a
higher rate of accuracy for each subject. The point-gain and point-loss system had
a greater impact than token reinforcement alone.
3.4.5 High school
Implementation of token economies in the high school setting occurs at a much lower rate than
in elementary school or middle school settings. This may be because
teachers are more apprehensive towards this type of system; alternatively, the lower rate of occurrence
could be due to a perceived lack of effectiveness.
In a study by Crawford and McLaughlin (1982), token reinforcement was evaluated as a means to
increase on-task behavior. This study was conducted in a high school within a self-contained special
education classroom with a 15-year-old student. The student was given tokens and worked for a
chosen back-up reinforcer which cost 30-40 cents worth of tokens. In this study there was a clear
increase in on-task behavior during the token-reinforcement phases. According to the study, on-task
behavior from the student more than doubled when tokens were first introduced (Crawford &
McLaughlin, 1982).
3.4.6 College or University
Token systems in college settings have also been assessed for effectiveness. Participation in class
within all settings is a priority and a goal for many teachers and professors, and two studies
specifically aimed to analyze the impact of tokens on classroom participation in college settings.
Jalongo (1998) determined that only approximately 10% of students voluntarily participate in class
discussions. In one study, good questions that related to content, made sense, among other
requirements, were rewarded with token slips that were exchanged for bonus course points (Nelson,
2010). This study involved 318 undergraduate students and reported that classes asked higher rates of
questions when the token economy was implemented. An additional study involving token economies
at the college level analyzed class participation before, during, and after implementation
of the behavior management system (Boniecki & Moore, 2003). This study found that more questions were
asked, and classroom participation was greater, when a token economy was introduced. The tokens in
this system were exchanged for 0.25% of additional credit towards the final course grade (Boniecki &
Moore, 2003). Students were more than twice as likely to participate as before the token economy
system. Both token economy studies found an increase in classroom participation when a token
reinforcer was introduced; notably, in both cases, the tokens were exchanged for extra credit towards
the final grades in the classes. Grades could potentially be considered highly preferred items for
college students seeking certain GPAs, job prospects, etc.
3.4.7 Community and home
Applicability of the token economy can also be found in home-based and community settings (Bippes,
McLaughlin, & Williams, 1986; Jordan, McLaughlin, & Hunsaker, 1980; Rustab & McLaughlin,
1988). Token systems implemented at home can be effective at reducing or increasing behaviors similar
to those found in the school setting, as well as social behaviors and task-related behaviors
(Alvord, 1971; Arnett & Ulrich, 1975). Implementation in community detention centers has also
delivered increased rates of accuracy and target behaviors (Bippes et al., 1986). In Rustab and
McLaughlin’s (1988) study, inappropriate behavior and spelling accuracy were measured during
baseline and after token economy implementation. In this particular case, tokens were awarded for
every 5 minutes of appropriate behavior and were exchanged weekly for privileges within and
outside the home. Inappropriate behavior immediately decreased once token reinforcement began.
When target academic and social behaviors were reinforced through tokens only at home, the higher
rates of on-task behavior and spelling accuracy at home generalized to higher rates of the
behaviors in school (Rustab & McLaughlin, 1988). Home-exclusive behaviors in the category of
chores and social demands also increased dramatically during another study (Christophersen,
Arnold, Hill, & Quilitch, 1972). Home-based token economies using 1 cent per minute token rewards
have been shown to increase on-task behavior (Jordan et al., 1980).
Token economies in schools where consequences were dispensed at the participant’s home have
also resulted in improved classroom performance and study behavior (Bailey, Wolf, & Phillips, 1970).
In this study, on-task “yes” marks were rewarded with privileges at home (Bailey et al., 1970). Partnerships
between the classroom teacher and the home guardian of the participant can play an effective role in
behavior modification. In many cases of children with severe behavior, classroom teachers may not be
in possession of reinforcing contingencies and may require a parent or guardian to devise effective
consequences (Bailey et al., 1970). Moreover, concerns about a lack of maintenance and about participants
being unable to generalize behavioral gains made in the school setting make home involvement more
attractive (Brown, Montgomery, & Barclay, 1969; Walker & Buckley, 1972). Involving the parents or
guardians in such a way that they dispense the consequences for behavior occurring in other
settings is an effective method to sustain a token economy (Bailey et al., 1970; Cantrell, Cantrell,
Huddleston, & Woolridge, 1969; McKenzie, Clark, Wolf, Kothera, & Benson, 1968; Thorne, Tharp, &
Wetzel, 1967).
3.5 Limitations and Ethical Concerns with a Token Economy
As with any system that has been widely implemented, token economies have been the target of
ethical concerns as well as criticisms stemming from published and perceived weaknesses (Kohn,
1999). Doubts and concerns about token economies have existed since the behavior modification
method took on a more mainstream role in society. Early criticism of Alexander Maconochie’s
“Mark System” described his program as indulging the prisoners rather than providing the punishment
and social revenge usually accorded them (Kazdin, 1977). The tickets given out in New York City
schools, originating from Lancaster’s “Monitorial System” of reward and punishment, were withdrawn
in the 1830s because the trustees believed that cunning behavior rather than meritorious behavior was
being rewarded (Kazdin, 1977). However, token-based reinforcement systems tend to be extremely
effective as a method to modify behavior (Chance, 2006; Kazdin, 1977). Notably within a token
economy, a large number of target behaviors, clients, and back-up reinforcers can be incorporated into
a single, highly efficient method (Kazdin, 1977). A general concern inherent to any behavior
management system is its ability to be fair, reliable, and functional. Stealing of tokens, lack of
participation, and token-economy sabotage by participants are some of the ways that this behavior
management system may fail from within. It is vital that token economy managers are aware of these
possibilities and take steps to pre-empt any of these negative consequences of poor planning
(McLaughlin & Williams, 1988).
Modern critiques of the token economy have come from education professionals, administrators, and
community members. This criticism has stemmed from philosophical opposition to token
reinforcement. These critics have suggested that token reinforcement constitutes bribery or blackmail
(Kazdin, 1977; Kohn, 1999). However, bribery, correctly defined, involves rewarding unethical or
illegal behavior, and token reinforcement is not used to reward such behaviors. Therefore, labeling token
reinforcement as bribery is inappropriate (Chance, 2006). Although social and philosophical
opposition are fruitful topics for the media, describing rewards as bribery, as
suggested by Kohn (1999), is inaccurate. There have been concerns that students may become
dependent on these systems and will work only for tangible tokens or backup
rewards. Furthermore, there is criticism that these systems may undermine intrinsic motivation for
students (Kohn, 2006). While intrinsic motivation may produce qualitatively different results, not all
individuals possess such willingness, and appropriate behavior must be more directly reinforced.
As part of the token economy, teachers and others use back-up reinforcers to give value or potency to
the token (Kazdin, 1977). Some systems employ back-up reinforcers that are new to the environment,
while others use back-up reinforcers that more naturally “fit,” such as recess or a free break during
class in a school setting (McLaughlin, 1981; McLaughlin & Malaby, 1972, 1975, 1976). An important
way to remedy a loss of target behavior over time is to create token economies in which the
back-up reward is a natural reinforcer: instead of an external prize that costs money and is
administered by the economy manager, the tokens could be exchanged for a rest period or a water
break. Whichever form of back-up reinforcer is dispensed, the system sets the
occasion for the participant to be rewarded for certain behaviors; just as an employee is
rewarded with a paycheck, a participant is able to earn tokens. Token-reinforcement systems
can easily be compared to the adult world of work and society as a whole, where certain work or
behaviors are rewarded with tokens, or cash. Token-based programs can leave the participants
dependent on earning rewards for target behaviors. Once tokens are withdrawn, desirable behavior
may decrease or inappropriate behaviors increase (Kazdin, 1977). As a token-economy manager
attempts to phase out the program, it is important that specific procedures are implemented in order to
withdraw the economy without a loss of behavior gains. Kazdin (1977) and others note that creating a
procedure where exchange periods become less frequent and increasingly variable may improve the
likelihood of maintenance (McLaughlin & Malaby, 1972, 1975, 1976). Additionally, self-monitoring
by the participant may help the behavior to generalize across settings and persist even after tangible
rewards are no longer being exchanged explicitly by the manager (Turkewitz, O’Leary, & Ironsmith, 1975).
These modifications have been shown to remedy issues related to maintenance and
generalization.
Another concern is that token economies sometimes require substantial work from the staff who administer
them. Teachers are encountering larger classes with increasing numbers of behavioral issues;
however, easily implemented systems can address their needs as well as the varied classroom
management concerns (Barth, 1979). The degree to which a teacher can easily implement this token
economy strategy is an issue for teachers who are busy teaching. Often, it is difficult to engage in
elaborate systems that mandate data collection, token management, and intricate exchange processes.
While some systems are administratively more involved than others, it is possible to
design systems that are easy to implement and evaluate. A system of easy administration was
studied in Rustab and McLaughlin’s (1988) home-based system, where a parent was able to administer
the system without any outside help once trained on token reinforcement. Additionally,
when a token economy was implemented to increase piano practice time, the parent was able to
implement the procedures with little training and administrative struggle (Jordan, McLaughlin, &
Hunsaker, 1980). Concerns over the administrative aspects can be mitigated with deliberate planning
of the token economy. For example, response cost was preferred by teachers and sustained after
research ended in a preschool classroom due to easier management (McGoey & DuPaul, 2000). In
McGoey and DuPaul (2000), the researchers noted that catching individuals within a large class or
group made a response cost system easier to implement. Choosing one system
modification over another, especially when implementing a token economy with an entire classroom,
will help teachers decrease administrative tasks inherent in some token economies while allowing the
system to function as a behavior management tool.
Next, there are limitations of token economies, notably concerning participants who exhibit severe
behavior in a class or group-home setting. Participants with severe behaviors may not be
affected by a token economy system that would work for most other individuals (Kazdin, 1977).
Some participants simply do not respond to the token economy for one or multiple reasons.
Potentially, with severe behavior, other therapies may be implemented to decrease inappropriate
responses. If the problem is behavioral, it will be up to the manager of the system to determine
whether certain modifications can be made to enhance the viability of the token economy. If a student
is not responding to the token economy, then it would be necessary to evaluate the procedures used to
give and exchange tokens, as well as the actual rewards being given out in exchange for the tokens.
For example, altering the back-up rewards so that they are more reinforcing for an individual would be
one way to make the token economy more effective. As previously noted, if the classroom teacher is
unable to dispense appropriate consequences that have significant reinforcing qualities, involving
those who can, by communicating with the parent or guardian at the participant’s home, may result in a
more effective token economy (Bailey et al., 1970).
Cost is a significant consideration when implementing a token economy, and can be a limitation when
a teacher or other manager is beginning to plan the back-up reinforcers being used. This is especially
true when trying to configure a genuinely reinforcing reward with the ability to drive behavior
modification, a potentially challenging mission with increasingly older participants. Several
studies have aimed to develop token economies that are effective and cost-conscious. The purpose of
McLaughlin and Malaby’s (1977b) study was to evaluate the effects of a cost-free token reinforcement
program on special education students. Rewards included recess, extra gym time, films, free time,
special jobs, messenger, art projects, and buying the teacher lunch. It was shown that this system
delivered an increase in the frequency of letters traced. The number of target responses varied from
15-84 during baseline to 30-108 during the token phase (McLaughlin & Malaby, 1977b). It is clear
that token economies can be effective at a low cost when certain rewards are used in the program.
Free and low-cost reinforcers are also a realistic option for token economy administrators of older and
more sophisticated students (Crawford & McLaughlin, 1982). In Crawford and McLaughlin (1982), a
single cassette tape was purchased and listening time acted as a back-up reward; this cost-effective
reinforcer within the token economy increased levels of on-task behavior. Ultimately, it is the
responsibility of teachers and economy administrators to utilize the low-cost and free options available
to them within their classroom and community.
These concerns and limitations of token economies are genuine and should be addressed in one way or
another; however, they are no reason to cease implementation of a token economy. All concerns and
limitations listed above and throughout this literature review can be mitigated through careful review
and modification of the token economy. Concerns may be best addressed through meaningful
communication between the token economy manager and the concerned individual. Communication
with and education of teachers, parents, and community members may help reduce the concerns and
the likelihood that public distress may preemptively end the token economy in the classroom.
4 Suggestions for Future Research
It is important to elaborate on and conduct further research on token economies with a variety of
settings, participants, and modifications. As this behavior management system has seen wide-ranging
success in increasing target behaviors, while decreasing others, it is important to expand the scope of
utilization of the token economy. More studies with older participants should be conducted. Notably,
research should be completed with students in middle and high schools; in particular, research
implemented with older students diagnosed with emotional, behavioral, and social disabilities would
benefit students and teachers significantly.
Additionally, it is important to evolve teacher education programs so that new teachers have strong
classroom management foundations. Successful classroom management techniques are crucial to
successful teaching and student learning, and token economies are an important aspect of classroom
management that teachers could implement. Beyond learning the techniques available to teachers in
their programs, instilling a meaningful knowledge of behavioral principles is important for successful
classroom management and token economy implementation in particular.
Another suggestion for future research relates to maintenance of target behaviors that were
reinforced in a token economy. Maintenance of skills is crucial for real-world application and
long-term success. Sustainment of behavioral gains is important to the teacher’s target behavior goals,
long-term success for the student, and various social rewards. Research that elaborates on the
maintenance of behavior after token reinforcement would help practitioners determine how
best to continue the gains made during a token economy. Within the area of back-up reinforcers, the
type of item used may help to strengthen the long-term sustainability and maintenance of the token
system. Research is needed on whether more natural reinforcers, which are part of the setting in
which the participants live, work, or are taught, are more effective and sustainable than more abstract
or artificial rewards or reinforcers.
5 Analysis and Conclusions
Ultimately, token economies have been found to be an effective method of behavior management
across various settings. This analysis has compiled evidence of effectiveness across school and
community settings; however, token-reinforcement systems have seen remarkably diverse
applications in prisons, military organizations, and psychiatric hospitals. Based on this collection of
studies, it is important to note the trends which exist in the modern implementation of the token
economy; particularly, the populations most often studied and the types of modifications implemented
across varied settings. In order to effectively implement a token economy, it is important to fully
understand the principles of behavior, the variety of token systems, and how to manipulate the
conditions of the token economy in order to best serve the needs of a particular group or setting.
Based on the review of literature, it seems there has been a decline in the quantity of research articles
on token economies throughout the past several decades. The works referenced in this review illustrate
that the great majority of articles are dated before 1990. Moreover, the decades beginning in 1960, 1970, and
1980 produced, on average, approximately three times as many articles as
each decade after. Clearly, based on the references reviewed for this article and searches completed on
various databases, token-economy research has experienced a sharp
decline since the 1960s through 1980s. There may not be a single explanation for why this reduction in research has occurred in this
area; however, there are several possible reasons. First, the steep reduction of research could be a
result of overwhelming data and research on the topic’s effectiveness. Second, there could be a
decline in use as increasing numbers of school districts and communities have avoided using extrinsic
rewards, and token economy systems, to manage classrooms. Third, the reduction in research on token
economies could be attributed to researchers’ concentration on novel management techniques or more
unique learning strategies. While these reasons may or may not be the actual causes of the
decline in token research, they each have an important role in the discussion.
The reduction in research articles vetting the token economy since around the 1970s leaves much
work to be done. The effectiveness of these systems in middle and high school has been addressed
only minimally. The same is true for higher education settings, where token economies have been shown to
be, so far, highly effective. Specifically, research deficits can be cited in the lack of completed
studies involving participants with emotional and behavioral disorders in the high school classroom.
These deficits should be remedied, especially if one of the reasons for the decline in research was
the overwhelming attention the topic received in decades past. There are still areas within the
token economy that have not been adequately addressed. While the token economy is widely known, it
is important to inform the education community of the potential for even greater utilization across an
even larger number of settings and populations.
In the research on token systems, there are certain settings where a reader is more likely to find a study
relating to the implementation of the behavior modification system. Elementary settings are much
more likely to implement a token-reinforcement system, based on the articles reviewed, than middle or
high school settings. The older and more senior a participant, the less likely there is to be a study on
effective behavior modification using token reinforcement. Of particular note, classrooms composed
of students with emotional, social, and behavioral disabilities have not widely implemented token
systems. Research with these high-needs populations would add knowledge to the field and enhance
behavior management in those classrooms, which could be especially beneficial for the teachers working
in them.
An additional area of noticeable weakness within token economy literature is related to maintenance
and generalization of treatment effects both during and after program implementation (Kohn, 1999;
Turkewitz et al., 1975). Varying schedules of exchange from fixed (once per period or week) to a
more variable one (exchange from once a week to once every 3 weeks, for example) may help to
mitigate maintenance concerns. Variable exchanges have been shown to increase maintenance of the
skill and to be effective (McLaughlin & Malaby, 1976). Additional research that employs long-term
assessment of such outcomes is clearly needed.
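The shift from a fixed to a variable exchange schedule described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions and is not a procedure taken from McLaughlin and Malaby (1976) or any other cited study: the function name, the parameters, and the choice of uniformly random gaps of one to three weeks are all hypothetical.

```python
import random


def exchange_schedule(total_weeks, fixed_weeks, max_gap_weeks=3, seed=0):
    """Return the week numbers on which tokens may be exchanged.

    Hypothetical illustration: exchanges occur every week for the first
    `fixed_weeks` weeks (fixed schedule), then at randomly varying gaps of
    1 to `max_gap_weeks` weeks (variable schedule).
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    # Fixed phase: one exchange per week.
    weeks = list(range(1, min(fixed_weeks, total_weeks) + 1))
    current = weeks[-1] if weeks else 0
    # Variable phase: the next exchange falls 1..max_gap_weeks weeks later.
    while True:
        current += rng.randint(1, max_gap_weeks)
        if current > total_weeks:
            break
        weeks.append(current)
    return weeks


if __name__ == "__main__":
    # Example: a 12-week program with weekly exchanges for the first 4 weeks.
    print(exchange_schedule(total_weeks=12, fixed_weeks=4))
```

In practice a manager would set the length of the fixed phase and the largest gap from the participant's data rather than from a random draw; the sketch only shows how the exchange opportunities thin out over time.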
Acknowledgement
This research was completed in partial fulfillment of the requirements for the first author’s Master’s
thesis in the Master of Initial Teaching (MIT) program in the Department of Special
Education at Gonzaga University. The author would like to give particular thanks to various faculty
members. The first author is now teaching students with EBD in the Lake Washington School District. Requests for
reprints should be addressed to Christopher Doll, MIT, Lake Washington School District #414, Juanita
High School, 10601 NE 132nd St., Kirkland, WA 98034.
References
Addessi, E., Mancini, A., Crescimbene, L., & Visalberghi, E. (2011). How social context, token value,
and time course affect token exchange in Capuchin monkeys. International Journal of
Primatology, 32, 83-98.
Alberto, P., & Troutman, A. (2012). Applied behavior analysis for teachers (2nd ed.). Upper Saddle
River, NJ: Pearson Education.
Alvord, J. (1975). Home token economy. Champaign, IL: Research Press.
Arnett, M. S. & Ulrich, R. C. (1975). Behavioral control in the home setting. Psychological Record,
25, 395-413.
Ayllon, T., & Azrin, N. (1968). The token economy. New York, NY: Appleton-Century-Crofts.
Barth, R. (1979). Home-based reinforcement of school behavior: A review and analysis. Review of
Educational Research, 3, 436-458.
Bailey, J. S., Wolf, M. M., & Phillips, E. L. (1970). Home-based reinforcement and the modification of
pre-delinquents’ classroom behavior. Journal of Applied Behavior Analysis, 3, 223-233.
Bippes, R., McLaughlin, T. F., & Williams, R. L. (1986). A classroom token system in a detention
center: Effects for academic and social behavior. Techniques: A Journal for Remedial
Education and Counseling, 2, 126-132.
Birnbaum, P. (Ed.). (1962). A treasury of Judaism. New York, NY: Hebrew Publishing Company.
Boniecki, K. A., & Moore, S. (2003). Breaking the silence: Using a token economy to reinforce
classroom participation. Teaching of Psychology, 30, 224-227.
Brown, J., Montgomery, R., & Barclay, L. (1969). An example of psychologist management of
teaching reinforcement procedures in the elementary classroom. Psychology in the Schools, 6,
336-340.
Bushell, D. (1978). An engineering approach to the elementary classroom: The behavior analysis
follow-through project. In A. C Catania & T. A. Brigham (Eds.), Handbook of applied
behavior analysis: Social and instructional processes (pp. 525-563). New York, NY:
Irvington.
Bushell Jr., D., Wrobel, P. A., & Michaelis, M. L. (1968). Applying ‘group’ contingencies to the
classroom study behavior of preschool children. Journal of Applied Behavior Analysis, 1, 55-
61.
Cantrell, R., Cantrell, M., Huddleston, C., & Woolridge, R. (1969). Contingency contracting with
school problems. Journal of Applied Behavior Analysis, 2, 215-220.
Carcopino, J. (1940). Daily life in ancient Rome. New Haven, CT: Yale University Press.
Chance, P. (2006). First course in applied behavior analysis. Long Grove, IL: Waveland Publishing.
Christophersen, E. R., Arnold, C. M., Hill, D. W., & Quilitch, H. R. (1972). The home point system:
Token reinforcement procedures for application by parents of children with behavior
problems. Journal of Applied Behavior Analysis, 5, 485-497.
Cooper, J. O., Heron, T., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle
River, NJ: Prentice-Hall Pearson Education.
Coupland, L., McGregor, S., & McLaughlin, T. F. (1981). Reduction of inappropriate noise through
the use of a token economy. B. C. Journal of Special Education, 5, 65-75.
Cowles, J.T. (1937). Food-tokens as incentives for learning by chimpanzees. Comparative Psychology
Monographs, 23, 1-96.
Crawford, D. J., & McLaughlin, T. F. (1982). Token reinforcement of on-task behavior in a secondary
special education setting. Behavioral Engineering, 7, 109-117.
Dickerson, F. B., & Tenhula, W. N. (2005). The token economy for schizophrenia: Review of the
literature and recommendations for future research. Schizophrenia Research, 75, 405-416.
Doolittle, J. (1865). Social life of the Chinese: With some account of their religious, governmental,
educational, and business customs and opinions, Volume 1. New York, NY: Harper &
Brothers.
Duran, F. D. (1964). The aztecs. New York, NY: Orion Press.
Filcheck, H. A., McNeil, C. B., Greco, L. A., & Bernard, R. S. (2004). Using a whole-class token
economy and coaching of teacher skills in a preschool classroom to manage disruptive
behavior. Psychology in the Schools, 41, 351-361.
Flaman, F., & McLaughlin, T. F. (1986). Token reinforcement: Effects for accuracy of math
performance and generalization to social behavior with an adolescent student. Techniques: A Journal for
Remedial Education and Counseling, 2, 39-47.
Grant, M. (1967). Gladiators. London: Trinity Press.
Hall, R. V., Axelrod, S., Foundopoulos, M., Shellman, J., Campbell, R. A., & Cranston, S. S. (1972).
The effective use of punishment to modify behavior in the classroom. In K. D. O’Leary & S.
G. O’Leary (Eds.), Classroom Management: The successful use of behavior modification (pp.
173-182). New York, NY: Pergamon Press.
Iwata, B. A., & Bailey, J. S. (1974). Reward versus cost token systems: An analysis of the effects on
students and teacher. Journal of Applied Behavior Analysis, 7, 567-576.
Jalongo, M., Tweist, M., Gerlack, G., & Skoner, D. (1998). The college learner. Upper Saddle River,
NJ: Merrill.
Jordan, D., McLaughlin, T. F., & Hunsaker, D. (1980). The effects of monetary reinforcement on piano
practice in the home. Education and Treatment of Children, 3, 161-163.
Kazdin, A. E. (1977). The token economy: A review and evaluation. New York, NY: Plenum Press.
Kazdin, A. E. (1982). The token economy: A decade later. Journal of Applied Behavior Analysis, 5,
431-445.
Kazdin, A. E., & Bootzin, R. R. (1972). The token economy: An evaluative review. Journal of Applied
Behavior Analysis, 5, 343-372.
Kirigan, K. A., Braukman, C. J., Atwater, J. D., & Wolf, M. M. (1982). An evaluation of Teaching-
Family (Achievement Place) group homes for juvenile offenders. Journal of Applied Behavior
Analysis, 15, 1-16.
Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise and
other bribes. Boston, MA: Houghton Mifflin.
Lavigne, J. V., Gibbons, R. D., Christoffel, K. K., Arend, R., Rosenbaum, D., Binns, H., Dawson,
N., … Isaacs, C. (1998). Prevalence rates and correlates of psychiatric disorders among
preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 35,
204-214.
Lolich, E., McLaughlin, T. F., & Weber, K. P. (2012). The effects of using reading racetracks
combined with direct instruction precision teaching and a token economy to improve the
reading performance for a 12-year-old student with learning disabilities. Academic Research
International, 2(3), 245-252. Retrieved from:
http://174.36.46.112/~savaporg/journals/issue.html
Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation
of token economies as a classroom management tool for students with challenging behavior.
Journal of School Psychology, 49, 529-554.
Maglio, C. L., & McLaughlin, T. F. (1981). Effects of a token reinforcement system and teacher
attention in reducing inappropriate verbalizations with a junior high school student. Corrective
and Social Psychiatry and Journal of Behavior Technology Methods and Therapy, 27, 140-
145.
Malagodi, E. F. (1967). Acquisition of the token-reward habit in the rat. Psychological Reports, 20,
1335-1342.
Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children with intellectual disability
and/or autism: A review. Research in Developmental Disabilities, 30, 240-248.
McGoey, K. E., & DuPaul, G. J. (2000). Token reinforcement and response cost procedures: Reducing
the disruptive behavior of preschool children with Attention-Deficit/Hyperactivity Disorder.
School Psychology Quarterly, 15, 330-343.
McKenzie, H., Clark, M., Wolf, M., Kothera, R., & Benson, C. (1968). Behavior modification of children with
learning disabilities using grades as token reinforcers. Exceptional Children, 38, 745-752.
McLaughlin, T. F. (1981). The effects of a classroom token economy on math performance in an
intermediate grade school class. Education and Treatment of Children, 4, 139-147.
McLaughlin, T. F., & Malaby, J. E. (1972). Intrinsic reinforcers in a classroom token economy.
Journal of Applied Behavior Analysis, 5, 263-270.
McLaughlin, T. F., & Malaby, J. E. (1975). The effects of various token reinforcement contingencies
on assignment completion and accuracy during variable and fixed token exchange schedules.
Canadian Journal of Behavioral Sciences, 7, 412-419.
McLaughlin, T. F., & Malaby, J. E. (1976). An analysis of assignment completion and accuracy across
time under fixed, variable, and extended token exchange periods in a classroom token
economy. Contemporary Educational Psychology, 1, 346-355.
McLaughlin, T. F., & Malaby, J. E. (1977a). The comparative effects of token-reinforcement with and
without a response cost contingency with special education children. Educational Research
Quarterly, 2, 34-41.
McLaughlin, T. F., & Malaby, J. E. (1977b). A cost free token reinforcement program for special
education students. Corrective and Social Psychiatry and Journal of Behavior Technology
Methods and Therapy, 23, 111-116.
McLaughlin, T. F., & Williams, R. L. (1988). The token economy in the classroom. In J. C. Witt, S. N.
Elliot, & F. M. Gresham (Eds.). Handbook of behavior therapy in education (pp. 469-487).
New York, NY: Plenum.
Miller, P. M., & Drennen, W. T. (1970). Establishment of social reinforcement as an effective
modifier of verbal behavior in chronic psychiatric patients. Journal of Abnormal Psychology,
76, 392-395.
Neef, N. A., Bicard, D. F., & Endo, S. (2011). Assessment of impulsivity and the development of self-
control in students with attention deficit hyperactivity disorder. Journal of Applied Behavior
Analysis, 34, 397-408.
Nelson, K. G. (2010). Exploration of classroom participation in the presence of a token economy.
Journal of Instructional Psychology, 37, 49-56.
O’Leary, K. D., & Drabman, R. (1971). Token reinforcement programs in the classroom: A review.
Psychological Bulletin, 75, 379-398.
Packard, R. G. (1970). The control of ‘classroom attention’: A group contingency for complex
behavior. Journal of Applied Behavior Analysis, 3, 13-28.
Rapport, M. D., Murphy, H. A., & Bailey, J. S. (1982). Ritalin vs. response cost in the control of
hyperactive children: A within-subject comparison. Journal of Applied Behavior Analysis, 15,
205-216.
Reed, D. D., & Martens, B. K. (2011). Temporal discounting predicts student responsiveness to
exchange delays in a classroom token system. Journal of Applied Behavior Analysis, 44, 1-18.
Ruesch, U., & McLaughlin, T. F. (1981). Effects of a token system using a free-time contingency to
increase assignment completion with individuals in the regular classroom. B. C. Journal of
Special Education, 5, 347-355.
Rustab, K. E., & McLaughlin, T. F. (1988). Reducing inappropriate behavior in the home with a token
economy. Behaviour Change, 5, 160-164.
Sran, S. K., & Borrero, J. C. (2010). Assessing the value of choice in a token system. Journal of
Applied Behavior Analysis, 43, 553-557.
Schmandt-Besserat, D. (1992). Before writing: Volume 1: From counting to cuneiform. Austin, TX:
University of Texas Press.
Skinner, B. F. (1966). What is the experimental analysis of behavior? Journal of the Experimental
Analysis of Behavior, 9, 213-218.
Sousa, C., & Matsuzawa, T. (2001). The use of tokens as rewards and tools by chimpanzees (Pan
troglodytes). Animal Cognition, 4, 213-221.
Stainback, W., Payne, J. S., Stainback, S., & Payne, R. A. (1973). Establishing a token economy in the
classroom. Columbus, OH: Merrill.
Swain, J. C., & McLaughlin, T. F. (1998). The effects of bonus contingencies in a classwide token
program on math accuracy with middle-school students with behavioral disorders. Behavioral
Interventions, 13, 11-19.
Thompson, M. J., McLaughlin, T. F., & Derby, K. M. (2011). The use of differential reinforcement to
decrease the inappropriate verbalizations of a nine-year-old girl with autism. Electronic
Journal of Research in Educational Psychology, 9, 183-196.
Thorne, G., Tharp, R., & Wetzel, R. (1967). Behavior modification techniques: New tools for
probation officers. Federal Probation.
Truchlicka, M., McLaughlin, T. F., & Swain, J. C. (1998). Effects of token reinforcement and response
cost on the accuracy of spelling performance with middle-school special education students
with behavior disorders. Behavioral Interventions, 13, 1-10.
Turkewitz, H., O'Leary, K. D., & Ironsmith, M. (1975). Generalization and maintenance of
appropriate behavior through self-control. Journal of Consulting and Clinical Psychology, 43,
577-583.
Walker, H., & Buckley, N. (1972). Programming generalization and maintenance of treatment effects
across time and across settings. Journal of Applied Behavior Analysis, 5, 209-224.
Williams, B. F., Williams, R. L., & McLaughlin, T. F. (1989). The use of token economies with
individuals who have developmental disabilities. In E. Cipani (Ed.), The treatment of severe
behavior disorders (pp. 3-18). Washington, DC: AAMR Publications.
Wolf, J. B. (1936). Effectiveness of token-rewards for chimpanzees. Comparative Psychology
Monographs, 12, 1-72.
EFFICACY OF AND PREFERENCE FOR REINFORCEMENT
AND
RESPONSE COST IN TOKEN ECONOMIES
ERICA S. JOWETT HIRST
SOUTHERN ILLINOIS UNIVERSITY
CLAUDIA L. DOZIER
UNIVERSITY OF KANSAS
AND
STEVEN W. PAYNE
STATE UNIVERSITY OF NEW YORK
Researchers have shown that both differential reinforcement and response cost within token
economies are similarly effective for changing the behavior of individuals in a group context
(e.g., Donaldson, DeLeon, Fisher, & Kahng, 2014; Iwata & Bailey, 1974). In addition, these
researchers have empirically evaluated preference for these procedures. However, few previous
studies have evaluated the individual effects of these procedures both in group contexts and in
the absence of peers. Therefore, we replicated and extended previous research by determining
the individual effects and preferences of differential reinforcement and response cost under both
group and individualized conditions. Results demonstrated that the procedures were equally
effective for increasing on-task behavior during group and individual instruction for most children,
and preference varied across participants. In addition, results were consistent across participants
who experienced the procedures in group and individualized settings.
Key words: differential reinforcement, independent group contingency, preference, response
cost, token economy
The token economy is a common behavioral
intervention that has been demonstrated to be
effective for increasing appropriate behavior
and decreasing inappropriate behavior for many
populations across different settings (Doll,
McLaughlin, & Barretto, 2013; Hackenberg,
2009; Kazdin, 1977). Token economies involve
delivery, removal, or both delivery and removal
of conditioned reinforcers (e.g., tokens and
points) that can be exchanged for back-up rein-
forcers (e.g., prizes, treats, and leisure activ-
ities). When tokens are delivered contingent on
appropriate behavior or for the absence of inap-
propriate behavior, these procedures are termed
differential reinforcement of alternative behavior
(DRA) or differential reinforcement of other
behavior (DRO), respectively. When tokens are
removed contingent on inappropriate behavior
or for the absence of appropriate behavior, this
procedure is termed response cost (RC).
An advantage of token economies is that
they can be implemented with a group of indi-
viduals as a general behavior-management strat-
egy during small-group instruction or as a
classwide intervention. Classwide behavior-
management strategies such as token economies
should be considered to address minor disrup-
tive behavior, to increase motivation for learn-
ing, or as a complement to an individualized
intervention. However, general behavior-
management strategies may not be effective in
isolation for some individuals who engage in
severe problem behavior or have more intense
Correspondence concerning this article should be
addressed to Claudia L. Dozier, Department of Applied
Behavioral Science, University of Kansas, Lawrence, Kan-
sas 66045 (e-mail: cdozier@ku.edu).
doi: 10.1002/jaba.294
JOURNAL OF APPLIED BEHAVIOR ANALYSIS 2016, 49, 329–345 NUMBER 2 (SUMMER)
deficits in learning. These individuals may
require more individualized, function-based
assessment, intervention, and additional sup-
port. Regardless, token economies are common
in classrooms and numerous other environ-
ments because they are likely to create motiva-
tion for changes in behavior for most
individuals in the group, creating a more man-
ageable and effective learning environment.
After numerous studies were conducted to
demonstrate the effectiveness of reinforcement
and RC procedures in token economies,
researchers began to compare the effectiveness
of these two procedures (e.g., Brent & Routh,
1978; Broughton & Lahey, 1978; Iwata & Bai-
ley, 1974; Panek, 1970). Overall, most studies
that have compared differential reinforcement
(DR) to RC have demonstrated equal effective-
ness of the two procedures (e.g., Capriotti,
Brandt, Ricketts, Espil, & Woods, 2012;
Donaldson, DeLeon, Fisher, & Kahng, 2014;
Iwata & Bailey, 1974; McGoey & DuPaul,
2000). However, these results are limited in
two important ways. First, most studies
involved the use of group contingencies (i.e.,
the implementation of the procedures in the
context of a group in which others are present),
which may have influenced responding. For
example, comments made or behaviors mod-
eled by others in the group may have influ-
enced target responding. Second, most studies
reported only group averages with respect to
target behavior, which does not allow analysis
of individual differences. For example, Iwata
and Bailey (1974) compared DRO and RC for
decreasing rule violations and increasing on-
task behavior of 15 children in a classroom.
During DRO, tokens were delivered at the end
of a 3- to 5-min interval if no rule violations
occurred during that interval. During RC,
tokens were removed at the end of an interval
if any rule violations occurred during that inter-
val. The children could earn or lose up to
10 tokens throughout a 30-min math period,
and the tokens could be exchanged for snacks
and free time. Results showed that the proce-
dures were similarly effective for reducing rule
violations and off-task behavior. However, the
authors reported group averages, which may
not be representative of individual responding.
Furthermore, because the study was conducted
as a group intervention, the influence of peer
behavior on target responding is unknown.
More recently, Donaldson et al. (2014) com-
pared DRO and RC for decreasing the disrup-
tive behavior of 12 first-grade students.
Although the procedures were implemented in
a group context, the authors reported both
group-average outcomes and individual out-
comes. Group-average data showed low to zero
levels of problem behavior; however, an analysis
of individual data showed that responding dur-
ing DRO was somewhat variable for four of
the 12 participants. Although this study, along
with Iwata and Bailey (1974) and most others,
provides preliminary evidence regarding the
effectiveness of reinforcement and RC when
used in a token economy, because the proce-
dures were implemented in a group context,
the influence of peers on target responding is
unknown. For example, individuals may show
an increase or decrease in target behavior
because their peers are (a) engaging in a target
behavior, (b) prompting them to engage in a
target behavior, (c) providing reinforcers (e.g.,
attention) for them to engage in appropriate
target behavior, (d) implementing punishers
(e.g., reprimands) for not engaging in a target
behavior (Salend & Kovalich, 1981), or
(e) extinguishing previously reinforced target
behavior (e.g., no longer delivering attention).
Therefore, to further isolate the effects of rein-
forcement and RC contingencies in token
economies, conducting the comparison while
students work independently or are otherwise
not in the presence of others might be
important (Capriotti et al., 2012; Sindelar,
Honsaker, & Jenkins, 1982). Furthermore,
comparing responding of a single individual
when in the presence and absence of peers to
determine whether changes in responding are
associated with the presence or absence of peers
would be useful.
In addition to comparing the effectiveness of
DR and RC procedures in individual and
group contexts, considering preference is also
important; however, only two studies that have
compared DR and RC in token economies
have empirically evaluated preference
(Donaldson et al., 2014; Iwata & Bailey,
1974). Iwata and Bailey (1974) compared the
effects of DRO and RC for reducing disruptive
classroom behavior displayed by 15 elementary
school special-education students. To deter-
mine preference across the procedures, the
experimenters conducted a choice assessment
during which each child was given the opportu-
nity to select which token procedure would be
implemented for a particular session. After all
children made a selection, the chosen token
procedure was implemented for each child. The
results showed that four students chose DRO
most often, five students chose RC more often,
and six students switched their selection across
opportunities. Donaldson et al. (2014) used a
similar procedure and found that six of the
12 children preferred RC, four children pre-
ferred DRO, and two children had approxi-
mately equal preference.
These studies provide evidence that prefer-
ence varies among individuals; however, the
results are limited, at least in Donaldson
et al. (2014), because children made selections
vocally and in the presence of their peers
(Iwata & Bailey, 1974, did not provide infor-
mation regarding how or where children made
a selection). Therefore, some children’s selec-
tions may have been influenced by the presence
or behavior (e.g., choices or comments) of their
peers (Donaldson et al., 2014). To isolate indi-
vidual preference, it is important to conduct a
preference assessment when the child is not in
the presence of his or her peers (e.g., Layer,
Hanley, Heal, & Tiger, 2008). For example,
Layer et al. (2008) presented choices on an
upright board in front of each child with the
choices facing the child (not visible to other
children) and then had the child use a motor
response (i.e., pointing), rather than a vocal
response (i.e., stating which procedure he or
she liked best), to make his or her selection.
This procedure controlled for both visual and
auditory observation of other children’s choice.
Overall, given the demonstrated effectiveness
of DR and RC but unknown influence of peers
and lack of empirical data for preference in the
absence of peers, further research is warranted.
The current study involved several evaluations
that replicate and extend previous research. The
purpose of the first evaluation was to replicate
research directly comparing the effectiveness of
DR and RC procedures in a group setting. The
second purpose was to provide a direct compar-
ison of the effectiveness of DR and RC proce-
dures for the on-task behavior of individual
children engaged in a solitary work task. The
third purpose was to evaluate individual prefer-
ence of all children in the absence of peers.
Finally, responding of individuals who partici-
pated in both the small-group activity and the
solitary work task was compared to determine
if the presence of peers influenced responding.
STUDY 1: DR VERSUS RC (GROUP)
Method
Participants and setting. Three groups of
three typically developing preschool-aged (3 to
5 years old) children who attended a university-
based preschool program participated. All chil-
dren could follow multistep instructions (e.g.,
walk to your cubby, hang up your jacket, and
come sit on the floor) and communicated using
vocal speech. We conducted sessions 3 to 5 days
per week, once or twice per day, in a quiet area
of the classroom separate from all other chil-
dren. During each session, only one group of
participants was present. Participants sat next
to one another on the floor on designated mats
across from the experimenter, and one to two
data collectors and relevant session materials
were present.
Materials. During all sessions, small-group
activity materials were present. Materials
included plastic letters and numbers for expres-
sive labeling and individual bingo boards with
various items (i.e., plastic buttons and jewels)
for matching. During some sessions, tokens
(i.e., pennies) were present that could be earned
or lost. Tokens were attached to and removed
from laminated strips of paper (approximately
10.2 cm by 30.5 cm) with 10 square pieces of
Velcro. Participants earned access to a toy room
with tangible items (e.g., stickers, plastic rings,
spin tops, sticky hands), edible items (e.g.,
gummies, Smarties, Skittles, and M&Ms), and
leisure activities (e.g., video games and DVDs)
via token exchange following some sessions
(DR and RC). Different-colored materials (pos-
ters and token boards) were present during
each of the different conditions to aid in dis-
crimination between conditions.
Response measurement and interobserver agree-
ment. Trained graduate and undergraduate stu-
dents collected data using paper and a pencil.
The dependent variable was percentage of
intervals with on-task behavior. We defined on-
task behavior as sitting on a mat (i.e., bottom
on the mat), keeping hands to oneself (i.e.,
keeping hands in lap unless instructed to
manipulate activity materials), and sitting
quietly (i.e., talking only when the experi-
menter asked or called on the participant to
respond). We partitioned sessions into 5-s
intervals and scored on-task behavior for each
child using a momentary-time-sample proce-
dure. That is, at the end of every 5-s interval
(signaled by an auditory cue), the data collector
scored whether each child was on task at that
moment. After each session, we calculated the
percentage of intervals on task for each child by
dividing the number of intervals on task by the
total number of intervals in the session and
converting the result to a percentage. In addi-
tion, for two groups, experimenters collected
data on the number of tokens that remained on
each participant’s board at the end of a DR ses-
sion or the number of empty spaces on each
participant’s board at the end of each RC ses-
sion. We later subtracted the number of empty
spaces counted after RC sessions from 10 to
compare number of net tokens in each session.
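The two calculations described above can be illustrated with a minimal sketch. The function names and the sample data are hypothetical and do not come from the study; the sketch assumes one boolean momentary-time-sample score per 5-s interval and a 10-space token board.

```python
def percent_on_task(samples):
    """Percentage of intervals scored on task.

    `samples` holds one boolean per 5-s interval: True if the child was
    on task at the moment the interval ended (momentary time sampling).
    """
    if not samples:
        raise ValueError("a session needs at least one interval")
    return 100.0 * sum(samples) / len(samples)


def net_tokens_after_rc(empty_spaces, board_size=10):
    """Tokens left after a response-cost session: board size minus empty spaces."""
    if not 0 <= empty_spaces <= board_size:
        raise ValueError("empty spaces must be between 0 and the board size")
    return board_size - empty_spaces


# A 5-min session split into 5-s intervals has 60 samples (hypothetical data).
session = [True] * 54 + [False] * 6
print(percent_on_task(session))   # 90.0
print(net_tokens_after_rc(2))     # 8
```

Converting empty spaces to net tokens puts RC sessions on the same scale as DR sessions, where the tokens on the board are counted directly.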
Two independent observers collected data
for at least 30% of sessions and then calculated
interobserver agreement for on-task behavior by
dividing the number of 5-s intervals during
which both observers agreed by the total num-
ber of intervals and converting the result to a
percentage. We defined an agreement for on-
task behavior as both observers scoring or not
scoring the occurrence of the behavior in a
given interval. We calculated interobserver
agreement for token count using the total
method. That is, we divided the smaller num-
ber of tokens that remained on a board (at the
end of each DR session) or were missing from
the board (at the end of each RC session) by
the larger number and converted the result to a
percentage. Interobserver agreement averaged
93% (range, 73% to 100%) for on-task behav-
ior and 99% (range, 88% to 100%) for token
count.
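As a rough illustration of the two agreement calculations described above, the following sketch computes interval-by-interval agreement for on-task behavior and total-method agreement for token counts. The function names and example records are hypothetical; treating two counts of zero as full agreement is an assumption, because the text does not address that case.

```python
def interval_agreement(obs_a, obs_b):
    """Interval-by-interval agreement: agreed intervals / total intervals * 100."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both records must cover the same nonempty set of intervals")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * agreements / len(obs_a)


def total_count_agreement(count_a, count_b):
    """Total method: smaller count / larger count * 100."""
    if count_a == count_b:
        return 100.0  # assumption: equal counts, including 0 and 0, count as full agreement
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


# Hypothetical records from two observers for four intervals.
observer_1 = [True, True, False, True]
observer_2 = [True, False, False, True]
print(interval_agreement(observer_1, observer_2))  # 75.0
print(total_count_agreement(7, 8))                 # 87.5
```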
Procedure. All sessions lasted 5 min. During
all sessions, the participants sat next to one
another and in front of the experimenter in a
small area away from the other children in the
classroom. In addition, the experimenter placed
bingo boards with pieces and token boards
(in some sessions) in front of each participant
and a colored poster board on the wall in front
of the children. Before the start of the first ses-
sion of each condition, the experimenter
described the rules and the session contingen-
cies and required each participant to practice
engaging in related behaviors (e.g., sitting
quietly, talking out of turn, keeping hands in
lap, and touching materials) to experience the
consequences associated with each behavior.
During the 5-min sessions, the experimenter
provided continuous individual and group
instructions to name letters and numbers (e.g.,
the experimenter held up a plastic letter and
said, “Caroline, what letter is this?” and “Can
everybody tell me what letter this is?”) and
place a marker on a specific bingo board letter
or number (e.g., “Ok everyone, put a gem on
the letter d”). The experimenter delivered sev-
eral instructions during a session in a way that
was similar to instructions delivered during a
classroom activity; however, the rate at which
instructions were provided varied depending on
responding. During all sessions, if a child
(or children) responded correctly, the experi-
menter delivered praise, and if any child did
not respond correctly, the experimenter
prompted the correct response and then moved
on to another instruction.
First, the experimenter conducted baseline
sessions to determine the level of on-task
behavior in the absence of programmed conse-
quences. Next, the experimenter practiced
token trading with the participants. That is, the
experimenter gave each child tokens and the
opportunity to trade the tokens for various
items (e.g., prizes and snacks). Next, we com-
pared DR and RC to determine their effects on
on-task behavior. During DR and RC sessions,
the experimenter observed each participant in
the group at the same moment every 30 s on
average (ranging from 15 to 45 s) according to
a schedule based on a pseudorandom number
generator in Excel. We created three versions of
the schedule and rotated across sessions to
reduce the likelihood that the participants
would learn a schedule. During each scheduled
observation and depending on the condition,
the experimenter quietly delivered a token to
every child who was on task at that moment
(DR) or removed a token from any child who
was off task at that moment (RC). The experi-
menter did not say anything when delivering or
removing a token. We used the same schedules
across both conditions; therefore, the possible
number of net tokens across conditions was
equal (i.e., 10 tokens). In addition, the last
opportunity to earn or lose a token was at the
last second of each session; therefore, no partic-
ipant could earn or lose all tokens before the
end of the session.
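One way to build a schedule with these properties is sketched below. This is a Python stand-in, not the authors' Excel procedure, and it rests on several assumptions: gaps are whole seconds, the first gap is measured from the start of the session, and gaps are simply redrawn until 10 of them sum to 300 s so that the final observation falls on the last second. The function name and parameters are hypothetical.

```python
import random


def observation_schedule(n_obs=10, session_s=300, min_gap=15, max_gap=45, seed=None):
    """Return observation times (s) whose gaps fall in [min_gap, max_gap].

    Gaps are redrawn until they sum to the session length, so the final
    observation falls on the last second of the session.
    """
    if not n_obs * min_gap <= session_s <= n_obs * max_gap:
        raise ValueError("no schedule fits these constraints")
    rng = random.Random(seed)
    while True:
        gaps = [rng.randint(min_gap, max_gap) for _ in range(n_obs)]
        if sum(gaps) == session_s:
            break
    # Convert gaps between observations into times from session start.
    times, elapsed = [], 0
    for gap in gaps:
        elapsed += gap
        times.append(elapsed)
    return times


# Three versions to rotate across sessions, as in the procedure above.
versions = [observation_schedule(seed=s) for s in (1, 2, 3)]
for schedule in versions:
    print(schedule)
```

Because every version contains 10 observations, the maximum number of tokens that can be earned under DR equals the maximum that can be kept under RC.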
After each DR and RC session, an experi-
menter took the participant to a room that
contained many different toys, leisure activities,
edible items, and trinkets that were not found
in the preschool classroom and gave the partici-
pant the opportunity to trade tokens for edible
items or trinkets or engagement with a toy or
leisure activity. A participant could trade one
token for 1 min to play with a toy or leisure
activity, one token for one edible item to con-
sume, or three tokens for one trinket to take
home. Each participant could spend the num-
ber of tokens he or she had for any combina-
tion of the above. All participants traded all
tokens at the end of a session. We used a mul-
tielement design in which we rapidly alternated
baseline, RC, and DR conditions to compare
the effects of the different procedures on on-
task behavior.
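The exchange rates above amount to a small price list. The sketch below, with hypothetical item names and a hypothetical basket, totals the token cost of a combination of back-up reinforcers and checks it against the tokens a participant holds.

```python
# Prices in tokens, taken from the exchange rules described above.
PRICES = {"play_minute": 1, "edible": 1, "trinket": 3}


def exchange_cost(basket):
    """Total token cost of a basket such as {"edible": 2, "trinket": 1}."""
    return sum(PRICES[item] * qty for item, qty in basket.items())


def can_afford(tokens, basket):
    """True if the participant holds enough tokens for the whole basket."""
    return exchange_cost(basket) <= tokens


# Hypothetical exchange: 4 min of play, 2 edible items, 1 trinket.
basket = {"play_minute": 4, "edible": 2, "trinket": 1}
print(exchange_cost(basket))   # 9
print(can_afford(10, basket))  # True
```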
Baseline. Before the start of all baseline ses-
sions, the experimenter described the rules and
contingencies for the session and posted a white
board on the wall in front of the participants.
The experimenter stated the rules as follows:
“Today it’s white, and there are no tokens.
When we start, you need to sit on your mat,
keep your hands to yourself, and raise your
hand to talk.” During the session, the experi-
menter did not provide any programmed con-
sequences for any behavior, with the exception
of responses to correct and incorrect responding
(as mentioned above).
Differential reinforcement. Before the start of
all DR sessions, the experimenter described the
rules and contingencies for the session, posted a
green poster board on the wall in front of the
participants, and placed a green board with no
tokens on the floor in front of each participant.
The experimenter stated the rules as follows:
“Today you get the green board, and it doesn’t
have any tokens. If you stay on your mat, keep
your hands to yourself, and raise your hand to
talk, you will get a token. If you get off your
mat, touch your friends, or talk during some-
one else’s turn, you will not get any tokens.
When small group is done, you can trade
your tokens for prizes and candy. If you don’t
have any tokens, you don’t get anything.”
Each participant had his or her own token
board. Throughout the session, the experi-
menter watched a timer, and during a sched-
uled observation, placed a token on the token
board of any participant who was on task.
The experimenter did not deliver any pro-
grammed consequences for participants who
were not on task.
Response cost. Before the start of all RC ses-
sions, the experimenter described the rules and
contingencies for the session, posted a red
poster board on the wall in front of the partici-
pants, and placed a red board with 10 tokens
in front of each participant. The experimenter
stated the rules as follows: “Today you get the
red board, and it has 10 tokens. If you stay on
your mat, keep your hands to yourself, and
raise your hand to talk, you will keep your
tokens. If you get off your mat, touch your
friends, or talk during someone else’s turn, you
will lose tokens. When small group is done,
you can trade your tokens for prizes and candy.
If you don’t have any tokens, you don’t get
anything.” During the session, the experi-
menter followed the variable momentary obser-
vation schedule as in the DR condition;
however, when a scheduled observation
occurred, the experimenter did not deliver con-
sequences for any participant who was on task
and removed a token from any participant’s
token board who was not on task.
Choice. When we observed stable levels of
responding in the DR and RC phases for
each participant, we conducted a preference
assessment to determine the procedure that
each participant preferred. We conducted this
evaluation with Groups 2 and 3 only because
one participant in Group 1 left the preschool
before evaluation of preference. We used a pro-
cedure similar to that used by Layer
et al. (2008) to evaluate preference. Before each
session, the experimenter placed the stimuli
(i.e., different-colored token boards and materi-
als) associated with each type of condition (i.e.,
baseline, RC, and DR) on the floor where the
experimenter conducted sessions. We presented
the DR token board without tokens present
and the RC token board with all tokens on the
board. Near each of the token boards was a
small strip of paper that matched the color of
the stimuli (e.g., a green strip of paper was
placed in front of the DR token board).
The experimenter called each participant to the
small-group area one at a time and reminded
him or her of the contingencies associated with
each set of materials. Next, the experimenter
asked the participant to pick which session he
or she liked best by placing the colored strip of
paper associated with the selected condition
into a canvas bag. When the participant made
a selection, he or she was asked to go play in
another area of the classroom until this proce-
dure was repeated with each participant. This
method reduced the likelihood that a partici-
pant’s choice would be influenced by other
children’s prompts or comments or by obser-
ving the choices of other members in the
group. Although it is possible that children
could have discussed their choices with a peer
before making their selections, informal observations
suggested that this did not occur. However,
we did observe participants occasionally discuss
their choices after all participants had made a
selection. After all participants independently
made a selection, the experimenter called them
to the small-group area, drew a color from the
bag, then explained the contingencies in place
for the chosen session. After the experimenter
had explained the contingencies for the chosen
procedure, the experimenter implemented the
type of session chosen as described above. We
determined individual preference by counting
the number of selections of each procedure; the
procedure that an individual selected most
often was identified as the preferred procedure.
During the choice phase, we calculated inter-
observer agreement for selection of a procedure
using a total agreement method. That is, we
scored an agreement if both observers agreed
which procedure the participant selected and a
disagreement if the two observers disagreed.
Thus, interobserver agreement for selection
of a procedure for a particular session was
either 100% (the two observers agreed) or 0%
(the two observers disagreed). Interobserver
agreement for selection was 100% for all
participants.
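The counting rule for preference can be sketched as follows. The function names and the selection record are hypothetical, and returning no preference when the top two counts tie is an assumption, because the text does not say how ties were handled.

```python
from collections import Counter


def preferred_procedure(selections):
    """Return the most-selected procedure, or None when the top counts tie."""
    counts = Counter(selections).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: ties are treated as no clear preference
    return counts[0][0]


def selection_percentages(selections):
    """Percentage of choice opportunities on which each procedure was picked."""
    counts = Counter(selections)
    total = len(selections)
    return {proc: 100.0 * n / total for proc, n in counts.items()}


# Hypothetical record of one participant's selections across five sessions.
record = ["RC", "DR", "RC", "RC", "DR"]
print(preferred_procedure(record))    # RC
print(selection_percentages(record))  # {'RC': 60.0, 'DR': 40.0}
```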
Results
Figure 1 displays graphs of the percentage of
intervals of on-task behavior for all participants
in Groups 1, 2, and 3 and individual cumula-
tive selections and experimenter-selected proce-
dures during the choice phase for Groups
2 and 3. During the initial baseline, most parti-
cipants engaged in moderate to low levels of
on-task behavior, although participants in
Group 1 engaged in somewhat higher levels of
on-task behavior. When we compared DR and
RC, we observed similarly high levels of on-task
behavior for six of the nine participants (93%
during DR and 95% during RC) and higher
levels of on-task behavior during RC for three
participants (Adam, Molly, and Carl). When
we evaluated preference, one participant
switched his selections but selected DR more
than RC (Paul), two participants switched their
selections but selected RC more than DR (Judy
and Molly), and three participants selected RC
exclusively (Carl, Jack, and Lance).
Table 1 provides a summary of results with
respect to percentage of selections during the
choice phase and average net tokens yielded
during the DR and RC comparison phase. We
did not evaluate preference or calculate net
tokens for Group 1; therefore, Table 1 includes
data only for participants in Groups 2 and
3. Preference results show that one participant
chose DR more than RC (Paul), and the other
five participants chose RC more than
DR. Also, three of six participants had an aver-
age difference of at least 0.5 tokens between
the two procedures, and all three participants
(Molly, Carl, and Lance) preferred response
cost, which was the procedure that yielded
more net tokens.
STUDY 2: DRA VERSUS RC
(INDIVIDUAL)
Method
The purposes of Study 2 were twofold. The
first purpose was to replicate Study 1 by com-
paring the effectiveness of and preference for
DR and RC in the context of an independent
work task. The second purpose was to compare
responding of participants in Studies 1 and
2 to evaluate the influence of the presence of
peers.
Participants and setting. Thirteen typically
developing preschool-aged (3 to 5 years old)
children (three of whom participated in Study
1) and one child with cerebral palsy (Brianna),
who were enrolled in a university-based pre-
school program, participated. All children could
follow multistep instructions and communi-
cated using vocal speech. We conducted ses-
sions 3 to 5 days per week, once or twice per
day, in session rooms that contained tables,
chairs, and relevant session materials. The
experimenter, one participant, and one or two
data collectors were present for each session.
Materials. During all sessions, we placed
worksheets with printed letters and shapes and
markers on a child-sized table, and two chairs
were available for the child and experimenter.
In addition, we placed toys from the preschool
classroom (e.g., puzzles, dolls, toy cars, coloring
book, and crayons) on the floor on the opposite
side of the session room. Tokens were identical
to those used in Study 1. We also used
different-colored token boards and poster
boards to aid in the discrimination between the
conditions as in Study 1. Furthermore, participants
earned access to the same toy room used
in Study 1 after some sessions; however, some
of the toys changed over time.
Figure 1. Percentage of on-task behavior for Adam, Luke, and Anna (Group 1); Paul, Judy,
and Molly (Group 2); and Carl, Jack, and Lance (Group 3) during RC (filled circles) and DR
(filled triangles), baseline (open squares), and cumulative selections (open circles for RC
selections and open triangles for DR selections). The filled data points graphed along the
left y axis during the choice phase (Groups 2 and 3) represent percentage of intervals on
task during the condition the experimenter selected.
[Figure 1 graphs not reproduced. Panels: BL, DR vs RC, and Choice phases for each participant. Axes: % Intervals (On task), Cumulative Selections, Sessions.]
Response measurement and interobserver agree-
ment. Trained graduate and undergraduate stu-
dents collected data using handheld computers.
The dependent variable during all sessions was
percentage of intervals of on-task behavior. We
defined on-task behavior as the first instance of
walking to the work table, the first instance of
removing the lid of the marker, moving the
marker approximately within the boundaries of
the printed lines of a worksheet, and turning
over pages to access a new worksheet. We did
not score on-task behavior if the participant
was scribbling or drawing pictures on the work-
sheet or making patterns (e.g., dashed lines or
dots) within the printed boundaries of the let-
ters or shapes. We partitioned sessions into 5-s
intervals and scored on-task behavior using
partial-interval recording. That is, we scored
on-task behavior if it occurred during any por-
tion of the 5-s interval. Next, we converted
data to a percentage by dividing the number of
intervals during which the child was on task by
the total number of intervals in the session. We
also collected data on the frequency of token
delivery (i.e., when the experimenter placed a
token on the token board) and token removal
(i.e., when the experimenter removed a token
from the token board).
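The percentage calculation described above can be illustrated with a short Python sketch. It is not the authors' data-collection software (the text says only that observers used handheld computers). The function name `percent_intervals_on_task` and the representation of on-task behavior as (start, end) bouts in seconds are assumptions made for illustration. The 5-s interval length comes from the text, and the 300-s default reflects the 5-min sessions described in the Procedure section.

```python
from typing import List, Tuple


def percent_intervals_on_task(
    on_task_bouts: List[Tuple[float, float]],
    session_s: float = 300.0,  # 5-min session (from the Procedure section)
    interval_s: float = 5.0,   # 5-s intervals (from the text)
) -> float:
    """Percentage of intervals scored on task under partial-interval recording.

    on_task_bouts is a hypothetical input format: (start, end) times in seconds.
    """
    n_intervals = int(session_s // interval_s)
    scored = 0
    for i in range(n_intervals):
        start, end = i * interval_s, (i + 1) * interval_s
        # Partial-interval rule: score the interval if on-task behavior
        # occurred during any portion of it.
        if any(b_start < end and b_end > start for b_start, b_end in on_task_bouts):
            scored += 1
    # Intervals on task divided by total intervals, expressed as a percentage.
    return 100.0 * scored / n_intervals


if __name__ == "__main__":
    # Two made-up bouts: 0-12 s touches three intervals, 58-61 s touches two.
    print(percent_intervals_on_task([(0.0, 12.0), (58.0, 61.0)]))
```

Because any overlap scores the whole interval, partial-interval recording can overestimate the time actually spent on task; the sketch makes that property easy to see with short bouts.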
We calculated interobserver agreement for
on-task behavior as in Study 1 and calculated
interobserver agreement coefficients for token
delivery or removal by dividing the session time
into 5-s intervals and comparing observer data
on an interval-by-interval basis. If exact agree-
ment occurred (i.e., both observers scored or
did not score a token delivery or removal
within a 5-s interval), we gave a score of 1 for
that interval. For any disagreements, we divided
the smaller score in each interval by the larger.
We then summed interval scores, divided them
by the total number of observation intervals,
and converted the result to a percentage. Inter-
observer agreement for on-task behavior was
93% (range, 73% to 100%) and for token
delivery or removal it was 96% (range, 78%
to 100%).
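The interval-by-interval agreement calculation for token delivery or removal can be sketched in Python as follows. This is an illustration of the steps described in the paragraph above, not the authors' code; the function name `interval_ioa` and the per-interval count lists are hypothetical.

```python
from typing import Sequence


def interval_ioa(observer_a: Sequence[int], observer_b: Sequence[int]) -> float:
    """Interobserver agreement (%) from per-interval counts of two observers.

    Each sequence holds one non-negative count per 5-s interval
    (hypothetical input format).
    """
    if len(observer_a) != len(observer_b) or len(observer_a) == 0:
        raise ValueError("Both observers need the same non-zero number of intervals")
    total = 0.0
    for a, b in zip(observer_a, observer_b):
        if a == b:
            # Exact agreement, including intervals in which neither observer scored.
            total += 1.0
        else:
            # Disagreement: smaller score divided by the larger score.
            total += min(a, b) / max(a, b)
    # Sum of interval scores over total intervals, as a percentage.
    return 100.0 * total / len(observer_a)


if __name__ == "__main__":
    # Made-up records for four intervals: scores are 1, 1, 0.5, and 0.
    print(interval_ioa([1, 0, 2, 0], [1, 0, 1, 1]))
```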
Design. We used a multielement design for
10 participants to compare the effects of the
different procedures on on-task behavior, and
we conducted sessions in a quasirandom order.
In addition, for two of these participants, we
used a reversal design following the multiele-
ment design to rule out discrimination failure
or carryover effects during the multielement
comparison. However, because we conducted
the reversal designs after the participants had a
history of both procedures, we used only a
reversal design with four additional participants
to determine levels of responding during DRA
before and after a history of RC.
Procedure. All sessions lasted 5 min. Before
the first session of each condition, the experi-
menter described the session contingencies and
required the participant to practice engaging in
related behaviors (i.e., tracing or playing with
toys) to experience the consequences associated
with each behavior, as in Study 1. For example,
the experimenter required the participant to
practice tracing by providing a vocal and model
prompt (i.e., “Try tracing like this,” while
demonstrating tracing), and used physical guid-
ance as necessary. After the participant prac-
ticed tracing, the experimenter provided the
Table 1
Percentage of Selections and Average Net Tokens Yielded
for Participants in Study 1 (Group Analysis)
                      % selections    Average net tokens
Participant   Group   DR      RC      DR      RC
Paul          2       67      33      9.8     9.9
Molly         2       22      78      8.5     9.4
Judy          2       11      89      9.4     9.0
Carl          3       0       100     7.3     9.1
Jack          3       0       100     9.1     9.1
Lance         3       0       100     8.9     9.6
337REINFORCEMENT AND RESPONSE COST
relevant consequences and repeated the contin-
gency for that particular phase (e.g., “Look,
you got a token because you were tracing.”).
Before the start of each subsequent session dur-
ing a particular phase, the experimenter
described the session contingencies (see condi-
tion descriptions below).
First, we conducted baseline sessions to
determine the level of on-task behavior in the
absence of programmed consequences. Next,
the experimenter practiced token trading
with the participant, as in Study 1. During
DRA and RC sessions, the experimenter deliv-
ered or removed tokens according to the
same variable momentary schedule used in
Study 1; however, the experimenter conducted
observations on a fixed 30-s schedule for
four participants (Brianna, Mark, Zoey, and
Sam), who participated later in the study, to
simplify data collection. In addition, after each
DRA and RC session, participants traded
tokens for prizes, candy, and access to leisure
items.
Baseline. Before the start of all baseline ses-
sions, the experimenter described the rules and
contingencies for the session and placed a white
board with no tokens near the participant. The
experimenter stated the rules as follows: “Today
you get the white board, and there are no
tokens. When we start, you can either work
on tracing or play with toys. If you are
working (i.e., tracing), nothing will happen, if
you are not working, nothing will happen.”
During the session, the experimenter did not
provide programmed consequences for any
behavior.
Differential reinforcement of alternative behav-
ior. Before the start of all DRA sessions, the
experimenter described the rules and contin-
gencies for the session and placed a green board
with no tokens near the participant. The exper-
imenter stated the rules as follows: “Today you
get the green board, and it doesn’t have any
tokens on it. When we start, you can either
work on tracing or play with toys. If you are
working, you will get a token; if you are not
working, you will not get a token. At the end,
you can trade your tokens for prizes and
snacks. If you don’t have any tokens, you don’t
get anything.” Throughout the session, the
experimenter watched a timer. If the partici-
pant was on task at the time of a scheduled
observation, the experimenter placed a token
on the token board. If the participant was not
on task at the time of the scheduled observa-
tion, the experimenter did not provide any pro-
grammed consequences.
Response cost. Before the start of all RC ses-
sions, the experimenter described the rules and
contingencies for the session and placed a red
board with 10 tokens near the participant. The
experimenter stated the rules as follows: “Today
you get the red board, and it has 10 tokens on
it. When we start, you can either work on trac-
ing or play with toys. If you are working, you
will keep your tokens; if you are not working,
you will lose tokens. At the end, you can trade
your tokens for prizes and snacks. If you don’t
have any tokens, you don’t get anything.”
Throughout the session, the experimenter
watched a timer. If the participant was on task
at the time of a scheduled observation, the
experimenter did not provide any programmed
consequences. If the participant was not on
task at the time of a scheduled observation, the
experimenter removed a token from the token
board.
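The two token contingencies can be compared with a small Python sketch. The function name `net_tokens` and the list-of-booleans input are hypothetical, and the example assumes 10 scheduled observations per session (for instance, a fixed 30-s schedule across a 5-min session with one observation at the end of each 30-s period). The text does not state how many observations the variable momentary schedule produced. Under that assumption, a participant with the same pattern of on-task behavior ends the session with the same number of tokens under either procedure.

```python
from typing import Sequence


def net_tokens(on_task_at_observation: Sequence[bool], procedure: str,
               rc_starting_tokens: int = 10) -> int:
    """Tokens remaining on the board at the end of a session.

    on_task_at_observation: one True/False per scheduled observation
    (hypothetical input format).
    """
    if procedure == "DRA":
        # Green board starts empty; one token is delivered for each
        # observation at which the participant is on task.
        return sum(1 for on_task in on_task_at_observation if on_task)
    if procedure == "RC":
        # Red board starts with 10 tokens; one token is removed for each
        # observation at which the participant is not on task.
        losses = sum(1 for on_task in on_task_at_observation if not on_task)
        return max(rc_starting_tokens - losses, 0)
    raise ValueError("procedure must be 'DRA' or 'RC'")


if __name__ == "__main__":
    # Assumed session: 10 observations, on task at 8 of them.
    session = [True] * 8 + [False] * 2
    print(net_tokens(session, "DRA"), net_tokens(session, "RC"))
```

With 10 observations and 10 starting tokens the two procedures are arithmetically symmetric, so any difference in average net tokens between conditions would reflect a difference in behavior across conditions rather than in the token arithmetic.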
Choice. When we observed stable levels of
responding in the DRA and RC evaluations,
we conducted a preference assessment to deter-
mine the procedure that each participant pre-
ferred. Before each session, the experimenter
placed the stimuli (i.e., poster and token
boards) associated with each type of condition
(i.e., baseline, RC, and DRA) near the partici-
pant and reminded him or her of the contin-
gencies associated with each set of materials.
For example, the experimenter reminded the
participant that the white board means that
there are no tokens; the green board means that
he or she can earn tokens if he or she is tracing;
and the red board means that he or she can
keep his or her tokens if he or she is tracing.
The experimenter switched the placement of
the different sets of stimuli and materials each
session. After the experimenter reminded the
participant of the contingencies associated with
each set of materials, the experimenter asked
the participant to pick (by pointing to or
touching a set of materials) which session he or
she wanted to do. When the participant made
the selection, the experimenter explained the
contingencies in place for the session (e.g.,
“You picked green, you will get a token when I
see that you are working on tracing.”). After
the participant chose a procedure, the experi-
menter implemented the chosen type of session
as described above. The experimenter con-
ducted sessions until we observed a stable pat-
tern of selections. During the choice phase, we
calculated interobserver agreement as in Study
1; it was 100% for all participants.
Results
Figure 2 shows the results for 10 of the
14 participants. During the initial baseline, all
participants engaged in moderate to low levels
of on-task behavior, and these levels remained
low throughout the evaluation (with the excep-
tion of Adam, Frank, and Martin, who engaged
in variable levels of on-task behavior during
baseline). When we compared DRA and RC
using a multielement design, we observed
(a) similar levels of on-task behavior for eight
of the 10 participants (average of 88% during
DRA and 85% during RC), (b) higher levels of
on-task behavior during DRA for one partici-
pant (Emily; 94% during DRA and 82% dur-
ing RC), and (c) higher levels of on-task
behavior during RC for one participant (Adam;
47% during DRA and 65% during RC). When
we compared DRA and RC using a reversal
design for two participants (Anna and Caro-
line), we observed similar and high levels of on-
task behavior as during the multielement evalu-
ation. When we evaluated preference, two par-
ticipants selected DRA exclusively (Paul and
Frank), three participants switched their selec-
tions but selected DRA more than RC (Martin,
Emily, and Adrianna), three participants
switched their selections but selected RC more
than DRA (Elisa, Adam, and Anna), and two
participants selected RC exclusively (Collin and
Caroline).
Figure 3 shows the results for Brianna,
Mark, Zoey, and Sam. During baseline ses-
sions, all participants engaged in low to zero
levels of on-task behavior. When we compared
DRA and RC using a reversal design only, we
observed similar and high levels of on-task
behavior for three of the four participants
(Brianna, Mark, and Zoey); however, we
observed higher levels of on-task behavior dur-
ing RC for one participant (Sam; 62% during
DRA and 90% during RC). These data suggest
that a history of response cost is not likely to
influence responding during DRA.
Table 2 provides a summary of results from
Study 2 with respect to the percentage of selec-
tions in the choice phase and the net tokens
yielded for participants during the DRA and
RC comparison phases. We evaluated prefer-
ence for 10 of the 14 participants and calcu-
lated net tokens for all participants. Preference
results show that five participants chose DR
more than RC and five chose RC more than
DR. Although these results are similar to those
of previous studies (e.g., Donaldson et al.,
2014; Iwata & Bailey, 1974), they differ
somewhat from those of Study 1. That is, the
majority of participants preferred RC in
Study 1, but only half of the participants
preferred RC in Study 2. Also, five of the 10
participants in Study 2 for whom we assessed
preference had an average difference of at least
0.5 tokens between the two procedures, and
four of these five participants (Frank, Paul,
Adam, and Anna) preferred the procedure that
yielded more net tokens.
GENERAL DISCUSSION
Overall, DR and RC were effective proce-
dures for increasing the on-task behavior of the
majority of children who participated in a
group activity (Study 1), and these findings
replicated those of previous research (e.g.,
Donaldson et al., 2014; Iwata & Bailey, 1974).
However, similar to Donaldson et al. (2014)
and Tanol, Johnson, McComas, and Cote
(2010), the procedures were differentially effec-
tive for some individuals in the group, which
suggests that analyzing individual data is
important because these differences may not
have been observed if we reported only group
averages. The importance of analyzing individual
data is further supported by the results of
Study 2, which showed differential effects for
three participants (Adam, Emily, and Sam),
whereas the overall results suggest that the two
procedures are equally effective.
[Figure 2 (image not reproduced). One panel per participant: Paul, Elisa, Frank, Adam, Martin, Collin, Emily, Anna, Adrianna, and Caroline. Phases: BL, DRA vs RC, and Choice. Axes: % Intervals (On task) and Sessions.]
Figure 2. Percentage of on-task behavior for Paul, Frank, Martin, Emily, Adrianna, Elisa, Adam, Collin, Anna, and Caroline during RC, DRA (also denoted as D during the short reversal phases for Anna and Caroline), and baseline in the comparative analysis and choice phases. The symbol used for each data point during the choice phase represents the condition selected by the participant for that session.
Several variables might have influenced
results of the current study, including the type
of contingency used (individual vs. group
oriented) and the experimental design. Results
showed that the comparative effectiveness of
the procedures was the same for all three
participants who participated in Studies 1 and
2 (Adam, Anna, and Paul). That is, RC was
more effective than DR for Adam during the
group activity and solitary work task, and the
procedures were equally effective for Anna and
Paul under both conditions. These results sug-
gest that the presence of peers did not influ-
ence the comparative effectiveness of DR and
RC. However, an analysis of the results for
Adam and Anna shows that these participants
engaged in 10% to 20% higher levels of on-
task behavior during the group evaluation than
in the individual evaluation. These results ten-
tatively suggest that the presence of peers may
enhance the effectiveness of the procedures for
some children. Because both procedures
resulted in equally higher levels of responding
in the presence of peers, it could be that obser-
ving a peer receiving a token increases the
value of the token or functions as a discrimina-
tive stimulus for on-task behavior (during DR
conditions). In addition, the aversiveness of
token loss might also be enhanced when
tokens are removed in the presence of peers
(during RC).
Although the relative efficacy of DR and RC
was not influenced by the use of group-
oriented contingencies, the overall effectiveness
of the procedures was greater during the group
activity. These higher levels of on-task behavior
during the group activity may have been due
to the differential effort or task difficulty across
tasks in the group activity and individual activ-
ity (i.e., it may have been more effortful to
trace letters than to keep one’s hands in one’s
lap and sit on the mat). In addition, higher
levels of on-task behavior in the group activity
may have been due to the absence of a salient
alternative task, as was provided in the individ-
ual activity (i.e., toys were available). However,
there were many alternative tasks available dur-
ing the group activity, such as playing with or
manipulating the bingo boards and pieces and
leaving the mat to join other activities in the
classroom.
[Figure 3 (image not reproduced). One panel per participant: Brianna, Mark, Zoey, and Sam. Phases: BL, DRA, and RC. Axes: % Intervals (On Task) and Sessions.]
Figure 3. Percentage of on-task behavior for Brianna, Mark, Zoey, and Sam during RC, DRA, and baseline.
We used a multielement design in Study
1 and for 10 participants in Study 2. Thus,
similar effects observed across DR and RC may
have been due to multiple-treatment interfer-
ence because of the rapid alternation of condi-
tions that were similar in numerous respects.
Although we attempted to control for multiple-
treatment interference by including session
rules and discriminative stimuli, we also
attempted to address this concern by evaluating
the effects when a different design was used.
For two participants in Study 2 (Anna and
Caroline), for whom we used both a multielement
design and a reversal design to compare the
effects of DR and RC, we found similar results
regardless of which design was used. In
addition, for four participants in Study 2
(Brianna, Mark, Zoey, and Sam), for whom we
used only a reversal design to compare DR and
RC, we observed similar levels of on-task
behavior across the two procedures as well as
similar levels of on-task behavior regardless of
whether DR was conducted before or after RC.
These data suggest that the use of a
multielement design was unlikely to influence
the results.
With respect to preference, five of the 15 par-
ticipants in the choice evaluation preferred DR,
and the other 10 participants preferred RC. As
suggested in previous research (e.g., Donaldson
et al., 2014), several variables may have influ-
enced preference for the different procedures.
Participants may select the reinforcement pro-
cedure to avoid the loss condition, as observed
by Pietras, Brandt, and Searcy (2010), who
found that when they equated net tokens, par-
ticipants avoided the procedure that involved
token loss. In addition, participants may prefer
reinforcement, specifically when reinforcer
delivery is spaced evenly throughout the ses-
sion, because token delivery signals time pro-
gression through the session. That is, token
delivery provides feedback regarding the dura-
tion of the session, which may be valuable,
especially with young children.
With respect to preference for RC, the
potential aversion associated with RC may have
been eliminated because participants did not
contact loss often; as Donaldson et al. (2014)
noted, one participant mentioned preference
for RC due to losing few tokens. However,
additional variables also warrant consideration.
First, some participants may have preferred RC
because selecting the RC procedure results in
the delivery of all tokens; therefore, access to
all tokens may function as a reinforcer for
selecting that procedure. In addition,
participants may select RC over DR because,
from the child's perspective, starting with
tokens means not having to work for them. That
is, the procedure appears to be less effortful.
To rule
out influence of the presence of tokens, future
researchers might evaluate preference under
conditions in which the tokens are present for
DR and RC (i.e., a cup of tokens next to the
DRA token board and tokens attached to the
RC board) or the tokens are not present (i.e.,
placing colored strips of paper representing
each procedure or asking the participant which
procedure he or she would like to do).
Other variables that might influence prefer-
ence in the current study are the consequences
that followed selection of a particular condition
(DRA vs. RC) and the net tokens earned
within a particular condition. Participants in
the group evaluation may have chosen a
different procedure the next time they were
offered a choice if the experimenter did not
implement the procedure they had chosen in a
given session. However, an evaluation of data
for participants in Group 2 showed that, when
the session that the experimenter implemented
after a selection did not match the initial
selection, participants switched their selection
at the subsequent choice opportunity on 38%
(Paul), 38% (Molly), and 50% (Judy) of
selections. These results suggest that switches
in selections were not influenced by whether the
session that was implemented matched the
procedure they had selected, and these findings
are consistent with those of Layer et al. (2008).
Table 2
Percentage of Selections and Average Net Tokens Yielded
for Participants in Study 2 (Individual Analysis)
              % selections    Average net tokens
Participant   DR      RC      DR      RC
Frank         100     0       8.9     8.3
Paul          100     0       9.6     9.1
Martin        82      18      9.3     9.3
Adrianna      75      25      9.6     9.7
Emily         67      33      8.5     8.2
Adam          28      72      4.1     5.3
Elisa         21      79      8.4     8.2
Anna          18      82      7.6     8.8
Collin        0       100     9.8     9.1
Caroline      0       100     5.7     5.4
Brianna       --      --      8.7     8.7
Mark          --      --      9.4     9.6
Zoey          --      --      9.4     8.7
Sam           --      --      6.1     8.8
Previous researchers have evaluated the
potential influence of net tokens across DR and
RC conditions. Iwata and Bailey (1974) calcu-
lated the average number of net tokens for the
class, and Donaldson et al. (2014) calculated
individual net token averages; both studies
found that net tokens were similar across proce-
dures. Although the number of net tokens was
similar, because some participants preferred one
procedure over another, it could be that even
slight differences may influence preference. In
the current study, we were able to evaluate
preference for 15 participants (twice with Paul)
and found that seven of the 14 children who
participated once (and Paul on one occasion in
Study 2) yielded an average difference of at
least 0.5 tokens between the two procedures.
Of these eight participants, seven preferred the
procedure for which more net tokens were
yielded during the comparison phase. However,
in previous research and in the current study,
experimenters did not manipulate the number
of net tokens. Therefore, the influence of net
tokens on preference is unknown, and research
on this variable is warranted.
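The counts in this paragraph can be rechecked from the values reported in Tables 1 and 2. The Python sketch below is only an arithmetic check, not an analysis the authors report running; the tuple layout and the function name `summarize` are hypothetical, and a participant is treated as preferring the procedure selected on more than half of choice opportunities.

```python
# (participant, study, % DR selections, % RC selections, DR net tokens, RC net tokens)
# Values copied from Table 1 (Study 1) and Table 2 (Study 2); only participants
# with choice data are included.
ROWS = [
    ("Paul", 1, 67, 33, 9.8, 9.9), ("Molly", 1, 22, 78, 8.5, 9.4),
    ("Judy", 1, 11, 89, 9.4, 9.0), ("Carl", 1, 0, 100, 7.3, 9.1),
    ("Jack", 1, 0, 100, 9.1, 9.1), ("Lance", 1, 0, 100, 8.9, 9.6),
    ("Frank", 2, 100, 0, 8.9, 8.3), ("Paul", 2, 100, 0, 9.6, 9.1),
    ("Martin", 2, 82, 18, 9.3, 9.3), ("Adrianna", 2, 75, 25, 9.6, 9.7),
    ("Emily", 2, 67, 33, 8.5, 8.2), ("Adam", 2, 28, 72, 4.1, 5.3),
    ("Elisa", 2, 21, 79, 8.4, 8.2), ("Anna", 2, 18, 82, 7.6, 8.8),
    ("Collin", 2, 0, 100, 9.8, 9.1), ("Caroline", 2, 0, 100, 5.7, 5.4),
]


def summarize(rows, min_diff=0.5):
    """Return (evaluations with a net-token difference >= min_diff,
    how many of those preferred the procedure that yielded more tokens)."""
    # Round to one decimal place so that float error does not hide a 0.5 difference.
    flagged = [r for r in rows if round(abs(r[4] - r[5]), 1) >= min_diff]
    # Preference matches when the more-selected procedure is also the one
    # with more net tokens (DR in both cases, or RC in both cases).
    matched = [r for r in flagged if (r[2] > r[3]) == (r[4] > r[5])]
    return len(flagged), len(matched)


if __name__ == "__main__":
    print(summarize(ROWS))                             # both studies
    print(summarize([r for r in ROWS if r[1] == 2]))   # Study 2 only
```

The check is descriptive only; as the paragraph notes, net tokens were not manipulated, so agreement between preference and net tokens does not establish that one influenced the other.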
Another point of discussion relates to
best practice guidelines. The general
recommendation is to use reinforcement-based
procedures when possible (Bailey & Burch,
2005). Therefore, because RC is a negative
punishment procedure (Kazdin, 1977), RC
often is not recommended before implementa-
tion of positive reinforcement procedures.
However, given that (a) RC is just as effective
as reinforcement, (b) RC has limited side
effects (Kazdin, 1972), (c) more participants
preferred RC in the current study, and
(d) previous researchers have also found prefer-
ence for punishment procedures (e.g., Hanley,
Piazza, Fisher, & Maglieri, 2005), reconsidera-
tion of best practice appears to be warranted.
Perhaps the use of effective and preferred pro-
cedures should be considered best practice
(e.g., Hanley, 2010).
There are several areas for future research.
First, we were able to compare responding of
only three individuals who participated in both
the group activity and solitary work task; there-
fore, our conclusions about the effects of peer
presence are limited, and future researchers
should consider conducting this evaluation with
a larger number of participants. Second,
because we conducted both preference evalua-
tions in Studies 1 and 2 in the absence of peers,
we were unable to compare choice in the pres-
ence versus absence of peers.
Third, we did not collect data on side effects
of the procedures, which may be important,
specifically given the possibility of negative
side effects (e.g., emotional responding or
increases in problem behavior) when RC
procedures are used. However, few to no
negative side effects have been reported during
the use of RC procedures (Conyers et al., 2004;
Kazdin, 1972), nor were negative side effects
observed in the current study.
Fourth, future researchers should include a
measure of accuracy. In the current study, we
selected on-task behavior because it was age
appropriate, but we did not measure the accu-
racy of responding. Iwata and Bailey (1974)
showed decreases in rule violations without
increasing correct responding. Because on-task
behavior is a prerequisite for accurate respond-
ing in many situations, correct responding
should increase as children are attending; there-
fore, future researchers should measure changes
in accuracy when reinforcement and punish-
ment contingencies are in effect for on-task
behavior.
Fifth, we arranged individual contingencies,
rather than interdependent group-oriented con-
tingencies or dependent group-oriented contin-
gencies. Individual and interdependent group-
oriented contingencies require that the teacher
monitor the behavior of each child and then
deliver consequences based on the behavior of
each child individually or for the behavior of
the group, respectively; on the other hand, a
dependent group-oriented contingency requires
that a teacher monitor the behavior of only one
child in the group. Herman and Tramontana
(1971) found no difference in the effectiveness
of individual and group contingencies and sug-
gested that group contingencies may be easier
for teachers. Therefore, future researchers
should compare DR and RC using dependent
and interdependent group-oriented contingen-
cies (see Litow & Pumroy, 1975, for a brief
review of group contingencies).
Finally, because we associated specific colors
with the different procedures, children’s choices
for procedures may have been based on prefer-
ence for color rather than procedure. However,
anecdotal reports do not suggest that partici-
pants had strong preferences for colors (i.e., it
was not common for participants to report
color preference during the choice evaluation).
Future researchers might control for the influ-
ence of color preferences by using low or mod-
erately preferred colors for the stimuli used for
the DR and RC procedures (e.g., Luczynski &
Hanley, 2009) or changing the colors associ-
ated with the procedures throughout the study.
In summary, there are several important
implications of the current study. First, the
results suggest that both DR and RC are
similarly effective; therefore, teachers might use
the procedure that more children prefer or that
is easier to implement in a classroom setting.
Second, the presence of peers does not appear
to influence the relative efficacy of the proce-
dures; therefore, future researchers might con-
tinue to conduct comparisons of DR and RC
in group settings for more efficient data collec-
tion. Finally, considerations for best practice
should take into account preference, given the
large number of participants who preferred RC.
REFERENCES
Bailey, J. S., & Burch, M. R. (2005). Ethics for behavior analysts. Mahwah, NJ: Erlbaum. doi: 10.4324/9781410613738
Brent, D. E., & Routh, D. K. (1978). Response cost and impulsive word recognition errors in reading-disabled children. Journal of Abnormal Child Psychology, 6, 211–219. doi: 10.1007/bf00919126
Broughton, S. F., & Lahey, B. B. (1978). Direct and collateral effects of positive reinforcement, response cost, and mixed contingencies for academic performance. Journal of School Psychology, 16, 126–136. doi: 10.1016/0022-4405(78)90051-1
Capriotti, M. R., Brandt, B. C., Ricketts, E. J., Espil, F. M., & Woods, D. W. (2012). Comparing the effects of differential reinforcement of other behavior and response-cost contingencies on tics in youth with Tourette syndrome. Journal of Applied Behavior Analysis, 45, 251–263. doi: 10.1901/jaba.2012.45-251
Conyers, C., Miltenberger, R., Maki, A., Barenz, R., Jurgens, M., Sailer, A., … Kopp, B. (2004). A comparison of response cost and differential reinforcement of other behavior to reduce disruptive behavior in a preschool classroom. Journal of Applied Behavior Analysis, 37, 411–415. doi: 10.1901/jaba.2004.37-411
Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The token economy: A recent review and evaluation. International Journal of Basic and Applied Science, 2, 131–149.
Donaldson, J. M., DeLeon, I. G., Fisher, A. B., & Kahng, S. (2014). Effects of and preference for conditions of token earn versus token loss. Journal of Applied Behavior Analysis, 47, 537–548. doi: 10.1002/jaba.135
Hackenberg, T. D. (2009). Token reinforcement: A review and analysis. Journal of the Experimental Analysis of Behavior, 91, 257–286. doi: 10.1901/jeab.2009.91-257
Hanley, G. P. (2010). Toward effective and preferred programming: A case for the objective measurement of social validity with recipients of behavior-change programs. Behavior Analysis in Practice, 3, 13–21.
Hanley, G. P., Piazza, C. C., Fisher, W. W., & Maglieri, K. A. (2005). On the effectiveness of and preference for punishment and extinction components of function-based interventions. Journal of Applied Behavior Analysis, 38, 51–65. doi: 10.1901/jaba.2005.6-04
Herman, S. H., & Tramontana, J. (1971). Instructions and group versus individual reinforcement in modifying disruptive group behavior. Journal of Applied Behavior Analysis, 4, 113–119. doi: 10.1901/jaba.1971.4-113
Iwata, B. A., & Bailey, J. S. (1974). Reward versus cost token systems: An analysis of the effects on students and teacher. Journal of Applied Behavior Analysis, 7, 567–576. doi: 10.1901/jaba.1974.7-567
Kazdin, A. E. (1972). Response cost: The removal of conditioned reinforcers for therapeutic change. Behavior Therapy, 3, 533–546. doi: 10.1016/S0005-7894(72)80001-7
Kazdin, A. E. (1977). The token economy: A review and evaluation. New York, NY: Plenum Press. doi: 10.1007/978-1-4613-4121-5
Layer, S. A., Hanley, G. P., Heal, N. A., & Tiger, J. H. (2008). Determining individual preschoolers’ preferences in a group arrangement. Journal of Applied Behavior Analysis, 41, 25–37. doi: 10.1901/jaba.2008.41-25
Litow, L., & Pumroy, D. K. (1975). A brief review of classroom group-oriented contingencies. Journal of Applied Behavior Analysis, 8, 341–347. doi: 10.1901/jaba.1975.8-341
Luczynski, K. C., & Hanley, G. P. (2009). Do children prefer contingencies? An evaluation of the efficacy of and preference for contingent versus noncontingent social reinforcement during play. Journal of Applied Behavior Analysis, 42, 511–525. doi: 10.1901/jaba.2009.42-511
McGoey, K. E., & DuPaul, G. J. (2000). Token reinforcement and response cost procedures: Reducing the disruptive behavior of preschool children with attention-deficit/hyperactivity disorder. School Psychology Quarterly, 15, 330–343. doi: 10.1037/h0088790
Panek, D. M. (1970). Word association learning by chronic schizophrenics on a token economy ward under conditions of reward and punishment. Journal of Clinical Psychology, 26, 163–167. doi: 10.1002/1097-4679(197004)26:2<163::aid-jclp2270260208>3.0.co;2-5
Pietras, C. J., Brandt, A. E., & Searcy, G. D. (2010). Human responding on random-interval schedules of response-cost punishment: The role of reduced reinforcement density. Journal of the Experimental Analysis of Behavior, 93, 5–26. doi: 10.1901/jeab.2010.93-5
Salend, S. J., & Kovalich, B. (1981). A group response-cost system mediated by free tokens: An alternative to token reinforcement. American Journal of Mental Deficiency, 86, 184–187.
Sindelar, P. T., Honsaker, M. S., & Jenkins, J. R. (1982). Response cost and reinforcement contingencies of managing the behavior of distractible children in tutorial settings. Learning Disability Quarterly, 5, 3–13. doi: 10.2307/1510610
Tanol, G., Johnson, L., McComas, J., & Cote, E. (2010). Responding to rule violations or rule following: A comparison of two versions of the Good Behavior Game with kindergarten students. Journal of School Psychology, 48, 337–355. doi: 10.1016/j.jsp.2010.06.001
Received December 2, 2014
Final acceptance October 8, 2015
Action Editor, Jeanne Donaldson
- EFFICACY OF AND PREFERENCE FOR REINFORCEMENT AND RESPONSE COST IN TOKEN ECONOMIES