Literature review on behavior analysis

I need a literature review for a capstone project. I will provide articles and an outline with the purpose and hypothesis to work with. Feel free to add any more articles that you think will fit this literature review.

Valeria Castano

Literature Review Outline

Introduction

· Extension to Carnett et al. (2014) study

· Study the effects of perseverative interest-based token economies on on-task behavior

· Research is warranted to identify the most effective method for increasing on-task behavior.

· Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.

Body

Token Economies

· Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The token economy: A recent review and evaluation. International Journal of Basic and Applied Science, 2(1), 131-149.

· Hine, J. F., Ardoin, S. P., & Call, N. A. (2018). Token economies: Using basic experimental research to guide practical applications. Journal of Contemporary Psychotherapy, 48(3), 145-154.

· Williamson, R. L., & McFadzen, C. (2020). Evaluating the Impact of Token Economy Methods on Student On-task Behaviour within an Inclusive Canadian Classroom. International Journal of Technology and Inclusive Education (IJTIE), 9(1), 1531-1541.

· Boniecki, K. A., & Moore, S. (2003). Breaking the Silence: Using a Token Economy to Reinforce Classroom Participation. Teaching of Psychology, 30(3), 224.

Interests

· Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.

· Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for children with autism. Journal of Autism and Developmental Disorders, 28(3), 189-198.

· Hirst, E. S. J., Dozier, C. L., & Payne, S. W. (2016). Efficacy of and preference for reinforcement and response cost in token economies. Journal of Applied Behavior Analysis, 49(2), 329.

· Soares, D. A., Harrison, J. R., Vannest, K. J., & McClelland, S. S. (2016). Effect Size for Token Economy Use in Contemporary Classroom Settings: A Meta-Analysis of Single-Case Research. School Psychology Review, 45(4), 379–399.

Conclusion

· Main findings: Studies have shown that including objects of interest in a client’s token board increases engagement.

· This study replicates past research conducted by Carnett et al. (2014) but also extends it by using multiple participants.

· The purpose of this capstone is to extend the work of Carnett et al. (2014) and Charlop-Christy and Haymes (1998) by comparing the effects on on-task behavior of a token economy that includes the child’s perseverative interest with one that does not.

· I hypothesize that a token economy incorporating the client’s perseverative interest will increase on-task behavior.

References:

Boniecki, K. A., & Moore, S. (2003). Breaking the Silence: Using a Token Economy to Reinforce Classroom Participation. Teaching of Psychology, 30(3), 224.
Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee, A., Sigafoos, J., & Machalicek, W. (2014). Effects of a Perseverative Interest-Based Token Economy on Challenging and On-Task Behavior in a Child with Autism. Journal of Behavioral Education, 23(3), 368–377.
Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for children with autism. Journal of Autism and Developmental Disorders, 28(3), 189-198.
Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The token economy: A recent review and evaluation. International Journal of Basic and Applied Science, 2(1), 131-149.
Hine, J. F., Ardoin, S. P., & Call, N. A. (2018). Token economies: Using basic experimental research to guide practical applications. Journal of Contemporary Psychotherapy, 48(3), 145-154.
Hirst, E. S. J., Dozier, C. L., & Payne, S. W. (2016). Efficacy of and preference for reinforcement and response cost in token economies. Journal of Applied Behavior Analysis, 49(2), 329.
Soares, D. A., Harrison, J. R., Vannest, K. J., & McClelland, S. S. (2016). Effect Size for Token Economy Use in Contemporary Classroom Settings: A Meta-Analysis of Single-Case Research. School Psychology Review, 45(4), 379–399.
Williamson, R. L., & McFadzen, C. (2020). Evaluating the Impact of Token Economy Methods on Student On-task Behaviour within an Inclusive Canadian Classroom. International Journal of Technology and Inclusive Education (IJTIE), 9(1), 1531-1541.

Breaking the Silence: Using a Token Economy
to Reinforce Classroom Participation

Kurt A. Boniecki
Stacy Moore
University of Central Arkansas

We propose a procedure for increasing student participation, par-
ticularly in large classes. The procedure establishes a token econ-
omy in which students earn tokens for participation and then
exchange those tokens for extra credit. We evaluated the effective-
ness of the procedure by recording the degree of participation in an
introductory psychology class before, during, and after implemen-
tation of the token economy. Results revealed that the amount of di-
rected and nondirected participation increased during the token
economy and returned to baseline after removal of the token econ-
omy. Furthermore, students responded faster to questions from the
instructor during the token economy than during baseline, and this
decrease in response latency continued even after removal of the to-
ken economy.

A considerable literature attests to the importance of ac-
tive learning in which students engage and process course
material rather than passively receive it (e.g., Benjamin,
1991; Bligh, 2000; Bonwell & Eison, 1991). One way instruc-
tors can facilitate active learning is to challenge the class pe-
riodically with relevant questions and encourage students to
offer questions and comments. However, instructors may
avoid this form of classroom interaction because of a phe-
nomenon we call “the silence,” the uncomfortable time fol-
lowing the instructor’s question when no one responds. The
silence is a particular problem in large classes in which stu-
dents feel relatively anonymous and are reluctant to partici-
pate (McKeachie, 2002). Instructors can use a variety of
techniques to combat the silence, such as waiting out the si-
lence (Kendall, 1994), calling on students by name (Gurung,
2002), or initiating small group discussions (McKeachie,
2002). In this article, we present another method for break-
ing the silence that is effective and easy to use, particularly in
large classes.

Our method relies on extra credit to reinforce participation.
Other faculty have used extra credit as an incentive to improve
exam performance (Junn, 1995; Nation & Bourgeois, 1978),
read journal articles (Carkenord, 1994), seek writing assis-
tance (Oley, 1992), demonstrate critical thinking (Junn,
1994), improve behavior modification projects (Barton,
1982), and avoid procrastination (Lloyd & Zylla, 1981;
Powers, Edwards, & Hoehle, 1973). Our method creates a to-
ken economy in which students earn tokens for participation.
Immediately following participation, the instructor presents a
token to the student. At the end of class, students exchange
their tokens for extra credit toward their course grades.

Hodge and Nelson (1991) also used reinforcement to
shape classroom participation. In their study, the instructor
wrote students’ initials on the board and placed plus marks
next to the initials of students who exhibited the desired
amount of participation. Although similar to our method,
Hodge and Nelson’s procedure differs from ours in several
ways. For instance, their procedure is feasible only in small
classes, whereas our method is relatively easy to use in classes
of almost any size. Indeed, the first author has successfully
used our method in classrooms that seat as many as 200 stu-
dents. Also, Hodge and Nelson evaluated the effectiveness of
their technique based on students’ self-reported participa-
tion. In contrast, we evaluated the effectiveness of our
method more objectively by having a research assistant ob-
serve the degree of student participation prior, during, and af-
ter the token economy.

Method

Participants

Sixty-three undergraduate students enrolled in an intro-
ductory psychology course at the University of Central Ar-
kansas participated in the study.

Procedure

The class met 75 min twice weekly for 16 weeks. We con-
ducted the study over the final 11 class meetings of the term.
During each of these 11 class meetings, the instructor period-
ically directed relevant questions to the class, and students
who wanted to answer the questions raised their hands. The
instructor then called on students in the order in which they
raised their hands until a student answered the question cor-
rectly. If no one raised a hand within 60 sec following a ques-
tion, the instructor announced the answer and continued
with the lecture.

The first 4 of the 11 class meetings served as the baseline
period. During this time, students did not receive any explicit
reward for answering a question correctly. Over the next 4
class meetings, the instructor implemented the token econ-
omy. The instructor announced that the first person to an-
swer a question correctly would receive a token. The tokens
were wooden checker pieces purchased from a local hobby
store. The pieces were heavy enough to throw, but light
enough not to cause injury if they missed their target. At the
end of each class meeting, students could exchange each to-
ken for one point added to their next exam grade. Each exam
point was worth 0.25% of the course grade. If students did
not turn in their tokens at the end of the class meeting, those
tokens were void, and students could not exchange them for
extra credit in the future. This rule ensured that the instruc-
tor had to keep a supply of tokens for only one class meeting
and avoided claims of lost tokens. During the final 3 class
meetings, the instructor discontinued the token economy
and informed students that they could no longer earn tokens
for correct answers. As required by our university’s institu-
tional review board, the instructor also provided students
who had not earned extra credit during the token economy
with alternative extra credit opportunities during the re-
moval period. After the removal period, the instructor fully
debriefed students about the study.

During each of the final 11 class meetings, a research assis-
tant sat in the last row of the classroom where she had an un-
obstructed view of all students and posed as a student in the
class (e.g., by pretending to take notes). The research assis-
tant recorded the amount of directed participation (number
of students who raised their hands in response to a question
from the instructor), latency to participation (amount of time
following each question until the first hand was raised), and
amount of nondirected participation (number of times any
student spontaneously asked the instructor a question or en-
gaged the instructor in discussion). The research assistant
measured latency using a hand-operated digital stopwatch,
which she kept hidden at all times.

Results

The instructor asked 16 questions during baseline, 14 dur-
ing the token economy, and 16 during removal. Overall, the
instructor asked a mean of 4.18 questions per class meeting.
Only once did no student raise a hand following a question
from the instructor. We recorded and analyzed this question,
which occurred during baseline, as zero directed participa-
tion, but removed it from the analysis of latency to participa-
tion. Table 1 presents a summary of all three dependent
measures across the three phases.

Directed Participation

We analyzed amount of directed participation using fo-
cused chi-square tests. We adjusted the expected frequencies
to control for the different number of questions across the
three phases. Compared to baseline, significantly more stu-
dents raised their hands in response to the instructor’s ques-
tions during the token economy, χ²(1, N = 77) = 11.85, p < .001. Furthermore, students raised significantly fewer hands during removal than during the token economy, χ²(1, N = 77) = 11.85, p < .001, but the number of hands raised during removal was not significantly different from baseline, χ²(1, N = 52) = 0.00.
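
The adjustment of expected frequencies described above can be illustrated with a short calculation. This is a sketch, not the authors' analysis code: it assumes the raw hand counts can be recovered from the reported per-question means and question counts (roughly 26 hands at baseline and at removal, 51 during the token economy), and the function name `focused_chi_square` is hypothetical.

```python
import math

def focused_chi_square(observed, exposure):
    """Two-phase goodness-of-fit chi-square (1 df).

    Expected counts are proportional to each phase's exposure
    (here, the number of questions the instructor asked).
    """
    total = sum(observed)
    total_exposure = sum(exposure)
    expected = [total * e / total_exposure for e in exposure]
    chi2 = sum((o - x) ** 2 / x for o, x in zip(observed, expected))
    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# ASSUMPTION: counts inferred as (mean hands per question) x (questions asked),
# e.g. 1.63 x 16 is about 26 and 3.64 x 14 is about 51.
hands = {"baseline": 26, "token": 51, "removal": 26}
questions = {"baseline": 16, "token": 14, "removal": 16}

chi2_bt, p_bt = focused_chi_square(
    [hands["baseline"], hands["token"]],
    [questions["baseline"], questions["token"]],
)
chi2_br, p_br = focused_chi_square(
    [hands["baseline"], hands["removal"]],
    [questions["baseline"], questions["removal"]],
)

print(f"baseline vs. token economy: chi2 = {chi2_bt:.2f}, p = {p_bt:.4f}")
print(f"baseline vs. removal:       chi2 = {chi2_br:.2f}, p = {p_br:.4f}")
```

Under these assumed counts the baseline versus token economy comparison comes out at about 11.85, matching the reported statistic, and the baseline versus removal comparison is zero because both phases have the same count and the same number of questions.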

Latency to Participation

We conducted a one-way ANOVA of the latency data.
Each question from the instructor, rather than each student
in the class, constituted the unit of analysis. The ANOVA re-
vealed a significant difference between the mean latencies of
the three phases, F(2, 42) = 8.23, p = .001, η = .53. Tukey’s
honestly significant difference (HSD) test indicated that stu-
dents raised their hands significantly faster during the token
economy than during baseline (p = .001). However, Tukey’s
HSD tests showed that latency to participation during re-
moval was not significantly slower than during the token
economy (p > .20), but was significantly faster than during
baseline (p = .05).
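
The reported effect size can be cross-checked against the F statistic. The sketch below uses the standard one-way ANOVA identity η² = (F × df_between) / (F × df_between + df_within); the helper name `eta_from_f` is hypothetical, and the count of 45 analyzed questions is an inference from the 46 questions asked minus the one question excluded from the latency analysis.

```python
import math

def eta_from_f(f_value, df_between, df_within):
    """Return (eta squared, eta) recovered from a one-way ANOVA F statistic."""
    eta_squared = (f_value * df_between) / (f_value * df_between + df_within)
    return eta_squared, math.sqrt(eta_squared)

# ASSUMPTION: 16 + 14 + 16 questions were asked, and one baseline question
# with no raised hand was dropped from the latency analysis.
n_latencies = 16 + 14 + 16 - 1
n_phases = 3
df_between = n_phases - 1            # 2
df_within = n_latencies - n_phases   # 42

eta_squared, eta = eta_from_f(8.23, df_between, df_within)
print(f"df = ({df_between}, {df_within}), "
      f"eta^2 = {eta_squared:.3f}, eta = {eta:.3f}")
```

With F(2, 42) = 8.23 this gives η of roughly .53, which suggests the reported value is η (the square root) rather than η².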

Nondirected Participation

We analyzed amount of nondirected participation using
focused chi-square tests. We adjusted the expected frequen-
cies to control for the different number of class meetings
across the three phases. Compared to baseline, students
spontaneously participated significantly more during the to-
ken economy, χ²(1, N = 125) = 19.21, p < .001. However, during removal students spontaneously participated significantly less than during the token economy, χ²(1, N = 120) = 11.56, p < .001. Furthermore, nondirected participation did not significantly differ between baseline and removal, χ²(1, N = 71) = 0.38, p > .44.

Discussion

As we hoped, the amount of directed and nondirected par-
ticipation dramatically increased following the implementa-
tion of the token economy. Students were more than twice as
likely to raise their hands following a question during the to-
ken economy than during baseline. Likewise, students were
more than twice as likely to ask questions and to make com-
ments spontaneously during the token economy than during
baseline, even though the instructor did not directly rein-
force this form of participation with tokens. Thus, in general,
students appeared more willing to contribute to the class dur-
ing the token economy. Once the instructor removed the to-
ken economy, both directed and nondirected participation
fell back to baseline levels, but not below them. This result
suggests that the token economy did not reduce students’ in-
trinsic motivation to participate.

Table 1. Means for the Dependent Measures Across the Three Phases

Dependent Measure                          Baseline   Token Economy   Removal
Directed participation/question            1.63a      3.64b           1.63a
Latency to participation/question (sec)    6.16a      0.56b           2.93b
Nondirected participation/class period     9.50a      21.75b          11.00a

Note. Values within a row not sharing a subscript letter are significantly
different (p ≤ .05). Time latencies are reported in seconds.

We also were impressed by the shorter amount of time it
took students to respond to a question during the token econ-
omy compared to baseline. During baseline, an average of 6 sec
passed before a student raised a hand, but during the token
economy, this latency dropped to less than 1 sec. A person may
question whether a student can formulate a thoughtful answer
in less than 1 sec. Although we collected no data to address
this concern directly, the instructor and research assistant no-
ticed little change in the quality of students’ responses across
the phases of the study. Furthermore, we believe that, during
the token economy, students often raised their hands not be-
cause they had an answer, but because they wanted to be the
first to answer. Reder (1987) showed that students can quickly
assess whether they know an answer before actually recalling
the answer from memory. Indeed, during the token economy,
many students took a few seconds to formulate their response
after being called on by the instructor. In contrast, during re-
moval, when there was no competition for tokens, students ap-
peared to wait until they formulated an answer before raising
their hands—nearly 3 sec, on average, after the instructor
asked the question. However, this latency during removal was
still half the latency of the baseline phase, which suggests that
the token economy may have a lasting effect on the speed of
participation.

The contingency between the presence of the token
economy and the amount and speed of participation
strongly suggests that the tokens were responsible for in-
creasing participation. However, we are aware that the de-
sign of this study does not allow a definitive causal
conclusion. A comparable control group and random as-
signment would have provided a stricter test of the token
economy’s effectiveness, but these methodological luxuries
were not possible. Thus, alternative explanations abound.
For example, the instructor covered different topics across
the three periods—developmental psychology during base-
line, personality and psychological disorders during the to-
ken economy, and therapies and social psychology during
removal. Perhaps the topics covered during the token econ-
omy facilitated more participation than the topics covered
during baseline and removal. Nonetheless, we have confi-
dence in the token economy for two reasons beyond these
results. First, a large body of research attests to the effec-
tiveness of token economies and other operant techniques
to modify human behavior (Glynn, 1990; Kazdin, 1982;
Miltenberger, 1997). Second, the instructor in this study
(the first author) has used the token economy effectively
across the entire terms of several courses.

In all the classes in which the instructor has used the token
economy, only one student has complained of being unable
to earn tokens. One way of avoiding this complaint is to pro-
vide alternative extra credit opportunities, although too
many opportunities may reduce the token economy’s effec-
tiveness. Another way is to set a maximum limit on the num-
ber of tokens that can be earned. The “faster” students reach
the limit early, thereby increasing the chance of other stu-
dents earning tokens.

We believe the token economy procedure is a simple and
effective means of breaking the silence, especially in large
classes. In addition, the procedure serves as an excellent
demonstration of operant conditioning and the utility of to-
ken economies. Indeed, during the removal period, while
the instructor described token economies, one student
spontaneously noted that the instructor had used a token
economy to increase students’ participation. We believe
this sudden connection promotes an “a-ha” experience for
the class and a deeper understanding of the material. Fur-
thermore, the first author has noticed an increase in stu-
dent attendance, enthusiasm, and preparation when he has
used the token economy. Students have commented that
they enjoy the procedure because it makes class more excit-
ing and interactive.

Finally, the token economy system described in this study
is flexible and easily adapted to an instructor’s teaching style.
We understand that some instructors do not like to use extra
credit in their courses. However, instead of extra credit to-
ward the students’ course grades, tokens could be worth
credit toward “purchasing” desirable options, such as drop-
ping a quiz or being excused from the final exam (see Komaki,
1975). Alternatively, instructors could replace tokens with
other easily delivered rewards, such as candy. As long as stu-
dents perceive a contingency between some positive rein-
forcer and their participation, instructors may develop
variations to suit their teaching style.

References

Barton, E. J. (1982). Facilitating student veracity: Instructor applica-
tion of behavioral technology to self modification projects.
Teaching of Psychology, 9, 99–101.

Benjamin, L. T., Jr. (1991). Personalization and active learning in
the large introductory psychology class. Teaching of Psychology, 18,
68–74.

Bligh, D. A. (2000). What’s the use of lectures? San Francisco:
Jossey-Bass.

Bonwell, C. C., & Eison, J. A. (1991). Active learning: Creating excite-
ment in the classroom (Rep. No. ISBN–1–878380–08–7). Washing-
ton, DC: School of Education and Human Development, George
Washington University. (ERIC Document Reproduction Service
No. ED 336049)

Carkenord, D. M. (1994). Motivating students to read journal arti-
cles. Teaching of Psychology, 21, 162–164.

Glynn, S. M. (1990). Token economy approaches for psychiatric pa-
tients: Progress and pitfalls over 25 years. Behavior Modification,
14, 383–407.

Gurung, R. (2002, June). Sleeping students don’t talk (or learn): En-
hancing active learning via class participation. In P. Price (Chair),
Active learning in the classroom: Overview and methods. Symposium
conducted at the 14th annual meeting of the American Psycho-
logical Society, New Orleans, LA.

Hodge, G. K., & Nelson, N. H. (1991). Demonstrating differential
reinforcement by shaping classroom participation. Teaching of Psy-
chology, 18, 239–241.

Junn, E. (1994). “Pearls of wisdom”: Enhancing student class partici-
pation with an innovative exercise. Journal of Instructional Psychol-
ogy, 21, 385–387.

Junn, E. N. (1995). Empowering the marginal student: A skills-based
extra-credit assignment. Teaching of Psychology, 22, 189–192.

Kazdin, A. E. (1982). The token economy: A decade later. Journal of
Applied Behavior Analysis, 15, 431–445.

Kendall, B. (1994). Moment of silence. In E. Bender, M. Dunn, B.
Kendall, C. Larson, & P. Wilkes (Eds.), Quick hits: Successful strat-
egies by award winning teachers (p. 18). Bloomington: Indiana Uni-
versity Press.

Komaki, J. (1975). Neglected reinforcers in the college classroom.
Journal of Higher Education, 46, 63–74.

Lloyd, M. E., & Zylla, T. M. (1981). Self-pacing: Helping students
establish and fulfill individualized plans for pacing unit tests.
Teaching of Psychology, 8, 100–103.

McKeachie, W. J. (2002). McKeachie’s teaching tips: Strategies, re-
search, and theory for college and university teachers (11th ed.).
Boston: Houghton Mifflin.

Miltenberger, R. G. (1997). Behavior modification: Principles and pro-
cedures. Pacific Grove, CA: Brooks/Cole.

Nation, J. R., & Bourgeois, A. E. (1978). PASS, an alternative
method of teaching introductory psychology. Research in Higher
Education, 8, 273–282.

Oley, N. (1992). Extra credit and peer tutoring: Impact on the qual-
ity of writing in introductory psychology in an open admissions col-
lege. Teaching of Psychology, 19, 78–81.

Powers, R. B., Edwards, K. A., & Hoehle, W. F. (1973). Bonus
points in a self-paced course facilitates exam-taking. Psychologi-
cal Record, 23, 533–538.

Reder, L. M. (1987). Strategy selection in question answering. Cog-
nitive Psychology, 19, 90–138.

Notes

1. We thank Bill Lammers and Timothy Johnston for their helpful
comments on an earlier draft of this article.

2. Send correspondence to Kurt A. Boniecki, University of Central
Arkansas, Department of Psychology and Counseling, 201
Donaghey Avenue, UCA Box 4915, Conway, AR 72035; e-mail:
kurtb@mail.uca.edu.

Effects on Content Acquisition of Signaling Key Concepts
in Text Material

Jeffrey S. Nevid
Jodi L. Lampmann
St. John’s University

Eighty college students read textbook passages that either included
marginal inserts to signal key concepts or did not include these in-
serts. Signaling key concepts enhanced performance on content quiz-
zes overall and on subsets of items assessing signaled material.
Performance was not affected on subsets of items for nonsignaled
content. Students reported preferring the signaled format and found
it both clearer and easier to understand than the nonsignaled format.
Signaling key concepts by extracting and highlighting them in mar-
ginal inserts may facilitate encoding and retention of these concepts.

Even in this day of multimedia enhancements in the class-
room, textbooks remain very much at the core of the learning
process. In recent years, increasing concerns about declining
student competencies in mastering basic subject matter have
led to the incorporation of numerous pedagogical aids
(Weiten & Wight, 1992), including the SQ3R study method,
marginal running glossaries, pronunciation guides, built-in or
accompanying study guides, self-scoring quizzes, chap-
ter-by-chapter learning objectives, and interactive laboratory
demonstrations on CD–ROMs and companion Web sites.
Publishers are spending increasing amounts of money pro-
ducing textbooks, and this increase is passed along to con-
sumers via higher prices (Weiten & Wight, 1992). Despite
these changes, it remains unclear whether the benefits of
learning enhancements are worth the additional costs. Sur-
prisingly, there is little research on the use of pedagogical fea-
tures as learning devices.

Most reported studies on textbook pedagogy are limited to
student surveys. In one survey, Weiten, Guadagno, and Beck
(1996) assessed student familiarity with pedagogical devices,
their likelihood of using them, and their perceptions of the de-
vices’ value. Students were generally familiar with most peda-
gogical aids, but reported they rarely used some of the aids,
such as outlines and discussion questions. Among the most
highly valued and widely used pedagogical aids were boldfaced
technical terms, chapter summaries, and running or chapter
glossaries.

Other investigators reported similar findings, with students
generally endorsing the value of boldfaced technical terms,
running or chapter glossaries, chapter summaries, and
self-tests (Marek, Griggs, & Christopher, 1999; Weiten,
Deguara, Rehmke, & Sewell, 1999). Students also tend both
to value and make greater use of pedagogical devices that take
little time to read and those that they perceive as relevant in
helping them prepare for course examinations (Marek et al.,
1999; Weiten et al., 1996). Students appear to be more con-
cerned with meeting course demands and less concerned with
developing more elaborate study patterns (Marek et al., 1999).

Weiten and his colleagues (1999) reported small, but sig-
nificant positive correlations between grade point averages
and students’ ratings of how likely they were to use pedagogi-
cal devices. Although correlational links between academic
success and use of pedagogical aids may be encouraging, they
cannot be used as a basis for drawing cause–effect relations.

Effect Size for Token Economy Use in Contemporary
Classroom Settings: A Meta-Analysis of Single-Case Research

Denise A. Soares
University of Mississippi

Judith R. Harrison
Rutgers University

Kimberly J. Vannest
Texas A&M University

Susan S. McClelland
University of Mississippi

Abstract. Recent meta-analyses of the effectiveness of token economies (TEs)
report insufficient quality in the research or mixed effects in the results. This study
examines the contemporary (post-Public Law 94-142) peer-reviewed published
single-case research evaluating the effectiveness of TEs. The results are stratified
across quality of demonstrated functional relationship using a nonparametric
effect size (ES) that controls for undesirable baseline trends in the analysis. In
addition, moderators (i.e., classroom setting, age of participant, outcomes, use of
response cost, and use of verbal cueing) were analyzed. Eighty-eight AB phase
contrasts were calculated from 28 studies (1980–2014) representing 90 partici-
pants and produced a weighted mean ES of 0.82 (SE = 0.03, 95% CI [0.77, 0.88]).
Strong quality produced a combined weighted mean ES of 0.85 (SE = 0.642, 95%
CI [0.74, 0.97]). Moderator analyses revealed that a TE was slightly more
effective for youth between the ages of 6 and 15 years than for children between
the ages of 3 and 5 years or when used with behavioral goals in comparison to
academic goals. However, no difference was found when implemented in general
or special education settings or with the inclusion of response cost or verbal
cueing.
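
As a rough illustration of how a standard error relates to the confidence intervals quoted in the abstract, the sketch below applies the usual normal approximation (estimate ± 1.96 × SE). This is an assumption for illustration only: the abstract does not state how the intervals were computed, and the helper names `normal_ci` and `se_from_ci` are hypothetical.

```python
def normal_ci(estimate, se, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se

def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric normal-approximation interval."""
    return (upper - lower) / (2 * z)

# Overall weighted mean ES as quoted in the abstract: 0.82 (SE = 0.03).
lower, upper = normal_ci(0.82, 0.03)
print(f"approximate 95% CI: [{lower:.3f}, {upper:.3f}]")

# SE implied by the quoted interval for the strong-quality subset, [0.74, 0.97].
implied_se = se_from_ci(0.74, 0.97)
print(f"implied SE for the strong-quality subset: {implied_se:.3f}")
```

For the overall estimate this approximation lands within rounding distance of the quoted interval [0.77, 0.88]. For the strong-quality subset, the quoted interval implies a standard error of roughly 0.06 under the same approximation, so the printed value of 0.642 is worth checking against the published article.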

Correspondence concerning this paper should be addressed to Denise A. Soares, University of Mississippi,
P.O. Box 1848, 49 Guyton Drive University, MS 38677; e-mail: dasoares@olemiss.edu

Copyright 2016 by the National Association of School Psychologists, ISSN 0279-6015, eISSN 2372-966x

School Psychology Review, 2016, Volume 45, No. 4, pp. 379–399

A token economy (TE) is one of a handful of interventions found in classroom
settings. Based on the well-established principles of reinforcement described by
Skinner (1931), a TE is a secondary reinforcement system (Alberto & Troutman,
2003), whereby inherently neutral items (i.e., tokens) are awarded
for the demonstration of targeted behaviors.
Tokens are accumulated and exchanged for
backup reinforcers valued by the student (Ka-
zdin, 1971; Simonsen, Fairbanks, Briesch,
Myers, & Sugai, 2008). A TE has historically
been considered a best-practices behavior
management strategy for use in schools (Fil-
check, McNeil, Greco, & Bernard, 2004; Mat-
son & Boisjoli, 2009) and is one intervention
frequently implemented within the positive
behavioral interventions and supports frame-
work. However, the emphasis on meta-ana-
lytic thinking (see Maggin, Chafouleas, God-
dard, & Johnson, 2011) that evolved after the
passage of the Individuals with Disabilities
Education Improvement Act (2004) and No
Child Left Behind Act (2001) has provoked
questions regarding its effectiveness (Maggin
et al., 2011). In the following sections, we
describe the historical and current research on
the use of TEs and gaps in the literature that
are addressed by this meta-analysis.

TOKEN ECONOMY RESEARCH

Numerous individual studies have
demonstrated successful application of TEs
across populations and settings. Specifi-
cally, TEs have produced positive effects
for students with emotional and behavioral
problems (Cavalier, Ferretti, & Hodges,
1997), intellectual disabilities (Millersmith,
Weber, & McLaughlin, 2013), attention def-
icit hyperactivity disorder (DuPaul & Wey-
andt, 2006), learning disabilities (Higgins,
Williams, & McLaughlin, 2001), and schizo-
phrenia (Ulmer, 1976). The use of TEs has
been effective not only in schools (Filcheck et
al., 2004) but also in residential treatment cen-
ters (Murray & Sefchik, 1992), mental health
hospitals (Hopko, Lejuez, Lepage, Hopko, &
McNeil, 2003), prisons or detention centers
(Bippes, McLaughlin, & Williams, 1986), and
colleges (Stilitz, 2009).

A TE has been deemed an effective in-
tervention by two seminal reviews (Kazdin &
Bootzin, 1972; Kazdin, 1982), two systematic
reviews (Dickerson, Tenhual, & Green-Paden,
2005; Matson & Boisjoli, 2009), and one

meta-analysis (Maggin et al., 2011). Kazdin
and Bootzin (1972) and Kazdin (1982) evalu-
ated benefits of using a TE, such as immediate
reinforcement of behavior to maintain perfor-
mance across time. They also identified obsta-
cles to its effective implementation, such as
inadequate staff training, client resistance, cir-
cumvention of contingencies, and lack of re-
sponse. In 1982, Kazdin updated the original
review, evaluating the progress in the field
since 1972. The authors found research had
uncovered solutions to previously identified
obstacles such as individualizing tokens
and backup reinforcers (i.e., frequency, value)
to enhance responsiveness, revising methods
of staff training, decreasing resistance, and
emphasizing the need to maintain effects
across time. However, the authors did not
complete systematic literature reviews, as
their goal was to identify obstacles and meth-
ods of overcoming them and not to synthesize
the literature. Thus, these two articles do not
identify or appraise the state of the literature.

Two systematic literature reviews
(Dickerson et al., 2005; Matson & Boisjoli,
2009) synthesized and reported information
from source articles. Dickerson et al. (2005)
evaluated the use of a TE to improve socially
appropriate behaviors of individuals with
mental health disorders in hospital settings.
They reviewed 13 studies (group and single-
case experimental design [SCED]) with 1,074
participants ranging from 18 to 55 years old;
29% were diagnosed with schizophrenia, 13%
with psychotic disorder, and 57% with other
mental illnesses. Results indicated that a TE
was effective for increasing adaptive behav-
iors such as work performance, social interac-
tion, and the daily care skills of these patients.
Similarly, Matson and Boisjoli (2009) evalu-
ated the effects of a TE on the behaviors of
individuals with autism and/or developmental
disabilities. They reviewed 16 group and
SCED studies conducted in multiple settings,
such as schools, homes, summer camps, group
homes, state hospitals, and a developmental
center. The 164 participants ranged in age
from 4 to 18 years; approximately 91% were
children with intellectual disabilities and 8%
were children with autism. Results indicated

School Psychology Review, 2016, Volume 45, No. 4


that a TE was associated with positive out-
comes in social, behavioral, and academic ar-
eas. Although these studies systematically re-
viewed the literature, neither quantified the
effect associated with the use of a TE in
schools through meta-analytic procedures.

Maggin et al. (2011) raised questions
regarding a TE as an evidence-based interven-
tion in the only meta-analysis to date. They
conducted a meta-analysis of SCED studies to
extend findings from earlier reviews. Maggin
et al. targeted behavioral outcomes and coded
the studies utilizing the Protocol for Assessing
Single-Subject Research Quality (PASS-RQ;
Maggin & Chafouleas, 2010) developed by
the authors, with indicators from the guide-
lines of Horner et al. (2005) for quality re-
search and the What Works Clearinghouse
(WWC) standards (Kratochwill et al., 2010).
In addition, Maggin et al. calculated four different
meta-analytic effect sizes (ESs): percent
of nonoverlapping data (PND; Scruggs &
Mastropieri, 2001) = 78.49%; improvement
rate difference (IRD; Parker, Vannest, &
Brown, 2009) = 51.47%; standardized mean
difference (SMD; Busk & Serlin, 1992) = 8.02;
and raw-data multilevel ES (RMD; Van den
Noortgate & Onghena, 2003, 2008) = 8.74.
Maggin et al. stated, “Three of the four effect
size (ES) measures found a significant improve-
ment” (p. 550). The authors reported that the
three significant ESs were PND, SMD, and
RMD (D. Maggin, personal communication,
March 2, 2016). However, the authors con-
cluded design quality did not meet WWC stan-
dards for 70% of the included studies because of
fewer than three opportunities to demonstrate an
effect (n = 7) or fewer than three data points per
phase (n = 10). Weaknesses were found in the
description of measurement procedures used; the
number of data points per phase; and the report-
ing of treatment fidelity, interobserver agree-
ment (IOA), and social validity. The message
from these findings is that rigorous research with
sufficient methodological quality is needed to
support a TE as an evidence-based intervention.
The findings of that meta-analysis can be extended by
including studies that evaluate both academic and
behavioral outcomes and by stratifying results
across design quality.

Taken together, these reviews and the meta-analysis
represent the historical research in the TE
literature. The studies (i.e., Kazdin, 1982; Kazdin &
Bootzin, 1972) were characterized as literature
reviews emphasizing many strengths of TEs
and identifying obstacles; however, the au-
thors provided summary findings and not spe-
cific data from the source articles. The more
recent reviews (Dickerson et al., 2005; Matson
& Boisjoli, 2009) indicated a TE was associated
with improvements in social, behavioral,
and academic outcomes; however, neither re-
view reported an ES, confidence intervals (CIs),
or design quality. In addition, Dickerson et al.
(2005) searched only one database, the National
Library of Medicine’s PubMed, and Matson and
Boisjoli (2009) reviewed “representative litera-
ture.” Finally, Maggin et al. (2011) found large
effects of a TE on behavioral outcomes; how-
ever, the researchers contended that a TE could
not be considered an evidence-based interven-
tion because of the quality of the research. Ad-
ditional research is needed so that a TE can
potentially be considered an evidence-based
strategy.

SCED AND QUALITY

SCED has a long history of use in the
applied fields of education and human behav-
ior and is particularly suited to school-based
practices allowing the single subject to serve
as his or her own control (Horner et al., 2005).
As such, SCED is especially relevant to re-
views of school-based use of TEs. In recent
years, experts have developed quality indica-
tors and standards for individual SCED studies
that can be utilized to synthesize the method-
ological rigor of a group of studies (Horner et
al., 2005; Kratochwill et al., 2010). Horner et
al. (2005) identified the criteria necessary to
evaluate the quality of (a) research reporting
(e.g., description of participants and settings,
social validity, research questions) and (b) de-
sign (e.g., dependent variable, independent
variable, baseline, experimental control or in-
ternal validity, external validity). In addition,

Main and Moderator Effects for Token Economies


Kratochwill et al. (2010) outlined specific cri-
teria for a study to meet standards or meet
standards with reservations based on (a) the
number of phases per design, (b) the number
of data points per phase, and (c) the percent of
IOA that must be measured. On the basis of
recommendations from experts in the field
(i.e., Maggin, Briesch, & Chafouleas, 2013),
combining conceptually relevant constructs,
such as operational definitions of participants
and settings (see Horner et al., 2005) and
sufficient evidence to support a functional re-
lationship between the dependent and inde-
pendent variable (Kratochwill et al., 2010),
results in a rigorous evaluation of SCED qual-
ity. Thus, design quality can be evaluated
based on these criteria, and outcomes can be
stratified by quality.

EFFECT SIZES

With the emphasis on evidence-based
interventions, the need for quantifying inter-
vention effects achieved through SCED stud-
ies has come to the forefront. In 2007, Parker
and Hagan-Burke found over 40 ESs; however,
the field has not come to consensus on
which is best (Manolov, Solanas, Sierra, &
Evans, 2011). Tau-U (an ES from the frequently
used Kendall’s τ and Mann-Whitney
U) is summarized by Parker, Vannest, Davis,
and Sauber (2011) as “having statistical power
that is flexible and can calculate trend only,
nonoverlap between phases only, or a combi-
nation of the two” (p. 291). Tau-U is a con-
servative measure that offers important bene-
fits of a “bottom-up” approach (Parker &
Vannest, 2012), designed to explain the im-
pact of changes at the individual phase con-
trast on the overall effect. Benefits of Tau-U’s
nonparametric bottom-up approach include (a)
consistency with visual analysis; (b) applica-
bility to short data series and simple designs;
(c) appropriateness with any design; (d) char-
acterization by strong statistical power (i.e.,
one of the strongest nonparametric tests; Parker et
al., 2011, p. 288); (e) control of Phase A trend;
and (f) usefulness at three levels—nonaggre-
gated data from a single client, aggregated
data from a complex design, and meta-analyses
(Parker et al., 2011). Tau-U allows for the
calculation of CIs and p values. All data are
used, reflecting the interventionist experimen-
tal perspective that each data point reflects
performance.

MODERATORS

Although a TE has been found to be
effective, a comparison has never been made
to determine in which environment, with
which participants, and with which outcome
measures (i.e., academic and/or behavioral) a
TE is most successful. Moderator variables
can account for variations across studies (e.g.,
characteristics of setting, participant, outcome,
or implementation). Identifying the impact of
these variables has the potential to increase the
effectiveness and efficiency of a TE for edu-
cators. Although numerous moderators could
be hypothesized, to avoid Type I error (Fairch-
ild & MacKinnon, 2009) in analyses (i.e.,
finding a moderator effect when none exists)
and minimize the chances of inaccurate re-
sults, we identified five potential moderators
for which we have strong hypotheses: (a)
classroom setting; (b) age of participant; (c)
type of outcome, academic or behavioral; (d)
use of response cost (RC); and (e) use of
verbal cueing. Next, we describe our hypoth-
eses and rationale for those hypotheses.

Some authors contend that a TE is most
effective in settings with small teacher-to-stu-
dent ratios, such as self-contained special ed-
ucation classrooms (Center & Wascom, 1984;
Kazdin & Geesey, 1980), with younger stu-
dents (Filcheck et al., 2004), and with only
behavioral outcomes (Himle, Woods, & Bu-
naciu, 2008; Jones, Weber, & McLaughlin,
2013). These findings have the potential to
limit its use in different types of classroom
settings, with certain students, and to address
specific behaviors. However, we hypothesize
that setting does not change the effectiveness
of a TE, because individual studies appear to
contradict these claims. A TE has been found to be effective
(a) in multiple instructional settings such
as inclusive, general education, special educa-
tion, and alternative settings (De Martini-
Scully, Bray, & Kehle, 2000; Rhode, Jenson,


& Reavis, 1993); (b) with differing age groups
such as elementary-age students (Akin-Little
& Little, 2004; Christensen, Young, & March-
ant, 2004; Filcheck et al., 2004), junior high
students (Carlson, Pelham, Milich, & Dixon,
1992; Cavalier et al., 1997; Feindler, Marriott,
& Iwata, 1984; Heaton & Safer, 1982), and
high school students (Schellenberg, Skok, &
McLaughlin, 1991); and (c) to increase aca-
demic (Klimas & McLaughlin, 2007; Salend,
Tintle, & Balber, 1988; Sran & Borrero, 2010)
as well as behavioral outcomes (Center &
Wascom, 1984; De Martini-Scully et al.,
2000). Thus, on the basis of these studies, we
hypothesize that these potential variables do
not moderate the effects of a TE.

Similarly, some authors contend that a
TE can be a very complex intervention, which
leads to lack of adoption in schools (Milten-
berger, 2001; Rosen, Taylor, O’Leary, &
Sanderson, 1990; Skinner, Cashwell, & Bunn,
1996). Adding procedures, such as RC or ver-
bal cueing, increases the complexity of the
intervention. RC is a procedure designed to
decrease behavior by contingently withdraw-
ing a specific amount of reinforcement follow-
ing an inappropriate behavior or response (Ka-
zdin, 1972). The impact of RC on a TE’s
effectiveness is not clear. Some previous re-
search has supported RC procedures in TE
systems (Gresham, 1979; Rapport, Murphy, &
Bailey, 1980; Witt & Elliot, 1982). Other stud-
ies (e.g., Phillips, Phillips, Fixsen, & Wolf,
1971) found RC has harmful side effects, such
as the opportunity for the implementer to over-
penalize and the possibility of decreasing the
incentive of demonstrating the target behavior.
Verbal cueing (i.e., prompting) is another pro-
cedure frequently added that increases the
complexity of a TE. Similarly, some research
findings regarding the effect of adding verbal
cueing to a TE have suggested the procedure
might increase effectiveness (Latham &
Locke, 1991) and others suggested it might
not (Balcazar, Hopkins, & Suarez, 1985;
Kluger & DeNisi, 1996). We hypothesized
that RC and verbal cueing are not necessary
for a TE to be effective. This hypothesis is
founded on research suggesting that neither
RC nor verbal cueing is necessary and on the
goal of reducing the complexity of implementation,
which might otherwise decrease adoption and use.

Although we have strong hypotheses re-
garding the moderating nature of the previ-
ously discussed variables, no meta-analysis of
TEs has reported moderator analysis results.
Maggin et al. (2011) conducted moderator
analyses with participant characteristics and
intervention features. However, they did not
report results because of “the likely presence
of family-wise error in these findings” (p.
547). Thus, we conducted a moderator anal-
ysis for which we had strong a priori hy-
potheses to decrease the risk of Type I error
(McKillup, 2011).

PURPOSE

The current study addresses limitations
to the prior literature and provides additional
information to the field by (a) evaluating re-
search design quality, (b) calculating ESs and
CIs, (c) stratifying results by design quality,
and (d) evaluating moderator analyses of peer-
reviewed literature through 2014, with a focus
on both academic and behavioral outcomes.
Thus, this meta-analysis addressed the follow-
ing research questions:

1. What is the research design quality of
studies of the effectiveness of TEs across
SCED studies?

2. What is the overall effect of TEs in
public school classrooms?

3. Do the effects of TEs differ by design
quality?

4. What are the effects of potential
moderators?

METHOD

We conducted this study in four phases
and organized the methods accordingly. Meth-
ods are adapted from Bowman-Perrott, Burke,
Zhang, and Zaini (2014). Details are included
below for literature review and study selection,
data extraction, ES and CI calculations, stratifi-
cation across methodological quality, and mod-
erator analyses. Procedures for each phase are
described in detail.


Phase 1: Literature Review and Study
Selection

We applied standard methods identified
by Cooper and Hedges (1994) to search the
EBSCO Research Complete, Education Full
Text, PsycINFO, and Education Resources Information
Center (ERIC) electronic databases.
Key words, Boolean strings, and truncated
words used to conduct the search included
“token economy,” “intervention,” “reinforce-
ment,” “contingency management,” “system-
atic positive reinforcement,” “tokens,” “oper-
ant conditioning,” “applied behavior analysis,”
“backup reinforcers,” “behavior therapy,”
“points,” and/or “response cost.” In addition to
the search of databases, we conducted a hand
search for titles related to TEs or secondary
reinforcement by reviewing the tables of con-
tents for the years 1980 to 2014 in 21 journals on
special education, school psychology, and be-
havioral psychology (e.g., Journal of Positive
Behavior Interventions, Behavior Therapy, Be-
havioral Interventions, and The Journal of Spe-
cial Education). We selected the year 1980 as a
delimiter based on a desire to use classroom
settings that were more likely to be inclusive of
students with and without disabilities than class-
rooms before or immediately after the passage of
Public Law 94-142 in 1975. We conducted his-
torical searches with all of the resulting screened
articles, and the process was repeated to include
studies with potential to meet the inclusion cri-
teria. The initial search conducted by the first
author yielded 1,833 results. Article titles and
abstracts were screened based on inclusion cri-
teria (described in the following subsection) re-
sulting in the elimination of 1,436 studies. We
eliminated 278 of the remaining 397 articles
based on information in the article indicating the
study was not conducted in a school setting, was
exclusively a descriptive study, or was without
peer review. Fifty-five articles remained. During
the gathering of articles, reliability checks were
conducted and assessed using simple percent of
agreement (sum of agreements / [total number of
agreements + disagreements] × 100; House,
House, & Campbell, 1981). Initial agreement for
article inclusion was 100%.
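
The simple percent-of-agreement index described above can be sketched as follows. This is a minimal illustration, not the authors' procedure; the function name and the example counts are hypothetical.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Simple percent agreement: agreements / (agreements + disagreements) x 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coded decision is required")
    return 100.0 * agreements / total


# Hypothetical counts: two raters agree on 8 of 10 inclusion decisions.
print(percent_agreement(8, 2))
```

Because this index does not correct for chance agreement, it is usually read alongside a chance-corrected statistic such as kappa.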

Inclusion Criteria

We examined the full text of 55 studies
for potential inclusion in this meta-analysis. A
TE was operationally defined as a program in
which students earned tokens for identified
academic skills (e.g., task engagement, accu-
racy, and completion) or behaviors (e.g., dis-
ruptive, out of seat) and then exchanged the
earned tokens for backup reinforcers (Alberto
& Troutman, 2003; Martin & Pear, 2003). The
first author and a doctoral student indepen-
dently coded the 55 articles for inclusion cri-
teria in separate spreadsheets. We included
studies if they (a) were published between the
years 1980 and 2014, (b) occurred in U.S.
public school classroom settings, (c) included
school-age children (i.e., 3 to 21 years old), (d)
were published in peer-reviewed journals, and
(e) included SCED with published data in
readable graphs. We elected to include only
studies published in peer-reviewed journals because
such studies have been reviewed and filtered for quality
to maintain standards, scientific merit, and validity
(Voight & Hoogenboom, 2012). We assessed
for publication bias statistically (see
Publication Bias subsection).

Exclusion Criteria

We excluded 27 studies: 6 addressed
multicomponent interventions for which the
intervention data could not be disaggregated, 5
did not include a visual graph of data from
which raw data could be digitized, 4 were not
intervention studies, 4 were graduate theses, 3
included participants other than school-age
children, 2 were set in international class-
rooms, 1 was set in a classroom within a
residential treatment center, 1 focused on the
implementation of the TE by paraprofession-
als and did not include intervention data for
the children in the study, and 1 included the
intervention in the baseline data. After exclu-
sion of these studies, the literature search re-
sulted in 28 SCED studies in which a TE was
the intervention in a classroom setting with
school-age children.

Publication Bias

We examined the final set of selected
studies for publication bias (i.e., tendency of


studies with null effects to not be published;
Rosenthal & DiMatteo, 2001) using the Egg-
er’s test (Egger, Smith, Schneider, & Minder,
1997). The Egger’s test evaluates Y intercept = 0
using linear regression of the effect
against precision. The intercept for the Egger’s
test (1.04; 90% CI [-0.03, 2.11]; p = .52)
indicated that statistically significant publication
bias was not found in this sample. Heterogeneity
was measured using H and I² (Higgins
& Thompson, 2002), where H = 1.5
(95% CI [1.2, 1.8]) and I² = 54.0% (95% CI
[29.5, 70.0]), indicating less than notable
heterogeneity in the sample, with H values
above 1.5 considered notable (Abramson,
2011).
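
To make the two diagnostics concrete, the sketch below shows one common formulation of each. It assumes the standard Egger regression (standardized effect regressed on precision, with the intercept as the asymmetry index) and the Higgins and Thompson definitions of H and I² based on Cochran's Q. The function names and example values are hypothetical, and the code is not the authors' WinPEPI procedure.

```python
import math


def egger_intercept(effects, standard_errors):
    """OLS of (effect / SE) on (1 / SE); returns (intercept, slope).

    An intercept near 0 is read as little evidence of funnel-plot asymmetry.
    """
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def heterogeneity(effects, standard_errors):
    """Return (Q, H, I-squared in percent) using inverse-variance weights."""
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, h, i_squared


# Hypothetical effects and standard errors for three studies.
print(egger_intercept([0.9, 0.7, 0.8], [0.10, 0.20, 0.15]))
print(heterogeneity([0.9, 0.7, 0.8], [0.10, 0.20, 0.15]))
```

Under these definitions, I² = (H² − 1) / H², so the reported H = 1.5 and I² = 54.0% are consistent once rounding of H is taken into account.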

Coding

We coded for four purposes: (a) inclusion
and exclusion in the review, (b) demonstration
of a functional relationship
between the independent and dependent
variables according to Horner et al. (2005)
and the WWC standards for SCED (design
quality; Kratochwill et al., 2010; see Table 1
for specific criteria), (c) moderator vari-
ables, and (d) descriptive reporting of pro-
cedural integrity or fidelity. Specific codes
are listed in Tables 1 and 2. Each of the
coded variables provided the basic data for
the reliability analyses.

We assessed reliability of data coding
by IOA checks between the two independent
raters (the first author and a doctoral student).
Coding-sheet training and trial coding were
performed for agreement between the two rat-
ers before reliability was calculated. Within a
discussion format, the two raters identified one
example and one nonexample of each coding
variable. If difficulties in this task arose be-
cause of lack of code clarity, the pair deliber-
ated until clarity was achieved with 100%
agreement. Official coding began when a minimum
acceptable value of IOA (≥80%) was
met (Hartmann, Barrios, & Wood, 2004).
Each rater coded every variable in all articles
independently as is commonly done in com-
puting intercoder reliability in meta-analyses
(Yeaton & Wortman, 1993).

We calculated Cohen’s κ reliability
agreement for coding using NCSS (Hintze,
2004) by entering the agreement–disagreement
matrix for analysis. Kappa is a conservative
measure of reliability and perhaps even
underestimates agreement (Ary & Suen, 1989;
Strijbos, Martens, Prins, & Jochems, 2006).
Kappa was 0.96.
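
As an illustration of how kappa is obtained from an agreement–disagreement matrix, the following sketch applies the usual definition (observed agreement corrected for chance agreement from the marginal totals). The matrix values are hypothetical and are not the study's coding data; the authors used NCSS rather than this code.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square matrix of counts.

    matrix[i][j] = number of items rater 1 coded as category i
    and rater 2 coded as category j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2 x 2 matrix: rows = rater 1, columns = rater 2.
print(cohens_kappa([[20, 5], [10, 15]]))
```

In this hypothetical matrix the raters agree on 70% of items, but half of that agreement is expected by chance, which is why kappa is lower than simple percent agreement.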

Phase 2: Design Quality Evaluation

We evaluated design quality based on
published guidelines (Horner et al., 2005;
Kratochwill et al., 2010) with a goal of strat-
ifying results across study quality. Two raters
(first and second authors) independently as-
signed ratings of weak, medium, and strong to
each included study. Weak ratings are equiv-
alent to the “does not meet standards” cate-
gory and include fewer than three opportuni-
ties for demonstration of an effect and fewer
than three data points per phase. Medium rat-
ings are equivalent to the WWC “meets stan-
dards with reservations” and include at least
three opportunities for demonstration of an
effect with at least three data points per phase
for reversal or multiple-baseline designs and
four for multielement designs and reporting
of IOA. Strong ratings are equivalent to the
WWC design standards that “meet criteria”
and include at least three opportunities for
demonstration of an effect with five or more
data points per phase and reporting of IOA.
Designs that cannot provide three opportunities
for demonstration of an effect (e.g., ABCD,
ABBCC) do not meet criteria. Simultaneous,
multiple-probe, alternating-treatment
(e.g., ABCBD), and multielement designs
were coded as their underlying design (e.g.,
ABA, ABC). Horner et al. (2005) described
reporting the level of treatment integrity as
“highly desirable” (p. 174), but it was not a
tenet described by Kratochwill et al. (2010) to
meet the standards for design quality for
SCED. Thus, we did not include it in our
ratings of design quality, but we did report the
number of studies that stated the percent of
treatment integrity.
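
The rating rules in this phase can be summarized as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and any study that fails the medium criteria (including one that does not report IOA) is treated as weak, which the text implies but does not state for every combination.

```python
def rate_design_quality(opportunities: int,
                        min_points_per_phase: int,
                        ioa_reported: bool,
                        multielement: bool = False) -> str:
    """Assign weak / medium / strong following the criteria described above.

    opportunities: opportunities to demonstrate an effect
    min_points_per_phase: smallest number of data points in any phase
    ioa_reported: whether interobserver agreement was reported
    multielement: multielement designs need four points per phase for medium
    """
    if opportunities >= 3 and ioa_reported:
        if min_points_per_phase >= 5:
            return "strong"
        medium_floor = 4 if multielement else 3
        if min_points_per_phase >= medium_floor:
            return "medium"
    # Assumption: everything that does not reach medium is rated weak.
    return "weak"


print(rate_design_quality(3, 5, True))   # strong
print(rate_design_quality(3, 3, True))   # medium
print(rate_design_quality(2, 6, True))   # weak
```

Treatment integrity is deliberately absent from the function because it was reported descriptively rather than used in the ratings.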


Table 1. Tau-U Effect Sizes for Each Study by Quality and Author

Design Quality^a | Authors and Year | Outcome: Acad or Behav | Outcome: Type | Design | Phase | AB Contrasts, n | Participants, n | Tau-U | 95% CI
Weak | Filcheck et al., 2004 | Behav | Disruptive | ABACC | M | 1 | 17 | 0.67 | [0.60, 1.00]
Weak | Himle et al., 2008 | Behav | Disruptive | Multielement | N | 4 | 4 | 0.65 | [0.34, 1.00]
Weak | Kazdin & Geesey, 1980 | Acad | Task engagement | Simultaneous | N | 2 | 2 | 1.00 | [0.78, 1.00]
Weak | Kazdin & Mascitelli, 1980 | Acad | Task engagement | Simultaneous | N | 2 | 2 | 0.99 | [0.73, 1.00]
Weak | Klimas & McLaughlin, 2007 | Acad | Assignment completion | ABC | M | 3 | 1 | 1.00 | [0.76, 1.00]
Weak | Millersmith et al., 2013 | Acad | Assignment accuracy | Reversal | N | 2 | 1 | 0.97 | [0.36, 1.00]
Weak | Rosenberg, 1986 | Behav | Disruptive | ABC | N | 5 | 5 | 0.99 | [0.69, 1.00]
Weak | Salend & Allen, 1985 | Behav | Disruptive | ABCBC | N | 2 | 2 | 1.00 | [0.62, 1.00]
Weak | Sran & Borrero, 2010 | Acad | Task accuracy | ABCD | N | 12 | 4 | 0.40 | [0.22, 0.55]
Weak | Stevens, Sidener, Reeve, & Sidener, 2011 | Acad | Task accuracy | Multiprobe design | M | 4 | 2 | 1.00 | [0.60, 1.00]
Weak | Sullivan & O’Leary, 1990 | Behav | Disruptive | ABBCC | N | 2 | 1 | 1.00 | [0.57, 1.00]
Medium | Carnett et al., 2014 | Behav | Disruptive | Alternating-treatment design | G | 2 | 1 | 1.00 | [0.45, 1.00]
Medium | De Martini-Scully et al., 2000 | Behav | Disruptive | MBD | N | 2 | 2 | 0.95 | [0.63, 1.00]
Medium | Higgins et al., 2001 | Behav | Disruptive | MBD | M | 3 | 1 | 0.98 | [0.63, 1.00]
Medium | Jones et al., 2013 | Behav | Disruptive | ABAB | N | 2 | 2 | 0.72 | [0.22, 1.00]
Medium | McGoey & DuPaul, 2000 | Behav | Disruptive | Reversal | M | 4 | 4 | 0.75 | [0.42, 1.00]
Medium | Simon, Ayllon, & Milan, 1982 | Behav | Disruptive | ABCB | N | 3 | 1 | 0.78 | [0.38, 1.00]
Medium | Smith & Fowler, 1984 | Behav | Disruptive | MBD | N | 6 | 6 | 0.92 | [0.65, 1.00]
Medium | Thompson, McLaughlin, & Derby, 2011 | Behav | Disruptive | MBD | N | 3 | 1 | 0.95 | [0.58, 1.00]
Medium | Truchlicka, McLaughlin, & Swain, 1998 | Acad | Task accuracy | MBD | M | 3 | 3 | 0.35 | [0.12, 0.58]
Strong | Center & Wascom, 1984 | Acad | Task accuracy | Reversal | N | 5 | 5 | 0.71 | [0.48, 0.94]
Strong | Conyers et al., 2004 | Behav | Disruptive | Alternating-treatment design | N | 2 | 2 | 0.83 | [0.52, 1.00]
Strong | Maglio & McLaughlin, 1981 | Behav | Disruptive | Reversal | M | 1 | 1 | 1.00 | [0.79, 1.00]
Strong | Mottram, Bray, Kehle, Broudy, & Jenson, 2002 | Behav | Disruptive | MBD | M | 3 | 3 | 0.99 | [0.83, 1.00]
Strong | Musser, Bray, Kehle, & Jenson, 2001 | Behav | Disruptive | MBD | M | 3 | 3 | 0.88 | [0.53, 1.00]
Strong | Reitman et al., 2004 | Behav | Disruptive | Alternating-treatment design | N | 3 | 3 | 0.69 | [0.40, 0.95]
Strong | Salend & Lamb, 1986 | Behav | Disruptive | Reversal | M | 2 | 9 | 1.00 | [0.53, 1.00]
Strong | Salend, Tintle, & Balber, 1988 | Acad | Task engagement | Reversal | N | 2 | 2 | 1.00 | [0.53, 1.00]
Overall | | | | | | 88 | 90 | 0.82 | [0.77, 0.88]

Note. AB contrasts were used in effect size calculations. The possible range for CI is 0 to 1. A = baseline; Acad = academic; B = first intervention; C = second intervention; D = third intervention; Behav = behavioral; CI = confidence interval; G = authors included generalization phase; M = authors included maintenance phase; MBD = multiple-baseline design; N = authors included neither generalization phase nor maintenance phase.
^a Strong indicates at least three demonstrations of an effect with five or more data points per phase and reporting of interobserver agreement; medium, at least three demonstrations of effect with three to four data points per phase and reporting of interobserver agreement; and weak, fewer than three demonstrations of an effect and three to four data points per phase.

Phase 3: ES, CIs, and Stratification

Calculation

To calculate the Tau-U, we extracted
raw data from the graphs and figures of the
included studies. We used a computer scanner
and software program to view and assign
data values electronically. Digitizing the
data resulted in an exact reconstruction of
the original graphic data providing numeric
raw data to enable proper comparisons
(Glass, 1976). All SCED graphs of included
articles were digitized with GetData Graph
Digitizer (Version 2.21).

ES and Visual Analysis

We calculated Tau-U for each individual
contrast between the baseline (e.g., A1)
and the adjacent intervention phase (e.g.,
B1) for each unit of analysis (i.e., student,
class, behavior). Tau-U is derived from Kendall’s
τ and Mann-Whitney U (see Parker et
al., 2011) and is calculated by merging trend
and nonoverlap data. We completed calcula-
tions using the online Tau-U calculator (Van-
nest, Parker, & Gonen, 2011), selecting the
analysis to “control for baseline trend,” for
individual ES (i.e., an ES for each participant,
behavior, or setting). The individual ESs were
entered into the statistical program WinPEPI
for analysis to produce the “combined” omnibus
ES and 95% CI (Abramson, 2011). The
algorithm for WinPEPI to calculate the overall
ES is the weighted average of all individual
ESs, with weights equaling the inverse of the
variance (i.e., not standard error). We coded
the Tau-U effects as small (0–0.65), medium
(0.66–0.92), and large (0.93–1.00), which are
equivalent to ranges recommended for non-
overlap of all pairs (Parker et al., 2009) to
compare Tau-U to effects garnered through
visual analysis, our next step.
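
The calculation steps in this subsection can be sketched as follows. The sketch assumes one commonly cited form of Tau-U with baseline trend control, (S_AB − S_A) / (n_A × n_B), an inverse-variance weighted average for the combined ES, and a normal-approximation 95% CI. It is not a reproduction of the online calculator or of WinPEPI, and all names and data values are hypothetical.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def tau_u(baseline, intervention, control_baseline_trend=True):
    """Tau-U for one AB contrast (higher scores = improvement).

    S_AB counts improving minus deteriorating A-versus-B pairs;
    S_A is the Kendall S for monotonic trend within the baseline.
    For behaviors targeted for reduction, negate the data first.
    """
    n_a, n_b = len(baseline), len(intervention)
    s_ab = sum(_sign(b - a) for a in baseline for b in intervention)
    s_a = 0
    if control_baseline_trend:
        s_a = sum(_sign(baseline[j] - baseline[i])
                  for i in range(n_a) for j in range(i + 1, n_a))
    return (s_ab - s_a) / (n_a * n_b)


def pooled_effect(effects, variances):
    """Inverse-variance weighted mean with a normal-approximation 95% CI."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


def magnitude(tau):
    """Label an effect using the cutoffs in the text, applied to two decimals."""
    value = round(tau, 2)
    if value <= 0.65:
        return "small"
    if value <= 0.92:
        return "medium"
    return "large"


# Hypothetical percent-on-task data for one student.
a1 = [2, 3, 2, 4]
b1 = [6, 7, 8, 7]
es = tau_u(a1, b1)
print(es, magnitude(es))
print(pooled_effect([0.8, 0.6], [0.01, 0.01]))
```

In this hypothetical series every intervention point exceeds every baseline point, so the uncorrected value is 1.0; the upward baseline trend lowers the corrected estimate. With some data sets this trend-corrected form can fall outside the range of −1 to 1, which is one reason the exact variant used should be reported.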

We visually analyzed data from each
study to determine whether a functional rela-
tionship existed between the independent and
dependent variables based on recommenda-
tions from Kratochwill et al. (2010). Compar-
ing the ES to visual analysis in SCED studies
increases the credibility of the ES (Parker &
Vannest, 2012). This is in part because (a)
visual analysis is the traditionally accepted
approach to SCED analysis (Kratochwill et al.,
2010) and (b) the documentation of concurrent
validity between a visual analysis and an ES
is a component of a bottom-up approach
(Ninci et al., 2015; Parker & Vannest, 2012).
The first and second authors visually analyzed
the data following procedures recommended
by Kratochwill et al. (2010) with additional

Table 2. Summary of Moderators

Study Characteristic | Category | Studies, n | AB Contrasts, n^a | Participants, n | Tau-U | SE | 95% CI | z score^b | p value^b
Setting | Special | 16 | 47 | 47 | 0.89 | 0.05 | [0.80, 0.98] | |
 | General | 12 | 41 | 43 | 0.86 | 0.07 | [0.74, 0.98] | 0.04 | .70
Age | 3–5 years | 6 | 29 | 22 | 0.64 | 0.06 | [0.53, 0.75] | |
 | 6–15 years | 22 | 64 | 68 | 0.91 | 0.04 | [0.84, 0.98] | 2.33 | .02*
Outcome | Academic | 8 | 29 | 21 | 0.89 | 0.08 | [0.73, 1.00] | |
 | Behavioral | 20 | 59 | 73 | 0.93 | 0.07 | [0.75, 1.00] | 0.04 | .69
Response cost | Yes | 16 | 47 | 47 | 0.84 | 0.06 | [0.75, 0.95] | |
 | No | 12 | 41 | 43 | 0.91 | 0.05 | [0.81, 1.00] | 0.99 | .32
Verbal cue | Yes | 9 | 25 | 20 | 0.83 | 0.06 | [0.74, 0.92] | |
 | No | 19 | 63 | 70 | 0.88 | 0.28 | [0.33, 1.00] | 0.20 | .84

Note. A = baseline; B = intervention; CI = confidence interval.
^a Number of AB contrasts used in effect size calculation.
^b Reliable difference z-test scores and corresponding p values are reported for the moderators.
* p < .05.


guidance from Lane and Gast (2014). First, we
analyzed within-phase data by visually in-
specting the (a) level (i.e., mean, median); (b)
trend (i.e., baseline trend, slope of the best
fitting line); and (c) variability (i.e., bounce) of
the data around the line within each phase.
Second, we analyzed between-phase data by
visually inspecting the (a) immediacy of the
effect of a TE by observing the level change
between the data at the end of a phase (last
three data points) and the beginning of the
next phase (first three data points), (b) fre-
quency of overlap between two phases by
determining the number of data points in one
phase that overlapped with the adjacent phase,
and (c) consistency of the data across similar
phases.
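
The between-phase checks described above were made by visual inspection, but two of them can be approximated numerically. The following is a minimal sketch, not the authors' procedure. It assumes each phase is a list of numeric data points, takes immediacy as the difference between the mean of the first three points of the new phase and the mean of the last three points of the preceding phase, and counts a point as overlapping when it falls within the range of the adjacent phase. The function names, the data, and the range-based definition of overlap are illustrative assumptions.

```python
from statistics import mean


def immediacy_of_effect(prev_phase, next_phase, k=3):
    """Level change across a phase boundary.

    Compares the mean of the first k points of next_phase with the mean of
    the last k points of prev_phase (k=3 mirrors the three-point windows
    described in the text).
    """
    return mean(next_phase[:k]) - mean(prev_phase[-k:])


def overlap_count(prev_phase, next_phase):
    """Number of next_phase points that lie inside the range of prev_phase.

    Assumption: "overlap" means falling between the minimum and maximum of
    the adjacent phase, inclusive.
    """
    low, high = min(prev_phase), max(prev_phase)
    return sum(low <= point <= high for point in next_phase)


# Hypothetical data: percentage of intervals on task per session.
baseline = [20, 30, 20, 40]
intervention = [40, 60, 70, 80]

print(immediacy_of_effect(baseline, intervention))  # level change at the boundary
print(overlap_count(baseline, intervention))        # intervention points within baseline range
```

A larger level change and a smaller overlap count would both point toward a clearer effect, although neither number replaces the visual judgment of trend and variability.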

From these analyses, cumulatively, we
conceptualized the effect of a TE derived from
each study as follows: (a) no effect, with no
evidence of a functional relation between the
independent and dependent variables; (b)
weak effect, with some evidence of a func-
tional relation with latency between the intro-
duction of the independent variable, variabil-
ity of data in the baseline and/or intervention
phases, overlap between adjacent phases, and
variability of data in similar phases; (c) me-
dium effect, with mixed evidence of a func-
tional relation with either latency between the
introduction of the independent variable, vari-
ability of data in the baseline and/or interven-
tion phases, overlap between adjacent phases,
or variability of data in similar phases; or (d)
strong effect, with clear evidence of a func-
tional relationship demonstrated by a mean
consistent level in the baseline indicating the
need for intervention, evidence of an immedi-
ate effect between phases with a positive
trend, minimal variability in all phases, no
overlap between phases, and clear consistency
between similar data phases.

We individually completed these analy-
ses, and initial agreement between observers
was 95%. When there was disagreement be-
tween the two, discussion occurred until a
consensus was reached with 100% agreement.
We then compared the magnitude of effect
from visual analysis (i.e., no, weak, medium,

or strong) with the magnitude of effect from
Tau-U (i.e., small, medium, or large).
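
The agreement figure reported above can be illustrated with a simple point-by-point calculation. The text does not state which agreement formula was used, so the sketch below assumes the plain percentage of phase contrasts on which two raters assigned the same category. The function name and the ratings are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two raters assigned the same category."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must rate the same number of items.")
    if not ratings_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical ratings for 20 phase contrasts; the raters differ on the last one.
rater_1 = ["strong"] * 12 + ["medium"] * 4 + ["weak"] * 3 + ["no"]
rater_2 = ["strong"] * 12 + ["medium"] * 4 + ["weak"] * 4

print(percent_agreement(rater_1, rater_2))
```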

Phase 4: Moderator Analyses

We coded moderator data for setting
(special education, general education), age of
participant (preschool children ages 3 to 5
years and school-age children ages 6 to 15
years), outcome (academic, such as task accu-
racy or task engagement, or behavioral, such
as out of seat or disruptive), and two proce-
dural differences (i.e., RC and verbal cueing).
Following the procedures of Bowman-Perrott
et al. (2014), we analyzed moderator effects
by dichotomously coding the moderator vari-
ables within the studies and examining statis-
tically significant differences between the ES
(Tau-U) of studies within each category.

We calculated a reliable difference (i.e.,
difference that cannot be accounted for solely
by chance) for each moderator pair to deter-
mine if the differences were statistically
significant using the following formula:
$(L_1 - L_2)/\sqrt{(SE_{Tau_1}^2) + (SE_{Tau_2}^2)}$, where
$L_1$ is the first level of the moderator (e.g.,
academic outcome) and $L_2$ is the second level
of the moderator (e.g., behavioral outcome).
Specifically, we compared effects for a TE in
general and special education settings, for
children ages 3 to 5 years (preschool age)
and 6 to 15 years (school age), for academic
and behavioral outcomes, for use or nonuse of
RC, and for use or nonuse of verbal cueing.
Reliable-difference z-test scores and p values
are reported in the Results section.
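
The reliable-difference calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the difference between the two Tau-U values is divided by the square root of the sum of their squared standard errors, and the p value is taken as two-tailed under the standard normal distribution (the text does not say whether a one- or two-tailed test was used). The function name and the input values are hypothetical.

```python
import math


def reliable_difference(tau_1, se_1, tau_2, se_2):
    """Return (z, two-tailed p) for the difference between two Tau-U values."""
    z = (tau_1 - tau_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-tailed p under the standard normal: P(|Z| >= |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical moderator levels, not values taken from the study.
z, p = reliable_difference(tau_1=0.90, se_1=0.05, tau_2=0.70, se_2=0.05)
print(round(z, 2), round(p, 4))
```

Because the standard errors enter through their squares, a level with a large standard error widens the denominator and makes a given difference in Tau-U harder to distinguish from chance.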

RESULTS

We conducted this study in four phases
and organized the results accordingly. Results
are included below for the literature review and
study selection, design quality, effect size and
stratification, and the moderator analysis.

Phase 1: Literature Review and Study
Selection

Twenty-eight studies met the inclusion
criteria and included 90 students and 88
opportunities for demonstrations of effect

Main and Moderator Effects for Token Economies


(A1B1). Seventy-nine percent of the studies
were implemented prior to 2005 when the
quality indicators were published, and of those
studies, 50% were published between 1980
and 1989. Forty-three percent of the studies
were implemented in general education class-
rooms, and 57% were implemented in special
education classrooms. Seventy-nine percent of
the studies included children ages 6 to 15
years, and 21% included children ranging
from 3 to 5 years old. Seventy-one percent of
the studies used behavioral outcome measures
(e.g., disruptive, talking out, out of seat), and
29% used outcome measures of academic be-
haviors (e.g., task accuracy, completion, en-
gagement). RC was used in 16 studies (57%).
Verbal cueing was used in nine studies (31%).
Six designs (series phase; multiple baseline;
alternating intervention design; reversal de-
sign; simultaneous; multiple probe) were uti-
lized in the studies (see Table 1). Ten studies
(36%) reported using a follow-up or mainte-
nance phase, and one (3%) reported on gener-
alizing a TE to another setting. Seventeen
studies (61%) did not include maintenance to
monitor the use of a TE on the outcome mea-
sure (see Table 1).

Phase 2: Design Quality

Each of the 28 studies was visually an-
alyzed for quality using the rubric of internal
validity described in the Method section.
Eleven studies were rated as weak quality,
nine as medium quality, and eight as strong
quality. Of the 20 weak- and medium-quality
studies, 16 had insufficient demonstrations of
effect, 8 had at least three to four data points
per phase, and 2 did not report IOA. Integrity
data were collected in 10 studies. Integrity
ranged from 30% to 100% (M = 86.14%,
SD = 23.36%).

Phase 3: ES, CIs, and Stratification

Tau-U was calculated for 88 baseline
versus intervention contrasts (A1 versus B1)
controlling for baseline trend for the 28 stud-
ies. The weighted mean Tau-U of the TE
was 0.82 (SE = 0.03; 95% CI [0.77, 0.88])
and ranged from 0.35 to 1.00. Figure 1 illustrates
the range of ESs and 95% CIs for each
study. Categorically, 19 of the studies had a
Tau-U of 0.80 or above, 7 studies were be-
tween 0.50 and 0.79, and 2 studies fell be-
low 0.50 (i.e., 0.20 and 0.49).
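
For readers unfamiliar with the statistic, the sketch below shows one common way to compute Tau-U for a single baseline-versus-intervention contrast with a correction for baseline trend. It is an illustration under stated assumptions, not the authors' software: it sums the signs of all baseline-to-intervention pairwise differences, subtracts the same sum computed within the baseline phase, and divides by the number of between-phase pairs. Other formulations use a different denominator. The sketch also assumes that higher scores indicate improvement, and the function names and data are hypothetical.

```python
def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pairs_between(phase_a, phase_b):
    """Sum of signs over every (baseline, intervention) pair."""
    return sum(sign(b - a) for a in phase_a for b in phase_b)


def pairs_within(phase):
    """Sum of signs over every earlier-later pair inside one phase (trend)."""
    n = len(phase)
    return sum(sign(phase[j] - phase[i]) for i in range(n) for j in range(i + 1, n))


def tau_u(baseline, intervention):
    """A-versus-B Tau-U with baseline trend subtracted (one common variant)."""
    s_ab = pairs_between(baseline, intervention)
    s_a = pairs_within(baseline)
    return (s_ab - s_a) / (len(baseline) * len(intervention))


# Hypothetical data: a flat baseline and a baseline that is already improving.
print(tau_u([2, 2, 2], [5, 6, 7]))
print(tau_u([1, 2, 3], [4, 5, 6]))
```

In the second call every intervention point exceeds every baseline point, yet the value is lower than in the first call because the improving baseline trend is subtracted from the between-phase count.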

ES Stratified by Design Quality
Stratifying the ESs by quality level produced
the following scores: Eleven studies in
the weak-quality range had a combined Tau-U
of 0.77 (SE = 0.05; 95% CI [0.67, 0.87]). The
nine studies with a medium-quality rating had
a combined Tau-U of 0.84 (SE = 0.04; 95%
CI [0.76, 0.93]). The eight studies categorized
as strong quality had a combined Tau-U
of 0.85 (SE = 0.06; 95% CI [0.74, 0.97]).
Medium- and strong-quality studies produced
a large Tau-U of 0.84 (SE = 0.03; 95% CI
[0.78, 0.91]).

Visual Analyses
We visually analyzed 88 baseline and
intervention phases and contrasts. Results indicated
baseline trend in 82% of baseline
phases (n = 72), variability in 73% of baseline
and intervention phases (n = 65), a strong
immediate effect in 75% of phase changes
(n = 66), a weak to medium immediate effect
in 15% of phase changes (n = 13), and no
immediate effect in 10% of phase changes
(n = 9). In addition, 40% (n = 35) included
overlapping data in adjacent phases.

From these visual analyses, we determined
that 58% of phase contrasts (n = 51)
represented strong effects, 13% (n = 11)
represented medium effects, 22% (n = 19)
represented weak effects, and 8% (n = 7)
represented no effect. These results showed a 75%
agreement (n = 66) with Tau-U. In comparison
to visual analysis, Tau-U slightly overrated
the effect on 14 occasions, in which the
author team judged the data to represent (a) no
effect compared to small effect for Tau-U on
six occasions, (b) a weak effect compared to a
medium Tau-U on three occasions, and (c) a
medium effect compared to a large Tau-U on
five occasions. In comparison to visual analy-
sis, Tau-U slightly underrated the effect on
five occasions, in which the author team
judged the data to represent (a) a medium


effect compared to a small Tau-U on two occasions and
(b) a strong effect compared to a medium Tau-U on three
occasions.

Phase 4: Moderator Analyses

The results of five potential moderator
analyses are presented as follows. (Also see
Table 2.)

Setting
Twelve studies with 43 participants
and 41 phase contrasts were coded for general
education, and 16 studies with 47 participants
and 47 phase contrasts were coded for special
education. Results indicated that studies in
general education settings had a lower ES
(0.86; SE = 0.07; 95% CI [0.74, 0.98]) than
special education settings (ES = 0.89; SE =
0.05; 95% CI [0.80, 0.98]). When the two categories
were compared for parameter estimates,
overlapping CIs indicated there may not be a
statistically significant difference, which was
confirmed by the values from the reliable-difference
formula: z = 0.04, p = .70.

Age
The preschool category of children

ages 3 to 5 years contained six studies, 22

Figure 1. Forest Plot of ESs for 28 Included Studies and Overall ES

Note. ES = effect size; Min CI = minimum confidence interval; Max CI = maximum confidence interval.


participants, and 29 phase contrasts. The
school-age category for ages 6 to 15 years
contained 22 studies, 68 participants, and 64
phase contrasts. Results indicated that studies
for ages 3 to 5 years had a lower ES (0.64;
SE = 0.06; 95% CI [0.53, 0.75]) than for
ages 6 to 15 years (ES = 0.91; SE = 0.04;
95% CI [0.84, 0.98]). When the two categories
were compared for parameter estimates, nonoverlapping
CIs were observed, indicating that
the difference between the two age groups might be
statistically significant. The values from the
reliable-difference formula were z = 2.33 and p = .02,
confirming statistically significant results.

Outcome Measures (Academic, Behavioral)
Eight studies with 21 participants and 29
phase contrasts were coded as academic outcomes
(e.g., task accuracy, task engagement;
see Table 1). Twenty studies with 73 participants
and 59 phase contrasts were coded as
behavioral outcomes (e.g., disruptive behavior,
noncompliance). Results indicated that
studies that included academic outcomes had a
lower ES (0.89; SE = 0.08; 95% CI
[0.73, 1.00]) than studies that included behavioral
outcomes (ES = 0.93; SE = 0.07; 95%
CI [0.75, 1.00]). When the two categories
were compared for parameter estimates, overlapping
CIs indicated there may not be a statistically
significant difference, which was
confirmed by the values from the reliable-difference
formula: z = 0.04, p = .69.

Response Cost
Sixteen studies that used RC with 47
participants and 47 phase contrasts were aggregated
and compared with the 12 studies
with 43 participants and 41 phase contrasts
without RC. Results indicated that studies that
included RC had a lower ES (0.84; SE = 0.06;
95% CI [0.75, 0.95]) than studies that did not
include RC (ES = 0.91; SE = 0.05; 95% CI
[0.81, 1.00]). When the two categories were
compared for parameter estimates, overlapping
CIs indicated there may not be a statistically
significant difference, which was
confirmed by the values from the reliable-difference
formula: z = 0.99, p = .32.

Verbal Cueing
Nine studies that included verbal cueing
with 20 participants and 25 phase contrasts
were aggregated and compared with 19 studies
including 70 participants and 63 phase
contrasts without verbal cueing. Results indicated
that studies using verbal cueing produced
a lower ES (0.83; SE = 0.06; 95% CI
[0.74, 0.92]) than studies without verbal cueing
(ES = 0.88; SE = 0.28; 95% CI
[0.33, 1.00]). When the two categories were
compared for parameter estimates, overlapping
CIs indicated there may not be a statistically
significant difference, which was confirmed
by the values from the reliable-difference
formula: z = 0.20, p = .84.
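
The moderator comparisons above rely on 95% CIs and on whether the intervals for two categories overlap. The text does not state how the intervals were computed, so the sketch below assumes a normal-approximation interval (estimate plus or minus 1.96 standard errors) truncated at the bounds of Tau-U, which is consistent with upper limits of 1.00 in Table 2. Because the published Tau-U and SE values are rounded, limits computed this way can differ from the table in the second decimal place. The function names are hypothetical.

```python
def truncated_ci(estimate, se, z_crit=1.96, lower_bound=-1.0, upper_bound=1.0):
    """Normal-approximation CI, clipped to the range the statistic can take."""
    low = max(lower_bound, estimate - z_crit * se)
    high = min(upper_bound, estimate + z_crit * se)
    return low, high


def intervals_overlap(ci_1, ci_2):
    """True when two (low, high) intervals share at least one point."""
    return ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1]


# Tau-U and SE for studies without verbal cueing, as reported in Table 2.
print(truncated_ci(0.88, 0.28))

# Reported CIs for the two age groups and for the two settings.
print(intervals_overlap((0.53, 0.75), (0.84, 0.98)))  # age groups
print(intervals_overlap((0.74, 0.98), (0.80, 0.98)))  # settings
```

Overlap is only a screening heuristic here. The reliable-difference z test remains the formal comparison, which is why the text reports both.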

DISCUSSION

This meta-analysis synthesized and an-
alyzed the findings from SCED studies to
evaluate the effectiveness of TEs in public
schools from 1980 to 2014. We specifically set
out to (a) evaluate the quality of the design, (b)
calculate ESs and CIs, (c) stratify the results
across quality, and (d) evaluate moderator
variables of peer-reviewed literature, with a
focus on both academic outcomes (e.g., task
accuracy, task engagement) and behavioral
outcomes (e.g., disruptive behavior, noncompliance).
Consistent with and extending previous findings
(Dickerson et al., 2005; Matson & Boisjoli, 2009),
these results suggest that a TE is an effective
intervention, specifically for use in classroom settings.

We first sought to evaluate the quality of
the design in SCED on TEs. We evaluated the
design quality, and contrary to Maggin et al.
(2011), who found 30% of studies with de-
signs of medium to strong quality, we found
64% rated as medium to strong. The remain-
der of studies were rated as low primarily
because of insufficient demonstration of ef-
fects, lack of reporting of sufficient informa-
tion (e.g., IOA), and the number of data points
per phase. We found a large majority of in-
cluded studies reported IOA. However, only a
third of the studies reported treatment fidelity.
This difference from the findings of Maggin et al. (2011)
can, at least partially, be explained by the
different methodologies utilized, which resulted
in inclusion of unique
studies. Maggin et al. included studies that had
not been through a peer-review process in an
attempt to eliminate publication bias. We
elected to statistically address publication
bias. Maggin et al. included 10 studies
published prior to 1980, two dissertations, and
one study presented at a conference, whereas our
review did not include studies published prior
to 1980 and only included studies published
after the peer-review process. Thus, there were
only four overlapping studies between this
meta-analysis and the Maggin et al. meta-
analysis. Therefore, it appears that the quality
of the research is increasing with time and
through the peer-review process.

The second question addressed by this
study was the overall effectiveness of TEs.
The overall Tau-U, aggregated from 88 individual
ESs, was large. Similar to Maggin et al. (2011), results
provide preliminary evidence that a TE is
an effective intervention. However, it is im-
portant to note that we chose to evaluate the
effectiveness of a TE with Tau-U to increase
the trustworthiness of our results as 82% of the
included studies showed a positive baseline
trend, which is a threat to internal validity
addressed by Tau-U. These results suggest
further analytic evidence that a TE is effective
at reducing challenging behaviors. In addition,
results imply that a TE is an intervention that
can be utilized to increase academic readiness
skills in school settings.

The third question addressed by this
meta-analysis was whether effects differed
across design quality. When stratified, the ES
for studies considered medium or strong quality
was large. Given the results, methodological
quality did not appear to explain differences in
the effectiveness of the intervention; rather,
methodological quality seemed to only explain
the extent to which the study could be repli-
cated. From these results, it is apparent that
although some design quality issues are evi-
dent, a majority of the data and designs are
sufficient for a TE to be preliminarily consid-
ered an evidence-based intervention for imple-
mentation in classrooms.

The fourth question addressed was po-
tential moderators. Results supported our a

priori hypotheses that neither setting nor out-
come nor addition of RC or verbal cueing
moderated the effects of a TE. A TE appeared
to be equally effective in general and special
education classes, to target academic and
behavioral outcomes, and with and without
RC or verbal cueing. However, it does seem
that a TE is slightly more effective with older
children than with younger children.

The finding that setting did not moderate
the effects of a TE is important. If the TE is an
intervention needed by a student who receives
special education services, it appears that re-
sults provide initial support for use of a TE in
general or special education settings. How-
ever, this finding is not consistent with the
report of DuPaul, Eckert, and McGoey (1997),
who found that interventions had a greater
impact on behavior when they were imple-
mented in special education classrooms as op-
posed to implementation in general education
classrooms. Thus, more research is needed to
determine if a TE can be implemented effec-
tively in both general and special education
classrooms.

Data suggest the only statistically signif-
icant moderator was the age of the participant.
Results indicated a TE was more effective for
children age 6 years and above. Nonoverlap-
ping CIs were observed in this moderator,
indicating the difference was statistically sig-
nificant. A TE was effective with both groups
of children; however, it appears to be most
effective with older children. This finding
might be attributed to the likelihood of
older students understanding the procedures
and being able to identify items or activities
that would be motivating as backup reinforc-
ers. This finding is consistent with previous
research (Brumfield & Roberts, 1998; Shriver
& Allen, 1997) and suggests that the age of the
child may be predictive of compliance with
this intervention. The implication for this
seems to be that implementers need to spend
additional time on training, modeling, and
practice prior to use with younger children to
increase the likelihood of the child under-
standing the concept of backup reinforcers.

Furthermore, the finding that outcomes
did not moderate the effect of a TE is meaningful.
The comparable effectiveness of a TE for both academic
(e.g., task accuracy, task engagement) and be-
havioral outcomes implies a single interven-
tion may be incorporated for dual targets,
which is likely to simplify the procedures for
increased efficiency. Thus, practitioners can
use TEs to target various outcomes.

Past studies have evaluated the effects
across procedural differences, and our findings
contribute to mixed results in this area. Spe-
cifically, our findings suggest that a TE is
effective with or without the use of RC. This
finding differs from those of researchers who found that RC
increases effectiveness (Center & Wascom,
1984; De Martini-Scully et al., 2000). Further-
more, while our findings indicate a TE with
and without verbal cues is equally effective,
Kluger and DeNisi (1996) found that to min-
imize the negative effects of the use of verbal
cues (e.g., decreases internal motivation,
draws attention to negative), cueing should
only be used during goal setting. In addition,
the researchers found if more verbal cueing
was needed, there was a lack of understanding
of or confusion about the expected behaviors.
These findings may be affected by the strength
of the token for expected behaviors and the
exchange for the secondary reinforcer. As
such, practitioners are advised to consider the
importance of the secondary reinforcer to the
student and change the secondary reinforcer
when it is no longer motivating. Further study
is needed to clarify these findings; however,
we are encouraged that practitioners might be
able to simplify a TE by eliminating RC and
verbal cueing.

Several findings of the visual analysis
are noteworthy and slightly temper the confi-
dence with which we can say definitively that
the studies strongly support TEs. However,
Tau-U addresses some of the concerns. First, a
majority of the studies had a positive baseline
trend. This finding could be problematic be-
cause of potential issues with internal validity.
However, while this trend influenced out-
comes of visual analyses, it did not influence
Tau-U, as we controlled for baseline trend
within the analyses. Second, 67% of the
phases had variable data, and 40% of adjacent
phases had overlapping data points. These

findings indicated fluctuation in participant per-
formance and affected our decisions regarding
the functional relationship between a TE and
the outcomes. However, 78.70% of calculated
Tau-U and decisions made through visual
analysis were in agreement, providing stron-
ger support for our findings.

Although results should be interpreted
with caution because of low design quality,
the current meta-analysis extends the empiri-
cal literature suggesting the potential for a TE
to be effective in classroom settings. Findings
indicate medium to large effects for a TE
overall when used to achieve academic and
behavioral outcomes and when used with or
without RC or verbal cues. Aggregated results
across design quality reveal preliminary sup-
port for use of TEs in special and general
education classrooms. Professionals serving
children in classroom settings may be encour-
aged by the finding that a TE does not have to
include additional components (e.g., verbal
cueing and RC), thus minimizing the com-
plexity of the intervention.

Limitations

Limitations of the current meta-analysis
should be mentioned. First, only studies
published in peer-reviewed journals were
included, as we were interested in the design
quality of published studies. We conducted the
Egger’s test to evaluate the sample for publi-
cation bias; however, caution should be given
to interpretations because of the small number
of included studies. Second, although Wolery
(2013) and others suggested treatment integ-
rity be included as a component of design
quality, we elected not to include it in our
measure to align coding with the current stan-
dards for demonstrating a functional relation-
ship between the dependent and independent
variables (i.e., Horner et al., 2005; Kratochwill
et al., 2010). Third, as with any nonparametric
measure of effect, Tau-U has some limitations
including ceiling effects, as demonstrated here
with 9 studies hitting the upper ceiling and 23
studies with CIs that hit the upper ceiling.
In this study, the advantages of using a nonparametric
ES outweighed these limitations. Fourth, age groupings
may potentially have limited this study.
Ages 6 to 15 years is a broad range, and results
may vary within the category. Finally, al-
though design quality was rated as medium to
strong for a majority of our studies, results
should be interpreted in light of the design
quality weaknesses of the remaining studies.

Implications for Practice and Future
Research

Educators and professionals who work
with children in schools struggle to find economical
(use of time, training, money, personnel,
expertise) interventions. Within school-wide
behavior programs, students frequently re-
ceive tokens, often in the form of “bucks” that
can be traded in at a school-level store. In
addition, a similar system could be put in
place in a classroom setting. A TE can be built
within an individual student behavior contract
(see Soares, Cegelka, & Payne, 2016). Stu-
dents can earn tokens and work toward having
a sufficient number of tokens to purchase a
desired item. A TE is effective with and with-
out RC. Therefore, the procedure can be 100%
positive with no removal of reinforcers.

More and varied research on the effects
of a TE is needed. Future studies should cal-
culate more than one ES (Kratochwill et al.,
2010) and examine the level of treatment in-
tegrity correlated with intervention effective-
ness. We found that only 10 studies out of 28
reported levels of treatment integrity, and
Maggin et al. (2011) found that 2 of 24 studies
reported treatment fidelity. When authors do
not report treatment integrity, implementation
is assumed. Low levels of treatment integrity
in schools are concerning. However, reports of
low treatment fidelity in educational settings
are frequent (Becker & Domitrovich, 2011;
Riley-Tillman & Eckert, 2001) and potentially
decrease the likelihood of effectiveness. Our
finding that a TE is effective assumes that
the intervention was implemented as
designed. Researchers are only beginning to
evaluate the level of integrity that is needed to
achieve effects with evidence-based interven-
tions in educational settings outside of the
tight controls of research (see Owens et al.,

2014). Thus, this is an area of caution and one
in which much research is needed. In addition,
research is needed to identify the levels of
professional development and ongoing coach-
ing required to maintain reliable implementa-
tion and generalization across behaviors and
settings. Furthermore, researchers need to re-
fer to the current quality standards when de-
signing and implementing SCED.

REFERENCES

References marked with an asterisk indicate
studies included in the meta-analysis.
Abramson, J. H. (2011). WINPEPI updated: Computer
programs for epidemiologists, and their teaching
potential. Epidemiologic Perspectives & Innovations,
8, 1–9. doi:10.1186/1742-5573-8-1. Retrieved
from http://archive.biomedcentral.com/1742-5573/content/8/1/1

Akin-Little, K. A., & Little, S. G. (2004). Re-examining
the overjustification effect: A case study. Journal of
Behavioral Education, 13, 179–192. doi:10.1023/B:JOBE.0000037628.81867.69

Alberto, P. A., & Troutman, A. C. (2003). Applied behavior
analysis for teachers (6th ed.). Upper Saddle
River, NJ: Merrill Prentice Hall.

Ary, D., & Suen, H. K. (1989). Analyzing qualitative
behavioral observational data. Mahwah, NJ: Law-
rence Erlbaum Associates.

Balcazar, F., Hopkins, B. L., & Suarez, Y. (1985). A
critical, objective review of performance feedback.
Journal of Organizational Behavior Management,
7(3– 4), 65– 89. doi:10.1300/J075v07n03_05

Becker, K. D., & Domitrovich, C. E. (2011). The concep-
tualization, integration, and support of evidence-based
interventions in the schools. School Psychology Review,
40(4), 582–589. Retrieved from http://eric.ed.gov/?id=EJ962068

Bippes, R., McLaughlin, T. F., & Williams, R. L. (1986).
A classroom token system in a detention center: Ef-
fects for academic and social behavior. Techniques: A
Journal for Remedial Education and Counseling, 2,
126 –132. Retrieved from http://psycnet.apa.org/
psycinfo/1987-14003-001

Bowman-Perrott, L., Burke, M., Zhang, N., & Zaini, S.
(2014). Direct and collateral benefits of peer tutoring
on social and behavioral outcomes: A meta-analysis of
single-case studies. School Psychology Review, 43,
260 –285. Retrieved from http://bmo.sagepub.com/
content/early/2014/09/25/0145445514551383.abstract

Brumfield, B. D., & Roberts, M. W. (1998). A comparison of
two measurements of child compliance with normal pre-
school children. Journal of Clinical Child Psychology,
27(1), 109 –116. doi:10.1207/s15374424jccp2701_12

Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for
single-case research. In T. R. Kratochwill & J. R.
Levin (Eds.), Single-case research design and analy-
sis: New directions for psychology and education (pp.
187–212). Hillsdale, NJ: Erlbaum.

Carlson, C. L., Pelham, W. E., Milich, R., & Dixon, J.
(1992). Single and combined effects of methylphenidate
and behavior therapy on the classroom performance
of children with attention-deficit hyperactivity
disorder. Journal of Abnormal Child Psychology, 20,
213–232. doi:10.1007/BF00916549

*Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee,
A., Sigafoos, J., & Machalicek, W. (2014). Effects of a
perseverative interest-based token economy on chal-
lenging and on-task behavior in a child with autism.
Journal of Behavioral Education, 23(3), 368 –377. doi:
10.1007/s10864-014-9195-7

Cavalier, A. R., Ferretti, R. P., & Hodges, A. E. (1997).
Self-management within a classroom token economy
for students with learning disabilities. Research in De-
velopmental Disabilities, 18, 167–178. doi:10.1016/
S0891-4222(96)00045-5

*Center, D. B., & Wascom, A. (1984). Transfer of
reinforcers: A procedure for enhancing response
cost. Educational and Psychological Research, 4,
19 –27. Retrieved from http://davidcenter.com/
documents/Publications/39

Christensen, L., Young, K. R., & Marchant, M. (2004).
The effects of a peer-mediated positive behavior sup-
port program on socially appropriate classroom behav-
ior. Education & Treatment of Children, 27, 199 –234.
Retrieved from http://www.jstor.org/stable/42900544

*Conyers, C., Miltenberger, R. G., Gubin, A., Barenz,
R., Jurgens, M., Sailer, A., . . . Kopp, B. (2004). A
comparison of response cost and differential rein-
forcement of other behavior to reduce disruptive
behavior in a preschool classroom. Journal of Ap-
plied Behavior Analysis, 37, 411– 415. doi:10.1901/
jaba.2004.37-411

Cooper, H. M., & Hedges, L. V. (Eds.). (1994). The
handbook of research synthesis. New York, NY: Rus-
sell Sage Foundation.

*De Martini-Scully, D., Bray, M. A., & Kehle, T. J.
(2000). A packaged intervention to reduce disruptive
behaviors in general education students. Psychology in
the Schools, 37, 149–156. doi:10.1002/(SICI)1520-6807(200003)37:2<149::AID-PITS6>3.0.CO;2-K

Dickerson, F. B., Tenhula, W. N., & Green-Paden, L. D.
(2005). The token economy for schizophrenia: Review
of the literature and recommendations for future re-
search. Schizophrenia Research, 75, 405– 416. doi:
10.1016/j.schres.2004.08.026

DuPaul, G. J., Eckert, T. L., & McGoey, K. E. (1997).
Interventions for students with attention-deficit/hyper-
activity disorder: One size does not fit all. School
Psychology Review, 26, 369 –381. Retrieved from
http://www.nasponline.org/publications/periodicals/
spr/volume-26/volume-26-issue-3/interventions-for-
students-with-attention-deficit/hyperactivity-disorder-
one-size-does-not-fit-all

DuPaul, G. J., & Weyandt, L. L. (2006). School-based
intervention for children with Attention Deficit Hyper-
activity Disorder: Effects on academic, social, and
behavioural functioning. International Journal of Dis-
ability, Development and Education, 53(2), 161–176.
doi:10.1080/10349120600716141

Egger, M., Smith, G. D., Schneider, M., & Minder, C.
(1997). Bias in meta-analysis detected by a simple,
graphical test. BMJ, 315(7109), 629 – 634. doi:
10.1136/bmj.315.7109.629

Fairchild, A. J., & MacKinnon, D. P. (2009). A general
model for testing mediation and moderation effects.
Prevention Science: The Official Journal of the Society

for Prevention Research, 10(2), 87–99. doi:10.1007/
s11121-008-0109-6

Feindler, E. L., Marriott, S. A., & Iwata, M. (1984). Group
anger control training for junior high school delin-
quents. Cognitive Therapy and Research, 8(3), 299 –
311. doi:10.1007/BF01173000

*Filcheck, H. A., McNeil, C. B., Greco, L. A., & Bernard,
R. S. (2004). Using a whole-class token economy and
coaching of teacher skills in a preschool classroom to
manage disruptive behavior. Psychology in the
Schools, 41, 351–361. doi:10.1002/pits.10168

GetData Graph Digitizer (Version 2.21) [Software]. Re-
trieved from http://www.getdata-graph-digitizer.com/
download.php

Glass, G. V. (1976). Primary, secondary, and meta-analysis
of research. Educational Researcher, 5, 3–8. doi:10.3102/0013189X005010003

Gresham, F. M. (1979). Comparison of response cost
and timeout in a special education setting. Journal of
Special Education, 13, 199 –208. doi:10.1177/
002246697901300211

Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004).
Principles of behavioral observation. In S. N. Haynes
& E. M. Heiby (Eds.), Comprehensive handbook of
psychological assessment: Behavioral assessment
(Vol. 3, pp. 108 –127). Hoboken, NJ: John Wiley and
Sons.

Heaton, R. C., & Safer, D. J. (1982). Secondary school
outcome following a junior high school behavioral
program. Behavior Therapy, 13, 226 –231. doi:
10.1016/S0005-7894(82)80066-X

Higgins, J. P., & Thompson, S. G. (2002). Quantifying
heterogeneity in a meta-analysis. Statistics in Medi-
cine, 21(11), 1539 –1558. doi:10.1002/sim.1186

*Higgins, J. W., Williams, R. L., & McLaughlin, T. F.
(2001). The effects of a token economy employing
instructional consequences for a third-grade student
with learning disabilities: A data-based case study.
Education & Treatment of Children, 24, 99 –106. Re-
trieved from http://www.jstor.org/stable/42899646

*Himle, M. B., Woods, D. W., & Bunaciu, L. (2008).
Evaluating the role of contingency in differentially
reinforced tic suppression. Journal of Applied Behav-
ior Analysis, 41, 285–289. doi:10.1901/jaba.2008.41-
285

Hintze, J. (2004). NCSS 2004 [Computer software]. Re-
trieved from http://www.ncss.com

Hopko, D. R., Lejuez, C. W., Lepage, J. P., Hopko, S. D., &
McNeil, D. W. (2003). A brief behavioral activation
treatment for depression: A randomized pilot trial within
an inpatient psychiatric hospital. Behavior Modification,
27(4), 458 – 469. doi:10.1177/0145445503255489

Horner, R. H., Carr, E. G., Halle, J., McGee, J., Odom, S.,
& Wolery, M. (2005). The use of single-subject re-
search to identify evidence-based practice in special
education. Exceptional Children, 71, 165–179. doi:
10.1177/001440290507100203

House, A. E., House, B. J., & Campbell, M. B. (1981).
Measures of interobserver agreement: Calculation for-
mulas and distribution effects. Journal of Behavioral
Assessment, 3, 37–57. doi:10.1007/BF01321350

Individuals with Disabilities Education Improvement Act,
H.R. 1350, 108th Cong. (2004).

*Jones, M. N., Weber, K. P., & McLaughlin, T. F.
(2013). No teacher left behind: Educating students
with ASD and ADHD in the inclusion classroom.
The Journal of Special Education Apprenticeship,
2(2), 1–22. Retrieved from http://josea.info/
archives/vol2no2/vol2no2-5-FT

School Psychology Review, 2016, Volume 45, No. 4

Kazdin, A. E. (1971). The effect of response cost in sup-
pressing behavior in a pre-psychotic retardate. Journal of
Behavior Therapy and Experimental Psychiatry, 2, 137–
140. doi:10.1016/0005-7916(71)90029-2

Kazdin, A. E. (1972). Response cost: The removal of
conditioned reinforcers for therapeutic change. Be-
havior Therapy, 3, 533–546. doi:10.1016/S0005-
7894(72)80001-7

Kazdin, A. E. (1982). Single-case research designs: Meth-
ods for clinical and applied settings. New York, NY:
Oxford University Press.

Kazdin, A. E., & Bootzin, R. R. (1972). The token econ-
omy: An evaluative review. Journal of Applied Behav-
ior Analysis, 5, 343–372. doi:10.1901/jaba.1972.5-343

*Kazdin, A. E., & Geesey, S. (1980). Enhancing class-
room attentiveness by preselection of back-up reinforcers
in a token economy. Behavior Modification, 4, 98 –
114. doi:10.1177/014544558041006

*Kazdin, A. E., & Mascitelli, S. (1980). The opportunity
to earn oneself off a token system as a reinforcer for
attentive behavior. Behavior Therapy, 11, 68 –78. doi:
10.1016/s0005-7894(80)80037-2

*Klimas, A., & McLaughlin, T. F. (2007). The effects of
a token economy system to improve social and aca-
demic behavior with a rural primary aged child with
disabilities. International Journal of Special Educa-
tion, 22, 72–77. Retrieved from http://eric.ed.gov/
?id=EJ814513

Kluger, A. N., & DeNisi, A. (1996). The effects of feed-
back interventions on performance: A historical re-
view, a meta-analysis, and a preliminary feedback
intervention theory. Psychological Bulletin, 119, 254 –
284. doi:10.1037/0033-2909.119.2.254

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin,
J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R.
(2010). Single-case designs technical documentation.
Retrieved from http://ies.ed.gov/ncee/wwc/pdf/wwc_
scd

Lane, J. D., & Gast, D. L. (2014). Visual analysis in single-
case experimental design studies: Brief review and guide-
lines. Neuropsychological Rehabilitation, 24(3– 4), 445–
463. doi:10.1080/09602011.2013.815636

Latham, G. P., & Locke, E. A. (1991). Self-regulation
through goal setting. Organizational Behavior and Hu-
man Decision Processes, 50, 212–247. doi:10.1016/
0749-5978(91)90021-K

Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013).
An application of the What Works Clearinghouse stan-
dards for evaluating single-subject research: Self-man-
agement interventions. Remedial and Special Education,
34, 44 –58. doi:10.1177/0741932511435176

Maggin, D. M., & Chafouleas, S. M. (2010). PASS-RQ:
Protocol for assessing single-subject research quality.
Unpublished research instrument.

Maggin, D. M., Chafouleas, S. M., Goddard, K. M., &
Johnson, A. H. (2011). A systematic evaluation of token
economies as a classroom management tool for students
with challenging behavior. Journal of School Psychology,
49, 529 –554. doi:10.1016/j.jsp.2011.05.001

*Maglio, C., & McLaughlin, T. F. (1981). Effects of a
token reinforcement system and teacher attention in
reducing inappropriate verbalization with a junior high
school student. Corrective and Social Psychiatry, 27,
140–145. Retrieved from http://psycnet.apa.org/
psycinfo/1982-24369-001

Manolov, R., Solanas, A., Sierra, V., & Evans, J. J.
(2011). Choosing among techniques for quantifying
single-case intervention effectiveness. Behavior Ther-
apy, 42(3), 533–545.

Martin, G., & Pear, J. (2003). Behavior modification:
What it is and how to do it? (7th ed.). Upper Saddle
River, NJ: Simon & Schuster.

Matson, J. L., & Boisjoli, J. A. (2009). The token econ-
omy for children with intellectual disability and/or
autism: A review. Research on Developmental Dis-
abilities, 30, 240 –248. doi:10.1016/j.ridd.2008.04.001

*McGoey, K. E., & DuPaul, G. J. (2000). Token rein-
forcement and response cost procedures: Reducing
the disruptive behavior of preschool children. School
Psychology Quarterly, 15, 330 –343. doi:10.1037/
h0088790

McKillup, S. (2011). Statistics explained: An introductory
guide for life scientists. Cambridge, UK: Cambridge
University Press.

*Millersmith, T., Weber, K. P., & McLaughlin, T. F.
(2013). The use of token economy and a math manip-
ulative for a child with moderate intellectual disabili-
ties. International Journal of Basics and Applied Sci-
ences, 1(3), 634 – 640. Retrieved from http://www.
insikapub.com/

Miltenberger, R. G. (2001). Behavior modification: Prin-
ciples and procedures (2nd ed.). Pacific Grove, CA:
Brooks/Cole.

*Mottram, L. M., Bray, M. A., Kehle, T. J., Broudy,
M., & Jenson, W. R. (2002). A classroom-based
intervention to reduce disruptive behaviors. Journal
of Applied School Psychology, 19, 65–74. doi:
10.1300/j370v19n01_05

Murray, L., & Sefchik, G. (1992). Regulating behavior
management practices in residential treatment facili-
ties. Children and Youth Services Review, 14(6), 519 –
539. doi:10.1016/0190-7409(92)90004-F

*Musser, E. H., Bray, M. A., Kehle, T. J., & Jenson, W. R.
(2001). Reducing disruptive behaviors in students with
serious emotional disturbance. School Psychology Re-
view, 30, 294 –304. Retrieved from http://www.
nasponline.org/publications/spr/abstract.aspx?ID=1590

Owens, J. S., Lyon, A. R., Brant, N. E., Masia-Warner, C.,
Nadeem, E., Spiel, C., & Wagner, M. (2014). Imple-
mentation science in school mental health: Key con-
structs in a developing research agenda. School Mental
Health, 6(2), 99 –111. doi:10.1007/s12310-013-9115-3

Ninci, J., Neely, L. C., Hong, E. R., Boles, M. B., Gilli-
land, W. D., Ganz, J. B., . . . Vannest, K. J. (2015).
Meta-analysis of interventions to improve functional
living skills for people with autism spectrum disorder.
Review Journal of Autism and Developmental Dis-
orders, 2, 184–198. doi:10.1007/s40489-014-0046-1

No Child Left Behind Act of 2001, 20 U.S.C. 70 § 6301
et seq (2001).

Parker, R., & Hagan-Burke, S. (2007). Useful effect size
interpretations for single-case research. Behavior Ther-
apy, 38, 95–105. doi:10.1016/j.beth.2006.05.002

Parker, R. I., & Vannest, K. J. (2012). Bottom up analysis
of single-case research designs. Journal of Behavioral
Education, 17(1), 254 –265. doi:10.1007/s10864-012-
9153-1

Parker, R. I., Vannest, K. J., & Brown, L. (2009). The
“improvement rate difference” for single-case re-
search. Exceptional Children, 75, 135–150. Retrieved
from http://eric.ed.gov/?id=EJ842529

Main and Moderator Effects for Token Economies

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B.
(2011). Combining non-overlap and trend for single-
case research: Tau-U. Behavior Therapy, 42, 284 –299.
doi:10.1016/j.beth.2010.08.006

Phillips, E. L., Phillips, E. A., Fixsen, D. L., & Wolf,
M. M. (1971). Achievement place: Modification of the
behaviors of pre-delinquent boys within a token econ-
omy. Journal of Applied Behavior Analysis, 4, 45–50.
doi:10.1901/jaba.1971.4-45

Rapport, M. D., Murphy, A., & Bailey, J. S. (1980). The
effects of a response cost treatment tactic on hyperac-
tive children. Journal of School Psychology, 18, 98 –
111. doi:10.1016/0022-4405(80)90025-4

*Reitman, D., Murphy, M. A., Hupp, S. D. A., &
O’Callaghan, P. M. (2004). Behavior change and per-
ceptions of change: Evaluating the effectiveness of a
token economy. Child & Family Behavior Therapy,
26(2), 17–36. doi:10.1300/J019v26n02_02

Rhode, G., Jenson, W. R., & Reavis, H. K. (1993). The
tough kid book: Practical classroom management
strategies. Longmont, CO: Sopris West.

Riley-Tillman, T. C., & Eckert, T. L. (2001). Generaliza-
tion programming and school based consultation: An
examination of consultees’ generalization of consulta-
tion-related skills. Journal of Educational and Psycho-
logical Consultation, 12, 217–241. doi:10.1207/
s1532768xjepc1203_03

Rosen, L. A., Taylor, S. A., O’Leary, S. G., & Sanderson,
W. (1990). A survey of classroom management prac-
tices. Journal of School Psychology, 28(3), 257–269.
doi:10.1016/0022-4405(90)90016-Z

*Rosenberg, M. S. (1986). Maximizing the effectiveness
of structured classroom management programs: Imple-
menting rule-review procedures with disruptive and
distractible students. Behavioral Disorders, 11, 239 –
248. Retrieved from http://www.jstor.org/stable/
23882205

Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis:
Recent developments in quantitative methods for liter-
ature reviews. Annual Review of Psychology, 52, 59 –
82. doi:10.1146/annurev.psych.52.1.59

*Salend, S. J., & Allen, E. M. (1985). Comparative effects of
externally-managed response cost systems on inappropri-
ate classroom behavior. Journal of School Psychology,
23, 59 – 67. doi:10.1016/0022-4405(85)90035-4

*Salend, S. J., & Lamb, E. A. (1986). Effectiveness of a
group-managed interdependent contingency system.
Learning Disability Quarterly, 9, 268 –273. doi:
10.2307/1510380

*Salend, S. J., Tintle, L., & Balber, H. (1988). Effects of
a student-managed response cost system on the behav-
ior of two mainstreamed students. The Elementary
School Journal, 89, 89 –97. doi:10.1086/461564

Schellenberg, T., Skok, R., & McLaughlin, T. F. (1991).
The effects of contingent free time on homework com-
pletion in English with high school English students.
Child & Family Behavior Therapy, 13(3), 1–12. doi:
10.1300/J019v13n03_01

Scruggs, T. E., & Mastropieri, M. A. (2001). How to
summarize single-participant research: Ideas and ap-
plication. Exceptionality, 9, 227–244. doi:10.1207/
S15327035EX0904_5

Shriver, M. D., & Allen, K. D. (1997). Defining child
noncompliance: An examination of temporal parame-
ters. Journal of Applied Behavior Analysis, 30(1), 173–
176. doi:10.1901/jaba.1997.30-173

*Simon, S. J., Ayllon, T., & Milan, M. A. (1982). Behav-
ioral compensation: Contrast like effects in the class-
room. Behavior Modification, 6, 407– 420. doi:
10.1177/014544558263006

Simonsen, B., Fairbanks, S., Briesch, A., Myers, D., &
Sugai, G. (2008). Evidence-based practices in class-
room management: Considerations for research to
practice. Education and Treatment of Children, 31,
351–380. doi:10.1353/etc.0.0007

Skinner, B. F. (1931). The concept of the reflex in the
description of behavior. Journal of General Psychology,
5, 427– 458. doi:10.1080/00221309.1931.9918416

Skinner, C. H., Cashwell, C. S., & Bunn, M. S. (1996).
Independent and interdependent group contingencies:
Smoothing the rough waters. Special Services in the
Schools, 12, 61–78. doi:10.1300/J008v12n01_04

*Smith, L. K., & Fowler, S. A. (1984). Positive peer
pressure: The effects of peer monitoring on children’s
disruptive behavior. Journal of Applied Behavior Anal-
ysis, 17, 213–227. doi:10.1901/jaba.1984.17-213

Soares, D. A., Cegelka, W. J., & Payne, J. S. (2016). The
token economy playbook: The ultimate guide to pro-
moting superior performance and personal growth.
San Diego, CA: University Readers.

*Sran, S. K., & Borrero, J. C. (2010). Assessing the
value of choice in a token system. Journal of Ap-
plied Behavior Analysis, 43, 553–557. doi:10.1901/
jaba.2010.43-553

*Stevens, C., Sidener, T. M., Reeve, S. A., & Sidener, D. W.
(2011). Effects of behavior-specific and general praise on
acquisition of tacts in children with pervasive develop-
mental disorders. Research in Autism Spectrum Disor-
ders, 5, 666 – 669. doi:10.1016/j.rasd.2010.08.003

Stilitz, I. (2009). A token economy of the early 19th
century. Journal of Applied Behavior Analysis, 42(4),
925–926. doi:10.1901/jaba.2009.42-925

Strijbos, J., Martens, R., Prins, F., & Jochems, W.
(2006). Content analysis: What are they talking
about? Computers & Education, 46, 29 – 48. doi:
10.1016/j.compedu.2005.04.002

*Sullivan, M. A., & O’Leary, S. G. (1990). Maintenance
following reward and cost token programs. Behavior
Therapy, 21, 139 –149. doi:10.1016/s0005-7894(05)
80195-9

*Thompson, M. J., McLaughlin, T. F., & Derby, K. M.
(2011). The use of differential reinforcement to de-
crease the inappropriate verbalizations of a nine-year
old girl with autism. Electronic Journal of Research in
Educational Psychology, 9(1), 183–196. Retrieved
from http://eric.ed.gov/?id=EJ926483

*Truchlicka, M., McLaughlin, T. F., & Swain, J. C.
(1998). Effects of token reinforcement and response
cost on the accuracy of spelling performance with
middle-school special education students with behav-
ior disorders. Behavioral Interventions, 13, 1–10.
doi:10.1002/(SICI)1099-078X(199802)13:1<1::AID-BIN1>3.0.CO;2-Z

Ulmer, R. A. (1976). On the development of a token
economy mental hospital treatment program. Wash-
ington, DC: Hemisphere.

Witt, J. C., & Elliot, S. N. (1982). The response cost
lottery: A time efficient and effective classroom inter-
vention. Journal of School Psychology, 20(2), 155–
161. doi:10.1016/0022-4405(82)90009-7

Wolery, M. (2013). A commentary: Single-case design
technical document of the What Works Clearinghouse.
Remedial and Special Education, 34(1), 39 – 43. doi:
10.1177/0741932512468038

Van den Noortgate, W., & Onghena, P. (2003). Hierar-
chical linear models for the quantitative integration of
effect sizes in single-case research. Behavior Research
Methods, Instruments, & Computers, 35, 1–10. doi:
10.3758/bf03195492

Van den Noortgate, W., & Onghena, P. (2008). A multi-
level meta-analysis of single-subject experimental
design studies. Evidence-Based Communication As-
sessment & Intervention, 2, 142–151. doi:10.1080/
17489530802505362

Vannest, K.J., Parker, R.I., & Gonen, O. (2011). Single
case research: Web based calculators for SCR analysis
(Version 1.0) [Web-based application]. College Sta-
tion, TX: Texas A&M University. Retrieved from
singlecaseresearch.org

Voight, M. L., & Hoogenboom, B. J. (2012). Publishing
your work in a journal: Understanding the peer review
process. International Journal of Sports Physical Ther-
apy, 7(5), 452– 460. Retrieved from http://www.ncbi.
nlm.nih.gov/pmc/articles/PMC3474310/

Yeaton, W. H., & Wortman, P. M. (1993). On the reli-
ability of meta-analytic reviews: The role of intercoder
agreement. Evaluation Review, 17(3), 292–309. doi:
10.1177/0193841X9301700303

Date Received: April 4, 2015
Date Accepted: August 26, 2015

Associate Editor: Lisa Bowman-Perrott

Denise A. Soares, PhD, is the Assistant Department Chair of Teacher Education, an
assistant professor of special education, and Special Education Program Coordinator at
the University of Mississippi. Her research interests include applied and practical expe-
riences in academic and behavior interventions for at-risk students, as well as examining
the efficacy of those interventions in classroom settings where teachers have competing
time demands.

Judith R. Harrison, PhD, is an assistant professor in the Department of Educational
Psychology–Special Education at Rutgers University in New Brunswick, New Jersey. Her
research interests include the effectiveness, acceptability, and feasibility of assessment,
interventions, and other services for youth with emotional and behavioral disorders and
attention deficit hyperactivity disorder.

Kimberly J. Vannest, PhD, is a professor in the Department of Educational Psychology–
Special Education at Texas A&M University. Her research interests are in determining
effective interventions for children and youth with or at risk for emotional and behavioral
disorders, including teacher behaviors and measurement.

Susan S. McClelland, PhD, is an associate professor of educational leadership and Chair
of the Department of Teacher Education at the University of Mississippi. Her research
interests include leadership for students with disabilities, literacy, school and organiza-
tional culture, and issues relating to rural education.


Copyright of School Psychology Review is the property of National Association of School
Psychologists and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder’s express written permission. However, users may print,
download, or email articles for individual use.

Evaluating the Impact of Token Economy Methods on Student On-task Behaviour within an Inclusive Canadian Classroom

Robert L. Williamson, Chelsea McFadzen

Simon Fraser University, Canada

Abstract

A token economy is a common classroom positive behaviour support method whereby ‘tokens’ are delivered to students contingent on exhibiting specific behaviours. Students later exchange earned tokens for items of interest. This project developed a prototype, iPad-based tool that enabled teachers to deliver and track tokens virtually. The virtual token economy system was then compared to implementation using a typical, physically delivered token economy method. Both methods were evaluated for their impact on the on-task behaviour of grade four and five students within one inclusive Canadian classroom using a multielement design. Individual impacts and group effects were analyzed using an analysis of variance with planned contrasts, as well as visually using single case methods, to assess the efficacy of each implementation method. Results indicated that only one significant difference, for one individual subject, was found between baseline (no token economy) and both token economy systems. No other significant differences were found between individual or group on-task behaviours, nor between the baseline, physical and virtual methodologies overall. Implications regarding evidence that TEs represent evidence-based practice and suggestions for future research are discussed.

1. Introduction

A token economy (TE) is a secondary reinforcement system of positive behaviour support whereby tokens (i.e., conditioned reinforcers) are delivered to students for exhibiting specific behaviours [1, 2, 3, 4, 5]. These tokens represent a medium of exchange to be used by recipients to purchase desired goods or privileges from a menu of items [6, 2]. TE systems have been used in a variety of settings and over many decades within an academic environment [1, 2, 7]. Over a decade ago, TEs were identified by Simonsen and colleagues (2008) as meeting criteria for evidence-based practice and by the American Psychological Association’s Task Force on Promotion and Dissemination of Psychological Procedures (1993) as a well-established psychological procedure.

With a long history of use within academic and other settings, the TE has enjoyed a reputation as an evidence-based classroom behaviour management tool and has widely been considered effective in decreasing non-desired behaviours and increasing pro-academic behaviours in students [8, 9, 4, 7]. Studies have also shown that token economy systems have been used to increase on task behaviours and decrease non-desired behaviours [10, 11].

Some, however, have questioned the assertion that the TE should be considered an evidence-based practice. Maggin, Chafouleas, Goddard and Johnson [12] conducted a systematic evaluation of research involving TEs as classroom management tools for students with challenging behaviours and found that the “…extant research on token economies (did) not provide sufficient evidence to be deemed best practice based on the WWC (What Works Clearinghouse) criteria” [12]. The authors suggested that this finding was largely due to inadequate research designs in the literature they uncovered. Specifically, the authors cited a lack of information within studies regarding treatment fidelity and social validity as among the methodological problems found in the literature available at the time of their systematic evaluation [12]. This finding presents a profound concern, as without sufficient description of the exact procedures used within literature that finds positive or negative results, the aggregate effectiveness of any given implementation method of a TE cannot be appropriately assessed.

A more recent meta-analysis by Soares, Harrison, Vannest and McClelland [7], conducted five years after Maggin et al., reported results that “…suggest that a TE is an effective intervention, specifically for use in the classroom setting” [7]. Unlike Maggin et al.’s finding that only 30% of studies were rated as achieving a medium to strong quality design based on WWC standards, Soares et al. found that 64% met that same WWC criteria as either medium or strong and suggested that the quality of research regarding the effectiveness of TEs is improving. Still, Soares et al. commented that “…only a third of the studies reported treatment fidelity.” [7]. Clearly, differences remain in evaluating this intervention.

International Journal of Technology and Inclusive Education (IJTIE), Volume 9, Issue 1, 2020
Copyright © 2020, Infonomics Society

Ivy et al. [2] conducted a systematic literature review regarding the quality of procedural descriptions within TE research and found that “…of the 96 articles reviewed, only 18 (19%) included procedural descriptions of each component to a degree sufficient to guide replication” [2]. This finding would seem to support the previous assertions of Maggin et al. [12] in noting a lack of adequate implementation information regarding the specific TE methodologies applied within published studies. Differing findings lead to questions as to exactly how teachers implemented the TEs evaluated in previous studies and whether any specific attributes of implementation are more or less impactful upon specific student behaviours.

There are six essential components of a TE system [13, 2, 14]. These six components include (1) the target behavior that is the focus of the intervention, (2) the tokens themselves, which must have been conditioned to function as reinforcers, (3) the backup reinforcers that may be purchased with a token, (4) the method by which tokens are earned, (5) the method by which tokens may be exchanged for backup reinforcers, and (6) the cost of the backup reinforcers [2].
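As a rough illustration of how the six components listed above fit together, the following Python sketch stores them in a single record. The class name, field names and example values are hypothetical and are not taken from the study or from the cited sources.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenEconomy:
    """Hypothetical record of the six TE components; names are illustrative."""
    target_behaviour: str               # (1) behaviour the intervention targets
    token: str                          # (2) conditioned reinforcer that is delivered
    backup_reinforcers: Dict[str, int]  # (3) menu items mapped to (6) their token cost
    earning_rule: str                   # (4) how tokens are earned
    exchange_rule: str                  # (5) how and when tokens are exchanged

    def affordable(self, balance: int) -> List[str]:
        """Return the backup reinforcers a student can buy with `balance` tokens."""
        return sorted(item for item, cost in self.backup_reinforcers.items()
                      if cost <= balance)


# Example values are invented for illustration only.
te = TokenEconomy(
    target_behaviour="on-task behaviour",
    token="poker chip",
    backup_reinforcers={"sticker": 2, "extra computer time": 5, "class game": 10},
    earning_rule="variable ratio delivery by the teacher",
    exchange_rule="redemption at the end of the morning",
)
print(te.affordable(5))
```

Keeping the cost alongside each backup reinforcer is one possible design choice; a separate price list would work equally well.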

Traditionally, TEs have been implemented within classrooms using physical tokens that are delivered to students. Teachers carry tokens (often fake money, poker chips or a similar type of token) on their person as they teach. When a student displays the desired behaviour, the teacher delivers a token to the student. This physical method requires the teacher to be in physical proximity to the student receiving the token and to deliver the reward in person. This requirement may interrupt teacher instructional leadership.

The goal of this research was to examine any relative impacts concerning student on task behaviour between three separate conditions. Condition one consisted of baseline (no TE implementation) while the remaining two consisted of 1) a prototype virtual iPad-based TE methodology known as ‘CARS’ (Class-wide Augmented Reward System) and 2) a traditional, physically implemented TE system. Within both methods (iPad and physical), specific implementation methods were defined and followed. Both utilized a variable ratio reward system of token delivery and the observation of interest was student on-task behaviour.
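The variable ratio delivery mentioned above can be sketched in Python as follows. This is a minimal illustration only: the text does not report the mean ratio or how the ratios were chosen, so the uniform draw, the function name and the parameter values are all assumptions.

```python
import random
from typing import List


def variable_ratio_requirements(mean_ratio: int, n_tokens: int, seed: int = 0) -> List[int]:
    """Draw how many observed target responses precede each token.

    Each requirement is uniform on 1..(2 * mean_ratio - 1), so its expected
    value equals mean_ratio. The uniform draw is an assumption of this sketch.
    """
    if mean_ratio < 1 or n_tokens < 0:
        raise ValueError("mean_ratio must be >= 1 and n_tokens >= 0")
    rng = random.Random(seed)
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(n_tokens)]


# Hypothetical schedule: on average one token per four observed responses.
requirements = variable_ratio_requirements(mean_ratio=4, n_tokens=10)
print(requirements)
```

Because the number of responses required changes from token to token, students cannot predict which instance of the target behaviour will be rewarded.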

2. Method

2.1. Setting

Participants were recruited from one typical inclusive elementary school classroom located in a non-urban area within the lower mainland of British Columbia (BC), Canada. A grade 4-5 combined inclusive classroom was chosen by the district based on teacher and school interest in the study. The classroom was located on the second floor of a single school building, at the end of a long hallway. Inside the classroom, tables were generally oriented in the first two-thirds of room space while the front third contained a crescent table (for group work) on the left, a carpet just under a smart board in the center, and a teacher desk in the front right of the room. The back of the room contained cabinets over the length of that wall. The wall adjacent to the back wall and opposite the door was fully windowed. Supplies were placed on lower cabinets along the windowed wall and iPads were stored and charged in the corner on the lower cabinets between the back and windowed walls. No TE system had been in use within the selected classroom prior to initiation of this study.

2.2. Participants

This mixed grade 4-5 class consisted of 23 total students within grades four (n=13) and five (n=10). The class was taught by one Caucasian female, BC licensed professional teacher with four years total teaching experience. The classroom employed one full time, one-on-one education assistant. Two students were designated by the district as having chronic health conditions and seven students were designated as having behavioural difficulties using BC Ministry of Education disability determination guidelines. All students gave assent to participate in the study and parent/guardian permissions per ethics review board protocols were obtained. The teacher likewise gave consent to participate in the research.

Three students were selected by the teacher for individual observation. Bob (pseudonym) was in grade four. Bob was a Caucasian male and was designated by school personnel as having a behaviour disability and a chronic health condition as defined by the BC Ministry of Education. Hellen (pseudonym) was a grade five Caucasian female and was designated by school personnel as having a behaviour disability as defined by the BC Ministry of Education. Mark (pseudonym) was a grade four Caucasian male and was also designated by school personnel as having a behaviour disability as defined by the BC Ministry of Education. All students were native English speakers and participated in the typical BC standard curriculum for 100% of the school day. For each of the three students of specific interest, the behaviour of concern was identified by the teacher as time on task and thus this was the behaviour of observation in the present study.

2.3. Intervention Agent and Training

The teacher represented the intervention agent for this work. The role of the teacher as intervention agent consisted of 1) learning how to implement a TE, 2) teaching students how to engage in a TE, 3) initiating both the physical and virtual TE methods on preset days and phases of data collection and 4) adhering to the academic schedule during implementation and providing for reward redemption on days of implementation. The teacher was trained by the first and second authors regarding the six principles of a TE via individual one-on-one training regarding specific implementation factors in the classroom setting. Efficacy of the teacher training was assessed by the first and second authors, who observed the teacher’s instruction regarding the TE to her students as implemented in the classroom. The teacher covered all six aspects in a functional way during the introduction of the TE to the students. The teacher was then observed during trial runs of both the physical and virtual implementation methods in her classroom. Implementation fidelity was observed via an implementation fidelity requirements list (see appendix A). Trial runs showed that the teacher understood and implemented both TE methods as required by the six components of a TE and adhered 100% to the implementation fidelity checklist, as noted independently by the authors. No additional qualification or training was deemed necessary or provided prior to research data collection. Follow-up training after initiation of the TE research protocols was likewise not required, as implementation during the implementation phases did not deviate from the implementation requirements.

2.4. Materials

Each student was provided with one 9.5-inch iPad containing the student version of the CARS app. The teacher was provided with a 12.9-inch iPad Pro containing the teacher version of the CARS app.

The CARS system consisted of two interconnected iPad apps. The teacher ‘signed up’ each student in an online class portal. Students were then able to securely log in to their individual student app on their individual student iPad. The teacher likewise securely signed in on the teacher app from the teacher iPad. The student app allowed students to view tokens already obtained (a bank), the prizes available and the token price of each, and showed when a token was awarded via a ‘pop up’, push-type individual text notification message. The pop-up notification worked similarly to all text messages on the iPad and thus the student app did not need to be activated in order for the pop-up notification to appear. The student could also be working on a different app on their iPad and the pop-up notification of an awarded token would still appear.

The CARS prototype virtual application was

designed to mitigate possible struggles related to

physically delivering tokens and gathering data by

utilizing a specially designed, prototype iPad software

tool. Specifically, the prototype software tool was

designed to mitigate two difficulties teachers may face

when implementing a token economy: 1) The iPad-

based tool eliminated the need to physically deliver a

token to a student. Instead, the teacher delivered

tokens virtually by tapping on the picture of a student

on the teacher iPad. Alternately, the teacher could

award the whole class tokens via a ‘whole class’

button on the teacher app. It was hypothesized that this

would improve temporal contingency relating

behaviour to receipt of a token, save instructional time

and minimize modest disruption when a teacher using

a traditional physical TE might have been required to

disengage from an instructional activity to deliver a

token physically. Virtual token delivery also became a

private rather than a public event when the student’s

iPad software recorded delivery and delivered the

individual pop-up text message to the student’s iPad

confirming token receipt. 2) The iPad-based tool

automatically recorded token delivery time and

amount as data that was then available to the teacher

on the system’s web-based portal. The iPad also

recorded tokens exchanged by the student, what they

were exchanged for, and when the exchanges occurred

within the same portal. Although outside the focus of

this present study, such data could then be analyzed at

a later time by the teacher to validly adjust token

exchange intervals or reward choices for individual

students at the time and discretion of the teacher.
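
The kind of records described above can be pictured with a small sketch. The paper does not document the CARS data model, so the record layout and the names `TokenAward`, `TokenExchange` and `balance` below are hypothetical and serve only to illustrate what "delivery time and amount" and "what was exchanged, and when" might look like as data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TokenAward:
    """One token delivery (hypothetical layout, not the CARS schema)."""
    student: str
    awarded_at: datetime
    amount: int


@dataclass
class TokenExchange:
    """One redemption: what was obtained, when, and at what cost (hypothetical)."""
    student: str
    exchanged_at: datetime
    reward: str
    cost: int


def balance(student: str, awards: List[TokenAward], exchanges: List[TokenExchange]) -> int:
    """Tokens earned minus tokens spent for one student."""
    earned = sum(a.amount for a in awards if a.student == student)
    spent = sum(e.cost for e in exchanges if e.student == student)
    return earned - spent
```

With records of this shape, a teacher could review how often tokens were delivered and what they were spent on before adjusting exchange intervals or reward choices.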

3. Methodology

This research utilized a multi-element design. A

single case alternating treatment (ABCBC) design was

used to visually investigate the efficacy of two

different versions of the token economy classroom

management strategy upon baseline student on task

behaviours. Baseline data (A) was taken in the absence of

any TE system of behaviour support. Then the

International Journal of Technology and Inclusive Education (IJTIE), Volume 9, Issue 1, 2020

Copyright © 2020, Infonomics Society 1533

token economy method was implemented under two

conditions: B) traditional (physical) token delivery

and C) the prototype iPad-based virtual token delivery.

On task behaviour was defined based on a related

definition from Lee, Sugai and Horner [15] as a

student that exhibits engagement of his/her senses and

focus on the activity of instruction indicated by the

teacher at the momentary time sampled. Student

actions such as pausing, sleeping, prolonged gaze in a

non-relevant direction, engaging or remaining

disengaged from communication depending on the

instructional activity and/or engaging in any

non-relevant activity were an indication that the student

was not reasonably attending to the instructional task.

3.1. Procedures

Data collection and research protocol

implementation were scheduled and took place during

morning academic activities. Each morning, students

first engaged in whole group (class-wide) instruction

led by the teacher. During whole group instruction, the

teacher frequently sat on a stool located on a small

carpet and within easy access to a classroom smart

board. During this time, students were able to choose

to sit in chairs or on a carpet during the whole group

instructional activities. Whole group attention to task

data collection was conducted during whole group

instruction activities and took place at this same pre-

determined and routine timeframe of the classroom

schedule. No individual student data was collected

during the whole group activities.

Following whole group instruction, students

attended recess for approximately 15 minutes. Upon

returning to the classroom, students engaged in

stations-based instruction. The stations were located

within the classroom (and one station sometimes

located just outside the open classroom door at a

hallway table). During stations work, the teacher led

instruction in reading development activities from one

of the stations (typically 3 or 4 total stations in

operation during the stations activities) by sitting

behind the crescent shaped table with her orientation

out toward the class and students from the forward

left-hand corner of the room. It was possible for the

teacher to see all students during stations instruction

(except any student engaged in activities using the

hallway table just outside the classroom door). Other

stations not led by the teacher were structured as

independent learning activities for students at those

stations. The classroom educational assistant generally

monitored student activities at the stations as well as

individual students during station activities in the

classroom. Students rotated as cued by the teacher

from station to station throughout the hour. During the

stations-based instruction, data was collected

regarding the three individual students of interest and

not regarding the whole group. Both hours of

instruction focused on language arts and reading

related activities. This identical schedule of activities

was followed each day that data was collected.

3.2. Data Collection

A momentary time sampling methodology [16]

was implemented by designating multiple 15-minute

periods over the course of approximately two hours of data

collection per day. During the 15-minute intervals

within the first hour of whole class instruction,

observations concerning the on or off task behaviour

of the entire group of students were obtained using a

timed camera snapshot of the students at the end of

each minute of the 15-minute interval. Two cameras

(for accuracy of angle and inter-rater review purposes)

were placed high up in the front right and left corners

of the rectangular room in such a way as to capture the

activities of all students within the room when the

picture was snapped. Pictures were snapped

automatically and without human interaction with the

cameras via a commercially purchased app designed

for that purpose. The snapping of pictures did not

make a sound, nor did it make any visually observable

action so as not to divert any student attention from the

lesson/task being taught. Additionally, an independent

data recorder (one or both authors) observed the group

directly and noted the activities in which the students

were to be engaged during that time. Following the

15-minute period, pictures were analyzed to count how

many students were focused on the instruction or

engaged in a directed activity for that sample captured

in the photo based on the previously described

definition of on-task behaviour. Data points for group

on-task behaviour were then calculated by dividing the

number of students on task for each sample picture (15

pictures in 15 minutes) by the total number of students

within each picture frame. One data point was then

calculated as percent on task over the entire 15-minute

period by averaging the individual picture data points

taken in the 15-minute period for the group of

students. The unit of analysis was the average on-task

percent over one 15-minute period.
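
The group calculation described above can be sketched as follows. This is a minimal illustration, not the authors' scoring tool: the function name and the counts are made up, and each snapshot is assumed to be recorded as a pair of (students on task, students in frame).

```python
from statistics import mean
from typing import List, Tuple


def group_on_task_percent(snapshots: List[Tuple[int, int]]) -> float:
    """Average percent on-task over one observation period.

    Each snapshot is (students_on_task, students_in_frame) for one picture.
    """
    per_picture = [100.0 * on_task / in_frame for on_task, in_frame in snapshots]
    return mean(per_picture)


# Made-up counts for illustration only: 15 pictures, one per minute.
example = [(18, 20)] * 10 + [(15, 20)] * 5
print(group_on_task_percent(example))  # 85.0
```

Each picture contributes one percentage, and the single data point for the period is the mean of those 15 percentages.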

During physical TE implementation and during

whole group instruction, the teacher utilized variable

ratio (slot machine) reward schedules to deliver tokens

by physically handing ‘toy dollar bills’ as tokens to

students paying attention. A variable ratio schedule of

token delivery is generally accepted as effective

regarding the reinforcement of on task behaviours [17,

18]. At times, the teacher would hand a bill to all

students by walking around the room as students

engaged in a directed activity related to the whole

group instruction. The use of variable ratio reward

(token) distribution was requested by the teacher in

order that any interruptions to the flow and pace of the

intended instruction would be minimized. Students

would be asked to keep the bills in an envelope until

their personal items (such as backpacks or

notebooks) were accessible.

During virtual implementation, the teacher would

award tokens through tapping a picture of an

individual student or tapping the group button on the

teacher iPad during whole group instruction while

maintaining a variable ratio reward schedule. All

students were located within visual proximity to their

individually assigned iPads during virtual

implementation, as the teacher announced the start of

the virtual TE implementation prior to whole group

teaching by asking all students to retrieve their

individually assigned iPads and sign in prior to lesson

initiation.

During the second hour of instruction

(stations/small group and independent activities), each

of three pre-selected students was observed during a

15-minute period using a momentary time sampling

methodology with the same definition of on-task

behaviour. Data was collected by the first and/or

second authors by observing each of the three students

at the end of each minute of each 15-minute period and

noting if the student was on or off task relative to the

educational activity assigned. This was then converted

to a percentage on task by dividing the number of

on-task observations by the total number of

observations in the period (15) and multiplying by

100. A single percent on task data point was recorded

that represented one student over the entire 15-minute

period of observation per student. The unit of measure

for individual student on-task behaviour was one data

point representing the average on-task percentage of

the student over one 15-minute period.
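
The individual calculation above reduces to a single ratio. The sketch below assumes each end-of-minute check is stored as a boolean (True for on task); the function name and the sample record are illustrative only.

```python
from typing import Sequence


def percent_on_task(observations: Sequence[bool]) -> float:
    """Percent of momentary samples scored on-task (True = on task)."""
    if not observations:
        raise ValueError("at least one observation is required")
    return 100.0 * sum(observations) / len(observations)


# Made-up record: 15 end-of-minute checks, 12 scored on task.
record = [True] * 12 + [False] * 3
print(percent_on_task(record))  # 80.0
```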

During stations work, physical implementation of

the TE method was conducted by the teacher through

assigning individualized tasks to students at the station

in which the teacher was leading instruction and then

physically ‘roaming’ the room handing out bills to

those students on task. The teacher also awarded bills

occasionally to the students assigned to her own

station. During virtual implementation, the teacher

remained at the station in which she was directing

instruction and gave tokens electronically to students

on task by visually (and not physically) observing

students in the room. A variable ratio token delivery

schedule was used for both the physical and the iPad-

based methodologies. Additionally, at least one token

was delivered to one student (as a minimum

requirement) over each group and individual 15-

minute observation time period.
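
For readers unfamiliar with the schedule type, a variable ratio schedule delivers a token after a varying number of target responses that averages some chosen value. The paper does not report the mean ratio or how the teacher varied it, so the function name, the uniform draw and the mean of 5 used below are all assumptions made for illustration.

```python
import random
from typing import List


def variable_ratio_deliveries(n_responses: int, mean_ratio: int, seed: int = 0) -> List[int]:
    """Return the 1-based response numbers at which a token is delivered.

    The number of responses required for each token is drawn uniformly from
    1..(2 * mean_ratio - 1), so the requirement averages mean_ratio.
    """
    rng = random.Random(seed)
    deliveries = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    since_last = 0
    for response in range(1, n_responses + 1):
        since_last += 1
        if since_last == required:
            deliveries.append(response)
            since_last = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return deliveries


# Hypothetical run: 30 on-task responses under an assumed mean ratio of 5.
print(variable_ratio_deliveries(30, 5, seed=1))
```

In the study the teacher delivered tokens when breaks in instruction allowed, so the schedule actually experienced by students may have differed from an idealized one like this.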

3.3. Inter-rater reliability

Inter-rater reliability (IRR) was conducted on 13 of

31 (32.2%) individual data collection sessions (each

containing 15 separate data points to compare) by

collecting data on the three individual students of

interest by both the first and second authors

simultaneously. After independent collection, data

points were compared and a percent agreement was

calculated over the data points within each 15-minute

period for each of the three targeted students. IRR

achieved an

average of 89.96% agreement (range: 82.2%-97.7%)

for observations of the three individual students in

total. To conduct IRR on the whole group attention to

task data, the pictures were analyzed independently by

the first and second authors. Group IRR was

conducted on 5 of 14 sets of 15 pictures each or 35.7%

of total observed data points and achieved 93.72%

agreement.
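
A point-by-point percent agreement of the kind reported above can be sketched as follows. The paper does not spell out its exact formula, so this assumes the common definition (matching scores divided by total scores, times 100); the names and scores are made up.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Point-by-point agreement: matching scores / total scores * 100."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of scores")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Made-up scores for one 15-minute session (True = on task).
first_author = [True] * 10 + [False] * 5
second_author = [True] * 9 + [False] * 6
print(round(percent_agreement(first_author, second_author), 1))  # 93.3
```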

3.4. Implementation Fidelity

Prior to implementation of the physical and virtual

token economy systems, the participating teacher was

trained in how to implement both forms of the TE

systems. Practice with each form (physical and virtual)

was conducted with feedback given to the teacher by

the authors. Following teacher training and practice,

the teacher relayed the method to the students in the

class and was observed as accurate in describing the

process to the students by study authors. The teacher

explained that ‘paying attention’ to lessons and

activities was the desired behaviour. The teacher

role-played attention vs. inattention with specific reference

to where one’s eyes were looking vis-à-vis lesson

involvement. Regarding tokens, the teacher explained

that when she noticed students paying attention, she

would award a token (individually) and if she noticed

the group paying attention, she would award all of

them a token. This was exemplified by asking students

to perform an activity as the teacher went to each

student noting if and how the student was paying

attention, providing specific feedback to each as she

handed the student a token. The students were

surveyed by the teacher to obtain reasonable ‘prizes’

that students could redeem tokens to obtain. The

students brainstormed prizes and together with the

teacher, listed those that would be available and at

what price. This menu was posted at the back of the room

on a corner cabinet door that was used for prize

redemption and within individual CARS student iPad

apps. Tokens were to be redeemed at recess periods,

lunch or after school. The recess period occurred

directly in between the two hours of data

collection/method implementation. Students

understood that these were the only times in which

prizes could be redeemed.

These initial preparations adhered to the six vital

components of TE methodologies noted by Ivy et al.

[2] in the following ways. 1) Students were trained by

the teacher and under the observation of study authors

as to exactly what behaviour would earn a

token. 2) Students knew the value of each token by

participating in the development of the different

rewards and the costs related to each. According to the

teacher, all students cognitively understood relative

value and participated in choosing items/activities

they valued to be placed on the menu of rewards. 3)

Students understood that both ‘real’ and ‘virtual’

tokens could be combined to purchase items from the

menu. All students were capable of independent

mathematics required to add the physical and virtual

tokens together. 4) Students understood that tokens

were being given only during the two hours of

observation in the mornings in which the data

recorders (first and/or second author) were in the room

observing and taking data. The tokens were not given

on a specific schedule but instead were given

according to a variable ratio method by the teacher as

time and teaching methodology permitted her to note

the attentive behaviour of individual or groups of

students. Tokens were given no less than once during

each 15-minute period to at least one student. 5)

Students understood that recess, lunch and after school

were designated as times that tokens could be

redeemed based on teacher availability. Rewards were

available during at least one of the times each day of

study observation/implementation. 6) The menu of

rewards contained the costs for each. Last, the

implementation fidelity checklist was used to ensure

the teacher adhered to these mandates of

implementation each day of data collection achieving

100% adherence.

3.5. Data analysis

Regarding whole group attention to task, percent

on task was calculated for all students under each of

the three conditions. A one-way, independent

samples analysis of variance (ANOVA) was used to

analyze any difference in percent time

on-task among phases. Similarly, a one-way,

independent samples ANOVA was conducted for each

of the three students that were the focus of individual

behaviour support to examine any difference in

each student’s on-task performance among the three

conditions. Last, each condition and set of student data

was examined using single case visual analysis

techniques.
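
The F statistic behind a one-way, independent samples ANOVA can be sketched in a few lines. This is a generic textbook computation under made-up data, not the authors' analysis script; the function name and the phase labels in the example are assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for independent groups."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three conditions, for illustration only.
phases = {"baseline": [1.0, 2.0, 3.0], "physical": [2.0, 3.0, 4.0], "virtual": [5.0, 6.0, 7.0]}
f_stat, df_b, df_w = one_way_anova(phases)
print(round(f_stat, 2), df_b, df_w)  # 13.0 2 6
```

The two degrees of freedom returned are the pair that appears in parentheses when a result is written as F(df_between, df_within).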

4. Results

Data was collected across eight total days between

April 30, 2018 and June 4, 2018. Some differences

exist concerning total number of 15-minute sessions

(data points) between whole group data and each of

the three individual student observations. This is due

to absences from the class for any given student thus

impacting total available time to observe and collect

data for that student. The following results step

through each planned comparison of means and visual

inspection process.

Regarding any differences in whole group student

on-task behaviours between the three conditions (no

intervention, physical method, virtual method), data

included all picture-based analysis of whole group

activities. No significant difference was detected

[F(2,15)=2.211, p>.05] between any of the phases.

Planned contrasts showed that the implementation of

either of the two methods (physical and virtual) did not

significantly differ from baseline (no TE) [t(15)=

-1.413, p>.05 (one tailed)] and that virtual

implementation did not significantly differ from the

physical implementation [t(15)=-1.557, p>.05 (two

tailed)] regarding whole group on-task behaviour.

Visual single case analysis of whole group data

similarly did not reveal notable trends between phases

(see Figure 1).
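
A planned contrast of this kind is usually computed from the group means, a set of weights that sum to zero, and the pooled within-group variance, with the same within-group degrees of freedom as the ANOVA. The sketch below assumes that standard approach; the paper does not list its contrast weights, so (-2, 1, 1) for baseline versus both TE phases and (0, -1, 1) for physical versus virtual are assumptions, as are the function name and data.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence, Tuple


def planned_contrast(groups: Sequence[List[float]], weights: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for a contrast using the pooled within-group variance."""
    if len(groups) != len(weights) or abs(sum(weights)) > 1e-12:
        raise ValueError("need one weight per group, and weights must sum to zero")
    n_total = sum(len(g) for g in groups)
    df_within = n_total - len(groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / df_within
    estimate = sum(w * mean(g) for w, g in zip(weights, groups))
    std_error = sqrt(ms_within * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return estimate / std_error, df_within


# Made-up scores in the order baseline, physical, virtual.
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
t_te_vs_baseline, df = planned_contrast(data, [-2, 1, 1])
t_virtual_vs_physical, _ = planned_contrast(data, [0, -1, 1])
print(round(t_te_vs_baseline, 3), round(t_virtual_vs_physical, 3), df)
```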

Figure 1. All Students (percent on-task by trial number; Phases A, B, C, B, C)

Regarding Bob (pseudonym), a one-way analysis

of variance was calculated to test if the mean instances

of on-task observations differed significantly between

any of the three phases at the p<.05 level. Results

indicated that there was a significant and large effect

of the TE (not any specific version) on the target

on-task behaviour of Bob [F(2,20)=4.375, p<.05, ω=.48].

Using a Tukey HSD post hoc analysis, the difference

in means between baseline and the virtual

implementation was significant (p<.05). Further

analysis using planned contrasts revealed that the

significant effect was shown between baseline and the

TE implementation (both physical and virtual

combined) [t(20)=2.954, p<.01 (two tailed)] but did not

indicate a significant difference between virtual and

physical implementation methods [t(20)=.029, p>.05].

Visual single case plot analysis confirmed a positive

difference between baseline and TE implementation

phases but did not exhibit trends between the two

implementation phases themselves (see Figure 2).
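
The effect size reported for Bob can be checked against the F statistic. The sketch below assumes that ω is the square root of the usual fixed-effects omega-squared, which can be written in terms of F and its degrees of freedom; the paper does not state which formula was used, and the function name is illustrative.

```python
from math import sqrt


def omega_from_f(f_stat: float, df_between: int, df_within: int) -> float:
    """Omega (square root of omega-squared) recovered from a one-way ANOVA F."""
    n_total = df_between + df_within + 1
    numerator = df_between * (f_stat - 1.0)
    omega_squared = numerator / (numerator + n_total)
    # Omega-squared can fall below zero when F < 1; report zero in that case.
    return sqrt(max(omega_squared, 0.0))


print(round(omega_from_f(4.375, 2, 20), 2))  # 0.48
```

Under that assumption, F(2,20)=4.375 reproduces the reported ω=.48.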

For Helen (pseudonym), a one-way analysis of

variance testing whether the mean instances of on-task

observations differed significantly at the p<.05 level

between phases found no

significant differences in the means of on-task

data between any phase condition [F(2,21)=1.81,

p=.188]. Within the planned contrast examinations, no

significant effect was shown between baseline and the

TE implementation (both physical and virtual

combined) [t(19)=1.718, p>.05 (one tailed)] nor was

any difference between physical and virtual

implementation methods found [t(19)=.102, p>.05

(two tailed)]. Note that the contrast

between the combined means of the virtual and physical

implementation phases and

baseline indicated a one-tailed significance of p=.051.

While this was not strictly significant statistically, it is

worth noting that this test barely missed the level

required. Overall single case plot visual analysis did

not indicate any discernable patterns across

implementation phases (see Figure 3). The single case

visual analysis provided further cause to support a

finding of non-significance in regard to the planned

contrast between baseline and the combined TE phases,

which fell so close to a rounded p=.05

cutoff point.

Mark (see Figure 4) showed an overall decrease in

time on task relative to baseline. A one-way

analysis of variance was conducted to test if the mean

instances of on-task observations differed

significantly at the p<.05 level. Results indicated that

no significant differences existed in the on-task

instances data between any condition [F(2,19)=1.122,

p>.05]. Further analysis using planned contrasts

indicated that no significant effect was shown between

baseline and the TE implementation (both physical

and virtual combined) [t(19)=-1.415, p>.05 (one

tailed)] nor was any difference between physical and

virtual implementation methods found [t(19)=-.545,

p>.05 (two tailed)]. Note that because Mark’s visual

mean plot data indicated a negative slope, the planned

contrast concerning differences in the combined TE

methods and baseline at the two tailed level was also

not significant at the p<.05 level. Overall visual analysis of

single case data did not indicate any observable trends

in data between phases of TE implementation (see

Figure 4).

4.1. Social Validity

Social validity was obtained via student and

teacher questionnaires given following

the data collection periods. Student questionnaires

contained three questions. First, students were asked if

they preferred the paper or iPad token delivery system.

Twenty students responded to this question and 16

indicated a preference for the iPad delivery. One

student stated that “…it was cool to see the points pop

up”, while another noted that the iPad was preferred

because “…you don’t have to count the points”. Two

students preferred the paper token delivery, stating that

such a system allowed them to “…share it (paper

tokens) with Friends [SIC] if you want to save up for

something like Raptor room.” Two students stated that

they had equal preference for paper or iPad-based

tokens.

Figure 2. Bob (percent on-task by trial number; Phases A, B, C, B, C)

Figure 3. Helen (percent on-task by trial number; Phases A, B, C, B, C)

Figure 4. Mark (percent on-task by trial number; Phases A, B, C, B, C)

Question two asked students to rate, on a scale of

one to ten, how much they focused on obtaining tokens

through being ‘on task’ during periods of using either

of the two systems. Results indicated that 7 of 16

(43.8%) rated their attention to obtaining tokens as a 1

(did not focus on obtaining tokens at all). One student

stated, “After a while, the thought of getting a reward

wore down”. Another three students (18.7%) gave the

score of 3 and three more gave a score of 5. One of the

students that indicated a 3 stated that they only focused

on being on task to obtain tokens about a quarter of the

time “…because your [SIC] so busy working.” Two

students indicated a 10 in response to question 2 and

stated that they focused on receiving tokens “…all the

time.”

Asked in question three which method (iPad,

Physical, Both, None) they would recommend

teachers use to help students focus on their work, one

student indicated paper, five indicated both and ten indicated

iPad. One student that had indicated that they would

recommend both systems to teachers stated that they

did so “…cuz [SIC] then there would be two ways of

getting rich!” One student that indicated they would

recommend the iPad method stated they did so “…

because it (tokens) can’t be stolen.” and “Because its

[SIC] fun”. It should be noted that early in the

implementation, one instance of theft of physical

tokens (bills) occurred (and was rectified by the

teacher). This was likely directly related to this student’s

reference to such possible issues on the anonymous

survey.

The teacher participant also provided social

validity feedback data through a separate

questionnaire. Overall, the teacher participant

indicated that the paper methodology was more

effective in helping keep students on task. The teacher

indicated that the paper method provided “…instant

gratification… students knew why they earned the

token… It caused a ripple effect around the student

who earned the token, that others (would) see what

happened and learn that if they did the same thing, they

too could earn a token.” As a corollary, the teacher

stated that “…(using) the iPad system, students did not

see when someone (else) earned a token because it

only showed up on the individual who earned the

(token on their) iPad.” Further, the teacher noted the

iPad app was difficult and time consuming to use.

5. Discussion

It is interesting to note that prior to the results of

the present research being presented to the subject

participants, the teacher indicated an overall

satisfaction with the TE as a classroom management

method. The teacher indicated that she felt the overall

attention to task for students increased during times in

which she implemented the TE methods. The results

seemed to be surprising to the teacher when revealed at

a classroom pizza party following the study.

5.1. Token delivery

A variable ratio schedule of reinforcement (token

delivery) is generally accepted as effective regarding

the tracking and reinforcement of on task behaviours

[17, 18]. The teacher in the present study also

requested this reinforcement schedule so that time to

deliver tokens, both virtual and physical, could occur

when breaks in her teaching flow allowed and so that

instruction would not be interrupted based on a fixed

interval reinforcement methodology. One possible

explanation for the overall ineffectiveness of both

token economy systems in the present study may be

related to the variable ratio schedule of reinforcement.

The reinforcement schedule that resulted from relying

on breaks in lesson flow may have been sub-optimal

for some students.

It is therefore possible that students may require a

more defined schedule of interval reinforcement prior

to the application of a variable ratio methodology. Future

researchers should consider this possibility as well as

the equally possible reality that such alterations in

delivery schedule may be impractical for a teacher to

administer alone. Further study is required to address

this hypothesis.

Another area of interest was the non-public nature

of token delivery during the iPad based TE phases. It

may be that when students noticed delivery of tokens,

they made an effort to display the desired on-task

behaviour but the behaviour might have dissipated

when students noticed the teacher otherwise engaged.

If this had been the case, we likely would have

expected to see a difference in impact between the

private iPad delivery and the public physical delivery of

tokens. This was not the case in the present study.

Despite the teacher’s best intentions, within the

current study framework, she was unable to attend to

the on-task behaviour of the group 100% of the time

while teaching either group or station-based lessons.

This would seem to indicate that the need to physically

deliver tokens versus being able to do so from a

distance did not impact the teacher’s ability to attend

to the behaviours for which tokens were to be

delivered. The teacher seemed to confirm this

suspicion by stating in the follow-up questionnaire that

“…allow(ing) the EA (educational assistant) to hand

out the tokens instead of the teacher” for paper

delivery would be helpful and that simplifying the search

for specific students within the app’s interface would

reduce the difficulty in delivering tokens to individuals

and/or small groups of individuals. These assertions

by the teacher seem to indicate difficulty with being

able to teach while simultaneously attending to the

observation of student on-task behaviours.

As with the previous assertion concerning public

vs private token delivery, physically walking over to

students (physical) vs tapping an iPad (iPad) to deliver

tokens did not seem to affect the effectiveness of

either methodology with respect to student on task

behaviour. We would have expected to see a

difference in impact between physical and

iPad-based methods if delivery method had been an

important aspect of the method; however, this was not

observed.

It is likely that the need to simultaneously focus on

the fluid needs of instruction while teaching allows for

limited attention to matters of observation regarding

individual or group behaviours. Indeed, the teacher’s

token delivery occurred during times within lessons

that did not require her direct involvement with a

student. This hypothesis would seem to support recent

research regarding a teacher’s ability to multi-task. As

cognitive tasks are divided between two or more

pressing needs, the quality and efficiency of results are

generally reduced [19, 20, 21]. Such a finding

regarding teacher abilities to multi-task would seem to

point to one possible reason for the overall failure of

the TE system in the present study.

5.3. Token redemption

Token redemption took place at least one time per

day at one or more pre-determined redemption

periods; however, the students were required to ask the

teacher for redemption during the noted times.

Sometimes the teacher was otherwise engaged during

these times, speaking with other faculty members

while children were at play or preparing stations for

when children would return. Occasionally, the teacher

was required to serve as a recess monitor and was

unavailable to redeem tokens during recess. Overall,

this resulted in a less predictable token redemption

time period during both phases of TE implementation.

Students may have been discouraged if they had

intended on receiving a prize at a specific time period

in which the teacher was unable to comply with a

purchase request. While students had been told that not

all the redemption periods would be available due to

the teacher’s multiple commitments, and that one

would be available at minimum per day, the lack of a

solid, repetitive daily redemption schedule may have

negatively impacted the students’ motivation to

remain on task.

Again, future researchers should address this

redemption hypothesis in more detail to examine any

impact a more predictable redemption schedule may

have upon the overall time on task behaviours of

students. Like the delivery hypothesis, researchers

must also seek to understand if a predictable

redemption schedule is reasonable to maintain when

the teacher alone implements the TE system. It may

be the case that additional help is required if

predictable delivery of tokens and predictable

redemption periods other than the one time per day in

the present study are to be achieved.

5.4. Analysis of efficacy

Results indicated that the virtual delivery TE

system and the combined data from virtual and

physical methods were significantly effective over

baseline (no TE) for Bob only. No other individual or

whole group analysis showed a significant difference

between baseline and the two TE approaches nor

between the two TE approaches themselves. This may

indicate that in spite of statistical indications, the

delivery of tokens to Bob was optimal or effective by

sheer chance alone (within the 5% error range). Also,

Bob’s data included an outlier in data point five (score

of 0). No obvious reason for Bob’s inattention during

that data observation period was noted and thus for

official analysis, the point remained within the data

set. It is important to note, however, that this possible

outlier influenced the magnitude of significant results.

Adding visual assessment of raw data, it seems that at

best, we can describe the results for Bob as

inconclusive.

While the current findings indicated support for the

findings of Maggan et al., and Ivy et al., [12, 2], the

current work would seem to contradict some other

available research regarding the effectiveness of TE

systems within an inclusive classroom setting. The

negative results of the present work in relation to

previous studies suggest that clarity in the

implementation of studied TEs is critical to

understanding conclusions drawn from any findings.

In the present work, implementation fidelity was

strictly noted and adhered to a pre-defined set of

standards. Given those standards, results showed the

method as implemented not to be an effective support

regarding on task behaviours within the student

population studied. When one considers the incredible

differences with which the idea of a TE can be

implemented (ie: multiple human intervention agents,

International Journal of Technology and Inclusive Education (IJTIE), Volume 9, Issue 1, 2020

Copyright © 2020, Infonomics Society 1539

the behaviour/s of focus, the diversity of students’

individual characteristics, the token delivery and

redemption schedules), it is likely not possible to

assert that any ‘generic’ TE method should be the

focus for analysis leading to the categorization of

evidence based practice. Instead, specific versions of

the TE, strictly defined, may be a more proper unit of

analysis.

6. Limitations

This work is limited to that observed within the

contexts of the participants within the location chosen

for the study. Results should not be used to justify

broader meaning outside of this context, as individual

circumstances exist in any defined population and

context of study.

Additionally, time on task represents a difficult

variable of measure. Specifically, data collectors were

required to identify the direction of each subject’s

attention or activity toward a direction, activity or

object that was relevant to the instruction being

provided at that time while simultaneously excluding

indicators of non-attention to task as defined by Lee,

Sugai and Horner [15]. The relevance of the direction

of attention or activity based on instruction can be

somewhat subjective to the person judging the data

point. For example, if a student is looking at his/her

shoes while the teacher is working mathematics on a

white board, the data recorder would likely mark the

data point as ‘not on task’; however, if the teacher were

using eyelets of shoes as an example to count pairs of

objects, the same gaze would be recorded as ‘on task’.

IRR was used to indicate the breadth of subjectivity,

with reasonable findings; however, it is important to

acknowledge this as a limitation to the results of the

present work.
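
To make the role of IRR concrete, the sketch below computes a simple interval-by-interval percentage agreement between two data collectors. This is only an illustration under stated assumptions: the excerpt does not say which agreement statistic was used, and the function name and the sample records are hypothetical.

```python
def interval_agreement(observer_a, observer_b):
    """Percentage of intervals on which two observers recorded the same code.

    Assumption: each record is a list of booleans, one per observation
    interval (True = on task, False = not on task).
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of intervals")
    if not observer_a:
        raise ValueError("At least one interval is required")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * matches / len(observer_a)


# Hypothetical records for ten intervals; the observers disagree on one.
primary = [True, True, False, True, False, True, True, False, True, True]
secondary = [True, True, False, False, False, True, True, False, True, True]

print(f"{interval_agreement(primary, secondary):.1f}% agreement")
```

A disagreement such as the fourth interval above is the kind of case the shoe example describes, where two observers judge the relevance of the same gaze differently.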

7. Acknowledgements

This research was supported by the Social Sciences

and Humanities Research Council of Canada.

8. References

[1] Hackenberg, T.D. (2018). Token reinforcement:

Translational Research and Application. Journal of Applied

Behavior Analysis, 51, 393-435. doi: 10.1002/jaba.439

[2] Ivy, J.W., Meindl, J.N., Overley, E. & Robson, K.M.

(2017). Token Economy: A Systematic Review of

Procedural Descriptions, Behavior Modification, 00(0), 1-

30. doi: 10.1177/0145445517699559

[3] Kazdin, A.E. (1977). The token economy: A review and

evaluation, New York: Plenum Press.

[4] Robacker, C.M., Rivera, C.J. & Warren, S.H. (2016). A

Token Economy Made Easy Through ClassDojo,

Intervention in School and Clinic, 52(1), 39-43. doi:

10.1177/1053451216630279

[5] Simonsen, B., Fairbanks, S., Briesch, A., Myers, D., &

Sugai, G. (2008). Evidence-based practices in classroom

management: Considerations for research to practice.

Education and Treatment of Children, 31. 351–380.

[6] Carnett, A., Raulston, T., Lang, R., Tostanoski, A., Lee,

A., Sigafoos, J. & Machalicek, W. (2014). Effects of a

Perseverative Interest-Based Token Economy on

Challenging and On-Task Behavior in a Child with Autism,

Journal of Behavioral Education, 23, 368-377. doi:

10.1007/s10864-014-9195-7

[7] Soares, D.A., Harrison, J.R., Vannest, K.J. &

McClelland, S.S. (2016). Effect Size for Token Economy

Use in Contemporary Classroom Settings: A Meta-Analysis

of Single-Case Research, School Psychology Review, 45(4),

379-399

[8] Kazdin, A.E. (1982). The token economy: A decade

later., Journal of Applied Behavior Analysis, 15, 431-445.

[9] Matson, J.L. & Boisjoli, J.A. (2009). The token

economy for children with intellectual disability and/or

autism: A review, Research in Developmental Disabilities,

30, 240-248. doi: 10.1016/j.ridd.2008.04.001

[10] Alter, P. (2012). Helping Students with Emotional and

Behavioral Disorders Solve Mathematics Word Problems,

Preventing School Failure, 56(1), 55-64, doi:

10.1080/1045988X.2011.565283

[11] Ogasahara, K., Hirono, M. & Kato, S. (2013). Support

for on-task behavior through a token economy system:

Autistic youth who shows challenging behavior, Japanese

Journal of Special Education, 51(1), 41-49

[12] Maggin, D. M., Chafouleas, S. M., Goddard, K. M., &

Johnson, A. H. (2011). A systematic evaluation of token

economies as a classroom management tool for students with

challenging behavior. Journal of School Psychology, 49,

529-554.

[13] Hackenberg, T.D. (2009). Token reinforcement: A

review and analysis. Journal of the Experimental Analysis of

Behavior, 91, 257-286. doi: 10.1901/jeab.2009.91-257

[14] Kazdin, A.E., & Bootzin, R.R. (1972). The token

economy: An evaluative review. Journal of Applied

Behavior Analysis, 5, 343-372.

[15] Lee, Y.Y., Sugai, G., & Horner, R.H. (1999). Using an

instructional intervention to reduce problem and off-task

behaviors. Journal of Positive Behavior Interventions, 1(4),

195-204

[16] Alberto, P. & Troutman, A. (2003). Applied Behavior

Analysis for Teachers: 6th Ed., Merrill Prentice Hall, Upper

Saddle River, New Jersey, Columbus Ohio

[17] Martens, B.K., Lochner, D.G., & Kelly, S.Q. (1992).

The effects of variable-interval reinforcement on academic

engagement: A demonstration of matching theory. Journal

of Applied Behavior Analysis, 25, 143-151

[18] Hulac, D., Benson, N., Nesmith, M.C. & Shervey, S.W.

(2016). Using Variable Interval Reinforcement Schedules to

Support Students in the Classroom: An Introduction With

Illustrative Examples, Journal of Educational Research and

Practice, 6(1), 90-96, doi:10.5590/JERAP.2016.06.1.06

[19] Bowman, L.L., Levine, L.E., Waite, B.M., & Gendron,

L. (2010). Can students really multitask? An experimental

study of instant messaging while reading. Computers &

Education, 54(4), 927-931, doi:

10.1016/j.compedu.2009.09.024

[20] Logie, R.H., Gilhooly, K.J., & Wynn, V. (1994).

Counting on working memory in arithmetic problem

solving. Memory & Cognition, 22(4), 395-410, doi:

10.3758/BF03200866

[21] Wieth, M.B. & Burns, B.D. (2014). Rewarding

Multitasking Negative Effects of an Incentive on Problem

Solving Under Divided Attention, Journal of Problem

Solving, 7, doi: 10.7771/1932-6246.1163

Journal of Contemporary Psychotherapy (2018) 48:145–154
https://doi.org/10.1007/s10879-017-9376-5

ORIGINAL PAPER

Token Economies: Using Basic Experimental Research to Guide
Practical Applications

Jeffrey F. Hine1 · Scott P. Ardoin2 · Nathan A. Call3

Published online: 12 December 2017
© Springer Science+Business Media, LLC, part of Springer Nature 2017

Abstract
This paper highlights the applicability of patterns seen within basic experimental research in relation to contemporary appli-
cation of token economies. Token economies are one of the most widely used interventions to promote behavior change,
and this procedure has evolved to be effective across many settings, behaviors, and individuals. Due to this widespread use,
casual implementation of the token economy might result in inconsistencies in responding and therefore an overall skepti-
cism in the procedure itself. We present multiple barriers that encumber practical application of token economies, including
insufficient conditioning and pairing of tokens, determining quality of backup reinforcers, unforeseen effects of motivating
operations, teaching the token exchange, effects of higher-order reinforcement schedules, ratio strain, and use of response
cost procedures. To assist practitioners in implementing more effective treatments, for each barrier we revisit the often
overlooked basic research involving features of conditioned reinforcement and reinforcement schedules. It is important to
translate the often complex implications of basic research so that practitioners can use this information to improve their own
practice as well as their confidence in disseminating use of this evidence-based treatment. To further guide practitioners
in using this knowledge in everyday settings, we also provide recommendations specific to each barrier as well as relevant
applied research and practical examples.

Keywords Token economy · Conditioned reinforcement · Applied behavior analysis

Introduction

Since first proposed by Ayllon and Azrin (1968) and subse-
quently refined by Kazdin (1977), the use of token econo-
mies has become one of the most venerable and widespread
applied interventions for producing behavior change (Kazdin
1982; Matson and Boisjoli 2009). Given this widespread
use and effectiveness across settings, many practitioners
(i.e., psychologists, teachers, and applied behavior analysts)
may have a general understanding of the procedures behind
establishing a token economy. Although there may be some
differences in the specifics of establishing a token economy
(e.g., Drabman and Tucker 1974; Miltenberger 2008), there
seems to be general consensus that establishing an effec-
tive token economy should at least include: (1) identifying
and operationally defining appropriate target behaviors; (2)
selecting appropriate tokens (e.g., durable, engaging, indi-
vidualized); (3) identifying backup reinforcers (e.g., primary
reinforcers, other conditioned reinforcers); (4) determining
values of tokens and exchange rates for backup reinforc-
ers; (5) determining methods of exchange; (6) determin-
ing how individuals can earn or lose tokens; (7) accurately
monitoring the program’s effects on the target behaviors; and
(8) adjusting the program to meet the long-term goals and
addressing barriers to success. If practitioners implement
these steps in a consistent and systematic manner, positive
behavior changes are likely to occur. In general, practitioners
can use the above framework as a “base” behavior manage-
ment system with which they can enact multiple options.

* Jeffrey F. Hine
jeffrey.hine@vanderbilt.edu

1 Vanderbilt University Medical Center Department
of Pediatrics, Vanderbilt Kennedy Center/Treatment
and Research Institute for Autism Spectrum Disorders
(TRIAD), 1211 21st Ave S, #110, Nashville, TN 37212,
USA

2 University of Georgia Department of Educational
Psychology, Center for Autism and Behavioral Education
Research, Athens, GA, USA

3 Emory University School of Medicine, Marcus Autism
Center, Atlanta, GA, USA


However, during implementation of a token economy,
practitioners may encounter complex barriers to behavior
change and may struggle to enact solutions targeting those
barriers (Bailey et al. 2011; Kazdin 1982). Revisiting foun-
dational basic research on the underlying mechanisms of
token economies can assist practitioners in overcoming such
difficulties when they are encountered in practice.
Widespread implementation of token economies without an
understanding of these mechanisms likely has an impact on
the progress of the individuals with whom such procedures
are adopted and on the reputation of this strategy.

Much basic research exists demonstrating the effects
of the underlying mechanisms of token economies (Foster
et al. 2001; Hackenberg 2009). Ideally, practitioners would
consult this literature when faced with a practical dilemma.
One variable that often interferes with this venture, however,
is the considerable effort required for practitioners
to assimilate and apply information gained from reading
basic experimental research. Practitioners might view this
research as inapplicable to everyday practice; yet, general
patterns of performance found with animals and humans in
the laboratory consistently emerge in applied research (Mace
and Critchfield 2010). We will highlight the applicability
of these patterns that may assist practitioners in identify-
ing potential barriers to individual success and implement-
ing a more fundamentally sound and thus effective token
economy.

Conditioning Tokens as Effective Reinforcers

A conditioned reinforcer is defined as an initially neutral
event or stimulus that acquires value through its relation to
primary reinforcers and can subsequently serve as an effective
independent reinforcer (Skinner 1974; Williams 1994).
Comprehensive research programs such as Fantino (1977),
Kelleher (1966), and Williams (1994) collectively demon-
strate that several species’ response rates increase if respond-
ing produces conditioned reinforcers. Perhaps the most
widely cited laboratory investigations of the effects of tokens
as conditioned reinforcers are the classic primate studies of
Wolfe (1936) and Cowles (1937). Contingent presentation of
tokens maintained responding across multiple experiments
even when subjects were not allowed to exchange the tokens
until the end of an experimental session. Malagodi (1967a, b,
c) added to this research by demonstrating that rats acquired
new responses through use of token reinforcement alone and
that token-specific response rates were similar to those seen
under primary reinforcement. Given the substantial body
of research demonstrating that findings from the basic ani-
mal research can be generalized to applied use of the same
behavioral mechanisms, it would seem that the principles
that govern the effectiveness of token economies in animal
studies have value when troubleshooting an ineffective token
economy. Thus, when token economies are not as effective
as projected, practitioners need first to investigate a number
of general factors relating to the effectiveness of the token.

Barrier: Insufficient Quality of Backup Reinforcers

One barrier to effective use of a token economy is when
the token has not been established as an effective condi-
tioned reinforcer. This problem may become evident when
the individual ceases to readily exchange tokens for previ-
ously accessed backup reinforcers. If this occurs, decreased
responding will likely ensue and the individual may discard
tokens instead of exchanging them. An initial issue that
practitioners need to investigate is the quality of the backup
stimuli.

Basic Experimental Research

Some manipulable dimensions of reinforcement found to
increase the likelihood of responding within token econo-
mies include reinforcer rate, magnitude, and quality of rein-
forcement (Mace and Roberts 1993). Quality of reinforce-
ment is often described as involving reinforcer potency or
efficacy, and can be quantified in terms of an individual’s
preferences. One way to measure preference is to consider
stimuli that are reliably selected as highly preferred in stimu-
lus preference assessments (Neef et al. 1994). An individu-
al’s preference for the to-be-paired stimuli will undoubtedly
influence a token’s effectiveness as a conditioned reinforcer.
In the classic Wolfe (1936) studies, researchers demonstrated
primates’ proclivity to select tokens that had been paired
with food (as opposed to nothing) and tokens that had been
paired with two pieces of food (rather than one). Additional
animal studies demonstrate that with all other dimensions of
reinforcement held constant (e.g., amount and rate) subjects
will bias responding toward reinforcers of higher quality.
Thus, if manipulating the quality of primary reinforcement
is an effective method of biasing responding, doing so will
likely impact the reinforcing effectiveness of tokens paired
with the primary (or backup) reinforcers.

In Applied Settings

Tokens will not become effective reinforcers if they are
paired with stimuli that are not of sufficient quality or that
have not been established as effective reinforcers themselves.
A large body of applied literature has identified numer-
ous strategies for selecting stimuli most likely to function
as effective backup reinforcers including single stimulus,
free operant, paired stimulus, and multiple stimulus with-
out replacement formats (DeLeon and Iwata 1996; Roane
et al. 1998). Given that preferences may fluctuate over
time, practitioners might consider having an assortment of
backup stimuli and institute periodic assessment of prefer-
ences. Systematically rotating preferred items can main-
tain the reinforcing properties of the backup stimuli. For
example, DeLeon et al. (2000) demonstrated that providing
access to only a single set of toys limited the effectiveness
of their intervention due to satiation effects. Instead, when
providing access to a rotating set of toys as reinforcement
for competing responses, automatically maintained self-
injurious behavior was reduced. Thus, tokens can maintain
their reinforcing properties despite fluctuating preferences if
they can be exchanged for a variety of high quality reinforc-
ers and can become much more flexible as a reinforcer in
treatment programs. Additionally, if tokens can effectively
be exchanged for many different backup reinforcers, the con-
venience and social validity of the program increases by not
requiring practitioners to keep a wide range of reinforcers
constantly and immediately available.

Barrier: Insufficient or Inconsistent Pairing

After ensuring quality backup reinforcers, another factor that
might impede consistent responding is the association of the
token with the backup reinforcer. These associations arise
through the original token-backup pairing and how often
this pairing occurs.

Basic Experimental Research

Foundational basic research by Wolfe (1936) and Cowles
(1937) demonstrated the importance of pairing tokens with
primary reinforcers by teaching primates to respond differ-
entially to tokens with exchange value as opposed to those
without. A token will likely not have a reinforcing influ-
ence over an organism’s behavior if the token is not paired
with the backup stimuli a sufficient number of times or close
enough in time. Williams and Dunn (1991) provided some
evidence for the necessity of token-backup pairing through a
series of experiments examining conditioned reinforcement
in pigeons. Overall, the effectiveness of conditioned rein-
forcers depended on the frequency with which the stimulus
was paired with the primary reinforcer as well as how often
the stimulus was followed by reinforcement. Kelleher and
Gollub (1962) also noted the significance of the number of
pairings between the eventual conditioned stimuli (tokens)
and primary reinforcers.

Research investigating respondent conditioning further
supports that, with repeated pairing, the token should retain
the reinforcing properties of the backup reinforcer even
without the individual engaging in any behaviors outside of
accepting and consuming the reinforcer (Williams 1994).
Shahan (2010) equates this circumstance to the principles
of respondent conditioning that result in stimuli acquiring
the capacity to act as conditioned stimuli when paired with
unconditioned stimuli. In this case, neutral stimuli (tokens)
acquire the capacity to function as reinforcers when paired
with primary reinforcers. Classic basic research demon-
strating application of conditioned reinforcers to shape new
responses (Malagodi 1967a, b, c; Kelleher and Gollub 1962)
provides further supporting evidence for the importance of
the foundational relationship between tokens and backups.

In Applied Settings

If the token does not seem to function as a conditioned rein-
forcer, practitioners will most likely need to pair the two
stimuli more frequently, more consistently, or temporally
closer. Before ever requiring the individual to engage in a
behavior to gain access to the token, practitioners would
benefit from repeatedly and contiguously pairing the token
with a backup reinforcer. For instance, a practitioner could
fill up 9 spaces on a 10-space token board and then noncon-
tingently deliver the 10th token while immediately allowing
access to a preferred item. This process would be repeated
until the individual accepts both the token and backup rein-
forcer a majority of the time. In an applied study, Moher,
Gould, Hegg, and Mahoney (2008) successfully established
tokens as conditioned reinforcers by pairing tokens with
backup reinforcers in two stages. The first stage involved
the experimenter delivering a backup reinforcer within 0.5 s
of delivering a token noncontingently. In the second stage,
the participant was encouraged to physically exchange the
token for the backup reinforcer. When evaluated in a pref-
erence assessment, tokens contingently paired with highly
preferred edible items became preferred stimuli themselves.

Barrier: Overcoming Problematic Effects
of Motivating Operations

Even after ensuring a strong pairing between tokens and
backup reinforcers, inconsistent responding may still occur.
One potential cause is the variable effectiveness of backup
stimuli from moment-to-moment. In an effort to prevent
inconsistent effectiveness of backup reinforcers and thus
responding, practitioners can investigate the effects of moti-
vating operations. By definition, motivating operations can
alter the reinforcing effectiveness of tokens either by increas-
ing (establishing) or decreasing (abolishing) the effective-
ness of a given consequence (Laraway et al. 2003; Vollmer
and Iwata 1991). It might be the case that motivating opera-
tions are affecting responding in unforeseen ways.

Basic Experimental Research

Wolfe (1936) first demonstrated this fact by exposing pri-
mates to various states of food deprivation. Specifically,
subjects were given choices between tokens (some exchange-
able for food and some exchangeable for water) while under
alternating deprivation conditions. All subjects preferred the
tokens corresponding to the current deprivation conditions;
however, the preference was not exclusive due to relatively
modest deprivation states. Therefore, two additional sub-
jects were given the same choices between tokens under
longer deprivation conditions and the researchers allowed
access to the alternate reinforcer prior to each session. The
subjects under the more stringent conditions preferred the
deprivation-specific reinforcer to a higher degree. Thus, rate
of token-exchange was consistent with the state of depri-
vation specific to each backup reinforcer and lessening the
motivation for one reinforcer strengthened the motivation
for the other.

Another motivating operation factor studied within basic
research and applicable to practical application involves the
degree to which the reinforcer is available outside of the
experimental session. Hursh (1984) described closed economies
as those in which reinforcers are only available through
an organism’s interaction with the experimental environ-
ment, and open economies as those in which consumption of
the reinforcer is not completely dependent on within-session
performance. For example, two classic studies (Felton and
Lyon 1966; Catania and Reynolds 1968) performed experi-
ments in which pigeons were given supplemental (noncon-
tingent) feedings outside of the experimental session (open
economy). Relative rates of responding were markedly less
under these conditions than in conditions where subjects
could only access reinforcers through responding accord-
ing to in-session schedules of reinforcement (Collier et al.
1972).

In Applied Settings

Motivating operations can be seen as an advantage to prac-
titioners who can regulate the amount of access to a single
backup reinforcer. This can be achieved by ensuring the indi-
vidual does not have access to the preferred backup item
outside of the token economy. For instance, Roane, Call,
and Falcomata (2005) demonstrated more responding during
closed economies in which participants were only able to
obtain reinforcement through interaction with progressive-
ratio schedules of reinforcement during session. This was
in contrast to open-economy sessions during which partici-
pants demonstrated decreased responding while obtaining
both within-session reinforcers and supplemental access to
reinforcers outside of session.

Depending on the nature of the primary backup rein-
forcer, applied research has shown that the effectiveness
of tokens decreases during periods in which participants
are satiated on backup reinforcers; however, rotation and
choice across multiple backup reinforcers may guard against
these effects (Moher et al. 2008; Sran and Borrero 2010).
Additionally, inconsistent responding due to the effects of
motivating operations can also be neutralized by creation of
generalized conditioned reinforcers. A token becomes a gen-
eralized conditioned reinforcer when it can be exchanged for
a variety of backup reinforcers and is less sensitive to moti-
vating operations (Ferster and Culbertson 1982). Increasing
the number of backup reinforcers with which the token is
paired should also result in the maintenance of responding
even when individuals are satiated on the most preferred
backup reinforcer (Moher et al. 2008). Thus, efforts can be
made to decrease the potential negative effects of abolish-
ing operations by having a menu of options from which an
individual can select when exchanging tokens for backup
reinforcers.

Barrier: Difficulty Shaping the Exchange Response

Given that most practitioners themselves have had a long
history of operating within a token economy (e.g., receiv-
ing and cashing paychecks), there may be some inclination
to assume an individual can exchange tokens for backup
reinforcers spontaneously. For ease and efficiency of token-
backup exchange, there may be some benefit in removing
the exchange response altogether. That is, by exchang-
ing the tokens for someone who is struggling to learn the
exchange response, the reinforcing properties of the token
might stay intact with practitioner-mediated token-backup
pairing. However, explicit teaching of token exchange may
be necessary and beneficial if the practitioner intends on the
individual eventually determining components such as the
magnitude and rate of reinforcement.

Basic Experimental Research

Laboratory studies often rely on multiple stages to shape the
exchange response. After magazine training, in which ani-
mals are taught to approach the food receptacle and consume
primary reinforcers, a shaping procedure is used to rein-
force approximations of lever pressing. Experimenters then
focus on teaching the animal the token deposit response.
For instance, Malagodi (1967a) distributed 80 marbles on
the floor of an operant chamber and reinforced successive
approximations of rats depositing the marbles into a recep-
tacle. Stimulus control over the response was established by
reinforcing deposit responses on a continuous reinforcement
schedule in the presence of a discriminative stimulus (recep-
tacle light and clicker). In the foundational Wolfe (1936) and
Cowles (1937) studies, exchange opportunities were freely
available for primates and depositing of the token was ini-
tially modeled by the experimenter. Each token deposited by
the subject was reinforced immediately with food.

In Applied Settings

Even when tokens have acquired the properties of effec-
tive conditioned reinforcers, not all individuals will imme-
diately have mastery over the response chain necessary to
physically exchange the token and consume the backup
reinforcer. Numerous empirically validated methods for
teaching response chains are available, including graduated
guidance, errorless learning, constant-time delay, and video
modeling. Initially, the act of exchanging the token should
be the primary task in which the individual must engage in
order to gain access to the reinforcer. It could also be the
case that the response effort of the act of exchanging is too
great (e.g., walking to a different area, locating the practi-
tioner, making choices between backup reinforcers, engag-
ing in a communicative response, etc.); thus, individuals
sometimes save tokens to increase the amount of reinforc-
ment they receive per exchange (Yankelevitz et al. 2008).
For instance, an establishing operation for saving tokens
might be in effect during low-effort tasks if the exchange
response is too demanding; at least until a sufficient number
of tokens have been accumulated to overcome the effort of
the exchange response. Thus, practitioners must consider the
overall effort of the exchange itself, as it may influence the
effectiveness of the token program.

Acknowledging and Investigating
First‑ and Second‑Order Schedules
of Reinforcement

Reinforcement schedules do not operate in isolation; instead,
one schedule (a first-order schedule) can be a unit of behav-
ior upon which another schedule operates (higher- or sec-
ond-order schedules). In other words, completion of the first-
order schedule (e.g., fixed-ratio [FR]-5) is a behavioral unit
that is reinforced according to a second schedule (e.g., varia-
ble-interval [VI]-25). An oversimplified view of responding
within token economies would include practitioners viewing
responding as vulnerable to only the “local” contingencies
available through first-order reinforcement schedules. If this
were the case, behavior patterns under token economies
would only mimic those seen under programs using pri-
mary reinforcement, which most often is not the case. Fixed-
ratio schedules, for instance, produce post-reinforcement
pauses—also referred to as “pre-run” pauses—in which
responding briefly ceases following reinforcement deliv-
ery. This momentary lag in responding is often followed
by an increase in response rate until the organism meets
the requirement for reinforcement. Conversely, a variable-
ratio (VR) schedule produces relatively higher and steadier
rates of responding (Ferster and Skinner 1957). Patterns
of behavior within an extended token economy, however,
should instead be considered as unitary responses influenced
and reinforced according to two other higher-order sched-
ules. Kelleher (1958, 1966) described token economies as
involving three interconnected schedules of reinforcement
and behavior that is responsive to a token economy will
be jointly determined by both the first- and second-order
reinforcement schedules. The three schedules include: (1)
the token-production schedule: the first-order schedule of
reinforcement under which the behavior targeted for change
will result in tokens (e.g., FR5: the individual must emit 5
responses to receive one token); (2) the exchange-production
schedule: the schedule that determines when the opportunity
to exchange tokens for backup reinforcers is available (e.g.,
fixed-time [FT]-5: the individual can exchange tokens for
backups every 5 min); and (3) the token-exchange schedule:
the rate of exchange or “cost” for backup reinforcers (e.g.,
FR5: the individual must exchange 5 tokens for a certain
backup reinforcer).
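
As a rough illustration of how the three schedules interact, the sketch below steps through a session minute by minute using the example values from the text (FR5 token production, FT-5 exchange production, FR5 token exchange). It rests on a strong simplifying assumption: the individual responds at a constant rate, so it does not model post-reinforcement pauses or any other schedule-typical response pattern. The class name, function name, and parameters are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class TokenEconomy:
    token_production_fr: int    # responses required per token (e.g., FR5)
    exchange_interval_min: int  # minutes between exchange opportunities (e.g., FT-5)
    token_exchange_fr: int      # tokens required per backup reinforcer (e.g., FR5)


def simulate(economy, responses_per_min, session_min):
    """Return (tokens_earned, backups_obtained, tokens_left).

    Assumption: a steady responder emitting `responses_per_min`
    target responses in every minute of the session.
    """
    pending_responses = 0
    tokens_held = 0
    tokens_earned = 0
    backups = 0
    for minute in range(1, session_min + 1):
        pending_responses += responses_per_min
        # Token-production schedule: every Nth response yields one token.
        new_tokens, pending_responses = divmod(
            pending_responses, economy.token_production_fr
        )
        tokens_held += new_tokens
        tokens_earned += new_tokens
        # Exchange-production schedule: exchange is possible only at fixed times.
        if minute % economy.exchange_interval_min == 0:
            # Token-exchange schedule: each backup reinforcer costs N tokens.
            bought, tokens_held = divmod(tokens_held, economy.token_exchange_fr)
            backups += bought
    return tokens_earned, backups, tokens_held


economy = TokenEconomy(token_production_fr=5, exchange_interval_min=5,
                       token_exchange_fr=5)
print(simulate(economy, responses_per_min=4, session_min=20))
```

With these hypothetical values, the individual holds only four tokens at the first exchange opportunity and obtains no backup reinforcer until the second. Even in this simplified model, then, the delay to backup reinforcement depends on all three schedules together rather than on the token-production schedule alone.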

Practitioners might view ongoing patterns of behavior
as being reinforced by the local contingencies available
through the immediate token-production schedule; however,
responding is not under the sole control of any of these three
schedules at any given time. Given research supporting the
separate and combined effects of these schedules (Hacken-
berg 2009), practitioners are advised to take notice of the
three specific schedules of reinforcement and need to inves-
tigate both the “local” contingencies of the token-production
schedule as well as the often superseding contingencies of
the second-order schedules.

Barrier: Appropriately Adjusting Local
Contingencies of the Token‑Production Schedule

Beginning with the token-production schedule, practitioners
may struggle in deciding whether to provide tokens after a
fixed or variable number of responses (FR or VR); after a
fixed or variable duration of engaging in the targeted behav-
ior (FRD or VRD); or after the first response following a
fixed or variable amount of time (FI or VI). Most token
economies implemented in applied settings run on fixed
schedules. There are obvious logistical benefits of using
fixed token-production schedules (i.e., ease of
implementation, predictability), and FR schedules can often
result in high, steady responding and are especially
beneficial when teaching new behaviors. One important
property of fixed schedules, however, is that they can
introduce discriminable periods during which reinforcement
does not occur, and responding can appear erratic, as in the
scalloped responding seen under an FI schedule, in which
responding is slow at the beginning of an interval and
increases just before token delivery. Furthermore, FI
schedules can become problematic when practitioners attempt
to reinforce longer durations of continuous appropriate
behavior, such as increasing time on task.

Basic Experimental Research

As stated previously, the token-production schedule exerts
some control over response patterns in token economies
in a manner resembling those obtained under schedules of
primary reinforcement (Kelleher 1956; Malagodi 1967b).
For example, Kelleher (1958) taught primates to press
a lever that produced poker chips according to an FR30
schedule, under which every 30 responses produced one poker
chip. Exchange periods were scheduled according to an FR50
exchange-production schedule. Consistent with basic
experimental research on simple FR schedules of primary
reinforcement, responding occurred at a high, steady
rate, with short pauses prior to each ratio run. The
token-production schedule was then increased to FR125 while
the exchange-production schedule was kept constant. Again
emulating effects seen with FR schedules of primary
reinforcement, overall response rates decreased and
post-reinforcement pausing increased. Basic research also
shows that, after switching to a variable token-production
schedule, one can expect high and steady responding under VR
token-production schedules and slow and steady responding
under VI schedules (Ferster and Skinner 1957).

In Applied Settings

During acquisition phases, practitioners have the option to
begin with an FR1 token production schedule so that every
occurrence of the new behavior produces reinforcement. As
the individual gains experience with the token economy,
practitioners can systematically adjust the production sched-
ule to include intermittent schedules using procedures such
as thinning the schedule to an FR5. Alternatively, moving to
a VR token-production schedule will result in maintenance
of the replacement behavior over longer periods of time and
may guard against post-reinforcement pauses and satiation
of backup reinforcers. Basic research suggests that under FI
schedules practitioners will observe long periods of inactiv-
ity with slight increases in responding towards the interval’s
end (scalloped responding; Kelleher 1956). To modify this
pattern, practitioners can change to either a VI token-produc-
tion schedule or a response duration schedule. Under a VI
schedule, responding should be steadier and more moderate
due to the unpredictability of the interval’s end (Malagodi
1967c). With an FRD or a VRD schedule, practitioners have
the option to only reinforce behaviors that are of a specific
duration. An example of effectively using an FRD token-
production schedule would involve an individual receiving
a token after 3 min of continuous appropriate conversation,
and not receiving a token if the duration of the conversation
was less than 3 min.
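
The FRD rule in the conversation example can be written as a simple check. This is a minimal sketch under stated assumptions: the function name frd_tokens is hypothetical, and the rule that each continuous bout earns at most one token is an assumption, since the example above does not say how longer bouts would be treated.

```python
def frd_tokens(bout_durations_min: list[float], required_min: float = 3.0) -> int:
    """Count tokens earned under a fixed response-duration (FRD) schedule.

    Assumption: each continuous bout earns at most one token, and only
    if it lasts at least `required_min` minutes.
    """
    if required_min <= 0:
        raise ValueError("required_min must be positive")
    return sum(1 for duration in bout_durations_min if duration >= required_min)


# Three conversations lasting 1.5, 3.0, and 4.2 min: only the last two qualify.
print(frd_tokens([1.5, 3.0, 4.2]))
```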

Barrier: Unforeseen Effects
of the Exchange‑Production Schedule

Whereas the token-production schedule refers to the num-
ber of target responses that must be emitted by individu-
als to receive a token, the exchange-production schedule
refers to how often an individual is given the opportunity
to exchange tokens for the backup reinforcers. Practitioners
may tend to focus too narrowly on the more local effects of
the token-production schedule and, as a result, encounter
decreased responding even with a dense token-production
schedule. One consideration that needs to be emphasized
in this instance is the influence of the exchange-production
schedule.

Basic Experimental Research

Basic research has suggested that the exchange-production
schedule may in fact have greater control over responding
patterns than the local contingencies operating within the
token-production schedule (Webbe and Malagodi 1978).
Foster et al. (2001) highlighted the relatively greater
influence of the exchange-production schedule over the
token-production schedule by comparing one condition that
combined a VR token-production schedule with an FR
exchange-production schedule against another that combined
an FR token-production schedule with a VR
exchange-production schedule (VR-token/FR-exchange vs.
FR-token/VR-exchange).
Because pause durations were longer in the VR token-pro-
duction schedule (a schedule known to produce relatively
pause-free, constant rates of responding) relative to the FR
token-production schedule (a schedule known to produce
break-run patterns), the researchers concluded that overall
rates of behavior were primarily organized by the exchange-
production schedule requirements. Bullock and Hackenberg
(2006) extended these results by showing more pronounced
effects of the exchange-production schedule when the
token-production ratios were higher (i.e., when more
responses per token were required). In trials that included
lower token-production ratios (e.g., FR2), response rates
varied much less with the exchange-production schedule. That
is, responding under schedules that allowed more frequent
access to tokens resembled responding under primary
reinforcement schedules, whereas when more responses were
required per token, the frequency of exchange periods had
more influence over responding.

In Applied Settings

A practical example of the token-production schedule acting
as a unitary response, and thereby producing reinforcement
according to the exchange-production schedule, involves
a student earning tokens for completing math problems.
For every two problems completed, he will receive a token
(FR2-token-production schedule). Given a relatively
frequent exchange period (FR8-exchange-production
schedule), one could expect responding to adhere to the
patterning seen under simple schedules of primary
reinforcement:
the student would complete problems rapidly and accumu-
late more tokens within a given time period. The frequency
of exchange periods becomes more influential once teach-
ers thin the token-production schedule. For instance, if the
teacher now requires the student to complete 15 problems
for one token (FR15-token-production schedule), these clus-
ters of behaviors are now more vulnerable to the effects of
the higher-order exchange-production schedule. The teacher
could now allow the student to exchange all the tokens for a
backup reinforcer either (a) after completing a fixed number
of problems (e.g., FR45-exchange-production schedule), or
(b) after an average number of problems (e.g., VR45). Under
the FR45 exchange-production schedule, the student is likely
to engage in post-reinforcement pausing with a quick transi-
tion to rapid responding until reinforcement is received (i.e.,
break-run patterning). Under the VR45 exchange-production
schedule, however, the student could be expected to com-
plete problems more quickly while pausing for shorter peri-
ods. Thus, even though the token-production schedule is the
same in either scenario (FR15), it is the exchange-production
schedule that disproportionately controls the overall rate of
responding.
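
The contrast between the FR45 and VR45 exchange-production schedules can be illustrated by listing the number of problems required before each exchange period. The sketch below is a hypothetical illustration: the helper names, the particular VR values (15 to 75 in steps of 15), and the seed are assumptions, chosen only so that the requirements average 45 and fall on multiples of the FR15 token-production ratio. It follows the reading in the example above that the exchange requirement is counted in problems.

```python
import random


def fr_requirements(ratio: int, periods: int) -> list[int]:
    """Fixed ratio: every exchange period requires the same number of problems."""
    return [ratio] * periods


def vr_requirements(values: list[int], seed: int = 0) -> list[int]:
    """Variable ratio: the same set of values presented in an unpredictable order."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


fr45 = fr_requirements(45, periods=5)
vr45 = vr_requirements([15, 30, 45, 60, 75])

# Both schedules demand the same average number of problems per exchange period.
print(sum(fr45) / len(fr45), sum(vr45) / len(vr45))
# Only the FR schedule lets the student predict when the next exchange will occur.
print(fr45, vr45)
```

The two schedules are matched on average cost; what differs is predictability, which is the property the passage above links to break-run patterning under FR and shorter pauses under VR.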

Barrier: Overcoming Ratio Strain

Within token economies, ratio strain occurs with abrupt
increases in ratio requirements, resulting in decreases in
behavior similar to those seen during extinction (Ferster
and Skinner 1957). Ratio strain can unexpectedly occur
through the interaction of the token- and
exchange-production schedules. Sifting through why ratio
strain is occurring
and which schedule is influencing responding the most can
be a complex task. Long pauses in ratio performance could
occur when response requirements within the token-produc-
tion schedule are too high, or when too much time elapses
between exchange opportunities.

Basic Experimental Research

Specific to token-production, organisms whose schedules
of reinforcement are “thinned” are required to engage
in an increased amount of responding before reinforcement.
Decreased responding is often a function of increasing
ratio requirements too quickly; however, ratio strain
can also occur through the interaction of the token- and
exchange-production schedules, whereby organisms
might earn tokens at an appropriate rate yet show
decreased responding if the exchange-production
requirement is too stringent (i.e., exchange opportunities
are not frequent enough). For instance, Bullock and
Hackenberg (2006) demonstrated ratio strain in pigeons by
showing an inverse relation between responding and the
token-production ratio. High response requirements decreased
responding due mainly to long pauses and low response
rates in early segments. The researchers also demonstrated,
however, an inverse relation between response rate and
exchange-production ratios when token-production ratios
were kept constant.

In Applied Settings

Ratio strain can occur within applied environments when
practitioners delay opportunities to exchange tokens until
the end of the day (FT- or VT-exchange-production sched-
ules) or do not restore the exchange-production schedules
to the ratio that previously maintained adequate rates of
responding. For example, if an individual receives tokens
on an FR5-token-production schedule, and can exchange
tokens after accumulating an average of 5 tokens (VR5-
exchange-production schedule), all other things accounted
for, the practitioner can expect relatively rapid responding
with short post-reinforcement pauses. If, however, the practi-
tioner abruptly requires the individual to either (a) engage in
50 responses for 1 token (FR50-token-production schedule),
(b) exchange tokens only after accumulating an average of
100 tokens (VR100-exchange-production schedule), or (c)
both, one would expect decreased responding due to ratio
strain, and potentially complete extinction of responding
before reinforcement at the new schedule can occur. Ratio
strain can be avoided by increasing response requirements
gradually, temporarily reducing ratio requirements, or by
increasing backup magnitude or quality (Roane et al. 2007).
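
One way to picture a gradual increase in response requirements is as a list of intermediate ratios between the current and the target schedule. The sketch below is hypothetical: the function name thinning_steps and the 50% cap on each increase are assumptions made for illustration and are not an empirically derived guideline from the literature reviewed here.

```python
import math


def thinning_steps(start: int, target: int, max_increase: float = 0.5) -> list[int]:
    """List ratio requirements from `start` to `target`.

    Each step raises the requirement by at most `max_increase` (a proportion
    of the current ratio), except that every step advances by at least one
    response so the sequence always reaches the target.
    """
    if start < 1 or target < start or max_increase <= 0:
        raise ValueError("need 1 <= start <= target and max_increase > 0")
    steps = [start]
    while steps[-1] < target:
        capped = math.floor(steps[-1] * (1 + max_increase))
        next_ratio = min(target, max(capped, steps[-1] + 1))
        steps.append(next_ratio)
    return steps


# Moving from FR5 to FR50 in capped steps instead of one abrupt jump.
print(thinning_steps(5, 50))
```

In practice the practitioner would advance to the next step only while responding is maintained, and would drop back to an earlier ratio if responding deteriorates.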

Barrier: Adjusting Prices of Backup Reinforcers

The token-exchange schedule refers to how many tokens
are required for a specific backup reinforcer, or the token-
specific “price” of the backup reinforcer. It is not always the
case that once an exchange opportunity is earned, the indi-
vidual can exchange only one token for one unit of a backup
reinforcer. What is more likely to occur in applied settings
is the option to “purchase” a variety of backup reinforcers
that vary in price. Token-exchange schedules often may be
chosen arbitrarily or might even be based on the actual retail
price of the item. However, to ensure predictable respond-
ing, practitioners should consider a number of different fac-
tors when determining this schedule.

Basic Experimental Research

Basic research does not provide extensive information about
the specific influence of token-exchange schedules other
than that token-exchange influence is similar to
exchange-production influence. Malagodi et al. (1975)
demonstrated decreased lever pressing in rats given
increased token-exchange schedules: the more demanding the
token-exchange schedule, the longer the post-reinforcement
pausing. Basic research on “unit price,” or the ratio of
responses to every unit of reinforcer, includes a more
sophisticated description of price that details the
characteristics of cost-benefit tradeoffs (e.g., Delmendo
et al. 2009; Foster and Hackenberg 2004). That is, price
does not refer just to the token-exchange schedule of the
backup reinforcer; rather, it reflects the interaction of
the token-exchange schedule with the costs of producing
tokens (the token-production schedule) and the costs of
producing exchange opportunities (the exchange-production
schedule; Bullock and Hackenberg 2006). Basic research on
unit price demonstrates that decreases in response rates are
associated with increases in unit price (more stringent
token-production, exchange-production, and token-exchange
requirements).
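
The response-count portion of unit price can be worked through with simple arithmetic. The sketch below is a simplification under stated assumptions: the function name unit_price and its parameters are hypothetical, and it deliberately ignores the cost of producing exchange opportunities (the exchange-production schedule), which the account above treats as part of the effective price.

```python
def unit_price(
    token_production_fr: int,
    token_exchange_fr: int,
    reinforcer_units: float = 1.0,
) -> float:
    """Responses required per unit of backup reinforcer.

    responses per token x tokens per purchase / units obtained per purchase
    """
    if reinforcer_units <= 0:
        raise ValueError("reinforcer_units must be positive")
    return token_production_fr * token_exchange_fr / reinforcer_units


# FR5 token production with a price of 5 tokens: 25 responses per backup reinforcer.
print(unit_price(5, 5))
# Doubling the amount of reinforcer delivered per purchase halves the unit price.
print(unit_price(5, 5, reinforcer_units=2))
```

The same unit price can therefore be reached by different combinations of schedule values, which is why the labeled token price alone can be misleading.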

In Applied Settings

Practitioners can expect decreased overall responding if
the labeled price of the backup reinforcer is relatively too
high (i.e., too demanding of a token-exchange schedule).
In applied settings, however, the actual price of the backup
reinforcer is better determined by the combined effects of
how often responses result in tokens, how often the
individual can exchange tokens, and the number of tokens
required to exchange for specific reinforcers (Bullock and
Hackenberg 2006; Malagodi et al. 1975). In this sense,
Hackenberg (2009) likened the token-production schedule
to a worker’s wage, the exchange-production schedule to
the effort required to purchase the item (e.g., driving to the
store, getting cash out of the bank), and the token-exchange
schedule to the number listed on the price tag. The basic
premise of unit price can be applied to practical contexts
by using reinforcer assessments to identify a hierarchy of
backup reinforcers. Delmendo et al. (2009) suggested that
this information could be used to differentially program con-
tingencies for task completion based on the subjective effort
associated with the task. Within token economies, those
tasks that require more effort can be associated with more
preferred reinforcers. Conversely, reinforcers that are less
preferred may be more suitable for maintaining less effort-
ful responses. An example of this situation could involve an
individual being able to use tokens to purchase high quality
rewards that are not available at other times after a period
of effortful tasks (e.g., difficult homework). Less preferred
reinforcers, therefore, would be available for the periods that
involve less effortful tasks (e.g., sitting at the table while
eating).

Barrier: Response Cost and Reducing Inappropriate
Behavior

Common barriers not only to token economies, but also
to applied practice as a whole, may include the individual
engaging in problem behavior. Practitioners most likely are
well versed in the use of differential reinforcement proce-
dures to reinforce an appropriate behavior in place of an
inappropriate one, but they may struggle to enact appropri-
ate measures to respond to inappropriate behavior using the
token economy. Modifying the token-production schedule
to include response cost procedures could be an effective
addition to target problem behavior if reinforcement alone
is ineffective in reducing problem behavior.

Basic Experimental Research

Response cost is conceptualized as a punishment procedure
in which reinforcers are removed contingent upon some
response. For example, Pietras and Hackenberg (2005) used
LED lights as tokens to reinforce pecking in pigeons. Use
of these lights allowed researchers to easily remove tokens
by turning the light off. Key pecking was maintained on
two separate schedules and when FR schedules of response-
cost were introduced in one of the schedules, response rates
under only that schedule decreased. It is interesting to note
that in this experiment, response rates for the response-cost
schedule were not completely suppressed; rather, only dur-
ing extinction did response rates decrease to near-zero levels.
In a basic experiment with humans (Weiner 1962), responses
produced brief stimuli (lights) signaling availability of rein-
forcers/points according to either VI or FI schedules. By
subtracting one point from a counter during response cost
conditions, response rates were suppressed and did not
recover with continued exposure to the response-cost con-
tingency. Thus, basic experimental research has consistently
demonstrated decreased response rates of target behaviors
using contingent removal of conditioned reinforcers.

In Applied Settings

A response cost procedure can be effective within any type
of token-production schedule as long as the tokens are act-
ing as conditioned reinforcers. Much of the applied token
economy research implementing response cost involves the
individual being given a number of tokens at the beginning
of an interval and losing tokens for each inappropriate
response (e.g., Conyers et al. 2004; McGoey and
DuPaul 2000). If the individual has enough tokens at the
end of the set interval, he or she can exchange them for a
backup reinforcer. Conyers et al. (2004) found that both
a response-cost and differential reinforcement of other
behavior (DRO) procedure were effective in reducing
problem behavior when implemented in isolation; how-
ever, they recommended implementing them together to
increase treatment acceptability. Thus, a response cost pro-
cedure will be more valuable if implemented in conjunc-
tion with a token-production differential reinforcement
schedule.

A practical limitation of using response cost can involve
an instance in which an individual loses all tokens before
an exchange period and has a long wait before an oppor-
tunity to earn them back. This instance might produce
a segment of time in which contingencies for appropri-
ate behavior are vague, perhaps creating an establishing
operation for problem behavior. Some other practical lim-
itations are encompassed by the potential negative side
effects of punishment procedures in general. Specifically,
punishment procedures such as response cost can produce
negative side effects that include collateral increases in
punishment-elicited aggression, escape behaviors, and
emotional reactions (Lerman and Vorndran 2002). Lastly,
using response cost in isolation might be disadvantageous
considering exchange opportunities depend on respond-
ing (FR or VR exchange-production schedules). That is,
if response-cost conditions result in low response rates,
infrequent pairings of tokens with backup reinforcers may
reduce the reinforcing value of the tokens.
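
The response cost arrangement described in this section, including the limitation of losing every token before the exchange period, can be sketched as a simple tally. This is an illustrative sketch only: the function name, its parameters, and the one-token default cost are assumptions and do not reproduce the procedure of any cited study.

```python
def response_cost_interval(
    starting_tokens: int,
    inappropriate_responses: int,
    tokens_required: int,
    cost_per_response: int = 1,
) -> tuple[int, bool, bool]:
    """Return (tokens_left, can_exchange, lost_all_tokens) for one interval."""
    tokens = starting_tokens
    lost_all_tokens = False
    for _ in range(inappropriate_responses):
        # Each inappropriate response removes tokens; the balance cannot go below zero.
        tokens = max(0, tokens - cost_per_response)
        if tokens == 0:
            lost_all_tokens = True
    return tokens, tokens >= tokens_required, lost_all_tokens


# Two inappropriate responses: enough tokens remain to exchange.
print(response_cost_interval(starting_tokens=5, inappropriate_responses=2, tokens_required=3))
# Six inappropriate responses: every token is lost before the exchange period.
print(response_cost_interval(starting_tokens=5, inappropriate_responses=6, tokens_required=3))
```

The third value flags the situation discussed above, in which the individual has nothing left to lose for the remainder of the interval and the contingencies for appropriate behavior become unclear.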

Conclusion

As robust as the literature surrounding token economies is,
practitioners may not make regular contact with the basic
research that has defined many of the practices in use today.
However, when faced with practical problems or barriers to
success, it is likely beneficial for practitioners to revisit this
literature and examine these underlying principles. This is
especially true if the practitioner responsible for training
others in the implementation of the token economy has a
less sophisticated understanding of how and why certain
procedures function as they do. It is true that typical and
even substandard implementation of token economies can
have positive effects on behavior; however, many practition-
ers are called upon to consult for complex cases. Complex
cases likely require a deeper understanding of the underlying
mechanisms actuating the seemingly everyday practices that
we use. Thus, our objective was to make this research more
accessible and to translate it for applied settings,
with the hope of assisting practitioners in implementing a
more fundamentally sound and thus more effective treatment.

Compliance with Ethical Standards

Conflict of interest All authors declare no conflicts of interest. This
article does not contain any studies with human participants or animals
performed by any of the authors.

References

Ayllon, T., & Azrin, N. H. (1968). The token economy: A moti-
vational system for therapy and rehabilitation. New York:
Appleton-Century-Crofts.

Bailey, J. R., Gross, A. M., & Cotton, C. R. (2011). Challenges asso-
ciated with establishing a token economy in a residential care
facility. Clinical Case Studies, 10(4), 278–290.

Bullock, C. E., & Hackenberg, T. D. (2006). Second-order schedules
of token reinforcement with pigeons: Implications for unit price.
Journal of the Experimental Analysis of Behavior, 85, 95–106.

Catania, A. C., & Reynolds, G. S. (1968). Quantitative analysis of the
responding maintained by interval schedules of reinforcement.
Journal of the Experimental Analysis of Behavior, 11, 327–383.

Collier, G. H., Hirsch, E., & Hamlin, P. H. (1972). The ecological
determinants of reinforcement in the rat. Physiology and Behav-
ior, 9, 705–716.

Conyers, C., Miltenberger, R., Maki, A., Barenz, R., Jurgens, M.,
Sailer, A., & Kopp, B. (2004). A comparison of response cost and
differential reinforcement of other behavior to reduce disruptive
behavior in a preschool classroom. Journal of Applied Behavior
Analysis, 37, 411–415.

Cowles, J. T. (1937). Food-tokens as incentives for learning by
chimpanzees. Comparative Psychology Monographs, 12, 1–96.

DeLeon, I. G., Anders, B. M., Rodriguez-Catter, V., & Neidert, P. L.
(2000). The effects of noncontingent access to single- versus mul-
tiple-stimulus sets on self-injurious behavior. Journal of Applied
Behavior Analysis, 33(4), 623–626.

DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of multiple-stimulus
presentation format for assessing reinforcer preference. Journal
of Applied Behavior Analysis, 29, 519–532.

Delmendo, X., Borrero, J. C., Beauchamp, K. L., & Francisco, M. T.
(2009). Consumption and response output as a function of unit
price: Manipulation of cost and benefit components. Journal of
Applied Behavior Analysis, 42, 609–625.

Drabman, R. S., & Tucker, R. D. (1974). Why token economies fail.
Journal of School Psychology, 12(3), 178–188.

Fantino, E. (1977). Conditioned reinforcement: Choice and informa-
tion. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of
operant behavior (pp. 313–339). Englewood Cliffs: Prentice-Hall.

Felton, M., & Lyon, D. (1966). The post-reinforcement pause. Journal
of the Experimental Analysis of Behavior, 9, 131–134.

Ferster, C. B., & Culbertson, S. A. (1982). Behavior Principles
(3rd edn.). Englewood Cliffs: Prentice Hall.

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement.
New York: Appleton-Century-Crofts.

Foster, T. A., & Hackenberg, T. D. (2004). Unit price and choice in a
token-reinforcement context. Journal of the Experimental Analy-
sis of Behavior, 81(1), 5–25.

Foster, T. A., Hackenberg, T. D., & Vaidya, M. (2001). Second-order
schedules of token reinforcement with pigeons: Effects of fixed-
and variable-ratio exchange schedules. Journal of the Experimental
Analysis of Behavior, 76, 159–178.

Hackenberg, T. D. (2009). Token reinforcement: A review and analysis.
Journal of the Experimental Analysis of Behavior, 91, 257–286.

Hursh, S. R. (1984). Behavioral economics. Journal of the Experimen-
tal Analysis of Behavior, 42, 435–452.

Kazdin, A. E. (1977). The token economy: A review and evaluation.
New York: Plenum.

Kazdin, A. E. (1982). The token economy: A decade later. Journal of
Applied Behavior Analysis, 15, 431–445.

Kelleher, R. T. (1956). Intermittent conditioned reinforcement in chim-
panzees. Science, 124, 679–680.

Kelleher, R. T. (1958). Fixed-ratio schedules of conditioned reinforce-
ment with chimpanzees. Journal of the Experimental Analysis of
Behavior, 1, 281–289.

Kelleher, R. T. (1966). Conditioned reinforcement in second-order
schedules. Journal of the Experimental Analysis of Behavior, 9,
475–485.

Kelleher, R. T., & Gollub, L. R. (1962). A review of positive
conditioned reinforcement. Journal of the Experimental Analysis of
Behavior, 5(4), 543–597.

Laraway, S., Snycerski, S., Michael, J., & Poling, A. (2003). Motivating
operations and terms to describe them: Some further refinements.
Journal of Applied Behavior Analysis, 36(3), 407–414.

Lerman, D. C., & Vorndran, C. M. (2002). On the status of knowledge
for using punishment: Implications for treating behavior disorders.
Journal of Applied Behavior Analysis, 35(4), 431–464.

Mace, F. C., & Critchfield, T. S. (2010). Translational research in
behavior analysis: Historical traditions and imperative for the
future. Journal of the Experimental Analysis of Behavior, 93(3),
293–312.

Mace, F. C., & Roberts, M. L. (1993). Factors affecting selection of
behavioral interventions. In J. Reichle & D. P. Wacker (Eds.),
Communicative alternatives to challenging behavior (pp. 113–
134). Baltimore: Brookes.

Malagodi, E. F. (1967a). Acquisition of the token-reward habit in the
rat. Psychological Reports, 20, 1335–1342.

Malagodi, E. F. (1967b). Fixed-ratio schedules of token reinforcement.
Psychonomic Science, 8, 469–470.

Malagodi, E. F. (1967c). Variable-interval schedules of token reinforce-
ment. Psychonomic Science, 8, 471–472.

Malagodi, E. F., Webbe, F. M., & Waddell, T. R. (1975). Second-order
schedules of token reinforcement: Effects of varying the sched-
ule of food presentation. Journal of the Experimental Analysis of
Behavior, 24, 173–181.

Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children
with intellectual disability and/or autism: A review. Research in
Developmental Disabilities, 30(2), 240–248.

McGoey, K. E., & DuPaul, G. J. (2000). Token reinforcement and
response cost procedures: Reducing the disruptive behavior of
preschool children with attention-deficit/hyperactivity disorder.
School Psychology Quarterly, 15(3), 330–343.

Miltenberger, R. (2008). Behavior modification. Belmont: Wadsworth
Publishing.

Moher, C. A., Gould, D. D., Hegg, E., & Mahoney, A. M. (2008).
Non-generalized and generalized conditioned reinforcers: Estab-
lishment and validation. Behavioral Interventions, 23(1), 13–38.

Neef, N. A., Shade, D., & Miller, M. S. (1994). Assessing influen-
tial dimensions of reinforcers on choice in students with serious
emotional disturbance. Journal of Applied Behavior Analysis, 27,
575–583.

Pietras, C. J., & Hackenberg, T. D. (2005). Response-cost punishment
via token loss with pigeons. Behavioural Processes, 69(3),
343–356.

Roane, H. S., Call, N. A., & Falcomata, T. S. (2005). A preliminary
analysis of adaptive responding under open and closed economies.
Journal of Applied Behavior Analysis, 38(3), 335–348.

Roane, H. S., Falcomata, T. S., & Fisher, W. W. (2007). Applying the
behavioral economics principle of unit price to DRO schedule
thinning. Journal of Applied Behavior Analysis, 40(3), 529–534.

Roane, H. S., Vollmer, T. R., Ringdahl, J. E., & Marcus, B. A. (1998).
Evaluation of a brief stimulus preference assessment. Journal of
Applied Behavior Analysis, 31(4), 605–620.

Shahan, T. A. (2010). Conditioned reinforcement and response
strength. Journal of the Experimental Analysis of Behavior, 93,
269–289.

Skinner, B. F. (1974). About behaviorism. New York: Knopf.

Sran, S. K., & Borrero, J. C. (2010). Assessing the value of choice
in a token system. Journal of Applied Behavior Analysis, 43(3),
553–557.

Vollmer, T. R., & Iwata, B. A. (1991). Establishing operations and rein-
forcement effects. Journal of Applied Behavior Analysis, 24(2),
279–291.

Webbe, F. M., & Malagodi, E. F. (1978). Second-order schedules of
token reinforcement: Comparisons of performance under fixed-
ratio and variable-ratio exchange schedules. Journal of the
Experimental Analysis of Behavior, 30, 219–224.

Weiner, H. (1962). Some effects of response cost upon human oper-
ant behavior. Journal of the Experimental Analysis of Behavior,
5(2), 201–208.

Williams, B. A. (1994). Conditioned reinforcement: Experimental and
theoretical issues. The Behavior Analyst, 17, 261–285.

Williams, B. A., & Dunn, R. (1991). Preference for conditioned rein-
forcement. Journal of the Experimental Analysis of Behavior, 55,
37–46.

Wolfe, J. B. (1936). Effectiveness of token-rewards for chimpanzees.
Comparative Psychology Monographs, 12, 1–72.

Yankelevitz, R. L., Bullock, C. E., & Hackenberg, T. D. (2008). Rein-
forcer accumulation in a token reinforcement context. Journal of
the Experimental Analysis of Behavior, 90, 283–299.

ORIGINAL PAPER

Effects of a Perseverative Interest-Based Token
Economy on Challenging and On-Task Behavior
in a Child with Autism

Amarie Carnett • Tracy Raulston • Russell Lang • Amy Tostanoski • Allyson Lee •
Jeff Sigafoos • Wendy Machalicek

Published online: 19 March 2014

© Springer Science+Business Media New York 2014

Abstract We compared the effects of a token economy intervention that either did
or did not include the perseverative interests of a 7-year-old boy with autism. An
alternating treatment design revealed that the perseverative interest-based tokens
were more effective at decreasing challenging behavior and increasing on-task
behavior than tokens absent the perseverative interest during an early literacy
activity. The beneficial effects were then replicated in the child’s classroom. The
results suggest that perseverative interest-based tokens might enhance the
effectiveness of interventions based on token economies.

Keywords Autism · Perseverative interest · Token economy · Challenging
behavior · Alternating treatment design

A. Carnett (✉) · J. Sigafoos
Victoria University of Wellington, Karori Campus, PO Box 17-310, Wellington, New Zealand
e-mail: Amarie.Carnett@vuw.ac.nz

T. Raulston · W. Machalicek
University of Oregon, Eugene, OR, USA

R. Lang · A. Lee
Clinic for Autism Research Evaluation and Support, Texas State University, San Marcos, TX, USA

R. Lang
Meadows Center for the Prevention of Educational Risk, The University of Texas at Austin, Austin, TX, USA

A. Tostanoski
Vanderbilt University, Nashville, TN, USA

J Behav Educ (2014) 23:368–377

DOI 10.1007/s10864-014-9195-7

Introduction

Token economy interventions involve delivering small tangibles (e.g., tokens)
contingent on the presence or absence of target behaviors and then providing an
opportunity to exchange a preset number of these tokens for backup reinforcers.
Previous research has demonstrated that behaviors can be established, decreased,
and/or maintained using token economy systems (Hackenberg 2009; Matson and
Boisjoli 2009). Research has also investigated several variations of this intervention
including the use of a response cost (i.e., losing tokens for inappropriate behavior),
pairing tokens with praise, and delivering tokens on a variety of intermittent
reinforcement schedules. These variables have been shown to influence the
effectiveness of token economy interventions in some cases (Maggin et al. 2011;
Matson and Boisjoli 2009; Mottram and Berger-Gross 2004).

One aspect of the token economy that has received relatively little attention is the

token itself. Traditionally, tokens are considered to be neutral stimuli (e.g., tickets)

that gain reinforcing power by being paired with the backup reinforcers. Charlop-

Christy and Haymes (1998) investigated the effectiveness of incorporating the

idiosyncratic perseverative interests of children with autism within tokens in an

effort to increase the reinforcing power of the token. Charlop-Christy and Haymes

(1998) defined such intense interests as preoccupations or obsessions that an

individual continually seeks. Results from that study indicated that making use of

tokens that reflected the child’s perseverative interests (e.g., using a small picture of

a train as a token for a child who had a perseverative interest in trains) improved

intervention outcomes. To date, this appears to be the only study to have

demonstrated the potential value of individualizing tokens based on a child’s

perseverative interest.

The purpose of this current study was to replicate and extend the work of

Charlop-Christy and Haymes (1998). Specifically, we compared the effects of a

token economy intervention that either did or did not make use of tokens that

reflected a child’s perseverative interest. We examined the effects of this

manipulation on the challenging and on-task behavior of a 7-year-old boy with

autism during an early literacy activity in a public school special education

classroom and an inclusion classroom.

Method

Participant, Setting, and Materials

Troy was a 7-year-old boy who had been diagnosed with autism. He resided at home

with his father, mother, and three older siblings and attended a local public school.

He scored a 31 on the Childhood Autism Rating Scale (CARS; Schopler et al.

1980), which is indicative of mild-moderate autistic symptoms, and a 99 on the

Behavior Assessment System for Children-II, which indicates an overall clinically

significant range (BASC-II; Reynolds and Kamphaus 2004). Troy spent the majority

of his school day in a special education life skills classroom with four to eight other

children with developmental disabilities, a special education teacher, and a teaching

assistant. Troy’s individualized education plan (IEP) called for him to spend 1 h of

his school day included in activities with students without disabilities. However,

Troy’s challenging behavior (i.e., screaming, falling, and/or lying on the floor)

occurred too frequently to be acceptable in an inclusion classroom (i.e., a classroom

with a combination of students with and without disability).

The Questions About Behavior Function (QABF) Scale (Matson et al. 2012)

suggested that Troy’s challenging behavior was maintained by escape from

demands. As a result, the inclusion time specified in his IEP was met by

nonacademic activities with fewer demands (e.g., lunch, recess). Troy’s school

counselor referred him to this study in an effort to identify a strategy that could be

used to increase Troy’s inclusion during academic instruction in the general

education classroom. Additionally, Troy had previous experience in using a

traditional token economy within a discrete-trial format, and thus did not require

additional training to use the token economy system for this study.

The baseline and intervention sessions were conducted in Troy’s life skills

classroom and in his inclusion classroom. A video camera on a tripod was used to

record the participant and the researcher during all sessions. The inclusion

classroom included one teacher, 14 students without disabilities, two students with

learning disabilities who spent 100 % of their time in that classroom, and two

students with developmental disabilities who divided their time between the

inclusion and life skills classrooms. Both classrooms had a regularly scheduled early

literacy activity, which lasted 10–12 min and occurred three or four times per week.

During the activity, the teacher sat in a chair and read a story to the children as they

sat on the carpet with a teacher assistant and researcher observing. The children

were expected to sit quietly, look at the teacher or book, listen, and answer

occasional reading comprehension questions.

Response Measurement and Interobserver Agreement

Data were collected on Troy’s challenging behavior and on-task behavior.

Challenging behavior was defined as screaming (i.e., loud vocalizations lasting

3 s or more that were considered disruptive in the classroom), falling, and/or lying

on the ground (i.e., collapsing head and body to the ground). Screaming and falling

often occurred in tandem, and the QABF suggested both were maintained by escape

from demands, so these two topographies, whether they occurred alone or in

combination, were recorded as challenging behavior. On-task behavior was defined

as sitting with buttocks on the ground, head oriented toward the teacher, and having

an absence of challenging behavior. Challenging behavior was scored using 10-s

partial interval recording, and on-task behavior was scored using 10-s whole-

interval recording (Kennedy 2005). The on-task behaviors were selected due to their

incompatibility with Troy’s challenging behavior; thus, challenging behavior and

on-task behavior could not be scored in the same interval. Interval data were

converted to a percentage by dividing the number of intervals with each dependent

variable by the total number of intervals, then multiplying by 100 to convert into a

percentage.
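
The scoring rules and percentage conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the function names, the once-per-second sampling, and the example data are all assumptions introduced here.

```python
# Hypothetical helper names; the article reports only the scoring rules.

def score_partial_interval(samples):
    """Partial-interval rule: score the interval if the behavior occurred at any point."""
    return any(samples)


def score_whole_interval(samples):
    """Whole-interval rule: score the interval only if the behavior lasted throughout."""
    return all(samples)


def percent_of_intervals(scored):
    """Number of scored intervals divided by total intervals, multiplied by 100."""
    if not scored:
        raise ValueError("at least one interval is required")
    return 100.0 * sum(scored) / len(scored)


# Illustrative data only (not from the study): three 10-s intervals,
# each sampled once per second for on-task behavior (an assumed resolution).
intervals = [
    [True] * 10,           # on-task for the full interval
    [True] * 9 + [False],  # one lapse: not scored under the whole-interval rule
    [False] * 10,          # never on-task
]
on_task = [score_whole_interval(samples) for samples in intervals]
print(percent_of_intervals(on_task))  # one of the three intervals is scored
```

Because the whole-interval rule requires the behavior throughout the interval, it tends to give a conservative estimate of on-task behavior, whereas the partial-interval rule counts any occurrence of challenging behavior.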

Data on interobserver agreement (IOA) were collected from videos for both

dependent variables during 30 % of the baseline and intervention sessions by two

trained independent coders. IOA was calculated by dividing the number of intervals

with agreement (i.e., both data collectors scored the presence or absence of

challenging behavior/on-task behavior for the interval) by the total number of

intervals (i.e., agreements plus disagreements), then multiplying by 100 to convert

into a percentage. Mean agreement for both dependent variables was 98.5 % (range

95–100 %).
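
The interval-by-interval agreement calculation described above might be sketched as follows. The function name and the example records are hypothetical and are not taken from the study's data.

```python
def interval_ioa(observer_a, observer_b):
    """Interval-by-interval IOA: agreements / (agreements + disagreements) * 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same number of intervals")
    if not observer_a:
        raise ValueError("at least one interval is required")
    # An agreement is an interval that both coders scored the same way
    # (both present or both absent).
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Illustrative records only: True means the behavior was scored in that interval.
coder_1 = [True, False, True, True]
coder_2 = [True, False, False, True]
print(interval_ioa(coder_1, coder_2))  # 3 of 4 intervals agree -> 75.0
```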

Treatment integrity was assessed for 30 % of the sessions. A procedural checklist

of intervention procedures (available upon request) was used to record the accuracy

of intervention implementation. Mean treatment integrity was 96.9 % (range

84.6–100 %).

Procedure

Research Design

The two token economy interventions (i.e., with and without embedded

perseverative interests) were compared using alternating treatments with an initial baseline

design (Gast 2010). The alternating treatments phase was conducted in the life skills

classroom, and the intervention was implemented by the researcher. Generalization

from the life skills classroom to the inclusion classroom was assessed by conducting

a probe in the inclusion classroom during baseline and by adding a third phase, best-

treatment phase, in which the intervention associated with less challenging behavior

and more on-task behavior was implemented in the inclusion classroom (Gast

2010). Across all phases of the study, the following conditions were held constant:

(a) session duration (10 min), (b) time of day when sessions were conducted, (c) the

types of backup reinforcers that were available, (d) the number and timing of

opportunities to exchange tokens for the backup reinforcers, and (e) the reading

level of the stories. During the intervention and generalization phases, the reading

activity was led by the classroom teacher. A teaching assistant was also present, and

the researcher implemented the intervention.

Baseline

Four of the five baseline sessions were conducted in the life skills classroom. Due to

high rates of challenging behavior, only one baseline session was conducted in the

inclusion classroom. The duration of the reading activity was always between 10

and 12 min. To keep session duration constant, data were recorded during the first

10 min only. During baseline, all teachers and assistants were told to conduct the

reading activity as they would normally. During baseline, the teachers in both

classrooms verbally prompted on-task behavior (e.g., ‘‘Troy, please be quiet and sit

up.’’), provided praise contingent upon on-task behavior, and occasionally ignored

challenging behavior or delivered a mild reprimand (e.g., ‘‘Troy, stop that.’’).

However, none of these components were consistently implemented, and despite

this effort, the participant’s challenging behavior had persisted for over 6 months.

Preference and Backup Reinforcers

Backup reinforcers were selected by first asking Troy’s teachers to identify potential

reinforcers that would be appropriate in their classrooms. The teachers suggested

small edibles (e.g., bite-sized candy or cracker) because they were inexpensive and

could be consumed quickly without causing distraction. A pairwise preference

assessment was then conducted to identify preference of bite-sized edibles (Fisher

et al. 1992). Prior to each session, Troy selected a backup reinforcer from his top

three preferences (i.e., M&M, fruit snack, and chip). The researcher reviewed on-

task behaviors with Troy using a visual support that included pictures and words of

targeted on-task behaviors (i.e., sitting down, staying quiet, and looking at the

teacher) prior to the start of all sessions. The visual support remained present and

was used to redirect challenging behavior if it occurred, at the end of each 20-s

interval (i.e., the researcher pointed to the picture that represented the desired

behavior instead of delivering a token) throughout each session.

Token Economy without Perseverative Interest

The token economy system that did not include Troy’s perseverative interest used

pennies with a small patch of Velcro on the back that could be fastened to a token

board. Penny tokens were delivered by the researcher sitting near Troy, contingent

on 20 s of consecutive on-task behavior. A maximum of 30 tokens per 10-min

session could be earned. Backup reinforcers (i.e., bite-sized candy) could be

obtained for every 10 tokens earned, and an opportunity to exchange was presented

within sessions at each moment in which Troy had earned 10 tokens. For data

collection purposes, the exchanges were coded as on-task behavior. The token board

included circles drawn in groups of 10 as a visual representation of the number of

tokens needed to earn a backup reinforcer. Upon earning a token for targeted on-

task behaviors, Troy was handed a token to place on the board (also coded as on-

task).
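
The token schedule described above (one token per 20 s of consecutive on-task behavior, at most 30 tokens per 10-min session, and an exchange opportunity at every 10 tokens) can be illustrated with a short sketch. It assumes the session is divided into fixed 20-s blocks, which is one reading of the procedure; the constant and function names are hypothetical.

```python
BLOCK_SECONDS = 20     # on-task time required per token (from the article)
SESSION_SECONDS = 600  # 10-min session (from the article)
EXCHANGE_RATE = 10     # tokens per backup reinforcer (from the article)
MAX_TOKENS = SESSION_SECONDS // BLOCK_SECONDS  # 30 tokens per session


def run_session(blocks):
    """Return (tokens earned, exchange opportunities) for one session.

    blocks: one bool per 20-s block, True if the child stayed on-task
    for the whole block. Fixed blocks are an assumption of this sketch.
    """
    if len(blocks) > MAX_TOKENS:
        raise ValueError("a 10-min session holds at most 30 blocks")
    tokens = 0
    exchanges = 0
    for on_task in blocks:
        if on_task:
            tokens += 1
            # An exchange is offered at the moment the 10th, 20th, or 30th token is earned.
            if tokens % EXCHANGE_RATE == 0:
                exchanges += 1
    return tokens, exchanges


# Illustrative session only: on-task for 12 of the 30 blocks.
print(run_session([True] * 12 + [False] * 18))  # (12, 1)
```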

Token Economy with Perseverative Interests

The token economy system used in this condition differed from the previously

described condition in that the pennies and token board were replaced by tokens and

a board related to Troy’s perseverative interest in jigsaw puzzles. Specifically, the

tokens were small foam puzzle pieces, and the token board was a thin cardstock

frame into which the pieces fit. This token board mirrored the traditional token

economy, in that it included 10 outlined locations for each puzzle piece. The same

procedures, response requirements, exchange rate, and backup reinforcers were used

in both token economy conditions (i.e., with and without perseverative interests).

Troy’s perseverative interest in puzzles was determined by interviews with teachers

and a free operant preference assessment, in which a puzzle was made available

alongside other toys and activity options (Roane et al. 1998). All of Troy’s teachers

agreed that he perseverated on a specific puzzle, and he devoted 100 % of his time

in the free operant preference assessment touching, holding, and manipulating the

puzzle pieces. Further, Troy always selected this specific puzzle when other puzzles

were available.

Generalization

Troy’s behavior during the group reading activity in the inclusion classroom was

measured in one baseline session using the same procedures as the other four

baseline sessions conducted in the life skills classroom. In the final best-treatment

phase of the study, three sessions were conducted in the inclusion classroom using

the perseverative interest token economy system.

Results

The top panel of Fig. 1 displays the percentage of intervals during which Troy

engaged in on-task behavior during the entire 10-s interval. During baseline, Troy

was on-task in the life skills classroom for a mean of 11 % of the intervals (range

8–18 %). In the inclusion classroom, he was on-task during 13 % of the intervals.

During the alternating treatment phase, both token economy interventions resulted

in an increase in on-task behavior relative to baseline. However, Troy was on-task

more often during the perseverative interest token economy condition

(M = 59.7 %, range 48–70 %) than in the token economy condition that did not

involve tokens reflecting his perseverative interest (M = 45 %, range 32–55 %).

The increase in on-task behavior in the perseverative interest condition was then

replicated during the final best-treatment phase in the inclusion classroom

(M = 64 %, range 52–72 %).

The bottom panel of Fig. 1 displays the percentage of 10-s intervals during which

at least one instance of challenging behavior occurred. During baseline conditions in

the life skills classroom, challenging behavior occurred during a mean of 89 % of

intervals (range 82–92 %) and in 87 % of intervals in the inclusion classroom.

Challenging behavior decreased from baseline levels in both token economy

conditions; however, a lower percentage of intervals had challenging behavior in the

perseverative interest condition (M = 40 %, range 30–52 %) compared with the

condition where the token did not coincide with Troy’s perseverative interests

(M = 55 %, range 45–68 %). The reduction in challenging behavior in the

perseverative interest condition in the life skills classroom was replicated during the

final best-treatment phase in the inclusion classroom (M = 36 %, range 28–48 %).

Discussion

The results of this study replicate previous research demonstrating the utility of

token economy interventions for children with autism (Matson and Boisjoli 2009)

because both token economy interventions (i.e., with and without the perseverative

interest) resulted in decreased challenging behavior and increased on-task behavior.

Further, the superiority of the condition involving tokens reflecting Troy’s

perseverative interest is consistent with the findings of Charlop-Christy and Haymes

(1998). Finally, these data extend previous research by demonstrating the benefit of

interest-based tokens in a special education classroom with generalization to an

inclusion classroom.

The perseverative interests inherent to an autism spectrum disorder (ASD)

diagnosis often impede appropriate classroom behavior and learning (e.g., Rispoli

et al. 2011; Lang et al. 2010) and can be associated with serious challenging

behavior (e.g., Hausman et al. 2009; Matson et al. 2009). Thus, interventions have

primarily sought to address challenging behavior associated with such restricted and

repetitive behaviors and interests (RRBI) with antecedent manipulations to enrich

[Fig. 1 graph omitted. Top panel y-axis: Percentage of 10-s Whole Intervals with On-Task Behavior. Bottom panel y-axis: Percentage of 10-s Partial Intervals with Challenging Behavior. X-axis: Sessions (1–18). Phase labels: Baseline; Token Economy Comparison in Life Skills Classroom; PI Token Economy in Inclusion. Legend: With PI; Without; Inclusion Classroom Probe.]

Fig. 1 The top panel displays the percentage of 10-s whole interval during which Troy was on-task, and
the bottom panel displays the percentage of 10-s partial interval during which Troy engaged in
challenging behavior. The closed circles represent baseline in the life skills classroom, triangles represent
the inclusion classroom, open diamonds represent the token economy without the perseverative interest
(PI), and closed squares represent the token economy with the PI

the environment and prevent challenging behaviors, and consequence-based

interventions that involve interrupting the repetitive behavior (see Boyd et al.

2012 for a recent review). However, other researchers have demonstrated the utility

of capitalizing on perseverative interests by incorporating them into the intervention

procedures or making access to RRBI contingent on targeted appropriate behavior

or the absence of target challenging behavior (Baker et al. 1998; Charlop-Christy

and Haymes 1996, 1998; Vismara and Lyons 2011). This study, considered in

tandem with Charlop-Christy and Haymes (1998), suggests idiosyncratic

perseverative interests can be utilized to improve intervention efficiency and effectiveness.

The putative mechanism of action responsible for the enhanced effectiveness is

likely the increased reinforcing value of the token itself. Compared with the use of

neutral stimuli as tokens, the reinforcement from perseverative interest-based tokens

may be more immediate, and thus more efficient, than relying only on the

reinforcing power of the backup edibles that were available only after a number of

tokens had been earned and exchanged. Although Troy was always willing to

exchange 10 tokens for the backup reinforcers, it is possible that some children may

value the perseverative interest-based tokens more than backup reinforcers. In such

cases, challenging behavior maintained by continued access to preferred tangibles

might be occasioned when the child is asked to exchange the highly preferred token

for a less preferred item. Practitioners using this approach are therefore cautioned to

consider the reinforcing value of the perseverative interest token relative to the

backup reinforcers. If challenging behavior is observed during the exchange, it may

be preferable to use neutral stimuli as tokens or to merely use the perseverative

interest tokens alone without additional backup reinforcers. As part of a larger effort

to better incorporate the characteristics of children with autism into intervention

approaches with the goal of improving educational outcomes, future research

designed to elucidate and then potentially address such a limitation remains

warranted.

These findings buttress the evidence supporting the use of token economy

systems with this population and align with the perspective that circumscribed

interests can be a unique strength of individuals with high-functioning ASD

(Mercier et al. 2000). Nevertheless, when children with ASD perseverate to the

exclusion of other activities, such RRBIs significantly restrict their social and

learning opportunities (Pierce and Courchesne 2001; Koegel et al. 1974; Lovaas

et al. 1971). Research could continue to investigate the effects of embedding

perseverative interests into other interventions, such as video modeling. However, it

is possible that the use of perseverative interests in this way may inadvertently lead

to a counterproductive increase in fascination with the perseverative interest.

Although we are not aware of this issue having been reported in previous research, it

would seem a plausible potential limitation that should be investigated as research

in this area continues.

The results of this current study should be considered in light of a few limitations.

First, we selected an alternating treatment design because teachers expressed

concern regarding a reversal to baseline conditions. Although this design facilitated

implementation in an applied setting, the lack of a reversal phase introduced the

potential of carryover effects. Second, to identify Troy’s perseverative interest, we

utilized teacher reports and a free operant preference assessment, which did not

capture a hierarchy of reinforcers (Roane et al. 1998). Further, there is not a

well-established procedure for distinguishing between highly preferred stimuli and the

level of fascination indicative of a true perseverative interest, and our assertion that

puzzles were indeed a perseverative interest should be considered with caution. It is

possible that puzzles were merely highly preferred. Future research should further

investigate reinforcement hierarchies to determine more precise ways of identifying

perseverative interests. Third, the visual cues utilized to prompt on-task behavior

and redirect challenging behavior, although held constant in all intervention and

generalization sessions, were not evaluated as a separate intervention component.

Thus, we are uncertain to what degree they may have contributed to the effects on

the dependent variables. Finally, because on-task behavior increased and

challenging behavior decreased with the use of both token systems, assessing the value of

the perseverative interest token proved difficult. It is possible that the effectiveness

of both systems might approach equivalence over time if challenging behavior

continued to decrease. Thus, future research should investigate the effects of

extended use of the two systems, as well as the effects of systematic fading

procedures of the embedded token system.

References

Baker, M. J., Koegel, R. L., & Koegel, L. K. (1998). Increasing the social behavior of young children with

autism using their obsessive behaviors. The Journal of the Association for Persons with Severe

Handicaps, 23, 300–308. doi:10.2511/rpsd.23.4.300.

Boyd, B. A., McDonough, S. G., & Bodfish, J. W. (2012). Evidence-based behavioral interventions for

repetitive behaviors in autism. Journal of Autism and Developmental Disorders, 42, 1236–1248.

doi:10.1007/s10803-011-1284-z.

Charlop-Christy, M. H., & Haymes, L. K. (1996). Using obsessions as reinforcers with and without mild

reductive procedures to decrease inappropriate behaviors of children with autism. Journal of Autism

and Developmental Disorders, 26, 527–545.

Charlop-Christy, M. H., & Haymes, L. K. (1998). Using objects of obsession as token reinforcers for

children with autism. Journal of Autism and Developmental Disorders, 28, 189–198. doi:10.1023/A:1026061220171.

Fisher, W. W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A

comparison of two approaches for identifying reinforcers for persons with severe and profound

disabilities. Journal of Applied Behavior Analysis, 25, 491–498. doi:10.1901/jaba.1992.25-491.

Gast, D. L. (2010). Single subject research methodology in behavioral sciences. New York: Routledge.

Hackenberg, T. D. (2009). Token reinforcement: A review and analysis. Journal of the Experimental

Analysis of Behavior, 91, 257–286. doi:10.1901/jeab.2009.91-257.

Hausman, N., Kahng, S., Farrell, E., & Mongeon, C. (2009). Idiosyncratic functions: Severe problem

behavior maintained by access to ritualistic behaviors. Education and Treatment of Children, 32,

77–87.

Kennedy, C. (2005). Single-case designs for educational research. Boston: Pearson Education Inc.

Koegel, R. L., Firestone, P. B., Kramme, K. W., & Dunlap, G. (1974). Increasing spontaneous play by

suppressing self-stimulation in autistic children. Journal of Applied Behavior Analysis, 7, 521–528.

Lang, R., Regester, A., Rispoli, M., & Camargo, S. H. (2010). Rehabilitation issues for children with

autism spectrum disorders. Developmental Neurorehabilitation, 13, 153–155.

Lovaas, O. I., Schreibman, L., Koegel, R., & Rehm, R. (1971). Selective responding by autistic children

to multiple sensory input. Journal of Abnormal Psychology, 77(3), 211–222.

Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation of

token economies as a classroom management tool for students with challenging behavior. Journal of

School Psychology, 49, 529–554. doi:10.1016/j.jsp.2011.05.001.

Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children with intellectual disability and/or

autism: A review. Research in Developmental Disabilities, 30, 240–248. doi:10.1016/j.ridd.2008.04.001.

Matson, J. L., Dempsey, T., & Fodstad, J. C. (2009). Stereotypies and repetitive/restrictive behaviours in

infants with autism and pervasive developmental disorder. Developmental Neurorehabilitation, 12,

122–127.

Matson, J. L., Tureck, K., & Rieske, R. (2012). The questions about behavioral function (QABF): Current

status as a method of functional assessment. Research in Developmental Disabilities, 33, 630–634.

doi:10.1016/j.ridd.2011.11.006.

Mercier, C., Mottron, L., & Belleville, S. (2000). A psychosocial study on restricted interests in high-

functioning persons with pervasive developmental disorders. Autism, 4(4), 29–46.

Mottram, L., & Berger-Gross, P. (2004). An intervention to reduce disruptive behaviours in children with

brain injury. Developmental Neurorehabilitation, 7, 133–143.

Pierce, K., & Courchesne, E. (2001). Evidence for a cerebellar role in reduced exploration and

stereotyped behavior in autism. Biological Psychiatry, 49, 655–664.

Reynolds, C. R., & Kamphaus, R. W. (2004). BASC-II: Behavior assessment system for children (2nd

ed.). Bloomington, MN: Pearson Assessments.

Rispoli, M. J., O’Reilly, M. F., Lang, R., Machalicek, W., Davis, T., Lancioni, G., et al. (2011). Effects of

motivating operations on aberrant behavior and academic engagement for two students with autism.

Journal of Applied Behavior Analysis, 44, 187–192.

Roane, H. S., Vollmer, T. R., Ringdahl, J. E., & Marcus, B. A. (1998). Evaluation of a brief stimulus

preference assessment. Journal of Applied Behavior Analysis, 31, 605–620. doi:10.1901/jaba.1998.31-605.

Schopler, E., Reichler, R. J., Devellis, R. F., & Daly, K. (1980). Toward an objective classification of

childhood autism: Childhood autism rating scale (CARS). Journal of Autism and Developmental

Disorders, 10, 91–103. doi:10.1007/BF02408436.

Vismara, L. A., & Lyons, G. L. (2011). Using perseverative interest to elicit joint attention behaviors in

young children with autism: Theoretical and clinical implications for understanding motivation.

Journal of Positive Behavioral Interventions, 9(4), 214–228. doi:10.1177/10983007070090040401.

The Token Economy: A Recent Review and Evaluation

Christopher Doll¹; T. F. McLaughlin²; Anjali Barretto³

¹ Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA

cdoll2@zagmail.gonzaga.edu

² Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA

mclaughlin@gonzaga.edu

³ Gonzaga University, East 502 Boone Avenue, Spokane, WA 99258-0025, USA

barretto@gonzage.edu

Abstract – This article presents a recent and inclusive review of the use of token

economies in various environments (schools, home, etc.). Digital and manual

searches were carried out using the following databases: Google Scholar, Psych Info

(EBSCO), and The Web of Knowledge. The search terms included: token economy,

token systems, token reinforcement, behavior modification, classroom management,

operant conditioning, animal behavior, token literature reviews, and token

economy concerns. The criteria for inclusion were studies that implemented token

economies in settings where academics were assessed. Token economies have been

extensively implemented and evaluated in the past. Few recently published articles

were found in the peer-reviewed literature. While token economy

reviews have occurred historically (Kazdin, 1972, 1977, 1982), there has been no

recent overview of the research. During the previous several years, token

economies in relation to certain disorders have been analyzed and reviewed;

however, a recent review of token economies as a field of study has not been

carried out. The purpose of this literature review was to produce a recent review

and evaluation on the research of token economies across settings.

Key Words – Digital Search; Future Research; Literature Review; Research;

Token Programs

1 Introduction

This article presents a recent and inclusive review of the use of token economies in various settings.

Digital and manual searches were carried out using the following databases: Google Scholar, Psych Info

(EBSCO), and The Web of Knowledge. The search terms included: token economy, token systems,

token reinforcement, behavior modification, classroom management, operant conditioning, animal

behavior, token literature reviews, and token economy concerns. The criteria for inclusion were studies

that implemented token economies in settings where academics were assessed.

International Journal of Basic and Applied Science,

Vol. 02, No. 01, July 2013, pp. 131-149

Insan Akademika Publications

2 History of Token Systems

Token systems, in one form or another, have been used for centuries and have evolved notably into the

systems used today. In early agricultural societies, clay coins, which people could earn and exchange for

goods and services, were part of the transition from simple barter systems to more complex

economies (Schmandt-Besserat, 1992). Before that, however, incentive-based structures were

created and sustained in a variety of cultures and as part of many institutions within those cultures.

Governments used the influencing abilities of rewards to shape behaviors in battle and throughout

society. Rewards have ranged from tangible prizes to socially significant titles (Doolittle, 1865;

Duran, 1964; Grant, 1967). Grant (1967) explained that, during the first century, the accomplishments of

gladiators were rewarded with property, prizes, and crowns. Carcopino (1940) described charioteers

in Rome during that same time being rewarded with their freedom after repeated victories. In ancient

China, soldiers received colored peacock feathers for bravery in battle (Doolittle, 1865). Several

military institutions in ancient civilizations utilized these systems of merit and rewards to incentivize

behavior. From the Aztecs in the 15th century (Duran, 1964) to the militaries of modern times,

the use of titles of distinction and medals to reward actions has been a common method of promoting certain

types of behavior, or responses. Modern research peaked in the 1970s, when there was substantial

study surrounding psychiatry, clinical psychology, education, and mental health fields

(Kazdin, 1977).

Token economy systems have also been employed to modify animal behavior (Addessi, Mancini,

Crescimbene, & Visalberghi, 2011; Malagodi, 1967; Sousa & Matsuzawa, 2001). Malagodi's (1967)

study involving rats established a mechanism of exchange between marbles, which the rats earned

through a dispenser, and an edible primary reinforcer. In that study, token reinforcement under fixed

and variable interval schedules was shown to be as effective as the edible primary reinforcer at

increasing lever pressing. In another study, Wolf (1936) compared the effectiveness of exchangeable

tokens, nonexchangeable tokens, and food to find that exchangeable tokens and food were comparable

in reinforcing ability. These studies clearly show that tokens, when paired with a primary reinforcer,

are effective at modifying certain behaviors in animal subjects. Cowles (1937) found similar results

with exchangeable tokens when he taught chimpanzees new learning tasks. In Sousa and Matsuzawa's

(2001) study, not only did chimpanzees perform similarly with tokens as they did with direct food

rewards, but the researchers found that chimpanzees were able to collect and save several tokens

before exchanging them.

The military as well as mental health and educational facilities have increased their use of incentives

to shape behavior. Tangible items given as rewards evolved to tokens which could be exchanged for

certain privileges and rewards. This evolution of the token economy was a catalyst for increasingly

novel and diverse utilization of token-reinforcement systems. One example of how token systems

have been applied in an institutional setting was Alexander Maconochie's “Mark System”

implemented with a prison population during the 1840s (Kazdin, 1977). This token-based system

improved the conditions under which many prisoners lived; furthermore, it attempted to create an

incentive-driven system to reward positive behavior rather than give aversive consequences to

prisoners. Within this “Mark System,” sentences were converted to “marks” and the prisoners sought

to reduce these “marks,” or tokens, through good behavior within the prison system. Upon reaching a

certain level of tokens, the prisoner could then be released. The prisoners exchanged their tokens for

necessary items such as food, shelter, and clothes (Kazdin, 1977). A variation of the token economy

under Maconochie was the inclusion of a response cost component where negative or institutionally-

labeled aberrant behaviors resulted in the withdrawal of “marks.” Unique approaches such as the

Mark System have helped evolve the reward and cost structures resulting in “serious achievements in

reform, rehabilitation, and token economies” (Kazdin, 1977).

3 Early History of Token Systems in the Schools

3.1 Token, tracking, exchange

Educational systems have employed token economies as a means to manage students for several

decades (Kazdin, 1982). The need to educate large numbers of children and the demand for

meaningful education helped to evolve the application of these token-based systems. As noted

previously, titles of distinction as well as tangible property have all been used to incentivize

individuals and their behavior. In schools, a variety of incentives have acted and continue to serve as

the rewards earned for certain defined target behaviors (Boniecki & Moore, 2003; Lolich,

McLaughlin, & Weber, 2012; McLaughlin & Malaby, 1975). As early as the 7th century, a monk in

Southern Europe gave biscuits of leftover dough, also known as “petriolas” or “little rewards,” to

children who learned their prayers (Kazdin, 1977). Birnbaum (1962) noted that in the 1100s

rewards such as nuts, figs, and honey were commonly used by educators as

incentives for learning. Skinner (1966) described how, in the 16th century, Erasmus advocated fruit and cake

to help children learn Greek and Latin.

Within the past several centuries, the modern forms of the token economy have been increasingly used

in the education of society. Two of those systems came to the United States during the 1800s. Joseph

Lancaster's “Monitorial System” originated in England in the early part of the century and came to

New York in 1805. This system, when implemented in New York schools, contained a more explicit

use of tokens and of response cost. More-able peers were “Monitors” for less-able peers and each

skill-group was awarded different sets of privileges and prizes, based on level. The Monitorial System

created helper teachers, which made it possible to teach large numbers of

students. Solving this problem of larger classes helped to spread the program across the

nation. A second system, Excelsior, established itself during the latter part of the 1800s when the

United States was experiencing significant growth in the use of token economies (Kazdin, 1977). This

system consisted of giving out “Excellent(s)” and “Perfect(s)” designations to students for pro-social

and pro-academic behaviors. These “Excellents” and “Perfects” were exchanged for “Merits,” which

in turn were saved and exchanged for a special certificate from the teacher attesting to great

performance. In both of these systems, prizes and rewards acted to make the token more powerful in

affecting behavior. Furthermore, in both of these token-reinforcement systems, back-up reinforcers

and prizes were integral to their setup and maintenance.

3.2 Definition of a Token System

Token economies have been extensively researched throughout the last several decades and applied in

a variety of settings. Teachers and caretakers have used these systems in general education, special

education, and community-based settings. Because of the variety of token-based systems and the ease

with which teachers can implement them, token economies are widely used across the nation.

The behavioral principles employed in token systems are based primarily upon the concept of operant

conditioning (Kazdin, 1977; McLaughlin & Williams, 1988). Within a token economy, tokens are

most often a neutral stimulus in the form of “points” or tangible items that are awarded to economy

participants for target behaviors. In a token-reinforcement system, the neutral token is repeatedly

presented alongside or immediately before the reinforcing stimulus. That stimulus may be any of a variety

of edibles, privileges, or other incentives. Through these repeated presentations of the

neutral token before the reinforcing stimulus, the token itself becomes reinforcing. As the

participants in the token economy experience the pairing of the token with previously reinforcing items, the token

itself may acquire reinforcing properties as a result. The token economy gains its utility and power to

modify behavior when the neutral tokens become secondary reinforcers. The effectiveness of this

process has been noted by Miller and Drennen (1970). They demonstrated that praise, when initially a

neutral stimulus, can become a conditioned reinforcer through pairing with another reinforcing

event.

3.2.1 Target behaviors of token economies

A token economy is often implemented because there are target behaviors that teachers would like to

increase or reduce. These behaviors must be identified by those who work in such classrooms.

Changes in these target behaviors often improve the classroom-learning environment or serve the needs of

that specific institution. Token economies can be used to minimize disruptions in a classroom as well

as increase student academic responding. This can depend on the classroom and the priorities of the

teacher. However, most teachers employ a token system to manage both academic and social

behaviors

(McLaughlin & Williams, 1988).

In a token economy it is important to clearly outline the target behaviors for the students as well as the

teacher (Kazdin, 1977). When a teacher is first implementing a token-reinforcement system it has

been recommended that desired behaviors are orally communicated, written down, or otherwise

clearly explained or modeled to the participants (Alberto & Troutman, 2012; McLaughlin & Williams,

1988). This communication with the participants is crucial and directly related to the effectiveness and

efficiency of the system (Alberto & Troutman, 2012; Cooper, Heron, & Heward, 2007).

3.2.2 Tokens

In order to establish and sustain a token economy system there need to be tokens. These tokens then

serve as a way to provide consequences. Tokens can be tangible gaming-style chips, tickets, coins,

fake money, marbles, stickers, or stamps (McLaughlin & Williams, 1988). They can also come in the

form of more abstract items, such as points or checkmarks, given by the teacher or the economy's

“manager.” The choice of tokens can depend on the setting, population, manager's or teacher's

preference, and cost, among other considerations. Population and setting considerations are related to

what types of tokens are going to be applicable for certain participants. A younger group, or students

with developmental or cognitive delays, may benefit more from tangible items like coins or cards

than from more abstract items in the form of points or checkmarks (McLaughlin & Williams, 1988;

Stainback, Payne, Stainback, & Payne, 1973). Tangible tokens provide a concrete representation of

the number of tokens earned which can then be exchanged for rewards (B. Williams, R. Williams, &

McLaughlin, 1989). When choosing tokens, the teacher's preference, especially in relation to cost,

must be considered. Also, the choice of token should take into account how difficult the token is to

duplicate, so that the classroom is not flooded with tokens not under the control of the

teacher. These factors influence the types of tokens that are used within the system, the

frequency at which they are delivered, and ultimately the back-up rewards that are available to give

value to the tokens.

3.2.3 Back-up rewards

Back-up rewards are the items that the students or persons have indicated they are willing to work for.

Their desirability has been used to assign the number of tokens that are needed to purchase or take part

in this reward (Kazdin, 1977). Without these back-up rewards, the tokens have no exchangeable

value. Also, tokens without value can negatively alter an individual's motivation (Wolf, 1936). The

more back-up rewards in the token system, the more substantial the reinforcing strength becomes

through pairing of tokens and rewards (B. Williams, R. Williams, & McLaughlin, 1989). Back-up

rewards have also been used in home settings, where they have included ski trips, video games,

movies, or lunch at a chosen restaurant (Rustab & McLaughlin, 1988). Even with this variety of

back-up rewards, the monetary reward has been used very effectively (Jordan, McLaughlin, & Hunsaker,

1980). This is likely due to money's exchangeability and its ability to act as one of the ultimate

generalized conditioned reinforcers.

3.2.4 The exchange

An important part of the token economy is the exchange of tokens for back-up rewards chosen

by the economy's manager or the students, guided in part by the needs and preferences of the participants.

The value of the token is a function of the reinforcers that back up its value (Kazdin,

1977). At the end of the period in which tokens have been given, the teacher decides when to begin the

exchange process.

When a conditioned reinforcer like a token is exchanged for a variety of privileges and rewards, the

token is referred to as a generalized conditioned reinforcer (Kazdin, 1977). Generalized tangible

conditioned reinforcers, which can be exchanged for a variety of items, are used very frequently in

behavior modification programs (Kazdin, 1977). Tokens or generalized conditioned reinforcers also

come in the form of money used in society. The more items or rewards you can exchange for the

token, the more powerful the token becomes. Money and other generalized conditioned reinforcers are

more valuable than any single reinforcer because they can purchase a variety of back-up reinforcers

(Kazdin, 1977). The power of generalized conditioned reinforcers was assessed when Sran and

Borrero (2010) compared behaviors reinforced by tokens which could be exchanged for a single

highly preferred item with tokens which could be exchanged for a variety of preferred items. They

found that, while degrees of preference varied, all participants showed higher rates of

responding during sessions where tokens could be exchanged for a variety of preferred items.

During the early implementation of the token economy, especially for lower-functioning persons, it is

important to have frequent exchange periods where participants can be quickly reinforced and target

behaviors can increase (O'Leary & Drabman, 1971). Infrequent exchange periods at the beginning of

a token economy's implementation may prevent this type of system from working effectively. It is

important to determine and adapt the exchange period based on classroom needs (Kazdin, 1977;

McLaughlin & Williams, 1988). For some participants, especially those with Attention Deficit

Hyperactivity Disorder (ADHD), the immediacy with which a back-up reinforcer is received will be the

most influential dimension of a token economy, making the time between token and exchange crucially

important (Neef, Bicard, & Endo, 2001; Reed & Martens, 2011). One of the important considerations

when carrying out a token economy is its impact on the classroom environment or setting. The

exchange period should be quick to complete and not significantly impact the ability of the teacher to

manage the classroom or particular setting. Based on these considerations, it is important to schedule

exchange periods at the end of the class period, during a naturally occurring transition, or possibly at

the end of the day or week.

There are many different ways in which a token exchange can take place. Many types of exchange

systems have been implemented (Kazdin, 1977; McLaughlin, 1975). Tokens may be exchanged as

soon as they are earned (Bushell, 1978), at the end of a certain time period (McLaughlin & Malaby,

1972), or after a variable time period (McLaughlin & Williams, 1988). At the end of the token-reward

period, there may be a catalog of items and privileges, a “store” where the participant is able to

exchange tokens, or a predetermined back-up reinforcer. Additionally, free time itself may function as

its own generalized conditioned reinforcer, as it gives the participants access to a variety of back-up

rewards.

When the system is in place, teachers may choose an exchange time based on classroom schedule or

student needs. Token economy exchange periods could take place at the end of each 50-minute class

throughout the day, daily, weekly, or biweekly. The effectiveness of the token economy may decrease

as more time passes between presentation of the token and exchange for the backup reinforcers

(Kazdin, 1977; Neef et al., 2001; Reed & Martens, 2011). Variable exchange times, as

opposed to fixed time periods where tokens are traded for back-up rewards, have been shown to

increase response rates as well as maintenance of the behavior (McLaughlin & Malaby, 1976).

According to McLaughlin and Malaby (1976), executing variable exchange times within a token

economy is effective and is an important option for any teacher or economy manager to consider.

3.3 Variations of Token Economies

3.3.1 Response cost

In a response cost system, tokens are taken away when students engage in certain pre-defined

behaviors. The tokens taken from the student are the cost of the behavior. In this variation of

the token economy, each unwanted behavior has a cost, which results in the confiscation of a

predetermined number of tokens. Response cost is very commonly used to suppress behavior (Kazdin,

1977). The most commonly used form of response cost is the withdrawal of tokens or fines. Token

economies are unique because tokens can be presented or removed (Kazdin, 1977; McLaughlin &

Malaby, 1977a). Hall et al. (1972) employed response cost to reduce whining in a young child. The

researchers used slips of paper given to the boy with his name printed on them. The slips were taken

away for negative behaviors. Even when these slips had no apparent value, this response cost system

drastically reduced negative behaviors. Iwata and Bailey (1974) compared token reinforcement and

response cost in a special education classroom. Both were equally effective at improving behaviors.

However, the teacher was more negative with the students when response cost was used in the

classroom. In McLaughlin and Malaby (1977a), a combined token reinforcement and response cost system was

found to be more effective at increasing target behavior than token reinforcement alone. Achievement

Place (Kirigan, Braukman, Atwater, & Wolf, 1982), where at-risk youth are often sent to learn

important social and academic skills so that they can be placed back into mainstream society, effectively

implements a token reinforcement system with response cost to reduce severe behaviors while

increasing pro-social and academic behaviors (Ayllon & Azrin, 1968; Bailey, Wolf, & Phillips, 1970;

McLaughlin & Malaby, 1977a). In general, token economies with and without a response cost

component have been effective in different settings. It is important to note, however, that a program

solely reliant on response cost and punishment-oriented management is less likely to

create pro-social behaviors in the participants (Iwata & Bailey, 1974; Kazdin, 1977). This is

interesting considering that, in some studies, teachers seem to prefer response

cost to a token reinforcement only system (McGoey & DuPaul, 2000). In McGoey

and DuPaul (2000), awarding stickers to students in a preschool class was compared with removing

stickers for off-task behavior. The two procedures were found to be equally effective. This finding replicates Iwata

and Bailey (1974). However, it is important to consider that reinforcement for specific target behaviors is

more likely to develop pro-social responses as alternatives to the behaviors being suppressed

(Kazdin, 1977).
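
The earn-and-fine mechanics of a token economy with response cost can be illustrated with a minimal Python sketch. The class name TokenLedger, the behavior labels, and the token values are hypothetical and are not drawn from any of the cited studies; flooring the balance at zero is likewise an assumption of this sketch rather than a feature reported in the literature.

```python
class TokenLedger:
    """Hypothetical ledger for one participant in a token economy."""

    def __init__(self, earn_values, fine_values):
        # earn_values: tokens awarded per target behavior (assumed values)
        # fine_values: tokens removed per unwanted behavior (response cost)
        self.earn_values = dict(earn_values)
        self.fine_values = dict(fine_values)
        self.balance = 0

    def record(self, behavior):
        """Award or remove tokens for an observed behavior; return the balance."""
        if behavior in self.earn_values:
            self.balance += self.earn_values[behavior]
        elif behavior in self.fine_values:
            # Response cost: the balance never drops below zero (an assumption).
            self.balance = max(0, self.balance - self.fine_values[behavior])
        return self.balance

    def exchange(self, price):
        """Spend tokens on a back-up reward if the balance covers the price."""
        if price <= self.balance:
            self.balance -= price
            return True
        return False


ledger = TokenLedger(
    earn_values={"on_task": 2, "assignment_completed": 5},
    fine_values={"out_of_seat": 1, "talking_out": 3},
)
for observed in ["on_task", "assignment_completed", "talking_out", "on_task"]:
    ledger.record(observed)
print(ledger.balance)  # 2 + 5 - 3 + 2 = 6
```

Removing the fine_values entries turns the same ledger into a reinforcement-only system, which is the comparison made in several of the studies above.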

3.3.2 Lottery systems

Unlike a standard token economy, where behaviors earn tokens to be exchanged at a later period, lottery-based

systems add a component to the exchange period. In this type of economy, target

behaviors are rewarded with a token, or ticket, and at the end of the reward period there is a lottery to

determine which individuals earn a backup reward. This can reduce the number of backup rewards

delivered in the token economy by choosing only a select number of tokens, or tickets, to exchange. A

weakness of this type of system is that some ages and populations may be difficult to affect without

a direct correspondence of tokens and backup rewards (McLaughlin & Williams, 1988).
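
One way such a drawing might work is sketched below in Python. The function name draw_lottery, the participant labels, and the rule that each participant can win at most once per drawing are assumptions made for illustration; the cited sources do not specify a drawing procedure.

```python
import random


def draw_lottery(tickets, num_winners, seed=None):
    """Pick winners from a pool of earned tickets.

    tickets: mapping of participant name -> number of tickets earned.
    num_winners: how many back-up rewards are available (assumed value).
    Each ticket is one entry, so more tickets mean a higher chance of winning.
    A participant can win at most once per drawing (an assumption).
    """
    rng = random.Random(seed)
    remaining = {name: count for name, count in tickets.items() if count > 0}
    winners = []
    while remaining and len(winners) < num_winners:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        winner = rng.choices(names, weights=weights, k=1)[0]
        winners.append(winner)
        del remaining[winner]
    return winners


tickets_earned = {"Student A": 5, "Student B": 2, "Student C": 0, "Student D": 3}
winners = draw_lottery(tickets_earned, num_winners=2, seed=1)
print(winners)
```

Because only two rewards are delivered regardless of how many tickets were earned, the sketch shows how a lottery limits the number of back-up rewards while still tying the chance of winning to the target behavior.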

3.3.3 Individual vs whole class

It will be up to the teacher or manager of the economy to determine whether tokens will be awarded to

entire groups or to individuals within the group. The advantage of developing a group-oriented token

economy is the ease with which teachers may implement and track tokens and rewards (Kazdin, 1977).

These class-wide systems have also been well documented and seem to be useful in reducing

unwanted behavior (Bushell, Wrobel, & Michaelis, 1968; Packard, 1970). Consequences in these

class-wide economies can be group or individually administered, depending on the system chosen.

Packard (1970) evaluated a token economy under a group contingency in four elementary school

classes where off-task behavior was a concern. In Packard's study, certain class periods were chosen

for each grade and a class goal was assigned to raise on-task behavior. When the class met the criteria

for on-task behavior, they were given points which could then be exchanged for group or individually

assigned rewards (Packard, 1970). The results of that study showed on-task behavior rising from baseline

levels below 10% to between 70% and 100% during class periods once the

group-contingent token economy was implemented (Packard, 1970).

3.3.4 Level systems

Level systems are a variation of token economy. In these systems, different levels correspond to

different degrees of participant behavior. For example, increasing preferred target behaviors may

result in higher levels which then translate to higher rates of reinforcement and privilege while

unwanted behaviors may result in a decreased rate of reinforcement or loss of privileges. In one level

system, each participant was assigned a shape or character and every 2-4 hours, would be moved up or

down the six-level system (Filcheck, McNeil, & Greco, 2004). Each system can be monitored

differently; however, all involve movement from one level to another based on participant behavior, which

results in varying levels of reinforcement. Filcheck et al. (2004) evaluated a system where efficiency

was a priority and all rewards were able to be dispensed within three minutes. The researchers found

this efficient exchange to be beneficial during class times. The ability to efficiently dispense rewards

and levels makes these systems easily customized based on the needs of the setting.
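
The movement between levels can be expressed as a simple rule, sketched here in Python. The function name next_level, the one-step movement per review period, the starting level, and the default bounds of 1 and 6 are illustrative assumptions and do not reproduce the procedure of any cited study.

```python
def next_level(current, met_criteria, lowest=1, highest=6):
    """Move one level up when behavior criteria were met, otherwise one down.

    The result is kept within the lowest and highest levels of the system.
    """
    step = 1 if met_criteria else -1
    return min(highest, max(lowest, current + step))


level = 3  # assumed starting level
# One entry per review period: True if the target behaviors were observed.
for met in [True, True, False, True, True, True]:
    level = next_level(level, met)
print(level)  # 3 -> 4 -> 5 -> 4 -> 5 -> 6 -> 6
```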

3.4 Efficacy of Token Systems

3.4.1 General Outcomes

Research with individuals in classroom settings using token economies has firmly established the

efficacy of token reinforcement in altering a wide range of responses (Kazdin, 1977). There is a

significant need for effective behavior management systems. Lavigne (1998) notes that children's

behavior problems are increasing, with estimates ranging from 2 to 17% of the population. This rate

of children with behavior problems highlights the demand for behavior management systems

which are data-based and effective. Token economy systems are able to have a profound impact on

schools, classrooms, and community-based settings. One variation of the token economy, a response

cost system, is known to have produced higher levels of on-task behavior than

medication (Rapport, Murphy, & Bailey, 1982). The structure and implementation of the token

economy are important, as noted by Kazdin (1977), who describes the effectiveness of

reinforcement as depending on the delay between performance of response and delivery of reinforcement,

the magnitude and quality of the reinforcer, and the schedule of reinforcement. Many factors are

important in the consideration of a token economy. Whether or not reinforcement takes place on a

continuous or intermittent basis can impact the likelihood of maintenance (Kazdin, 1977).

3.4.2 Preschool

Token economies in the preschool setting have been utilized with a variety of modifications to this

behavior-management system (Filcheck et al., 2004; McGoey & DuPaul, 2000). As the need for

behavioral interventions increases, it is important for preschool teachers to be aware of these

token-oriented procedures, and using these systems classroom-wide may be a great pro-active benefit

(Filcheck et al., 2004).

Filcheck et al. (2004) compared the effectiveness of a class-wide token economy level system with

parent-training techniques in managing aberrant behaviors. These authors note that class-wide

application of the token economy had not previously been analyzed. However, group and individual

application of token systems has effectively reduced disruptive behavior in other settings (Bushell,

Wrobel, & Michaelis, 1968; Packard, 1970). The classroom in Filcheck et al. was described as “out of

control” and was chosen for behavioral intervention. The token economy used was a level system

where the top three levels included sunny faces that got increasingly happy, the center level was the

starting point and was blank and white, and the bottom three levels included cloudy faces that got

increasingly grey and sad (Filcheck et al., 2004). In this system, promotion to different levels

within the preschool class allowed participants to complete certain activities while other children, who

were not promoted, continued with the pre-determined class schedule. Furthermore, at the end

of certain activities, all participants with “positive” behavior levels received additional rewards like

stickers or activities with the teacher. In this study, the level system was found to decrease rates of

inappropriate behaviors; additionally, when the parent training was implemented further decreases

occurred (Filcheck et al., 2004). It is important to consider the training time

necessary for each of the two behavior management tools in this study. Training staff on the level system took 4

hours and 30 minutes, including all consultation and feedback time; however, the

parent training took 11 hours and 30 minutes (Filcheck et al., 2004). In terms of effectiveness and time

efficiency, the level system seemed to have the greatest rate of positive return.

Additional studies have shown rapid behavioral improvement when a token economy is implemented.

In McGoey and DuPaul (2000), a sticker chart was managed by teachers, who placed

stickers on a classroom board when they “caught” students being on-task. When a student earned a

certain number of small stickers, they were rewarded with a big sticker (McGoey & DuPaul, 2000).

For the response cost portion of this study, stickers were removed contingent on being off-task and

when the session ended, the big sticker was kept or removed from the chart. These token economy

and response cost systems resulted in large decreases in aberrant behavior (McGoey & DuPaul, 2000).

Implementing token economies in a preschool setting, Sran and Borrero (2010) compared two

variations of this behavior management system. In this study, tokens that were exchanged for a variety

of preferred items were shown to be more effective than tokens that could only be exchanged for one

highly preferred item. These results are consistent with previous research which shows generalized

conditioned reinforcers are more reinforcing than a single reinforcer (Kazdin, 1977).

3.4.3 Elementary school

Elementary school classrooms, based on research study volume, seem to be one of the most common

settings in which token economy systems are used (Coupland & McLaughlin, 1981; Ruesch &

McLaughlin, 1981; Thompson, McLaughlin, & Derby, 2011). Many studies exist which show the

effectiveness of this type of behavior management tool. One of these studies employed a free time

reward when five tokens had been earned (Ruesch & McLaughlin, 1981). Because free time

consists of a variety of reinforcers, satiation was unlikely to occur (Kazdin, 1977). In

Ruesch and McLaughlin (1981), a clear increase in student assignment completion took place. Token

economies used to decrease inappropriate behavior by rewarding on-task behavior have also

proven effective (Coupland & McLaughlin, 1981). Under

a token economy with sixth grade participants, points were given and subtracted for appropriate and

inappropriate behavior respectively (McLaughlin & Malaby, 1976).

McLaughlin and Malaby (1977a) compared token reinforcement with and without response cost in a

special education elementary classroom. In McLaughlin and Malaby's (1977a) study, ten participants

were asked to write letters during a session lasting several minutes, earning no token reinforcement

during baseline, token reinforcement during the next phase, and token reinforcement plus response

cost during the final phase. The overall results were such that, in this elementary classroom, token

reinforcement plus response cost resulted in higher rates of target behavior (McLaughlin & Malaby,

1977a). In another study, McLaughlin and Malaby (1976) analyzed assignment completion under

different schedules of token exchange. During that study involving a fifth and sixth grade class, points

were earned or taken away depending on whether children displayed appropriate or inappropriate

behavior. The results showed that participants had higher rates of appropriate behavior, as measured

through assignment completion, when there were a variable number of days between token award and

exchange (McLaughlin & Malaby, 1976). McLaughlin and Malaby (1976)

note that a system with variable exchange days should be considered by any

teacher or economy manager interested in impacting the rates of assignment completion.

3.4.4 Middle school

Middle school classrooms have seen many instances of positive behavioral outcomes as part of a token

economy (Flaman & McLaughlin, 1986; Maglio & McLaughlin, 1981; Swain & McLaughlin, 1998;

Truchlicka, McLaughlin, & Swain, 1998). Maglio and McLaughlin (1981) note the importance of a

teacher's ability to manage the token system in their study where a student's partial self-management,

with teacher supervision, of points along with back-up reinforcers resulted in a significant decrease of

inappropriate behaviors. Besides social behavior, academic improvement has also been seen during

token reinforcement (Flaman & McLaughlin, 1986). Flaman and McLaughlin's study took place in a

junior high school drop-out prevention program where the subject rarely completed an assignment

unless given one-on-one assistance. In that study, correct answers on a worksheet resulted in 1-2

points per problem that could be exchanged for free time on a classroom microcomputer. In this study,

the rate of correct answers increased from 34% to 69% during the first phase, and to 79%

during the second phase of token reinforcement (Flaman & McLaughlin, 1986). A second system

where assignment accuracy was a concern included bonus points (Swain & McLaughlin, 1998). In

that study, four middle school special education students which were previously being managed by a

token reinforcement system were offered fifty extra bonus tokens or points for assignment scores

greater than 80% (Swain & McLaughlin, 1986). This bonus contingency resulted in an increase of

math accuracy. When response cost is implemented in a high school setting, positive results are

possible (Truchlicka, McLaughlin, & Swain, 1998). Truchlicka et al. (1998) implemented a response

cost to an already functioning token reinforcement system. In this system, an accuracy goal of 85%

was required to earn token reinforcers; however, if that accuracy level was not reached, tokens were

removed or privileges were denied. This study concluded that the response cost phase resulted in a

higher rate of accuracy for each subject. The implementation of a point gain or point lose system had

a greater impact than a token reinforcing system.
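
The response cost contingency summarized above can be illustrated with a brief sketch. The Python function below is a hypothetical illustration and not the procedure used by Truchlicka et al. (1998): the 85% accuracy goal is taken from the summary above, whereas the function name and the token amounts (`earn` and `cost`) are assumptions chosen only for the example.

```python
def update_tokens(balance, correct, total, goal=0.85, earn=5, cost=2):
    """Return the new token balance after one graded assignment.

    goal -- accuracy criterion (85%, as summarized in the text)
    earn -- tokens awarded when the goal is met (assumed value)
    cost -- tokens removed when the goal is missed (assumed value)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = correct / total
    if accuracy >= goal:
        return balance + earn        # token reinforcement
    return max(0, balance - cost)    # response cost, never below zero


balance = 10
balance = update_tokens(balance, correct=18, total=20)  # 90% meets the goal
balance = update_tokens(balance, correct=15, total=20)  # 75% misses the goal
print(balance)  # 13
```

In this sketch the same assignment can either add to or subtract from the balance, which is the point gain and point loss arrangement described in the paragraph above.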

3.4.5 High school

Token economies are implemented in high school settings far less often than in elementary or middle

school settings. This may be because teachers are more apprehensive about this type of

system; alternatively, the lower rate of occurrence

could be due to a perceived lack of effectiveness.

In a study by Crawford and McLaughlin (1982), token reinforcement was evaluated as a means to

increase on-task behavior. This study was conducted in a high school within a self-contained special

education classroom with a 15-year-old student. The student was given tokens and worked for a

chosen back-up reinforcer that cost 30-40 cents’ worth of tokens. In this study there was a clear

increase in on-task behavior during the token-reinforcement phases. According to the study, on-task

behavior from the student more than doubled when tokens were first introduced (Crawford &

McLaughlin, 1982).

3.4.6 College or University

Token systems in college settings have also been assessed for effectiveness. Participation in class

in all settings is a priority and a goal for many teachers and professors, and two studies

specifically aimed to analyze the impact of tokens on classroom participation in college settings.

Jalongo et al. (1998) determined that only approximately 10% of students voluntarily participate in class

discussions. In one study, good questions, defined as those that related to content and made sense, among other

requirements, were rewarded with token slips that were exchanged for bonus course points (Nelson,

2010). This study involved 318 undergraduate students and reported that classes asked questions at higher

rates when the token economy was implemented. An additional study involving token economies

at the college level analyzed class participation before, during, and after implementation

of the behavior management system (Boniecki & Moore, 2003). This study found that questions were

asked, and classroom participation was greater, when a token economy was introduced. The tokens in

this system were exchanged for 0.25% of additional credit towards the final course grade (Boniecki &

Moore, 2003). Students were more than twice as likely to participate as before the token economy

system. Both token economy studies found an increase in classroom participation when a token

reinforcer was introduced; notably, in both cases, the tokens were exchanged for extra credit towards

the final grades in the classes. Grades could potentially be considered highly preferred items for

college students seeking certain GPAs, job prospects, etc.

3.4.7 Community and home

Applicability of the token economy can also be found in home-based and community settings (Bippes,

McLaughlin, & Williams, 1986; Jordan, McLaughlin, & Hunsaker, 1980; Rustab & McLaughlin,

1988). Token systems implemented at home can be effective at reducing or increasing similar

behaviors that are found in the school setting, as well as social behaviors and task-related behaviors

(Alvord, 1971; Arnett & Ulrich, 1975). Implementation in community detention centers has also

delivered increased rates of accuracy and target behaviors (Bippes et al., 1986). In Rustab and

McLaughlin’s (1988) study, inappropriate behavior and spelling accuracy were measured during

baseline and after token economy implementation. In this particular case, tokens were awarded for

every 5 minutes of appropriate behavior and were exchanged weekly for privileges within and

outside the home. Inappropriate behavior immediately decreased once token reinforcement began.

Although target academic and social behaviors were reinforced through tokens only at home, the higher

rates of on-task behavior and spelling accuracy at home generalized to higher rates of those

behaviors in school (Rustab & McLaughlin, 1988). Home-exclusive behaviors in the category of

chores and social demands were also dramatically increased during another study (Christophersen,

Arnold, Hill, & Quilitch, 1972). Home-based token economies using 1 cent per minute token rewards

have been shown to increase on-task behavior (Jordan et al., 1980).

Token economies in schools where consequences were dispensed at the participant’s home have

also resulted in improved classroom performance and study behavior (Bailey, Wolf, & Phillips, 1970).

In this study, on-task “yes” marks were rewarded with privileges at home (Bailey et al., 1970). Partnerships

between the classroom teacher and the home guardian of the participant can play an effective role in

behavior modification. In many cases of children with severe behavior, classroom teachers may not be

in possession of reinforcing contingencies and may require a parent or guardian to devise effective

consequences (Bailey et al., 1970). Moreover, concerns of a lack of maintenance and participants

being unable to generalize behavioral gains made in the school setting make home-involvement more

attractive (Brown, Montgomery, & Barclay, 1969; Walker & Buckley, 1972). Involving the parents or

guardians in such a way that they are dispensing the consequences for behavior occurring in other

settings is an effective method to sustain a token economy (Bailey et al., 1970; Cantrell, Cantrell,

Huddleston, & Woolridge, 1969; McKenzie, Clark, Wolf, Kothera, & Benson, 1968; Thorne, Tharp, &

Wetzel, 1967).

3.5 Limitations and Ethical Concerns with a Token Economy

As with any system which has been widely implemented, token economies have been the target of

ethical concerns as well as criticisms stemming from published and perceived weaknesses (Kohn,

1999). Doubts and concerns about token economies have existed since the behavior modification

method took on a more mainstream role in society. Early criticism of Alexander Maconochie’s

“Mark System” described his program as indulging the prisoners rather than providing the punishment

and social revenge usually accorded them (Kazdin, 1977). The tickets given out in New York City

schools, originating from Lancaster’s “Monitorial System” of reward and punishment, were withdrawn

in the 1830s because the trustees believed that cunning behavior rather than meritorious behavior was

being rewarded (Kazdin, 1977). However, token-based reinforcement systems tend to be extremely

effective as a method to modify behavior (Chance, 2006; Kazdin, 1977). Notably within a token

economy, a large number of target behaviors, clients, and back-up reinforcers can be incorporated into

a single, highly efficient method (Kazdin, 1977). A general concern inherent to any behavior

management system is its ability to be fair, reliable, and functional. Stealing of tokens, lack of

participation, and token-economy sabotage by participants are some of the ways that this behavior

management system may fail from within. It is vital that token economy managers are aware of these

possibilities and take steps to pre-empt any of these negative consequences of poor planning

(McLaughlin & Williams, 1988).

Modern critiques of the token economy have come from education professionals, administrators, and

community members. This criticism has stemmed from philosophical opposition to token

reinforcement. These critics have suggested that token reinforcement constitutes bribery or blackmail

(Kazdin, 1977; Kohn, 1999). However, bribery, correctly defined, involves rewarding unethical or

illegal behavior, and token reinforcement is not used to reward such behaviors. Therefore, labeling token

reinforcement as bribery is inappropriate (Chance, 2006). Although social and philosophical

opposition makes a fruitful topic for the media, describing rewards with terms such as bribery, as

suggested by Kohn (1999), is inaccurate. There have been concerns that students may become

dependent on these systems and will work only for tangible tokens or back-up

rewards. Furthermore, there is criticism that these systems may undermine students’ intrinsic

motivation (Kohn, 2006). While intrinsic motivation may produce qualitatively different results, not all

individuals possess such motivation, and for them appropriate behavior must be more directly reinforced.

As part of the token economy, teachers and others use back-up reinforcers to give value or potency to

the token (Kazdin, 1977). Some systems employ back-up reinforcers that are new to the environment,

while others use back-up reinforcers that more naturally “fit,” such as recess or a free break during

class in a school setting (McLaughlin, 1981; McLaughlin & Malaby, 1972, 1975, 1976). An important

way to remedy a loss of target behavior over time is to create token economies in which the back-up

reward is a natural reinforcer: instead of an external prize that costs money and is

administered by the economy manager, the tokens could be exchanged for a rest period or a water

break. Whichever of these two forms of back-up reinforcer is dispensed, the system sets the

occasion for the participant to be rewarded for certain behaviors: just as an employee is

rewarded with a paycheck, a participant is able to earn tokens. Token-reinforcement systems

can easily be compared to the adult world of work and society as a whole where certain work or

behaviors are rewarded with tokens, or cash. Token-based programs can leave the participants

dependent on earning rewards for target behaviors. Once tokens are withdrawn, desirable behavior

may decrease or inappropriate behaviors increase (Kazdin, 1977). As a token-economy manager

attempts to phase out the program, it is important that specific procedures are implemented in order to

withdraw the economy without a loss of behavior gains. Kazdin (1977) and others note that creating a

procedure where exchange periods become less frequent and increasingly variable may improve the

likelihood of maintenance (McLaughlin & Malaby, 1972, 1975, 1976). Additionally, self-monitoring

by the participant may help the behavior to generalize across settings and to persist even after tangible

rewards are no longer being exchanged explicitly by the manager (Turkewitz, O’Leary, & Ironsmith, 1975).

These modifications have been shown to remedy issues related to maintenance and

generalization.

Another concern is that token economies are sometimes substantial work for the staff that administers

them. Teachers are encountering larger classes with increasing numbers of behavioral issues;

however, easily implemented systems can address their needs as well as the varied classroom

management concerns (Barth, 1979). The degree to which a teacher can easily implement this token

economy strategy is an issue for teachers who are busy teaching. Often, it is difficult to engage in

elaborate systems that mandate data collection, token management, and intricate exchange processes.

While some systems are administratively more involved than others, it is possible to

design systems that are easy to implement and evaluate. An easily administered system was

studied by Rustab and McLaughlin (1988), in which a parent was able to administer

the system without any outside help once the parent was trained on token reinforcement. Additionally,

when a token economy was implemented to increase piano practice time, the parent was able to

implement the procedures with little training and administrative struggle (Jordan, McLaughlin, &

Hunsaker, 1980). Concerns over the administrative aspects can be mitigated with deliberate planning

of the token economy. For example, response cost was preferred by teachers and sustained after

research ended in a preschool classroom due to easier management (McGoey & DuPaul, 2000).

McGoey and DuPaul (2000) noted that, when individuals must be caught within a large class or

group, a response cost system is easier to implement. Choosing one system

modification over another, especially when implementing a token economy with an entire classroom,

will help teachers decrease administrative tasks inherent in some token economies while allowing this

system to function as a behavior management tool.

Next, there are limitations of token economies, notably concerning participants who exhibit severe

behavior in a class or group-home setting. These participants with severe behaviors may not be

affected by a token economy system that would work for most other individuals (Kazdin, 1977).

Some participants simply do not respond to the token economy for one or multiple reasons.

Potentially, with severe behavior, other therapies may be implemented to decrease inappropriate

responses. If the problem is behavioral, it will be up to the manager of the system to determine

whether certain modifications can be made to enhance the viability of the token economy. If a student

is not responding to the token economy, then it would be necessary to evaluate the procedures used to

give tokens, exchange tokens, as well as the actual rewards being given out in exchange for the tokens.

For example, altering the back-up rewards so that they are more reinforcing for an individual would be

one way to make the token economy more effective. As previously noted, if the classroom teacher is

unable to dispense appropriate consequences that have significant reinforcing qualities, involving

those who can by communicating with the parent or guardian at the participant’s home may result in a

more effective token economy (Bailey et al., 1970).

Cost is a significant consideration when implementing a token economy, and can be a limitation when

a teacher or other manager is beginning to plan the back-up reinforcers being used. This is especially

true when trying to configure a genuinely reinforcing reward with the ability to drive behavior

modification, a potentially challenging mission with increasingly older participants. There are several

studies which aim to develop token economies which are effective and cost-conscious. The purpose of

McLaughlin and Malaby’s (1977b) study was to evaluate the effects of a cost-free token reinforcement

program on special education students. Rewards included recess, extra gym time, films, free time,

special jobs, messenger, art projects, and buying the teacher lunch. It was shown that this system

delivered an increase in the frequency of letters traced. The number of target responses varied from

15-84 during baseline, to 30-108 during the token phase (McLaughlin & Malaby, 1977b). It is clear

that token economies can be effective at a low cost when certain rewards are used in the program.

Free and low cost reinforcers are also a realistic option for token economy administrators of older and

more sophisticated students (Crawford & McLaughlin, 1982). In Crawford and McLaughlin (1982), a

single cassette tape was purchased and listening time acted as the back-up reward; this cost-effective

reinforcer increased levels of on-task behavior within that token economy. Ultimately, it is the

responsibility of teachers and economy administrators to utilize the low cost and free options available

to them and within their classroom and community.

These concerns and limitations of token economies are genuine and should be addressed in one way or

another; however, they are no reason to cease implementation of a token economy. All concerns and

limitations listed above and throughout this literature review can be mitigated through careful review

and modification of the token economy. Concerns may be best addressed through meaningful

communication between the token economy manager and the concerned individual. Communication with

and education of teachers, parents, and community members may help reduce these concerns and the

likelihood that public distress will prematurely end the token economy in the classroom.

4 Suggestions for Future Research

It is important to elaborate on and conduct further research on token economies with a variety of

settings, participants, and modifications. As this behavior management system has seen wide-ranging

success in increasing target behaviors, while decreasing others, it is important to expand the scope of

utilization of the token economy. More studies with older participants should be conducted. Notably,

research should be completed with students in middle and high schools; in particular, research

implemented with older students diagnosed with emotional, behavioral, and social disabilities would

benefit students and teachers significantly.

Additionally, it is important to develop teacher education programs to the point where new teachers have strong

classroom management foundations. Successful classroom management techniques are crucial to

successful teaching and student learning, and token economies are an important aspect of classroom

management which teachers could implement. Beyond teaching the techniques available to teachers in

their programs, instilling a meaningful knowledge of behavioral principles is important for successful

classroom management and token economy implementation in particular.

Another suggestion for future research relates to maintenance of certain target behaviors which were

reinforced in a token economy. Maintenance of skills is crucial for real-world application and long-term

success. Sustainment of behavioral gains is important to the teacher’s target behavior goals,

long-term success for the student, and various social rewards. Research that elaborates on how

behavior is maintained after token reinforcement ends would help practitioners determine how

best to continue the gains made during a token economy. Within the area of back-up reinforcers, the

type of item used may help to strengthen the long-term sustainability and maintenance of the token

system. Research should examine whether more natural reinforcers, which are part of the setting in

which the participants live, work, or are taught, are more effective and sustainable than more abstract

or artificial rewards or reinforcers.

5 Analysis and Conclusions

Ultimately, token economies have been found to be an effective method of behavior management

across various settings. This analysis has compiled evidence of effectiveness across school and

community settings; however, token-reinforcement systems have seen remarkably diverse

applications in prisons, military organizations, and psychiatric hospitals. Based on this collection of

studies, it is important to note the trends which exist in the modern implementation of the token

economy; particularly, the populations most often studied and the types of modifications implemented

across varied settings. In order to effectively implement a token economy, it is important to fully

understand the principles of behavior, the variety of token systems, and how to manipulate the

conditions of the token economy in order to best serve the needs of a particular group or setting.

Based on the review of literature, it seems there has been a decline in the quantity of research articles

on token economies over the past several decades. The works referenced in this review illustrate

that the great majority of articles are dated before 1990. Moreover, the 1960s, 1970s, and

1980s each produced, on average, approximately three times as many articles as

each decade after. Clearly, based on the references reviewed for this article and searches completed on

various databases, token-economy research has declined sharply since the 1960s through 1980s.

There may not be a single explanation why this reduction in research has occurred in this

area; however, there are several possible reasons. First, the steep reduction of research could be a

result of overwhelming data and research on the topic’s effectiveness. Second, there may have been a

decline in use as increasing numbers of school districts and communities have avoided using extrinsic

rewards, and token economy systems, to manage classrooms. Third, the reduction in research of token

economies could be attributed to researchers’ concentration on novel management techniques or more

unique learning strategies. While these given reasons may or may not be the actual reason for the

decline in token research, they each have an important role in the discussion.

The reduction in research articles vetting the token economy since around the 1970s leaves much

work to be done. The effectiveness of these systems in middle and high school has been addressed

only minimally. The same is true for higher education settings, where token economies have so far been shown

to be highly effective. Specifically, research deficits can be cited with the lack of completed

studies involving participants with emotional and behavioral disorders in the high school classroom.

These deficits should be remedied, especially if one of the reasons for the decline in research was a

result of the overwhelming attention the topic received in decades past. There are still areas within the

token economy that have not been adequately addressed. While the token economy is widely known, it

is important to inform the education community of the potential for even greater utilization across an

even larger number of settings and populations.

In the research on token systems, there are certain settings where a reader is more likely to find a study

relating to the implementation of the behavior modification system. Elementary settings are much

more likely to implement a token-reinforcement system, based on the articles reviewed, than middle or

high school settings. The older the participant, the less likely there is to be a study on

effective behavior modification using token reinforcement. Of particular note, classrooms composed

of students with emotional, social, and behavioral disabilities have not widely implemented token

systems. Research with these high-needs populations would add knowledge to the field and enhance

behavior management in those classrooms. This would be beneficial for teachers working

in such classrooms.

An additional area of noticeable weakness within token economy literature is related to maintenance

and generalization of treatment effects both during and after program implementation (Kohn, 1999;

Turkewitz et al., 1975). Varying schedules of exchange from fixed (once per period or week) to a

more variable one (exchange from once a week to once every 3 weeks for example) may help to

mitigate maintenance concerns. Variable exchanges have been shown to increase maintenance of the

skill and to be effective (McLaughlin & Malaby, 1976). Additional research that employs long-term

assessment of such outcomes is also clearly needed.
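
The contrast between fixed and variable exchange schedules can be made concrete with a short sketch. The Python function below is only an illustration under stated assumptions: the one-to-three-week range follows the example given in the paragraph above, while the function name, its parameters, and the use of a uniform random draw for each gap are hypothetical choices and do not reproduce the procedure of McLaughlin and Malaby (1976).

```python
import random


def exchange_days(n_exchanges, min_gap=7, max_gap=21, variable=True, seed=None):
    """Return the day numbers on which tokens may be exchanged.

    A fixed schedule uses min_gap days between every exchange. A variable
    schedule draws each gap uniformly from min_gap..max_gap inclusive
    (an assumed rule, chosen only for illustration).
    """
    rng = random.Random(seed)
    day, days = 0, []
    for _ in range(n_exchanges):
        gap = rng.randint(min_gap, max_gap) if variable else min_gap
        day += gap
        days.append(day)
    return days


print(exchange_days(4, variable=False))  # [7, 14, 21, 28]
print(exchange_days(4, seed=1))          # gaps vary between 7 and 21 days
```

Under the variable schedule the participant cannot predict the next exchange day, which is the feature the reviewed studies associate with better maintenance.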

Acknowledgement

This research was completed in partial fulfillment of the requirements for the first author’s Master’s

thesis in the Master of Initial Teaching (MIT) program in the Department of Special

Education at Gonzaga University. The author would like to give particular thanks to various faculty

members. The first author is now teaching students with EBD in the Lake Washington School District. Requests for

reprints should be addressed to Christopher Doll, MIT, Lake Washington School District #414, Juanita

High School, 10601 NE 132nd St., Kirkland, WA 98034.

References

Addessi, E., Mancini, A., Crescimbene, L., & Visalberghi, E. (2011). How social context, token value,

and time course affect token exchange in Capuchin monkeys. International Journal of

Primatology, 32, 83-98.

Alberto, P., & Troutman, A. (2012). Applied behavior analysis for teachers (2nd ed.). Upper Saddle

River, NJ: Pearson Education.

Alvord, J. (1975). Home token economy. Champaign, IL: Research Press.

Arnett, M. S. & Ulrich, R. C. (1975). Behavioral control in the home setting. Psychological Record,

25, 395-413.

Ayllon, T., & Azrin, N. (1968). The token economy. New York, NY: Appleton-Century-Crofts.

Barth, R. (1979). Home-based reinforcement of school behavior: A review and analysis. Review of

Educational Research, 3, 436-458.

Bailey, J. S., Wolf, M. M., & Phillips, E. L. (1970). Home-based reinforcement and the modification of

pre-delinquents’ classroom behavior. Journal of Applied Behavior Analysis, 3, 223-233.

Bippes, R., McLaughlin, T. F., & Williams, R. L. (1986). A classroom token system in a detention

center: Effects for academic and social behavior. Techniques: A Journal for Remedial

Education and Counseling, 2, 126-132.

Birnbaum, P. (Ed.). (1962). A treasury of Judaism. New York, NY: Hebrew Publishing Company.

Boniecki, K. A., & Moore, S. (2003). Breaking the silence: Using a token economy to reinforce

classroom participation. Teaching of Psychology, 30, 224-227.

Brown, J., Montgomery, R., & Barclay, L. (1969). An example of psychologist management of

teaching reinforcement procedures in the elementary classroom. Psychology in the Schools, 6,

336-340.

Bushell, D. (1978). An engineering approach to the elementary classroom: The behavior analysis

follow-through project. In A. C. Catania & T. A. Brigham (Eds.), Handbook of applied

behavior analysis: Social and instructional processes (pp. 525-563). New York, NY:

Irvington.

Bushell Jr., D., Wrobel, P. A., & Michaelis, M. L. (1968). Applying ‘group’ contingencies to the

classroom study behavior of preschool children. Journal of Applied Behavior Analysis, 1, 55-61.

Cantrell, R., Cantrell, M., Huddleston, C., & Woolridge, R. (1969). Contingency contracting with

school problems. Journal of Applied Behavior Analysis, 2, 215-220.

Carcopino, J. (1940). Daily life in ancient Rome. New Haven, CT: Yale University Press.

Chance, P. (2006). First course in applied behavior analysis. Long Grove, IL: Waveland Publishing.

Christophersen, E. R., Arnold, C. M., Hill, D. W., & Quilitch, H. R. (1972). The home point system:

Token reinforcement procedures for application by parents of children with behavior

problems. Journal of Applied Behavior Analysis, 5, 485-497.

Cooper, J. O., Heron, T., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle

River, NJ: Prentice-Hall Pearson Education.

Coupland, L., McGregor, S., & McLaughlin, T. F. (1981). Reduction of inappropriate noise through

the use of a token economy. B. C. Journal of Special Education, 5, 65-75.

Cowles, J. T. (1937). Food-tokens as incentives for learning by chimpanzees. Comparative Psychology

Monographs, 23, 1-96.

Crawford, D. J., & McLaughlin, T. F. (1982). Token reinforcement of on-task behavior in a secondary

special education setting. Behavioral Engineering, 7, 109-117.

Dickerson, F. B., & Tenhula, W. N. (2005). The token economy for schizophrenia: Review of the

literature and recommendations for future research. Schizophrenia Research, 75, 405-416.

Doolittle, J. (1865). Social life of the Chinese: With some account of their religious, governmental,

educational, and business customs and opinions, Volume 1. New York, NY: Harper &

Brothers.

Duran, F. D. (1964). The Aztecs. New York, NY: Orion Press.

Filcheck, H. A., McNeil, C. B., Greco, L. A., & Bernard, R. S. (2004). Using a whole-class token

economy and coaching of teacher skills in a preschool classroom to manage disruptive

behavior. Psychology in the Schools, 41, 351-361.

Flaman, F., & McLaughlin, T. F. (1986). Token reinforcement: Effects for accuracy of math

performance and generalization to social behavior with an adolescent student. Techniques: A Journal for

Remedial Education and Counseling, 2, 39-47.

Grant, M. (1967). Gladiators. London: Trinity Press.

Hall, R. V., Axelrod, S., Foundopoulos, M., Shellman, J., Campbell, R. A., & Cranston, S. S. (1972).

The effective use of punishment to modify behavior in the classroom. In K. D. O’Leary & S.

G. O’Leary (Eds.), Classroom management: The successful use of behavior modification (pp.

173-182). New York, NY: Pergamon Press.

Iwata, B. A., & Bailey, J. S. (1974). Reward versus cost token systems: An analysis of the effects on

students and teacher. Journal of Applied Behavior Analysis, 7, 567-576.

Jalongo, M., Tweist, M., Gerlack, G., & Skoner, D. (1998). The college learner. Upper Saddle River,

NJ: Merrill.

Jordan, D., McLaughlin, T. F., & Hunsaker, D. (1980). The effects of monetary reinforcement on piano

practice in the home. Education and Treatment of Children, 3, 161-163.

Kazdin, A. E. (1977). The token economy: A review and evaluation. New York, NY: Plenum Press.

Kazdin, A. E. (1982). The token economy: A decade later. Journal of Applied Behavior Analysis, 15,

431-445.

Kazdin, A. E., & Bootzin, R. R. (1972). The token economy: An evaluative review. Journal of Applied

Behavior Analysis, 5, 343-372.

Kirigan, K. A., Braukman, C. J., Atwater, J. D., & Wolf, M. M. (1982). An evaluation of Teaching-

Family (Achievement Place) group homes for juvenile offenders. Journal of Applied Behavior

Analysis, 15, 1-16.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise and

other bribes. Boston, MA: Houghton Mifflin.

Lavigne, J. V., Gibbons, R. D., Christoffel, K. K., Arend, R., Rosenbaum, D., Binns, H., Dawson,

N., … Isaacs, C. (1998). Prevalence rates and correlates of psychiatric disorders among

preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 35,

204-214.

Lolich, E., McLaughlin, T. F., & Weber, K. P. (2012). The effects of using reading racetracks

combined with direct instruction precision teaching and a token economy to improve the

reading performance for a 12-year-old student with learning disabilities. Academic Research

International, 2(3), 245-252. Retrieved from:

http://174.36.46.112/~savaporg/journals/issue.html

Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation

of token economies as a classroom management tool for students with challenging behavior.

Journal of School Psychology, 49, 529-554.

Maglio, C. L., & McLaughlin, T. F. (1981). Effects of a token reinforcement system and teacher

attention in reducing inappropriate verbalizations with a junior high school student. Corrective

and Social Psychiatry and Journal of Behavior Technology Methods and Therapy, 27, 140-145.

Malagodi, E. F. (1967). Acquisition of the token-reward habit in the rat. Psychological Reports, 20,

1335-1342.

Matson, J. L., & Boisjoli, J. A. (2009). The token economy for children with intellectual disability

and/or autism: A review. Research in Developmental Disabilities, 30, 240-248.

McGoey, K. E., & DuPaul, G. J. (2000). Token reinforcement and response cost procedures: Reducing

the disruptive behavior of preschool children with Attention-Deficit/Hyperactivity Disorder.

School Psychology Quarterly, 15, 330-343.

McKenzie, H., Clark, M., Wolf, M., Kothera, R., & Benson, C. Behavior modification of children with

learning disabilities using grades as token reinforcers. Exceptional Children, 38, 745-752.

McLaughlin, T. F. (1981). The effects of a classroom token economy on math performance in an

intermediate grade school class. Education and Treatment of Children, 4, 139-147.

McLaughlin, T. F., & Malaby, J. E. (1972). Intrinsic reinforcers in a classroom token economy.

Journal of Applied Behavior Analysis, 5, 263-270.

McLaughlin, T. F., & Malaby, J. E. (1975). The effects of various token reinforcement contingencies

on assignment completion and accuracy during variable and fixed token exchange schedules.

Canadian Journal of Behavioral Sciences, 7, 412-419.

McLaughlin, T. F., & Malaby, J. E. (1976). An analysis of assignment completion and accuracy across

time under fixed, variable, and extended token exchange periods in a classroom token

economy. Contemporary Educational Psychology, 1, 346-355.

McLaughlin, T. F., & Malaby, J. E. (1977a). The comparative effects of token-reinforcement with and

without a response cost contingency with special education children. Educational Research

Quarterly, 2, 34-41.

McLaughlin, T. F., & Malaby, J. E. (1977b). A cost free token reinforcement program for special

education students. Corrective and Social Psychiatry and Journal of Behavior Technology

Methods and Therapy, 23, 111-116.

McLaughlin, T. F., & Williams, R. L. (1988). The token economy in the classroom. In J. C. Witt, S. N.

Elliot, & F. M. Gresham (Eds.). Handbook of behavior therapy in education (pp. 469-487).

New York, NY: Plenum.

Miller, P. M., & Drennen, W. T. (1970). Establishment of social reinforcement as an effective

modifier of verbal behavior in chronic psychiatric patients. Journal of Abnormal Psychology,

76, 392-395.

Neef, N. A., Bicard, D. F., & Endo, S. (2001). Assessment of impulsivity and the development of self-control

in students with attention deficit hyperactivity disorder. Journal of Applied Behavior

Analysis, 34, 397-408.

Nelson, K. G. (2010). Exploration of classroom participation in the presence of a token economy.

Journal of Instructional Psychology, 37, 49-56.

O’Leary, K. D., & Drabman, R. (1971). Token reinforcement programs in the classroom: A review.

Psychological Bulletin, 75, 379-398.

Packard, R. G. (1970). The control of ‘classroom attention’: A group contingency for complex

behavior. Journal of Applied Behavior Analysis, 3, 13-28.

Rapport, M. D., Murphy, H. A., & Bailey, J. S. (1982). Ritalin vs. response cost in the control of

hyperactive children: A within-subject comparison. Journal of Applied Behavior Analysis, 15,

205-216.

Reed, D. D., & Martens, B. K. (2011). Temporal discounting predicts student responsiveness to

exchange delays in a classroom token system. Journal of Applied Behavior Analysis, 44, 1-18.

Ruesch, U., & McLaughlin, T. F. (1981). Effects of a token system using a free-time contingency to

increase assignment completion with individuals in the regular classroom. B. C. Journal of

Special Education, 5, 347-355.

Rustab, K. E., & McLaughlin, T. F. (1988). Reducing inappropriate behavior in the home with a token

economy. Behaviour Change, 5, 160-164.

Sran, S. K., & Borrero, J. C. (2010). Assessing the value of choice in a token system. Journal of

Applied Behavior Analysis, 43, 553-557.

Schmandt-Besserat, D. (1992). Before writing: Volume 1: From counting to cuneiform. Austin, TX:

University of Texas Press.

Skinner, B. F. (1966). What is the experimental analysis of behavior? Journal of the Experimental

Analysis of Behavior, 9, 213-218.

Sousa, C., & Matsuzawa, T. (2001). The use of tokens as rewards and tools by chimpanzees (Pan

troglodytes). Animal Cognition, 4, 213-221.

Stainback, W., Payne, J. S., Stainback, S., & Payne, R. A. (1973). Establishing a token economy in the

classroom. Columbus, OH: Merrill.

Swain, J. C., & McLaughlin, T. F. (1998). The effects of bonus contingencies in a classwide token

program on math accuracy with middle-school students with behavioral disorders. Behavioral

Interventions, 13, 11-19.

Thompson, M. J., McLaughlin, T. F., & Derby, K. M. (2011). The use of differential reinforcement to

decrease the inappropriate verbalizations of a nine-year-old girl with autism. Electronic

Journal of Research in Educational Psychology, 9, 183-196.

Thorne, G., Tharp, R., & Wetzel, R. (1967). Behavior modification techniques: New tools for

probation officers. Federal Probation.

Truchlicka, M., McLaughlin, T. F., & Swain, J. C. (1998). Effects of token reinforcement and response

cost on the accuracy of spelling performance with middle-school special education students

with behavior disorders. Behavioral Interventions, 13, 1-10.

Turkewitz, H., O’Leary, K. D., & Ironsmith, M. (1975). Generalization and maintenance of

appropriate behavior through self-control. Journal of Consulting and Clinical Psychology, 43,

577-583.

Walker, H., & Buckley, N. (1972). Programming generalization and maintenance of treatment effects

across time and across settings. Journal of Applied Behavior Analysis, 5, 209-224.

Williams, B. F., Williams, R. L., & McLaughlin, T. F. (1989). The use of token economies with

individuals who have developmental disabilities. In E. Cipani (Ed.), The treatment of severe

behavior disorders (pp. 3-18). Washington, DC: AAMR Publications.

Wolfe, J. B. (1936). Effectiveness of token-rewards for chimpanzees. Comparative Psychology

Monographs, 12, 1-72.

EFFICACY OF AND PREFERENCE FOR REINFORCEMENT AND RESPONSE COST IN TOKEN ECONOMIES

ERICA S. JOWETT HIRST
SOUTHERN ILLINOIS UNIVERSITY

CLAUDIA L. DOZIER
UNIVERSITY OF KANSAS

AND

STEVEN W. PAYNE
STATE UNIVERSITY OF NEW YORK

Researchers have shown that both differential reinforcement and response cost within token economies are similarly effective for changing the behavior of individuals in a group context (e.g., Donaldson, DeLeon, Fisher, & Kahng, 2014; Iwata & Bailey, 1974). In addition, these researchers have empirically evaluated preference for these procedures. However, few previous studies have evaluated the individual effects of these procedures both in group contexts and in the absence of peers. Therefore, we replicated and extended previous research by determining the individual effects and preferences of differential reinforcement and response cost under both group and individualized conditions. Results demonstrated that the procedures were equally effective for increasing on-task behavior during group and individual instruction for most children, and preference varied across participants. In addition, results were consistent across participants who experienced the procedures in group and individualized settings.
Key words: differential reinforcement, independent group contingency, preference, response cost, token economy

The token economy is a common behavioral intervention that has been demonstrated to be effective for increasing appropriate behavior and decreasing inappropriate behavior for many populations across different settings (Doll, McLaughlin, & Barretto, 2013; Hackenberg, 2009; Kazdin, 1977). Token economies involve delivery, removal, or both delivery and removal of conditioned reinforcers (e.g., tokens and points) that can be exchanged for back-up reinforcers (e.g., prizes, treats, and leisure activities). When tokens are delivered contingent on appropriate behavior or for the absence of inappropriate behavior, these procedures are termed differential reinforcement of alternative behavior (DRA) or differential reinforcement of other behavior (DRO), respectively. When tokens are removed contingent on inappropriate behavior or for the absence of appropriate behavior, this procedure is termed response cost (RC).
An advantage of token economies is that they can be implemented with a group of individuals as a general behavior-management strategy during small-group instruction or as a classwide intervention. Classwide behavior-management strategies such as token economies should be considered to address minor disruptive behavior, to increase motivation for learning, or as a complement to an individualized intervention. However, general behavior-management strategies may not be effective in isolation for some individuals who engage in severe problem behavior or have more intense deficits in learning. These individuals may require more individualized, function-based assessment, intervention, and additional support. Regardless, token economies are common in classrooms and numerous other environments because they are likely to create motivation for changes in behavior for most individuals in the group, creating a more manageable and effective learning environment.

Correspondence concerning this article should be addressed to Claudia L. Dozier, Department of Applied Behavioral Science, University of Kansas, Lawrence, Kansas 66045 (e-mail: cdozier@ku.edu).
doi: 10.1002/jaba.294

JOURNAL OF APPLIED BEHAVIOR ANALYSIS 2016, 49, 329–345 NUMBER 2 (SUMMER)

After numerous studies were conducted to demonstrate the effectiveness of reinforcement and RC procedures in token economies, researchers began to compare the effectiveness of these two procedures (e.g., Brent & Routh, 1978; Broughton & Lahey, 1978; Iwata & Bailey, 1974; Panek, 1970). Overall, most studies that have compared differential reinforcement (DR) to RC have demonstrated equal effectiveness of the two procedures (e.g., Capriotti, Brandt, Ricketts, Espil, & Woods, 2012; Donaldson, DeLeon, Fisher, & Kahng, 2014; Iwata & Bailey, 1974; McGoey & DuPaul, 2000). However, these results are limited in two important ways. First, most studies involved the use of group contingencies (i.e., the implementation of the procedures in the context of a group in which others are present), which may have influenced responding. For example, comments made or behaviors modeled by others in the group may have influenced target responding. Second, most studies reported only group averages with respect to target behavior, which does not allow analysis of individual differences. For example, Iwata and Bailey (1974) compared DRO and RC for decreasing rule violations and increasing on-task behavior of 15 children in a classroom. During DRO, tokens were delivered at the end of a 3- to 5-min interval if no rule violations occurred during that interval. During RC, tokens were removed at the end of an interval if any rule violations occurred during that interval. The children could earn or lose up to 10 tokens throughout a 30-min math period, and the tokens could be exchanged for snacks and free time. Results showed that the procedures were similarly effective for reducing rule violations and off-task behavior. However, the authors reported group averages, which may not be representative of individual responding. Furthermore, because the study was conducted as a group intervention, the influence of peer behavior on target responding is unknown.
More recently, Donaldson et al. (2014) compared DRO and RC for decreasing the disruptive behavior of 12 first-grade students. Although the procedures were implemented in a group context, the authors reported both group-average outcomes and individual outcomes. Group-average data showed low to zero levels of problem behavior; however, an analysis of individual data showed that responding during DRO was somewhat variable for four of the 12 participants. Although this study, along with Iwata and Bailey (1974) and most others, provides preliminary evidence regarding the effectiveness of reinforcement and RC when used in a token economy, because the procedures were implemented in a group context, the influence of peers on target responding is unknown. For example, individuals may show an increase or decrease in target behavior because their peers are (a) engaging in a target behavior, (b) prompting them to engage in a target behavior, (c) providing reinforcers (e.g., attention) for them to engage in appropriate target behavior, (d) implementing punishers (e.g., reprimands) for not engaging in a target behavior (Salend & Kovalich, 1981), or (e) extinguishing previously reinforced target behavior (e.g., no longer delivering attention). Therefore, to further isolate the effects of reinforcement and RC contingencies in token economies, conducting the comparison while students work independently or are otherwise not in the presence of others might be important (Capriotti et al., 2012; Sindelar, Honsaker, & Jenkins, 1982). Furthermore, comparing responding of a single individual when in the presence and absence of peers to determine whether changes in responding are associated with the presence or absence of peers would be useful.
In addition to comparing the effectiveness of DR and RC procedures in individual and group contexts, considering preference is also important; however, only two studies that have compared DR and RC in token economies have empirically evaluated preference (Donaldson et al., 2014; Iwata & Bailey, 1974). Iwata and Bailey (1974) compared the effects of DRO and RC for reducing disruptive classroom behavior displayed by 15 elementary school special-education students. To determine preference across the procedures, the experimenters conducted a choice assessment during which each child was given the opportunity to select which token procedure would be implemented for a particular session. After all children made a selection, the chosen token procedure was implemented for each child. The results showed that four students chose DRO most often, five students chose RC more often, and six students switched their selection across opportunities. Donaldson et al. (2014) used a similar procedure and found that six of the 12 children preferred RC, four children preferred DRO, and two children had approximately equal preference.
These studies provide evidence that preference varies among individuals; however, the results are limited, at least in Donaldson et al. (2014), because children made selections vocally and in the presence of their peers (Iwata & Bailey, 1974, did not provide information regarding how or where children made a selection). Therefore, some children’s selections may have been influenced by the presence or behavior (e.g., choices or comments) of their peers (Donaldson et al., 2014). To isolate individual preference, it is important to conduct a preference assessment when the child is not in the presence of his or her peers (e.g., Layer, Hanley, Heal, & Tiger, 2008). For example, Layer et al. (2008) presented choices on an upright board in front of each child with the choices facing the child (not visible to other children) and then had the child use a motor response (i.e., pointing), rather than a vocal response (i.e., stating which procedure he or she liked best), to make his or her selection. This procedure controlled for both visual and auditory observation of other children’s choice.
Overall, given the demonstrated effectiveness of DR and RC but unknown influence of peers and lack of empirical data for preference in the absence of peers, further research is warranted. The current study involved several evaluations that replicate and extend previous research. The purpose of the first evaluation was to replicate research directly comparing the effectiveness of DR and RC procedures in a group setting. The second purpose was to provide a direct comparison of the effectiveness of DR and RC procedures for the on-task behavior of individual children engaged in a solitary work task. The third purpose was to evaluate individual preference of all children in the absence of peers. Finally, responding of individuals who participated in both the small-group activity and the solitary work task was compared to determine if the presence of peers influenced responding.

STUDY 1: DR VERSUS RC (GROUP)

Method
Participants and setting. Three groups of three typically developing preschool-aged (3 to 5 years old) children who attended a university-based preschool program participated. All children could follow multistep instructions (e.g., walk to your cubby, hang up your jacket, and come sit on the floor) and communicated using vocal speech. We conducted sessions 3 to 5 days per week, once or twice per day, in a quiet area of the classroom separate from all other children. During each session, only one group of participants was present. Participants sat next to one another on the floor on designated mats across from the experimenter, and one to two data collectors and relevant session materials were present.
Materials. During all sessions, small-group activity materials were present. Materials included plastic letters and numbers for expressive labeling and individual bingo boards with various items (i.e., plastic buttons and jewels) for matching. During some sessions, tokens (i.e., pennies) were present that could be earned or lost. Tokens were attached to and removed from laminated strips of paper (approximately 10.2 cm by 30.5 cm) with 10 square pieces of Velcro. Participants earned access to a toy room with tangible items (e.g., stickers, plastic rings, spin tops, sticky hands), edible items (e.g., gummies, Smarties, Skittles, and M&Ms), and leisure activities (e.g., video games and DVDs) via token exchange following some sessions (DR and RC). Different-colored materials (posters and token boards) were present during each of the different conditions to aid in discrimination between conditions.
Response measurement and interobserver agreement. Trained graduate and undergraduate students collected data using paper and a pencil. The dependent variable was percentage of intervals with on-task behavior. We defined on-task behavior as sitting on a mat (i.e., bottom on the mat), keeping hands to oneself (i.e., keeping hands in lap unless instructed to manipulate activity materials), and sitting quietly (i.e., talking only when the experimenter asked or called on the participant to respond). We partitioned sessions into 5-s intervals and scored on-task behavior for each child using a momentary-time-sample procedure. That is, at the end of every 5-s interval (signaled by an auditory cue), the data collector scored whether each child was on task at that moment. After each session, we calculated the percentage of intervals with on-task behavior for an individual child by dividing the number of intervals on task by the total number of intervals in the session and converting the result to a percentage. In addition, for two groups, experimenters collected data on the number of tokens that remained on each participant’s board at the end of a DR session or the number of empty spaces on each participant’s board at the end of each RC session. We later subtracted the number of empty spaces counted after RC sessions from 10 to compare number of net tokens in each session.
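
The two calculations described above (percentage of intervals on task from momentary time samples, and net tokens after an RC session) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' procedure or software: the function names `percent_on_task` and `net_tokens_after_rc` and the sample data are hypothetical, and the only values taken from the text are the 5-s interval, the 5-min session length, and the 10-space token board.

```python
from typing import Sequence


def percent_on_task(samples: Sequence[bool]) -> float:
    """Percentage of momentary time samples scored as on task.

    Each entry is the observer's score at the end of one interval
    (True = on task at that moment).
    """
    if not samples:
        raise ValueError("at least one interval is required")
    return 100.0 * sum(samples) / len(samples)


def net_tokens_after_rc(empty_spaces: int, board_size: int = 10) -> int:
    """Net tokens after a response cost session: board size minus empty spaces."""
    if not 0 <= empty_spaces <= board_size:
        raise ValueError("empty_spaces must be between 0 and board_size")
    return board_size - empty_spaces


# A 5-min session partitioned into 5-s intervals yields 60 samples.
# The scores below are made-up data for illustration.
samples = [True] * 54 + [False] * 6
print(percent_on_task(samples))   # 90.0
print(net_tokens_after_rc(3))     # 7
```

Converting RC sessions to net tokens in this way puts both conditions on the same 0-to-10 scale, which is what allows the DR and RC token counts to be compared directly.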
Two independent observers collected data for at least 30% of sessions and then calculated interobserver agreement for on-task behavior by dividing the number of 5-s intervals during which both observers agreed by the total number of intervals and converting the result to a percentage. We defined an agreement for on-task behavior as both observers scoring or not scoring the occurrence of the behavior in a given interval. We calculated interobserver agreement for token count using the total method. That is, we divided the smaller number of tokens that remained on a board (at the end of each DR session) or were missing from the board (at the end of each RC session) by the larger number and converted the result to a percentage. Interobserver agreement averaged 93% (range, 73% to 100%) for on-task behavior and 99% (range, 88% to 100%) for token count.
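
The two agreement calculations above can be illustrated with a short Python sketch. The function names `interval_ioa` and `total_count_ioa` and the example scores are hypothetical. Treating two identical counts (including two zeros) as 100% agreement is an assumption made here to avoid division by zero; the text does not say how that case was handled.

```python
from typing import Sequence


def interval_ioa(obs_a: Sequence[bool], obs_b: Sequence[bool]) -> float:
    """Interval-by-interval agreement: agreed intervals / total intervals * 100."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both observers must score the same nonzero number of intervals")
    # An agreement is both observers scoring, or both not scoring, the behavior.
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * agreements / len(obs_a)


def total_count_ioa(count_a: int, count_b: int) -> float:
    """Total-method agreement: smaller count / larger count * 100."""
    if count_a < 0 or count_b < 0:
        raise ValueError("counts must be nonnegative")
    if count_a == count_b:
        # Assumption: identical counts (including 0 and 0) are full agreement.
        return 100.0
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


# Made-up scores for four intervals; the observers disagree on the second one.
observer_1 = [True, True, False, True]
observer_2 = [True, False, False, True]
print(interval_ioa(observer_1, observer_2))  # 75.0
print(total_count_ioa(7, 8))                 # 87.5
```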
Procedure. All sessions lasted 5 min. During all sessions, the participants sat next to one another and in front of the experimenter in a small area away from the other children in the classroom. In addition, the experimenter placed bingo boards with pieces and token boards (in some sessions) in front of each participant and a colored poster board on the wall in front of the children. Before the start of the first session of each condition, the experimenter described the rules and the session contingencies and required each participant to practice engaging in related behaviors (e.g., sitting quietly, talking out of turn, keeping hands in lap, and touching materials) to experience the consequences associated with each behavior. During the 5-min sessions, the experimenter provided continuous individual and group instructions to name letters and numbers (e.g., the experimenter held up a plastic letter and said, “Caroline, what letter is this?” and “Can everybody tell me what letter this is?”) and place a marker on a specific bingo board letter or number (e.g., “Ok everyone, put a gem on the letter d”). The experimenter delivered several instructions during a session in a way that was similar to instructions delivered during a classroom activity; however, the rate at which instructions were provided varied depending on responding. During all sessions, if a child (or children) responded correctly, the experimenter delivered praise, and if any child did not respond correctly, the experimenter prompted the correct response and then moved on to another instruction.
First, the experimenter conducted baseline sessions to determine the level of on-task behavior in the absence of programmed consequences. Next, the experimenter practiced token trading with the participants. That is, the experimenter gave each child tokens and the opportunity to trade the tokens for various items (e.g., prizes and snacks). Next, we compared DR and RC to determine their effects on on-task behavior. During DR and RC sessions, the experimenter observed each participant in the group at the same moment every 30 s on average (ranging from 15 to 45 s) according to a schedule based on a pseudorandom number generator in Excel. We created three versions of the schedule and rotated across sessions to reduce the likelihood that the participants would learn a schedule. During each scheduled observation and depending on the condition, the experimenter quietly delivered a token to every child who was on task at that moment (DR) or removed a token from any child who was off task at that moment (RC). The experimenter did not say anything when delivering or removing a token. We used the same schedules across both conditions; therefore, the possible number of net tokens across conditions was equal (i.e., 10 tokens). In addition, the last opportunity to earn or lose a token was at the last second of each session; therefore, no participant could earn or lose all tokens before the end of the session.
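
One way to build a schedule with the properties described above is sketched below in Python. The authors generated their schedules with a pseudorandom number generator in Excel and do not report the exact method, so this sketch is not their procedure. It assumes that the first gap is measured from the start of the session, that gaps are whole seconds, and that rejection sampling is an acceptable way to force the tenth observation onto the last second. The names `make_schedule` and `apply_observation` are hypothetical.

```python
import random
from typing import List


def make_schedule(rng: random.Random, n_obs: int = 10, session_s: int = 300,
                  lo: int = 15, hi: int = 45) -> List[int]:
    """Return n_obs observation times (in seconds from session start).

    Every gap between consecutive observations lies in [lo, hi], and the
    final observation falls on the last second of the session, so the mean
    gap is session_s / n_obs (30 s with the defaults).
    """
    if not n_obs * lo <= session_s <= n_obs * hi:
        raise ValueError("no schedule satisfies these constraints")
    while True:
        # Draw all gaps but the last, then check whether the remainder fits.
        gaps = [rng.randint(lo, hi) for _ in range(n_obs - 1)]
        last_gap = session_s - sum(gaps)
        if lo <= last_gap <= hi:
            gaps.append(last_gap)
            times, elapsed = [], 0
            for gap in gaps:
                elapsed += gap
                times.append(elapsed)
            return times


def apply_observation(tokens: int, on_task: bool, condition: str) -> int:
    """Token count after one scheduled observation under DR or RC."""
    if condition == "DR":
        return tokens + 1 if on_task else tokens
    if condition == "RC":
        return tokens if on_task else tokens - 1
    raise ValueError("condition must be 'DR' or 'RC'")


schedule = make_schedule(random.Random(1))
print(schedule)  # ten increasing times ending at 300
```

Because both conditions use the same ten observation times, a child who is on task at every observation ends with 10 tokens under either contingency (earning all ten in DR, or losing none in RC).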
After each DR and RC session, an experimenter took the participant to a room that contained many different toys, leisure activities, edible items, and trinkets that were not found in the preschool classroom and gave the participant the opportunity to trade tokens for edible items or trinkets or engagement with a toy or leisure activity. A participant could trade one token for 1 min to play with a toy or leisure activity, one token for one edible item to consume, or three tokens for one trinket to take home. Each participant could spend the number of tokens he or she had for any combination of the above. All participants traded all tokens at the end of a session. We used a multielement design in which we rapidly alternated baseline, RC, and DR conditions to compare the effects of the different procedures on on-task behavior.
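
The exchange rates stated above (one token per minute of play, one token per edible item, three tokens per trinket) amount to a simple price list. The Python sketch below shows the arithmetic for a combined purchase; the dictionary keys and the function names `exchange_cost` and `can_afford` are hypothetical labels, not terms from the study.

```python
from typing import Mapping

# Token prices taken from the text; the key names are illustrative labels.
PRICES = {"play_minute": 1, "edible": 1, "trinket": 3}


def exchange_cost(purchases: Mapping[str, int]) -> int:
    """Total tokens needed for a combination of back-up reinforcers."""
    for item, quantity in purchases.items():
        if item not in PRICES:
            raise KeyError(f"unknown item: {item}")
        if quantity < 0:
            raise ValueError("quantities must be nonnegative")
    return sum(PRICES[item] * quantity for item, quantity in purchases.items())


def can_afford(tokens: int, purchases: Mapping[str, int]) -> bool:
    """True if the participant holds enough tokens for the combination."""
    return exchange_cost(purchases) <= tokens


# Example: 3 min of play, 1 edible item, and 2 trinkets.
basket = {"play_minute": 3, "edible": 1, "trinket": 2}
print(exchange_cost(basket))   # 10
print(can_afford(8, basket))   # False
```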
Baseline. Before the start of all baseline sessions, the experimenter described the rules and contingencies for the session and posted a white board on the wall in front of the participants. The experimenter stated the rules as follows: “Today it’s white, and there are no tokens. When we start, you need to sit on your mat, keep your hands to yourself, and raise your hand to talk.” During the session, the experimenter did not provide any programmed consequences for any behavior, with the exception of responses to correct and incorrect responding (as mentioned above).
Differential reinforcement. Before the start of all DR sessions, the experimenter described the rules and contingencies for the session, posted a green poster board on the wall in front of the participants, and placed a green board with no tokens on the floor in front of each participant. The experimenter stated the rules as follows: “Today you get the green board, and it doesn’t have any tokens. If you stay on your mat, keep your hands to yourself, and raise your hand to talk, you will get a token. If you get off your mat, touch your friends, or talk during someone else’s turn, you will not get any tokens. When small group is done, you can trade your tokens for prizes and candy. If you don’t have any tokens, you don’t get anything.” Each participant had his or her own token board. Throughout the session, the experimenter watched a timer, and during a scheduled observation, placed a token on the token board of any participant who was on task. The experimenter did not deliver any programmed consequences for participants who were not on task.
Response cost. Before the start of all RC sessions, the experimenter described the rules and contingencies for the session, posted a red poster board on the wall in front of the participants, and placed a red board with 10 tokens in front of each participant. The experimenter stated the rules as follows: “Today you get the red board, and it has 10 tokens. If you stay on your mat, keep your hands to yourself, and raise your hand to talk, you will keep your tokens. If you get off your mat, touch your friends, or talk during someone else’s turn, you will lose tokens. When small group is done, you can trade your tokens for prizes and candy. If you don’t have any tokens, you don’t get anything.” During the session, the experimenter followed the variable momentary observation schedule as in the DR condition; however, when a scheduled observation occurred, the experimenter did not deliver consequences for any participant who was on task and removed a token from any participant’s token board who was not on task.
Choice. When we observed stable levels of responding in the DR and RC phases for each participant, we conducted a preference assessment to determine the procedure that each participant preferred. We conducted this evaluation with Groups 2 and 3 only because one participant in Group 1 left the preschool before evaluation of preference. We used a procedure similar to that used by Layer et al. (2008) to evaluate preference. Before each session, the experimenter placed the stimuli (i.e., different-colored token boards and materials) associated with each type of condition (i.e., baseline, RC, and DR) on the floor where the experimenter conducted sessions. We presented the DR token board without tokens present and the RC token board with all tokens on the board. Near each of the token boards was a small strip of paper that matched the color of the stimuli (e.g., a green strip of paper was placed in front of the DR token board). The experimenter called each participant to the small-group area one at a time and reminded him or her of the contingencies associated with each set of materials. Next, the experimenter asked the participant to pick which session he or she liked best by placing the colored strip of paper associated with the selected condition into a canvas bag. When the participant made a selection, he or she was asked to go play in another area of the classroom until this procedure was repeated with each participant. This method reduced the likelihood that a participant’s choice would be influenced by other children’s prompts or comments or by observing the choices of other members in the group. Although it is possible that children could have discussed their choices with a peer before his or her selection, informal observations suggest that this did not occur. However, we did observe participants occasionally discuss their choices after all participants had made a selection. After all participants independently made a selection, the experimenter called them to the small-group area, drew a color from the bag, then explained the contingencies in place for the chosen session. After the experimenter had explained the contingencies for the chosen procedure, the experimenter implemented the type of session chosen as described above. We determined individual preference by counting the number of selections of each procedure; the procedure that an individual selected most often was identified as the preferred procedure.
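
The counting rule in the last sentence above can be expressed as a short Python sketch. The function name `preferred_procedure` and the example selections are hypothetical. Returning `None` when two procedures are tied for most selections is an assumption made here; the text does not state how ties were classified.

```python
from collections import Counter
from typing import Optional, Sequence


def preferred_procedure(selections: Sequence[str]) -> Optional[str]:
    """Return the procedure selected most often across choice sessions.

    Returns None when there are no selections or when the top two
    procedures are tied (an assumption; ties are not addressed in the text).
    """
    ranked = Counter(selections).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Made-up selections for one participant across four choice sessions.
print(preferred_procedure(["RC", "RC", "DR", "RC"]))  # RC
print(preferred_procedure(["DR", "RC"]))              # None
```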
During the choice phase, we calculated interobserver agreement for selection of a procedure using a total agreement method. That is, we scored an agreement if both observers agreed which procedure the participant selected and a disagreement if the two observers disagreed. Thus, interobserver agreement for selection of a procedure for a particular session was either 100% (the two observers agreed) or 0% (the two observers disagreed). Interobserver agreement for selection was 100% for all participants.

Results
Figure 1 displays graphs of the percentage of intervals of on-task behavior for all participants in Groups 1, 2, and 3 and individual cumulative selections and experimenter-selected procedures during the choice phase for Groups 2 and 3. During the initial baseline, most participants engaged in moderate to low levels of on-task behavior, although participants in Group 1 engaged in somewhat higher levels of on-task behavior. When we compared DR and RC, we observed similarly high levels of on-task behavior for six of the nine participants (93% during DR and 95% during RC) and higher levels of on-task behavior during RC for three participants (Adam, Molly, and Carl). When we evaluated preference, one participant switched his selections but selected DR more than RC (Paul), two participants switched their selections but selected RC more than DR (Judy and Molly), and three participants selected RC exclusively (Carl, Jack, and Lance).
Table 1 provides a summary of results with respect to percentage of selections during the choice phase and average net tokens yielded during the DR and RC comparison phase. We did not evaluate preference or calculate net tokens for Group 1; therefore, Table 1 includes data only for participants in Groups 2 and 3. Preference results show that one participant chose DR more than RC (Paul), and the other five participants chose RC more than DR. Also, three of six participants had an average difference of at least 0.5 tokens between the two procedures, and all three participants (Molly, Carl, and Lance) preferred response cost, which was the procedure for which more net tokens were yielded.

STUDY 2: DRA VERSUS RC
(INDIVIDUAL)

Method
The purposes of Study 2 were twofold. The
first purpose was to replicate Study 1 by com-
paring the effectiveness of and preference for
DR and RC in the context of an independent
work task. The second purpose was to compare
responding of participants in Studies 1 and
2 to evaluate the influence of the presence of
peers.
Participants and setting. Thirteen typically
developing preschool-aged (3 to 5 years old)
children (three of whom participated in Study
1) and one child with cerebral palsy (Brianna),
who were enrolled in a university-based pre-
school program, participated. All children could
follow multistep instructions and communi-
cated using vocal speech. We conducted ses-
sions 3 to 5 days per week, once or twice per
day, in session rooms that contained tables,
chairs, and relevant session materials. The
experimenter, one participant, and one or two
data collectors were present for each session.
Materials. During all sessions, we placed
worksheets with printed letters and shapes and
markers on a child-sized table, and two chairs
were available for the child and experimenter.
In addition, we placed toys from the preschool
classroom (e.g., puzzles, dolls, toy cars, coloring
book, and crayons) on the floor on the opposite
side of the session room. Tokens were identical
to those used in Study 1. We also used
different-colored token boards and poster
boards to aid in the discrimination between the
conditions as in Study 1. Furthermore, participants
earned access to the same toy room used
in Study 1 after some sessions; however, some
of the toys changed over time.

[Figure 1 graphs omitted. Panels: Adam, Luke, and Anna (Group 1); Paul, Judy, and Molly (Group 2); Carl, Jack, and Lance (Group 3). Phases: BL, DR vs RC, and (Groups 2 and 3 only) Choice. Axes: % Intervals (On task) and Cumulative Selections, by Sessions.]

Figure 1. Percentage of on-task behavior for Adam, Luke, and Anna (Group 1); Paul, Judy, and Molly (Group 2); and Carl, Jack, and Lance (Group 3) during RC (filled circles) and DR (filled triangles), baseline (open squares), and cumulative selections (open circles for RC selections and open triangles for DR selections). The filled data points graphed along the left y axis during the choice phase (Groups 2 and 3) represent percentage of intervals on task during the condition the experimenter selected.

Response measurement and interobserver agreement.
Trained graduate and undergraduate students
collected data using handheld computers.
The dependent variable during all sessions was
percentage of intervals of on-task behavior. We
defined on-task behavior as the first instance of
walking to the work table, the first instance of
removing the lid of the marker, moving the
marker approximately within the boundaries of
the printed lines of a worksheet, and turning
over pages to access a new worksheet. We did
not score on-task behavior if the participant
was scribbling or drawing pictures on the work-
sheet or making patterns (e.g., dashed lines or
dots) within the printed boundaries of the let-
ters or shapes. We partitioned sessions into 5-s
intervals and scored on-task behavior using
partial-interval recording. That is, we scored
on-task behavior if it occurred during any por-
tion of the 5-s interval. Next, we converted
data to a percentage by dividing the number of
intervals during which the child was on task by
the total number of intervals in the session. We
also collected data on the frequency of token
delivery (i.e., when the experimenter placed a
token on the token board) and token removal
(i.e., when the experimenter removed a token
from the token board).
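
The partial-interval calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' data-collection software: the function name and the representation of on-task behavior as (start, end) bouts in seconds are assumptions made for the example, whereas the 5-s interval and the 5-min (300-s) session length come from the text.

```python
from typing import List, Tuple


def partial_interval_percentage(
    on_task_bouts: List[Tuple[float, float]],
    session_length_s: float = 300.0,   # 5-min session, as in Study 2
    interval_length_s: float = 5.0,    # 5-s intervals, as in the text
) -> float:
    """Return the percentage of intervals that contain any on-task behavior.

    on_task_bouts is a hypothetical input format: (start, end) times in
    seconds for each stretch of on-task behavior.
    """
    n_intervals = int(session_length_s // interval_length_s)
    scored = 0
    for i in range(n_intervals):
        start = i * interval_length_s
        end = start + interval_length_s
        # Partial-interval rule: score the interval if on-task behavior
        # occurred during any portion of it.
        if any(b_start < end and b_end > start for b_start, b_end in on_task_bouts):
            scored += 1
    return 100.0 * scored / n_intervals


# Two bouts touching five of the 60 intervals in a 5-min session.
example = partial_interval_percentage([(0.0, 12.0), (58.0, 61.0)])
```

Because an interval is scored when behavior occurs in any part of it, a brief bout that straddles an interval boundary counts toward both intervals, which is why partial-interval recording can overestimate the total duration of behavior.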

We calculated interobserver agreement for
on-task behavior as in Study 1 and calculated
interobserver agreement coefficients for token
delivery or removal by dividing the session time
into 5-s intervals and comparing observer data
on an interval-by-interval basis. If exact agree-
ment occurred (i.e., both observers scored or
did not score a token delivery or removal
within a 5-s interval), we gave a score of 1 for
that interval. For any disagreements, we divided
the smaller score in each interval by the larger.
We then summed interval scores, divided them
by the total number of observation intervals,
and converted the result to a percentage. Inter-
observer agreement for on-task behavior was
93% (range, 73% to 100%) and for token
delivery or removal it was 96% (range, 78%
to 100%).
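
The interval-by-interval agreement coefficient for token delivery or removal can be illustrated as follows. This is a sketch under stated assumptions: the function name and the input format (one count per 5-s interval for each observer) are hypothetical, and only the scoring rule itself is taken from the description above.

```python
from typing import Sequence


def proportional_agreement(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Interval-by-interval agreement, as a percentage.

    Each sequence holds one observer's count of token deliveries or
    removals per 5-s interval (hypothetical input format).
    """
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("Both observers need the same, nonzero number of intervals")
    total = 0.0
    for a, b in zip(obs_a, obs_b):
        if a == b:
            # Exact agreement, including both observers scoring nothing.
            total += 1.0
        else:
            # Disagreement: smaller count divided by the larger count.
            total += min(a, b) / max(a, b)
    return 100.0 * total / len(obs_a)


# Four intervals: two exact agreements, one partial (1 vs 2), one full miss (0 vs 1).
ioa = proportional_agreement([0, 1, 2, 0], [0, 1, 1, 1])
```

In this example the interval scores are 1, 1, 0.5, and 0, so the coefficient is 2.5 / 4, or 62.5%.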
Design. We used a multielement design for
10 participants to compare the effects of the
different procedures on on-task behavior, and
we conducted sessions in a quasirandom order.
In addition, for two of these participants, we
used a reversal design following the multiele-
ment design to rule out discrimination failure
or carryover effects during the multielement
comparison. However, because we conducted
the reversal designs after the participants had a
history of both procedures, we used a reversal
design with four additional participants to determine
levels of responding during DRA before and
after a history of RC.
Procedure. All sessions lasted 5 min. Before
the first session of each condition, the experi-
menter described the session contingencies and
required the participant to practice engaging in
related behaviors (i.e., tracing or playing with
toys) to experience the consequences associated
with each behavior, as in Study 1. For example,
the experimenter required the participant to
practice tracing by providing a vocal and model
prompt (i.e., “Try tracing like this,” while
demonstrating tracing), and used physical guidance
as necessary. After the participant practiced
tracing, the experimenter provided the
relevant consequences and repeated the contingency
for that particular phase (e.g., “Look,
you got a token because you were tracing.”).
Before the start of each subsequent session during
a particular phase, the experimenter
described the session contingencies (see condition
descriptions below).

Table 1
Percentage of Selections and Average Net Tokens Yielded for Participants in Study 1 (Group Analysis)

                       % selections    Average net tokens
Participant   Group    DR      RC      DR      RC
Paul          2        67      33      9.8     9.9
Molly         2        22      78      8.5     9.4
Judy          2        11      89      9.4     9.0
Carl          3        0       100     7.3     9.1
Jack          3        0       100     9.1     9.1
Lance         3        0       100     8.9     9.6

First, we conducted baseline sessions to
determine the level of on-task behavior in the
absence of programmed consequences. Next,
the experimenter practiced token trading
with the participant, as in Study 1. During
DRA and RC sessions, the experimenter deliv-
ered or removed tokens according to the
same variable momentary schedule used in
Study 1; however, the experimenter conducted
observations on a fixed 30-s schedule for
four participants (Brianna, Mark, Zoey, and
Sam), who participated later in the study, to
simplify data collection. In addition, after each
DRA and RC session, participants traded
tokens for prizes, candy, and access to leisure
items.
Baseline. Before the start of all baseline sessions,
the experimenter described the rules and
contingencies for the session and placed a white
board with no tokens near the participant. The
experimenter stated the rules as follows: “Today
you get the white board, and there are no
tokens. When we start, you can either work
on tracing or play with toys. If you are
working (i.e., tracing), nothing will happen, if
you are not working, nothing will happen.”
During the session, the experimenter did not
provide programmed consequences for any
behavior.
Differential reinforcement of alternative behavior.
Before the start of all DRA sessions, the
experimenter described the rules and contin-
gencies for the session and placed a green board
with no tokens near the participant. The exper-
imenter stated the rules as follows: “Today you
get the green board, and it doesn’t have any
tokens on it. When we start, you can either
work on tracing or play with toys. If you are
working, you will get a token; if you are not
working, you will not get a token. At the end,
you can trade your tokens for prizes and
snacks. If you don’t have any tokens, you don’t
get anything.” Throughout the session, the
experimenter watched a timer. If the partici-
pant was on task at the time of a scheduled
observation, the experimenter placed a token
on the token board. If the participant was not
on task at the time of the scheduled observa-
tion, the experimenter did not provide any pro-
grammed consequences.
Response cost. Before the start of all RC sessions,
the experimenter described the rules and
contingencies for the session and placed a red
board with 10 tokens near the participant. The
experimenter stated the rules as follows: “Today
you get the red board, and it has 10 tokens on
it. When we start, you can either work on trac-
ing or play with toys. If you are working, you
will keep your tokens; if you are not working,
you will lose tokens. At the end, you can trade
your tokens for prizes and snacks. If you don’t
have any tokens, you don’t get anything.”
Throughout the session, the experimenter
watched a timer. If the participant was on task
at the time of a scheduled observation, the
experimenter did not provide any programmed
consequences. If the participant was not on
task at the time of a scheduled observation, the
experimenter removed a token from the token
board.
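
The two token contingencies can be compared with a small sketch. The function names are hypothetical, and the example assumes 10 scheduled observations per session, which is consistent with the fixed 30-s schedule in a 5-min session and with the 10 starting tokens in RC; the number of observations under the variable momentary schedule is not stated in this section.

```python
from typing import Iterable

STARTING_RC_TOKENS = 10  # the RC board began each session with 10 tokens


def dra_net_tokens(on_task_at_observation: Iterable[bool]) -> int:
    """DRA: start with no tokens; add one for each on-task observation."""
    return sum(1 for on_task in on_task_at_observation if on_task)


def rc_net_tokens(
    on_task_at_observation: Iterable[bool],
    starting_tokens: int = STARTING_RC_TOKENS,
) -> int:
    """RC: start with a full board; remove one token for each off-task observation."""
    tokens = starting_tokens
    for on_task in on_task_at_observation:
        if not on_task and tokens > 0:
            tokens -= 1
    return tokens


# Assumed example: on task at 7 of 10 scheduled observations.
observations = [True] * 7 + [False] * 3
dra_total = dra_net_tokens(observations)
rc_total = rc_net_tokens(observations)
```

Under these assumptions, the same pattern of on-task behavior yields the same net tokens in both conditions, so differences in net tokens between DRA and RC reflect differences in behavior rather than in the arithmetic of the procedures.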
Choice. When we observed stable levels of
responding in the DRA and RC evaluations,
we conducted a preference assessment to deter-
mine the procedure that each participant pre-
ferred. Before each session, the experimenter
placed the stimuli (i.e., poster and token
boards) associated with each type of condition
(i.e., baseline, RC, and DRA) near the partici-
pant and reminded him or her of the contin-
gencies associated with each set of materials.
For example, the experimenter reminded the
participant that the white board means that
there are no tokens; the green board means that
he or she can earn tokens if he or she is tracing;
and the red board means that he or she can
keep his or her tokens if he or she is tracing.
The experimenter switched the placement of
the different sets of stimuli and materials each
session. After the experimenter reminded the
participant of the contingencies associated with
each set of materials, the experimenter asked
the participant to pick (by pointing to or
touching a set of materials) which session he or
she wanted to do. When the participant made
the selection, the experimenter explained the
contingencies in place for the session (e.g.,
“You picked green, you will get a token when I
see that you are working on tracing.”). After
the participant chose a procedure, the experi-
menter implemented the chosen type of session
as described above. The experimenter con-
ducted sessions until we observed a stable pat-
tern of selections. During the choice phase, we
calculated interobserver agreement as in Study
1; it was 100% for all participants.

Results
Figure 2 shows the results for 10 of the
14 participants. During the initial baseline, all
participants engaged in moderate to low levels
of on-task behavior, and these levels remained
low throughout the evaluation (with the excep-
tion of Adam, Frank, and Martin, who engaged
in variable levels of on-task behavior during
baseline). When we compared DRA and RC
using a multielement design, we observed
(a) similar levels of on-task behavior for eight
of the 10 participants (average of 88% during
DRA and 85% during RC), (b) higher levels of
on-task behavior during DRA for one partici-
pant (Emily; 94% during DRA and 82% dur-
ing RC), and (c) higher levels of on-task
behavior during RC for one participant (Adam;
47% during DRA and 65% during RC). When
we compared DRA and RC using a reversal
design for two participants (Anna and Caroline),
we observed high levels of on-task behavior
similar to those observed during the multielement
evaluation. When we evaluated preference, two
participants selected DRA exclusively (Paul and
Frank), three participants switched their selec-
tions but selected DRA more than RC (Martin,
Emily, and Adrianna), three participants
switched their selections but selected RC more
than DRA (Elisa, Adam, and Anna), and two
participants selected RC exclusively (Collin and
Caroline).
Figure 3 shows the results for Brianna,
Mark, Zoey, and Sam. During baseline ses-
sions, all participants engaged in low to zero
levels of on-task behavior. When we compared
DRA and RC using a reversal design only, we
observed similar and high levels of on-task
behavior for three of the four participants
(Brianna, Mark, and Zoey); however, we
observed higher levels of on-task behavior dur-
ing RC for one participant (Sam; 62% during
DRA and 90% during RC). These data suggest
that a history of response cost is not likely to
influence responding during DRA.
Table 2 provides a summary of results from
Study 2 with respect to the percentage of selec-
tions in the choice phase and the net tokens
yielded for participants during the DRA and
RC comparison phases. We evaluated prefer-
ence for 10 of the 14 participants and calcu-
lated net tokens for all participants. Preference
results show that five participants chose DR
more than RC and five chose RC more than
DR. Although these results are similar to those
of previous studies (e.g., Donaldson et al.,
2014; Iwata & Bailey, 1974), these results were
somewhat different than those of Study 1. That
is, the majority of participants preferred RC in
Study 1, but only half of the participants preferred
RC in Study 2. Also, five of the 10 participants
in Study 2 for whom we assessed
preference had an average difference of at least
0.5 tokens between the two procedures, and
four of these five participants (Frank, Paul,
Adam, and Anna) preferred the procedure that
yielded more net tokens.
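
The comparison of preference with net tokens can be reproduced from the values reported in Table 2. This is an illustrative sketch rather than the authors' analysis: the variable names are hypothetical, and rounding the difference to one decimal place is an assumption made to avoid floating-point artifacts at the 0.5-token threshold.

```python
# (DR % selections, RC % selections, DR net tokens, RC net tokens), from Table 2
TABLE_2 = {
    "Frank": (100, 0, 8.9, 8.3),
    "Paul": (100, 0, 9.6, 9.1),
    "Martin": (82, 18, 9.3, 9.3),
    "Adrianna": (75, 25, 9.6, 9.7),
    "Emily": (67, 33, 8.5, 8.2),
    "Adam": (28, 72, 4.1, 5.3),
    "Elisa": (21, 79, 8.4, 8.2),
    "Anna": (18, 82, 7.6, 8.8),
    "Collin": (0, 100, 9.8, 9.1),
    "Caroline": (0, 100, 5.7, 5.4),
}

THRESHOLD = 0.5  # minimum average difference in net tokens, as in the text

large_difference = []     # participants with a difference of at least 0.5 tokens
preferred_richer = []     # those who preferred the procedure yielding more tokens
for name, (dr_pct, rc_pct, dr_tokens, rc_tokens) in TABLE_2.items():
    if round(abs(dr_tokens - rc_tokens), 1) < THRESHOLD:
        continue
    large_difference.append(name)
    preferred = "DR" if dr_pct > rc_pct else "RC"
    richer = "DR" if dr_tokens > rc_tokens else "RC"
    if preferred == richer:
        preferred_richer.append(name)
```

With these data, five participants meet the 0.5-token threshold (Frank, Paul, Adam, Anna, and Collin), and four of them preferred the procedure that yielded more net tokens; Collin selected RC exclusively even though DR yielded more tokens.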

GENERAL DISCUSSION

Overall, DR and RC were effective proce-
dures for increasing the on-task behavior of the
majority of children who participated in a
group activity (Study 1), and these findings
replicated those of previous research (e.g.,
Donaldson et al., 2014; Iwata & Bailey, 1974).
However, similar to Donaldson et al. (2014)
and Tanol, Johnson, McComas, and Cote
(2010), the procedures were differentially effective
for some individuals in the group, which
suggests that analyzing individual data is
important because these differences may not
have been observed if we reported only group
averages. The importance of analyzing individual
data is further supported by the results of
Study 2, which showed differential effects for
three participants (Adam, Emily, and Sam),
whereas the overall results suggest that the two
procedures are equally effective.

[Figure 2 graphs omitted. One panel each for Paul, Elisa, Frank, Adam, Martin, Collin, Emily, Anna, Adrianna, and Caroline. Phases: BL, DRA vs RC, and Choice, with short reversal phases for Anna and Caroline. Axes: % Intervals (On task) by Sessions.]

Figure 2. Percentage of on-task behavior for Paul, Frank, Martin, Emily, Adrianna, Elisa, Adam, Collin, Anna, and
Caroline during RC, DRA (also denoted as D during the short reversal phases for Anna and Caroline), and baseline in
the comparative analysis and choice phases. The symbol used for each data point during the choice phase represents the
condition selected by the participant for that session.

Several variables might have influenced
results of the current study, including the type
of contingency used (individual vs. group
oriented) and the experimental design. Results
showed that the comparative effectiveness of
the procedures was the same for all three
participants who participated in Studies 1 and
2 (Adam, Anna, and Paul). That is, RC was
more effective than DR for Adam during the
group activity and solitary work task, and the
procedures were equally effective for Anna and
Paul under both conditions. These results sug-
gest that the presence of peers did not influ-
ence the comparative effectiveness of DR and
RC. However, an analysis of the results for
Adam and Anna shows that these participants
engaged in 10% to 20% higher levels of on-
task behavior during the group evaluation than
in the individual evaluation. These results ten-
tatively suggest that the presence of peers may
enhance the effectiveness of the procedures for
some children. Because both procedures
resulted in equally higher levels of responding
in the presence of peers, it could be that obser-
ving a peer receiving a token increases the
value of the token or functions as a discrimina-
tive stimulus for on-task behavior (during DR
conditions). In addition, the aversiveness of
token loss might also be enhanced when
tokens are removed in the presence of peers
(during RC).
Although the relative efficacy of DR and RC
was not influenced by the use of group-
oriented contingencies, the overall effectiveness
of the procedures was greater during the group
activity. These higher levels of on-task behavior
during the group activity may have been due
to the differential effort or task difficulty across
tasks in the group activity and individual activ-
ity (i.e., it may have been more effortful to
trace letters than to keep one’s hands in one’s
lap and sit on the mat). In addition, higher
levels of on-task behavior in the group activity
may have been due to the absence of a salient
alternative task, as was provided in the individ-
ual activity (i.e., toys were available). However,
there were many alternative tasks available dur-
ing the group activity, such as playing with or
manipulating the bingo boards and pieces and
leaving the mat to join other activities in the
classroom.

[Figure 3 graphs omitted. One panel each for Brianna, Mark, Zoey, and Sam. Phases: BL, DRA, and RC in a reversal design. Axes: % Intervals (On Task) by Sessions.]

Figure 3. Percentage of on-task behavior for Brianna,
Mark, Zoey, and Sam during RC, DRA, and baseline.

We used a multielement design in Study
1 and for 10 participants in Study 2. Thus,
similar effects observed across DR and RC may
have been due to multiple-treatment interfer-
ence because of the rapid alternation of condi-
tions that were similar in numerous respects.
Although we attempted to control for multiple-
treatment interference by including session
rules and discriminative stimuli, we also
attempted to address this concern by evaluating
the effects when a different design was used.
For two participants in Study 2 (Anna and Car-
oline), in which we used both a multielement
design and a reversal design to compare the
effects of DR and RC, we found similar results
regardless of which design was used. In addi-
tion, for four participants in Study 2 (Brianna,
Mark, Zoey, and Sam), in which we used only
a reversal design to compare DR and RC, we
showed similar levels of on-task behavior across
the two procedures as well as similar levels of
on-task behavior regardless of whether DR was
conducted before or after RC. These data sug-
gest that the use of a multielement design was
unlikely to influence the results.
With respect to preference, five of the 15 participants
in the choice evaluation preferred DR,
and the other 10 participants preferred RC. As
suggested in previous research (e.g., Donaldson
et al., 2014), several variables may have influ-
enced preference for the different procedures.
Participants may select the reinforcement pro-
cedure to avoid the loss condition, as observed
by Pietras, Brandt, and Searcy (2010), who
found that when they equated net tokens, par-
ticipants avoided the procedure that involved
token loss. In addition, participants may prefer
reinforcement, specifically when reinforcer
delivery is spaced evenly throughout the ses-
sion, because token delivery signals time pro-
gression through the session. That is, token
delivery provides feedback regarding the dura-
tion of the session, which may be valuable,
especially with young children.
With respect to preference for RC, the
potential aversion associated with RC may have
been eliminated because participants did not
contact loss often; as Donaldson et al. (2014)
noted, one participant mentioned preference
for RC due to losing few tokens. However,
additional variables also warrant consideration.
First, some participants may have preferred RC
because selection of the RC procedure results
in the delivery of all tokens; therefore, access to
all tokens may function as a reinforcer for selec-
tion of that procedure. In addition, selection of
RC over DR may be because, from the child’s
perspective, starting with tokens is viewed as
not having to work for the tokens. That is, the
procedure appears to be less effortful. To rule
out influence of the presence of tokens, future
researchers might evaluate preference under
conditions in which the tokens are present for
DR and RC (i.e., a cup of tokens next to the
DRA token board and tokens attached to the
RC board) or the tokens are not present (i.e.,
placing colored strips of paper representing
each procedure or asking the participant which
procedure he or she would like to do).

Table 2
Percentage of Selections and Average Net Tokens Yielded for Participants in Study 2 (Individual Analysis)

                 % selections    Average net tokens
Participant      DR      RC      DR      RC
Frank            100     0       8.9     8.3
Paul             100     0       9.6     9.1
Martin           82      18      9.3     9.3
Adrianna         75      25      9.6     9.7
Emily            67      33      8.5     8.2
Adam             28      72      4.1     5.3
Elisa            21      79      8.4     8.2
Anna             18      82      7.6     8.8
Collin           0       100     9.8     9.1
Caroline         0       100     5.7     5.4
Brianna                          8.7     8.7
Mark                             9.4     9.6
Zoey                             9.4     8.7
Sam                              6.1     8.8

Other variables that might influence preference
in the current study are the consequences
that followed selection of a particular condition
(DRA vs. RC) and the net tokens earned
within a particular condition. Participants in
the group evaluation may have chosen a differ-
ent procedure the next time they were offered a
choice if the experimenter did not implement
the procedure they had chosen in a given session.
However, an evaluation of data for participants
in Group 2 of Study 1 showed that participants
switched their selection during subsequent
choice opportunities when the session that the
experimenter implemented after a selection did
not match the initial selection on 38% (Paul),
38% (Molly), and 50% (Judy) of selections.
These results suggest that switches in selections
were not influenced by whether the session that
was implemented matched the procedure they
had selected, and these findings are consistent
with those of Layer et al. (2008).
Previous researchers have evaluated the
potential influence of net tokens across DR and
RC conditions. Iwata and Bailey (1974) calcu-
lated the average number of net tokens for the
class, and Donaldson et al. (2014) calculated
individual net token averages; both studies
found that net tokens were similar across proce-
dures. Although the number of net tokens was
similar, because some participants preferred one
procedure over another, it could be that even
slight differences may influence preference. In
the current study, we were able to evaluate
preference for 15 participants (twice with Paul)
and found that seven of the 14 children who
participated once (and Paul on one occasion in
Study 2) yielded an average difference of at
least 0.5 tokens between the two procedures.
Of these eight participants, seven preferred the
procedure for which more net tokens were
yielded during the comparison phase. However,
in previous research and in the current study,
experimenters did not manipulate the number
of net tokens. Therefore, the influence of net
tokens on preference is unknown, and research
on this variable is warranted.
Another point of discussion relates to
best practice guidelines. The general
recommendation is to use reinforcement-based
procedures when possible (Bailey & Burch,
2005). Therefore, because RC is a negative
punishment procedure (Kazdin, 1977), RC
often is not recommended before implementa-
tion of positive reinforcement procedures.
However, given that (a) RC is just as effective
as reinforcement, (b) RC has limited side
effects (Kazdin, 1972), (c) more participants
preferred RC in the current study, and
(d) previous researchers have also found prefer-
ence for punishment procedures (e.g., Hanley,
Piazza, Fisher, & Maglieri, 2005), reconsidera-
tion of best practice appears to be warranted.
Perhaps the use of effective and preferred pro-
cedures should be considered best practice
(e.g., Hanley, 2010).
There are several areas for future research.
First, we were able to compare responding of
only three individuals who participated in both
the group activity and solitary work task; there-
fore, our conclusions about the effects of peer
presence are limited, and future researchers
should consider conducting this evaluation with
a larger number of participants. Second,
because we conducted both preference evalua-
tions in Studies 1 and 2 in the absence of peers,
we were unable to compare choice in the pres-
ence versus absence of peers.
Third, we did not collect data on side effects
of the procedures, which may be important,
specifically with the possibility of negative side
effects (e.g., emotional responding or increases
in problem behavior) when RC procedures are
used. However, few or no negative side effects
have been reported during the use of RC proce-
dures (Conyers et al., 2004; Kazdin, 1972) nor
were negative side effects observed in the cur-
rent study.
Fourth, future researchers should include a
measure of accuracy. In the current study, we
selected on-task behavior because it was age
appropriate, but we did not measure the accu-
racy of responding. Iwata and Bailey (1974)
showed decreases in rule violations without
increasing correct responding. Because on-task
behavior is a prerequisite for accurate respond-
ing in many situations, correct responding
should increase as children are attending; there-
fore, future researchers should measure changes
in accuracy when reinforcement and punish-
ment contingencies are in effect for on-task
behavior.
Fifth, we arranged individual contingencies,
rather than interdependent group-oriented con-
tingencies or dependent group-oriented contin-
gencies. Individual and interdependent group-
oriented contingencies require that the teacher
monitor the behavior of each child and then
deliver consequences based on the behavior of
each child individually or for the behavior of
the group, respectively; on the other hand, a
dependent group-oriented contingency requires
that a teacher monitor the behavior of only one
child in the group. Herman and Tramontana
(1971) found no difference in the effectiveness
of individual and group contingencies and sug-
gested that group contingencies may be easier
for teachers. Therefore, future researchers
should compare DR and RC using dependent
and interdependent group-oriented contingen-
cies (see Litow & Pumroy, 1975, for a brief
review of group contingencies).
Finally, because we associated specific colors
with the different procedures, children’s choices
for procedures may have been based on prefer-
ence for color rather than procedure. However,
anecdotal reports do not suggest that partici-
pants had strong preferences for colors (i.e., it
was not common for participants to report
color preference during the choice evaluation).
Future researchers might control for the influ-
ence of color preferences by using low or mod-
erately preferred colors for the stimuli used for
the DR and RC procedures (e.g., Luczynski &
Hanley, 2009) or changing the colors associ-
ated with the procedures throughout the study.
In summary, there are several important
implications of the current study. First, the
results suggest that both DR and RC are

similarly effective; therefore, teachers might use
the procedure that more children prefer or that
is easier to implement in a classroom setting.
Second, the presence of peers does not appear
to influence the relative efficacy of the proce-
dures; therefore, future researchers might con-
tinue to conduct comparisons of DR and RC
in group settings for more efficient data collec-
tion. Finally, considerations for best practice
should take into account preference, given the
large number of participants who preferred RC.

REFERENCES

Bailey, J. S., & Burch, M. R. (2005). Ethics for behavior
analysts. Mahwah, NJ: Erlbaum. doi: 10.4324/
9781410613738

Brent, D. E., & Routh, D. K. (1978). Response cost and
impulsive word recognition errors in reading-disabled
children. Journal of Abnormal Child Psychology, 6,
211–219. doi: 10.1007/bf00919126

Broughton, S. F., & Lahey, B. B. (1978). Direct and col-
lateral effects of positive reinforcement, response cost,
and mixed contingencies for academic performance.
Journal of School Psychology, 16, 126–136. doi:
10.1016/0022-4405(78)90051-1

Capriotti, M. R., Brandt, B. C., Ricketts, E. J.,
Espil, F. M., & Woods, D. W. (2012). Comparing
the effects of differential reinforcement of other
behavior and response-cost contingencies on tics in
youth with Tourette syndrome. Journal of Applied
Behavior Analysis, 45, 251–263. doi: 10.1901/
jaba.2012.45-251

Conyers, C., Miltenberger, R., Maki, A., Barenz, R.,
Jurgens, M., Sailer, A., … Kopp, B. (2004). A com-
parison of response cost and differential reinforce-
ment of other behavior to reduce disruptive behavior
in a preschool classroom. Journal of Applied Behavior
Analysis, 37, 411–415. doi: 10.1901/
jaba.2004.37-411

Doll, C., McLaughlin, T. F., & Barretto, A. (2013). The
token economy: A recent review and evaluation.
International Journal of Basic and Applied Science, 2,
131–149.

Donaldson, J. M., DeLeon, I. G., Fisher, A. B., &
Kahng, S. (2014). Effects of and preference for condi-
tions of token earn versus token loss. Journal of
Applied Behavior Analysis, 47, 537–548. doi:
10.1002/jaba.135

Hackenberg, T. D. (2009). Token reinforcement: A
review and analysis. Journal of the Experimental Analy-
sis of Behavior, 91, 257–286. doi: 10.1901/
jeab.2009.91-257

Hanley, G. P. (2010). Toward effective and preferred pro-
gramming: A case for the objective measurement of
social validity with recipients of behavior-change pro-
grams. Behavior Analysis in Practice, 3, 13–21.

Hanley, G. P., Piazza, C. C., Fisher, W. W., &
Maglieri, K. A. (2005). On the effectiveness of and
preference for punishment and extinction
components of function-based interventions. Journal of
Applied Behavior Analysis, 38, 51–65. doi:
10.1901/jaba.2005.6-04

Herman, S. H., & Tramontana, J. (1971). Instructions
and group versus individual reinforcement in
modifying disruptive group behavior. Journal of Applied
Behavior Analysis, 4, 113–119. doi:
10.1901/jaba.1971.4-113

Iwata, B. A., & Bailey, J. S. (1974). Reward versus cost
token systems: An analysis of the effects on students
and teacher. Journal of Applied Behavior Analysis, 7,
567–576. doi: 10.1901/jaba.1974.7-567

Kazdin, A. E. (1972). Response cost: The removal of
conditioned reinforcers for therapeutic change. Behavior
Therapy, 3, 533–546. doi:
10.1016/S0005-7894(72)80001-7

Kazdin, A. E. (1977). The token economy: A review and
evaluation. New York, NY: Plenum Press. doi:
10.1007/978-1-4613-4121-5

Layer, S. A., Hanley, G. P., Heal, N. A., & Tiger, J. H.
(2008). Determining individual preschoolers’
preferences in a group arrangement. Journal of Applied
Behavior Analysis, 41, 25–37. doi:
10.1901/jaba.2008.41-25

Litow, L., & Pumroy, D. K. (1975). A brief review of
classroom group-oriented contingencies. Journal of
Applied Behavior Analysis, 8, 341–347. doi:
10.1901/jaba.1975.8-341

Luczynski, K. C., & Hanley, G. P. (2009). Do children
prefer contingencies? An evaluation of the efficacy of
and preference for contingent versus noncontingent
social reinforcement during play. Journal of Applied
Behavior Analysis, 42, 511–525. doi:
10.1901/jaba.2009.42-511

McGoey, K. E., & DuPaul, G. J. (2000). Token
reinforcement and response cost procedures: Reducing
the disruptive behavior of preschool children with
attention-deficit/hyperactivity disorder. School
Psychology Quarterly, 15, 330–343. doi: 10.1037/h0088790

Panek, D. M. (1970). Word association learning
by chronic schizophrenics on a token economy
ward under conditions of reward and punishment.
Journal of Clinical Psychology, 26, 163–167. doi:
10.1002/1097-4679(197004)26:2<163::aid-jclp2270260208>3.0.co;2-5

Pietras, C. J., Brandt, A. E., & Searcy, G. D. (2010).
Human responding on random-interval schedules of
response-cost punishment: The role of reduced
reinforcement density. Journal of the Experimental Analysis
of Behavior, 93, 5–26. doi: 10.1901/jeab.2010.93-5

Salend, S. J., & Kovalich, B. (1981). A group response-
cost system mediated by free tokens: An alternative
to token reinforcement. American Journal of Mental
Deficiency, 86, 184–187.

Sindelar, P. T., Honsaker, M. S., & Jenkins, J. R. (1982).
Response cost and reinforcement contingencies of
managing the behavior of distractible children in
tutorial settings. Learning Disability Quarterly, 5,
3–13. doi: 10.2307/1510610

Tanol, G., Johnson, L., McComas, J., & Cote, E. (2010).
Responding to rule violations or rule following: A
comparison of two versions of the Good Behavior
Game with kindergarten students. Journal of School
Psychology, 48, 337–355. doi:
10.1016/j.jsp.2010.06.001

Received December 2, 2014
Final acceptance October 8, 2015
Action Editor, Jeanne Donaldson
