Psychology Self-Research Paper
This is an important paper: the requirements are listed below, followed by two example papers.
Psychology Self-Research Report Requirements
1. APA format
2. 5-6 pages
3. No plagiarism
4. Include references
5. Four variables you will look at: study, sleep, two measures of emotion/mood
What will you experiment on?
● Note that the type of research you will be doing is experimental
● This means that you will have a control (no manipulation) setting and an experimental (manipulated) setting
● Four variables you will look at: study, sleep, two measures of emotion/mood
● You will have a primary focus and a secondary focus
○ Primary focus: the effect of an IV on a DV
○ Secondary focus: the correlation between two variables (which variable you treat as the IV and which as the DV is up to you; see the analysis sketch after the operational definitions)
● Designate three of the four variables to these focuses and leave the fourth out of the study
● For the primary focus, choose an IV (this will be your “intervention”)
Choosing Your Emotion/Mood Variable
● Discrete Emotions Questionnaire
● Uses the Positive and Negative Affect Schedule (PANAS)
○ Positive affect: propensity to experience positive emotions and respond positively to surroundings
○ Negative affect: propensity to experience negative emotions and respond negatively to surroundings
● Choose one positive item and one negative item to track during week 1
Operational Definitions
● What is study? (e.g. measure of accuracy, length of study period, number of concepts recalled)
● What is sleep? (e.g. time unconscious that does not include periods of awakeness during night or period of attempting to fall asleep)
● What is *insert negative emotion here*? (e.g. What is your definition of this emotion?)
● What is *insert positive emotion here*? (e.g. What is your definition of this emotion?)
● When will you be studying? For how long? How will you measure it?
● When will you be sleeping? What is included in sleeping? How will you measure it?
● When are you affected by this emotion? Is it related to a particular activity? ("When" can mean time of day or proximity to that activity.)
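If you analyze your week of self-tracked data in Python, a minimal sketch of both analyses might look like the following. All variable names and values here are hypothetical placeholders; a spreadsheet or any statistics package can do the same calculations.

```python
# Minimal sketch with hypothetical data from a one-week self-study.
# Here sleep is the manipulated IV, number of concepts recalled is the DV for the
# primary focus, and a daily negative-mood rating is used for the secondary focus.
from scipy import stats

# Primary focus: control days (no manipulation) vs. experimental days (e.g., restricted sleep).
control_recall = [14, 12, 15, 13]   # concepts recalled on control days
experimental_recall = [10, 9, 11]   # concepts recalled on experimental days
t_stat, p_value = stats.ttest_ind(control_recall, experimental_recall)
print(f"Mean recall (control): {sum(control_recall) / len(control_recall):.1f}")
print(f"Mean recall (experimental): {sum(experimental_recall) / len(experimental_recall):.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Secondary focus: correlation between two variables tracked across all seven days.
sleep_hours = [7.5, 8.0, 6.5, 7.0, 5.0, 5.5, 4.5]
negative_mood = [2, 1, 3, 2, 4, 4, 5]   # daily rating, e.g., 1 (low) to 5 (high)
r, p_corr = stats.pearsonr(sleep_hours, negative_mood)
print(f"r = {r:.2f}, p = {p_corr:.3f}")
```

With only a week of data, these tests will rarely reach significance; the sketch is meant only to show the structure of the primary (group comparison) and secondary (correlation) analyses.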
Comparison of Student Evaluations of Teaching With Online and Paper-Based Administration
Claudia J. Stanny and James E. Arruda
PSYC 473
Department of Psychology, University of West Florida
Abstract
When institutions administer student evaluations of teaching (SETs) online, response rates are lower relative to paper-based administration. We analyzed average SET scores from 364 courses taught during the fall term in 3 consecutive years to determine whether administering SET forms online for all courses in the 3rd year changed the response rate or the average SET score. To control for instructor characteristics, we based the data analysis on courses for which the same instructor taught the course in each of three successive fall terms. Response rates for face-to-face classes declined when SET administration occurred only online. Although average SET scores were reliably lower in Year 3 than in the previous 2 years, the magnitude of this change was minimal (0.11 on a five-point Likert-like scale). We discuss practical implications of these findings for interpretation of SETs and the role of SETs in the evaluation of teaching quality.
Comparison of Student Evaluations of Teaching With Online and Paper-Based Administration
Student ratings and evaluations of instruction have a long history as sources of information about teaching quality (Berk, 2013). Student evaluations of teaching (SETs) often play a significant role in high-stakes decisions about hiring, promotion, tenure, and teaching awards. As a result, researchers have examined the psychometric properties of SETs and the possible impact of variables such as race, gender, age, course difficulty, and grading practices on average student ratings (Griffin et al., 2014; Nulty, 2008; Spooren et al., 2013). They have also examined how decision makers evaluate SET scores (Boysen, 2015a, 2015b; Boysen et al., 2014; Dewar, 2011). In the last 20 years, considerable attention has been directed toward the consequences of administering SETs online (Morrison, 2011; Stowell et al., 2012) because low response rates may have implications for how decision makers should interpret SETs.
Online Administration of Student Evaluations
Administering SETs online creates multiple benefits. Online administration enables instructors to devote more class time to instruction (vs. administering paper-based forms) and can improve the integrity of the process. Students who are not pressed for time in class are more likely to reflect on their answers and write more detailed comments (Morrison, 2011; Stowell et al., 2012; Venette et al., 2010). Because electronic aggregation of responses bypasses the time-consuming task of transcribing comments (sometimes written in challenging handwriting), instructors can receive summary data and verbatim comments shortly after the close of the term instead of weeks or months into the following term.
Despite the many benefits of online administration, instructors and students have expressed concerns about online administration of SETs. Students have expressed concern that their responses are not confidential when they must use their student identification number to log into the system (Dommeyer et al., 2002). However, breaches of confidentiality can occur even with paper-based administration. For example, an instructor might recognize student handwriting (one reason some students do not write comments on paper-based forms), or an instructor might remain present during SET administration (Avery et al., 2006).
In-class, paper-based administration creates social expectations that might motivate students to complete SETs. In contrast, students who are concerned about confidentiality or do not understand how instructors and institutions use SET findings to improve teaching might ignore requests to complete an online SET (Dommeyer et al., 2002). Instructors in turn worry that low response rates will reduce the validity of the findings if students who do not complete an SET differ in significant ways from students who do (Stowell et al., 2012). For example, students who do not attend class regularly often miss class the day that SETs are administered. However, all students (including nonattending students) can complete the forms when they are administered online. Faculty also fear that SET findings based on a low-response sample will be dominated by students in extreme categories (e.g., students with grudges, students with extremely favorable attitudes), who may be particularly motivated to complete online SETs, and therefore that SET findings will inadequately represent the voice of average students (Reiner & Arnold, 2010).
Effects of Format on Response Rates and Student Evaluation Scores
The potential for biased SET findings associated with low response rates has been examined in the published literature. In findings that run contrary to faculty fears that online SETs might be dominated by low-performing students, Avery et al. (2006) found that students with higher grade-point averages (GPAs) were more likely to complete online evaluations. Likewise, Jaquett et al. (2017) reported that students who had positive experiences in their classes (including receiving the grade they expected to earn) were more likely to submit course evaluations.
Institutions can expect lower response rates when they administer SETs online (Avery et al., 2006; Dommeyer et al., 2002; Morrison, 2011; Nulty, 2008; Reiner & Arnold, 2010; Stowell et al., 2012; Venette et al., 2010). However, most researchers have found that the mean SET rating does not change significantly when they compare SETs administered on paper with those completed online. These findings have been replicated in multiple settings using a variety of research methods (Avery et al., 2006; Dommeyer et al., 2004; Morrison, 2011; Stowell et al., 2012; Venette et al., 2010).
Exceptions to this pattern of minimal or nonsignificant differences in average SET scores appeared in Nowell et al. (2010) and Morrison (2011), who examined a sample of 29 business courses. Both studies reported lower average scores when SETs were administered online. However, they also found that SET scores for individual items varied more within an instructor when SETs were administered online versus on paper. Students who completed SETs on paper tended to record the same response for all questions, whereas students who completed the forms online tended to respond differently to different questions. Both research groups argued that scores obtained online might not be directly comparable to scores obtained through paper-based forms. They advised that institutions administer SETs entirely online or entirely on paper to ensure consistent, comparable evaluations across faculty.
Each university presents a unique environment and culture that could influence how seriously students take SETs and how they respond to decisions to administer SETs online. Although a few large-scale studies of the impact of online administration exist (Reiner & Arnold, 2010; Risquez et al., 2015), a local replication answers questions about characteristics unique to that institution and generates evidence about the generalizability of existing findings.
Purpose of the Present Study
In the present study we examined patterns of responses for online and paper-based SET scores at a midsized, regional, comprehensive university in the United States. We posed two questions: First, does the response rate or the average SET score change when an institution administers SET forms online instead of on paper? Second, what is the minimal response rate required to produce stable average SET scores for an instructor? Whereas much earlier research relied on small samples often limited to a single academic department, we gathered SET data on a large sample of courses (N = 364) that included instructors from all colleges and all course levels over 3 years. We controlled for individual differences in instructors by limiting the sample to courses taught by the same instructor in all 3 years. The university offers nearly 30% of course sections online in any given term, and these courses have always administered online SETs. This allowed us to examine the combined effects of changing the method of delivery for SETs (paper-based to online) for traditional classes and changing from a mixed method of administering SETs (paper for traditional classes and online for online classes in the first 2 years of data gathered) to uniform use of online forms for all classes in the final year of data collection.
Method
Sample
Response rates and evaluation ratings were retrieved from archived course evaluation data. The archive of SET data did not include information about personal characteristics of the instructor (gender, age, or years of teaching experience), and students were not provided with any systematic incentive to complete the paper or online versions of the SET. We extracted data on response rates and evaluation ratings for 364 courses that had been taught by the same instructor during three consecutive fall terms (2012, 2013, and 2014).
The sample included faculty who taught in each of the five colleges at the university: 109 instructors (30%) taught in the College of Social Science and Humanities, 82 (23%) taught in the College of Science and Engineering, 75 (21%) taught in the College of Education and Professional Studies, 58 (16%) taught in the College of Health, and 40 (11%) taught in the College of Business. Each instructor provided data on one course. A total of 259 instructors (71%) provided ratings for face-to-face courses, and 105 (29%) provided ratings for online courses, which accurately reflects the proportion of face-to-face and online courses offered at the university. The sample included 107 courses (29%) at the beginning undergraduate level (1st- and 2nd-year students), 205 courses (56%) at the advanced undergraduate level (3rd- and 4th-year students), and 52 courses (14%) at the graduate level.
Instrument
The course evaluation instrument was a set of 18 items developed by the state university system. The first eight items were designed to measure the quality of the instructor, concluding with a global rating of instructor quality (Item 8: “Overall assessment of instructor”). The remaining items asked students to evaluate components of the course, concluding with a global rating of course organization (Item 18: “Overall, I would rate the course organization”). No formal data on the psychometric properties of the items are available, although all items have obvious face validity.
Students were asked to rate each instructor as poor (0), fair (1), good (2), very good (3), or excellent (4) in response to each item. Evaluation ratings were subsequently calculated for each course and instructor. A median rating was computed when an instructor taught more than one section of a course during a term.
The institution limited our access to SET data for the 3 years of data requested. We obtained scores for Item 8 (“Overall assessment of instructor”) for all 3 years but could obtain scores for Item 18 (“Overall, I would rate the course organization”) only for Year 3. We computed the correlation between scores on Item 8 and Item 18 (from course data recorded in the 3rd year only) to estimate the internal consistency of the evaluation instrument. These two items, which serve as composite summaries of preceding items (Item 8 for Items 1–7 and Item 18 for Items 9–17), were strongly related, r(362) = .92. Feistauer and Richter (2016) also reported strong correlations between global items in a large analysis of SET responses.
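A minimal sketch of how such an internal-consistency check could be computed in Python is shown below. The arrays item8 and item18 are hypothetical stand-ins for the archived Year 3 course-level scores, which are not publicly available; random placeholder values are used only so the sketch runs.

```python
# Sketch: internal-consistency check correlating the two global items across 364 courses.
import numpy as np
from scipy import stats

# `item8` and `item18` are hypothetical arrays of course-level mean ratings (0-4 scale);
# random placeholder values stand in for the archived Year 3 data.
rng = np.random.default_rng(0)
item8 = rng.uniform(2.0, 4.0, size=364)
item18 = np.clip(item8 + rng.normal(0.0, 0.2, size=364), 0, 4)

r, p = stats.pearsonr(item8, item18)
print(f"r({len(item8) - 2}) = {r:.2f}, p = {p:.3g}")  # df = N - 2, hence r(362) in the text
```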
Design
This study took advantage of a natural experiment created when the university decided to administer all course evaluations online. We requested SET data for the fall semesters for 2 years preceding the change, when students completed paper-based SET forms for face-to-face courses and online SET forms for online courses, and data for the fall semester of the implementation year, when students completed online SET forms for all courses. We used a 2 × 3 × 3 factorial design in which course delivery method (face to face and online) and course level (beginning undergraduate, advanced undergraduate, and graduate) were between-subjects factors and evaluation year (Year 1: 2012, Year 2: 2013, and Year 3: 2014) was a repeated-measures factor. The dependent measures were the response rate (measured as a percentage of class enrollment) and the rating for Item 8 (“Overall assessment of instructor”).
Data analysis was limited to scores on Item 8 because the institution agreed to release data on this one item only. Data for scores on Item 18 were made available for SET forms administered in Year 3 to address questions about variation in responses across items. The strong correlation between scores on Item 8 and scores on Item 18 suggested that Item 8 could be used as a surrogate for all the items. These two items were of particular interest because faculty, department chairs, and review committees frequently rely on these two items as stand-alone indicators of teaching quality for annual evaluations and tenure and promotion reviews.
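As a rough illustration of how such a design could be analyzed, the sketch below runs a comparable mixed ANOVA in Python with the pingouin package, assuming a long-format data frame with one row per course per year (all column and file names are hypothetical). Pingouin's mixed_anova accepts a single between-subjects factor, so the sketch collapses over course level; reproducing the full 2 × 3 × 3 model from the article would require a more general routine such as a linear mixed model.

```python
# Sketch: response rate as a function of delivery method (between) and year (within).
# Assumes a long-format DataFrame with hypothetical columns:
#   course_id, delivery ("face-to-face" / "online"), year (1, 2, 3), response_rate
import pandas as pd
import pingouin as pg

df = pd.read_csv("set_response_rates.csv")  # hypothetical file name

aov = pg.mixed_anova(
    data=df,
    dv="response_rate",   # dependent measure (percentage of class enrollment)
    within="year",        # repeated-measures factor
    subject="course_id",  # same instructor/course observed in all three years
    between="delivery",   # between-subjects factor (course level omitted here)
)
# pingouin reports partial eta squared and applies a sphericity correction when needed.
print(aov.round(3))
```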
Results
Response Rates
Response rates are presented in Table 1. The findings indicate that response rates for face-to-face courses were much higher than for online courses, but only when face-to-face course evaluations were administered in the classroom. In the Year 3 administration, when all course evaluations were administered online, response rates for face-to-face courses declined (M = 47.18%, SD = 20.11), but were still slightly higher than for online courses (M = 41.60%, SD = 18.23). These findings produced a statistically significant interaction between course delivery method and evaluation year, F(1.78, 716) = 101.34, MSE = 210.61, p < .001.¹ The strength of the overall interaction effect was .22 (ηp2). Simple main-effects tests revealed statistically significant differences in the response rates for face-to-face courses and online courses for each of the 3 observation years.² The greatest differences occurred during Year 1 (p < .001) and Year 2 (p < .001), when evaluations were administered on paper in the classroom for all face-to-face courses and online for all online courses. Although the difference in response rate between face-to-face and online courses during the Year 3 administration was statistically reliable (when both face-to-face and online courses were evaluated with online surveys), the effect was small (ηp2 = .02). Thus, there was minimal difference in response rate between face-to-face and online courses when evaluations were administered online for all courses. No other factors or interactions included in the analysis were statistically reliable.
¹ A Greenhouse–Geisser adjustment of the degrees of freedom was performed in anticipation of a sphericity assumption violation.
² A test of the homogeneity of variance assumption revealed no statistically significant difference in response rate variance between the two delivery modes for the 1st, 2nd, and 3rd years.
Evaluation Ratings
The same 2 × 3 × 3 analysis of variance model was used to evaluate mean SET ratings. This analysis produced two statistically significant main effects. The first main effect involved evaluation year, F(1.86, 716) = 3.44, MSE = 0.18, p = .03 (ηp2 = .01; see Footnote 1). Evaluation ratings associated with the Year 3 administration (M = 3.26, SD = 0.60) were significantly lower than the evaluation ratings associated with both the Year 1 (M = 3.35, SD = 0.53) and Year 2 (M = 3.38, SD = 0.54) administrations. Thus, all courses received lower SET scores in Year 3, regardless of course delivery method and course level. However, the size of this effect was small (the largest difference in mean rating was 0.11 on a five-point scale).
The second statistically significant main effect involved delivery mode, F(1, 358) = 23.51, MSE = 0.52, p = .01 (ηp2 = .06; see Footnote 2). Face-to-face courses (M = 3.41, SD = 0.50) received significantly higher mean ratings than did online courses (M = 3.13, SD = 0.63), regardless of evaluation year and course level. No other factors or interactions included in the analysis were statistically reliable.
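For reference, the partial eta squared (ηp2) values reported above are the standard ratio of the effect's sum of squares to the effect-plus-error sum of squares:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

On this reading, the delivery-mode effect of .06 means that delivery mode accounted for about 6% of the variance in ratings after excluding variance attributable to the other modeled effects.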
Stability of Ratings
The scatterplot presented in Figure 1 illustrates the relation between SET scores and response rate. Although the correlation between SET scores and response rate was small and not statistically significant, r(362) = .07, visual inspection of the plot of SET scores suggests that SET ratings became less variable as response rate increased. We conducted Levene’s test to evaluate the variability of SET scores above and below the 60% response rate, which several researchers have recommended as an acceptable threshold for response rates (Berk, 2012, 2013; Nulty, 2008). The difference in the variability of scores above and below the 60% threshold was not statistically reliable, F(1, 362) = 1.53, p = .22.
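A minimal sketch of this variability check is shown below, assuming course-level arrays of SET scores and response rates (names are hypothetical, and random placeholder values stand in for the archived data).

```python
# Sketch: does SET score variability differ for courses above vs. below a 60% response rate?
import numpy as np
from scipy import stats

# `set_scores` and `response_rates` are hypothetical arrays with one value per course;
# random placeholder values stand in for the archived data described above.
rng = np.random.default_rng(1)
response_rates = rng.uniform(10, 95, size=364)
set_scores = np.clip(rng.normal(3.3, 0.55, size=364), 0, 4)

below = set_scores[response_rates < 60]
above = set_scores[response_rates >= 60]

# Levene's test for equality of variances between the two groups.
f_stat, p_value = stats.levene(below, above)
print(f"F(1, {len(set_scores) - 2}) = {f_stat:.2f}, p = {p_value:.2f}")
```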
Discussion
Online administration of SETs in this study was associated with lower response rates, yet it is curious that online courses experienced an increase of nearly 10 percentage points in response rate when all courses were evaluated with online forms in Year 3. Online courses had suffered from chronically low response rates in previous years, when face-to-face classes continued to use paper-based forms. The benefit to response rates observed for online courses when all SET forms were administered online might be attributed to increased communications that encouraged students to complete the online course evaluations. Despite this improvement, response rates for online courses continued to lag behind those for face-to-face courses. Differences in response rates for face-to-face and online courses might be attributed to characteristics of the students who enrolled or to differences in the quality of student engagement created in each learning modality. Avery et al. (2006) found that higher performing students (defined as students with higher GPAs) were more likely to complete online SETs.
Although the average SET rating was significantly lower in Year 3 than in the previous 2 years, the magnitude of the numeric difference was small (differences ranged from 0.08 to 0.11, based on a 0–4 Likert-like scale). This difference is similar to the differences Risquez et al. (2015) reported for SET scores after statistically adjusting for the influence of several potential confounding variables. A substantial literature has discussed the appropriate and inappropriate interpretation of SET ratings (Berk, 2013; Boysen, 2015a, 2015b; Boysen et al., 2014; Dewar, 2011; Stark & Freishtat, 2014).
Faculty have often raised concerns about the potential variability of SET scores due to low response rates and thus small sample sizes. However, our analysis indicated that classes with high response rates produced equally variable SET scores as did classes with low response rates. Reviewers should take extra care when they interpret SET scores. Decision makers often ignore questions about whether means derived from small samples accurately represent the population mean (Tversky & Kahneman, 1971). Reviewers frequently treat all numeric differences as if they were equally meaningful as measures of true differences and give them credibility even after receiving explicit warnings that these differences are not meaningful (Boysen, 2015a, 2015b).
Because low response rates produce small sample sizes, we expected that the SET scores based on smaller class samples (i.e., courses with low response rates) would be more variable than those based on larger class samples (i.e., courses with high response rates). Although researchers have recommended that response rates reach the criterion of 60%–80% when SET data will be used for high-stakes decisions (Berk, 2012, 2013; Nulty, 2008), our findings did not indicate a significant reduction in SET score variability with higher response rates.
Implications for Practice
Improving SET Response Rates
When decision makers use SET data to make high-stakes decisions (faculty hires, annual evaluations, tenure, promotions, teaching awards), institutions would be wise to take steps to ensure that SETs have acceptable response rates. Researchers have discussed effective strategies to improve response rates for SETs (Nulty, 2008; see also Berk, 2013; Dommeyer et al., 2004; Jaquett et al., 2016). These strategies include offering empirically validated incentives, creating high-quality technical systems with good human factors characteristics, and promoting an institutional culture that clearly supports the use of SET data and other information to improve the quality of teaching and learning. Programs and instructors must discuss why information from SETs is important for decision-making and provide students with tangible evidence of how SET information guides decisions about curriculum improvement. The institution should provide students with compelling evidence that the administration system protects the confidentiality of their responses.
Evaluating SET Scores
In addition to ensuring adequate response rates on SETs, decision makers should demand multiple sources of evidence about teaching quality (Buller, 2012). High-stakes decisions should never rely exclusively on numeric data from SETs. Reviewers often treat SET ratings as a surrogate for a measure of the impact an instructor has on student learning. However, a recent meta-analysis (Uttl et al., 2017) questioned whether SET scores have any relation to student learning. Reviewers need evidence in addition to SET ratings to evaluate teaching, such as evidence of the instructor’s disciplinary content expertise, skill with classroom management, ability to engage learners with lectures or other activities, impact on student learning, or success with efforts to modify and improve courses and teaching strategies (Berk, 2013; Stark & Freishtat, 2014). As with other forms of assessment, any one measure may be limited in terms of the quality of information it provides. Therefore, multiple measures are more informative than any single measure.
A portfolio of evidence can better inform high-stakes decisions (Berk, 2013). Portfolios might include summaries of class observations by senior faculty, the chair, and/or peers. Examples of assignments and exams can document the rigor of learning, especially if accompanied by redacted samples of student work. Course syllabi can identify intended learning outcomes; describe instructional strategies that reflect the rigor of the course (required assignments and grading practices); and provide other information about course content, design, instructional strategies, and instructor interactions with students (Palmer et al., 2014; Stanny et al., 2015).
Conclusion
Psychology has a long history of devising creative strategies to measure the “unmeasurable,” whether the targeted variable is a mental process, an attitude, or the quality of teaching (e.g., Webb et al., 1966). In addition, psychologists have documented various heuristics and biases that contribute to the misinterpretation of quantitative data (Gilovich et al., 2002), including SET scores (Boysen, 2015a, 2015b; Boysen et al., 2014). These skills enable psychologists to offer multiple solutions to the challenge posed by the need to objectively evaluate the quality of teaching and the impact of teaching on student learning.
Online administration of SET forms presents multiple desirable features, including rapid feedback to instructors, economy, and support for environmental sustainability. However, institutions should adopt implementation procedures that do not undermine the usefulness of the data gathered. Moreover, institutions should be wary of emphasizing procedures that produce high response rates only to lull faculty into believing that SET data can be the primary (or only) metric used for high-stakes decisions about the quality of faculty teaching. Instead, decision makers should expect to use multiple measures to evaluate the quality of faculty teaching.
References
Avery, R. J., Bryant, W. K., Mathios, A., Kang, H., & Bell, D. (2006). Electronic course evaluations: Does an online delivery system influence student evaluations? The Journal of Economic Education, 37(1), 21–37. https://doi.org/10.3200/JECE.37.1.21-37
Berk, R. A. (2012). Top 20 strategies to increase the online response rates of student rating scales. International Journal of Technology in Teaching and Learning, 8(2), 98–107.
Berk, R. A. (2013). Top 10 flashpoints in student ratings and the evaluation of teaching. Stylus.
Boysen, G. A. (2015a). Preventing the overinterpretation of small mean differences in student evaluations of teaching: An evaluation of warning effectiveness. Scholarship of Teaching and Learning in Psychology, 1(4), 269–282. https://doi.org/10.1037/stl0000042
Boysen, G. A. (2015b). Significant interpretation of small mean differences in student evaluations of teaching despite explicit warning to avoid overinterpretation. Scholarship of Teaching and Learning in Psychology, 1(2), 150–162. https://doi.org/10.1037/stl0000017
Boysen, G. A., Kelly, T. J., Raesly, H. N., & Casner, R. W. (2014). The (mis)interpretation of teaching evaluations by college faculty and administrators. Assessment & Evaluation in Higher Education, 39(6), 641–656. https://doi.org/10.1080/02602938.2013.860950
Buller, J. L. (2012). Best practices in faculty evaluation: A practical guide for academic leaders. Jossey-Bass.
Dewar, J. M. (2011). Helping stakeholders understand the limitations of SRT data: Are we doing enough? Journal of Faculty Development, 25(3), 40–44.
Dommeyer, C. J., Baum, P., & Hanna, R. W. (2002). College students’ attitudes toward methods of collecting teaching evaluations: In-class versus on-line. Journal of Education for Business, 78(1), 11–15. https://doi.org/10.1080/08832320209599691
Dommeyer, C. J., Baum, P., Hanna, R. W., & Chapman, K. S. (2004). Gathering faculty teaching evaluations by in-class and online surveys: Their effects on response rates and evaluations. Assessment & Evaluation in Higher Education, 29(5), 611–623. https://doi.org/10.1080/02602930410001689171
Feistauer, D., & Richter, T. (2016). How reliable are students’ evaluations of teaching quality? A variance components approach. Assessment & Evaluation in Higher Education, 42(8), 1263–1279. https://doi.org/10.1080/02602938.2016.1261083
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge University Press. https://doi.org/10.1017/CBO9780511808098
Griffin, T. J., Hilton, J., III, Plummer, K., & Barret, D. (2014). Correlation between grade point averages and student evaluation of teaching scores: Taking a closer look. Assessment & Evaluation in Higher Education, 39(3), 339–348. https://doi.org/10.1080/02602938.2013.831809
Jaquett, C. M., VanMaaren, V. G., & Williams, R. L. (2016). The effect of extra-credit incentives on student submission of end-of-course evaluations. Scholarship of Teaching and Learning in Psychology, 2(1), 49–61. https://doi.org/10.1037/stl0000052
Jaquett, C. M., VanMaaren, V. G., & Williams, R. L. (2017). Course factors that motivate students to submit end-of-course evaluations. Innovative Higher Education, 42(1), 19–31. https://doi.org/10.1007/s10755-016-9368-5
Morrison, R. (2011). A comparison of online versus traditional student end-of-course critiques in resident courses. Assessment & Evaluation in Higher Education, 36(6), 627–641. https://doi.org/10.1080/02602931003632399
Nowell, C., Gale, L. R., & Handley, B. (2010). Assessing faculty performance using student evaluations of teaching in an uncontrolled setting. Assessment & Evaluation in Higher Education, 35(4), 463–475. https://doi.org/10.1080/02602930902862875
Nulty, D. D. (2008). The adequacy of response rates to online and paper surveys: What can be done? Assessment & Evaluation in Higher Education, 33(3), 301–314. https://doi.org/10.1080/02602930701293231
Palmer, M. S., Bach, D. J., & Streifer, A. C. (2014). Measuring the promise: A learning-focused syllabus rubric. To Improve the Academy: A Journal of Educational Development, 33(1), 14–36. https://doi.org/10.1002/tia2.20004
Reiner, C. M., & Arnold, K. E. (2010). Online course evaluation: Student and instructor perspectives and assessment potential. Assessment Update, 22(2), 8–10. https://doi.org/10.1002/au.222
Risquez, A., Vaughan, E., & Murphy, M. (2015). Online student evaluations of teaching: What are we sacrificing for the affordances of technology? Assessment & Evaluation in Higher Education, 40(1), 210–234. https://doi.org/10.1080/02602938.2014.890695
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642. https://doi.org/10.3102/0034654313496870
Stanny, C. J., Gonzalez, M., & McGowan, B. (2015). Assessing the culture of teaching and learning through a syllabus review. Assessment & Evaluation in Higher Education, 40(7), 898–913. https://doi.org/10.1080/02602938.2014.956684
Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research. https://doi.org/10.14293/S2199-1006.1.SOR-EDU.AOFRQA.v1
Stowell, J. R., Addison, W. E., & Smith, J. L. (2012). Comparison of online and classroom-based student evaluations of instruction. Assessment & Evaluation in Higher Education, 37(4), 465–473. https://doi.org/10.1080/02602938.2010.545869
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. https://doi.org/10.1016/j.stueduc.2016.08.007
Venette, S., Sellnow, D., & McIntyre, K. (2010). Charting new territory: Assessing the online frontier of student ratings of instruction. Assessment & Evaluation in Higher Education, 35(1), 101–115. https://doi.org/10.1080/02602930802618336
Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures: Nonreactive research in the social sciences. Rand McNally.
Table 1
Means and Standard Deviations for Response Rates (Course Delivery Method by Evaluation Year)
Administration year     Face-to-face course          Online course
                        M          SD                M          SD
Year 1: 2012            71.72      16.42             32.93      15.73
Year 2: 2013            72.31      14.93             32.55      15.96
Year 3: 2014            47.18      20.11             41.60      18.23
Note. Student evaluations of teaching (SETs) were administered in two modalities in Years 1 and 2: paper based for face-to-face courses and online for online courses. SETs were administered online for all courses in Year 3.
Figure 1
Scatterplot Depicting the Correlation Between Response Rates and Evaluation Ratings
Note. Evaluation ratings were made during the 2014 fall academic term.
Fake News, Fast and Slow:
Deliberation Reduces Belief in False (But Not True) News Headlines
Bence Bago, David G. Rand, and Gordon Pennycook
BUS 204
Hill/Levene Schools of Business, University of Regina
Abstract
What role does deliberation play in susceptibility to political misinformation and “fake news”? The Motivated System 2 Reasoning (MS2R) account posits that deliberation causes people to fall for fake news because reasoning facilitates identity-protective cognition and is therefore used to rationalize content that is consistent with one’s political ideology. The classical account of reasoning instead posits that people ineffectively discern between true and false news headlines when they fail to deliberate (and instead rely on intuition). To distinguish between these competing accounts, we investigated the causal effect of reasoning on media truth discernment using a two-response paradigm. Participants (N = 1,635 Mechanical Turkers) were presented with a series of headlines. For each headline, participants were first asked to give an initial, intuitive response under time pressure and concurrent working memory load. They were then given an opportunity to rethink their response with no constraints, thereby permitting more deliberation. We also compared these responses to a (deliberative) one-response baseline condition where participants made a single choice with no constraints. Consistent with the classical account, we found that deliberation corrected intuitive mistakes: Participants believed false headlines (but not true headlines) more in initial responses than in either final responses or the unconstrained one-response baseline. In contrast—and inconsistent with the MS2R account—we found that political polarization was equivalent across responses. Our data suggest that, in the context of fake news, deliberation facilitates accurate belief formation and not partisan bias.
Fake News, Fast and Slow:
Deliberation Reduces Belief in False (But Not True) News Headlines
Although inaccuracy in news is nothing new, so-called fake news—“fabricated information that mimics news media content in form but not in organizational process or intent” (Lazer et al., 2018, p. 1094)—has become a focus of attention in recent years. Fake news represents an important test case for psychologists: What is it about human reasoning that allows people to fall for blatantly false content? Here we consider this question from a dual-process perspective, which distinguishes between intuitive and deliberative cognitive processing (Evans & Stanovich, 2013; Kahneman, 2011). The theory posits that intuition allows for quick automatic responses that are often based on heuristic cues, whereas effortful deliberation can override and correct intuitive responses.
With respect to misinformation and the formation of (in)accurate beliefs, there is substantial debate about the roles of intuitive versus deliberative processes. In particular, there are two major views: the Motivated System 2 Reasoning (MS2R) account and the classical reasoning account. According to the MS2R account, people engage in deliberation to protect their (often political) identities and to defend their preexisting beliefs. As a result, deliberation increases partisan bias (Charness & Dave, 2017; Kahan, 2013, 2017; Kahan et al., 2012; Sloman & Rabb, 2019). In the context of evaluating news, this means that increased deliberation will lead to increased political polarization and decreased ability to discern true from false. Support for this account comes from studies that correlate deliberativeness with polarization. For example, highly numerate people are more likely to be polarized on a number of political issues, including climate change (Kahan et al., 2012) and gun control (Kahan et al., 2017). Furthermore, Kahan et al. (2017) experimentally manipulated the political congruence of information they presented to participants and found that the ratings of highly numerate participants responded more to the congruence manipulation.
The classical account of reasoning, in contrast, argues that when people engage in deliberation, it typically helps uncover the truth (Evans, 2010; Evans & Stanovich, 2013; Pennycook & Rand, 2019a; Shtulman & McCallum, 2014; Stanovich, 2011; Swami et al., 2014). In the context of misinformation, the classical account therefore posits that it is lack of deliberation that promotes belief in fake news, while deliberation results in greater truth discernment (Pennycook & Rand, 2019a). Support for the classical account comes from correlational evidence that people who are dispositionally more deliberative are better able to discern between true and false news headlines, regardless of the ideological alignment of the content (Pennycook & Rand, 2019a; see also Bronstein et al., 2019; Pennycook & Rand, 2019b). Relatedly, it has been shown that people update their prior beliefs when presented with evidence about the scientific consensus regarding anthropogenic climate change, regardless of their prior motivation or political orientation (van der Linden et al., 2018; see also Lewandowsky et al., 2013). It has also been shown that training to detect fake news decreases belief regardless of partisanship (Roozenbeek & van der Linden, 2019a, 2019b). Although the researchers did not directly manipulate deliberation, these results suggest that engaging in reasoning leads to more accurate, rather than more polarized, beliefs.
To differentiate between the motivated and classical accounts, the key question, then, is this: When assessing news, does deliberation cause an increase in polarization or in accuracy? Here we shed new light on this question by experimentally investigating the causal link between deliberation and polarization (MS2R) versus correction (classical reasoning). Specifically, we used the two-response paradigm, in which participants are presented with the same news headline twice. First, they are asked to give a quick, intuitive response under time pressure and working memory load (Bago & De Neys, 2019). After this, they are presented with the task again and asked to give a final response without time pressure or working memory load (thus allowing unrestricted deliberation). This paradigm has been shown to reliably manipulate the relative roles of intuition and deliberation across a range of tasks (e.g., Bago & De Neys, 2017, 2019; Thompson et al., 2011).
The classical account predicts that false headlines—but not true headlines—will be judged to be less accurate in deliberative (final) responses compared to intuitive (initial) responses and that this should be the case regardless of whether the headlines are politically concordant (e.g., a headline with a pro-Democratic lean for a Democrat) or discordant (e.g., a headline with a pro-Democratic lean for a Republican). In contrast, the MS2R account predicts that politically discordant headlines will be judged to be less accurate and politically concordant headlines will be judged to be more accurate for deliberative responses compared with intuitive responses, regardless of whether the headlines are true or false.
Method
Data, preregistrations of sample sizes and primary analyses, and supplemental materials are available on the Open Science Framework (https://osf.io/egy8p). The preregistered sample for this study was 1,000 online participants recruited from Mechanical Turk (Horton et al., 2011): 400 for the one-response baseline condition and 600 for the two-response experiment. Participants from previous experiments of ours on this topic were not allowed to participate. In total, 1,012 participants were recruited (503 women and 509 men; Mage = 36.9 years). The research project was approved by the University of Regina and the MIT Research Ethics Boards.
Participants rated the accuracy of 16 actual headlines taken from social media: four each of Republican-consistent false, Republican-consistent true, Democrat-consistent false, and Democrat-consistent true. Headlines were presented in a random order and randomly sampled from a pool of 24 total headlines (from Pennycook & Rand, 2019a). For each headline, participants were asked “Do you think this headline describes an event that actually happened in an accurate way?” with the response options “Yes” or “No” (the order of “Yes/No” vs. “No/Yes” was counterbalanced across participants).
In the one-response baseline, participants merely rated the 16 headlines, taking as long as they desired for each. In the two-response experiment, participants made an initial response in which the extent of deliberation was minimized by having participants complete a load task (memorizing a pattern of five dots in a 4 × 4 matrix; see Bago & De Neys, 2019) and respond within 7 s (the average reading time in a pretest with 104 participants). They were then presented with the same headline again—with no time deadline or load—and asked to give a final response.
After rating the 16 headlines, participants completed a variety of demographic measures, including the Cognitive Reflection Test (CRT; Frederick, 2005; Thomson & Oppenheimer, 2016) and a measure of support for the Republican Party versus Democratic Party (which we used to classify headlines as politically concordant vs. discordant).
We analyzed the results using mixed-effect logistic regression models, with headlines and participants as random intercepts. Any analysis that was not preregistered is labeled as post hoc. We necessarily excluded the 4.1% of trials in which individuals missed the initial response deadline. We also preregistered that we would exclude trials in which individuals gave an incorrect response to the load task. However, we found a significant correlation between score on the CRT and performance on the cognitive load task (r = .11, p < .0001); thus, we kept the incorrectly solved load trials to avoid a possible selection bias. Note that, for completeness, we also ran the analysis with the preregistered exclusions and there were no notable deviations from the results presented here. Furthermore, 14 participants did not give a response to our political ideology question and were also excluded from subsequent analyses. As preregistered, we excluded no trials when comparing the one-response baseline to the final response of the two-response paradigm to avoid selection bias (apart from the 14 participants who did not answer the ideology question).
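The mixed-effects logistic regressions described above, with crossed random intercepts for participants and headlines, are most naturally fit with a dedicated multilevel routine (e.g., glmer in R's lme4, or a Bayesian mixed GLM in statsmodels). As a simplified, hedged illustration of the fixed-effects structure only, the sketch below fits the key veracity × response number × concordance model with an ordinary logistic regression; column and file names are hypothetical, and the random intercepts the authors used are omitted.

```python
# Simplified sketch of the core model: accuracy judgment ~ veracity * response * concordance.
# The article fit mixed-effects logistic regressions with crossed random intercepts for
# participants and headlines; this fixed-effects-only version omits those random terms.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format trial data: one row per participant x headline x response, with
# columns rated_accurate (0/1), veracity ("true"/"false"), response ("initial"/"final"),
# and concordant (0/1).
trials = pd.read_csv("two_response_trials.csv")

model = smf.logit(
    "rated_accurate ~ C(veracity) * C(response) * C(concordant)",
    data=trials,
).fit()
print(model.summary())
```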
Results
Politically Neutral Pretest
We begin by reporting the results of a pretest that used politically neutral headlines (N = 623; see the online supplemental materials for details). Because there is no motivation to (dis)believe these headlines, the straightforward prediction was that deliberation would reduce the perceived accuracy of false (but not true) headlines. Indeed, in the two-response experiment there was a significant interaction between headline veracity and response number (initial vs. final; b = 0.47, 95% confidence interval [CI] [0.29, 0.65], p < .0001). Similarly, when comparing across conditions, there was a significant interaction between headline veracity and condition (one-response baseline vs. two-response experiment), using either the initial response (b = 0.61, 95% CI [0.42, 0.79], p < .0001) or the final response from the two-response experiment (b = 0.23, 95% CI [0.04, 0.41], p = .018). This is shown in Figure 1. Deliberation increased the ability to discern true versus false politically neutral headlines.
Within-Subject Analysis
We now turn to our main experiment, where participants judged political headlines, to adjudicate between the MS2R and classical accounts (see Figure 2). First, we compared initial (intuitive) versus final (deliberative) responses within the two-response experiment to investigate the causal effect of deliberation within-subject. Consistent with the classical account, we found a significant interaction between headline veracity and response number (b = 0.36, 95% CI [0.20, 0.52], p < .0001), such that final responses rated false (but not true) news as less accurate relative to initial answers. Moreover, inconsistent with the MS2R account, there was no interaction between political concordance and response number (b = 0.004, 95% CI [−0.16, 0.17], p = .96) and no three-way interaction between response type, political concordance, and headline veracity (b = 0.03, 95% CI [−0.14, 0.21], p = .72). Thus, people were more likely to correct their response after deliberation, regardless of whether the item was concordant or discordant with their political beliefs. Naturally, concordance had some effect—people rated politically concordant headlines as more accurate than discordant ones (b = −0.21, 95% CI [−0.34, −0.07], p = .003)—but this was equally true for initial and final responses.
There was also a significant interaction between political concordance and headline veracity (b = −0.3, 95% CI [−0.47, −0.14], p = .0003), such that the difference between politically concordant and discordant news was larger for real items than for fake items—that is, people were more politically polarized for real news than for fake news—but, again, this was equally true for initial versus final responses. Finally, we found significant main effects of veracity (perceived accuracy was lower for false compared to true news; b = 1.56, 95% CI [1.14, 1.98], p < .0001) and response type (perceived accuracy was lower for final compared to initial responses; b = −0.38, 95% CI [−0.52, −0.25], p < .0001).
We then examined the role of dispositional differences in deliberativeness (as measured by performance on the CRT). We replicated prior findings that people who scored higher on the CRT were better at discerning true versus false headlines. We also found significant interactions with response number such that this relationship between CRT and discernment was stronger for final responses than initial responses (although still present for initial responses).
Between-Subjects Analysis
Finally, we compared perceived accuracy ratings in the two-response experiment with ratings from the one-response baseline (see Figure 2). We first report a post hoc analysis comparing the initial (intuitive) response from the two-response experiment with the one-response baseline. This recapitulates a standard load and time-pressure experiment, in which some participants responded under load and time pressure whereas others did not. We found a significant interaction between headline veracity and condition (b = 0.34, 95% CI [0.17, 0.51], p < .0001); concordance and veracity (b = −0.31, 95% CI [−0.48, −0.15], p = .0002); and veracity, condition, and concordance (b = 0.21, 95% CI [0.03, 0.39], p = .035). Load and time pressure increased perceived accuracy of fake headlines regardless of political concordance. Load and time pressure had no effect for politically concordant real headlines but did decrease perceived accuracy of politically discordant real headlines. Therefore, deliberation causes an increase in truth discernment for both concordant and discordant headlines.
We conclude by comparing the final (deliberative) response from the two-response experiment with the one-response baseline. This allows us to test whether forcing participants to report an initial response in the two-response experiment had some carryover effect on their final response (e.g., anchoring). Although there was no significant interaction between veracity and condition (b = 0.03, 95% CI [−0.14, 0.20], p = .74), there was a significant interaction between veracity and concordance (b = −0.28, 95% CI [−0.44, −0.11], p = .0009) and a significant three-way interaction between veracity, condition, and concordance (b = 0.19, 95% CI [0.01, 0.37], p = .037). Politically discordant items showed an anchoring effect whereby perceived accuracy of fake headlines was lower—and perceived accuracy of real headlines was higher—for the one-response baseline relative to the final response of the two-response condition. For politically concordant items, however, there was no such anchoring effect. Together with the significant anchoring effect among politically neutral headlines observed in our pretest, this suggests that there is something unique about politically concordant items when it comes to anchoring.
Discussion
What is the role of deliberation in assessing the truth of news? We found experimental evidence supporting the classical account over the MS2R account. Broadly, we found that people made fewer mistakes in judging the veracity of headlines—and in particular were less likely to believe false claims—when they deliberated, regardless of whether the headlines aligned with their ideology. Conversely, we found no evidence that deliberation influenced the level of partisan bias or polarization.
Theoretical Implications
These observations have important implications for both theory and practice. From a theoretical perspective, our results provide the first causal evidence regarding the “corrective” role of deliberation in media truth discernment. There has been a spirited debate regarding the role of deliberation and reasoning among those studying misinformation and political thought, but this debate has proceeded without causal evidence regarding the impact of manipulating deliberation on polarization versus correction. To our knowledge, our experiment is the first that enables this to be done—and provides clear support for the classical account of reasoning.
Limitations and Future Directions
Using similar methods to test the role of deliberation in the continued influence effect (Johnson & Seifert, 1994), wherein people continue to believe in misinformation even after it was retracted or corrected (Lewandowsky et al., 2012), is a promising direction for future work. So too is examining the impact of deliberation on the many (psychological) factors that have been shown to influence the acceptance of corrections, such as trust in the source of original information (Swire et al., 2017), underlying worldview or political orientation (Ecker & Ang, 2019), and strength of encoding of the information (Ecker et al., 2011). For example, deliberation might make it easier to accept corrections and update beliefs. Relatedly, the computations taking place during deliberation are underspecified, and therefore future work could benefit from developing formalized, computational models that better characterize underlying computations, such as the decision by sampling model (Stewart et al., 2006).
One limitation of the current work is that it was conducted on nonnationally representative samples from Mechanical Turk. However, it was not imperative for us to have an ideologically representative sample, because we were not making comparisons between holders of one ideology versus another. Instead, we investigated motivated reasoning—which should apply to both Democrats and Republicans—by comparing concordant versus discordant headlines (collapsing across Democrats and Republicans). It would be interesting for future work to replicate our results using a more representative sample to investigate the potential for partisan asymmetries in the impact of deliberation.
Another potential concern is that our sample may not have contained the people who are the most susceptible to misinformation, given that the baseline levels of belief in fake news we observed were low (Kahan, 2018). This problem is endemic in survey-based research on misinformation. Future work could address such issues by using advertising on social media to recruit participants who have actually shared misinformation in the past.
Practical Implications
From a practical perspective, the proliferation of false headlines has been argued to pose potential threats to democratic institutions and people by increasing apathy and polarization or even inducing violent behavior (Lazer et al., 2018). Thus, there is a great deal of interest around developing policies to combat the influence of misinformation. Such policies should be grounded in an understanding of the underlying psychological processes that lead people to fall for inaccurate content. Our results suggest that fast, intuitive (likely emotional; Martel et al., 2019) processing plays an important role in promoting belief in false content—and therefore that interventions that promote deliberation may be effective. Relatedly, this suggests that the success of fake news on social media may be related to users’ tendency to scroll quickly through their newsfeeds and the use of highly emotionally engaging content by authors of fake news. Most broadly, our results support the conclusion that encouraging people to engage in more thinking will be beneficial rather than harmful.
References
Bago, B., & De Neys, W. (2017). Fast logic? Examining the time course assumption of dual process theory. Cognition, 158, 90–109. https://doi.org/10.1016/j.cognition.2016.10.014
Bago, B., & De Neys, W. (2019). The intuitive greater good: Testing the corrective dual process model of moral cognition. Journal of Experimental Psychology: General, 148(10), 1782–1801. https://doi.org/10.1037/xge0000533
Bronstein, M. V., Pennycook, G., Bear, A., Rand, D. G., & Cannon, T. D. (2019). Belief in fake news is associated with delusionality, dogmatism, religious fundamentalism, and reduced analytic thinking. Journal of Applied Research in Memory & Cognition, 8(1), 108–117. https://doi.org/10.1016/j.jarmac.2018.09.005
Charness, G., & Dave, C. (2017). Confirmation bias with motivated beliefs. Games and Economic Behavior, 104, 1–23. https://doi.org/10.1016/j.geb.2017.02.015
Dawson, E., Gilovich, T., & Regan, D. T. (2002). Motivated reasoning and performance on the Wason Selection Task. Personality and Social Psychology Bulletin, 28(10), 1379–1387. https://doi.org/10.1177/014616702236869
Ecker, U. K., & Ang, L. C. (2019). Political attitudes and the processing of misinformation corrections. Political Psychology, 40(2), 241–260. https://doi.org/10.1111/pops.12494
Ecker, U. K., Lewandowsky, S., Swire, B., & Chang, D. (2011). Correcting false information in memory: Manipulating the strength of misinformation encoding and its retraction. Psychonomic Bulletin & Review, 18, 570–578. https://doi.org/10.3758/s13423-011-0065-1
Evans, J. (2010). Thinking twice: Two minds in one brain. Oxford University Press.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241. https://doi.org/10.1177/1745691612460685
Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42. https://doi.org/10.1257/089533005775196732
Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–834. https://doi.org/10.1037/0033-295X.108.4.814
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14, 399–425. https://doi.org/10.1007/s10683-011-9273-9
Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(6), 1420–1436. https://doi.org/10.1037/0278-7393.20.6.1420
Kahan, D. M. (2013). Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, 8(4), 407–424. http://journal.sjdm.org/13/13313/jdm13313
Kahan, D. M. (2017). Misconceptions, misinformation, and the logic of identity-protective cognition (Cultural Cognition Project Working Paper Series No. 164). Yale Law School. https://doi.org/10.2139/ssrn.2973067
Kahan, D. (2018). Who “falls for” fake news? Apparently no one. The Cultural Cognition Project at Yale Law School. Internet Archive. https://web.archive.org/web/20200919114124/http://www.culturalcognition.net/blog/2018/10/25/who-falls-for-fake-news-apparently-no-one.html
Kahan, D. M., Peters, E., Dawson, E. C., & Slovic, P. (2017). Motivated numeracy and enlightened self-government. Behavioural Public Policy, 1(1), 54–86. https://doi.org/10.1017/bpp.2016.2
Kahan, D. M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L. L., Braman, D., & Mandel, G. (2012). The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change, 2, 732–735. https://doi.org/10.1038/nclimate1547
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus & Giroux.
Lazer, D. M. J., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., Metzger, M. J., Nyhan, B., Pennycook, G., Rothschild, D., Schudson, M., Sloman, S. A., Sunstein, C. R., Thorson, E. A., Watts, D. J., & Zittrain, J. L. (2018, March 9). The science of fake news. Science, 359(6380), 1094–1096. https://doi.org/10.1126/science.aao2998
Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106–131. https://doi.org/10.1177/1529100612451018
Lewandowsky, S., Gignac, G. E., & Vaughan, S. (2013). The pivotal role of perceived scientific consensus in acceptance of science. Nature Climate Change, 3, 399–404. https://doi.org/10.1038/nclimate1720
Martel, C., Pennycook, G., & Rand, D. (2019). Reliance on emotion promotes belief in fake news. PsyArXiv. https://doi.org/10.31234/osf.io/a2ydw
Mercier, H., & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2), 57–74. https://doi.org/10.1017/S0140525X10000968
Pennycook, G., & Rand, D. G. (2019a). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39–50. https://doi.org/10.1016/j.cognition.2018.06.011
Pennycook, G., & Rand, D. G. (2019b). Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. Journal of Personality, 88(2), 185–200. https://doi.org/10.1111/jopy.12476
Roozenbeek, J., & van der Linden, S. (2019a). The fake news game: Actively inoculating against the risk of misinformation. Journal of Risk Research, 22(5), 570–580. https://doi.org/10.1080/13669877.2018.1443491
Roozenbeek, J., & van der Linden, S. (2019b). Fake news game confers psychological resistance against online misinformation. Palgrave Communications, 5, Article 65. https://doi.org/10.1057/s41599-019-0279-9
Shtulman, A., & McCallum, K. (2014). Cognitive reflection predicts science understanding. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (pp. 2937–2942). Cognitive Science Society.
Sloman, S. A., & Rabb, N. (2019). Thought as a determinant of political opinion. Cognition, 188, 1–7. https://doi.org/10.1016/j.cognition.2019.02.014
Stanovich, K. (2011). Rationality and the reflective mind. Oxford University Press.
Stewart, N., Chater, N., & Brown, G. D. (2006). Decision by sampling. Cognitive Psychology, 53(1), 1–26. https://doi.org/10.1016/j.cogpsych.2005.10.003
Swami, V., Voracek, M., Stieger, S., Tran, U. S., & Furnham, A. (2014). Analytic thinking reduces belief in conspiracy theories. Cognition, 133(3), 572–585. https://doi.org/10.1016/j.cognition.2014.08.006
Swire, B., Berinsky, A. J., Lewandowsky, S., & Ecker, U. K. (2017). Processing political misinformation: Comprehending the Trump phenomenon. Royal Society Open Science, 4, Article 160802. https://doi.org/10.1098/rsos.160802
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140. https://doi.org/10.1016/j.cogpsych.2011.06.001
Thomson, K. S., & Oppenheimer, D. M. (2016). Investigating an alternate form of the cognitive reflection test. Judgment and Decision Making, 11(1), 99–113. http://journal.sjdm.org/15/151029/jdm151029
van der Linden, S. V., Leiserowitz, A., & Maibach, E. (2018). Scientific agreement can neutralize politicization of facts. Nature Human Behaviour, 2, 2–3. https://doi.org/10.1038/s41562-017-0259-2
Figure 1
True and False Politically Neutral Headlines Rated as Accurate Across Conditions
Note. Error bars are 95% confidence intervals.
Figure 2
True and False Political Headlines Rated as Accurate Across Conditions and Political Concordance
Note. Error bars are 95% confidence intervals.
Figure 1 data (percentage of politically neutral headlines rated as accurate, by headline veracity):
Initial response: false 40, true 68. Final response: false 32, true 70. One-response baseline: false 30, true 72.
Figure 2 data (percentage of political headlines rated as accurate, by headline veracity and political concordance):
Politically concordant: initial response, false 36, true 69; final response, false 30, true 70; one-response baseline, false 30, true 71.
Politically discordant: initial response, false 32, true 57; final response, false 27, true 59; one-response baseline, false 25, true 61.