Significance Achievement gaps increase income inequality and decrease workplace diversity by contributing to the attrition of underrepresented students from science, technology, engineering, and mathematics (STEM) majors. We collected data on exam scores and failure rates in a wide array of STEM courses that had been taught by the same instructor via both traditional lecturing and active learning, and analyzed how the change in teaching approach impacted underrepresented minority and low-income students. On average, active learning reduced achievement gaps in exam scores and passing rates. Active learning benefits all students but offers disproportionate benefits for individuals from underrepresented groups. Widespread implementation of high-quality active learning can help reduce or eliminate achievement gaps in STEM courses and promote equity in higher education.

Abstract We tested the hypothesis that underrepresented students in active-learning classrooms experience narrower achievement gaps than underrepresented students in traditional lecturing classrooms, averaged across all science, technology, engineering, and mathematics (STEM) fields and courses. We conducted a comprehensive search for both published and unpublished studies that compared the performance of underrepresented students to their overrepresented classmates in active-learning and traditional-lecturing treatments. This search resulted in data on student examination scores from 15 studies (9,238 total students) and data on student failure rates from 26 studies (44,606 total students). Bayesian regression analyses showed that on average, active learning reduced achievement gaps in examination scores by 33% and narrowed gaps in passing rates by 45%. The reported proportion of time that students spend on in-class activities was important, as only classes that implemented high-intensity active learning narrowed achievement gaps. Sensitivity analyses showed that the conclusions are robust to sampling bias and other issues. To explain the extensive variation in efficacy observed among studies, we propose the heads-and-hearts hypothesis, which holds that meaningful reductions in achievement gaps only occur when course designs combine deliberate practice with inclusive teaching. Our results support calls to replace traditional lecturing with evidence-based, active-learning course designs across the STEM disciplines and suggest that innovations in instructional strategies can increase equity in higher education.

In industrialized countries, income inequality is rising and economic mobility is slowing, resulting in strains on social cohesion (1). Although the reasons for these trends are complex, they are exacerbated by the underrepresentation of low-income and racial and ethnic minority students in careers that align with the highest-lifetime incomes among undergraduate majors: the science, technology, engineering, and mathematics (STEM) and health disciplines (2–4). Underrepresentation in STEM is primarily due to attrition. Underrepresented minority (URM) students in the United States, for example, start college with the same level of interest in STEM majors as their overrepresented peers, but 6-y STEM completion rates drop from 52% for Asian Americans and 43% for Caucasians to 22% for African Americans, 29% for Latinx, and 25% for Native Americans (5). Disparities in STEM degree attainment are also pronounced for low-income versus higher-income students (6, 7).

Poor performance, especially in introductory courses, is a major reason why STEM-interested students from all backgrounds switch to non-STEM majors or drop out of college altogether (8–10). Underrepresentation occurs because URM and low-income students experience achievement gaps—examination scores that are lower on average than their overrepresented peers in “gateway” STEM courses, along with failure rates that are higher (11, 12). In some cases, these disparities occur even when researchers control for prior academic performance—meaning that underrepresented students are underperforming relative to their ability and preparation (12). Achievement gaps between overrepresented and underrepresented students have been called “one of the most urgent and intractable problems in higher education” (ref. 13, p. 99).

Previously, most efforts to reduce achievement gaps and increase the retention of underrepresented students in STEM focused on interventions that occur outside of the courses themselves. For example, supplementary instruction programs are sometimes offered as optional companions to introductory STEM courses that have high failure rates. These supplemental sections are typically facilitated by a graduate student or advanced undergraduate, meet once a week, and consist of intensive group work on examination-like problems. Although most studies on supplemental instruction do not report data that are disaggregated by student subgroups, several studies have shown that low-income or URM students—hereafter termed students from minoritized groups in STEM, or MGS—gain a disproportionate benefit (SI Appendix, Table S1). Unfortunately, almost all studies of supplementary instruction fail to control for self-selection bias—the hypothesis that volunteer participants are more highly motivated than nonparticipants (refs. 14 and 15, but see ref. 16). A second widely implemented approach for reducing performance disparities provides multifaceted, comprehensive support over the course of a student’s undergraduate career. These programs may include summer bridge experiences that help students navigate the transition from high school to college, supplementary instruction for key introductory courses, financial aid, early involvement in undergraduate research, mentoring by peers and/or faculty, and social activities (SI Appendix, Table S2). Although these systemic programs have recorded large improvements in STEM achievement and retention for underrepresented students (17, 18), they are expensive to implement, depend on extramural funding, and are not considered sustainable at scale (19). A third approach that occurs outside of normal course instruction consists of psychological interventions that are designed to provide emotional support. 
Some of these exercises have also shown disproportionate benefits for underrepresented students (SI Appendix, Table S3).

Can interventions in courses themselves—meaning, changes in how science is taught—reduce achievement gaps and promote retention in STEM? A recent metaanalysis concluded that, on average, active learning in STEM leads to higher examination scores and lower failure rates for all students, compared to all students in the same courses taught via traditional lecturing (20). However, several reports from undergraduate biology courses also suggest that innovative course designs with active learning can reduce or even eliminate achievement gaps for MGS (12, 21–24). Is there evidence that active learning leads to disproportionate benefits for students from MGS across a wide array of STEM disciplines, courses, instructors, and intervention types? If so, that evidence would furnish an ethical and social justice imperative to calls for comprehensive reform in undergraduate STEM teaching (25).

Our answer to this question is based on a systematic review and individual-participant data (IPD) metaanalysis of published and unpublished studies on student performance. The studies quantified either scores on identical or formally equivalent examinations or the probability of passing the same undergraduate STEM course under active learning versus traditional lecturing (Materials and Methods). The contrast with traditional lecturing is appropriate, as recent research has shown that this approach still dominates undergraduate STEM courses in North America (26). In addition, passive and active approaches to learning reflect contrasting theories of how people learn. Although styles of lecturing vary, all are instructor-focused and grounded in a theory of learning that posits direct transmission of information from an expert to a novice. Active learning, in contrast, is grounded in constructivist theory, which holds that humans learn by actively using new information and experiences to modify their existing models of how the world works (27–30).

To be admitted to this study, datasets needed to disaggregate student information by race and ethnicity (or URM status) or by students’ socioeconomic status (e.g., by means of Pell Grant eligibility). These data allowed us to identify students from MGS. Although combining low-income and URM students devalues the classroom experiences of individual students or student groups, the combination is common in the literature (6, 31), represents the student groups of most concern to science policy experts (31), and increases the statistical power in the analysis by using student categories that may be reported differently by researchers, but often overlap (6, 10).
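The combined MGS category described above can be pictured as a simple union of the two reported indicators. The following is only an illustrative sketch: the `StudentRecord` type, its field names, and the rule for missing values are assumptions made for this example, not the coding scheme the authors used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    """Hypothetical record; field names are illustrative only."""
    urm: Optional[bool]            # None when a study did not report URM status
    pell_eligible: Optional[bool]  # None when a study did not report income status


def is_mgs(record: StudentRecord) -> Optional[bool]:
    """Flag a student as MGS if any reported indicator is true.

    Assumption: a student is classified using whichever indicators the
    study reported; if none were reported, the status is unknown (None).
    """
    reported = [flag for flag in (record.urm, record.pell_eligible) if flag is not None]
    if not reported:
        return None
    return any(reported)


print(is_mgs(StudentRecord(urm=True, pell_eligible=None)))
print(is_mgs(StudentRecord(urm=False, pell_eligible=True)))
print(is_mgs(StudentRecord(urm=False, pell_eligible=False)))
```

Because the two categories often overlap, a union of this kind counts a student who is both URM and low-income only once.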

Our literature search, coding criteria, and data gathering resulted in datasets containing 1) 9,238 individual student records from 51 classrooms, compiled in 15 separate studies, with data from identical or formally equivalent examinations (32); and 2) 44,606 individual student records from 174 classrooms, compiled in 26 separate studies, with data on course passing rates, usually quantified as 1 minus the proportion of D or F final grades and withdrawals (33).
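The passing-rate definition above (1 minus the proportion of D or F final grades and withdrawals) can be written out as a short calculation. This is a minimal sketch; the function name, the letter-grade encoding, and the sample roster are hypothetical and are not taken from the deposited datasets.

```python
# Outcomes counted as not passing, following the definition in the text:
# D grades, F grades, and withdrawals (encoded here as "W" by assumption).
NON_PASSING = {"D", "F", "W"}


def passing_rate(final_grades):
    """Return 1 minus the proportion of D, F, and W outcomes."""
    if not final_grades:
        raise ValueError("final_grades must not be empty")
    # Compare on the first letter so that "D+" or "d-" still counts as a D.
    non_passing = sum(
        1 for grade in final_grades if grade.strip().upper()[:1] in NON_PASSING
    )
    return 1 - non_passing / len(final_grades)


# Hypothetical roster of ten final outcomes, three of them non-passing.
roster = ["A", "B+", "C", "D", "F", "W", "B", "C-", "A-", "B"]
print(f"passing rate: {passing_rate(roster):.0%}")
```

Individual studies may define passing differently, which is why the text says the rate is "usually" quantified this way.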

IPD metaanalyses, based on datasets like ours, are considered the most reliable approach to synthesizing evidence (34, 35). We analyzed the data using one-step hierarchical Bayesian regression models (Materials and Methods and SI Appendix, SI Materials and Methods). Below, we report the mean of the posterior distribution as well as the 95% credible intervals (CIs) for each estimate. The interpretation of the 95% CIs is “95% of the time, the estimate falls within these bounds.”
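To make the reported summaries concrete, the sketch below computes a posterior mean and an equal-tailed 95% credible interval from a set of posterior draws. The draws are simulated from a normal distribution chosen only to resemble the scale of the examination-score gap under lecturing; they are a stand-in, not output from the authors' hierarchical models.

```python
import random
import statistics


def summarize_posterior(draws):
    """Return (mean, lower, upper) using an equal-tailed 95% interval."""
    # quantiles(n=40) returns 39 cut points spaced 2.5% apart, so the
    # first and last cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(draws, n=40)
    return statistics.fmean(draws), cuts[0], cuts[-1]


# Simulated stand-in for posterior draws (assumption: normal, sd 0.035).
rng = random.Random(0)
draws = [rng.gauss(-0.62, 0.035) for _ in range(10_000)]

mean, lower, upper = summarize_posterior(draws)
print(f"posterior mean {mean:.2f} SDs, 95% CI {lower:.2f} to {upper:.2f}")
```

Read this way, the interval is the range that contains the central 95% of the posterior draws for the estimate.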

Results We found that, on average, the standardized achievement gap between MGS and non-MGS students on identical or formally equivalent examinations was −0.62 SDs in courses based on traditional lecturing (95% CI: −0.69 to −0.55). In courses that included active learning, this gap was −0.42 SDs (95% CI: −0.48 to −0.35) (Fig. 1A and SI Appendix, Table S4). Across many courses and sections, this represents a 33% reduction in achievement gaps on examinations in the STEM disciplines. Although students from MGS experience lower examination scores on average than students from non-MGS across both instructional types, the disparity is significantly reduced when instructors employ active learning.

Fig. 1. Average achievement gaps are smaller in active-learning classes than traditional-lecturing classes. (A) Model-based estimates for the average achievement gaps in examination scores across STEM for students from MGS versus non-MGS under traditional lecturing (gold) and active learning (purple). The data are in units of SDs (SI Appendix, SI Materials and Methods). (B) Model-based estimates for the average achievement gaps in percentage of students passing a STEM course for students from MGS versus non-MGS. In both graphs, points show averages and the vertical bars show 95% Bayesian CIs; the dashed horizontal lines represent no gap in performance.

Furthermore, we find that, on average, students from MGS pass at lower rates than students from non-MGS by 7.1% (95% CI: −8.4% to −6.6%) with traditional lecturing. The difference in passing is reduced to only −3.9% (95% CI: −5.2% to −2.5%) with active learning (Fig. 1B and SI Appendix, Table S5). When compared to traditional lecturing across an array of disciplines, courses, and sections, active learning reduced the gap in probability of passing between students from MGS versus students from non-MGS by 45%.

A more granular analysis of changes in achievement gaps shows extensive variation among studies (Fig. 2).
In 10 of the 15 studies with examination score data, students from MGS showed disproportionate gains under active learning relative to students from non-MGS (Fig. 2A). In 8 of these 10 cases, students from MGS still perform less well than students from non-MGS in both treatments, although achievement gaps shrink. The data from the remaining five studies show that active learning benefitted students from non-MGS more than students from MGS, in terms of performance on identical examinations. The analysis of passing rate data shows a similar pattern, with students from MGS showing a disproportionate reduction in failure rates in 15 of the 26 studies. In the remaining 11 studies, active learning benefitted students from non-MGS more than students from MGS in terms of lowering failure rates.

Fig. 2. The magnitude of achievement gaps in active-learning (AL) versus passive-learning (PL) classes varies among studies. Each data point represents a single course; the majority of active-learning courses narrowed the achievement gap. In both panels, the red dashed 1:1 line indicates no difference in the gap between active and passive learning; the white area above the line indicates courses where the gap narrowed. (A) The Upper Left quadrant indicates studies where gaps in examination scores reversed, from students from non-MGS doing better under lecturing to students from MGS doing better under active learning. The Upper Right quadrant represents studies where gaps in examination scores favored students from MGS under both traditional lecturing and active learning. The Bottom Left quadrant signifies studies where students from non-MGS averaged higher examination scores than students from MGS under both passive and active instruction. The Bottom Right quadrant denotes studies where students from non-MGS outperformed students from MGS under active learning, but students from MGS outperformed students from non-MGS under traditional lecturing. Both axes are in units of SDs and indicate difference in performance between MGS and non-MGS students. (B) The Upper Left quadrant indicates studies where gaps in the probability of passing favored students from non-MGS under lecturing but MGS under active learning. The Upper Right quadrant represents studies where the probability of passing was higher for students from MGS versus non-MGS under both passive and active learning. The Lower Left quadrant signifies studies where students from MGS were less likely to pass than students from non-MGS under both modes of instruction. The Lower Right quadrant denotes studies where students from MGS were more likely than non-MGS to pass under traditional lecturing but less likely than non-MGS to pass under active learning. Both axes are percent passing and indicate the difference in performance between MGS and non-MGS students.

Sensitivity analyses indicate that our results were not strongly influenced by unreasonably influential studies (SI Appendix, Fig. S2) or sampling bias caused by unpublished studies with low effect sizes—the file drawer effect. The symmetry observed in funnel plots for examination score and passing rate data, and the approximately Gaussian distributions of the changes in gaps in each study, each suggest that our sampling was not biased against studies with negative, no, or low effect sizes (Fig. 3).

Fig. 3. Results appear robust to sampling bias. Funnel plots were constructed with the vertical axis indicating the 95% CI for the difference, under active learning versus lecturing, in (A) examination score gaps or (B) passing rate gaps, and the horizontal axis indicating the change in gaps. The dashed red vertical line shows no change; the solid black line shows the average change across studies. The histograms show data on (C) examination scores (in SDs) and (D) percent passing. The vertical line at 0 shows no change in the achievement gap. If the analyses reported in this study were heavily impacted by the file drawer effect, the distributions in A–D would be strongly asymmetrical, with low density on the lower left of each funnel plot and much less density to the left of the no-change line on the histograms.

Some of the observed variation in active learning’s efficacy in lowering achievement gaps can be explained by intensity—the reported percentage of class time that students spend engaged in active-learning activities (SI Appendix, Table S6). For both examination scores and passing rates, the amount of active learning that students do is positively correlated with narrower achievement gaps: Only classes that implement high-intensity active learning narrow achievement gaps between students from MGS and non-MGS (Fig. 4). In terms of SDs in examination scores, students from MGS vs. non-MGS average a difference of −0.48 (95% CI: −0.60 to −0.37) with low-intensity active learning, but only −0.36 (95% CI: −0.45 to −0.27) with high-intensity active learning (Fig. 4A and SI Appendix, Table S7). These results represent a 22% and 42% reduction, respectively, in the achievement gap relative to traditional lecturing. Similarly, on average, differences in passing rates for students from MGS vs. non-MGS are −9.6% (95% CI: −11.0% to −8.2%) with low-intensity active learning, but only −2.0% (95% CI: −3.3 to 0.63%) with high-intensity active learning (Fig. 4B and SI Appendix, Table S8). These changes represent a 16% increase and a 76% reduction, respectively, in the achievement gap relative to passive learning.

Fig. 4. Treatment intensity is positively correlated with narrower gaps. High-intensity active-learning courses have narrower achievement gaps between MGS and non-MGS students. In both graphs, points show averages and the vertical bars show 95% Bayesian CIs; the dashed horizontal lines represent no gap in performance. (A) Examination score gap. (B) Gap in percent passing. Intensity is defined as the reported proportion of time students spent actively engaged on in-class activities (SI Appendix, SI Materials and Methods).

Other moderator analyses indicated that class size, course level, and discipline—for fields represented by more than one study in our dataset—did not explain a significant amount of variation in how achievement gaps changed (SI Appendix, Tables S9–S11). Although regression models indicated significant heterogeneity based on the type of active learning implemented, we urge caution in interpreting this result (SI Appendix, Tables S12 and S13). Active-learning types are author-defined and currently represent general characterizations. They are rarely backed by objective, quantitative data on the course design involved, making them difficult to interpret and reproduce (SI Appendix).
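The percentage reductions quoted in this section can be approximated from the rounded gap estimates in the text. The sketch below assumes the reduction is computed as 1 minus the ratio of the active-learning gap to the lecturing gap; the function name is hypothetical. With the rounded inputs the overall examination result comes out near 32% rather than the reported 33%, presumably because the published figure was derived from unrounded model estimates.

```python
def relative_gap_reduction(gap_lecture, gap_active):
    """Fraction by which a gap shrinks relative to the lecturing baseline.

    A negative return value would mean the gap widened.
    """
    if gap_lecture == 0:
        raise ValueError("the baseline gap must be nonzero")
    return 1 - gap_active / gap_lecture


# Rounded estimates quoted in the text: (lecturing gap, active-learning gap).
comparisons = {
    "exam scores, all active learning (SDs)": (-0.62, -0.42),
    "exam scores, low intensity (SDs)": (-0.62, -0.48),
    "exam scores, high intensity (SDs)": (-0.62, -0.36),
    "passing rate, all active learning (%)": (-7.1, -3.9),
}

for label, (lecture, active) in comparisons.items():
    print(f"{label}: {relative_gap_reduction(lecture, active):.1%}")
```

The intensity-specific passing-rate changes (a 16% increase and a 76% reduction) are not reproduced here because they appear to rest on a lecturing baseline from the intensity model that the main text does not quote.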

Discussion Earlier work has shown that all students benefit from active learning in undergraduate STEM courses compared to traditional lecturing (20). The analyses reported here show that across STEM disciplines and courses, active learning also has a disproportionately beneficial impact for URM students and for individuals from low-income backgrounds. As a result, active learning leads to important reductions in achievement gaps between students from MGS and students from non-MGS in terms of examination scores and failure rates in STEM. Reducing achievement gaps and increasing the retention of students from MGS are urgent priorities in the United States and other countries (36–38).

Our results suggest that, for students from MGS, active learning’s beneficial impact on the probability of passing a STEM course is greater than its beneficial impact on examination scores. Course grades in most STEM courses are largely driven by performance on examinations, even in active-learning courses that offer many nonexam points (39). As a result, achievement gaps on examinations often put underrepresented students in a “danger zone” for receiving a D or F grade or deciding to withdraw. On many campuses, median grades in introductory STEM courses range from 2.5 to 2.8 on a 4-point scale—equivalent to a C+/B− on a letter scale. In these classes, a final grade of 1.5 to 1.7 or higher—a C− or better—is required to continue in the major. If URM or low-income students have average examination scores that are 0.4 to 0.6 grade points below the scores of other students (12), then underrepresented students are averaging grades that are in or under the 2.0 to 2.4 or C range—putting many at high risk of not meeting the threshold to continue. As a result, even a small increase in examination scores can lift a disproportionately large number of URM and low-income students out of the danger zone where they are prevented from continuing.
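The danger-zone argument can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that grades in each group are normally distributed with a common spread of 0.8 grade points and that every student gains 0.2 grade points; neither value comes from the study. The group means (2.6 and 2.1) and the 1.7 threshold are picked from within the ranges quoted above.

```python
from statistics import NormalDist


def share_below(threshold, mean, sd):
    """Share of a normally distributed group scoring below the threshold."""
    return NormalDist(mu=mean, sigma=sd).cdf(threshold)


def share_lifted(threshold, mean, sd, boost):
    """Share of the group moved above the threshold by a uniform boost."""
    return share_below(threshold, mean, sd) - share_below(threshold, mean + boost, sd)


THRESHOLD = 1.7   # top of the 1.5 to 1.7 cutoff range quoted in the text
SD = 0.8          # hypothetical spread of grades (assumption)
BOOST = 0.2       # hypothetical uniform gain in grade points (assumption)

higher_mean = 2.6               # within the 2.5 to 2.8 median range
lower_mean = higher_mean - 0.5  # within the 0.4 to 0.6 gap range

lifted_lower = share_lifted(THRESHOLD, lower_mean, SD, BOOST)
lifted_higher = share_lifted(THRESHOLD, higher_mean, SD, BOOST)
print(f"lower-mean group lifted above threshold:  {lifted_lower:.1%}")
print(f"higher-mean group lifted above threshold: {lifted_higher:.1%}")
```

Under these assumptions the same boost moves a larger share of the lower-mean group above the cutoff, because more of that group's distribution sits near the threshold. Real grade distributions are bounded and often skewed, so the numbers are indicative only.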
The boost could be disproportionately beneficial for students from MGS even if average grades are still low, because URM students in STEM are less grade-sensitive and more persistent, on average, than non-URMs (11, 40). This grittiness may be based on differences in motivation, as students from MGS are more likely than students from non-MGS to be driven by a commitment to family and community (41–43).

It is critical to realize, however, that active learning is not a silver bullet for mitigating achievement gaps. In some of the studies analyzed here, active learning increased achievement gaps instead of ameliorating them. Although the strong average benefit to students from MGS supports the call for widespread and immediate adoption of active-learning course designs and abandonment of traditional lecturing (23, 36), we caution that change will be most beneficial if faculty and administrators believe that underrepresented students are capable of being successful (44) and make a strong commitment to quality in teaching. Here, we define teaching quality as fidelity to evidence-based improvements in course design and implementation. Fidelity in implementation is critical, as research shows that it is often poor (45–47). In addition, faculty who are new to active learning may need to start their efforts to redesign courses with low-intensity interventions that are less likely to improve student outcomes (Fig. 4). If so, the goal should be to persist, making incremental changes until all instructors are teaching in a high-intensity, evidence-based framework tailored to their courses and student populations (12, 39, 48).

We propose that two key elements are required to design and implement STEM courses that reduce, eliminate, or reverse achievement gaps: deliberate practice and a culture of inclusion.
Deliberate practice emphasizes 1) extensive and highly focused efforts geared toward improving performance—meaning that students work hard on relevant tasks, 2) scaffolded exercises designed to address specific deficits in understanding or skills, 3) immediate feedback, and 4) repetition (49). These are all facets of evidence-based best practice in active learning (38, 50, 51). Equally important, inclusive teaching emphasizes treating students with dignity and respect (52), communicating confidence in students’ ability to meet high standards (53), and demonstrating a genuine interest in students’ intellectual and personal growth and success (54, 55). We refer to this proposal as the heads-and-hearts hypothesis and suggest that the variation documented in Fig. 2 results from variation in the quality and intensity of deliberate practice and the extent to which a course’s culture supports inclusion. We posit that these head-and-heart elements are especially important for underrepresented students, who often struggle with underpreparation due to economic and educational disparities prior to college (55), as well as social and psychological barriers such as stereotype threat and microaggressions (56, 57). Our heads-and-hearts hypothesis claims that the effect of evidence-based teaching and instructor soft skills is synergistic for underrepresented students, leading to the disproportionate gains that are required to reduce achievement gaps (36, 57). Why might deliberate practice and inclusive teaching be particularly effective for MGS? Our answer relies on three observations or hypotheses. 1) If students from MGS have limited opportunities for quality instruction in STEM prior to college compared to students from overrepresented groups, they could receive a disproportionate benefit from the extensive and scaffolded time on task that occurs in a “high-intensity” active-learning classroom. 
Data on the impact of active-learning intensity reported here are consistent with this deliberate practice element of the heads-and-hearts hypothesis.

2) For students from MGS, the popular perception of STEM professionals as white or Asian males, the fact of underrepresentation in most STEM classrooms, stereotype threat, and microaggressions in the classroom can all raise the questions, “Do I belong here?” and “Am I good enough?” All students benefit from classroom cultures that promote self-efficacy, identity as a scientist, and sense of belonging in STEM (58), and students in active-learning course designs that reduced or eliminated achievement gaps have reported an increased sense of community and self-efficacy compared to their peers in the lecture-intensive version of the same course (22, 23). Similarly, recent research on 150 STEM faculty indicated that the size of achievement gaps is correlated with instructor theories of intelligence. Small gaps are associated with faculty who have a growth or challenge mindset, which emphasizes the expandability of intelligence and is inclusion-oriented, while larger gaps are correlated with faculty who have a fixed mindset, which interprets intelligence as innate and immutable and is therefore exclusion or selection-oriented (44). It is not yet clear, however, whether a change in classroom culture occurs in active-learning classrooms because of the emphasis on peer interaction, changes in student perception of the instructor, or both.

3) Synergy between deliberate practice and inclusive teaching could occur if a demonstrated commitment to inclusion in an active-learning classroom inspires disproportionately more intense effort from students from MGS. In support of this claim, a general chemistry companion course that used psychological interventions to address belonging, stereotype threat, and other issues, combined with evidence-based study skills training and intensive peer-led group problem-solving, narrowed achievement gaps while controlling for self-selection bias (16). Explicit and rigorous testing of the heads-and-hearts hypothesis has yet to be done, however, and should be a high priority in discipline-based education research. Our conclusions are tempered by the limitations of the study’s sample size. Although our search and screening criteria uncovered 297 studies of courses with codable data on the overall student population, we were only able to analyze a total of 41 studies with data on students from MGS. Based on this observation, we endorse recent appeals for researchers to disaggregate data and evaluate how course interventions impact specific subgroups of students (13, 22, 34, 59, 60). Because our analyses of moderator variables are strongly impacted by sample size, we urge caution in interpreting our results on class size, course level, and discipline. Our data on type of active learning are also poorly resolved, because publications still routinely fail to report quantitative data on the nature of course interventions, such as records from classroom observation tools (e.g., ref. 61 and SI Appendix, Table S12). We are also alert to possible sampling bias in terms of instructor and institution type (SI Appendix). If many or most of the researchers who contributed data to the study also acted as instructor of record in the experiments, they may not be representative of the faculty as a whole. 
Specifically, they may be more likely than most peers to be both well-versed in the literature on evidence-based teaching and highly motivated to support student success. Finally, the existing literature overrepresents courses at research-intensive institutions and underrepresents teaching-intensive institutions. Our recommendation to pursue active learning in all STEM courses is tempered by the dearth of evidence from the community college context, where over 40% of all undergraduates—and a disproportionate percentage of students from MGS—actually take their introductory courses (62, 63). Efforts to increase study quality and to intensify research on innovations that reduce achievement gaps should continue. Researchers have noted that “One is hard pressed to find a piece in academic or popular writings in the past century that does not use the word crisis to describe inequities in educational attainment” (ref. 37, p. 1; emphasis original). The data reported here offer hope in the form of a significant, if partial, solution to these inequities. Reforming STEM courses in an evidence-based framework reduces achievement gaps for underrepresented students and increases retention in STEM course sequences—outcomes that should help increase economic mobility and reduce income inequality.

Acknowledgments We thank Ian Breckheimer, Ailene Ettinger, and Roddy Theobald for statistical advice and Darrin Howell for help with hand-searching journals. We are deeply indebted to the community of researchers who supplied the raw data on student demographics and outcomes that made this study possible. Financial support was provided by the University of Washington College of Arts and Sciences.

Footnotes Author contributions: M.J.H., E.T., and S.F. designed research; E.J.T., M.J.H., E.T., S.A., E.N.A., S.B., N.C., D.L.C., J.D.C., G.D., J.A.G., K.H., J.H., N.I., L.J., H.J., M.K., M.E.L., C.E.L., A.L., S.N., V.O., S.O., B.R.P., S.B.P., D.L.S., I.W.C.-S., K.E.S., V.S., C.V., C.R.W., K.Z., and S.F. performed research; E.J.T. analyzed data; and E.J.T., M.J.H., E.T., S.A., E.N.A., S.B., N.C., D.L.C., J.D.C., G.D., J.A.G., K.H., J.H., N.I., L.J., H.J., M.K., M.E.L., C.E.L., A.L., S.N., V.O., S.O., B.R.P., S.B.P., D.L.S., I.W.C.-S., K.E.S., V.S., C.V., C.R.W., K.Z., and S.F. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Data deposition: The data reported in this paper have been deposited in GitHub, https://github.com/ejtheobald/Gaps_Metaanalysis.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1916903117/-/DCSupplemental.