Teaching Academic Honesty in CS50

David J. Malan

Each spring, educators from around the world gather for ACM’s Technical Symposium on Computer Science Education, otherwise known as SIGCSE, which “addresses problems common among educators working to develop, implement and/or evaluate computing programs, curricula, and courses.” Perennially among those problems is academic dishonesty: instances of plagiarism whereby students submit work that is not, in some way, their own. Indeed, on the schedule for SIGCSE 2018 in Baltimore just last month was a session on “GitHub, Tutors, Relatives, and Friends: Combating the Wide Web of Plagiarism,” aptly sub-subtitled “the Discussion Continues.” Structured as a “BOF flock” (a gathering of b̲irds o̲f a f̲eather) led by Amardeep Kahlon of Austin Community College, the session brought together a roomful of educators (myself among them) to share experiences and ideas:

Plagiarism is of great concern to faculty in all fields, including computer science as it leads to one certain outcome — a compromise not just in student learning but also in the entire academic process. Faculty have tried to find ways to deal with this epidemic such as writing new course materials each semester, putting a larger or entire grade focus on exams, or even asking individual students to explain their assignments. However, plagiarism remains a source of frustration for both faculty and administrators. This BOF will bring interested faculty together to discuss the various and surprising ways in which students plagiarize, the methods of countering plagiarism, and the currently available tools for detecting plagiarism. Questions we will be discussing include: Do students understand plagiarism in the context of writing software? How can we develop an atmosphere that discourages plagiarism? Does such a thing as a “plagiarism-proof” assignment exist? If programmers go to online repositories, modify the code, and use it in professional programs then is it fair to expect the students to do just the opposite?

Cases

The stories shared during the BOF were all too familiar. Indeed, nearly every year, to my knowledge, CS50 unfortunately refers more students than does any other course to Harvard University’s Honor Council (and, prior to Fall 2015, Administrative Board) for reasons of academic dishonesty:

Figure 1. Students referred to Harvard’s Honor Council (and, prior to Fall 2015, Administrative Board) for reasons of academic dishonesty in CS50, from Fall 2007 through Fall 2017. In blue are then-current students, and in red are then-former students (with whom then-current students appeared to have collaborated).

As a percentage of enrollment, those numbers have been highly variable and, until this past fall, trending upward, but, on average, 3.6% of CS50’s student body are referred each year:

Figure 2. Students referred to Harvard’s Honor Council (formerly Administrative Board) for reasons of academic dishonesty in CS50, as a percentage of enrollment, from Fall 2007 through Fall 2017, with trend line superimposed.

The variability is perhaps, in part, explained by ebb and flow in students’ behavior. The peaks in 2007, 2011, and 2016, for instance, might have been at the forefront of some students’ minds in 2008, 2012, and 2017, as we typically share with students in year i the data from year i-1. (Indeed, we typically discuss in lecture not only that we compare but how we compare all submissions for similarities in hopes of prevention.) But it’s also surely, in part, a function of time spent by the staff, myself included, reviewing submissions. I suspect I spent less time in 2009 than in other years, for instance, the result of which was no referrals (0%).

The upward trending, meanwhile, appears to correlate with rising enrollment:

Figure 3. CS50’s enrollment from Fall 2007 through Fall 2017.

To be fair, when it comes to academic dishonesty, it’s not that CS50 students (or CS students more generally) are any less honest than their peers in other courses (or fields) but that we actually look for it and, as computer scientists, are particularly equipped with tools to detect it.

Funnel

The upward trending likely correlates as well with time spent by the staff reviewing submissions. Indeed, whereas in years past I alone reviewed students’ submissions, in recent years has a pipeline of multiple staff reviewed them as well. Using software, we first compare every student’s submission (i.e., problem set) against every other student’s submission. A computer scientist would describe that as O(n²) comparisons, which, in layman’s terms, is a lot! For instance, with 667 students in Fall 2016, each of whom submitted source code (in C, Python, and JavaScript) for 8 problem sets, that’s as many as 667 * 666 / 2 * 8 = 1,776,888 pairwise comparisons. For each problem set, the software ranks those pairs of submissions, with the most similar pairs up top. With human eyes do we then review those “matches” and decide if the similarities are indeed worrisome, unlikely to be the result of just chance. The process overall resembles a funnel: one member of the staff might whittle the topmost pairs down to, say, a few hundred, another might whittle the result to half that, and I, ultimately, might whittle that half to a quarter. Recently, too, have we begun to discuss results in small groups, voting internally on whether to refer.
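The arithmetic above, and the idea of ranking pairs with the most similar up top, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy submissions are hypothetical, and the standard library's difflib.SequenceMatcher stands in for whatever similarity measure the course's software actually uses, which this essay does not describe.

```python
from difflib import SequenceMatcher
from itertools import combinations
from math import comb


def pairwise_comparisons(students: int, problem_sets: int) -> int:
    """Upper bound on comparisons: C(n, 2) = n * (n - 1) / 2 pairs per problem set."""
    return comb(students, 2) * problem_sets


def rank_pairs(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """Score every pair of submissions and sort them, most similar first.

    SequenceMatcher is an assumed stand-in, not the course's actual software.
    """
    scored = [
        (SequenceMatcher(None, submissions[a], submissions[b]).ratio(), a, b)
        for a, b in combinations(sorted(submissions), 2)
    ]
    return sorted(scored, reverse=True)


# Fall 2016 figures from the text: 667 students, 8 problem sets.
print(pairwise_comparisons(667, 8))  # 1776888

# Hypothetical toy submissions for a single problem set.
toy = {
    "alice": "int main(void) { return 0; }",
    "bob": "int main(void) { return 0; }",
    "carol": "print('hello')",
}
for score, a, b in rank_pairs(toy):
    print(f"{a} vs {b}: {score:.2f}")
```

The ranking is only the top of the funnel: a high score marks a pair for human review and decides nothing on its own.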

The result of this funnel is higher confidence on our part that some line has been crossed, since multiple pairs of eyes have reviewed those submissions before any are referred. But we also spend more time overall, and the reality seems to be that the closer we look, the more undue similarities we find. I daresay we’ve gotten better at looking, thanks not only to software but also to experience, though, to be sure, it’s not a skill we enjoy honing.

Regret Clause

It’s worth noting, too, that we’re also more comfortable referring cases these days since the introduction of CS50’s “regret clause,” a single sentence added to CS50’s syllabus in Fall 2014 that now reads as follows:

If you commit some act that is not reasonable but bring it to the attention of the course’s heads within 72 hours, the course may impose local sanctions that may include an unsatisfactory or failing grade for work submitted, but the course will not refer the matter for further disciplinary action except in cases of repeated acts.

Instances of academic dishonesty in CS50 are almost always the result of poor, late-night decisions with stress levels high and deadlines looming. Prior to Fall 2014, though, even if students woke up the next day (or, surely, within 72 hours) and, perhaps after some thought and perspective, felt regret, there was no well-defined process via which they could take ownership of the situation, short of waiting to see if their act would be noticed by term’s end. To be fair, there was also no process preventing students from turning themselves in. But with the potential penalty so high (e.g., required withdrawal from the college), it’s not surprising that few, if any, ever did on their own.

And so via that one sentence did we aspire to transform otherwise purely punitive scenarios into teachable moments. Students began to come forward, and we met them halfway, still applying some penalty when warranted (e.g., zeroing some scores without escalating further) but focusing conversations mostly on what had happened and why and, ultimately, how not to repeat. (Some students came forward unduly worried, having not actually crossed a line, whom we simply thanked and reassured.) On multiple occasions have these conversations brought to light, too, external pressures involving family or health for which we were able to enlist, with the students’ blessing, professional help.

Of all the interventions CS50 has tried over the years, this regret clause has been among our most pedagogically impactful. In its first year alone did 19 students come forward and, save for a downturn in 2016 (perhaps the result of less emphasis by us), nearly as many each year since:

Figure 4. Students who invoked CS50’s “regret clause” in Fall 2014 through Fall 2017.

But it’s worth noting that this regret clause, now in its fourth year, has not materially impacted CS50’s number of cases. Indeed, in each of Fall 2014, Fall 2015, and Fall 2016 did the number of students referred continue to rise (to 4%, 5%, and 10%, respectively, per Figure 2). Moreover, in each of those years did none of the students who invoked CS50’s regret clause even appear on our radar when we later cross-compared all submissions, those students’ among them. In fact, other students’ submissions ranked higher atop our lists of worrisome pairs. Of course, our conversations with those 19 students in Fall 2014, 26 students in Fall 2015, and 7 students in Fall 2016, per Figure 4, were still teachable moments. But those students perhaps represent a demographic for whom we had, in the years prior to 2014, missed opportunities for even more teachable moments.

From Reactive to Proactive

And yet CS50 referred far fewer students to the Honor Council in Fall 2017 than in Fall 2016, 29 (4%) versus 73 (10%), per Figure 1. Moreover, most of those cases were referred within days of the submissions in question rather than toward (or even after) term’s end, as had been historically a reality, given the tens of hours required to review submissions and document cases. Invocations of CS50’s regret clause, meanwhile, returned to their previous range, per Figure 4. And the course had preemptive conversations with 24 students whose submissions seemed worrisomely similar to other students’ though not necessarily over the lines drawn in the syllabus. On those occasions was the objective to understand the students’ methods of collaboration (some of which are very much allowed) and discuss how best to stay on the right side of those lines.

The leaps forward, we think, were not the result of better software (though that, too, does keep getting better) but of an improved human pipeline, inspired by conversations with colleagues at UW. We began the term with our first-ever CS50 Orientation, an hour-long presentation by the course’s heads for prospective students on how to succeed in CS50. Among the start-of-term topics was academic honesty itself, with Dean of Undergraduate Education, Jay Harris, addressing with us the topic head-on. Every week thereafter, thanks to support from Harvard College, SEAS, and DCE, did CS50 also collaborate (reasonably!) with an “academic integrity fellow,” Erin Carvalho (of CS50 AP fame, now in GSE’s Technology, Innovation, and Education program), who now works with CS courses across FAS, CS50 among them, on matters of academic honesty. For CS50 did Erin become the course’s primary point person for preemptive conversations and students’ confidant for invocations of the course’s regret clause, working full-time with (more) students and, as hoped, only part-time on (fewer) actual cases.

Near Occasion of Sin

Of course, Fall 2017 was not without cases referred, so opportunities for teachable moments remain. Of particular interest, in fact, is an idea offered by Christopher Moretti of Princeton in that same BOF at SIGCSE. Christopher shared that he offers students faced with panic and temptation an even more proactive solution, this one for students themselves:

And, if push comes to shove, and you reach your breaking point, up to the last moment before you cross that point of no return, you can email me that you see no other way out but invoke this clause (then close your computer and go to bed!). If you do that, I promise you that I will cast no aspersions, I will hold no prejudice towards this “near occasion of sin”, I will set up an individual meeting ASAP, and together we’ll get you back on track. For your honesty and your return from the brink, I will minimize or waive any lateness penalty.

So will CS50 have something similar in Fall 2018.