Participants

We offered a semester-long seminar as part of Princeton University’s application-based Freshman Seminar Program. Between 2013 and 2017, 105 students participated in the seminar (60 female; M age = 18.3, SD = 0.81), in seven semester-long courses of fifteen students each. During the same period, 56 control students were recruited from among freshmen at Princeton University (26 female; M age = 18.1, SD = 0.67).

Due to institutional constraints, we could not randomize students into the seminar and control groups and instead had to use standard enrollment mechanisms; our study was therefore a quasi-experiment. Recruitment of control students focused on individuals who expressed interest in the seminar but were not enrolled due to limited space in the class. As a result, despite our use of a convenience sample, we were able to assemble a group of control students that did not differ significantly from the seminar group in their intended college majors at pretest, indicating that they took similar courses (other than the seminar). Moreover, pretest scores and self-reported SAT subject scores indicated that students in the two groups had comparable skills at pretest.

Control students did not receive explicit training in argument analysis using either visual or non-visual techniques. Control students either volunteered to participate without monetary compensation (N = 10) or were paid $20. Comparing paid and unpaid participants revealed no meaningful differences in test scores or outcome measures. We base our analysis on data from all control students and all students who enrolled in any iteration of the seminar. Self-reported SAT and ACT scores for our sample were consistent with admissions data,51 suggesting that our findings are relevant to students at selective colleges more generally. All participants provided informed consent, and all study procedures were approved by the Princeton University IRB.

Seminar sessions

We trained students to practice close reading and argument analysis using web-based argument-visualization software. During class sessions, students worked in groups of two or three to analyze excerpts from philosophical texts and construct visualizations of the argument conveyed in each text. Unlike the simple example passages presented in Figs. 1 and 3, most texts used in the seminar were drawn from professional journals and books (e.g., Judith Jarvis Thomson’s “A Defense of Abortion,” Philippa Foot’s “Killing and Letting Die,” and David Lewis’ “Are We Free to Break the Laws?”). To maintain an appropriate level of difficulty, these texts were sometimes adapted by the instructors. While students worked, three instructors circulated around the room, providing help or philosophical discussion when appropriate. A typical three-hour seminar was organized around three or four such argument analysis exercises and associated discussions.

Fig. 3 (a) Sample text. (b) Sample fill-in-the-blank exercise from an introductory problem set assigned early in the semester. Dashed borders mark claims that are implicit in the text (i.e., charitable assumptions required by the argument). Supporting reasons are represented by horizontal green brackets labeled “because”; objections are represented by horizontal red brackets labeled “however.”

Problem sets

In weekly problem sets, students constructed argument visualizations based on excerpts from contemporary academic texts. Instructors encouraged students to collaborate on these assignments. At the beginning of the semester, problem sets consisted of simple fill-in-the-blank exercises (Fig. 3). After 4 weeks of training on pre-made exercises with progressively less scaffolding, students advanced to visualizing and analyzing arguments from scratch. Throughout the semester, additional support was provided in the form of weekly problem-set sessions hosted by the instructors, who provided general guidance on the current assignment, helped students to identify gaps in their understanding of the reading, and suggested ways for students to improve their work. Students then incorporated this feedback before submitting their work for assessment.

Students completed weekly surveys in which they reported how long they spent on the problem set, how difficult they found it, and to what degree it helped them to understand their readings. Each week, we used students’ feedback on the difficulty of previous problem sets to calibrate the difficulty of the next one, aiming for a target difficulty rating of 4 out of 5. The course was designed to ensure that students practiced analyzing arguments for at least 10 hours per week, including both classwork and homework.

In addition to coaching during the sessions, students received detailed and individualized written feedback on their problem sets every week, which indicated errors in students’ understanding of the texts as manifested in their argument visualizations. To convey a more accurate interpretation of the text, this feedback was often supplemented by a model solution. Common errors in representation included mistaking a premise for a support (and vice versa), representing co-premises as independent reasons (and vice versa), including unnecessary premises, and neglecting to represent important assumptions. During the semester, students were not informed of their grades in any form (e.g., alphabetical, numerical, checks/crosses), as we felt this would distract from the more valuable written feedback.

Quantifying analytical-reasoning skills

To assess whether this intensive training in argument visualization leads to generalized benefits for analytical reasoning, we administered equated LSAT Logical Reasoning forms (Law School Admission Council; Newtown, PA) at the beginning and end of the semester (i.e., 85 days later). These forms are highly reliable (KR-20 = 0.81, 0.79), have well-known psychometric properties,28 are heavily focused on argumentation skills, and are appropriately difficult for our sample. Furthermore, these forms include texts and pose questions very different from those presented during the seminar, making them an effective test of students’ ability to transfer their skills to a new context. To control for possible differences between the forms, we randomly assigned half of the students to take form A as pretest and form B as posttest, and reversed the order for the remaining students.
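The counterbalanced assignment of test forms described above can be illustrated with a minimal Python sketch. This is not the procedure the authors actually ran; the function name `assign_forms`, the `seed` parameter, and the student identifiers are hypothetical and serve only to show one way of randomly splitting a cohort into two form orders.

```python
import random


def assign_forms(student_ids, seed=0):
    """Randomly assign each student a (pretest, posttest) form order.

    Half of the students receive form A then form B; the rest receive
    form B then form A. With an odd number of students, the extra
    student falls into the B-then-A group (an arbitrary choice made
    for this sketch).
    """
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)          # random order decides group membership
    half = len(shuffled) // 2
    assignment = {}
    for i, sid in enumerate(shuffled):
        assignment[sid] = ("A", "B") if i < half else ("B", "A")
    return assignment


# Hypothetical identifiers, for illustration only.
students = [f"s{n:02d}" for n in range(10)]
orders = assign_forms(students, seed=1)
```

Because every student sees both forms, and each form serves as pretest for half of the sample, any difference in difficulty between the forms is balanced across the pretest and posttest rather than confounded with the effect of training.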

Assessing the quality of students’ essays

We stripped all identifying information from both seminar and control students’ essays. A grader blind to the hypothesis under study evaluated each essay using the following three-item scale:

1. How effectively structured is the essay?
2. How accurately presented are the relevant arguments?
3. How well does the student understand the relevant arguments?

Items were counterbalanced for order and rated on nine-point scales. Finally, essays were assigned letter grades according to the grader’s own standards for undergraduate essays.

Our three-item scale for rating the quality of students’ essays showed high internal consistency (α = 0.95), so we summed the grader’s responses to the three items to form an overall score for each essay.
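For readers unfamiliar with the statistic, the sketch below shows how an internal-consistency coefficient of this kind and the summed essay scores could be computed in Python. It assumes that α refers to Cronbach's alpha, which the text does not state explicitly; the function names and the ratings are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows, one row of k item ratings per essay.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(ratings[0])
    items = list(zip(*ratings))                       # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def essay_scores(ratings):
    """Overall essay score: the sum of the item ratings for each essay."""
    return [sum(row) for row in ratings]


# Hypothetical nine-point ratings for five essays (structure, accuracy,
# understanding); these numbers are invented for illustration.
example = [
    [7, 8, 7],
    [4, 4, 5],
    [9, 9, 8],
    [2, 3, 2],
    [6, 6, 7],
]
alpha = cronbach_alpha(example)
totals = essay_scores(example)
```

When the three items rise and fall together across essays, the variance of the summed scores is large relative to the individual item variances and alpha approaches 1, which is the usual justification for collapsing the items into a single score.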

Code availability

In collaboration with the developers of MindMup, we created a free, open-source platform for argument visualization, which is available at http://argument.mindmup.com. Readers who wish to learn more about using argument visualization in their own teaching may find useful resources collected at http://www.philmaps.com.