After conducting automated analyses of what students do from moment to moment as they learn to write computer programs, Stanford University researchers were able to predict — with surprising accuracy — the students’ final grades. What makes the feat more remarkable is that the projections were based on learning patterns identified from the students’ work early in the course.

The work spotlights the potential for future classrooms in which instructors can see how well their students are learning as the material is being taught, and then adjust their lessons in real time to better meet their students’ needs.

The findings, published last fall in the Journal of the Learning Sciences, are the result of a joint research effort between Stanford’s Computer Science Department and its Graduate School of Education. Four professors and two graduate students gathered more than 154,000 snapshots of interim coding efforts from 370 students enrolled in an introductory undergraduate course in programming methodology in the summer and fall of 2012.

“The way we assess students in general has one big, big problem — it only looks at students before they learn and after they learn,” said the paper’s lead author, Paulo Blikstein, assistant professor of education. “Now, for the first time, we have technologies that would allow process-based assessments — assessing people continuously as they learn.”

In addition to Blikstein, the co-authors are Marcelo Worsley, a former graduate student in education; Chris Piech, a current graduate student in computer science; Mehran Sahami, professor and associate chair for education in the computer science department; Stephen Cooper, an associate professor in the computer science department; and Daphne Koller, the Rajeev Motwani Professor in the computer science department.

The paper presents findings from two studies, both of which used a series of computer-based techniques generally referred to as “machine learning” to analyze data gathered each time a student saved or compiled a program.

The first study looked for patterns of program updates over a series of assignments, attempting to correlate those patterns with final exam grades.

Some of the findings were counterintuitive. For example, the authors did not find any correlation between course performance and the amount of “tinkering,” which is often thought to signify a lack of programming expertise. However, the authors found that students who changed their programming style, going from being a “planner” to a “tinkerer” or vice versa, were the ones who performed best, suggesting that behavior change (rather than learning one single behavior) was a determining factor in course performance.

The second study examined a single assignment in depth, investigating the multiple trajectories students followed by building a detailed “progression map” of each student’s work that could be correlated with course performance.

Using these results, researchers identified several programming trajectories among the 370 students. Of those, three patterns, labeled “alpha,” “beta” and “gamma,” were the most common. By grouping students into these trajectories, the researchers could predict their final exam grades in the course with greater accuracy than by extrapolating from their midterm scores.

Alpha students, explained co-author Piech, moved efficiently from one point to another as they wrote code, creating a streamlined program. Beta and gamma students, on the other hand, wrote themselves into so-called “sink states” in which their programs slammed into dead ends from which they had to back out before they could resume progress.

The discovery of these “sink states,” and of how students got into them, offers opportunities beyond predicting grades: It opens the door to developing systems that encourage students to go down more fruitful paths before they become lost in the programming weeds.
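
To make the idea of a dead end concrete, here is a minimal sketch in Python. It is not the researchers' method: the state labels, the `count_backouts` helper and the sample trajectories are all hypothetical, and the paper's own analysis relied on machine learning techniques not reproduced here. The sketch assumes each student's work has already been reduced to an ordered list of program states, and it flags a state whenever the student's next step returns to a state visited earlier.

```python
from collections import Counter


def count_backouts(trajectories):
    """Count, per state, how many trajectories back out of it.

    A trajectory "backs out" of a state when its next step returns
    to a state it had already visited (a hypothetical stand-in for
    the dead ends described in the article).
    """
    backouts = Counter()
    for path in trajectories:
        seen = set()      # states visited so far in this trajectory
        flagged = set()   # states this trajectory backed out of
        for current, nxt in zip(path, path[1:]):
            seen.add(current)
            if nxt != current and nxt in seen:
                flagged.add(current)
        # Each trajectory contributes at most once per state.
        backouts.update(flagged)
    return backouts


# Made-up trajectories: each list is one student's sequence of states.
trajectories = [
    ["start", "loop", "done"],
    ["start", "loop", "nested", "loop", "done"],
    ["start", "nested", "start", "loop", "done"],
]
backouts = count_backouts(trajectories)
print(backouts.most_common())
```

In this toy data, two of the three trajectories retreat from the state labeled "nested," which is the kind of signal a tutoring system might use to steer students away from an unproductive path.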

The paper includes an important caveat. Although a significant number of students fit neatly into these three categories, researchers were surprised by how many other effective approaches to mastering programming they saw. “We found that students have different backgrounds, styles and ways to learn. Even though not all groups performed equally, a variety of approaches got the job done,” Blikstein said. “The findings underscore how we should be more accepting of their differences instead of trying to standardize everyone into the one way.”

Sahami, a study co-author who teaches the introductory programming course at Stanford, called the research unusual for its collaboration among experts in both education and computer science. “I’ve been involved in computer science education for a long time and also in machine learning and data mining, and this was a way to bring both of those interests together,” said Sahami. “One goal of this work would be to identify students who need more help at a finer level of granularity, and a longer-term goal related to that would be to build systems that could automate that ‘help’ process.”

In addition, this kind of process-based assessment can be more effective in determining what students actually have learned, Blikstein said. “Some students know the material, sometimes better than the typical ‘A’ students, but they’re not good at taking tests,” he said. “Testing is normally a poor way of determining if students have learned something or not.”

Rather than so-called “teaching to the test,” automated data collection and analysis techniques can lead to new opportunities for project-based learning that focuses on the student.

“Our goal is not to use machine learning to further standardize instruction (e.g., building auto-graders for computer science), but to open it up by putting more project-based work into computer science classrooms,” the authors say in the paper. “Our two studies, in fact, show that success in computer programming is a tale of many possible pathways.”

Blikstein, founder of the Transformative Learning Technologies Lab at the Graduate School of Education, is seeking to develop other ways to gauge in real time how students are learning, especially in K-12 engineering labs known as FabLabs or Makerspaces, where most subjects can’t be monitored as easily as in computer science courses. He and co-author Worsley, who received his PhD from Stanford in 2014 and is now a post-doctoral researcher at the University of Southern California, are using biosensors to capture stress levels during classes, eye-tracking technologies to determine where a student’s gaze is, and other metrics such as pupil dilation, automated movement tracking and audio analysis of a student’s speech patterns.

“Educational data mining should not be used to reinforce the existing ineffective forms of assessment, but to reimagine it completely,” Blikstein said. “Pre- and post-assessment is a black box. We do not know much about what is happening in between.

“Ultimately, this work may help us better understand some forms of human cognition because you can see what people are doing and how they are thinking in real time.”

The research was supported by Stanford’s Lemann Center for Educational Entrepreneurship and Innovation in Brazil, Stanford’s DARE (Diversifying Academia, Recruiting Excellence) Doctoral Fellowship Program, and a Google Faculty Award.

The study protocol guaranteed that participants’ identities would be anonymous and that it would not be possible to associate data with students’ names.

Janet Rae-Dupree, a science and technology writer, wrote this story for Stanford Graduate School of Education.