2014-03-12 | technology, python, mathematics, teaching, visualization

Summary: I describe stem plots, how to read them, and how to make them in Python, using 140 characters.

My friend @JarrodMillman, whose office is across the hall, is teaching a computational statistics course that involves a fair amount of programming. He has been grading the homework semi-automatically, with Python scripts that pull the students' latest changes from GitHub, run some tests, write the grade to a JSON file for the student, check it in, and update a master JSON file that only Jarrod can access. It's been fun to tag along periodically and watch his suite of little programs develop. He came in the other day and said, "Do you know of any stem plot implementation in python? I found a few, and I'm using one that's ok, but it looks too complicated."

For those unfamiliar: a stem plot, or stem-and-leaf plot, is a more detailed kind of histogram. On the left is the stem, a prefix shared by all entries on its right (in the examples below, the tens digit). To the right of the stem, each entry, or leaf, takes up one space just like a bar in a bar chart, but it still retains information about its actual value (here, the ones digit).

So a stem plot of the numbers 31, 41, 59, 26, 53, 58 looks like this:

```
2|6
3|1
4|1
5|389
```

That last line is hard to parse for the uninitiated. There are three entries to the right of the 5 stem (the 50s): 3, 8, and 9. That is how a stem plot concisely represents the numbers 53, 58, and 59.
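In code, splitting a value into its stem and leaf is just integer division by 10, which Python's built-in divmod does in one step. Here is a minimal sketch, assuming non-negative integers with the ones digit as the leaf; the helper name `stem_and_leaf` is made up for illustration, and it only lists stems that actually have leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem), keeping ones digits (leaves)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 53 -> (5, 3)
        groups[stem].append(leaf)
    # One "stem|leaves" string per stem that has data, in increasing order
    return ["%d|%s" % (s, "".join(map(str, groups[s]))) for s in sorted(groups)]


print(stem_and_leaf([31, 41, 59, 26, 53, 58]))  # ['2|6', '3|1', '4|1', '5|389']
```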

As an instructor, you can quickly get a sense of the distribution of grades without worrying about the binning artifacts of standard histograms. A stem plot can reveal subtle patterns in the data that are easy to miss with the usual grading histograms, which have a bin width of 10. Take this distribution, for example:

```
70 : XXXXXXX
80 : XXXXXXXXXXX
90 : XXXXXXX
```

Below are two stem plots that have the same profile as the histogram above but tell different stories:

```
7|7888999
8|01123477899
9|3467888
```

Above is a class that has a rather typical grade distribution that sort of clumps together. But a histogram of the same shape might come from data like this:

```
7|0000223
8|78888999999
9|0255589
```

This is a class with 7 students clearly struggling: their grades (70 to 73) sit well below everyone else's, which start at 87.
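To make the binning argument concrete, here is a small sketch that writes out both example classes, reading the grades directly off the two stem plots, and counts grades per decade, which is all a histogram with a bin width of 10 shows. The names `class_a`, `class_b`, `decade_counts`, and `gap_above_bottom_bin` are made up for illustration.

```python
from collections import Counter

# Grades read off the two stem plots above
class_a = [77, 78, 78, 78, 79, 79, 79,
           80, 81, 81, 82, 83, 84, 87, 87, 88, 89, 89,
           93, 94, 96, 97, 98, 98, 98]
class_b = [70, 70, 70, 70, 72, 72, 73,
           87, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89,
           90, 92, 95, 95, 95, 98, 99]


def decade_counts(grades):
    """Histogram with bin width 10: how many grades fall in each decade."""
    return Counter(g // 10 * 10 for g in grades)


def gap_above_bottom_bin(grades):
    """Distance from the highest grade in the lowest decade to the next grade up."""
    ordered = sorted(grades)
    lowest = ordered[0] // 10
    bottom = [g for g in ordered if g // 10 == lowest]
    rest = [g for g in ordered if g // 10 != lowest]
    return rest[0] - bottom[-1]


print(decade_counts(class_a) == decade_counts(class_b))  # True
print(gap_above_bottom_bin(class_a), gap_above_bottom_bin(class_b))  # 1 14
```

Both classes produce the same bar heights (7, 11, 7), but the lowest group sits 1 point below the next grade in the first class and 14 points below it in the second, which is exactly what the stem plots make visible.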

So here's the code for making a stem plot in Python using NumPy. stem() expects an array or list of integers and prints every stem in the range spanned by the data, including stems with no leaves.

```python
from __future__ import print_function
import numpy as np
def stem(d):
    "A stem-and-leaf plot that fits in a tweet by @ivanov"
    l,t=np.sort(d),10
    O=range(l[0]-l[0]%t,l[-1]+11,t)
    I=np.searchsorted(l,O)
    for e,a,f in zip(I,I[1:],O): print('%3d|'%(f/t),*(l[e:a]-f),sep='')
```

Yes, it isn't pretty: a fair amount of code golfing went into making it fit. It is a good example of the kind of code you should not write, especially since I had a little fun with the variable names, picking characters that are easy to confuse with one another, particularly in sans-serif typefaces (lI10O). Nevertheless, it's fun to fit that much functionality into 140 characters.
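The golfed version is short mainly because np.searchsorted does the grouping. Once the data are sorted, searching for the start of each stem (plus one extra start past the maximum, which is what the `+11` buys) gives the index where each stem's leaves begin, so consecutive pairs of indices slice out one stem at a time. Here is that step on the six numbers from the first example, with descriptive names (chosen for illustration) standing in for `l`, `t`, `O`, and `I`:

```python
import numpy as np

values = np.sort([31, 41, 59, 26, 53, 58])  # l in stem(): [26 31 41 53 58 59]
width = 10                                  # t in stem(): stems are tens
# O in stem(): start of each stem, from the lowest decade to one past the highest
starts = range(values[0] - values[0] % width, values[-1] + 11, width)
# I in stem(): index of the first value >= each stem start
bounds = np.searchsorted(values, starts)

print(list(starts))  # [20, 30, 40, 50, 60]
print(bounds)        # [0 1 2 3 6]

# Consecutive bounds delimit one stem; subtracting the start leaves the ones digits
for lo, hi, start in zip(bounds, bounds[1:], starts):
    print(start // width, values[lo:hi] - start)
```

The last start (60) is never printed as a stem: it only serves as the upper bound for the 50s, and zip stops once bounds[1:] runs out.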


You can test it by running it on some generated data:

```
>>> data = np.random.poisson(355, 113)
>>> data
array([367, 334, 317, 351, 375, 372, 350, 352, 350, 344, 359, 355, 358,
       389, 335, 361, 363, 343, 340, 337, 378, 336, 382, 344, 359, 366,
       368, 327, 364, 365, 347, 328, 331, 358, 370, 346, 325, 332, 387,
       355, 359, 342, 353, 367, 389, 390, 337, 364, 346, 346, 346, 365,
       330, 363, 370, 388, 380, 332, 369, 347, 370, 366, 372, 310, 348,
       355, 408, 349, 326, 334, 355, 329, 363, 337, 330, 355, 367, 333,
       298, 387, 342, 337, 362, 337, 378, 326, 349, 357, 338, 349, 366,
       339, 362, 371, 357, 358, 316, 336, 374, 336, 354, 374, 366, 352,
       374, 339, 336, 354, 338, 348, 366, 370, 333])
>>> stem(data)
 29|8
 30|
 31|067
 32|566789
 33|00122334456666777778899
 34|02234466667788999
 35|001223445555577888999
 36|12233344556666677789
 37|0000122444588
 38|0277899
 39|0
 40|8
```

If you prefer spaces between the leaves, remove `sep=''` from the last line.

```
>>> stem(data)
 29| 8
 30|
 31| 0 6 7
 32| 5 6 6 7 8 9
 33| 0 0 1 2 2 3 3 4 4 5 6 6 6 6 7 7 7 7 7 8 8 9 9
 34| 0 2 2 3 4 4 6 6 6 6 7 7 8 8 9 9 9
 35| 0 0 1 2 2 3 4 4 5 5 5 5 5 7 7 8 8 8 9 9 9
 36| 1 2 2 3 3 3 4 4 5 5 6 6 6 6 6 7 7 7 8 9
 37| 0 0 0 0 1 2 2 4 4 4 5 8 8
 38| 0 2 7 7 8 9 9
 39| 0
 40| 8
```

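To skip empty stems, put `e!=a and ` in front of `print`. Because `and` short-circuits, print is only called when the slice `l[e:a]` is non-empty, that is, when the stem has at least one leaf. This removes the empty 30 stem (the 300s) from the output, which is useful for data with lots of gaps.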

```
>>> stem(data)
 29| 8
 31| 0 6 7
 32| 5 6 6 7 8 9
 33| 0 0 1 2 2 3 3 4 4 5 6 6 6 6 7 7 7 7 7 8 8 9 9
 34| 0 2 2 3 4 4 6 6 6 6 7 7 8 8 9 9 9
 35| 0 0 1 2 2 3 4 4 5 5 5 5 5 7 7 8 8 8 9 9 9
 36| 1 2 2 3 3 3 4 4 5 5 6 6 6 6 6 7 7 7 8 9
 37| 0 0 0 0 1 2 2 4 4 4 5 8 8
 38| 0 2 7 7 8 9 9
 39| 0
 40| 8
```
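For comparison, here is one way the same logic might be written for people rather than for Twitter. It is a hypothetical expansion, not the original code: `stem_lines` returns the rows as strings instead of printing them, and its `sep` and `skip_empty` arguments stand in for the two tweaks described above. Like the golfed stem(), it uses the tens digit as the stem; it additionally assumes non-negative integer data.

```python
import numpy as np


def stem_lines(data, sep="", skip_empty=False):
    """Return the rows of a stem-and-leaf plot as a list of strings.

    Hypothetical readable version of the golfed stem(); assumes non-negative
    integers, with the tens digit as the stem and the ones digit as the leaf.
    """
    values = np.sort(np.asarray(data))
    first = values[0] - values[0] % 10  # start of the lowest stem
    # Start of every stem, plus one start past the maximum that acts as a sentinel
    starts = np.arange(first, values[-1] + 11, 10)
    # bounds[i] is the index of the first value >= starts[i]
    bounds = np.searchsorted(values, starts)
    rows = []
    for lo, hi, start in zip(bounds, bounds[1:], starts):
        if skip_empty and lo == hi:
            continue  # no leaves: same effect as `e!=a and print(...)`
        leaves = [str(v) for v in values[lo:hi] - start]
        # Same layout as print('%3d|' % stem, *leaves, sep=sep)
        rows.append(sep.join(["%3d|" % (start // 10)] + leaves))
    return rows


# The six-number example from the start of the post, with spaces between leaves
print("\n".join(stem_lines([31, 41, 59, 26, 53, 58], sep=" ")))
```

Returning strings rather than printing makes the function easy to test; for the data generated above, `print("\n".join(stem_lines(data, sep=" ", skip_empty=True)))` should produce the same rows as the last variant.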

Thanks for reading.