Introducing the dataset



If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can’t get the graphs done yourself. In this post, we’ll help you. Using Python’s matplotlib and pandas, we’ll see that it’s rather easy to replicate the core parts of any FiveThirtyEight (FTE) visualization. We’ll start from a graph drawn with matplotlib’s default style and, by the end of the tutorial, arrive at a graph that looks like an FTE original.

To follow along, you’ll need at least some basic knowledge of Python. If you know the difference between methods and attributes, then you’re good to go.

We’ll work with data describing the percentages of Bachelors conferred to women in the US from 1970 to 2011. We’ll use a dataset compiled by data scientist Randal Olson, who collected the data from the National Center for Education Statistics. If you want to follow along by writing code yourself, you can download the data from Randal’s blog. To save yourself some time, you can skip downloading the file, and just pass the direct link to pandas’ read_csv() function. In the following code cell, we:

Import the pandas module.

Assign the direct link to the dataset as a string to a variable named direct_link.

Read in the data by using read_csv(), and assign the content to women_majors.

Print information about the dataset by using the info() method. We’re looking for the number of rows and columns, and checking for null values at the same time.

Show the first five rows by using the head() method, to better understand the structure of the dataset.

import pandas as pd

direct_link = 'http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv'
women_majors = pd.read_csv(direct_link)

print(women_majors.info())
women_majors.head()

RangeIndex: 42 entries, 0 to 41
Data columns (total 18 columns):
Year                             42 non-null int64
Agriculture                      42 non-null float64
Architecture                     42 non-null float64
Art and Performance              42 non-null float64
Biology                          42 non-null float64
Business                         42 non-null float64
Communications and Journalism    42 non-null float64
Computer Science                 42 non-null float64
Education                        42 non-null float64
Engineering                      42 non-null float64
English                          42 non-null float64
Foreign Languages                42 non-null float64
Health Professions               42 non-null float64
Math and Statistics              42 non-null float64
Physical Sciences                42 non-null float64
Psychology                       42 non-null float64
Public Administration            42 non-null float64
Social Sciences and History      42 non-null float64
dtypes: float64(17), int64(1)
memory usage: 6.0 KB
None

    Year  Agriculture  Architecture  Art and Performance  Biology  Business  Communications and Journalism  Computer Science  Education  Engineering  English  Foreign Languages  Health Professions  Math and Statistics  Physical Sciences  Psychology  Public Administration  Social Sciences and History
0   1970  4.229798  11.921005  59.7  29.088363  9.064439  35.3  13.6  74.535328  0.8  65.570923  73.8  77.1  38.0  13.8  44.4  68.4  36.8
1   1971  5.452797  12.003106  59.9  29.394403  9.503187  35.5  13.6  74.149204  1.0  64.556485  73.9  75.5  39.0  14.9  46.2  65.5  36.2
2   1972  7.420710  13.214594  60.4  29.810221  10.558962  36.6  14.9  73.554520  1.2  63.664263  74.6  76.9  40.2  14.8  47.6  62.6  36.1
3   1973  9.653602  14.791613  60.2  31.147915  12.804602  38.4  16.4  73.501814  1.6  62.941502  74.9  77.4  40.9  16.5  50.4  64.3  36.4
4   1974  14.074623  17.444688  61.9  32.996183  16.204850  40.5  18.9  73.336811  2.2  62.413412  75.3  77.9  41.8  18.2  52.6  66.1  37.3

Apart from the Year column, every column name indicates the subject of a Bachelor degree. Every data point in the Bachelor columns represents the percentage of Bachelor degrees conferred to women. Thus, every row describes the percentage for various Bachelors conferred to women in a given year. As mentioned before, we have data from 1970 to 2011. To confirm the upper limit, let’s print the last five rows of the dataset by using the tail() method:

women_majors.tail()

    Year  Agriculture  Architecture  Art and Performance  Biology  Business  Communications and Journalism  Computer Science  Education  Engineering  English  Foreign Languages  Health Professions  Math and Statistics  Physical Sciences  Psychology  Public Administration  Social Sciences and History
37  2007  47.605026  43.100459  61.4  59.411993  49.000459  62.5  17.6  78.721413  16.8  67.874923  70.2  85.4  44.1  40.7  77.1  82.1  49.3
38  2008  47.570834  42.711730  60.7  59.305765  48.888027  62.4  17.8  79.196327  16.5  67.594028  70.2  85.2  43.3  40.7  77.2  81.7  49.4
39  2009  48.667224  43.348921  61.0  58.489583  48.840474  62.8  18.1  79.532909  16.8  67.969792  69.3  85.1  43.3  40.7  77.1  82.0  49.4
40  2010  48.730042  42.066721  61.3  59.010255  48.757988  62.5  17.6  79.618625  17.2  67.928106  69.0  85.0  43.1  40.2  77.0  81.7  49.3
41  2011  50.037182  42.773438  61.2  58.742397  48.180418  62.2  18.2  79.432812  17.5  68.426730  69.5  84.8  43.1  40.1  76.7  81.9  49.2

The context of our FiveThirtyEight graph



Almost every FTE graph is part of an article. The graphs complement the text by illustrating a little story, or an interesting idea. We’ll need to be mindful of this while replicating our FTE graph. To avoid digressing from our main task in this tutorial, let’s just pretend we’ve already written most of an article about the evolution of gender disparity in US education. We now need to create a graph to help readers visualize the evolution of gender disparity for Bachelors where the situation was really bad for women in 1970. We’ve already set a threshold of 20%, and now we want to graph the evolution for every Bachelor where the percentage of women graduates was less than 20% in 1970. Let’s first identify those specific Bachelors. In the following code cell, we will:

Use .loc, a label-based indexer, to: select the first row (the one that corresponds to 1970); select the items in the first row only where the values are less than 20; the Year field will be checked as well, but will obviously not be included because 1970 is much greater than 20.

Assign the resulting content to under_20.

under_20 = women_majors.loc[0, women_majors.loc[0] < 20]
under_20

Agriculture           4.229798
Architecture         11.921005
Business              9.064439
Computer Science     13.600000
Engineering           0.800000
Physical Sciences    13.800000
Name: 0, dtype: float64
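To see why the Year field drops out of the selection, here is a minimal sketch of the same boolean-mask pattern on a tiny made-up frame. The toy DataFrame and its values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical two-row frame that mimics the structure of women_majors
toy = pd.DataFrame({
    'Year': [1970, 1971],
    'Engineering': [0.8, 1.0],
    'Education': [74.5, 74.1],
})

# Boolean Series indexed by column name: True where the first-row value is below 20
mask = toy.loc[0] < 20

# Row label 0, restricted to the columns where the mask is True
toy_under_20 = toy.loc[0, mask]
print(toy_under_20)
```

The mask is False for Year (1970 is not below 20) and for Education, so only Engineering survives the selection.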

Using matplotlib’s default style



Let’s begin working on our graph. We’ll first take a peek at what we can build by default. In the following code block, we will:

Run the Jupyter magic %matplotlib to enable Jupyter and matplotlib to work together effectively, and add inline to have our graphs displayed inside the notebook.

Plot the graph by using the plot() method on women_majors. We pass in to plot() the following parameters: x – specifies the column from women_majors to use for the x-axis; y – specifies the columns from women_majors to use for the y-axis; we’ll use the index labels of under_20, which are stored in the .index attribute of this object; figsize – sets the size of the figure as a tuple with the format (width, height) in inches.

Assign the plot object to a variable named under_20_graph, and print its type to show that pandas uses matplotlib objects under the hood.

%matplotlib inline

under_20_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
print('Type:', type(under_20_graph))

Type: <class 'matplotlib.axes._subplots.AxesSubplot'>
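The exact class name printed by type() can differ between matplotlib versions, so a sturdier check is to test against the Axes base class. The sketch below does this on a small made-up frame (toy is a hypothetical stand-in for women_majors) and selects a non-interactive backend so it also runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.axes
import pandas as pd

# Hypothetical stand-in for women_majors
toy = pd.DataFrame({'Year': [1970, 1971, 1972], 'Engineering': [0.8, 1.0, 1.2]})

toy_graph = toy.plot(x='Year', y='Engineering')

# Whatever the concrete class is called, it should be a matplotlib Axes
is_axes = isinstance(toy_graph, matplotlib.axes.Axes)
print(is_axes)
```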

Using matplotlib’s fivethirtyeight style



The graph above has certain characteristics, like the width and color of the spines, the font size of the y-axis label, the absence of a grid, etc. All of these characteristics make up matplotlib’s default style. As a brief aside, it’s worth mentioning that we’ll use a few technical terms about the parts of a graph throughout this post. If you feel lost at any point, you can refer to the legend below.

Source: Matplotlib.org

Besides the default style, matplotlib comes with several built-in styles that we can use readily. To see a list of the available styles, we will:

Import the matplotlib.style module under the name style.

Explore the content of matplotlib.style.available (a predefined variable of this module), which contains a list of all the available built-in styles.

import matplotlib.style as style
style.available

['seaborn-deep', 'seaborn-muted', 'bmh', 'seaborn-white', 'dark_background', 'seaborn-notebook', 'seaborn-darkgrid', 'grayscale', 'seaborn-paper', 'seaborn-talk', 'seaborn-bright', 'classic', 'seaborn-colorblind', 'seaborn-ticks', 'ggplot', 'seaborn', '_classic_test', 'fivethirtyeight', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-whitegrid', 'seaborn-pastel', 'seaborn-poster']

You might have already observed that there’s a built-in style called fivethirtyeight. Let’s use this style, and see where that leads. For that, we’ll use the aptly named use() function from the same matplotlib.style module (which we imported under the name style). Then we’ll generate our graph using the same code as earlier.

style.use('fivethirtyeight')
women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
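Note that style.use() changes matplotlib’s settings for the rest of the session. If you only want a style for a single figure, matplotlib.style also provides a context() manager. The sketch below uses made-up data and checks the axes.grid setting before, inside, and after the block to show that the change is temporary.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import matplotlib.style as style

grid_before = plt.rcParams['axes.grid']

with style.context('fivethirtyeight'):
    # Inside the block, the fivethirtyeight settings are active
    grid_inside = plt.rcParams['axes.grid']
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([1970, 2011], [10, 45])  # made-up values

# Outside the block, the previous settings are restored
grid_after = plt.rcParams['axes.grid']
print(grid_before, grid_inside, grid_after)
plt.close(fig)
```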


The limitations of matplotlib’s fivethirtyeight style



Wow, that’s a major change! Compared with our first graph, this one has a different background color, it has grid lines, there are no spines whatsoever, the weight and the font size of the major tick labels are different, etc. You can read a technical description of the fivethirtyeight style here; it should also give you a good idea about what code runs under the hood when we use this style. The author of the style sheet, Cameron Davidson-Pilon, discusses some of the characteristics here.

All in all, using the fivethirtyeight style clearly brings us much closer to our goal. Nonetheless, there’s still a lot left to do. Let’s examine a simple FTE graph, and see what else we need to add to our graph.

Source: FiveThirtyEight

By comparing the above graph with what we’ve made so far, we can see that we still need to:

Add a title and a subtitle.

Remove the block-style legend, and add labels near the relevant plot lines. We’ll also have to make the grid lines transparent around these labels.

Add a signature bottom bar which mentions the author of the graph and the source of the data.

Add a couple of other small adjustments: increase the font size of the tick labels; add a “%” symbol to one of the major tick labels of the y-axis; remove the x-axis label; bold the horizontal grid line at y = 0; add an extra grid line next to the tick labels of the y-axis; increase the lateral margins of the figure.



Source: FiveThirtyEight

To minimize the time spent generating the graph, it’s important not to start by adding the title, the subtitle, or any other text snippet. In matplotlib, a text snippet is positioned by specifying the x and y coordinates, as we’ll see in some of the sections below. To replicate the FTE graph above in detail, notice that we’ll have to align the tick labels of the y-axis vertically with the title and the subtitle. We want to avoid a situation where we have the vertical alignment we want, lose it by increasing the font size of the tick labels, and then have to change the position of the title and subtitle again.

Source: FiveThirtyEight

For teaching purposes, we’re now going to proceed incrementally with adjusting our FTE graph. Consequently, our code will span over multiple code cells. In practice, however, no more than one code cell will be required.

Customizing the tick labels



We’ll start by increasing the font size of the tick labels. In the following code cell, we:

Plot the graph using the same code as earlier, and assign the resulting object to fte_graph. Assigning to a variable allows us to repeatedly and easily apply methods on the object, or access its attributes.

Increase the font size of all the major tick labels using the tick_params() method with the following parameters: axis – specifies the axis that the tick labels we want to modify belong to; here we want to modify the tick labels of both axes; which – indicates which tick labels are affected (the major or the minor ones; see the legend shown earlier if you don’t know the difference); labelsize – sets the font size of the tick labels.

fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)

You may have noticed that we didn’t use style.use('fivethirtyeight') this time. That’s because the preference for any matplotlib style becomes global once it’s first declared in our code. We set the style earlier to fivethirtyeight, and from there on all subsequent graphs inherit this style. If for some reason you want to return to the default state, just run style.use('default').

We’ll now build upon our previous changes by making a few adjustments to the tick labels of the y-axis:

We add a “%” symbol to 50, the highest visible tick label of the y-axis.

We also add a few whitespace characters after the other visible labels to align them elegantly with the new “50%” label.

To make these changes to the tick labels of the y-axis, we’ll use the set_yticklabels() method along with the labels parameter. As you can deduce from the code below, this parameter can take in a list of mixed data types, and doesn’t require any fixed number of labels to be passed in.

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)

# Customizing the tick labels of the y-axis
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])

# -10 and 60 are not visible on the graph
print('The tick labels of the y-axis:', fte_graph.get_yticks())

The tick labels of the y-axis: [-10. 0. 10. 20. 30. 40. 50. 60.]
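Because set_yticklabels() assigns labels by position, the labels can end up on the wrong ticks if the tick locations change, and recent matplotlib versions may emit a warning when the tick positions have not been fixed first. One way to guard against this is to pin the tick positions with set_yticks() and build exactly one label per tick. This is shown here as a sketch on made-up data, not as the method used in this post.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1970, 2011], [5, 48])  # made-up values

# Fix the tick positions first, so labels and ticks cannot drift apart
ticks = [0, 10, 20, 30, 40, 50]
ax.set_yticks(ticks)

# '%' only on the top tick; a trailing space on the others keeps the numbers aligned
labels = [f'{t}%' if t == ticks[-1] else f'{t} ' for t in ticks]
ax.set_yticklabels(labels)

fig.canvas.draw()
shown = [label.get_text() for label in ax.get_yticklabels()]
print(shown)
plt.close(fig)
```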

Bolding the horizontal line at y = 0



We will now bold the horizontal line where the y-coordinate is 0. For that, we’ll use the axhline() method to add a new horizontal grid line, and cover the existing one. The parameters we use for axhline() are:

y – specifies the y-coordinate of the horizontal line;

color – indicates the color of the line;

linewidth – sets the width of the line;

alpha – regulates the transparency of the line, but we use it here to regulate the intensity of the black color; the values for alpha range from 0 (completely transparent) to 1 (completely opaque).

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])

# Generate a bolded horizontal line at y = 0
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)

Add an extra vertical line



As we mentioned earlier, we have to add another vertical grid line in the immediate vicinity of the tick labels of the y-axis. For that, we simply tweak the range of the values of the x-axis. Extending the range one year to the left of the first data point (1970) results in the extra vertical grid line we want. Below, we use the set_xlim() method with the self-explanatory parameters left and right.

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)

# Add an extra vertical line by tweaking the range of the x-axis
fte_graph.set_xlim(left = 1969, right = 2011)
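If you want to confirm what these two calls did, the Axes object can report its own state. The following sketch applies the same axhline() and set_xlim() arguments to made-up data and reads the resulting limits back with get_xlim().

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1970, 1990, 2011], [5, 30, 48])  # made-up values

# Same arguments as in the tutorial code
zero_line = ax.axhline(y=0, color='black', linewidth=1.3, alpha=.7)
ax.set_xlim(left=1969, right=2011)

# Read the x-axis range back from the Axes object
x_limits = ax.get_xlim()
print(x_limits)
plt.close(fig)
```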

Generating a signature bar



The signature bar of the example FTE graph presented above has a few obvious characteristics:

It’s positioned at the bottom of the graph.

The author’s name is located on the left side of the signature bar.

The source of the data is mentioned on the right side of the signature bar.

The text has a light grey color (the same as the background color of the graph), and a dark grey background.

The area in-between the author’s name and the source name has a dark grey background as well.

The image is posted again so you don’t have to scroll back. Source: FiveThirtyEight

It may seem difficult to add such a signature bar, but with a little ingenuity we can get it done quite easily. We’ll add a single snippet of text, give it a light grey color, and a background color of dark grey. We’ll write both the author’s name and the source in a single text snippet, but we’ll space out these two such that one ends up on the far left side, and the other on the far right. The nice thing is that the whitespace characters will get a dark grey background as well, which will create the effect of a signature bar. We’ll also use some whitespace characters to align the author’s name and the name of the source, as you’ll be able to see in the next code block.

This is also a good moment to remove the label of the x-axis. This way, we can get a better visual sense of how the signature bar fits in the overall scheme of the graph. In the next code cell, we’ll build on what we’ve done so far, and we will:

Remove the label of the x-axis by passing in a False value to the set_visible() method we apply to the object fte_graph.xaxis.label. Think of it this way: we access the xaxis attribute of fte_graph, and then we access the label attribute of fte_graph.xaxis. Then we finally apply set_visible() to fte_graph.xaxis.label.

Add a snippet of text on the graph in the way discussed above. We’ll use the text() method with the following parameters: x – specifies the x-coordinate of the text; y – specifies the y-coordinate of the text; s – indicates the text to be added; fontsize – sets the size of the text; color – specifies the color of the text; the format of the value we use below is hexadecimal; we use this format to match exactly the background color of the entire graph (as specified in the code of the fivethirtyeight style); backgroundcolor – sets the background color of the text snippet.

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
fte_graph.set_xlim(left = 1969, right = 2011)

# Remove the label of the x-axis
fte_graph.xaxis.label.set_visible(False)

# The signature bar
fte_graph.text(x = 1965.8, y = -7,
    s = ' ©DATAQUEST Source: National Center for Education Statistics',
    fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')

The x and y coordinates of the text snippet added were found through a process of trial and error. You can pass floats into the x and y parameters, so you’ll be able to control the position of the text with a high level of precision. It’s also worth mentioning that we tweaked the positioning of the signature bar in such a way that we added some visually refreshing lateral margins (we discussed this adjustment earlier). To increase the left margin, we simply lowered the value of the x-coordinate. To increase the right one, we added more whitespace characters between the author’s name and the source’s name; this pushes the source’s name to the right, which results in adding the desired margin.

A different kind of signature bar



You’ll also meet a slightly different kind of signature bar:

Source: FiveThirtyEight

This kind of signature bar can be replicated quite easily as well. We’ll just add some grey colored text, and a line right above it. We’ll create the visual effect of a line by adding a snippet of text of multiple underscore characters (“_”). You might wonder why we’re not using axhline() to simply draw a horizontal line at the y-coordinate we want. We don’t do that because the new line will drag down the entire grid of the graph, and this won’t create the desired effect. We could also try adding an arrow, and then remove the pointer so we end up with a line. However, the “underscore” solution is much simpler. In the next code block, we implement what we’ve just discussed. The methods and parameters we use here should already be familiar from earlier sections.

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
fte_graph.xaxis.label.set_visible(False)
fte_graph.set_xlim(left = 1969, right = 2011)

# The other signature bar
fte_graph.text(x = 1967.1, y = -6.5,
    s = '________________________________________________________________________________________________________________',
    color = 'grey', alpha = .7)
fte_graph.text(x = 1966.1, y = -9,
    s = ' ©DATAQUEST Source: National Center for Education Statistics ',
    fontsize = 14, color = 'grey', alpha = .7)
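The run of whitespace between the author’s name and the source does not have to be typed by hand. As a rough sketch, a small helper can build the padded string for a target length. Note that make_signature and its width and left_pad parameters are hypothetical names introduced here for illustration, and that the right width still has to be tuned by eye, because characters are not equally wide in a proportional font.

```python
def make_signature(author, source, width, left_pad=3):
    """Return author and source separated by enough spaces to span `width` characters."""
    left = ' ' * left_pad + author
    # Always keep at least one space between the two parts
    gap = max(1, width - len(left) - len(source))
    return left + ' ' * gap + source

signature = make_signature('©DATAQUEST',
                           'Source: National Center for Education Statistics',
                           width=120)
print(repr(signature))
```

The resulting string can then be passed to the s parameter of text() in place of a hand-padded literal.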

Adding a title and subtitle



If you examine a couple of FTE graphs, you may notice these patterns with regard to the title and the subtitle:

The title is almost invariably complemented by a subtitle.

The title gives a contextual angle from which to look at a particular graph. The title is almost never technical, and it usually expresses a single, simple idea. It’s also almost never emotionally neutral. In the Fandango graph above, we can see a simple, “emotionally active” title (“Fandango LOVES Movies”), and not a bland “The distribution of various movie rating types”.

The subtitle offers technical information about the graph. This information is what makes axis labels redundant oftentimes. We should be careful to customize our subtitle accordingly since we’ve already dropped the x-axis label.

Visually, the title and the subtitle have different font weights, and they are left-aligned (unlike most titles, which are centered). Also, they are aligned vertically with the major tick labels of the y-axis, as we showed earlier.

Let’s now add a title and a subtitle to our graph while being mindful of the above observations. In the code block below, we’ll build upon what we’ve coded so far, and we will:

Add a title and a subtitle by using the same text() method we used to add text in the signature bar. If you already have some experience with matplotlib, you might wonder why we don’t use matplotlib’s title() and suptitle() functions (or the equivalent set_title() method of the Axes object). This is because they offer little control when it comes to positioning text precisely. The only new parameter for text() is weight. We use it to bold the title.

# The previous code
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
fte_graph.xaxis.label.set_visible(False)
fte_graph.set_xlim(left = 1969, right = 2011)
fte_graph.text(x = 1965.8, y = -7,
    s = ' ©DATAQUEST Source: National Center for Education Statistics ',
    fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')

# Adding a title and a subtitle
fte_graph.text(x = 1966.65, y = 62.7,
    s = "The gender gap is transitory - even for extreme cases",
    fontsize = 26, weight = 'bold', alpha = .75)
# The \n escape breaks the subtitle over two lines
fte_graph.text(x = 1966.65, y = 57,
    s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
    fontsize = 19, alpha = .85)
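The x and y values above are in data coordinates, so they would need to change if the axis limits changed. As an alternative sketch (not what this post does), text() also accepts a transform argument; passing the transAxes attribute of the Axes object places the text in axes-relative coordinates, where (0, 0) is the bottom-left corner of the plotting area and (1, 1) is the top-right. The data, the title wording, and the offsets below are illustrative assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1970, 2011], [5, 48])  # made-up values

# y values above 1 place the text above the plotting area
title = ax.text(x=0, y=1.12, s='An illustrative title',
                fontsize=16, weight='bold', transform=ax.transAxes)
subtitle = ax.text(x=0, y=1.04, s='An illustrative subtitle',
                   fontsize=12, transform=ax.transAxes)

# Changing the data range does not move text placed in axes coordinates
ax.set_xlim(1960, 2020)
title_position = title.get_position()
print(title_position)
plt.close(fig)
```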

In case you were wondering, the font used in the original FTE graphs is Decima Mono, a paywalled font. For this reason, we’ll stick with matplotlib’s default font, which looks pretty similar anyway.

Adding colorblind-friendly colors



Right now, we have that clunky, rectangular legend. We’ll get rid of it, and add colored labels near each plot line. Each line will have a certain color, and a word of an identical color will name the Bachelor which that line corresponds to. First, however, we’ll modify the default colors of the plot lines, and add colorblind-friendly colors:

Source: Points of View: Color blindness by Bang Wong

We’ll compile a list of RGB parameters for colorblind-friendly colors by using values from the above image. As a side note, we avoid using yellow because text snippets with that color are not easily readable on the graph’s light grey background color. After compiling this list of RGB parameters, we’ll then pass it to the color parameter of the plot() method we used in our previous code. Note that matplotlib will require the RGB parameters to be within a 0-1 range, so we’ll divide every value by 255, the maximum RGB value. We won’t bother dividing the zeros because 0/255 = 0.

# Colorblind-friendly colors
colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255],
          [0,158/255,115/255], [213/255,94/255,0], [0,114/255,178/255]]

# The previous code we modify
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8), color = colors)

# The previous code that remains the same
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
fte_graph.xaxis.label.set_visible(False)
fte_graph.set_xlim(left = 1969, right = 2011)
fte_graph.text(x = 1965.8, y = -7,
    s = ' ©DATAQUEST Source: National Center for Education Statistics ',
    fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
fte_graph.text(x = 1966.65, y = 62.7,
    s = "The gender gap is transitory - even for extreme cases",
    fontsize = 26, weight = 'bold', alpha = .75)
fte_graph.text(x = 1966.65, y = 57,
    s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
    fontsize = 19, alpha = .85)
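The division by 255 can also be applied to the whole palette at once, which keeps the original 0-255 values readable. This is a small sketch using the same six RGB triplets as the colors list; the names rgb_255 and colors_normalized are introduced here for illustration.

```python
# 0-255 RGB triplets, the same six colors as in the colors list
rgb_255 = [(0, 0, 0), (230, 159, 0), (86, 180, 233),
           (0, 158, 115), (213, 94, 0), (0, 114, 178)]

# Scale every channel into the 0-1 range that matplotlib expects
colors_normalized = [[channel / 255 for channel in rgb] for rgb in rgb_255]
print(colors_normalized[1])
```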

Changing the legend style by adding colored labels



Finally, we add colored labels to each plot line by using the text() method used earlier. The only new parameter is rotation, which we use to rotate each label so that it fits elegantly on the graph. We’ll also do a little trick here, and make the grid lines transparent around labels by simply modifying their background color to match that of the graph. In our previous code we only modify the plot() method by setting the legend parameter to False. This gets rid of the default legend. We also skip redeclaring the colors list since it’s already stored in memory from the previous cell.

# The previous code we modify
fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8), color = colors, legend = False)

# The previous code that remains unchanged
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-10, '0 ', '10 ', '20 ', '30 ', '40 ', '50%'])
fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
fte_graph.xaxis.label.set_visible(False)
fte_graph.set_xlim(left = 1969, right = 2011)
fte_graph.text(x = 1965.8, y = -7,
    s = ' ©DATAQUEST Source: National Center for Education Statistics ',
    fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
fte_graph.text(x = 1966.65, y = 62.7,
    s = "The gender gap is transitory - even for extreme cases",
    fontsize = 26, weight = 'bold', alpha = .75)
fte_graph.text(x = 1966.65, y = 57,
    s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
    fontsize = 19, alpha = .85)

# Add colored labels
fte_graph.text(x = 1994, y = 44, s = 'Agriculture', color = colors[0],
    weight = 'bold', rotation = 33, backgroundcolor = '#f0f0f0')
fte_graph.text(x = 1985, y = 42.2, s = 'Architecture', color = colors[1],
    weight = 'bold', rotation = 18, backgroundcolor = '#f0f0f0')
fte_graph.text(x = 2004, y = 51, s = 'Business', color = colors[2],
    weight = 'bold', rotation = -5, backgroundcolor = '#f0f0f0')
fte_graph.text(x = 2001, y = 30, s = 'Computer Science', color = colors[3],
    weight = 'bold', rotation = -42.5, backgroundcolor = '#f0f0f0')
fte_graph.text(x = 1987, y = 11.5, s = 'Engineering', color = colors[4],
    weight = 'bold', backgroundcolor = '#f0f0f0')
fte_graph.text(x = 1976, y = 25, s = 'Physical Sciences', color = colors[5],
    weight = 'bold', rotation = 27, backgroundcolor = '#f0f0f0')
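Since the six label calls differ only in position, text, color, and rotation, they can also be driven from a small table of values. The sketch below shows the pattern on a made-up two-line graph; label_specs and every value in it are illustrative assumptions, and the positions would still have to be found by trial and error as described earlier.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1970, 2011], [5, 48], color='black')   # made-up values
ax.plot([1970, 2011], [12, 43], color='orange')  # made-up values

# One tuple per label: (x, y, text, color, rotation)
label_specs = [
    (1994, 34, 'First line', 'black', 33),
    (1985, 20, 'Second line', 'orange', 18),
]

for x, y, text, color, rotation in label_specs:
    ax.text(x=x, y=y, s=text, color=color, weight='bold',
            rotation=rotation, backgroundcolor='#f0f0f0')

label_texts = [t.get_text() for t in ax.texts]
print(label_texts)
plt.close(fig)
```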

Next steps



That’s it, our graph is now ready for publication! As a short recap, we started by generating a graph with matplotlib’s default style. We then brought that graph to “FTE-level” through a series of steps:

We used matplotlib’s built-in fivethirtyeight style.

We added a title and a subtitle, and customized each.

We added a signature bar.

We removed the default legend, and added colored labels.

We made a series of other small adjustments: customizing the tick labels, bolding the horizontal line at y = 0, adding a vertical grid line near the tick labels, removing the label of the x-axis, and increasing the lateral margins of the figure.

To build upon what you’ve learned, here are a few next steps to consider:

Generate a similar graph for other Bachelors.

Generate different kinds of FTE graphs: a histogram, a scatter plot, etc.

Explore matplotlib’s gallery to search for potential elements to enrich your FTE graphs (like inserting images, or adding arrows etc.). Adding images can take your FTE graphs to a whole new level:

Source: FiveThirtyEight