Updated on Jan 4, 2018 with an improved map of Virginia and details on where the map data came from.

Recently, there was a thread on the Norfolk subreddit discussing the policing (or lack thereof) of speeding in Hampton Roads. I realized that Virginia’s court data could provide a quantitative answer. While most replies speculated about how far over the speed limit it’s “safe” to drive, I’m going to leave that subject for another post. Instead, I’ll try to address the original question, which is more general. Here’s my interpretation — “Are you less likely to be pulled over for speeding in Hampton Roads than in other places?” Since I only have data on Virginia, I’ll have to limit the definition of “other places” to “other places in Virginia”.

And the answer is… Yes! Drivers are (somewhat) less likely to be pulled over in Hampton Roads than they are in other parts of Virginia.

Miles Driven per Ticket Issued (2015)

I extracted the speeding cases from Virginia’s 2015 criminal district court data and combined them with VDOT’s daily vehicle miles traveled data to calculate miles driven per ticket issued for each county and independent city. Then I ranked each locality by how many standard deviations its value falls from the statewide mean and charted the ranks. I used the same measure to color the map above.
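As a rough sketch of the metric, here is one way to compute miles per ticket and rank localities. It assumes yearly mileage is estimated as daily vehicle miles traveled times 365, which is my assumption rather than a confirmed detail of the original script. The locality names and numbers are made-up placeholders, not real VDOT or court figures.

```python
# Hypothetical placeholder inputs (not real data):
# locality -> (daily vehicle miles traveled, speeding tickets in 2015)
SAMPLE = {
    "Locality A": (1_000_000, 20_000),
    "Locality B": (2_000_000, 10_000),
    "Locality C": (500_000, 1_000),
}

DAYS_PER_YEAR = 365  # assumption: annualize the daily VMT figure


def miles_per_ticket(daily_vmt: float, tickets: int) -> float:
    """Estimated annual vehicle miles traveled divided by tickets issued."""
    if tickets == 0:
        return float("inf")  # no tickets at all: weakest possible enforcement
    return daily_vmt * DAYS_PER_YEAR / tickets


def rank_localities(data: dict[str, tuple[float, int]]) -> list[tuple[int, str, float]]:
    """Rank so that #1 has the fewest miles per ticket (the most enforcement)."""
    scores = {name: miles_per_ticket(vmt, n) for name, (vmt, n) in data.items()}
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, value in rank_localities(SAMPLE):
        print(f"#{rank} {name}: {value:,.0f} miles per ticket")
```

Ranking by the raw miles-per-ticket value gives the same order as ranking by standard deviations from the mean, because converting to standard deviations is a monotonic transformation.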

It will come as no surprise to most Virginians that Emporia ranks first in speed limit enforcement. But let’s look more closely at Hampton Roads:

#50 Chesapeake

#55 Hampton

#64 Portsmouth

#75 Virginia Beach

#87 Suffolk

#94 Newport News

#103 Norfolk

Two of the cities rank slightly above average, and the other five rank below average, two of them well below. But don’t let these rankings lull you into a false sense of security. A look at the raw number of tickets issued in 2015 proves that the police in the region are definitely hard at work.

I hope you enjoyed this look at Virginia’s court data. Tweet me with ideas for what I should look into next, or take a look at the data yourself. In the rest of this post, I’ll go over the data and the code I used in this analysis.

The Data

I used court case information from Virginia’s criminal district courts. You can download the data from http://virginiacourtdata.org/.

Speeding charges in court data

Each case record lists the criminal charge. Unfortunately, this is a free-text field filled in by hand, so the officer could write anything. After looking through the data, though, I noticed that speeding charges follow a general form, something like 73/55 SPEEDING, meaning that the defendant was charged with driving 73 MPH in violation of the 55 MPH speed limit. Another limitation of the data is that it doesn’t specify where the offense occurred, only the locality in which the case was filed, but that’s OK for this analysis.
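To make that format concrete, here is a minimal parser for the slash form only. It is a sketch under the assumption that the speed comes first and the limit second, as in 73/55 SPEEDING; real charge text is free-form and will not always fit.

```python
import re
from typing import Optional

# Assumed shape: "<recorded speed>/<posted limit> ...", e.g. "73/55 SPEEDING".
# Only the slash form is handled; other spellings return None.
SPEED_LIMIT_PATTERN = re.compile(r"(\d{2,3})/(\d{2})")


def parse_speeding_charge(charge: str) -> Optional[tuple[int, int]]:
    """Return (recorded speed, posted limit), or None if no speed/limit pair is found."""
    match = SPEED_LIMIT_PATTERN.search(charge)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


speed, limit = parse_speeding_charge("73/55 SPEEDING")
print(f"{speed} in a {limit} zone: {speed - limit} MPH over")  # 73 in a 55 zone: 18 MPH over
```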

So I generated a list of all speeding violations in Virginia and grouped them by the county (or independent city) in which they were filed. Next I needed a way to compare the number of speeding violations between localities. Population didn’t seem like a fair metric, because Virginia has a number of interstates that cut through rural counties, so I needed something more closely related to traffic. I posed the question to my wife, who happens to be a traffic engineer, and she told me to look at VDOT’s Traffic Count Data. That’s where I found exactly what I was looking for: 2015 Daily Vehicle Miles Traveled by Physical Jurisdiction, with Towns Combined into Counties.

I decided to limit this look to 2015. Specifically, the traffic counts are from 2015 and court cases are those that had their most recent hearing in 2015.

Finally, I combined the two data sets, but there were a few wrinkles worth mentioning. First, the traffic data identifies localities by name, while the court data identifies them by FIPS code. Second, some localities have multiple courts (e.g. Newport News Criminal and Newport News Traffic are separate courts in the data), and these courts have modified FIPS codes that I believe are simply made up (e.g. the official Newport News FIPS code is 700, but the courts mentioned are coded 701 and 702). Third, some localities don’t have their own court and instead use a neighboring one. To smooth these issues out, I took VDOT’s spreadsheet, stripped it down to the rows and columns I needed, manually added the court FIPS codes to each row, and converted it to CSV. (You’re welcome to check my work; I didn’t bother automating this step.)
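Here is a minimal sketch of how a hand-annotated CSV like that could be read so that each court’s cases point back to a locality. The column names Locality, DVMT, and CourtFIPS, and the semicolon separator for multiple court codes, are hypothetical. Only the Newport News court codes (701 and 702) come from the text above; the DVMT value is a placeholder.

```python
import csv
import io

# Hypothetical layout of the hand-edited VDOT CSV.
# The DVMT value is a placeholder, not a real VDOT figure.
SAMPLE_CSV = """Locality,DVMT,CourtFIPS
Newport News,1000,701;702
"""


def court_to_locality(csv_text: str) -> dict[int, str]:
    """Map each court FIPS code (possibly a made-up court code) to its locality name."""
    mapping: dict[int, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A locality with several courts lists all of their codes in one cell.
        for code in row["CourtFIPS"].split(";"):
            mapping[int(code)] = row["Locality"]
    return mapping


print(court_to_locality(SAMPLE_CSV))  # {701: 'Newport News', 702: 'Newport News'}
```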

The Code

I wrote a Python script to combine the two data sets and render the charts.

Pulling in the traffic data wasn’t entirely straightforward because of the wrinkles mentioned above. I needed to group the traffic data by court, and a few localities needed to be combined.
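The grouping step could look roughly like the sketch below. It is not the original script. It assumes each row already carries its court FIPS codes, and it merges any localities whose court codes overlap into one group with summed daily VMT (stored in a field named all, following the post). The Locality A/B rows, their court code, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CourtGroup:
    localities: list[str] = field(default_factory=list)
    courts: set[int] = field(default_factory=set)
    all: float = 0.0  # combined daily vehicle miles traveled


def group_by_court(rows: list[tuple[str, float, set[int]]]) -> list[CourtGroup]:
    """Merge localities that share any court FIPS code and sum their daily VMT."""
    groups: list[CourtGroup] = []
    for name, dvmt, courts in rows:
        merged = CourtGroup(localities=[name], courts=set(courts), all=dvmt)
        remaining = []
        for group in groups:
            if group.courts & merged.courts:
                # Shared court: fold the existing group into the new one.
                merged.localities = group.localities + merged.localities
                merged.courts |= group.courts
                merged.all += group.all
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return groups


# Hypothetical rows: Newport News has two courts; A and B share court 900.
ROWS = [
    ("Newport News", 1000.0, {701, 702}),
    ("Locality A", 300.0, {900}),
    ("Locality B", 200.0, {900}),
]
for g in group_by_court(ROWS):
    print(g.localities, sorted(g.courts), g.all)
```

Merging on any shared code (rather than exact matches) keeps the groups disjoint even if a later row links two groups that were built separately.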

Pulling in the court data was even tougher, because I needed to figure out which cases were speeding violations and toss the rest.

On my first attempt, I used a regex that looked for 2 or 3 digits, then a forward slash, then 2 digits, hoping to find things like 73/55. If the string matched, I then made sure it contained one of the “speeding keywords”, like SPEEDING, SP, RD, RECK, etc. This seemed to work until I rendered my chart and saw an outlier.
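A sketch of that first-attempt filter is below. The keyword set contains only the four keywords named above (the original list was longer), and matching them as whole words is my assumption. Because the pattern requires a forward slash, a charge written with a dash, such as the hypothetical 73-55 SPEEDING, slips through.

```python
import re

# First-attempt pattern: 2 or 3 digits, a forward slash, then 2 digits.
FIRST_PATTERN = re.compile(r"\d{2,3}/\d{2}")

# Partial list: only the keywords named in the post ("etc." elided the rest).
SPEEDING_KEYWORDS = {"SPEEDING", "SP", "RD", "RECK"}


def looks_like_speeding_v1(charge: str) -> bool:
    """Slash pattern plus at least one speeding keyword, matched as a whole word (assumed)."""
    if not FIRST_PATTERN.search(charge):
        return False
    words = set(re.findall(r"[A-Z]+", charge.upper()))
    return bool(words & SPEEDING_KEYWORDS)


print(looks_like_speeding_v1("73/55 SPEEDING"))  # True
print(looks_like_speeding_v1("73-55 SPEEDING"))  # False: no forward slash
```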

It seemed unlikely that the police in York County and Poquoson were stopping people for speeding one-tenth as often as average. I asked an open data friend from that area whether that seemed right, and he confirmed that it did not. After some digging, I discovered that my assumptions about the charge field were incorrect.

I include this anecdote primarily to note that I’m not 100% sure I’m picking out all the speeding cases in the data. I feel pretty good about it, but not great.

I modified the code to run a regex on the Charge field that looked for 2 or 3 digits, then a space, dash, forward slash, or backslash, then 2 more digits. If the pattern matched, I checked whether the CodeSection field was in a list of code sections known to be used in speeding cases (I built the list manually by printing cases that matched the regex). Once I had a speeding violation, I added it to the data structure generated when I read in the traffic data. There’s a little more complexity to the final data structure, but the important bits are two fields per court: all, the daily traffic count, and chargeCount, the number of speeding violations.
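The revised check and the counting step might be sketched like this. The Charge and CodeSection field names and the all/chargeCount keys come from the post; the FIPS key name is an assumption. The two code sections are illustrative Code of Virginia sections (46.2-870, maximum speed limits, and 46.2-862, reckless driving by speed), not the list the author built by hand.

```python
import re

# Revised pattern: 2 or 3 digits, then a space, dash, forward slash, or backslash, then 2 digits.
SPEED_PATTERN = re.compile(r"\d{2,3}[ \-/\\]\d{2}")

# Illustrative only; the real list was built manually from matching cases.
SPEEDING_CODE_SECTIONS = {"46.2-870", "46.2-862"}


def is_speeding(charge: str, code_section: str) -> bool:
    """Charge text must match the pattern AND the code section must be a known speeding one."""
    return bool(SPEED_PATTERN.search(charge)) and code_section.strip() in SPEEDING_CODE_SECTIONS


def count_speeding(cases: list[dict[str, str]], traffic: dict[int, dict[str, float]]) -> None:
    """Increment chargeCount on the traffic record for each speeding case's court."""
    for case in cases:
        if is_speeding(case["Charge"], case["CodeSection"]):
            record = traffic.get(int(case["FIPS"]))  # "FIPS" key is an assumption
            if record is not None:
                record["chargeCount"] += 1


# Placeholder traffic record keyed by court FIPS code.
traffic = {701: {"all": 1000.0, "chargeCount": 0}}
count_speeding([{"FIPS": "701", "Charge": "73-55 SPEEDING", "CodeSection": "46.2-870"}], traffic)
print(traffic)  # {701: {'all': 1000.0, 'chargeCount': 1}}
```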

I used numpy to compute the standard deviations and pyplot to create the bar charts. The script also writes some of the data out to a JSON file that I use to create the map.
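Here is a sketch of the numpy step, assuming “standard deviations” means each locality’s z-score: its distance from the mean in units of the population standard deviation, which is what np.std returns by default (ddof=0). Whether the original used ddof=0 is an assumption, as are the JSON layout (locality name to score) and the output path. The pyplot bar chart is left out to keep the sketch short.

```python
import json

import numpy as np


def standardize(names: list[str], miles_per_ticket: list[float]) -> dict[str, float]:
    """Number of standard deviations each locality's miles-per-ticket sits from the mean."""
    values = np.asarray(miles_per_ticket, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0).
    z_scores = (values - values.mean()) / values.std()
    return {name: float(z) for name, z in zip(names, z_scores)}


def write_map_data(scores: dict[str, float], path: str) -> None:
    """Write locality -> score pairs to a JSON file for the map to load (layout assumed)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(scores, fh, indent=2)
```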

The map is created using D3.js. I created the base map from the Virginia Administrative Boundaries shapefile published by VGIN. I loaded that shapefile into a tool called mapshaper, which was recommended to me by Jonah Adkins, and simplified the map down to 1% so that it would load quickly in the browser. I opened the console and changed the projection by typing the command proj wgs84. Finally, I exported the map to GeoJSON format.
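One way to confirm that the proj wgs84 step took effect is to check the bounding box of the exported GeoJSON. In WGS84 the values are longitude and latitude in degrees, so Virginia should come out with longitudes of roughly -84 to -75 and latitudes of roughly 36.5 to 39.5, not large projected coordinates. This stdlib-only helper is my own sketch, not part of the original workflow; it handles Point through MultiPolygon geometries but not GeometryCollection.

```python
import json


def bounding_box(geojson: dict) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) over every coordinate in a FeatureCollection."""
    xs: list[float] = []
    ys: list[float] = []

    def walk(coords) -> None:
        # A position is [x, y] (maybe with z); anything else is a nested list of positions.
        if coords and isinstance(coords[0], (int, float)):
            xs.append(coords[0])
            ys.append(coords[1])
        else:
            for part in coords:
                walk(part)

    for feature in geojson["features"]:
        geometry = feature.get("geometry")
        if geometry and "coordinates" in geometry:  # skips GeometryCollection and null geometry
            walk(geometry["coordinates"])
    # Raises ValueError if the file contained no coordinates at all.
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as fh:
        print(bounding_box(json.load(fh)))
```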

The geography overlaid on the basemap shows Virginia’s major roadways. I downloaded the Virginia Road Centerlines shapefile, also published by VGIN. This is a huge dataset, but mapshaper is up to the task. I only wanted to show the interstates on my map, so I opened the console and ran the command filter 'VDOT_RTYP == "IS"', which gets rid of all roads that aren’t classified as interstates. I figured out the column name and value through some trial and error using the info command. Then I got rid of a lot of data I wouldn’t need by running the command drop fields='*'. Finally, I simplified the map and exported it to GeoJSON.
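For readers who would rather script the filter and drop steps, here is a rough plain-Python equivalent that works on a GeoJSON export that still has its attributes. This is not how the post did it (mapshaper’s console was), it does no simplification, and for a dataset as large as the full road centerlines mapshaper is likely the better tool. The VDOT_RTYP field and "IS" value come from the post; the file names in the usage comment are hypothetical.

```python
import json


def interstates_only(geojson: dict, field: str = "VDOT_RTYP", value: str = "IS") -> dict:
    """Keep features whose `field` equals `value`, and strip all of their properties."""
    kept = []
    for feature in geojson["features"]:
        props = feature.get("properties") or {}
        if props.get(field) == value:
            # Equivalent in spirit to mapshaper's drop fields='*': keep geometry only.
            kept.append({"type": "Feature", "properties": {}, "geometry": feature["geometry"]})
    return {"type": "FeatureCollection", "features": kept}


if __name__ == "__main__":
    import sys

    # Usage: python interstates.py roads.json interstates.json  (hypothetical file names)
    with open(sys.argv[1], encoding="utf-8") as src:
        roads = json.load(src)
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        json.dump(interstates_only(roads), dst)
```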

I would love to provide more details about how I was able to use these GeoJSON files to generate my visualizations in D3.js, but the truth is I wasn’t really sure about what I was doing. There was a lot of Googling, copying of other people’s code, and tweaking until things looked right. Take a look at my code if you’d like to try something similar.