If you’re in Brooklyn looking for your car after a ‘quick trip into the coffee shop’, it’s likely hanging off the back of a white Ford tow truck rollin’ coal. Let’s find out why…

As I continue to sharpen my skills in data analysis, particular SQL, I often hunt out new data sources to explore (not unlike most reading this article, I suspect). One frequent places I loiter is the open data portal of my home state, New York. There are a lot of things to dislike about the state, but their willingness to embrace data and keeping it open to all is be admired.

One of the largest datasets on the site is the list of all state vehicle registrations. Including boats and snowmobiles. That’s right...ALL. They are stripped of personal information, and only a limited set of fields are available, but they’re there. I’m purposely leaving the link to the dataset out, as an exercise for the reader.

To begin my exploration, I uploaded the CSV file to Google Cloud Storage in a newly created bucket. Over in Google BigQuery, I created a new data table. that pointed to the CSV. Obviously nothing unique about those tools for our current discussion, just my current weapons of choice.

Ignore the silly bucket name. I’m an odd bird

Fair warning for those following along at home. This dataset is big. 1.4GB. Not big in the ‘big data’ sense, but big in the ‘sitting at my laptop and catching a few innings of the Yankees game whilst waiting for the upload to GCS’ sense.

After Gio Urshela hits another one out of the ballpark, I find the schema shows some logical fields such as the city and county of registration, vehicle type and color, as well as the non-unique portions of the vehicle identification numbers (VIN).

That’s a lot of data.

BigQuery allows me to quick sample of the data and peruse the vegetables, as my mother would say. Scrolling through, I realized there is a great variety in Registration_Class. This in itself is not surprising, as the list includes snowmobiles, boats, and other toys.

All that data that’s fit to print

One Registration_Class that immediately caught my eye is ‘TOW’. The name is self-explanatory in my mind…these are all the tow trucks on the road. So, let’s have some fun (in a data nerd sense of the word, obvi).

SELECT

City,

Zip,

County,

Make,

Registration_Class,

Color,

Fuel_Type

FROM

`nys_regs`

WHERE

Registration_Class = ‘TOW’

LIMIT

10000

This yielded ~7400 tow trucks in the State. But that’s only a number. I’m a data nerd. I want pretty pictures. But fear not, GBQ has a handy link send the query result right to Data Studio. Woot woot.

Gotta love when they make it easy for you.

The automagically-created default table lists the count by City, which immediately reveals some frankly not-so-surprising information. The majority of tow trucks are registered in the NY metro area. (Rochester is a medium-sized city in state’s Finger Lakes region, so that was a bit intriguing, but we move on.)

I see you Rochester

So before I make those pretty pictures, let’s step back and propose a more specific inquiry: Exactly how many are in the NY metro area and what else can find out? We can hop back to GBQ use the County field to explore.

SELECT

Registration_Class,

City,

County,

Model_Year,

Make,

Fuel_Type,

Color

FROM

`nys_regs`

WHERE

Registration_Class = 'TOW'

AND (County LIKE '%KINGS%'

OR County LIKE '%BRONX%'

OR County LIKE '%RICHMOND%'

OR County LIKE '%QUEENS%'

OR County LIKE '%NEW YORK%')

LIMIT

10000

(Bonus points to the reader if they know the boroughs of NYC reside in these five counties!)

Now there has to be a more elegant method to chaining the LIKE statements together, but I was okay with it here. If you have a suggestion, please comment below.

The resulting table looks a lot like this, showing a nice little summary of information for NYC tow trucks. (The total number now is ~1350, in case you’re wondering)

Ford must love this dataset

Regenerating a summary table in Data Studio shows that Brooklyn proper outpaces the pack in sheer number of trucks. Bronx is #2, while smaller cities have a few dozen. One limitation of this approach is that several of the cities listed are individually rather than by borough. A more sophisticated process would group accordingly, but that’s a post for another day.

No sleep til…

Now, let’s make some pretty pictures. We saw that vehicle make and fuel type are available fields. Let’s make a stacked bar chart to determine if there is a clear preference for any of the available options.

Not sure who Peter is, but he’s gotta up his game.

Interesting! There’s a clear preference for Ford trucks, which makes sense given Ford sells the most trucks in the States regardless of their end use. And the majority of those Fords are diesel, which also makes sense given their overall popularity and reputation as being more durable.

But what is surprising is the flip for Chevrolet at the number 2 spot. There is an obvious preference for gas engines in these vehicles. This is also true for the GMC trucks further down the list, which are the corporate twins of Chevy. I’d love to know why that is, but that requires more data and domain expertise that I (as of now) lack. (Another post perhaps?)

Moving on from brand choice, let’s look at something a little more colorful, or in this case…maybe less?

Say hello to my monochromatic friend

In a shock to no one, white tow trucks dominate the NY metro area. Based on cost, availability, and being logo-friendly, this makes sense. Red in the 3-spot is a bit of a surprise, but that may be dominated by one or two companies that have red as corporate color. I’ll highlight that as a risk of drawing too much of a conclusion from a small dataset.

Lastly I ran a simple query to find the oldest operating tow trucks in the City.

SELECT

Registration_Class,

City,

County,

Model_Year,

Make,

Fuel_Type,

Color

FROM

`nys_regs`

WHERE

Registration_Class = ‘TOW’

AND Model_Year < 1980

AND (County LIKE ‘%KINGS%’

OR County LIKE ‘%BRONX%’

OR County LIKE ‘%RICHMOND%’

OR County LIKE ‘%QUEENS%’

OR County LIKE ‘%NEW YORK%’)

LIMIT

10000

Turns out there’s a couple of ca. 1979 Internationals running around Brooklyn. Let me know if you spot them!

Even the tow trucks in Brooklyn are ironic hipsters

In summary, this was a super straightforward, perhaps not terribly surprising, crawl through some open data. But a great learning experience for me and maybe a reader or two.