311 Complaints

Among the data sets published on NYC’s OpenData is “311 Service Requests from 2010 to Present”.

“In New York City, 311 is used by city officials as one of several sources of measurement and information about the performance of city services. Important dates in the history of New York’s 311 service include December 20, 2005, when it received its record high of 240,000 calls, due to the first day of the 2005 New York City transit strike, and June 20, 2007, when it received its 50 millionth call.” (3-1-1, Wikipedia)

This data set contains a wide variety of complaint types, ranging from Blocked Driveway to Noise - Residential. Using the SODA API, we will collect 2,678 public complaints created between 2010-01-01T04:49:33 and 2016-05-29T18:22:54.
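As a rough illustration of what such a request might look like, the sketch below builds a SODA query URL that filters on created_date with the SoQL $where, $order, and $limit parameters. The function names, the endpoint argument, and the choice of parameters are assumptions made for this example; the original query and any additional filters it used are not shown here. The small percent-encoder only handles ASCII input.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Data.List (intercalate)
import Numeric (showHex)

-- Percent-encode everything except RFC 3986 unreserved characters.
-- Assumption: the input is plain ASCII, which holds for the query below.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = '%' : pad (map toUpper (showHex (ord c) ""))
    pad s = replicate (2 - length s) '0' ++ s

-- Build a query URL for complaints created in a time window.
-- `endpoint` is a placeholder for the data set's JSON resource URL,
-- which should be taken from the data set's API page.
createdBetweenUrl :: String -> String -> String -> Int -> String
createdBetweenUrl endpoint from to limit =
  endpoint ++ "?" ++ intercalate "&"
    [key ++ "=" ++ percentEncode value | (key, value) <- params]
  where
    params =
      [ ("$where", "created_date between '" ++ from ++ "' and '" ++ to ++ "'")
      , ("$order", "created_date")
      , ("$limit", show limit)
      ]
```

The resulting string can be handed to any HTTP client. Keeping the URL construction pure makes it easy to inspect the query before sending it.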

Over Time

One way to analyze the data is to chart/plot the complaints as they occurred over time. To do this, we will be using the Haskell Chart library.

Bucketed by the Day

For the first chart we will bucket the created_date values by day, such that if we had 2010-01-01T04:49:33 and 2010-01-01T08:32:12, they would both fall into the 2010-01-01T00:00:00 bucket. With every complaint bucketed, we will count the number of complaints in each day bucket. These counts will then be charted over time.
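The day bucketing can be sketched with the time and containers packages. This is a minimal sketch rather than the code behind the charts: the function names are made up for illustration, the timestamp format is assumed to match the examples in the text exactly, and a Day value stands in for the midnight timestamp of each bucket.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Data.Time (Day, LocalTime (..), defaultTimeLocale, parseTimeM)

-- Parse a created_date such as "2010-01-01T04:49:33".
-- Assumption: no fractional seconds or time zone suffix is present.
parseCreatedDate :: String -> Maybe LocalTime
parseCreatedDate = parseTimeM True defaultTimeLocale "%Y-%m-%dT%H:%M:%S"

-- Truncate a timestamp to its day; the Day plays the role of the
-- 2010-01-01T00:00:00 style bucket described in the text.
dayBucket :: LocalTime -> Day
dayBucket = localDay

-- Count complaints per day bucket. Timestamps that fail to parse are
-- silently dropped in this sketch.
countByDay :: [String] -> Map.Map Day Int
countByDay =
  Map.fromListWith (+)
    . map (\t -> (dayBucket t, 1))
    . mapMaybe parseCreatedDate
```

With this, 2010-01-01T04:49:33 and 2010-01-01T08:32:12 both land in the 2010-01-01 bucket, which then has a count of two. Days without any complaints do not appear in the map, so they would need to be filled in with zeros before charting a continuous line.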

Line Chart

We will visualize the day buckets using a line chart.

Immediately you can see a large spike around September, in the late summer of 2015. With the exception of 2011, you can see similar late summer spikes for 2010, 2012, 2013, and 2014. There are also large falloffs at the end and beginning of each year, during the winter months.

Bucketed by the Month

To add some clarity, we will now bucket the complaints by month. We will bucket them in such a way that if we had 2010-01-13T04:49:33 and 2010-01-18T08:32:12 they would both fall into the 2010-01-01T00:00:00 bucket. In other words, each event per year and month will fall on the first of its month.
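A month bucket can be computed by resetting the day of the month to 1. The sketch below assumes the timestamps have already been reduced to Day values; the function names are illustrative only.

```haskell
import qualified Data.Map.Strict as Map
import Data.Time (Day, fromGregorian, toGregorian)

-- Move a date to the first of its month, keeping year and month.
monthBucket :: Day -> Day
monthBucket d =
  let (year, month, _) = toGregorian d
   in fromGregorian year month 1

-- Count complaints per month bucket.
countByMonth :: [Day] -> Map.Map Day Int
countByMonth days = Map.fromListWith (+) [(monthBucket d, 1) | d <- days]
```

Because the bucket is still a date, the monthly counts can share a time axis with the daily counts.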

Line Chart

You can see the line follows the same overall shape as the line chart bucketed by day. The counts are higher since all events that occurred in any particular month are now reported in aggregate for that month. By bucketing each complaint by the month it occurred in, we can more clearly see the spikes. Notice that the same late summer spikes are still present. 2011 breaks the pattern: it spikes earlier in the year, with a smaller spike occurring later.

Here we see the day buckets and the month buckets charted together.

Bar Chart

Alternatively, we can truncate the created_date time stamps to YYYY-MM and bucket each complaint by its truncated date. For these buckets, we will use a bar chart. Since we will be sorting the buckets numerically by their year-month labels, we can view the bar chart as a histogram (the labels are quantitative rather than categorical).
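Since the time stamps are ISO 8601 strings, the YYYY-MM truncation can be sketched as plain string handling. The names below are illustrative, and the sketch assumes every input starts with a well-formed YYYY-MM prefix.

```haskell
import qualified Data.Map.Strict as Map

-- "2010-01-13T04:49:33" becomes "2010-01".
yearMonthLabel :: String -> String
yearMonthLabel = take 7

-- Count complaints per label, returned in ascending label order.
-- For zero-padded YYYY-MM labels, lexicographic order is also
-- chronological order.
countByYearMonth :: [String] -> [(String, Int)]
countByYearMonth timestamps =
  Map.toAscList
    (Map.fromListWith (+) [(yearMonthLabel t, 1) | t <- timestamps])
```

Months without any complaints produce no label at all, so a bar chart built from this list would skip them unless they are added with a count of zero.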

Again, we see the same spikes and falloffs.

By Borough

Another way to visualize the data is to look at them by borough. New York City is made up of five boroughs.

“New York City is often referred to collectively as the five boroughs; the term is used to refer to New York City as a whole unambiguously, avoiding confusion with any particular borough or with the Greater New York metropolitan area.” (Borough (New York City), Wikipedia)

Total Aggregate Count

We will go ahead and plot a bar chart where each category is a borough and its value is how many complaints were reported as belonging to that borough (over the ~6 year span).

We can see that Manhattan had the most, with Brooklyn, Queens, the Bronx, and Staten Island coming in second, third, fourth, and fifth respectively.

Borough Population Sizes

To make the borough counts more interesting we will also chart their population sizes. The U.S. Census Bureau estimated the 2015 population sizes as:

Bronx 1,455,444

Brooklyn 2,636,735

Manhattan 1,644,518

Queens 2,339,150

Staten Island 474,558

Looking at the bar chart, we can see that population size does not necessarily relate to the complaint count, at least for Manhattan. Of course, the population sizes are estimated for just 2015, while the complaint count is aggregated over a ~6 year period.
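One way to make the comparison between population and complaint count more concrete is to express each borough's count per 100,000 residents. This normalization is not part of the charts above; it is a small sketch that uses the 2015 estimates listed earlier, with the per-borough complaint counts left as an input. The names are illustrative.

```haskell
-- 2015 population estimates, as listed in the text.
boroughPopulation2015 :: [(String, Int)]
boroughPopulation2015 =
  [ ("Bronx", 1455444)
  , ("Brooklyn", 2636735)
  , ("Manhattan", 1644518)
  , ("Queens", 2339150)
  , ("Staten Island", 474558)
  ]

-- Complaints per 100,000 residents for each borough in `counts`.
-- Boroughs without a known population are left out of the result.
complaintsPer100k :: [(String, Int)] -> [(String, Double)]
complaintsPer100k counts =
  [ (borough, 100000 * fromIntegral count / fromIntegral population)
  | (borough, count) <- counts
  , Just population <- [lookup borough boroughPopulation2015]
  ]
```

Passing the 2015 complaint counts keeps both sides of the ratio in the same year, whereas passing the ~6 year totals mixes a multi-year count with a single-year population estimate.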

2015

Because of its interesting late summer spike, we zero in on 2015.

We can see the large spike that we saw before, occurring in September, with just over 100 complaints recorded in a single month.

The 2015 borough counts have roughly the same proportions as the borough counts aggregated across the whole ~6 year span.

Comparing the 2015 estimated population sizes against the 2015 complaint borough counts, we see that the relative population and complaint count proportions are not entirely related.

By Day of the Week

The last visualization shows the complaint counts per day of the week.

Box Plot

Haskell Chart does not have an out-of-the-box solution for box plots. However, we can repurpose its candlestick chart interface. Starting from the counts bucketed by day, we scan through the day buckets (Jan 1, 2010, Jan 2, 2010, …, May 28, 2016, May 29, 2016) and add each day's count to a list for the day of the week it fell on, so that each day of the week has its own list of counts. With each list sorted, we calculate the min, lower quartile (25%), median (50%), upper quartile (75%), and max for each day of the week.
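The five values per day of the week can be sketched as follows. The quartile convention is an assumption: this sketch uses linear interpolation between the closest ranks, which is only one of several common definitions and may differ from the one used for the chart. The type and function names are illustrative.

```haskell
import Data.List (sort)

-- The five values drawn for one box.
data FiveNumber = FiveNumber
  { fnMin, fnLower, fnMedian, fnUpper, fnMax :: Double }
  deriving (Eq, Show)

-- Quantile of a sorted, non-empty list, interpolating linearly between
-- the two closest ranks. `q` is expected to lie between 0 and 1.
quantile :: Double -> [Double] -> Double
quantile q sorted =
  let n = length sorted
      pos = q * fromIntegral (n - 1)
      lo = floor pos
      hi = min (n - 1) (lo + 1)
      frac = pos - fromIntegral lo
   in sorted !! lo + frac * (sorted !! hi - sorted !! lo)

-- Five-number summary of a list of counts; Nothing for an empty list.
fiveNumber :: [Int] -> Maybe FiveNumber
fiveNumber [] = Nothing
fiveNumber counts =
  Just
    ( FiveNumber
        (quantile 0 sorted)
        (quantile 0.25 sorted)
        (quantile 0.5 sorted)
        (quantile 0.75 sorted)
        (quantile 1 sorted)
    )
  where
    sorted = sort (map fromIntegral counts)
```

For the counts 0, 0, 1, 2, 8 this gives a min of 0, a lower quartile of 0, a median of 1, an upper quartile of 2, and a max of 8. The five values for each day of the week can then be supplied to the candlestick plot in place of its usual low, open, mid, close, and high values.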

Sunday starts at 0 and the rest of the days of the week follow (Monday at 1, Tuesday at 2, etc.). We see that Tuesday and Wednesday have the largest “middle 50%” (the range between the lower and upper quartiles), ranging from zero to two. Thursday has a max of eight, the same eight seen in the September 2015 spike. All have a min of zero.
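The Sunday-at-0 numbering and the per-weekday lists of counts can be sketched with the time package. The names are illustrative, and one assumption is made explicit: days with no complaints are included with a count of zero, which is consistent with every day of the week having a min of zero.

```haskell
import qualified Data.Map.Strict as Map
import Data.Time (Day, fromGregorian)
import Data.Time.Calendar.WeekDate (toWeekDate)

-- First and last day covered by the collected complaints.
firstDay, lastDay :: Day
firstDay = fromGregorian 2010 1 1
lastDay = fromGregorian 2016 5 29

-- toWeekDate numbers Monday as 1 through Sunday as 7; taking the
-- remainder modulo 7 moves Sunday to 0 and leaves Monday at 1.
weekdayIndex :: Day -> Int
weekdayIndex d =
  let (_, _, isoWeekday) = toWeekDate d
   in isoWeekday `mod` 7

-- Walk every day from `start` to `end` and append its count to the
-- list for its weekday. Days missing from `dayCounts` count as zero.
countsByWeekday :: Day -> Day -> Map.Map Day Int -> Map.Map Int [Int]
countsByWeekday start end dayCounts =
  Map.fromListWith
    (flip (++))
    [ (weekdayIndex d, [Map.findWithDefault 0 d dayCounts])
    | d <- [start .. end]
    ]
```

Calling countsByWeekday firstDay lastDay with the map of daily counts yields one list per weekday index, ready to be sorted and summarized.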

Recap

Using Haskell, we queried, processed, and visualized 2,678 complaints from the 311 data set, recorded between 2010 and 2016. A definite cyclic pattern can be seen from year to year. Spikes occur in late summer and falloffs occur during the fall and winter months. 2015 saw the largest spike, in September, with just over 100 recorded complaints. 2011 had a large spike in the early part of the year.

Appendix

Below you will find some supplementary material.

Full Source Code

The source is written in Haskell but is heavily documented.