Analysis and Inferences:

1. We counted the occurrences of each crime category and plotted the counts. Because the distribution was skewed, we applied a log transform. Below is the log-transformed crime category distribution.
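The counting and log step can be sketched as follows. This is a minimal illustration on toy data, not the original analysis code: the column name `Category`, the sample values, and the choice of base 10 for the log are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name "Category" is an assumption.
incidents = pd.DataFrame({
    "Category": ["LARCENY/THEFT"] * 1000 + ["ASSAULT"] * 100 + ["ARSON"] * 10
})

# Count how often each crime category occurs.
category_counts = incidents["Category"].value_counts()

# Log-transform the counts so a few dominant categories do not flatten the rest.
# Base 10 is an assumption; the write-up only says "the log".
log_counts = np.log10(category_counts)
```

On the log scale the three toy categories are evenly spaced, whereas the raw counts differ by a factor of 100.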

2. There were 915 distinct crime descriptions, and the description determines whether a crime was narcotics related. We therefore counted the occurrences of each description, discarded those whose count fell below the 97th percentile, and kept the rest for creating the cluster maps.
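A percentile filter of this kind might look like the sketch below. The data are synthetic (100 made-up descriptions with counts 1 to 100), and keeping descriptions at or above the threshold, rather than strictly above it, is an assumption.

```python
import pandas as pd

# Hypothetical counts per description (100 synthetic descriptions, counts 1..100).
descript_counts = pd.Series({f"DESCRIPTION_{i}": i for i in range(1, 101)})

# 97th percentile of the per-description counts (linear interpolation by default).
threshold = descript_counts.quantile(0.97)

# Keep only the most frequent descriptions; ">=" is an assumption.
frequent = descript_counts[descript_counts >= threshold]
```

With 915 descriptions, a 97th-percentile cut keeps roughly the 28 most frequent ones, which is a manageable number of columns for a cluster map.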

3. We created a cluster map to explore how the different types of crime are distributed across each PdDistrict (i.e. Police District). Because this distribution was also skewed, it distorted the resulting cluster map (shown below): Grand Theft Auto stands out as an outlier, and beyond that we gain no information. Normalization was therefore required.

4. We used min-max normalization rather than the log, since taking the log does not preserve how large or small one feature is relative to another. Below is the normalized cluster map:

Here we can observe the following:

(i) Southern: extremely high occurrences of theft, including theft from auto

(ii) Bayview: significant occurrences of violence and threats

(iii) Tenderloin: appears to be an outlier, with exceedingly high occurrences of possession of narcotics paraphernalia. Tenderloin seems like a potential candidate for installing SIS, although this could be a false positive (i.e. these incidents could be due to marijuana). Thus one needs to delve deeper.
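The min-max normalization from step 4 can be sketched as below. The counts and the column labels are invented for illustration, and scaling each crime description separately (per column) is an assumption, since the write-up does not say along which axis the scaling was applied.

```python
import pandas as pd

# Hypothetical counts: rows are police districts, columns are crime descriptions.
counts = pd.DataFrame(
    {
        "GRAND THEFT FROM LOCKED AUTO": [900, 300, 100],
        "BATTERY": [50, 80, 20],
    },
    index=["SOUTHERN", "BAYVIEW", "TENDERLOIN"],
)

# Min-max scale each column to [0, 1]: (x - min) / (max - min).
# Per-column scaling is an assumption about the original analysis.
col_min = counts.min()
col_range = counts.max() - col_min
normalized = (counts - col_min) / col_range
```

Each column's relative spacing is preserved: Bayview's toy value of 300 maps to 0.25 because it sits a quarter of the way between the column minimum and maximum. A column with identical values in every district would produce NaN here and would need separate handling.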

5. Next, we filtered narcotics-related crimes using regular expressions and string pattern matching, and counted the occurrences of each distinct narcotics-related description. Again the distribution was skewed, which distorted the cluster map shown below: Tenderloin is an outlier and we gain no other information.
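A regular-expression filter along these lines is sketched below. The sample descriptions and the keyword list in the pattern are assumptions made for illustration; the expressions actually used are not given in the write-up.

```python
import pandas as pd

# Hypothetical crime descriptions, one per incident.
descriptions = pd.Series([
    "POSSESSION OF NARCOTICS PARAPHERNALIA",
    "GRAND THEFT FROM LOCKED AUTO",
    "POSSESSION OF HEROIN",
    "SALE OF METH-AMPHETAMINE",
    "BATTERY",
    "POSSESSION OF HEROIN",
])

# Illustrative keyword pattern only; the real keyword list is an assumption.
narcotics_pattern = r"NARCOTIC|HEROIN|METH|COCAINE|MARIJUANA"

# Boolean mask: True where the description mentions any of the keywords.
is_narcotics = descriptions.str.contains(narcotics_pattern, case=False, regex=True)

# Occurrences of each distinct narcotics-related description.
narcotics_counts = descriptions[is_narcotics].value_counts()
```

A broad keyword such as `METH` also matches unrelated substrings (for example `METHADONE`), so a pattern like this usually needs manual review of the matched descriptions.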

6. Below is the normalized cluster map, which shows the distribution of narcotics-related crimes across each PdDistrict.

Inference:

From the cluster map above, we can conclude that Tenderloin, Southern, Mission and Northern are the optimal candidates for installing SIS.

7. Thereafter, time-series analysis was performed to analyze opioid trends over time. First, we collapsed the numerous narcotics-related crime descriptions into drug-group features (e.g. barbiturate, coke, marijuana, meth, etc.). Then we split the period from 01/01/2003 to 05/15/2018 into consecutive 30-day windows (roughly one month each) and counted the number of occurrences of each group in each window.

To avoid the cyclic nature of calendar months, we indexed the windows consecutively from 0 to 187. Below is a stacked histogram representing these trends. As you can see, meth- and heroin-related incidents went up significantly.
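The windowing and indexing in step 7 can be sketched as follows. The column names `Date` and `group` and the sample rows are assumptions; only the start date, the end date and the 30-day window length come from the text.

```python
import pandas as pd

# First day of the analysis period, as stated in the text.
start = pd.Timestamp("2003-01-01")

# Hypothetical incidents; "Date" and "group" are assumed column names.
incidents = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2003-01-01", "2003-01-30", "2003-01-31", "2018-05-15", "2018-05-15"]
    ),
    "group": ["heroin", "meth", "heroin", "meth", "meth"],
})

# Integer index of the 30-day window each incident falls into (0 = first window).
incidents["window"] = (incidents["Date"] - start).dt.days // 30

# Rows are window indices, columns are drug groups, values are incident counts.
trend = pd.crosstab(incidents["window"], incidents["group"])
```

Under this scheme the last date in the period, 05/15/2018, falls into window 187, which matches the 0 to 187 index range. Because the index increases monotonically, it carries no month-of-year cycle.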

8. To make the trends clearer, below is the normalized distribution of the trends for each drug group from 2003 to 2018.

One can observe that crack-related incidents went down over this period. Similarly, marijuana-related incidents went down after it was legalized in 2016. Meth- and heroin-related crimes, however, rose sharply, which is substantial evidence of an epidemic.