Appendix

Hawkes Process

Given a temporal point pattern \(t_1, \ldots , t_n \in {\mathcal {R}}\), observed on a time window [0, T], the Hawkes process is a so-called self-exciting point process. It has a parsimonious form, parameterized by a conditional intensity function with two parts, a background rate \(\mu _t(t)\) which may or may not vary with t and a self-excitatory rate which is based only on events which occurred before time t:

$$\begin{aligned} \lambda (t) = \mu _t(t) + \theta \cdot \sum _{i: t_i < t} \omega \cdot \text{ exp }(-\omega (t-t_i)) \end{aligned}$$ (2)

Here the triggering kernel is given by an exponential with inverse lengthscale \(\omega\) and weight \(\theta\), which characterizes the strength and duration of an event’s influence on future events. The likelihood of the Hawkes process is simply that of an inhomogeneous Poisson process with intensity \(\lambda (t)\):

$$\begin{aligned} {{\mathcal {L}}}(t_1, \ldots , t_n | \theta , \omega , \mu ) = \prod _i \lambda (t_i) \exp \left( -\int _{t=0}^T \lambda (t) dt\right) \end{aligned}$$ (3)

To include a spatial dimension, we consider a spatiotemporal point pattern with events at locations \((x_i,y_i,t_i)\) observed on a window \(S \times [0,T]\). We further pick a spatial kernel \(k(s,s')\), and then we have the following conditional intensity:

$$\begin{aligned} \lambda (s,t) = \mu (x,y,t) + \theta \sum _{i: t_i < t} \omega \cdot \text{ exp }(-\omega (t-t_i)) k((x,y),(x_i,y_i)) \end{aligned}$$ (4)

The likelihood is as follows:

$$\begin{aligned} {\mathcal {L}}(s_1, t_1, \ldots , s_n, t_n | \theta , \omega , \mu ) = \prod _i \lambda (s_i,t_i) \exp \left( -\int _S \int _{t=0}^T \lambda (s,t) ds dt\right) \end{aligned}$$ (5)

Then we can explicitly calculate the log-likelihood corresponding to Eq. (5):

$$\begin{aligned}&\log {\mathcal {L}} = \sum _j \mu (x_j,y_j,t_j) + \theta \sum _{i: t_i < t_j} \omega \cdot \text{ exp }(-\omega (t_j-t_i)) k((x_j,y_j),(x_i,y_i)) \end{aligned}$$ (6)

$$\begin{aligned}&- \int _S \int _{t=0}^T \left( \mu (x,y,t) + \theta \sum _{i: t_i < t} \omega \cdot \text{ exp }(-\omega (t-t_i)) k((x,y),(x_i,y_i))\right) ds dt \end{aligned}$$ (7)

Now for simplicity, we assume that \(\int _S k((x,y),(x_i,y_i)) ds = 1, \,\, \forall i\) which we can ensure by properly normalizing the spatial kernel, and also assuming that the lengthscale is much shorter than the size of the spatial domain. We use a Gaussian kernel for \(k((x,y),(x',y'))\) with lengthscale \(\sigma\). Since we are in two dimensions, we have the following normalized kernel:

$$\begin{aligned} k(s,s') = \frac{1}{2\pi \sigma ^2} \exp \left(-\frac{\Vert s-s'\Vert ^2}{2\sigma ^2}\right) \end{aligned}$$ (8)

We estimate the background intensity \(\mu (x,y,t)\) using kernel smoothing with an Epanechnikov kernel:

$$\begin{aligned} k(d) = \frac{3}{4}(1-d^2) I(|d| < 1) \end{aligned}$$ (9)

We separately fit \({\hat{\mu }}_s(x,y)\) and \({\hat{\mu }}_t(t)\) and include a weighting term \(m_0\) to enable the model to downweight this background intensity:

$$\begin{aligned} {\hat{\mu }}(x,y,t) = m_0 \cdot {\hat{\mu }}_s(x,y) {\hat{\mu }}_t(t) \end{aligned}$$ (10)

We performed a sensitivity analysis in Section to investigate the impact of the kernel bandwidth.

In order to learn the parameters of the spatiotemporal Hawkes process, we use Bayesian inference implemented with Markov Chain Monte Carlo methods to find the posterior distribution of the parameters, where we place weakly informative priors Gelman et al. (2008) on them:

\(m_0 \sim {\mathcal {N}}_+(0,1)\)

\(\theta \sim {\mathcal {N}}_+(0,10)\)

\(\sigma \sim {\mathcal {N}}_+(0,10)\) where distance is measured in kilometers

\(\omega \sim {\mathcal {N}}_+(0,10)\) where time is measured in 1/days

with \({\mathcal {N}}_+\) denoting the Normal distribution truncated to be positive.

We coded our model in the probabilistic programming language Stan Stan Development Team (2016), which implements Hamiltonian Monte Carlo sampling to efficiently explore the posterior distribution ove all of the parameters of our model. We ran four chains for 1000 iterations each. Convergence diagnostics are discussed in Sect. A.7.

Dealing with Holidays

As shown in Fig. 5 (left), many extra shootings are observed in our dataset on the days before and after New Year’s Eve and the Fourth of July. There were 5653 total shootings on these days, accounting for 36% of all shootings. We considered these shootings to be “celebratory” and thus removed July 1st–6th and December 29th-January 2nd from our dataset for each year. The time series without these extra shootings is shown in Fig. 5 (right). We performed a sensitivity analysis to see whether our main findings were robust to including the holidays. The posterior over the lengthscales was slightly longer—19 min and 210 m and the parameter \(\theta\) (which characterizes the number of additional shootings we expect to see after a single shooting) was higher: 0.43 (vs 0.13 in our main model) and correspondingly \(m_0\) the weight placed on the underlying intensity was 0.57 (vs 0.87 in our main model). All of these findings make sense, as the holiday shootings are by their nature spatiotemporally clustered and the model represents this by increasing the weight and spatiotemporal extent of the self-excitatory component.

Fig. 5 Smoothed time series of shots fired in Washington DC, 2011, comparing the original data (left) to the data after removing July 1st–6th and December 29th-January 2nd for each year (right) Full size image

Crime Offense Reports Data

Conventional reported crime data on firearm assaults, as seen in Fig. 6, manifests a similar spatial distribution and temporal distribution as AGLS-detected gunshots. However, the intensity levels are markedly different. As a result of these intensity differences between measures, heterogeneity in the intensity of incident clusters is much more visible in the AGLS spatial distribution than the conventional firearm assault distribution. In addition, as seen in the bottom right graph in Fig. 6, firearm assault data lags AGLS data by several hours. A lag of several minutes would be consistent with existing knowledge on reporting practices, but a lag of hours likely reflects a recording practice associated with duty shifts or a similar process. For this reason, 911 calls for service data on “shots fired” will provide a better source for estimating spatiotemporal models if AGLS data is not present and binning is observed in publicly reported crimes data.

Fig. 6 Spatial and temporal distributions of firearm assaults and shots fired in Washington DC, 2011. The spatial distribution of aggravated assaults a 911 “shots fired” calls b and acoustically detected gunshots c is shown subfigures a–c. Monthly and day-of-week temporal distributions are shown in subfigures d–e Full size image

To verify that our spatiotemporal models could be used on 911 calls for service data, we refit our final model using the 911 calls for service data for “sounds of gunshots” for an overlapping temporal period beginning in 2011 and ending in 2013. The spatial lengthscale was 221 [214, 228] m. The temporal lengthscale was 9 [8.4, 9.4] min. And the self-exciting paramters, corresponding to the fraction of events attributable to the conditional as opposed to background intensity, was 0.15 [0.14, 0.16]. These results closely matched those obtained using AGLS data. The degree of similarity is likely a product of the high fraction of gunshot events reported to police both by civilians and the AGLS system. In cities with more limited reporting or deployment, models fit on different data sources may not yield as consistent results.

Sensitivity Analysis for Smoothing Kernel Bandwidth

We investigate the choice of the bandwidth for the smoothing kernel for estimating the underlying intensity \({\hat{\mu }}(x,y,t)\). For spatial bandwidths in [0.5, 1, 1.6] km and temporal bandwidths in [0.5, 1, 7, 14] days we fit our model using only data from 2011–12. The learned spatial lengthscales were between 130 and 210 m, matching our main model’s posterior (126 m), while the learned temporal lengthscales were between 12 and 20 min, slightly longer than in our main model (10 min).

Sensitivity Analysis for Inclusion of Duplicate Events

In our main analysis, we merged shootings which occurred within 1 min and 100 m together, treating them as a single shooting (after also removing holidays). We thus removed 415 shootings which we considered to be duplicates (4% of our sample). As a sensitivity analysis, we fit our model to the original dataset, without holidays, but with each of the near-repeats included. All of our findings were consistent with our main findings: the lengthscale was 100 m in space (slightly shorter than the main model) and 11 min in time, and \(m_0\), the weight placed on the endogeneous component was 0.86, while \(\theta\) was 0.14. This underscores the very local and limited extent to which we are able to find evidence of diffusion in our dataset.

Samples from the Posterior for Lengthscale Parameters

Inference for our model for the AGLS data resulted in a posterior concentrated around its mode, as shown in Fig. 7.

Fig. 7 2000 Draws from the posterior of the main model are shown for the temporal and spatial lengthscale parameters. The posterior is concentrated around its mode Full size image

Traceplots and Convergence Diagnostics

We implemented the model in Sect. A.1 in the probabilistic programming language Stan Stan Development Team (2016), which uses Hamiltonian Monte Carlo to approximately sample from the posterior of the distribution over the parameters. We ran four chains for 1000 iterations each, discarding the first 500 draws as burn-in, for a total of 2000 draws. We show traceplots in Fig. 8. The Gelman-Rubin statistic \({\hat{R}} = 1\) for every parameter, with effective sample sizes above 1000. Thus we conclude that the chains have mixed and converged, and we are sampling from the true posterior of the model.