Evaluating vehicle-based measurements

In the first part of this study, we assess the degree to which windshield wiper activity serves as a proxy for both rainfall intensity and binary rainfall state. First, wiper measurements are compared against conventional rainfall measurement technologies to determine if there is a direct relationship between wiper intensity and rain intensity. Next, we assess the degree to which each data source reflects the ground truth rainfall state by comparing measurements from all three sources (gages, radar and wipers) with vehicle-based video footage. Video footage provides instantaneous visual confirmation of the rainfall state (raining or not raining), and is thus taken to represent the ground truth. We characterize the binary classification performance of each technology in terms of its true positive and true negative rates.
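As an illustration of the classification metrics named above, the following minimal sketch computes true positive and true negative rates for one data source against camera-derived labels. The function name `true_rates` and the toy label sequences are hypothetical and are not taken from the study data.

```python
def true_rates(predicted, truth):
    """Return (true positive rate, true negative rate) for paired binary labels.

    predicted: rainfall states reported by a data source (gage, radar or wiper).
    truth:     rainfall states from the camera footage (taken as ground truth).
    A rate is None when its denominator is zero.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    tpr = tp / positives if positives else None
    tnr = tn / negatives if negatives else None
    return tpr, tnr


# Toy example (made-up labels): 1 = raining, 0 = not raining.
camera = [1, 1, 1, 0, 0, 0, 1, 0]
wiper = [1, 1, 0, 0, 0, 0, 1, 1]
print(true_rates(wiper, camera))
```

The same function can be applied to gage and radar labels once they have been co-located with the camera observations.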

To keep the analysis computationally tractable, we restrict the study to a subset of three storms in 2014. We assess the validity of our procedure for storms of different magnitudes by selecting a large storm (2014-08-11), a medium-sized storm (2014-06-28) and a small storm (2014-06-12). Storms are selected during the summertime months to avoid conflating rainfall measurements with snow measurements. The year 2014 is chosen because it is the year for which the greatest number of vehicles is available. Unless otherwise specified, data are co-located using a nearest neighbor search. For comparison of wiper and gage readings, we select only those gages within a 2 km range of any given vehicle.
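The spatial part of this co-location step can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distances and a brute-force search; the gage identifiers and coordinates are hypothetical, and co-location in time is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_GAGE_DISTANCE_KM = 2.0  # search range stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gages_within_range(vehicle, gages, max_km=MAX_GAGE_DISTANCE_KM):
    """Return [(gage_id, distance_km), ...] within max_km, nearest first.

    vehicle: (lat, lon) of the vehicle.
    gages:   dict mapping gage_id -> (lat, lon).
    """
    hits = []
    for gage_id, (lat, lon) in gages.items():
        d = haversine_km(vehicle[0], vehicle[1], lat, lon)
        if d <= max_km:
            hits.append((gage_id, d))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical gage locations near Ann Arbor (illustrative only).
gages = {"G1": (42.2808, -83.7430), "G2": (42.3000, -83.7000)}
print(gages_within_range((42.2850, -83.7400), gages))
```

The first element of the returned list is the nearest neighbor; an empty list means that no gage lies within range and the wiper reading has no gage counterpart.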

Data sources

We consider four data sources: (i) stationary rain gages, (ii) weather surveillance radar, (iii) vehicle windshield wiper data, and (iv) vehicle dashboard camera footage. We provide a brief description of each data source here:

Gage data are obtained from personal weather stations maintained by the Weather Underground [23]. Within the city of Ann Arbor (Michigan), Weather Underground hosts 21 personal weather stations, each of which yields rainfall estimates at a time interval of approximately 5 minutes. Locations of gages are indicated by blue circles in Fig. 1. Although verified gage data from the National Weather Service (NWS) and the National Oceanic and Atmospheric Administration (NOAA) are available, Weather Underground gages are selected because (i) NOAA and NWS each maintain only a single gage in the city of Ann Arbor, meaning that intra-urban spatial variations in precipitation intensity cannot be captured, and (ii) the temporal resolution of NOAA and NWS gages is relatively coarse for real-time applications (with NOAA offering a maximum temporal resolution of 15 minutes and NWS offering a maximum temporal resolution of 1 hour).

Weather radar observations are obtained from NOAA’s NEXRAD Level 3 Radar product archive [24]. We use the “Instantaneous Precipitation Rate” data product (listed as variable code 176 in the NEXRAD Level 3 archive [25]). Radar precipitation estimates are obtained at a temporal resolution of 5 minutes, and a spatial resolution of 0.25 km in range by 0.5 degrees in azimuth. Radar station KDTX in Detroit is used because it is the closest radar station to the City of Ann Arbor. Radial radar scans are interpolated to Cartesian coordinates using a nearest neighbor approach.
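The interpolation from radial to Cartesian coordinates can be sketched as a lookup of the radar gate that contains each Cartesian grid point. The sketch below assumes that gate i spans ranges from 0.25·i to 0.25·(i+1) km, that ray j spans azimuths from 0.5·j to 0.5·(j+1) degrees measured clockwise from north, and that the scan is stored as `polar[azimuth_index][range_index]`; these layout choices and all names are assumptions rather than properties of the NEXRAD file format.

```python
import math

RANGE_BIN_KM = 0.25     # radial gate length quoted in the text
AZIMUTH_BIN_DEG = 0.5   # azimuthal ray width quoted in the text


def polar_index(x_km, y_km):
    """Return (range_index, azimuth_index) of the gate containing a point.

    x_km, y_km: offsets east and north of the radar, in kilometres.
    """
    r = math.hypot(x_km, y_km)
    azimuth = math.degrees(math.atan2(x_km, y_km)) % 360.0  # clockwise from north
    n_rays = int(round(360.0 / AZIMUTH_BIN_DEG))
    return int(r // RANGE_BIN_KM), int(azimuth // AZIMUTH_BIN_DEG) % n_rays


def regrid(polar, xs_km, ys_km, fill=0.0):
    """Sample a polar scan onto Cartesian axes by nearest-gate lookup.

    Returns grid[row][col], with rows following ys_km and columns following xs_km.
    Points beyond the last range gate receive the fill value.
    """
    grid = []
    for y in ys_km:
        row = []
        for x in xs_km:
            ri, ai = polar_index(x, y)
            ray = polar[ai]
            row.append(ray[ri] if ri < len(ray) else fill)
        grid.append(row)
    return grid


# Synthetic scan: 720 rays of 40 gates, with a single wet gate.
scan = [[0.0] * 40 for _ in range(720)]
scan[75][20] = 12.5
print(regrid(scan, xs_km=[3.1], ys_km=[4.0]))
```

Libraries that read NEXRAD products typically supply their own gridding routines; the lookup above is only meant to make the nearest neighbor idea concrete.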

Vehicle-based wiper intensities are obtained from the University of Michigan Transportation Research Institute (UMTRI) Safety Pilot Model Deployment database [26]. For each vehicle, this dataset includes time series of latitude, longitude, and windshield wiper intensity at a temporal resolution of 2 milliseconds. Windshield wiper intensity is given on an ordinal scale from 0 to 3, with 0 indicating that the wiper is turned off, 1 representing the lowest wiper intensity, and 3 representing the highest wiper intensity. A wiper reading of 4 indicates that the vehicle’s “mister” is activated, which distinguishes wiper use for rain removal from wiper use for windshield cleaning. For this study, wiper usage for cleaning (i.e. wiper mode 4) was filtered out before the analysis. Note that wiper intensity codes are based on electrical signals generated by the wiper itself, meaning that no manual wiper mode classification is needed. For the year 2014, 69 unique vehicles are available in the UMTRI dataset. However, typically fewer than ten vehicles are active at any given time during the observation period. Vehicles with no sensor output or invalid readings were removed from the dataset prior to the analysis (see the supplementary note for more details). Other sources of human error (such as accidentally turning wipers on) are captured by the true positive and true negative rates included in Tables 1 and S1.
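The filtering of wiper records described above might look like the following sketch. The record layout `(timestamp, lat, lon, wiper_mode)` is hypothetical, and treating any code outside 0 to 4 as invalid is an assumption; the actual validity criteria are given in the supplementary note.

```python
RAIN_MODES = {0, 1, 2, 3}  # ordinal wiper intensities described in the text
MISTER_MODE = 4            # windshield cleaning, excluded from the analysis


def clean_wiper_samples(samples):
    """Drop cleaning and invalid records and attach a binary on/off state.

    samples: iterable of (timestamp, lat, lon, wiper_mode) tuples.
    Returns a list of (timestamp, lat, lon, intensity, is_on) tuples.
    """
    cleaned = []
    for timestamp, lat, lon, mode in samples:
        if mode == MISTER_MODE:
            continue  # wiper used for cleaning, not for rain removal
        if mode not in RAIN_MODES:
            continue  # assumption: any other code is an invalid reading
        cleaned.append((timestamp, lat, lon, mode, mode > 0))
    return cleaned


# Made-up records for illustration.
records = [
    (0.000, 42.28, -83.74, 0),
    (0.002, 42.28, -83.74, 2),
    (0.004, 42.28, -83.74, 4),
    (0.006, 42.28, -83.74, None),
]
print(clean_wiper_samples(records))
```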

Camera observations are also obtained from the UMTRI vehicle database [26]. Located on the inside of each vehicle, cameras provide streaming video footage of the windshield, side-facing windows, rear-facing windows, and the driver. For the purposes of validation, we use the front-facing windshield camera. Camera frames are manually inspected for rain drops striking the windshield. Time intervals where rain is observed are classified as “raining”; similarly, time intervals where no new droplets are observed are classified as “not raining”. Manual inspection and labeling of the video data was performed independently by two reviewers to ensure robustness.

A Bayesian filtering framework

In the second part of this study, we develop a Bayesian filtering framework that combines binary wiper observations with radar-based rainfall intensity measurements to generate corrected rainfall maps. In simple terms, the Bayesian filter generates an updated rainfall field, in which binary (on/off) wiper measurements adaptively correct the underlying radar rainfall field. Windshield wiper status is taken to represent a measurement of the ground truth binary rainfall state, given that it is a better predictor of the binary rainfall state than radar- or gage-based measurements. Under this framework, four distinct cases are possible. If both the wiper and radar measure precipitation, the radar reading is taken to be correct, and the original rainfall field remains the same. Similarly, if neither the wiper nor the radar measures precipitation, the radar rainfall field remains zero. However, if the radar measures precipitation at a target location and the wiper does not, then the filter will update the rainfall field such that rainfall intensity is reduced in the vicinity of the vehicle (with a decay pattern corresponding to the Gaussian kernel and an intensity of zero at the location of the wiper reading). Conversely, if the wiper measures precipitation but the radar measures no precipitation, the rainfall intensity will be increased in the vicinity of the vehicle (by combining the local distribution of the radar rainfall prior with a point estimate of rainfall intensity based on the wiper intensity). In our implementation, provided that no other information is available, this point estimate is generated using the empirical rainfall intensity distribution associated with the given wiper intensity. The empirical rainfall intensity distributions associated with each wiper intensity are shown in Figure S1 in the Supplementary Information.
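The four cases can be made concrete with a deliberately simplified, deterministic sketch. It is not the probabilistic filter itself: it applies a Gaussian kernel directly to a single grid cell and ignores the local radar prior in the fourth case. The names `sigma_km` and `wiper_estimate_mm_h` are hypothetical; the latter stands in for a point estimate drawn from the empirical intensity distribution for the observed wiper mode.

```python
import math


def kernel(distance_km, sigma_km):
    """Gaussian influence of a vehicle: 1 at the vehicle, decaying with distance."""
    return math.exp(-distance_km ** 2 / (2.0 * sigma_km ** 2))


def corrected_intensity(radar_mm_h, wiper_on, distance_km, sigma_km, wiper_estimate_mm_h):
    """Return a corrected intensity for one grid cell near one vehicle.

    radar_mm_h:          radar intensity in the cell.
    wiper_on:            binary wiper state reported by the vehicle.
    distance_km:         distance from the cell to the vehicle.
    wiper_estimate_mm_h: assumed point estimate of intensity when the wiper is on.
    """
    k = kernel(distance_km, sigma_km)
    radar_wet = radar_mm_h > 0.0
    if wiper_on and radar_wet:
        return radar_mm_h               # case 1: both wet, keep the radar value
    if not wiper_on and not radar_wet:
        return 0.0                      # case 2: both dry, field stays zero
    if radar_wet and not wiper_on:
        return radar_mm_h * (1.0 - k)   # case 3: reduced, zero at the vehicle
    return wiper_estimate_mm_h * k      # case 4: increased near the vehicle


# Radar reports 5 mm/h while the wiper is off, at increasing distances.
for d in (0.0, 1.0, 5.0):
    print(d, corrected_intensity(5.0, False, d, sigma_km=1.0, wiper_estimate_mm_h=2.0))
```

In the probabilistic formulation these hard rules are replaced by a likelihood that blends the wiper observation with the radar prior, so the correction depends on the prior intensity around the vehicle as well as on distance.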

Note that while wiper intensity by itself does not exhibit a strong correlation with rainfall intensity, the Bayesian filter uses both wiper and radar measurements to generate the posterior rainfall intensity estimate. In other words, the posterior rainfall intensity at the vehicle’s location is a probabilistic estimate that depends on both the wiper-based estimate and the local prior intensity within the neighborhood of the vehicle. Thus, a nonzero wiper measurement located far away from a radar rainfall front will result in a smaller posterior intensity than one located near a radar rainfall front (as discussed in the results section and shown in Fig. 3). The relative contributions of the wiper measurement and the radar prior are controlled using a weighting parameter representing the user’s trust in each data source. This probabilistic assimilation of data sources helps to reduce the uncertainty associated with using the wiper intensity to estimate rainfall intensity. It should be noted that other methods for obtaining a point estimate of rainfall intensity are possible, such as choosing the closest nonzero intensity in the radar rainfall prior. For newer vehicles equipped with rain sensors, the rainfall intensity can also be measured directly using the sensor output. As mentioned in the discussion section, however, it is currently difficult to evaluate the relative accuracy of these approaches, given the lack of reliable ground truth rainfall intensity data at the appropriate spatial and temporal scales.

A more formal description of the filtering framework is given here in terms of a noisy sensor model (for additional details, see Park et al. (2018) [27]). Consider a noisy sensor model in which each sensor produces a binary measurement given a target state. The target state is represented as a random tuple \({\boldsymbol{z}}=(q,I)\), where \(q\) is a location state (e.g. the latitude and longitude at the target) and \(I\) is an information state (e.g. the precipitation intensity at the target), with all random quantities indicated by bold italics. We denote by \({M}_{t}\) the event that sensors correctly measure the intensity, and by \({\bar{M}}_{t}\) the event that sensors fail to measure the intensity correctly. The joint measurement likelihood at any time \(t\) is given by:

$$p({M}_{t}|{\boldsymbol{z}},{x}_{t})$$ (1)

where \({x}_{t}\) represents the locations of the sensors at time \(t\). Equation 1 yields the probability distribution of the precipitation intensity measurement at \(q\) by sensors at \({x}_{t}\). The expected value of Equation 1 with respect to \(I\) is equivalent to the rainfall intensity experienced at the location \(q\). Because the effective range of the wipers is limited, we account for the probability of detection as a function of the distance between the sensor and the target. We denote by \({D}_{t}\) the event that sensors detect the target, and by \({\bar{D}}_{t}\) the event that sensors fail to detect the target at time \(t\). The probability of detecting a target located at \(q\) by sensors located at \({x}_{t}\), \(p({D}_{t}|q,{x}_{t})\), is taken to decay with increasing distance to the sensor. Using the law of total probability, the conditional probability of a correct measurement is then given by:

$$p({M}_{t}|{\boldsymbol{z}},{x}_{t})=p({M}_{t}|{\boldsymbol{z}},{D}_{t},{x}_{t})p({D}_{t}|q,{x}_{t})+p({M}_{t}|{\boldsymbol{z}},{\bar{D}}_{t},{x}_{t})p({\bar{D}}_{t}|q,{x}_{t})$$ (2)

where \({D}_{t}\) is conditionally independent of \(I\) when conditioned on \(q\). For example, consider \({x}_{t}=(0,0)\) and \(q=({q}_{1},{q}_{2})\). If the decay function is taken to be a 2D Gaussian centered at \({x}_{t}\) with covariance matrix \({\sigma }^{2}{{\bf{I}}}_{2}\), where \({{\bf{I}}}_{2}\) is the 2 by 2 identity matrix, then:

$$p({D}_{t}|q,{x}_{t})={\tilde{\eta }}_{t}\frac{1}{2\pi {\sigma }^{2}}\exp \,(-\frac{{q}_{1}^{2}+{q}_{2}^{2}}{2{\sigma }^{2}})$$ (3)

where \({\tilde{\eta }}_{t}\) is a normalization constant. If the target is not detected (i.e., \({\bar{D}}_{t}\)), then the measurement is assumed to be unreliable, and the likelihood, \(p({M}_{t}|{\boldsymbol{z}},{\bar{D}}_{t},{x}_{t})\), is modeled using a prior distribution. If there is no prior information available, the function is modeled using a uniform distribution. Now let \({b}_{t}({\boldsymbol{z}})\) represent the posterior probability of the precipitation intensity given a target location \(q\) at time \(t\). Using Bayes’ Theorem, \({b}_{t}({\boldsymbol{z}})\) can be formulated as:

$${b}_{t}\,({\boldsymbol{z}})={\eta }_{t}\,p({M}_{t}|{\boldsymbol{z}},{x}_{t}){b}_{t-1}({\boldsymbol{z}}),\,t=1,2,\ldots $$ (4)

where \({\eta }_{t}\) is a normalization constant and \({b}_{0}\) is uniform if no information is available at \(t=0\). This filtering equation forms the basis of the rainfall field updating algorithm. To reduce computational complexity, the filtering operation is implemented using a Sequential Importance Resampling (SIR) Particle Filter [28].
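To show how Equations 2 to 4 fit together, the following sketch performs one filter update on a discretized intensity axis at a fixed target location. It is a grid-based illustration, not the SIR particle filter used in the study. Several ingredients are assumptions: the normalization in Equation 3 is chosen so that the detection probability equals 1 at zero distance, the detected-case likelihood is modeled as a Gaussian around a sensor-implied intensity with spread `noise`, and the radar prior is a Gaussian centered on 5 mm/h.

```python
import math


def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def detection_probability(q, x, sigma):
    """Equation 3, with the constant chosen so that p(D) = 1 at q = x (assumption)."""
    d2 = (q[0] - x[0]) ** 2 + (q[1] - x[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def measurement_likelihood(intensities, observed, p_detect, noise):
    """Equation 2 evaluated for every candidate intensity."""
    # p(M | z, D, x): assumed Gaussian around the sensor-implied intensity.
    detected = normalize(
        [math.exp(-((i - observed) ** 2) / (2.0 * noise ** 2)) for i in intensities]
    )
    # p(M | z, not D, x): uniform, as in the text when no prior is available.
    undetected = 1.0 / len(intensities)
    return [p_detect * d + (1.0 - p_detect) * undetected for d in detected]


def bayes_update(prior, likelihood):
    """Equation 4: posterior proportional to likelihood times previous belief."""
    return normalize([l * b for l, b in zip(likelihood, prior)])


def expected_intensity(intensities, belief):
    """Mean of the belief over the intensity axis."""
    return sum(i * b for i, b in zip(intensities, belief))


intensities = [float(i) for i in range(11)]  # candidate intensities, mm/h
prior = normalize([math.exp(-((i - 5.0) ** 2) / 2.0) for i in intensities])
vehicle = (0.0, 0.0)  # sensor location; the wiper is off, so observed = 0

for target in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]:
    p_d = detection_probability(target, vehicle, sigma=1.0)
    likelihood = measurement_likelihood(intensities, 0.0, p_d, noise=1.0)
    posterior = bayes_update(prior, likelihood)
    print(target, round(expected_intensity(intensities, posterior), 3))
```

Far from the vehicle the detection probability vanishes, the likelihood becomes uniform, and the posterior reverts to the radar prior; close to the vehicle the posterior is pulled toward the wiper observation. A particle filter approximates the same update with weighted samples instead of a fixed grid.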

The results of the Bayesian sensor fusion procedure are evaluated by determining the proportion of instances in which the combined data product correctly predicts the binary rainfall state. We characterize the true and false positive rates for the largest storm event (2014-08-11) using an iterated “leave-one-out” cross-validation approach. First, a single vehicle is removed from the set of vehicles. The Bayesian update procedure is then executed using all vehicles except the excluded vehicle, and an updated rainfall map is generated. Next, the rainfall states predicted by the corrected rainfall field (radar and wiper) and the original rainfall field (radar only) are compared against the rainfall states observed by the omitted vehicle. The performance of each data product is evaluated based on its ability to reproduce the binary rainfall state observed by the omitted vehicle. Performing this process iteratively yields the true and false positive rates for both the original (radar only) and updated (radar and wiper) rainfall fields. This procedure is repeated for each vehicle in the set of vehicles to generate Receiver Operating Characteristic (ROC) curves, which characterize the true and false positive rates across an ensemble of simulations.
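The leave-one-out loop can be outlined as follows. This is a schematic sketch: `fuse` and `predict` are hypothetical callables standing in for the Bayesian update and for the thresholding of a rainfall field, and `toy_fuse` is a crude placeholder that simply overrides cells visited by training vehicles. It yields one operating point per omitted vehicle and does not build the ROC curves.

```python
def rates(predicted, truth):
    """Return (true positive rate, false positive rate); None if undefined."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return (tp / positives if positives else None,
            fp / negatives if negatives else None)


def leave_one_out(vehicles, radar, fuse, predict):
    """Score radar-only and fused fields against each omitted vehicle.

    vehicles: dict vehicle_id -> list of (location, wiper_on) observations.
    radar:    rainfall field, here a dict location -> intensity.
    fuse:     callable(radar, training_vehicles) -> corrected field.
    predict:  callable(field, location) -> predicted binary rainfall state.
    """
    results = {}
    for held_out, observations in vehicles.items():
        training = {v: obs for v, obs in vehicles.items() if v != held_out}
        corrected = fuse(radar, training)
        locations = [loc for loc, _ in observations]
        truth = [wet for _, wet in observations]
        results[held_out] = {
            "radar_only": rates([predict(radar, loc) for loc in locations], truth),
            "radar_and_wiper": rates([predict(corrected, loc) for loc in locations], truth),
        }
    return results


def toy_fuse(radar, training):
    """Placeholder for the Bayesian update: override cells seen by training vehicles."""
    field = dict(radar)
    for observations in training.values():
        for loc, wet in observations:
            field[loc] = max(field.get(loc, 0.0), 1.0) if wet else 0.0
    return field


def toy_predict(field, loc):
    """A cell is predicted wet when its intensity is positive."""
    return field.get(loc, 0.0) > 0.0


# Made-up data: the radar misplaces rain at A and misses rain at B.
radar = {"A": 3.0, "B": 0.0, "C": 2.0}
vehicles = {
    "v1": [("A", False), ("B", True)],
    "v2": [("A", False), ("B", True)],
    "v3": [("C", True)],
}
for vehicle_id, scores in leave_one_out(vehicles, radar, toy_fuse, toy_predict).items():
    print(vehicle_id, scores)
```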