Our DTX sensor dataset seems small, with only 10 variables, but the possibilities for revealing knowledge grow quickly once new variables are created and analyzed.

Making Timestamps Useful

Last week’s post assessed each of the sensor variables across the span of the entire day, which allowed for identification of patterns and signatures not easily seen through summary data. However, the “timestamp” variable proved difficult to work with, particularly when graphing.

Using the lubridate package, it is possible to create a new variable “time” which lets R recognize this column as date and time information. With this new variable, plotting graphs against time is not only faster, but allows for easy labeling along the x-axis. It is now possible to identify when exactly certain patterns occurred. For example, it is possible to illustrate that a sharp increase in sound levels occurred around 1:30 in the afternoon and lasted for about an hour.

Figure 1. Sound levels recorded throughout the day in DTX.

Code:

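# load the packages used in this post

> library(lubridate)

> library(ggplot2)

# create new variable with timestamp as date and time recognized by R

> sensorA$time <- ymd_hms(sensorA$timestamp)

> class(sensorA$time)

[1] "POSIXct" "POSIXt"

# MCPAll is assumed here to be the subset of sound (MCP) records, by analogy with COAll later in the post

> MCPAll <- sensorA[sensorA$sensor == "MCP", ]

# plot sound levels against time

> ggplot(MCPAll, aes(time, value)) + geom_line(aes(color = factor(id_wasp), group = id_wasp)) + scale_color_discrete(name = "Sensor") + scale_x_datetime(name = "Time", breaks = waiver()) + scale_y_continuous(name = "Decibels (dBA)", breaks = c(40, 50, 60, 70, 80, 90, 100))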
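Once the timestamps are stored as POSIXct, a time window such as the early-afternoon spike can also be pulled out by direct comparison. The following is a minimal sketch on made-up readings, not the DTX data: the date, the values, and the names toy, window_start, window_end and spike are all hypothetical, and base R's as.POSIXct() stands in for ymd_hms() so that the sketch runs without extra packages.

```r
# Toy readings standing in for the sound (MCP) records; all values are made up.
toy <- data.frame(
  timestamp = c("2017-03-01 13:00:00", "2017-03-01 13:45:00",
                "2017-03-01 14:15:00", "2017-03-01 15:00:00"),
  value = c(62, 81, 79, 64),
  stringsAsFactors = FALSE
)

# Base-R stand-in for lubridate::ymd_hms() for this timestamp layout.
toy$time <- as.POSIXct(toy$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical window of interest: 1:30 pm to 2:30 pm.
window_start <- as.POSIXct("2017-03-01 13:30:00", tz = "UTC")
window_end   <- as.POSIXct("2017-03-01 14:30:00", tz = "UTC")

# POSIXct values compare like numbers, so the window is a plain logical filter.
spike <- toy[toy$time >= window_start & toy$time <= window_end, ]

print(spike[, c("timestamp", "value")])
```

The same comparison would not work reliably on the raw character timestamps, which is one more reason to convert them first.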

Removing Duplicate Records

One strange thing identified in this dataset is the occasional duplication of records. It was noticed last week that there were over 100 more observations of humidity readings from sensor City 2 than sensor City 1. Although these readings likely wouldn’t have a huge effect, they still represent potentially false data points.

To address this, a new variable “unique” was created to flag duplicated readings. R scanned the data frame and flagged every row whose entries were identical to an earlier row, with the exception of the “id” column. Each flagged row was given a value of 1 in the “unique” variable, so despite the name, 1 marks a duplicate and 0 marks a row to keep; the first occurrence of each reading is not flagged. This identified 1,640 rows that were identical duplicates of other observations. Unfortunately, the two sensors still do not have exactly the same number of observations, perhaps because of the way the sensors record data.

Code:

# create new variable flagging duplicated rows (1 = duplicate, 0 = first occurrence)

> sensorA$unique <- as.numeric(duplicated(sensorA[, 2:9]))

> sum(sensorA$unique)

[1] 1640

# humidity data with duplicates removed

> HUM1 <- sensorA[sensorA$sensor == "GP_HUM" & sensorA$id_wasp == "city1" & sensorA$unique == 0, ]

> HUM2 <- sensorA[sensorA$sensor == "GP_HUM" & sensorA$id_wasp == "city2" & sensorA$unique == 0, ]

> summary(HUM1$value)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  47.35   54.13   68.63   69.74   86.61   88.82

> summary(HUM2$value)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  46.56   51.76   62.33   63.48   75.41   77.62

# humidity data with duplicates still included, for comparison

> HUM1d <- sensorA[sensorA$sensor == "GP_HUM" & sensorA$id_wasp == "city1", ]

> HUM2d <- sensorA[sensorA$sensor == "GP_HUM" & sensorA$id_wasp == "city2", ]

> summary(HUM1d$value)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  47.35   54.29   68.79   69.62   86.45   88.82

> summary(HUM2d$value)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  46.56   51.65   62.33   63.35   75.41   77.62
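To see exactly which rows duplicated() flags, it can help to try it on a tiny data frame first. The sketch below uses made-up humidity readings; the column layout, the timestamps, and the names toy and kept are hypothetical and only loosely mirror the sensor data.

```r
# Toy data: rows 2 and 4 repeat the row above them in everything except id.
toy <- data.frame(
  id        = 1:5,
  id_wasp   = c("city1", "city1", "city2", "city2", "city2"),
  sensor    = "GP_HUM",
  value     = c(54.1, 54.1, 62.3, 62.3, 62.3),
  timestamp = c("2017-03-01 09:00:00", "2017-03-01 09:00:00",
                "2017-03-01 09:00:00", "2017-03-01 09:00:00",
                "2017-03-01 09:05:00"),
  stringsAsFactors = FALSE
)

# Compare every column except id (column 1). duplicated() flags only the
# second and later copies, so the first occurrence of each reading is kept.
toy$unique <- as.numeric(duplicated(toy[, -1]))
print(toy$unique)

# Keep the rows that were not flagged.
kept <- toy[toy$unique == 0, ]
print(nrow(kept))
```

Row 5 has the same value as rows 3 and 4 but a later timestamp, so it is treated as a genuine new reading rather than a duplicate.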

Dangerous Sound Levels

Creating new variables not only makes datasets easier to handle and analyze but can also reveal new information.

The National Institute on Deafness and Other Communication Disorders (NIDCD) indicates that sustained exposure to noise levels at or above 85 decibels can cause hearing loss. In addition, the City of Boston maintains a noise ordinance limiting excessive noise in business districts to 65 decibels. So how does the noise in DTX compare?

To investigate this, a new variable called “dbaexceed” was created. It gives a value of 1 to sound records from 65 up to 85 dBA, a value of 2 to records at or above 85 dBA, and a value of 0 to everything else. This gives a clear indication of when, and for how long, sound measurements were above each threshold during the day.

Figure 2. Sound levels exceeding decibel regulations.

Figure 3. Proportion of sound measurements above decibel regulations.

Indeed, the majority of sound measurements were above the 65-decibel level. Are these sound levels normal for Downtown Crossing? Should Boston revisit its sound ordinances?

Code:

# create new variable indicating sound readings with dBA >= 65 (value 1) and >= 85 (value 2)

> sensorA$dbaexceed <- ifelse(sensorA$sensor == "MCP" & sensorA$value >= 65 & sensorA$value < 85, 1, ifelse(sensorA$sensor == "MCP" & sensorA$value >= 85, 2, 0))

# rebuild the sound subset so it carries the new column (MCPAll is assumed to be the MCP rows of sensorA)

> MCPAll <- sensorA[sensorA$sensor == "MCP", ]

# plot sound levels against regulations

> ggplot(MCPAll, aes(time, value, color = factor(dbaexceed))) + geom_point() + scale_color_discrete(name = "dBA Thresholds", labels = c("<65", "65-85", "85+")) + scale_x_datetime(name = "Time", breaks = waiver()) + scale_y_continuous(name = "Decibels (dBA)", breaks = c(40, 50, 60, 70, 80, 90, 100))

# plot sound levels faceted by regulation levels

> ggplot(MCPAll, aes(value)) + geom_bar() + scale_x_continuous(name = "Decibels (dBA)", breaks = waiver()) + scale_y_continuous(name = "Count", breaks = waiver()) + facet_wrap(~dbaexceed)
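The share of readings in each band can also be tabulated directly. The sketch below is one possible way to do it on made-up decibel values, not the DTX data; it swaps the nested ifelse() for base R's cut(), which is an alternative technique rather than the method used above, and the names value, counts and props are hypothetical.

```r
# Made-up decibel readings for illustration only.
value <- c(58.2, 63.9, 65.0, 71.4, 84.9, 85.0, 90.3, 66.7)

# right = FALSE makes each band include its lower bound, matching the
# ">= 65" and ">= 85" rules: [-Inf, 65), [65, 85), [85, Inf).
dbaexceed <- cut(value,
                 breaks = c(-Inf, 65, 85, Inf),
                 labels = c("<65", "65-85", "85+"),
                 right  = FALSE)

# Count the readings per band, then convert counts to proportions.
counts <- table(dbaexceed)
props  <- prop.table(counts)

print(counts)
print(round(props, 2))
```

Because cut() returns a factor with all three levels, a band with no readings still appears in the table with a count of zero, which keeps plot legends and labels aligned.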

Business Hours

While Downtown Crossing has a growing number of residents, it is still a hub of commerce and shopping. Located adjacent to the financial district, the area sees a lot of action during business hours. So how do levels of air pollutants vary during peak business hours compared with early mornings and evenings?

To allow for faceting of this information based on business hours, a new variable was created to identify all records that were measured between 9am and 5pm. In the “bushours” variable, all records measured before 9am or after 5pm were given a value of 0 and all records between those times were given a value of 1.

Figure 4. CO Levels during business hours (1) and off-business hours (0).

Although business hours represent only 8 out of 24 hours, it’s evident that CO readings are concentrated at slightly higher levels during that time of the day. Relating patterns such as air pollutant levels to characteristics of the neighborhood may reveal not only when air conditions are the worst but also when the most people may be exposed.

Code:

# create new variable "bushours" to identify business hours of the day (hours 9 through 16, i.e. 9:00 to 16:59)

> sensorB$bushours <- ifelse(hour(ymd_hms(sensorB$timestamp)) >= 9 & hour(ymd_hms(sensorB$timestamp)) < 17, 1, 0)

> COAll <- sensorB[sensorB$sensor == "CO", ]

# plot CO levels faceted by business hours

> ggplot(COAll, aes(value)) + geom_bar() + scale_x_continuous(name = "CO Level (voltage)", breaks = waiver()) + scale_y_continuous(name = "Count", breaks = waiver()) + facet_wrap(~bushours)
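A quick numeric check can complement the faceted plot. The sketch below flags business hours on made-up CO readings and compares the group means; it treats business hours as 9:00 to 16:59, matching the 8-hour window described in the text. The data, the date, and the names toy, hr and means are hypothetical, and base R is used in place of lubridate so the sketch runs on its own.

```r
# Made-up CO readings (voltage) around the edges of the business day.
toy <- data.frame(
  timestamp = c("2017-03-01 08:59:00", "2017-03-01 09:00:00",
                "2017-03-01 12:30:00", "2017-03-01 16:59:00",
                "2017-03-01 17:00:00", "2017-03-01 22:15:00"),
  value = c(1.10, 1.30, 1.40, 1.35, 1.15, 1.05),
  stringsAsFactors = FALSE
)

# Parse the timestamps and extract the hour of day (0 to 23).
time <- as.POSIXct(toy$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
hr   <- as.integer(format(time, "%H"))

# 1 for hours 9 through 16, 0 otherwise: eight hours in total.
toy$bushours <- ifelse(hr >= 9 & hr < 17, 1, 0)

# Mean CO reading inside and outside business hours.
means <- tapply(toy$value, toy$bushours, mean)
print(toy$bushours)
print(means)
```

Note that the 17:00 reading falls outside business hours here; using `<= 17` instead would extend the window to 17:59 and cover nine hours rather than eight.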

Sources:

https://www.boston.gov/departments/environment/rules-noise-boston

https://www.nidcd.nih.gov/health/noise-induced-hearing-loss