Spies In The Skies

Data and R code for the analysis supporting this April 6, 2016 BuzzFeed News post on federal surveillance aircraft. Supporting files are in this GitHub repository.

Data preparation

BuzzFeed News obtained more than four months of aircraft transponder detections from the plane tracking website Flightradar24, covering August 17 to December 31, 2015 UTC, containing all data displayed on the site within a bounding box encompassing the continental United States, Alaska, Hawaii and Puerto Rico. Fightradar24 receives data from its network of ground-based receivers, supplemented by a feed from ground radars provided by the Federal Aviation Administration (FAA) with a five-minute delay.

After parsing from the raw files supplied by Flightradar24, the data included the following fields, for each transponder detection:

adshex Unique identifier for each aircraft, corresponding to its “Mode-S” code, in hexademical format.

Unique identifier for each aircraft, corresponding to its “Mode-S” code, in hexademical format. flight_id Unique identifier for each “flight segment,” in hexadecimal format. A flight segment is a continuous series of transponder detections for one aircraft. There may be more than one segment per flight, if a plane disappears from Flightradar24’s coverage for a period — for example when flying over rural areas with sparse receiver coverage. While being tracked by Fightradar24, surveillance planes were typically detected several times per minute.

Unique identifier for each “flight segment,” in hexadecimal format. A flight segment is a continuous series of transponder detections for one aircraft. There may be more than one segment per flight, if a plane disappears from Flightradar24’s coverage for a period — for example when flying over rural areas with sparse receiver coverage. While being tracked by Fightradar24, surveillance planes were typically detected several times per minute. latitude , longitude Geographic location in digital degrees.

, Geographic location in digital degrees. altitude Altitude in feet.

Altitude in feet. speed Ground speed in knots.

Ground speed in knots. track Compass bearing in degrees, with 0 corresponding to north.

Compass bearing in degrees, with 0 corresponding to north. squawk Four-digit code transmitted by the transponder.

Four-digit code transmitted by the transponder. type Aircraft model, if identified.

Aircraft model, if identified. timestamp Full UTC timestamp.

We saved the data for each individual plane in a separate comma-separated values (CSV) file, named for the aircraft’s adshex Mode-S code.

From the FAA registration database, we identified aircraft registered to the Department of Homeland Security and companies previously identified as front operations for the FBI. Using the Mode-S hexadecimal codes for these planes, we imported the corresponding files from the Flightradar24 data into R, and joined this to the FAA data to give the following additional fields:

name Name of aircraft registrant.

Name of aircraft registrant. other_names1 , other_names2 Other names for the registrant, if listed.

, Other names for the registrant, if listed. n_number Aircraft registration number, sometimes called a “tail number.” For U.S.-registered planes, these begin with the letter “N,” followed by up to five alphanumeric characters.

Aircraft registration number, sometimes called a “tail number.” For U.S.-registered planes, these begin with the letter “N,” followed by up to five alphanumeric characters. serial_number Identifying number assigned to the aircraft by its manufacturer.

Identifying number assigned to the aircraft by its manufacturer. mfr_mdl_code Code designating the manufacturer and model of the aircraft.

Code designating the manufacturer and model of the aircraft. mfr Manufacturer.

Manufacturer. model Aircraft model.

Aircraft model. year_mfr Year in which aircraft was manufactured.

Year in which aircraft was manufactured. type_aircraft 4 : fixed-wing single-engine, 5 : fixed-wing multi-engine, 6 : helicopter.

: fixed-wing single-engine, : fixed-wing multi-engine, : helicopter. agency Federal agency operating the aircraft, recorded by BuzzFeed News.

This data is spread across three files feds1.csv , feds2.csv , and feds3.csv , to keep file sizes small, which we then processed to create further calculated fields.

# load required packages library(readr) library(dplyr) # Set default timezone for session to UTC Sys.setenv(TZ = "UTC") # create data frame feds <- data_frame() # list files to load files <- list.files("data/feds") # load data for (file in files) { tmp <- read_csv(paste0("data/feds/",file), col_types = list( adshex = col_character(), flight_id = col_character(), latitude = col_double(), longitude = col_double(), altitude = col_integer(), speed = col_integer(), squawk = col_character(), type = col_character(), timestamp = col_datetime(), name = col_character(), other_names1 = col_character(), other_names2 = col_character(), n_number = col_character(), serial_number = col_character(), mfr_mdl_code = col_character(), mfr = col_character(), model = col_character(), year_mfr = col_integer(), type_aircraft = col_integer(), agency = col_character() )) feds <- bind_rows(feds,tmp) } rm(tmp,file)

First, we created a new field loc_timestamp , converting the UTC timestamps to local standard times.

# load required packages library(rgdal) library(rgeos) # load Natural Earth timezones shapefile timezones <- readOGR("data/ne_10m_time_zones", "ne_10m_time_zones")

# what is its Coordinate Reference System (CRS)? timezones@proj4string

## CRS arguments: ## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

# convert the feds data to a spatial points data frame, with the same CRS feds_map <- as.data.frame(feds) xy <- feds_map[,c(4,3)] feds_map <- SpatialPointsDataFrame(coords = xy, data = feds_map, proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) # spatial join of timezone data to feds_map (this will take some time to run) feds_map <- raster::intersect(feds_map, timezones) # function to use when adding/subtracting hours from datetime objects hrs <- function(x) { y <- x * 3600 return(y) } # create loc_timestamp field, and drop the fields added in the spatial join # zone is the field that gives the offset from UTC in hours feds <- as.data.frame(feds_map) %>% mutate(loc_timestamp = timestamp + hrs(zone)) %>% select(1:21,loc_timestamp)

We then created further fields from loc_timestamp , giving date, day of the week, hour on the 24-hour clock, and designating each day as a regular working day, or not (for weekends and federal holidays).

# create date, day_week, and hour fields feds <- feds %>% mutate(date = as.Date(loc_timestamp), day_week = weekdays(loc_timestamp), hour = as.numeric(format(loc_timestamp, "%H"))) # create work_day field feds_weekends <- feds %>% filter(day_week == "Saturday" | day_week == "Sunday") %>% mutate(work_day = "N") hols <- as.data.frame(as.Date(c("2015-09-07","2015-10-12","2015-11-11","2015-11-26","2015-12-25"))) names(hols) <- "date" feds_hols <- inner_join(feds, hols) %>% mutate(work_day = "N") feds_working <- feds %>% filter(day_week != "Saturday" & day_week != "Sunday") %>% anti_join(hols) %>% mutate(work_day = "Y") feds <- bind_rows(feds_weekends, feds_hols, feds_working)

The feds data frame now contained the following additional fields:

loc_timestamp Full timestamp in local standard time.

Full timestamp in local standard time. date Date, based on local timestamp.

Date, based on local timestamp. day_week Day of the week, based on local timestamp.

Day of the week, based on local timestamp. hour Hour on the 24-hour clock, based on local timestamp.

Hour on the 24-hour clock, based on local timestamp. work_day Y for a regular working day, N for weekends and federal holidays, based on local timestamp.

Registration documents obtained from the FAA revealed that the ownership of two helicopters was transferred from the DHS to an FBI front company in mid-December, so we corrected earlier records to reflect this switch.

# correcting ownership records for aircraft that switched registration from DHS to FBI transfers <- feds %>% filter(date < "2015-12-14" & (n_number == "6971A" | n_number == "6982A")) feds <- anti_join(feds, transfers) transfers <- transfers %>% mutate(name = "DEPARTMENT OF HOMELAND SECURITY", agency = "dhs") feds <- bind_rows(feds, transfers)

We performed a spatial join to the boundaries of U.S. states and territories, for analysis of the data by state, and to restrict calculations to planes observed over U.S. states and territories.

# load states/provinces shapefile states <- readOGR("data/ne_10m_admin_1_states_provinces", "ne_10m_admin_1_states_provinces") # filter for US states/territories only states <- states [ which(states@data$sov_a3 == "US1"), ]

# what is its CRS? states@proj4string

## CRS arguments: ## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

# convert the feds data to a spatial points data frame, with the same CRS feds_map <- as.data.frame(feds) xy <- feds_map[,c(4,3)] feds_map <- SpatialPointsDataFrame(coords = xy, data = feds_map, proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) # spatial join of states to feds_map (this will take some time to run) states_feds_map <- raster::intersect(feds_map, states) # convert to data frame, and drop fields from the spatial join apart from state/territory name and abbreviation states_feds <- as.data.frame(states_feds_map) %>% select(1:26, name.1, postal) %>% rename(state = name.1)

And we performed a spatial join to the boundaries of urban areas defined by the Census Bureau. This allowed subsquent analysis for transponder detections over any urban area.

# load urban areas shapefile urban <- readOGR("data/cb_2014_us_ua10_500k", "cb_2014_us_ua10_500k")

# what is its CRS? urban@proj4string

## CRS arguments: ## +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0

# convert the feds data to a spatial points data frame, with the same CRS feds_map <- as.data.frame(feds) xy <- feds_map[,c(4,3)] feds_map <- SpatialPointsDataFrame(coords = xy, data = feds_map, proj4string = CRS("+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0")) # spatial join of urban to feds_map (this will take some time to run) # note use of spatialEco package function here, rather than raster package function used above, which crashed on this spatial join urban_feds_map <- spatialEco::point.in.poly(feds_map, urban) # convert to data frame, and drop some fields from the spatial join urban_feds <- as.data.frame(urban_feds_map) %>% select(1:26,29,30)

The urban_feds data frame now contained the following additional fields:

GEOID10 Five-digit identifying code for each urban area.

Five-digit identifying code for each urban area. NAME10 Name of each urban area.

We then filtered to remove data for 2015-08-16 and 2015-12-31 , which will be incomplete after conversion to local times from UTC.