BuzzFeed News Trained A Computer To Search For Hidden Spy Planes. This Is What We Found.

Data and R code for the analysis supporting this August 7, 2017 BuzzFeed News post on identifying potential surveillance aircraft. Supporting files are in this GitHub repository.

Data

BuzzFeed News obtained more than four months of aircraft transponder detections from the plane tracking website Flightradar24, covering August 17 to December 31, 2015 UTC, containing all data displayed on the site within a bounding box encompassing the continental United States, Alaska, Hawaii, and Puerto Rico. Flightradar24 receives data from its network of ground-based receivers, supplemented by a feed from ground radars provided by the Federal Aviation Administration (FAA) with a five-minute delay.

After parsing the raw files supplied by Flightradar24, the data included the following fields for each transponder detection:

adshex: Unique identifier for each aircraft, corresponding to its “Mode-S” code, in hexadecimal format.

flight_id: Unique identifier for each “flight segment,” in hexadecimal format. A flight segment is a continuous series of transponder detections for one aircraft. There may be more than one segment per flight if a plane disappears from Flightradar24’s coverage for a period, for example when flying over rural areas with sparse receiver coverage. While being tracked by Flightradar24, planes were typically detected several times per minute.

latitude, longitude: Geographic location in decimal degrees.

altitude: Altitude in feet.

speed: Ground speed in knots.

squawk: Four-digit code transmitted by the transponder.

type: Aircraft manufacturer and model, if identified.

timestamp: Full UTC timestamp.

track: Compass bearing in degrees, with 0 corresponding to north.

We also calculated:

steer: Change in compass bearing from the previous transponder detection for that aircraft; negative values indicate a turn to the left, positive values a turn to the right.
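Because track is a compass bearing, a naive difference between consecutive detections breaks at north. A change from 350 to 10 degrees is a 20-degree turn to the right, not a 340-degree turn to the left. The minimal Python sketch below shows one way to compute steer with that wraparound handled. It is not the original R code. The function names are hypothetical, and the sketch assumes that one aircraft's track values are already sorted by timestamp. It also assumes every change is mapped into the range (-180, 180], so a full reversal counts as +180.

```python
from typing import List, Optional


def wrap_bearing_change(prev_track: float, track: float) -> float:
    """Signed change from prev_track to track, in degrees, within (-180, 180].

    Negative values are turns to the left (counterclockwise), positive
    values turns to the right (clockwise).
    """
    return 180.0 - ((180.0 - (track - prev_track)) % 360.0)


def compute_steer(tracks: List[float]) -> List[Optional[float]]:
    """Hypothetical helper: steer for each detection of one aircraft.

    Assumes `tracks` holds one aircraft's track values sorted by timestamp.
    The first detection has no previous detection, so its steer is None.
    """
    if not tracks:
        return []
    steer: List[Optional[float]] = [None]
    for prev, cur in zip(tracks, tracks[1:]):
        steer.append(wrap_bearing_change(prev, cur))
    return steer
```

For example, `compute_steer([350.0, 10.0, 5.0, 185.0])` returns `[None, 20.0, -5.0, 180.0]`. These are a 20-degree right turn across north, a 5-degree left turn and a reversal.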

Feature engineering

Using the same data, we had previously reported on flights of spy planes operated by the FBI and the Department of Homeland Security (DHS), and reasoned that it should be possible to train a machine learning algorithm to identify other aircraft performing similar surveillance, based on characteristics of the aircraft and their flight patterns.

First we filtered the data to remove planes registered abroad, based on their adshex code; common commercial airliners, based on their type; and aircraft with fewer than 500 transponder detections. Then we took a random sample of 500 aircraft and calculated the following for each one:

duration: Duration of each flight segment recorded by Flightradar24, in minutes.

boxes: Area of a rectangular bounding box drawn around each flight segment, in square kilometers.

Finally, we calculated the following variables for each of the aircraft in the larger filtered dataset:

duration1, duration2, duration3, duration4, duration5: Proportion of flight segment durations for each plane falling into each of five bins, bounded by the quintiles of duration calculated for the sample of 500 planes. The proportions for each aircraft add up to 1. If the durations of flight segments for a plane closely matched those for a typical plane from the sample, these numbers would all be approximately 0.2; a plane that mostly flew very long flights would have a large decimal fraction for duration5.

boxes1, boxes2, boxes3, boxes4, boxes5: Proportion of bounding box areas for each plane falling into each of five bins, bounded by the quintiles of boxes calculated for the sample of 500 planes.

speed1, speed2, speed3, speed4, speed5: Proportion of speed values recorded for the aircraft falling into each of five bins, bounded by the quintiles of speed calculated for the sample of 500 planes.

altitude1, altitude2, altitude3, altitude4, altitude5: Proportion of altitude values recorded for the aircraft falling into each of five bins, bounded by the quintiles of altitude calculated for the sample of 500 planes.

steer1, steer2, steer3, steer4, steer5, steer6, steer7, steer8: Proportion of steer values for each aircraft falling into eight bins. These were set manually after observing the distribution for the sample of 500 planes, using the breaks -180, -25, -10, -1, 0, 1, 22, 45, 180.

flights: Total number of flight segments for each plane.

squawk_1: Squawk code used most commonly by the aircraft.

observations: Total number of transponder detections for each plane.

type: Aircraft manufacturer and model, if identified, else “unknown”.

The resulting data for 19,799 aircraft are in the file planes_features.csv.
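The two flight-segment measures computed for the 500-plane sample, duration and boxes, can be sketched as follows. The text does not say how the bounding-box area was calculated, so this Python example makes an assumption. It treats the Earth as a sphere and uses the exact area of a latitude/longitude rectangle: R^2 * dlon * (sin(lat_max) - sin(lat_min)), where dlon is in radians. The helper names and the tuple layout of a detection are hypothetical. Segments that cross the antimeridian, which is possible near Alaska, are not handled.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Mean Earth radius in km; treating the Earth as a sphere is an assumption.
EARTH_RADIUS_KM = 6371.0088

# Hypothetical layout: (flight_id, timestamp, latitude, longitude)
Detection = Tuple[str, datetime, float, float]


def segment_duration_minutes(timestamps: List[datetime]) -> float:
    """Minutes between the first and last detection of one flight segment."""
    return (max(timestamps) - min(timestamps)).total_seconds() / 60.0


def bounding_box_area_km2(lats: List[float], lons: List[float]) -> float:
    """Area of the latitude/longitude rectangle enclosing one segment.

    On a sphere: R^2 * (lon_max - lon_min in radians) * (sin(lat_max) - sin(lat_min)).
    Assumes the segment does not cross longitude +/-180.
    """
    dlon = math.radians(max(lons) - min(lons))
    dsin = math.sin(math.radians(max(lats))) - math.sin(math.radians(min(lats)))
    return EARTH_RADIUS_KM ** 2 * dlon * dsin


def segment_features(detections: Iterable[Detection]) -> Dict[str, Tuple[float, float]]:
    """Return {flight_id: (duration in minutes, bounding-box area in km^2)}."""
    grouped: Dict[str, List[Detection]] = defaultdict(list)
    for det in detections:
        grouped[det[0]].append(det)  # group detections by flight segment
    features: Dict[str, Tuple[float, float]] = {}
    for flight_id, dets in grouped.items():
        times = [d[1] for d in dets]
        lats = [d[2] for d in dets]
        lons = [d[3] for d in dets]
        features[flight_id] = (
            segment_duration_minutes(times),
            bounding_box_area_km2(lats, lons),
        )
    return features
```

A segment spanning 45 minutes and a one-degree square at the equator gets a duration of 45.0 and an area of roughly 12,360 square kilometers.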
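The proportion variables all follow one pattern. First, bin edges are taken from the 500-plane sample: quintiles for duration, boxes, speed and altitude, and the manual breaks for steer. Then the share of each aircraft's values falling in each bin is counted. The minimal Python sketch below rests on assumptions the text does not settle:

- Quintiles are computed with `statistics.quantiles(..., method="inclusive")`.
- Values outside the sample's range fall into the first or last bin.
- Bins are closed on the right, which is the default convention of R's `cut()`.

The function names are hypothetical.

```python
import statistics
from bisect import bisect_left
from typing import List, Sequence


def quintile_cuts(sample_values: Sequence[float]) -> List[float]:
    """Four interior cut points splitting the reference sample into quintiles."""
    return statistics.quantiles(sample_values, n=5, method="inclusive")


def bin_proportions(values: Sequence[float], cuts: Sequence[float]) -> List[float]:
    """Share of `values` in each bin defined by sorted interior `cuts`.

    Bins are closed on the right, (cuts[i-1], cuts[i]], and the first and last
    bins are open-ended, so len(cuts) + 1 proportions are returned; they sum to 1.
    """
    if not values:
        raise ValueError("no values to bin")
    counts = [0] * (len(cuts) + 1)
    for v in values:
        # bisect_left sends a value equal to a cut point into the lower bin
        counts[bisect_left(cuts, v)] += 1
    return [c / len(values) for c in counts]


# Interior steer breaks from the text (-180, -25, -10, -1, 0, 1, 22, 45, 180).
# The outer breaks are implied by the open-ended first and last bins.
STEER_CUTS = [-25, -10, -1, 0, 1, 22, 45]
```

With the integers 1 to 100 as a stand-in sample, `quintile_cuts` returns `[20.8, 40.6, 60.4, 80.2]`. Applying `bin_proportions` to the same values gives 0.2 in each of the five bins, as described for a typical plane. Using `bisect_left` makes a steer value of exactly 0 land in the (-1, 0] bin, separate from slight right turns.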
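The per-aircraft counts (flights, observations, squawk_1, type) and the filtering rules from the start of the feature engineering step can also be sketched in Python. Several details here are assumptions rather than facts from the text:

- Registration abroad is detected using the ICAO 24-bit address block allocated to the United States, A00000 to AFFFFF.
- AIRLINER_TYPES is a placeholder for the actual list of common airliner types, which is not given.
- A plane's type is its most common non-empty type value.
- Ties for the most common squawk go to the code seen first.

The dict-based detection format and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set

# Placeholder values only; the real airliner list and type format are not given.
AIRLINER_TYPES: Set[str] = {"B738", "A320"}
MIN_OBSERVATIONS = 500  # planes with fewer detections were removed


def is_us_registered(adshex: str) -> bool:
    """Assumption: U.S. aircraft use the ICAO 24-bit address block A00000-AFFFFF."""
    return 0xA00000 <= int(adshex, 16) <= 0xAFFFFF


def plane_summaries(detections: Iterable[dict]) -> Dict[str, dict]:
    """Per-aircraft flights, squawk_1, observations and type.

    Each detection is a dict with at least adshex, flight_id, squawk and type.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for det in detections:
        grouped[det["adshex"]].append(det)
    summaries = {}
    for adshex, dets in grouped.items():
        types = [d["type"] for d in dets if d.get("type")]
        summaries[adshex] = {
            "flights": len({d["flight_id"] for d in dets}),  # distinct segments
            # Counter.most_common orders ties by first occurrence
            "squawk_1": Counter(d["squawk"] for d in dets).most_common(1)[0][0],
            "observations": len(dets),
            "type": Counter(types).most_common(1)[0][0] if types else "unknown",
        }
    return summaries


def keep_plane(adshex: str, summary: dict) -> bool:
    """Apply the three filters described in the text, under the assumptions above."""
    return (
        is_us_registered(adshex)
        and summary["type"] not in AIRLINER_TYPES
        and summary["observations"] >= MIN_OBSERVATIONS
    )
```

In this sketch, filtering runs on the per-plane summaries. One pass over the detections therefore yields both the observation count used by the 500-detection threshold and the flights, squawk_1 and type columns.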