Using PCA to Identify the Orientation of Polygons

John Koo

Motivating example

When generating maps made up of multiple regions with defined boundaries (e.g., polygons), it’s often necessary to label the regions. A natural approach is to find the centroid of each region and place the label text there. Depending on the shape of each region, this may not work as intended. It may be more appropriate to rotate the text to better fit the shape of the polygon.

In the following example, a map of New England is generated, and each state is labeled.

    # packages used in this code:
    # * magrittr
    # * dplyr
    # * ggplot2
    # * import
    # * rgeos
    # * broom
    # * foreach

    # load packages
    import::from(magrittr, `%>%`, `%$%`, extract)
    import::from(ggplot2, ggplot, geom_polygon, geom_text, geom_point,
                 geom_segment, aes, coord_map, map_data)
    import::from(foreach, foreach, `%do%`)
    import::from(broom, tidy)
    dp <- loadNamespace('dplyr')
    rg <- loadNamespace('rgeos')

    #' @title Turn a data frame of vertices into a polygon/multipolygon
    #' @param input.df (data frame) Data frame of vertices, can contain multiple
    #'   polygons
    #' @param x.col (character) Name of the column of x coordinates
    #' @param y.col (character) Name of the column of y coordinates
    #' @param polygon.col (character) Name of the column identifying each polygon
    #' @return (SpatialPolygons)
    df.to.sp <- function(input.df, x.col = 'long', y.col = 'lat', polygon.col) {
      # separate out each polygon
      polygon.ids <- unique(input.df[[polygon.col]])

      sapply(polygon.ids, function(pid) {
        # subset input.df to each polygon
        # (base subsetting is used here in place of the deprecated dplyr::filter_)
        polygon.df <- input.df[input.df[[polygon.col]] == pid, ]

        # start compiling a WKT: one "((x y, x y, ...))" ring per polygon
        paste(polygon.df[[x.col]], polygon.df[[y.col]]) %>%
          paste(collapse = ', ') %>%
          paste0('((', ., '))')
      }) %>%
        # join the rings into a single MULTIPOLYGON and parse it
        paste(collapse = ', ') %>%
        paste0('MULTIPOLYGON(', ., ')') %>%
        rgeos::readWKT()
    }

    # state boundary data
    # (spaces in state names become line breaks so that labels wrap)
    states.df <- map_data('state') %>%
      dp$mutate(region = gsub(' ', '\n', region))

    head(states.df)

           long      lat group order  region subregion
    1 -87.46201 30.38968     1     1 alabama      <NA>
    2 -87.48493 30.37249     1     2 alabama      <NA>
    3 -87.52503 30.37249     1     3 alabama      <NA>
    4 -87.53076 30.33239     1     4 alabama      <NA>
    5 -87.57087 30.32665     1     5 alabama      <NA>
    6 -87.58806 30.32665     1     6 alabama      <NA>

    # vector of new england states
    ne.states <- c('massachusetts', 'connecticut', 'rhode\nisland',
                   'vermont', 'new\nhampshire', 'maine')

    # data frame of new england states
    ne.df <- dp$filter(states.df, region %in% ne.states)

    # data frame of centroids
    # 1. convert each state into a spatial object
    # 2. use built-in spatial functions to find the centroid of each state
    # 3. compile into a data frame
    ne.labels.df <- foreach(ne.state = ne.states, .combine = dp$bind_rows) %do% {
      temp.sp <- df.to.sp(dp$filter(ne.df, region == ne.state),
                          polygon.col = 'group')
      temp.centroid <- rg$gCentroid(temp.sp)
      data.frame(state = ne.state,
                 long = temp.centroid$x,
                 lat = temp.centroid$y,
                 stringsAsFactors = FALSE)
    }

    # plot
    ggplot() +
      coord_map() +
      geom_polygon(data = ne.df,
                   colour = 'black', fill = 'white',
                   aes(x = long, y = lat, group = group)) +
      geom_text(data = ne.labels.df,
                aes(x = long, y = lat, label = state))

Looking at this plot, the labels for New Hampshire and Vermont would probably fit better if they were rotated 90 degrees. We can do this manually:

    ggplot() +
      coord_map() +
      geom_polygon(data = ne.df,
                   colour = 'black', fill = 'white',
                   aes(x = long, y = lat, group = group)) +
      geom_text(data = ne.labels.df,
                aes(x = long, y = lat, label = state,
                    angle = c(0, 0, 0, 90, 90, 0)))

(We can also see that perhaps the centroid might not be the best choice for the position of the label.)
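The manual rotation above relies on the angles being listed in the same order as the rows of the label data frame. A slightly more robust variant, sketched below, looks the angles up by state name instead. The data frame `labels.df` and the named vector `rotated` are hypothetical stand-ins introduced only for this illustration; they are not part of the original code.

```r
# Hypothetical stand-in for the label data frame (state names only;
# the centroid columns are omitted because they are not needed here).
labels.df <- data.frame(
  state = c('massachusetts', 'connecticut', 'rhode\nisland',
            'vermont', 'new\nhampshire', 'maine'),
  stringsAsFactors = FALSE
)

# States whose labels should be rotated, keyed by name (assumed values).
rotated <- c('vermont' = 90, 'new\nhampshire' = 90)

# Look up each state's angle by name; states not listed default to 0.
labels.df$angle <- ifelse(labels.df$state %in% names(rotated),
                          rotated[labels.df$state],
                          0)

print(labels.df)
```

With an `angle` column in the data frame, the plot can map it with `aes(angle = angle)` in `geom_text()` rather than passing a positional vector.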

Ideally, this would be done algorithmically. To come up with a method, we can think of each polygon as “pointing” in a certain direction. Vermont and New Hampshire point up and down, while Massachusetts and Connecticut point left to right. Maine points at roughly 45 degrees (although we might still want its label to be horizontal), and Rhode Island is relatively square/circular.
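One way to make the idea of a polygon “pointing” in a direction concrete is to take the first principal component of its vertices and convert that direction into an angle. The following is only a minimal sketch under some assumptions: the helper `polygon.angle` is a hypothetical name, the toy rectangles stand in for real state boundaries, and the coordinates are treated as plain x/y values (no map projection). It also weights every vertex equally, so a boundary with many vertices along one edge could pull the estimate.

```r
# Hypothetical helper (an illustration, not a definitive implementation):
# estimate the orientation of a polygon, in degrees, from the first
# principal component of its vertex coordinates.
polygon.angle <- function(x, y) {
  # center the coordinates but do not rescale them, so the aspect ratio
  # of the shape is preserved
  pca <- stats::prcomp(cbind(x, y), center = TRUE, scale. = FALSE)

  # the first column of the rotation matrix is the direction of
  # greatest variance
  v <- pca$rotation[, 1]
  angle <- atan2(v[2], v[1]) * 180 / pi

  # a principal axis has no sign, so fold the angle into [-90, 90)
  unname(((angle + 90) %% 180) - 90)
}

# toy shapes: a tall rectangle and a wide rectangle
tall.angle <- polygon.angle(x = c(0, 1, 1, 0), y = c(0, 0, 4, 4))
wide.angle <- polygon.angle(x = c(0, 4, 4, 0), y = c(0, 0, 1, 1))

print(c(tall = tall.angle, wide = wide.angle))
```

The tall rectangle gives an angle of magnitude 90 degrees and the wide one an angle near 0. For a nearly square shape the two principal components have similar variance, so the angle is poorly determined; in that case a horizontal label is probably the safer default.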