Update: In the end I’ve gone with QGIS for the georeferencing. Rendering in R is much slower than in QGIS, so zooming in and clicking on each of the corners takes longer, and a lot of time gets wasted. It was still a good learning experience, but ultimately it was maybe a case of trying to do too much with a tool you know (now I’m off to drill with a hammer).

This summer I’ve had the opportunity to work with a student who is part of the IBS-SRP program. Within PalEON we’ve gradually been piecing together a map of pre-settlement forest cover for the upper Midwest of the United States using Public Land Survey Data, but we’re missing a chunk of data from the lower half of Michigan (see Creamsicle Figure 1, on the right).

The data exist, but when they were first digitized they were transferred directly to mylar sheets. From these sheets, polygons were traced and then digitized, so the point data were effectively lost. Unfortunately for us, we want the point data! Luckily the mylar sheets still existed, so we got our trusty REU student to scan them, and she has been fantastic. I feel a little bad (well, I feel awful) about it, but we’ve been doing some other cool stuff in the meantime (she’s learning R like a pro!) that sort of makes up for the drudgery of scanning.

So, we have these scanned mylar sheets. They have the section and quarter-section points and the corners of each quad marked, but there’s no explicit spatial data in the scanned files. Jim Burt at the University of Wisconsin has done some great work georeferencing historical USGS topographic quadrangles with QUAD-G, but our problem is a bit more complicated, mostly because we’re dealing with hand-drawn notes on clear overlays, with lots of marginal squiggles. So I set about creating a bespoke georeferencing tool in R (I interpreted Barry Rowlingson’s reply here to mean there was no native support in R anyway, and I couldn’t find anything myself). I should also point out that I wasn’t just georeferencing: I was also trying to perform rubbersheeting, to remove any warping that might have occurred during the scanning process.

The first thing was to fire up the raster library. The code runs in a few discrete steps:

1. Load the coordinates and names of all the actual state quadrangles for Michigan, and find the associated rasterized mylar sheets (conveniently, each sheet was scanned and saved under the name of its state quad).
2. Look for a csv file with prior work saved. The table in the file lists all the quads, their related raster files, and vectors of x and y coordinates for the map corners in both ‘mylar space’ (the un-georeferenced coordinates of the mylar scan) and Michigan space (the projected coordinate system for the state quads).
3. Once the table is created or loaded, find the next un-referenced quad/raster pair and plot it (using image rather than plot saves about 20 s here, a good reason to profile your code).
4. Using the zoom and click commands, get the coordinates of each corner in mylar space, sort them into a predefined order (clockwise from top left, or whatever), and get the coordinates of the bounding box for the related quad.

Given this information it should be relatively straightforward to transform the raster, but as far as I could tell it wasn’t. I tried lots of tricks in R, but we had a real problem: the mylar sheets were scanned at a very high resolution (1 GB per file at 300 dpi), and because of this the corners never lined up perfectly. After a number of attempts at matrix transformations with no real success, I finally went the system route, with two direct calls, to gdal_translate and gdalwarp, using the point pairs as ground control points and then warping with a thin-plate spline.
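The corner-sorting and GDAL steps can be sketched roughly as follows. This is a minimal illustration in Python, not the R code used here, and every name in it is hypothetical: the helper functions, file names, click coordinates, bounding box, and the EPSG:3078 code (a placeholder; substitute whatever CRS your quad coordinates are actually in). It assumes the clicks are GDAL pixel/line coordinates (line counted downward from the top of the scan) and, as in the steps above, uses the quad's bounding-box corners as the map-space targets. The `-gcp`, `-a_srs`, `-tps`, `-r` and `-t_srs` options are standard gdal_translate/gdalwarp flags.

```python
import shlex


def order_corners(points):
    """Order four clicked corners clockwise from top left.

    Assumes GDAL pixel/line coordinates: the line (row) number increases
    downward, so top left has the smallest pixel + line and top right the
    largest pixel - line.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner clicks")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    ordered = [top_left, top_right, bottom_right, bottom_left]
    if len(set(ordered)) != 4:
        raise ValueError("corner clicks are ambiguous; re-click them")
    return ordered


def bbox_corners(xmin, ymin, xmax, ymax):
    """Map-space corners of a quad's bounding box, clockwise from top left.

    Projected northings increase upward, so the top edge is ymax.
    """
    return [(xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]


def gdal_commands(scan, tmp, out, pixel_corners, map_corners, srs):
    """Build a gdal_translate call that attaches GCPs and a gdalwarp
    call that warps with a thin-plate spline (-tps)."""
    translate = ["gdal_translate", "-of", "GTiff", "-a_srs", srs]
    for (pixel, line), (easting, northing) in zip(pixel_corners, map_corners):
        translate += ["-gcp", str(pixel), str(line), str(easting), str(northing)]
    translate += [scan, tmp]
    warp = ["gdalwarp", "-tps", "-r", "near", "-t_srs", srs, tmp, out]
    return translate, warp


if __name__ == "__main__":
    # Hypothetical clicks (in no particular order) and bounding box.
    clicks = [(9800, 150), (120, 95), (180, 12400), (9870, 12350)]
    corners = order_corners(clicks)
    targets = bbox_corners(500000, 300000, 510000, 315000)
    translate, warp = gdal_commands(
        "quad_scan.tif", "quad_gcps.tif", "quad_warped.tif",
        corners, targets, "EPSG:3078",  # placeholder CRS
    )
    print(shlex.join(translate))
    print(shlex.join(warp))
    # To run them: subprocess.run(translate, check=True), then the same for warp.
```

The sum/difference rule for ordering corners only holds when the scan is roughly upright; a sheet scanned at close to 45 degrees would need a different rule. Keeping the commands as argument lists rather than one pasted string avoids quoting problems with file names when they are eventually handed to `subprocess.run`.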

I would have preferred to do things natively in R, but this does provide a reproducible workflow (as long as you’ve got R and GDAL), and it has the added benefit that we can actually revisit the transformations as we get more ground control points. This is really great because, in getting the primary data off the mylar sheets, we locate each section and quarter-section point on the map, and these have fixed ground positions. Given that, our georeferencing and rubbersheeting should get progressively better as we assimilate the data.
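Revisiting the transformation as control points accumulate mostly comes down to keeping every point pair for a quad in one table and regenerating the `-gcp` list from it. Here is a minimal Python sketch, assuming a hypothetical long-format CSV (columns `quad`, `pixel`, `line`, `easting`, `northing`, `source`) with made-up quad names and coordinates; the project's own table, as described above, stores corner vectors per quad instead.

```python
import csv
import os
import tempfile

# Hypothetical column layout: one row per control-point pair.
FIELDS = ["quad", "pixel", "line", "easting", "northing", "source"]


def append_gcps(path, quad, pairs, source):
    """Append (pixel, line) -> (easting, northing) pairs for one quad,
    writing the header only when the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for (pixel, line), (easting, northing) in pairs:
            writer.writerow({"quad": quad, "pixel": pixel, "line": line,
                             "easting": easting, "northing": northing,
                             "source": source})


def load_gcps(path, quad):
    """Return every stored pair for one quad, oldest first, as floats."""
    with open(path, newline="") as f:
        return [((float(r["pixel"]), float(r["line"])),
                 (float(r["easting"]), float(r["northing"])))
                for r in csv.DictReader(f) if r["quad"] == quad]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "gcps.csv")
        # First pass: the four clicked corners (only one shown here).
        append_gcps(path, "quad_A", [((120, 95), (500000, 315000))], "corner")
        # Later: a section point located while transcribing the sheet.
        append_gcps(path, "quad_A", [((5000, 6000), (505000, 307500))], "section")
        # Re-warping would rebuild the -gcp arguments from all stored pairs.
        print(load_gcps(path, "quad_A"))
```

Because a thin-plate spline passes exactly through each control point, every section or quarter-section point added this way pins the warp down in the interior of the sheet, not just at its corners.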

If anyone knows of a more reasonable solution, let me know. I do see some clear benefits to this method:

- There is a record of all the control-point pairs for each raster.
- It is reproducible.
- It uses only open-source software.
- It helps teach the REU student how to use R.

Suggestions? I’m all ears.

What I’m listening to: Rheostatics – The Wreck of the Edmund Fitzgerald.

Happy belated Canada Day!