In my previous post I’ve showed some new MySQL 5.6 features which can be very helpful when creating geo-enabled applications. In this post I will show how we can obtain open-source GIS data, convert it to MySQL and use it in our GEO-enabled applications. I will also present at the upcoming Percona Live conference on this topic.

Data sources (US)

For the U.S. we may look at 2 major data sources:

1. ZIP codes with latitude, longitude and zip code boundaries (polygon). This can be downloaded from the U.S. Census website: US Zipcodes direct link

2. Point of interests, roads, boundaries, etc. The Openstreatmap website provides an excellent source of the GIS data. North American data can be downloaded here (updates frequently)

Data formats and conversion

U.S. Census data is stored in Shapefile (.shp, .shx, .dbf) format. Openstreetmap uses its own XML format (OSM) and/or Protocolbuffer Binary Format. We can convert this to MySQL with GDAL server (on Linux) and ogr2ogr utility. To convert Shapefile any version of GDAL will work, however, for OSM/PBF we will need to use v. 1.10. The easiest way to get the GDAL 1.10 is to use Ubuntu + ubuntugis-unstable repo.

Here are the commands I’ve used to install:

apt-add-repository ppa:ubuntugis/ubuntugis-unstable apt-get update apt-get install gdal-bin 1 2 3 apt - add - repository ppa : ubuntugis / ubuntugis - unstable apt - get update apt - get install gdal - bin

This will install gdal server. Make sure it is latest version and support OSM format:

ogr2ogr --version GDAL 1.10.1, released 2013/08/26 ogrinfo --formats|grep OSM -> "OSM" (readonly) 1 2 3 4 ogr2ogr -- version GDAL 1.10.1 , released 2013 / 08 / 26 ogrinfo -- formats | grep OSM -> "OSM" ( readonly )

Now we can convert it to MySQL. First, make sure MySQL has the default storage engine = MyISAM (yes, GDAL will use MyISAM to be able to add a spatial index) and the max_allowed_packet is large enough:

mysql -e "set global max_allowed_packet = 16777216*10; set global default_storage_engine = MyISAM; " 1 mysql - e "set global max_allowed_packet = 16777216*10; set global default_storage_engine = MyISAM; "

ZIP codes and boundaries conversion

Now we can start conversion:

# ogr2ogr -overwrite -progress -f "MySQL" MYSQL:zcta,user=root tl_2013_us_zcta510.shp 0...10...20...30...40...50...60...70...80...90...100 - done. 1 2 # ogr2ogr -overwrite -progress -f "MySQL" MYSQL:zcta,user=root tl_2013_us_zcta510.shp 0...10...20...30...40...50...60...70...80...90...100 - done .

The only thing we need to specify is db name and user name (assuming it will write to the localhost, otherwise specify the MySQL host). ogr2org will create all needed tables.

mysql> use zcta Database changed mysql> show tables; +--------------------+ | Tables_in_zcta | +--------------------+ | geometry_columns | | spatial_ref_sys | | tl_2013_us_zcta510 | +--------------------+ 3 rows in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 mysql > use zcta Database changed mysql > show tables ; +--------------------+ | Tables_in_zcta | +--------------------+ | geometry_columns | | spatial_ref_sys | | tl_2013_us_zcta510 | +--------------------+ 3 rows in set (0.00 sec)

The geometry_columns and spatial_ref_sys are the reference tables only. All zip codes and boundaries will be stored in tl_2013_us_zcta510 table:

mysql> show create table tl_2013_us_zcta510G *************************** 1. row *************************** Table: tl_2013_us_zcta510 Create Table: CREATE TABLE `tl_2013_us_zcta510` ( `OGR_FID` int(11) NOT NULL AUTO_INCREMENT, `SHAPE` geometry NOT NULL, `zcta5ce10` varchar(5) DEFAULT NULL, `geoid10` varchar(5) DEFAULT NULL, `classfp10` varchar(2) DEFAULT NULL, `mtfcc10` varchar(5) DEFAULT NULL, `funcstat10` varchar(1) DEFAULT NULL, `aland10` double DEFAULT NULL, `awater10` double DEFAULT NULL, `intptlat10` varchar(11) DEFAULT NULL, `intptlon10` varchar(12) DEFAULT NULL, UNIQUE KEY `OGR_FID` (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`) ) ENGINE=MyISAM AUTO_INCREMENT=33145 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 mysql > show create table tl_2013_us_zcta510G *************************** 1. row *************************** Table : tl_2013_us_zcta510 Create Table : CREATE TABLE `tl_2013_us_zcta510` ( `OGR_FID` int (11) NOT NULL AUTO_INCREMENT , `SHAPE` geometry NOT NULL , `zcta5ce10` varchar (5) DEFAULT NULL , `geoid10` varchar (5) DEFAULT NULL , `classfp10` varchar (2) DEFAULT NULL , `mtfcc10` varchar (5) DEFAULT NULL , `funcstat10` varchar (1) DEFAULT NULL , `aland10` double DEFAULT NULL , `awater10` double DEFAULT NULL , `intptlat10` varchar (11) DEFAULT NULL , `intptlon10` varchar (12) DEFAULT NULL , UNIQUE KEY `OGR_FID` (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`) ) ENGINE = MyISAM AUTO_INCREMENT = 33145 DEFAULT CHARSET = latin1 1 row in set (0.00 sec)

Example 1. Selecting zip code boundaries for a given zipcode (Durham, NC):

mysql> select astext(shape) from zcta.tl_2013_us_zcta510 where zcta5ce10='27701'G *************************** 1. row *************************** astext(shape): POLYGON((-78.902351 35.988107,-78.902436 35.988116,-78.902597 35.98814,-78.902725 35.988147,-78.902992 35.988143,-78.903117 35.988129,... -78.902351 35.988107)) 1 2 3 mysql > select astext (shape) from zcta.tl_2013_us_zcta510 where zcta5ce10= '27701' G *************************** 1. row *************************** astext (shape): POLYGON ((-78.902351 35.988107,-78.902436 35.988116,-78.902597 35.98814,-78.902725 35.988147,-78.902992 35.988143,-78.903117 35.988129,... -78.902351 35.988107))

Example 2. Find ZIP code for the given point (Lat, Lon): Percona HQ in Durham, NC

mysql> SELECT zcta5ce10 as ZIP FROM tl_2013_us_zcta510 WHERE st_contains(shape, POINT(-78.90423, 36.004122)); +-------+ | ZIP | +-------+ | 27701 | +-------+ 1 row in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 mysql > SELECT zcta5ce10 as ZIP FROM tl_2013_us_zcta510 WHERE st_contains(shape, POINT (-78.90423, 36.004122)); +-------+ | ZIP | +-------+ | 27701 | +-------+ 1 row in set (0.00 sec)

Converting OpenStreetMap (OSM) data

Converting OSM is the same:

ogr2ogr -overwrite -progress -f "MySQL" MYSQL:osm,user=root north-america-latest.osm.pbf 1 ogr2ogr - overwrite - progress - f "MySQL" MYSQL : osm , user = root north - america - latest .osm .pbf

Please note, that it will take a long time to convert (8-12+ hours, depends upon the hardware).

Tables:

mysql> use osm Database changed mysql> show tables; +------------------+ | Tables_in_osm | +------------------+ | geometry_columns | | lines | | multilinestrings | | multipolygons | | other_relations | | points | | spatial_ref_sys | +------------------+ 7 rows in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 mysql > use osm Database changed mysql > show tables ; +------------------+ | Tables_in_osm | +------------------+ | geometry_columns | | lines | | multilinestrings | | multipolygons | | other_relations | | points | | spatial_ref_sys | +------------------+ 7 rows in set (0.00 sec)

Points of interest are stored in “points” table. “Lines” and “multilinestrings” tables contain streets, hiking trails, bike paths, etc:

mysql> show create table pointsG *************************** 1. row *************************** Table: points Create Table: CREATE TABLE `points` ( `OGR_FID` int(11) NOT NULL AUTO_INCREMENT, `SHAPE` geometry NOT NULL, `osm_id` text, `name` text, `barrier` text, `highway` text, `ref` text, `address` text, `is_in` text, `place` text, `man_made` text, `other_tags` text, UNIQUE KEY `OGR_FID` (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`) ) ENGINE=MyISAM AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 mysql > show create table pointsG *************************** 1. row *************************** Table : points Create Table : CREATE TABLE `points` ( `OGR_FID` int (11) NOT NULL AUTO_INCREMENT , `SHAPE` geometry NOT NULL , `osm_id` text , `name` text , `barrier` text , `highway` text , `ref` text , `address` text , `is_in` text , `place` text , `man_made` text , `other_tags` text , UNIQUE KEY `OGR_FID` (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`) ) ENGINE = MyISAM AUTO_INCREMENT = 13660668 DEFAULT CHARSET = latin1 1 row in set (0.00 sec)

“Shape” is the point (in spatial format) and other_tags will contain some additional format (in JSON format), this is how ogr2ogr converts it by default. See the GDAL documentation on the OSM driver for more information.

OSM data may contain the zip code, but this is not guaranteed. Here is the example how we can find all coffee shops in ZIP code 27701:

mysql> select shape into @shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701'; Query OK, 1 row affected (0.00 sec) mysql> SELECT name, st_distance(shape, centroid(@shape) ) as dist FROM points WHERE st_within(shape, @shape) and other_tags like '%"amenity"=>"cafe"%' limit 10; +--------------------+----------------------+ | name | dist | +--------------------+----------------------+ | Blue Coffee Cafe | 0.00473103443182092 | | Amelia Cafe | 0.013825134250907745 | | Serrano's Delicafe | 0.013472792849827055 | | Blend | 0.009123578862847042 | +--------------------+----------------------+ 4 rows in set (0.09 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 mysql > select shape into @shape from zcta.tl_2013_us_zcta510 where zcta5ce10= '27701' ; Query OK, 1 row affected (0.00 sec) mysql > SELECT name, st_distance(shape, centroid (@shape) ) as dist FROM points WHERE st_within(shape, @shape) and other_tags like '%"amenity"=>"cafe"%' limit 10; +--------------------+----------------------+ | name | dist | +--------------------+----------------------+ | Blue Coffee Cafe | 0.00473103443182092 | | Amelia Cafe | 0.013825134250907745 | | Serrano's Delicafe | 0.013472792849827055 | | Blend | 0.009123578862847042 | +--------------------+----------------------+ 4 rows in set (0.09 sec)

First, I have selected the ZIP code boundaries into MySQL variable (I could have used subquery, in MySQL 5.6 the performance will be very similar; this is a little bit outside of the current blog post topic, so I will not compare the 2 methods here).

Second I’ve used this variable to find all point which will fit into our boundaries and filter by “amenity”=>”cafe”. I have to use like ‘%..%’ here, but I’m relying on the spatial index here. Explain plan:

mysql> explain SELECT name, st_distance(shape, centroid(@shape) ) as dist FROM osm.points WHERE st_within(shape, @shape) and other_tags like '%"amenity"=>"cafe"%' limit 10G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: points type: range possible_keys: SHAPE key: SHAPE key_len: 34 ref: NULL rows: 10 Extra: Using where 1 2 3 4 5 6 7 8 9 10 11 12 mysql > explain SELECT name, st_distance(shape, centroid (@shape) ) as dist FROM osm.points WHERE st_within(shape, @shape) and other_tags like '%"amenity"=>"cafe"%' limit 10G *************************** 1. row *************************** id: 1 select_type: SIMPLE table : points type : range possible_keys: SHAPE key : SHAPE key_len: 34 ref: NULL rows : 10 Extra: Using where

Conclusion

Using open source spatial data is a great way to enrich your application and add new features. You can store this data in MySQL so the application will be able to perform a join to the existing data. For example, if you store ZIP code for a user you can use OpenStreetMap data to show the appropriate content for this user. I will also provide more examples in my upcoming Talk @ Percona Live 2014 as well as share it in this blog in a future post.

I’ve also created a Public Amazon AMI: GIS-MySQL-Ubuntu – ami-ddfdf5b4. The AMI has the ZIP code and OSM data in MySQL 5.6 as well as the GDAL server installed (under /data, mounted on EBS). Please feel free to give it a try. As always I appreciate any comments/questions/thoughts/etc.