A tutorial on using geospatial analysis, shapefiles and datasets from the U.S. Geological Survey, Census, and Department of Education to visualize the impact of Oklahoma's earthquakes and explore possible investigative projects.

This CartoDB map shows the location of Oklahoma schools that are within areas in which USGS instruments measured a Modified Mercalli Intensity of at least 4.5 from the most powerful of Oklahoma's recent surge of earthquakes.

The school locations are color-coded relative to the percentage of their students eligible for free or reduced lunch (a proxy for estimating the poverty level among students). The U.S. Census tracts are color-coded relative to the median household income, as measured by the American Community Survey, Table B19013.

This visualization is the result of an exploratory exercise and so it comes with a lot of caveats that are explained below. However, it's meant to be an example of both the visualization and investigative possibilities when using geospatial queries. This tutorial assumes you're already well-versed in basic SQL (particularly joins), and have also perused a few of my past CartoDB tutorials, as well as CartoDB's excellent resources.


Investigating earthquakes and schools

In 2011, the Center for Investigative Reporting found "systemic failures" in how California officials safeguard schools from earthquakes. CIR's in-depth project, which was a Pulitzer finalist, spurred statewide policy changes by uncovering how lax oversight allowed "children and teachers to occupy buildings with structural flaws and potential safety hazards reported during construction."

If such neglect can happen in a state like California, with its long and well-known seismic history, what can we expect from other states? Well, neglect is not quite the issue: those states – not having many earthquakes to protect against – generally don't have expansive regulations to be neglectful of.

Oklahoma's new normal

But what if a previously seismically-safe state were to suddenly find itself to not be safe, and thus without the decades of lessons and laws that California has developed?

This is the real-life scenario for Oklahoma. Before 2010, Oklahoma had just a handful of earthquakes in its recorded history. Now it experiences more 3.0+ magnitude earthquakes than the other 47 contiguous states – including California – combined:

(the above chart was created with R and ggplot, and I'll someday finish writing the walkthrough on that…)

So far, the Oklahoman earthquakes are blips compared to what California buildings are designed to withstand. But they are enough to cause significant damage to Oklahoma's buildings. This past July, a relatively small – for California, anyway – 4.5 magnitude earthquake caused enough damage to a high school to require officials to file an earthquake insurance claim, the first time ever for a public school in Oklahoma's history:

The 4.5 earthquake left a large crack that now runs along the gym wall and on the opposite side of the wall a large art room, the counseling office and the girls fitness center also have visible damage.

"We've sealed the building off evacuated the common wall classrooms and the kids are safe right now," Superintendent Hart said.

The Oklahoma earthquake surge appears to be the new normal, at least for the near future. But the vast majority of Oklahoma's present construction hasn't prepared for this. Much of the kind of deep reporting that CIR did for its "On Shaky Ground" project – including reviewing documentation of enforcement actions and construction funds – isn't relevant in a state without that regulatory framework.

But this also means that our initial investigative heuristics can be a little more blunt: if a school built to Oklahoma's past standards suffered damage from a M4.5 earthquake, then we might hypothesize that all other schools may be at risk from earthquakes of similar magnitude.

We don't have the scientific expertise to predict where Oklahoma's next big earthquakes will happen. They don't appear to fall along Oklahoma's known historical fault lines, and their cause has been linked to wastewater disposal activity from drilling. The U.S. Geological Survey announced earlier this year that it would release a new hazard map.

So let's start with the major earthquakes that have already occurred in Oklahoma, then find the schools situated in the zones of seismic activity that could have potentially resulted in damage, according to the USGS. We do this with the understanding that past seismic activity may not directly correlate with future activity when it comes to induced earthquakes. But we can update the map as new information comes in.

And to further refine our potential investigative leads, let's apply another broad, if simplistic, filter: poorer schools may have poorer construction and maintenance, as has been observed elsewhere in America's public school system.

Technical walkthrough

This next section describes where to get the relevant data, how to filter and join it logically and spatially, and how to visualize the results using CartoDB. Again, refer to previous tutorials for more detailed explanations of the SQL and CartoDB functionality.

The boundaries of Oklahoma's Census tracts

Let's first add Oklahoma's Census tract boundary data so that we get a loose sense of population distribution and, for starters, where exactly Oklahoma's boundaries are, as CartoDB's map tiles don't highlight state boundaries:

Downloading the tract boundary shapefiles

The U.S. Census website has a section for Cartographic Boundary Shapefiles.

The subpage, Cartographic Boundary Shapefiles - Census Tracts, contains a listing of zip files of tract data for each individual state: Here's Oklahoma (mirror here).

Mapping the ok_tracts dataset

Uploading the shapefile (as a zip file) as a dataset named ok_tracts generates a map that looks like this:

The American Community Survey Median Household Income data

Tract boundaries by themselves are boring, so let's add a layer of census data. Since relative income/poverty levels are relevant to my initial investigative inquiry, I've chosen table B19013, which is the American Community Survey's estimate of Median Household Income In the Past 12 Months.

Downloading ACS data from the American FactFinder

Go to the American FactFinder and choose the Advanced search option.

Then choose these filters:

Geographies > Census Tract > Oklahoma > All Census Tracts within Oklahoma

Topics > People > Income & Earnings > Income/Earnings (Households)



Then look for the row with the entry:

B19013 - MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) 2014 ACS 5-year estimates

Here's the detailed view of that data:

Here's a mirror of the file I downloaded: ACS_14_5YR_B19013.zip

Unzipping, and opening ACS_14_5YR_B19013_with_ann.csv:

I edited the ACS_14_5YR_B19013_with_ann.csv file to remove the second line, which is just an annotation, and uploaded it to CartoDB as a table named ok_median_household_incomes. You can download the CSV here: ok_median_household_incomes.csv

Joining the median income data to the Census tract boundaries to make a choropleth

Remember that the median household incomes dataset contains no geospatial shape or coordinate data itself. But it contains a column – geo_id – that can be joined with ok_tracts.affgeoid:

SELECT
  ok_median_household_incomes.hd01_vd01 AS median_household_income,
  ok_median_household_incomes.geo_display_label,
  ok_tracts.cartodb_id,
  ok_tracts.the_geom_webmercator
FROM ok_median_household_incomes
INNER JOIN ok_tracts
  ON ok_tracts.affgeoid = ok_median_household_incomes.geo_id;

Viewing the data as a choropleth map; the darker green regions represent higher-income areas:

At this point, we can hit the Visualize (or Publish) button to turn this into what CartoDB considers to be an actual map – which allows us to add additional data layers. This is a purely CartoDB convention, so refer to their tutorials for more information.

The USGS Earthquake Archive Data

Now let's get the locations and magnitudes for Oklahoma's earthquakes. This data is exhaustively documented by the U.S. Geological Survey's Earthquake Hazards Program. Here's a map of M4.5+ earthquakes around Oklahoma.

Getting the earthquake data from USGS

The USGS provides a handy web form for querying its archive of earthquake data and fetching records as CSV. Each query is limited to 20,000 results, so we have to make use of the various filters. It's not worth limiting by date, as Oklahoma's pre-2010 seismic history is basically non-existent. I've opted to limit the search to earthquakes of at least magnitude 2.0, and – using the handy USGS map-picker tool – in a rectangular region that encompasses Oklahoma and a bit of space around its borders:
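That query form is backed by the USGS FDSN event web service, which can also be scripted. Below is a minimal Python sketch that builds such a CSV query URL. The parameter names follow the USGS API documentation, but the bounding-box coordinates are only my rough approximation of a rectangle around Oklahoma, not the exact one used for this tutorial:

```python
# Sketch: build a USGS earthquake-archive query URL (FDSN Event API) for
# M2.0+ quakes in a rough bounding box around Oklahoma.
from urllib.parse import urlencode

def usgs_query_url(min_mag=2.0, bbox=(-103.0, 33.6, -94.4, 37.1)):
    """Return a CSV query URL for the USGS FDSN event service.

    bbox is (min_lon, min_lat, max_lon, max_lat) -- approximate values here.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "format": "csv",
        "minmagnitude": min_mag,
        "minlongitude": min_lon,
        "minlatitude": min_lat,
        "maxlongitude": max_lon,
        "maxlatitude": max_lat,
        "orderby": "time",
    }
    return "https://earthquake.usgs.gov/fdsnws/event/1/query?" + urlencode(params)

print(usgs_query_url())
```

Fetching that URL (with curl, wget, or urllib) returns the same CSV the web form produces, subject to the same 20,000-row limit.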

You can use the exact same queries I used to get the CSV data from the USGS site. Or you could download my mirror here: ok_earthquakes.csv

Mapping all the earthquakes as a layer

My loose query of the USGS archive resulted in 6,400+ earthquakes, most of which are so light as to not disturb a sleeping cat, and a few of which aren't even inside Oklahoma's borders (not that tremors obey political lines, but that's an epistemological issue that we'll ignore for now):

More to the point: there are just too many dots for this visualization to make any sense, so we need to do some filtering.

Filtering earthquakes by magnitude

There are several variables that determine what makes an earthquake "significant." While earthquakes of M3.0+ sometimes get an online mention in the Los Angeles Times' earthquake coverage, they generally don't cause any damage. Also, the actual shaking intensity of an earthquake is not dictated solely by magnitude, but also by the local geological composition, among other factors.

So let's keep things simple: what is the weakest earthquake that caused newsworthy damage in Oklahoma? Last week (Nov. 30, 2015), a M4.7 quake is suspected to have caused some bricks to fall off the tallest building in Enid. A few years after Oklahoma's biggest recorded quake – a M5.6 in November 2011 – a small college is still cleaning up the mess. A M4.7 that struck nearby days later caused ceilings to fall down in a Meeker school. And there's that previously discussed M4.5 in July 2015 that cracked Crescent High's gymnasium.

So let's look at only M4.5+ earthquakes:

SELECT * FROM ok_earthquakes WHERE mag >= 4.5;

So from 2008 through November 2015, our dataset includes 8 such earthquakes – all of them in 2011 or after, and 3 of them in 2015 alone. Here's what the result looks like on a map:

Filtering for Oklahoma-only quakes via a spatial join

Because of the sloppy selection job I did on the USGS Earthquake Archive form, one of these earthquakes actually took place in Kansas. Using the PostGIS ST_WITHIN function as a constraint, we can use the geospatial boundaries in the ok_tracts dataset to filter out non-Oklahoman earthquakes. Note that we keep the WHERE clause to continue filtering for M4.5+ earthquakes:

SELECT ok_earthquakes.*
FROM ok_earthquakes
INNER JOIN ok_tracts
  ON ST_WITHIN(ok_earthquakes.the_geom, ok_tracts.the_geom)
WHERE mag >= 4.5;

Begone Kansas quake:

Representing earthquakes with a 4-mile buffer

So how close does a building have to be to an earthquake's epicenter to risk getting damaged? The answer is: it's complicated – certainly more complicated than imagining earthquakes to be a simple point on a 2-dimensional plane like our map.

But let's keep things simple for now: Crescent High is a little less than 4 miles from the epicenter of the M4.5 earthquake that damaged it.

We can use the PostGIS ST_BUFFER function to extrapolate a circle around each earthquake, effectively turning our point data into polygon data, i.e. circles.

The CartoDB tutorials explain it better, but using the previous SQL snippet, we specify the selection of useful columns from ok_earthquakes (such as time, magnitude, and place).

And then we calculate a new value for the_geom_webmercator – which is the hidden field that, by convention, CartoDB uses to project geometric data onto its web maps (here's a video tutorial/explanation).

This derived calculation involves several functions and conversions, so here are the steps:

1. Convert ok_earthquakes.the_geom (which is basically just a latitude/longitude point) into a geography data type.
2. Call the ST_BUFFER function on that the_geom-as-geography value and pass in 1609 * 4, which is the conversion of 4 miles to meters (the distance unit used by PostGIS). This ST_BUFFER call is what effectively turns each point into a geometric circle.
3. Convert the result of ST_BUFFER back into a geometry datatype, because, well, see the next step:
4. Use the ST_TRANSFORM function to project our newly-created circles into the map projection used by CartoDB, i.e. EPSG:3857, aka "Web Mercator".
5. Alias the result as the_geom_webmercator so CartoDB knows to project it onto our web map.
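For intuition, here's what the ST_BUFFER step is doing, as a toy pure-Python sketch that approximates a 4-mile circle around a lon/lat point. PostGIS does this properly on a spheroid; this is a crude flat-earth approximation, and the sample coordinates near Crescent are my own rough values:

```python
# Toy sketch of buffering a lon/lat point into a circle polygon, mimicking
# ST_BUFFER(the_geom::geography, 1609 * 4). Not spheroid-accurate.
import math

METERS_PER_MILE = 1609
EARTH_RADIUS_M = 6371000.0

def buffer_point(lon, lat, radius_m=METERS_PER_MILE * 4, segments=32):
    """Return a list of (lon, lat) vertices approximating a circle."""
    lat_deg_per_m = 360.0 / (2 * math.pi * EARTH_RADIUS_M)
    # longitude degrees shrink with latitude, so scale accordingly
    lon_deg_per_m = lat_deg_per_m / math.cos(math.radians(lat))
    ring = []
    for i in range(segments + 1):  # the +1 closes the ring, polygon-style
        theta = 2 * math.pi * i / segments
        ring.append((lon + radius_m * math.cos(theta) * lon_deg_per_m,
                     lat + radius_m * math.sin(theta) * lat_deg_per_m))
    return ring

# e.g. a 4-mile circle around a point near Crescent, OK (approx coordinates)
circle = buffer_point(-97.6, 35.95)
```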

For more information, CartoDB has a tutorial on Drawing a circle from a point and radius, and a more thorough tutorial involving buffers. Check them out, and then see if the following query makes sense, because it works for me:

SELECT id, depth, mag, place, time, ok_earthquakes.cartodb_id,
  ST_TRANSFORM(
    ST_BUFFER(ok_earthquakes.the_geom::geography, 1609 * 4)::geometry,
    3857
  ) AS the_geom_webmercator
FROM ok_earthquakes
INNER JOIN ok_tracts
  ON ST_WITHIN(ok_earthquakes.the_geom, ok_tracts.the_geom)
WHERE mag >= 4.5;

And here are the earthquakes, rendered as 4-mile-radius circles:

At this point, I've taken advantage of CartoDB's option to create a new dataset from my now-convoluted query that filters and joins ok_earthquakes.

This new dataset – which, again, is just the result of the previous SQL query – I have named ok_earthquakes_radii. Its schema includes a few fields from ok_earthquakes (e.g. time, place, id and mag). When the "create a new dataset from query" feature is used, CartoDB will helpfully create and set the cartodb_id and the_geom columns as needed, i.e. the_geom contains the polygon data that draws those 4-mile circles.

Locations of Oklahoma's public schools

What we have so far is not bad – we see where Oklahoma's past major earthquakes have occurred in relation to the state's population and wealth centers. Adding school location data will give us specific points of interest.

Download school data from the National Center for Education Statistics

The National Center for Education Statistics publishes the Public Elementary/Secondary School Universe Survey Data, which includes the names, addresses, and demographics of every school in the United States.

As you can imagine, this is a lot of rows and columns. The documentation of its fields is here. A plaintext description of the record layout is here.

The data file for 2013-2014 is a 14.1MB zipped (107MB unzipped) tab-delimited file.

Trimming the schools data

Ideally we could just upload the tab-delimited file straight into CartoDB; it might work for you, but for me, it did not. So I imported the file into my own SQL database, ran the following query, and saved the results as a new CSV file, which I then uploaded to CartoDB:

SELECT SURVYEAR, NCESSCH, LEANM, SCHNAM, LCITY, LSTATE, LZIP,
  TYPE, STATUS, GSLO, GSHI, LEVEL, TITLEISTAT, TITLEI, CHARTR,
  VIRTUALSTAT, FRELCH, REDLCH, TOTFRL, MEMBER,
  LATCOD AS latitude, LONCOD AS longitude
FROM schools_directory
WHERE LSTATE = 'OK';

The filtered file can be found here: ok_schools.csv

Pro-tip: csvkit is awesome

As a quick aside, I used the awesome csvsql command-line tool (which is part of the csvkit toolkit) for this. Spinning up a database just to clean up and export some data can cause serious anger issues, so learning about csvkit (and command-line operations in general) is highly recommended.

For reference sake, here's the command I used to get the trimmed ok_schools.csv:

csvsql --no-inference --snifflimit 0 --no-constraints --tabs \
  --query "SELECT SURVYEAR,NCESSCH,LEANM,SCHNAM,LCITY,LSTATE,LZIP,TYPE,STATUS,
    GSLO,GSHI,LEVEL,TITLEISTAT,TITLEI,CHARTR,VIRTUALSTAT,
    FRELCH,REDLCH,TOTFRL,MEMBER,
    LATCOD AS latitude, LONCOD AS longitude
    FROM sc131a_supp WHERE LSTATE = 'OK';" \
  sc131a_supp.txt > ok_schools.csv

Mapping all of Oklahoma's schools

There are 1,800+ schools in our list, though some of them are on top of each other – for instance, Crescent High is in the same building as Crescent Middle and Crescent Elementary:

Visualizing Oklahoma's schools as a choropleth of free/reduced lunch ratio

Let's add a little visual variety to our school dots. In the introduction, I hypothesize that while all schools and children are important, schools that serve poorer populations might be more affected by earthquakes because, well, poorer schools tend to get the short end in other areas of educational resources – though ideally, every public school has the same access to public funds for construction and facility maintenance. (So, we're being a bit cynical here.)

A common proxy for estimating student poverty is to look at the number of students eligible for the federally-funded free/reduced lunch program, which is a number that the federal education department conveniently tracks.

Calculating the percentage of students at a given school who are eligible for free or reduced lunch is a simple formula involving two columns: MEMBER is the school's total student population, and TOTFRL is the number of students eligible for the program:

ROUND(TOTFRL * 100 / MEMBER)
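As a Python sketch, with two caveats the one-liner glosses over: a school reporting a MEMBER count of zero would cause a division error, and depending on the column types, SQL may truncate the integer division before ROUND is even applied:

```python
# Sketch of the free/reduced-lunch ratio calculation, with a guard for
# schools that report a zero or missing MEMBER count.
def frlp_ratio(totfrl, member):
    """Percentage of students eligible for free/reduced lunch, or None."""
    if not member or member <= 0:
        return None  # avoid dividing by zero on bad rows
    return round(totfrl * 100 / member)

frlp_ratio(300, 400)  # 75
```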

Creating a column for frlp_ratio

To keep things cleaner in subsequent queries, I'm going to add a new column named frlp_ratio to the ok_schools table:

ALTER TABLE ok_schools ADD COLUMN frlp_ratio INTEGER;

To populate that column with values:

UPDATE ok_schools SET frlp_ratio = ROUND(TOTFRL * 100 / MEMBER);

The resulting choropleth:

Filtering schools by proximity to earthquakes

OK, now we have to deal with the fact that our map devotes far too much ink to schools that are, for the time being, not near the locations of previous major earthquakes. So let's reduce the list of schools to just the ones that are located within the geometry of the 4-mile circles we constructed around each earthquake's epicenter, i.e. the ok_earthquakes_radii dataset.

This is a matter of doing a geospatial join using the ST_WITHIN function, which we used previously to constrain earthquakes to being within Oklahoman boundaries:
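For intuition about what ST_WITHIN is doing with each school point, here's a toy pure-Python point-in-polygon test using ray casting on flat coordinates. PostGIS's version is far more robust (real geometries, holes, edge cases); this just illustrates the core idea:

```python
# Minimal ray-casting point-in-polygon test: count how many polygon edges a
# horizontal ray from the point crosses; an odd count means "inside".
def point_in_polygon(x, y, ring):
    """ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
point_in_polygon(2, 2, square)   # True
point_in_polygon(5, 2, square)   # False
```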

SELECT ok_schools.*
FROM ok_schools
INNER JOIN ok_earthquakes_radii
  ON ST_WITHIN(ok_schools.the_geom, ok_earthquakes_radii.the_geom);

That cleaned things up:

And here's a closeup with Crescent High School highlighted:

Our list of impacted schools contains only 6 schools in 3 separate locations. It's worth noting that our 4-mile criterion leaves out Meeker High, which was damaged in the M4.7 of November 2011:

OK, so let's try a geospatial analysis that treats earthquakes as being a little more complicated than circles on a map.

The USGS Shakemaps

The USGS ShakeMap program is described as an online reference that provides:

near-real-time maps of ground motion and shaking intensity following significant earthquakes. These maps are used by federal, state, and local organizations, both public and private, for post-earthquake response and recovery, public and scientific information, as well as for preparedness exercises and disaster planning.

Each "significant" earthquake has its own shakemap. For instance, here is the Shakemap for the M5.6 that hit Oklahoma – note how the URL is derived from the earthquake's id column, in this case, b0006klz :

earthquake.usgs.gov/earthquakes/shakemap/global/shake/b0006klz/

The Modified Mercalli Intensity Scale

Each shakemap page contains a trove of data and calculations, including intensity maps:

That looks more complicated and interesting than a simple circle. Intensity is described by the USGS as this:

As an effort to simplify and maximize the flow of information to the public, we have developed a means of generating estimated Modified Mercalli Intensity maps based on instrumental ground motion recordings (Wald et al., 1999). These "Instrumental Intensities" are based on a combined regression of peak acceleration and velocity amplitudes vs. observed intensity for eight significant California earthquakes From the comparison with observed intensity maps, we find that a regression based on peak velocity for intensity > VII and on peak acceleration for intensity < VII is most suitable. This is consistent with the notion that low intensities are determined by felt accounts (sensitive to acceleration). Moderate damage, at intensity VI-VII, typically occurs in rigid structures (masonry walls, chimneys, etc.) which also are sensitive to high-frequency (acceleration) ground motions…

OK, that's more detail than I can confidently interpret or meaningfully use. But the main gist of the Modified Mercalli Intensity (MMI) Scale seems to be summarized in this table:

A more fun and descriptive table of MMI can be found here:

Basically, anything under an MMI value of 5.0 is barely worth tracking. At 5.0, the potential damage is described as "Light": such as pictures being moved and pendulum clocks failing to swing their pendulums.

Downloading the MMI shapefiles for each earthquake

Each ShakeMap page contains various data downloads. The ones we want are labeled "General-purpose GIS shape files", which you kind of have to dig around to find on each page. Here's the one for the Nov. 30, 2015 M4.7 quake:

Maybe there's a more efficient way of getting this data. Because there are so few quakes we care about right now, I ended up just visiting each shakemap and downloading the zip file directly. Actually, for the 3 earthquakes that are geolocated next to each other, I just got one of their data files. I'm sure it's easy to write a script to automate this, but for your convenience, here's a mirror of those zips:

But we can't just upload all of those into CartoDB, because they each contain separate shapefiles. The only ones we need are the shapefiles that are prefixed with mi:
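Winnowing each downloaded zip down to its mi-prefixed shapefiles is easy to script. Here's a minimal Python sketch; the function name and file layout are my own illustrative choices:

```python
# Sketch: from a ShakeMap shapefile zip, extract only the members whose
# names start with "mi" (the Modified Mercalli Intensity shapefiles).
import zipfile

def extract_mi_files(zip_path, dest_dir="."):
    """Extract mi-prefixed members of a ShakeMap zip; return their names."""
    with zipfile.ZipFile(zip_path) as zf:
        mi_members = [n for n in zf.namelist()
                      if n.rsplit("/", 1)[-1].startswith("mi")]
        zf.extractall(dest_dir, members=mi_members)
        return mi_members
```

Run this over each downloaded zip and you're left with just the intensity shapefiles to re-zip and upload.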

Suffice to say, doing this at scale is going to require some kind of automated script. For your convenience, a zip file that contains only the MMI data for each earthquake:

/files/data/cartodb/ok-schoolquakes/usgs-intensity-maps-ok.zip

If you upload that to CartoDB, you will get 5 separate tables (one for each earthquake). You can join them into a single table – which I refer to as ok_shakemaps – with this SQL:

SELECT 'usp000jadn' AS id, the_geom, value AS intensity FROM mi_usp000jadn
UNION ALL
SELECT 'usb000ldeh' AS id, the_geom, value AS intensity FROM mi_usb000ldeh
UNION ALL
SELECT 'us1000424d' AS id, the_geom, paramvalue AS intensity FROM mi_us1000424d
UNION ALL
SELECT 'us200030gd' AS id, the_geom, paramvalue AS intensity FROM mi_us200030gd
UNION ALL
SELECT 'us10003zgz' AS id, the_geom, paramvalue AS intensity FROM mi_us10003zgz;
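Rather than hand-typing that query, it can be generated. This sketch (my own helper, not part of CartoDB) also handles the wrinkle that some ShakeMap tables name the intensity column value while others use paramvalue:

```python
# Sketch: generate the UNION ALL query from a list of quake ids, each paired
# with whatever its table calls the intensity column.
def build_union_sql(tables):
    """tables: list of (quake_id, intensity_column) pairs."""
    selects = [
        "SELECT '{0}' AS id, the_geom, {1} AS intensity FROM mi_{0}".format(qid, col)
        for qid, col in tables
    ]
    return "\nUNION ALL\n".join(selects) + ";"

sql = build_union_sql([("usp000jadn", "value"), ("us10003zgz", "paramvalue")])
```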

Visualizing the earthquake intensity measurements

When mapped, the intensity measurements are rendered as contour lines. The level and amount of detail is quite intricate, which means that visualizing all of it – which I've done below as a choropleth based on the intensity column – results in a complete mess:

So referring to the MMI scale, it seems we should use SQL to filter for intensity values 5.0+, which involve a level of shaking that can cause "light" damage. However, Crescent High School – the school that necessitated the first ever earthquake insurance claim for an Oklahoma public school – was situated in an intensity zone of 4.6.

So let's just use the value 4.5 – which, yes, is confusingly coincidental with the cutoff value we used for filtering earthquakes by magnitude, but whatever – as the cutoff for intensity values:

SELECT * FROM ok_shakemaps WHERE intensity >= 4.5;

And now we have a pretty sweet-looking – and hopefully more scientifically relevant – map:

Customizing the CartoCSS

OK, a quick detour into design considerations. The fill color for the intensity contour lines obscures the median household income data that we mapped earlier. Maybe the household income data is not particularly relevant? Or, if we feel we must have that context, we can compromise by rendering the intensity data as simply color-coded lines, and reduce (or eliminate) their corresponding fill color.

This kind of customization can be done with the CartoCSS panel. Standard CSS – never mind CartoCSS, never mind CartoDB's implementation of CartoCSS – is far beyond the scope of this tutorial. I direct you to CartoDB's writeup and my previous tutorial.

Here's the CartoCSS I used for the finished map:

/** choropleth visualization */

#ok_shakemaps {
  polygon-fill: #FFFFB2;
  line-color: #FFFFB2;
  polygon-opacity: 0.1;
  line-color: #FFF;
  line-width: 0.5;
  line-opacity: 1;
}
#ok_shakemaps [intensity <= 6.5] {
  polygon-fill: #B10026;
  line-color: #B10026;
}
#ok_shakemaps [intensity <= 6.2] {
  polygon-fill: #E31A1C;
  line-color: #E31A1C;
}
#ok_shakemaps [intensity <= 5.8] {
  polygon-fill: #FC4E2A;
  line-color: #FC4E2A;
}
#ok_shakemaps [intensity <= 5.6] {
  polygon-fill: #FD8D3C;
  line-color: #FD8D3C;
}
#ok_shakemaps [intensity <= 5.2] {
  polygon-fill: #FEB24C;
  line-color: #FEB24C;
}
#ok_shakemaps [intensity <= 5] {
  polygon-fill: #FED976;
  line-color: #FED976;
}
#ok_shakemaps [intensity <= 4.7] {
  polygon-fill: #FFFFB2;
  line-color: #FFFFB2;
}

Here is the less-cluttered result:

Note that I've left in the 4-mile-circle representations of each earthquake to show how different our simplified model's visualization ("each earthquake is just a big circle!") is from what we can get from the USGS ShakeMap program. The ok_earthquakes_radii dataset is not particularly worth keeping in the final visualization, so we can go ahead and take it out.

However, it is worth keeping the earthquake characteristics, such as time and mag, none of which are included in the ShakeMap intensity tables. So let's do a join:

SELECT ok_shakemaps.*,
  ok_earthquakes.time, ok_earthquakes.place, ok_earthquakes.mag
FROM ok_shakemaps
INNER JOIN ok_earthquakes
  ON ok_earthquakes.id = ok_shakemaps.id
WHERE ok_shakemaps.intensity >= 4.5;

This doesn't inherently change the data visualization, but it allows us to configure the CartoDB interactive layer to provide a popup infowindow when a user clicks on one of the contours:

Recalculating the list of schools potentially at-risk for earthquake damage

The area encompassed by the intensity contour layer is significantly larger than the area encircled by our simplified, now-obsolete visualization of earthquake epicenters. This means that we need to write a new query to find all the schools within intensity zones of 4.5+.

This is just another call to the ST_WITHIN function. If you've built your map like I have, you'll want to execute this SQL in the layer for ok_schools:

SELECT ok_schools.*
FROM ok_schools
INNER JOIN ok_shakemaps
  ON ST_WITHIN(ok_schools.the_geom, ok_shakemaps.the_geom)
  AND ok_shakemaps.intensity >= 4.5
ORDER BY SCHNAM, LCITY;

The resulting list contains more than 450 schools. To make the final map more user-friendly, I've written some custom HTML for the pop-up window when a user hovers over each school location:

<div class= "cartodb-tooltip-content-wrapper" > <div class= "cartodb-tooltip-content" > <h5><strong> {{schnam}} </strong></h5> <h4> {{lcity}}, OK {{lzip}} </h4> <p> Number of students: <strong> {{member}} </strong></p> <p> Free/Reduced Lunch Ratio: <strong> {{frlp_ratio}}% </strong></p> </div> </div>

Note: I don't quite get the limitations of CartoDB's inline-HTML. I wouldn't spend too much time obsessing over it:

Oklahoma's fault lines

The USGS and Oklahoma Geological Survey (OGS) both have more specific geological datasets. I'm not a geologist so I have even less of a clue of how to appropriately apply the data.

The OGS and USGS were previously and famously at odds over the cause of Oklahoma's earthquake surge. I'm not sure if the OGS fault map is different from the one produced by the USGS Mineral Resources On-Line Spatial Data page.

I've uploaded and mapped the OGS's file that is labeled Compressed directory containing GIS-compatible files (ZIP), and the result looks like this bloody streak of a mess:

I don't know what insights to draw from the fault lines, particularly what's to be made of the fact that none of the major earthquakes have occurred near the areas of most concentrated fault lines (if that's even the correct way to interpret the fault lines). So I've hidden the layer by default from my final visualization, but you can turn it on (by clicking the thingy in the top right of the interactive map). And I might update this section whenever USGS gets around to releasing its "new hazard map for earthquakes from induced seismicity", if they haven't already (who knows, I'm not a geo-seismic-ologist).

Conclusion

After all that work, here's the final map (just in case you missed it in the embed at the top of this page):

Whew, that was a lot of work. But even after all of those steps, it's important to realize how narrow and limited its conclusions are. In fact, I would argue that it hardly makes any strong conclusions, given that we visually imply but can't actually back these key assertions:

1. That the locations of past Oklahoma earthquakes correlate with future Oklahoma earthquakes.
2. That concentration of poverty in a student population correlates with a school's level of maintenance and repair.
3. That areas with similar Modified Mercalli Intensity measurements experience the same potential risk of damage (they don't – such a calculation has many other variables).

We might go so far as to argue that, since the vast majority of schools in our geospatial analysis have not reported any damage, the analysis would seem to be pretty useless.

And that would be a fair point. But it's worth noting that the two schools we know to be damaged – Crescent and Meeker – are both included in our geospatial analysis. So that's not nothing. And of course, given the inherent nature of self-reported incidents, we don't know if there are schools that suffered deeper, structural damage that may become visually evident over time, or when other earthquakes hit. According to the news report of the Meeker school damage, the M5.6 earthquake may have caused damage that was only noticed after the later, but weaker, M4.7 quake.

But we didn't have to set out to prove anything. In some investigative projects, it's helpful to at least narrow down a list of leads and work from there. And the goal doesn't have to be investigative – there's nothing wrong with doing an inquiry that helps to guide proactive, precautionary safety measures. And hell, maybe everything will just work out and we're actually at the tail end of Oklahoma's earthquake swarm. But if this geological trend continues – and the number of earthquakes is already too many to easily count – it's worth considering the important insights that geospatial analysis and public data can reveal.