Machine learning is kind of magic, right? But is it the kind of magic that can make us rich? And I don’t mean lucrative-consulting-gig rich, I mean digging-valuable-metals-out-of-the-ground rich. The kind of rich you can hoard in treasure chests.

Also, I’d been meaning to try out some transfer learning and had been looking around for a good topic to try it on. Transfer learning is where you take a pre-trained convolution (or other) network and reuse it for your own task. It’s particularly great for convolution networks because they take a long time to train from scratch. South Australia, where I live, has put a lot of effort into sharing data sets, including some great geological resources, so I thought: why not stick the two together and use convolution networks to find gold? The easiest way to make use of a pre-trained convolution network is to turn it into a feature extractor, which is what I’ll show you in this post.

The code that goes along with this blog post can be found on GitHub:

Minerals

So to have a computer find you treasure, you’ve got to give it something to work with. Like a treasure map. So my first assumption is that a map of geological features might be enough to spot likely places if you knew how to look at it.

Of course, it’s not like geologists don’t do this already. Unlike me, they probably even know the names of the minerals and whatnots they are looking for. So I found some papers on finding minerals (gold in particular) using images. Instead of my first idea of maps, which would have been a fun challenge for a convolution network, there is a multi-spectral satellite image source called ASTER which can provide clues to mineral resources on and in the ground.

Multi-spectral imaging means taking a picture with a camera that can see more than just red, green and blue. Some of the ASTER bands are good at spotting particular minerals or mineral families. The following paper goes into how these bands have been used previously and how they could be used.

While the authors come up with their own scheme, they mention the following mineral indexes as being particularly good for finding gold: “OHI is the index for OH-bearing minerals, KLI is the kaolinite index, ALI is the alunite index, and CLI is the calcite index”.

There are a bunch of places where we can download ASTER data free of charge, but the downside of a wonderful real data set like ASTER is that it’s full of all sorts of real-world complications. Like clouds getting in the way, and the fact that since 2008 some of the sensors on the satellite have been basically cooked. Fortunately Geoscience Australia have done a lot of the hard work of stitching and manipulating the data, and have produced a bunch of big ‘.tif’ images to work with.

You can find the data on their FTP site, with the links in the GitHub repo for this post. I grabbed a bunch of images for the location I was interested in, using the AlOH, MgOH and Kaolin band images.

That’s only half the story though. We have a lot of images but no clue where the gold is. The easiest way to use machine learning to help us find gold is to give it a bunch of images that have gold in them and a bunch that don’t, and get it to figure out the difference. To get that training data we need to know where there’s gold. Another government department to the rescue, this time a South Australian state department which has a great little tool they call SARIG.

It’s a map with a lot of different mineral-related overlays, one of which is particularly interesting to us: ‘All Mines and Mineral Deposits’. With the handy option to download the shape file geodatabase, we have everything we need data-wise.

Data Processing

Now we have lots of data, but it’s not simply a matter of feeding it into a computer. You have to slice and dice it into the formats needed. To work with geospatial data there are a bunch of great tools collectively known as GDAL, the Geospatial Data Abstraction Library.

These tools will happily export info out of our ‘All Mines and Mineral Deposits’ shape files and slice up the massive image files, which are also geospatially aware. As each mineral deposit entry in the shape file is a point location, we can cut the large input images into segments around each point, at a size appropriate to feed into a convolution network. A slice can contain multiple minerals, and we don’t allow overlapping slices, so that the eventual machine learning doesn’t just remember what particular segments look like and learn by rote.
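As a rough illustration of the slicing step, here is a minimal sketch of one way to get non-overlapping segments: snap every deposit to a fixed grid of tiles. This is an assumption on my part for the sake of the example, not necessarily how the code in the repo does it, and the names `world_to_pixel`, `assign_tiles`, `tile_window` and the tile size are all hypothetical. The six-number geotransform follows the usual GDAL convention.

```python
import math
from collections import defaultdict

TILE_SIZE = 224  # pixels per side; an assumed size, chosen to suit ResNet50 input


def world_to_pixel(x, y, geotransform):
    """Convert map coordinates to (col, row) in a north-up raster.

    geotransform is the GDAL six-number form:
    (origin_x, pixel_width, 0, origin_y, 0, pixel_height),
    where pixel_height is negative for north-up images.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)
    return col, row


def assign_tiles(deposits, geotransform, tile_size=TILE_SIZE):
    """Group (x, y, mineral) deposits by the grid tile they fall in.

    Tiles sit on a fixed grid, so they can never overlap, and a tile that
    holds several deposits simply carries several minerals.
    """
    tiles = defaultdict(set)
    for x, y, mineral in deposits:
        col, row = world_to_pixel(x, y, geotransform)
        tiles[(col // tile_size, row // tile_size)].add(mineral)
    return dict(tiles)


def tile_window(tile, tile_size=TILE_SIZE):
    """Pixel window (xoff, yoff, xsize, ysize) covering one tile."""
    tile_col, tile_row = tile
    return tile_col * tile_size, tile_row * tile_size, tile_size, tile_size
```

The window tuple is in the same order that `gdal_translate -srcwin` expects, so each tile can be cut out of the big ‘.tif’ with one command per band. The trade-off of a fixed grid is that a deposit can end up near the edge of its tile rather than in the middle.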

Then it’s a matter of merging the images from the different bands into a false colour image. A false colour image is one where we use something other than real red, green and blue values in the image. In this case we’re using the AlOH content, the MgOH content and the Kaolin index, one measure for each colour band. Here’s what a false colour image ready to feed into our convolution network looks like:

A false colour image using Geoscience Australia reprocessed ASTER data
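To make the band merging concrete, here is a small dependency-free sketch that rescales three bands to 0..255 and zips them into RGB pixels. Which band lands in which colour channel is my assumption (the post only says one measure per colour band), the linear rescaling is also an assumption, and `scale_to_byte` and `false_colour` are hypothetical names. Real rasters would be handled with GDAL or an array library rather than nested lists.

```python
def scale_to_byte(band):
    """Linearly rescale a 2-D band (a list of rows) to integers in 0..255."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a completely flat band would otherwise divide by zero
    return [[round(255 * (v - lo) / span) for v in row] for row in band]


def false_colour(red_band, green_band, blue_band):
    """Merge three equally sized bands into rows of (R, G, B) pixels."""
    r, g, b = (scale_to_byte(band) for band in (red_band, green_band, blue_band))
    # zip the rows together first, then zip the pixels within each row
    return [list(zip(*rows)) for rows in zip(r, g, b)]


# Tiny 2x2 stand-ins for the AlOH, MgOH and Kaolin rasters (made-up values).
aloh = [[0, 1], [2, 3]]
mgoh = [[10, 10], [10, 10]]
kaolin = [[3, 2], [1, 0]]
image = false_colour(aloh, mgoh, kaolin)
```

Here AlOH drives red, MgOH drives green and Kaolin drives blue, so a pixel that looks strongly red is rich in AlOH relative to the rest of the tile.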

Convolution Networks

Convolution networks have shown an amazing ability to learn answers to visual problems. Recently there’s been a bunch of open source tools released for developing convolution networks, including TensorFlow, Theano, Caffe and Torch. They’re all great tools, but we really need something with easy access to pretrained weights. For that problem, and for abstracting convolution networks a step away from their implementation, Keras has been incredibly useful.

The author of Keras also provides this GitHub repo with models and pretrained weights to work with. The ResNet50 network has produced good results, so that’s an obvious starting point. We’ll be using this network to transform each of our training images into a vector of features. Essentially, we take the output of ResNet50 at a point before it turns into a bunch of image class predictions and use that as the input for another classification problem. While training a convolution network is slow, using it to generate features is quick, and it gives us a data source that has been proven to transfer well to other problems. It’s definitely an open question how well it transfers to something as esoteric as a processed false colour satellite image, but trying it out is the easiest way to learn.
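For a feel of what the feature extraction looks like, here is a minimal sketch. It assumes the ResNet50 that ships with current `tensorflow.keras.applications` rather than the standalone models repo mentioned above, and the function names, the batch size and the 224x224 resize are my own choices for illustration. With the classification head removed and average pooling switched on, each image comes out as a 2048-value vector, which lines up with the 2048-wide input layer of the multilayer perceptron used later in the post.

```python
def batches(items, size):
    """Yield consecutive chunks of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_features(image_paths, batch_size=32):
    """Return one 2048-value feature vector per image path."""
    # Imported here so the helper above stays usable without TensorFlow installed.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    # include_top=False drops the ImageNet class predictions; pooling="avg"
    # averages the final convolution block into a flat 2048-long vector.
    model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

    features = []
    for chunk in batches(image_paths, batch_size):
        pixels = np.stack(
            [img_to_array(load_img(path, target_size=(224, 224))) for path in chunk]
        )
        features.extend(model.predict(preprocess_input(pixels)))
    return features
```

Calling `extract_features` on a list of false colour image paths downloads the ImageNet weights on first use and then only runs forward passes, which is why this is so much quicker than training a network.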

Classification with ResNet50 Features

After we turn our false colour images into features we can run a bunch of different machine learning algorithms on them. For this we make use of Spark, a cluster computing framework with a number of built-in machine learning algorithms.

While the size of the problem doesn’t necessitate using a cluster to solve it, Spark does provide a nice interface to the algorithms. We can make use of Logistic Regression, Support Vector Machine classification, Naive Bayes classification, Random Forests and Multilayer Perceptron classification with just a few lines of code. To get our data ready for Spark we turn it into LibSVM format and then further simplify it by splitting the examples into those containing gold and those that don’t. This leaves us with a lot more not-gold examples, so we over-sample the gold ones to give us even numbers, because it helps to train an algorithm on balanced data. We load up pyspark and use a few lines of code to run some different classifiers. Here are the results of Spark’s classification algorithms:

Logistic Regression:
Test Accuracy = 0.767756482525

SVM:
Test Accuracy = 0.808342728298

Naive Bayes:
Test Accuracy = 0.616685456595

Random Forest (10 trees):
Test Accuracy = 0.727171

Multilayer Perceptron (layers [2048, 512, 2]):
Test Accuracy = 0.775255
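For reference, the data preparation behind these numbers (LibSVM formatting and over-sampling the gold examples) might look roughly like the sketch below. The function names and the 0/1 label encoding are assumptions for the example, and random sampling with replacement is just one simple way of over-sampling.

```python
import random


def to_libsvm_line(label, features):
    """Format one example as a LibSVM line: '<label> <index>:<value> ...'.

    Indices are 1-based and zero-valued features are left out.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def oversample(examples, seed=0):
    """Repeat minority-class examples at random until both classes are even.

    `examples` is a list of (label, features) pairs with labels 0 and 1.
    """
    rng = random.Random(seed)
    gold = [e for e in examples if e[0] == 1]
    not_gold = [e for e in examples if e[0] == 0]
    minority, majority = sorted([gold, not_gold], key=len)
    if not minority:
        return list(examples)  # nothing to copy from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(examples) + extra
```

Only the training split should go through `oversample`. The testing data is left at its natural ratio, which matters when reading the accuracy figures.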

Test accuracy is just a measure of how often the algorithm was right about an image containing gold or not, over the total number of testing examples. An accuracy of 80% doesn’t sound bad, and does match other results shown in the literature, but it’s difficult to interpret from a simple accuracy measure. We haven’t over-sampled the gold examples in the testing data, so it contains a lot more not-gold than gold. If an algorithm gets pessimistic about predicting gold it can still score quite well while not telling us very much about where to dig.

What we really need is a measure of how likely we are to strike gold if we do go out with a shovel. This is called the precision of the prediction. The flip side of an algorithm being really precise is that it may choose to only predict gold when it’s very sure, and so miss a lot of examples. The number of examples we correctly predict as gold out of the total number of gold samples is called the recall. We’re probably fine with a low recall, but we’d like precision to be as high as possible. Here’s the precision and recall for the data above:

Logistic Regression:
Gold Recall = 0.324840764331
Gold Precision = 0.337748344371

SVM:
Gold Recall = 0.235668789809
Gold Precision = 0.425287356322

Naive Bayes:
Gold Recall = 0.68152866242
Gold Precision = 0.269521410579

Random Forest (10 trees):
Gold Recall: 0.575163398693
Gold Precision: 0.328358208955

Multilayer Perceptron (layers [2048, 512, 2]):
Gold Recall: 0.317880794702
Gold Precision: 0.335664335664

So for the purposes of getting out there with a shovel, the Support Vector Machine classifier really wins. If it says there’s gold then there’s a 42% chance it’s telling the truth, compared with the 17% of samples in the testing data that actually have gold. This is just for segments of South Australia that have known mineral deposits though, which is a factor to consider, as known deposits might correlate with certain features that aren’t seen in other areas.
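To see why accuracy alone can mislead, here is a small sketch that computes all three measures. The `evaluate` function and the toy labels are illustrative, not the real test set: I’ve just made up 100 samples with the same 17% gold share, and a maximally pessimistic classifier that never predicts gold.

```python
def evaluate(actual, predicted):
    """Return (accuracy, precision, recall), treating label 1 as gold."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # gold, called gold
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # not gold, called gold
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # gold we missed
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else None  # undefined with no gold calls
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall


actual = [1] * 17 + [0] * 83   # 17% gold, like the testing data
never_gold = [0] * 100         # a classifier that always says "not gold"
print(evaluate(actual, never_gold))  # (0.83, None, 0.0)
```

The do-nothing classifier scores 83% accuracy, better than every algorithm above, while finding no gold at all. That is why the 42% precision of the SVM is the more interesting number.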

‘X’ Marks the Spot

So, given the data we have, where do we go digging? Obviously not where gold is already labelled, because it’s already been dug up. That really just leaves the segments where it isn’t labelled, which has the downside that they’ve already been mined for something, just not gold. We’ll take it though; a lot of the entries in the mineral deposits data are old and abandoned. The Support Vector Machine performed best, but we want to move beyond a yes/no classification into scores we can rank. If we clear the threshold that turns the SVMModel in Spark into a classifier, we can get raw scores straight from the model (margins rather than true probabilities, but good enough for ranking). Taking the maximum over the segments where there is no gold gives us this image:

Weird boomerang shape marks the spot

Let’s check it on the map:

Our great mining hope

Someone grab the shovels, we’re off! Turns out this is right next door to the White Dam Gold Mine. Which is, I guess, a good sign, but the mine is actually in our data already; it’s just a bit to the left and down. There’s a good chance that the spot we found looks a lot like the mine next door. A quick look through the rest of the likely spots shows a lot of regions next to existing mines. Probably a good thing, but sort of hard to make use of. Running through the maps without mineral data section by section might yield something, but it’s a bit speculative and might best be left to the reader. So we didn’t really find gold, but I’m still pleasantly surprised by the accuracy of convolution network features on a problem so different from the one they were trained on.
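The ranking step in this section boils down to very little code. In Spark MLlib, calling `clearThreshold()` on an `SVMModel` makes `predict` return the raw margin instead of a 0/1 label. The plain Python sketch below mimics that for a linear model so it can run without a cluster; the function names, weights and segment names are all made up for illustration.

```python
def raw_score(weights, intercept, features):
    """Raw margin of a linear SVM: the dot product w . x plus the intercept."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


def best_unlabelled(segments, weights, intercept):
    """Return (name, score) of the top-scoring segment with no known gold.

    `segments` is a list of (name, has_gold, features) tuples.
    """
    candidates = [
        (raw_score(weights, intercept, features), name)
        for name, has_gold, features in segments
        if not has_gold  # skip segments where gold is already labelled
    ]
    score, name = max(candidates)
    return name, score


segments = [
    ("known_gold_mine", True, [5.0, 0.0]),
    ("old_copper_mine", False, [1.0, 0.0]),
    ("iron_deposit", False, [0.0, 1.0]),
]
print(best_unlabelled(segments, weights=[1.0, -2.0], intercept=0.5))
```

Sorting the candidates instead of taking the maximum gives the full list of likely spots, which is how you end up noticing that most of them sit next to existing mines.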

Conclusion and Other Projects

After searching through the maps with the algorithms we’ve developed, a great next step would certainly be to train a convolution network on raw ASTER data. Convolution networks are getting to human levels of success (and beyond) on certain vision tasks, and I think they could have a big advantage with multispectral data. Humans can only interpret visual data through at most three channels (theoretical tetrachromats notwithstanding), but convolution networks are not similarly constrained. The band ratios currently used for finding minerals would certainly map well to a network architecture.