Some people like to do crossword puzzles. I like to do machine learning puzzles.

Lucky for me, a new contest was posted on Kaggle just yesterday. So naturally, my lazy Saturday was spent getting elbow-deep in the data.

The training set consists of a series of ‘skies’, each containing a bunch of galaxies. Normally, these galaxies would exhibit random ellipticity. That is, if it weren’t for all that dark matter out there! The dark matter, while itself invisible (it is dark, after all), tends to aggregate and do some pretty funky stuff. These aggregations of dark matter produce massive halos which bend the heck out of spacetime itself! The result is that any galaxies behind these halos (from our perspective here on Earth) appear stretched and contorted around the halo.

The tricky bit is to distinguish between the random background noise in the ellipticity of galaxies and the systematic effect of the dark matter halos. How hard could it be?
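To make "systematic effect" a bit more concrete, here is a rough sketch of one standard lensing quantity, the tangential ellipticity of a galaxy relative to a candidate halo position. It is not necessarily what I'll end up using for predictions. The function names and the `(x, y, e1, e2)` tuple layout are assumptions made up for this example, with `e1` and `e2` taken to be the two usual ellipticity components.

```python
import math


def tangential_ellipticity(gx, gy, e1, e2, hx, hy):
    """Ellipticity of one galaxy measured tangentially to a candidate halo.

    (gx, gy) is the galaxy position and (hx, hy) the candidate halo position.
    Positive values mean the galaxy is stretched around the halo,
    negative values mean it points towards it.
    """
    # Angle of the galaxy as seen from the candidate halo.
    phi = math.atan2(gy - hy, gx - hx)
    return -(e1 * math.cos(2 * phi) + e2 * math.sin(2 * phi))


def mean_tangential_ellipticity(galaxies, hx, hy):
    """Average tangential ellipticity over (x, y, e1, e2) tuples.

    Randomly oriented galaxies should average out to roughly zero,
    while galaxies wrapped around (hx, hy) push the mean above zero.
    """
    total = sum(
        tangential_ellipticity(x, y, e1, e2, hx, hy)
        for x, y, e1, e2 in galaxies
    )
    return total / len(galaxies)
```

The idea is that pure noise has no preferred direction, so the average stays near zero at a random spot in the sky and should rise near a real halo.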

Step one, as always, is to have a look at what you’re working with using some visualization.
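To give a flavour of what such a plot involves, here is a minimal sketch (not the code linked below) that draws each galaxy as a short line segment pointing along its long axis. It assumes each galaxy comes as an `(x, y, e1, e2)` tuple; the function names and the `scale` parameter are made up for illustration, and the plotting part assumes matplotlib is installed.

```python
import math


def ellipticity_to_segment(x, y, e1, e2, scale=200.0):
    """Turn (e1, e2) into the two endpoints of a line segment centred on (x, y).

    The segment points along the galaxy's long axis and its half-length is
    scale * |e|, so rounder galaxies get shorter segments.
    """
    magnitude = math.hypot(e1, e2)
    # The orientation is half the polar angle of (e1, e2), because an
    # ellipse looks the same after a rotation of 180 degrees.
    theta = 0.5 * math.atan2(e2, e1)
    dx = scale * magnitude * math.cos(theta)
    dy = scale * magnitude * math.sin(theta)
    return (x - dx, y - dy), (x + dx, y + dy)


def plot_sky(galaxies, scale=200.0):
    """Draw one sky from an iterable of (x, y, e1, e2) tuples."""
    # Imported here so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    for x, y, e1, e2 in galaxies:
        (x0, y0), (x1, y1) = ellipticity_to_segment(x, y, e1, e2, scale)
        ax.plot([x0, x1], [y0, y1], color="black", linewidth=1)
    ax.set_aspect("equal")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig
```

Calling `plot_sky(galaxies)` on one sky's worth of tuples and then `matplotlib.pyplot.show()` should give a picture where any swirl of segments around a point hints at a halo.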

If you want to try it yourself, I’ve posted the code here.

If you don’t feel like running it yourself, here are all 300 skies from the training set.

Now for the simple matter of the predictions. Looks like Sunday will be a fun day too! Stay tuned…