Sure. So the machine learning problem that we are considering here is basically this: we want to classify whether a particular sequence of dimmings that we observe on a star was caused by a planet or not. As Andrew mentioned, when a planet passes in front of the star relative to the telescope, you’ll see the brightness of the star dim and then come back up again. But there are other possible events that can also cause the brightness of the star to dim (or apparently dim), as measured by the telescope.

One thing that can happen is you can have two stars orbiting each other, rather than a planet orbiting a star… And when one star passes in front of another star, you’ll also see a dimming in the measured brightness.

Another example is star spots. Some stars have dark spots on them, and the stars themselves can be rotating… So every time that star spot rotates into the line of sight of the Kepler telescope, you’ll see the measured brightness dim because of that dark star spot.
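
To make the difference between these signals concrete, here is a minimal sketch that generates two synthetic light curves: a box-shaped dip for a transiting planet and a smooth modulation for a rotating star spot. The function names, the box and clipped-sine shapes, and all the numbers (period, duration, depth) are illustrative assumptions, not values from the Kepler data.

```python
import numpy as np

def transit_light_curve(n_points=1000, period=200, duration=10, depth=0.01):
    """Constant brightness of 1.0 with a box-shaped dip once per period (hypothetical shape)."""
    t = np.arange(n_points)
    flux = np.ones(n_points)
    in_transit = (t % period) < duration   # samples where the planet blocks the star
    flux[in_transit] -= depth
    return flux

def star_spot_light_curve(n_points=1000, rotation_period=200, amplitude=0.01):
    """Smooth dimming while a dark spot faces the telescope (hypothetical shape)."""
    t = np.arange(n_points)
    # The spot only dims the star while it is on the visible side, so clip at zero.
    visibility = np.clip(np.sin(2 * np.pi * t / rotation_period), 0.0, None)
    return 1.0 - amplitude * visibility

planet = transit_light_curve()
spot = star_spot_light_curve()
```

Both curves dim by the same amount and repeat with the same period, which is roughly why depth and period alone may not be enough to tell a planet from a false positive; the shape of the dip carries information too.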

So the machine learning problem that we’re focusing on here is, okay, we see this dimming of the star - was this caused by a planet or not? Obviously, one of the main ingredients in machine learning is having a training set of data that has already been labeled. Luckily, in the case of the Kepler mission (at least the main Kepler mission, which ran from 2009 to 2013), astronomers had already paid a lot of attention to the data, and had gone in and actually classified by eye over 30,000 of these signals. So we already had a training set of these dimming signals, where some of them were known to be planets and some of them were known to be various other phenomena - false positives, like I mentioned… Sometimes even instrumental false positives that can cause the star to apparently dim.

So you asked about the feature selection… I guess there are perhaps two approaches you could take here. One of them is you could sit down and think about what features you think are important for classifying one of these detected signals as either a planet or not… And others have actually done this with the Kepler data, and it works fairly well. This is more of a traditional machine learning approach, I guess. You could sit down and you could say “Okay, what’s the brightness of the star? What’s the period of the dimming that we observe? What percentage of the star’s brightness appears to dim?” The signal-to-noise ratio is another statistic that we can measure… And you can feed all those into a machine learning model and use those as features to make a classification.
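
As a rough illustration of this hand-crafted approach, the sketch below reduces a light curve to the four quantities mentioned above: stellar brightness, period, fractional depth, and signal-to-noise ratio. The function `handcrafted_features` is hypothetical, and it assumes the period and transit duration are already known and that the first transit starts at the first sample. The signal-to-noise formula here is one simple choice, not necessarily the statistic used for Kepler.

```python
import numpy as np

def handcrafted_features(flux, period, duration):
    """Summarize a light curve with a few hand-picked numbers (illustrative only)."""
    t = np.arange(len(flux))
    in_transit = (t % period) < duration             # assumes the first transit starts at t = 0
    baseline = np.median(flux[~in_transit])          # brightness of the star out of transit
    depth = baseline - flux[in_transit].mean()       # how much brightness is lost in transit
    noise = flux[~in_transit].std()                  # scatter of the out-of-transit samples
    n_in = in_transit.sum()
    # Depth relative to the uncertainty on the mean of the in-transit samples.
    snr = depth / (noise / np.sqrt(n_in)) if noise > 0 else np.inf
    return np.array([baseline, period, depth / baseline, snr])

# Synthetic example: a 1% dip every 200 samples on top of Gaussian noise.
rng = np.random.default_rng(0)
flux = 1.0 + rng.normal(0.0, 0.001, size=1000)
flux[(np.arange(1000) % 200) < 10] -= 0.01
features = handcrafted_features(flux, period=200, duration=10)
```

A vector like `features` could then be passed to any standard classifier, such as logistic regression or a random forest.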

In this project we actually took a slightly different approach, and we didn’t sit down and think about any of those features ourselves. Instead, we treated these light curves that Andrew mentioned as kind of like a one-dimensional image.

If you think about it, a photograph is a two-dimensional image, right? It’s like a two-dimensional grid of pixels. Well, what we have is a sequence of brightness measurements over time, and so we treat that one-dimensional sequence of brightness measurements as kind of like a 1D photo or a 1D image. So we trained a type of model called a convolutional neural network, which is exactly the kind of model that we typically use to classify photos, and which has actually been very successful in recent years.

So we applied a model very similar to one that is used to detect, say, cats and dogs in the photos you take on your phone, and we applied that to this problem. The input we give it is the light curve itself, and that’s the only input that our model gets.
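
To show what "treating the light curve as a 1D image" can look like, here is a minimal NumPy sketch of the forward pass of a small one-dimensional convolutional network. It is not the architecture from the project: the layer sizes, the global max pooling step, and the names `conv1d`, `max_pool1d`, and `predict_planet_probability` are all assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution. x: (length, in_ch); kernels: (width, in_ch, out_ch)."""
    width = kernels.shape[0]
    # windows has shape (length - width + 1, in_ch, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=0)
    return np.einsum("lcw,wco->lo", windows, kernels) + bias

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis; trailing samples are dropped."""
    length = (x.shape[0] // size) * size
    return x[:length].reshape(-1, size, x.shape[1]).max(axis=1)

def predict_planet_probability(flux, params):
    """Map a raw light curve to a number between 0 and 1 (untrained sketch)."""
    x = flux[:, None]                                        # (length, 1): one brightness channel
    x = max_pool1d(relu(conv1d(x, params["k1"], params["b1"])), 4)
    x = relu(conv1d(x, params["k2"], params["b2"]))
    x = x.max(axis=0)                                        # global max pool over time
    logit = x @ params["w"] + params["b"]
    return 1.0 / (1.0 + np.exp(-logit))                      # sigmoid

# Random, untrained parameters; a real model would learn these from labeled light curves.
rng = np.random.default_rng(0)
params = {
    "k1": rng.normal(0.0, 0.1, size=(5, 1, 8)), "b1": np.zeros(8),
    "k2": rng.normal(0.0, 0.1, size=(5, 8, 16)), "b2": np.zeros(16),
    "w": rng.normal(0.0, 0.1, size=16), "b": 0.0,
}
flux = np.ones(200)
flux[95:105] -= 0.01                                         # a single synthetic dip
probability = predict_planet_probability(flux, params)
```

The only input is the sequence of brightness values; the convolution kernels play the role that hand-picked features would otherwise play, and in a trained model they would be fitted to the labeled signals.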