Where does your own interest in the intersection of astronomy and computer science come from?

I’m one of those people who could never decide whether I wanted to do computer science or astronomy. Computers are these wonderful things because it’s creation without boundaries. When you write a piece of code, it’s like building something, building a new world in that computer. It’s almost artistic. That was very appealing, but on the other hand I wanted to understand the world and I wanted to figure out how things work in the world. So I sort of flipped the coin and did physics. When I got to Princeton, where I did my Ph.D., the Sloan Digital Sky Survey was just starting. I thought, “Wow, there’s a tremendous amount of data,” and people were struggling to understand those data. I realized at that point that my dreams had come true: I don’t have to make a decision anymore whether to do something computer-related or astronomy-related because this type of environment requires both.

Does all of your astrophysics work relate to algorithms and computer programming?

I would say it’s a means to an end. I spend a lot more time focused on the algorithms themselves, but I mostly like to use those things to actually find interesting results. I’m driven by answering problems in astronomy, but I want to make sure I do it in such a way that the next person can build on what I’ve done.

You mentioned the Sloan Digital Sky Survey. How does LSST build upon that?

Sloan generated, I think, a total of about 10 to 20 terabytes over its history, in terms of imaging. LSST does that in a night. In terms of the number of objects, Sloan was on the order of 500 million stars, [observed] once. With LSST, it is going to be about 20 billion stars, and every one of them is likely to be seen 825 times. We’re going to be looking at the time domain. It’s a huge volume. The other problem (whenever I say problem, just think of it as opportunity) is that LSST is going to be measuring dozens and potentially hundreds of things about every object.

There was a realization in the 2000s that instead of building a separate telescope to do this part of astronomy and a separate telescope for that part, what we’re going to do is build a telescope, the telescope, to essentially download the sky. You still need to process those data into a form that will enable the solar system scientists to focus on solar system objects and the dark energy folks to do their weak lensing maps. Data processing became a huge thing for LSST. It’s one of the rare projects in astronomy where the data system, for which I was responsible, is as expensive and as big as the telescope itself and as the camera itself.

Something we haven’t yet touched on, but that is absolutely crucial in astronomical measurements, is statistics.

When you’ve collected all the data there is to collect, the only thing that’s left is to analyze it better. We take measurements, and then how do you know what the measurement is telling you about whatever hypothesis you have? There are statistical methods that allow you to do tests, to fit the models. Statistics is, among other things, about extracting knowledge from data, quantifying your knowledge given the data you have. We use [statistics] very prescriptively, as in, here’s a statistical cookbook. You have to look at the ingredients and pick the right rule, the right recipe. If you have a data set that meets certain criteria, and if you apply that rule, good things will happen. We’re getting to a point where we’re measuring almost everything we can. The only way forward is to now do your data analysis correctly, because we cannot do approximations anymore. People think statistics is boring, but once you understand what it is, it’s a fundamental element of science, of discovering knowledge in data.

This big-data evolution, as you call it, is not just in astronomy, correct?

Particle physicists have been dealing with it for a little bit longer; they’re maybe five to 10 years ahead of us. Oceanography is now entering the same region. Ecology is entering the same region. The basic tools that you need to know just to do your science are changing.