The LHC today began running 7TeV collisions for the first time. The instant its detectors register the events associated with a collision, the challenges move from the hardware realm into software, as the LHC will produce more data than we can possibly handle. We have to figure out what to hang on to in real time, then send it around the globe via dedicated connections that aggregate multiple 10Gbps links; those on the receiving end need to safely store it and pursue the sorts of analyses that will hopefully reveal some new physics. In our final installment, we'll take a brief look at the computational issues created by the LHC.

Finding what we're interested in

The LHC isn't just exceptional in terms of the energy it can reach; it also has very high luminosity, which means that it produces collisions at a staggering rate. Howard Gordon indicated that interactions will take place at rates of 600 million events a second—for each detector. Even if we had the capacity to save records of all of them (which we don't), many of them will represent familiar physics. Srini Rajagopalan, a Brookhaven employee working at CERN, said that out of the hundreds of millions of collisions that happen every second, we'll be saving roughly 400 of them.
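For a sense of scale, the arithmetic implied by those two figures (a rough, illustrative calculation, not an official ATLAS number) works out like this:

```python
# Back-of-the-envelope version of the culling implied by the numbers above.
events_per_second = 600_000_000   # collisions registered per second (per detector)
saved_per_second = 400            # events actually written out per second

reduction_factor = events_per_second / saved_per_second
print(f"Only about 1 in {reduction_factor:,.0f} events gets kept")
# -> Only about 1 in 1,500,000 events gets kept
```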

Obviously, this is a pretty significant culling process, one complicated by the fact that we're hoping to see particles that have been predicted by various theories (ideally, we'd also detect things that the theorists aren't expecting). How does that work? A hint was provided when Stephanie Majewski was asked about one model for the existence of dimensions beyond our well-known four. "Extra dimensions will give us lots of muon jets," Majewski said, "you couldn't miss it." In short, most of the things we're expecting or hoping to find are the product of some fairly specific predictions, and will produce equally predictable patterns of particles in the detectors (Chris Lee covered this in a bit more detail).

Rajagopalan described how the ATLAS detector's software includes what he called "event filters." Basically, the software can determine the extent to which the particles and energy that come out of a collision match a pattern we'd expect a given particle to produce. These expectations can be based either on what we've already seen for known particles like the top quark, or on what theory predicts.
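To make the idea concrete, here's a minimal sketch of what an event filter amounts to in code. The event fields and thresholds are invented for illustration; the real ATLAS trigger operates on far richer data and isn't written like this:

```python
# A toy "event filter": does a reconstructed event roughly match the pattern
# we'd expect from a top-quark decay? All fields and cutoffs are made up.
def top_quark_like(event: dict) -> bool:
    return (event["n_muons"] >= 1
            and event["n_jets"] >= 4
            and event["missing_energy_gev"] > 20)

candidate = {"n_muons": 1, "n_jets": 5, "missing_energy_gev": 35.0}
print(top_quark_like(candidate))  # True -> this event would be kept
```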

Right now, the software already has 300 event filters, but it can apparently handle up to 8,000 and prioritize each of them—so, for example, we're likely to try to capture more potential Higgs events than top quark events.

These filters can have various degrees of stringency, meaning they can be set loose enough to capture events that are similar to, but don't quite match, the predictions. It's also possible to detect partial overlap between an event and a filter's expectations. So, for example, an unknown particle might produce a set of familiar ones as part of its decay pathway—even if there's no filter specific to that particle, the event might be captured because it looks a bit like something else that decays via a similar set of particles.
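One way to picture stringency is to have each filter produce a score rather than a strict yes or no, and keep anything that clears a tunable threshold; loosening the threshold captures the near misses, and a single event can score well against several filters at once. The sketch below is purely illustrative, with made-up signatures and numbers:

```python
def muon_jet_score(event: dict) -> float:
    """Hypothetical score in [0, 1] for resemblance to a multi-muon-jet event."""
    return min(event["n_muons"] / 4, 1.0) * min(event["n_jets"] / 6, 1.0)

def top_quark_score(event: dict) -> float:
    """Hypothetical score for resemblance to a top-quark-like event."""
    return min(event["n_jets"] / 4, 1.0) * min(event["missing_energy_gev"] / 40, 1.0)

# Each filter carries its own stringency threshold; loosening it lets the
# filter capture events that only partially match its prediction.
filters = {
    "muon_jets": (muon_jet_score, 0.5),
    "top_quark": (top_quark_score, 0.7),
}

def matching_filters(event: dict) -> list[str]:
    """Return every filter this event clears; one event can satisfy several."""
    return [name for name, (score, threshold) in filters.items()
            if score(event) >= threshold]

event = {"n_muons": 3, "n_jets": 5, "missing_energy_gev": 45.0}
print(matching_filters(event))  # ['muon_jets', 'top_quark'] with these toy numbers
```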

That last bit is important in case the theorists come up with new ideas long after the LHC has started gathering data. As Howard Gordon put it, it's possible to take new ideas, compare them to the existing models to identify potential places of overlap, and then go back to the primary data to test the predictions in a bit more detail.

The embarrassment of computational physics

As the primary US interface for ATLAS, Brookhaven will mainly be responsible for storing any data that makes it through the event filters as it arrives from CERN, and for distributing it to various Tier 2 and Tier 3 sites across the country (Brookhaven also houses a 10,000-core grid computing facility that will perform some analysis). As Ofer Rind described it, since each event is essentially independent, they can all be analyzed separately; it's an "embarrassingly parallel problem," in computer science terms.
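"Embarrassingly parallel" simply means the workload splits into completely independent pieces, so adding more cores or machines scales almost perfectly. Here's a toy illustration using Python's standard library, with a stand-in analyze() function in place of any real physics:

```python
from multiprocessing import Pool

def analyze(event: dict) -> bool:
    """Stand-in for a real analysis: decide whether one event is interesting.
    Because no event depends on any other, each call is fully independent."""
    return event["missing_energy_gev"] > 100

if __name__ == "__main__":
    events = [{"missing_energy_gev": float(i)} for i in range(1_000)]
    # Each worker processes its own slice of events; no coordination is needed.
    with Pool(processes=8) as pool:
        results = pool.map(analyze, events)
    print(sum(results), "events flagged as interesting")
```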

As a result, the high-energy physics community has a great deal of experience with grid computing. "We've been doing this for a while, and with a lot less money than the cloud folks," Rind said.

Part of that computing power simply goes to converting the raw data into particle identities and tracks, and another part to modeling what a theoretical particle might look like. But, as more data becomes available, a lot of the computation will simply involve scanning events to determine how well any of them match theoretical predictions. Users of the grid will be able to specify an analysis program (including one that they submit with the task), identify the data it should be run against, and simply set the job in motion. Based on the priority of the work, the grid will find spare processor time for it, then bring the software and data together on the same machines so the analysis can take place.
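Schematically, a grid job boils down to a small description of what to run, which dataset to run it over, and how urgent it is; the scheduler takes care of finding idle processors and moving code and data to the same place. The sketch below is hypothetical and doesn't reflect the actual ATLAS grid middleware or its job format:

```python
import json

# A hypothetical job description: which analysis program to run, which dataset
# to run it over, and how urgent the work is. The grid scheduler, not the user,
# decides where and when the job actually executes.
job = {
    "analysis_program": "my_higgs_search.py",   # user-supplied code
    "dataset": "filtered-events-2010-run-042",  # made-up dataset name
    "priority": "high",
    "output": "higgs_candidates.root",
}

print(json.dumps(job, indent=2))  # hand this description off to the scheduler
```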

In the near future, these sorts of programs should start building up a catalog of collisions that have the right properties—correct number of muons, photons of the right energy, etc.—to contain an indication of something that's new to physics. And, once enough of these are identified to pull a signal out of the statistical noise, we might just be ready to start updating the standard model (and, possibly, all of cosmology).
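Pulling a signal out of the noise usually means asking whether the excess of matching events over the expected background is too large to be a statistical fluke. A common rough yardstick is the excess divided by the square root of the expected background, with five standard deviations being the conventional bar for a discovery. Here's a toy calculation with made-up counts:

```python
from math import sqrt

observed_events = 160       # made-up count of events passing a new-physics filter
expected_background = 100   # made-up count expected from known physics alone

# Rough significance: the excess over background, measured in units of the
# background's expected Poisson fluctuation, sqrt(b). Real analyses are far
# more careful, but 5 sigma is the conventional bar for claiming a discovery.
excess = observed_events - expected_background
significance = excess / sqrt(expected_background)
print(f"Excess of {excess} events is roughly {significance:.1f} sigma")
```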
