The trickiest part of hunting for new elementary particles is sifting through the massive amounts of data to find telltale patterns, or "signatures," for those particles—or, ideally, weird patterns that don't fit any known particle, an indication of new physics beyond the so-called Standard Model. MIT physicists have developed an analytical method to essentially automate these kinds of searches. The method is based on how similar pairs of collision events are to one another and how hundreds of thousands of such events are related to each other.

The result is an intricate geometric map, dubbed a "collision network," that is akin to a map of a complex social network. The MIT team described its novel approach in a new paper in Physical Review Letters. "Maps of social networks are based on the degree of connectivity between people, and for example, how many neighbors you need before you get from one friend to another," co-author Jesse Thaler said. "It's the same idea here."

The Large Hadron Collider (LHC) produces billions of proton-proton collisions per minute. Physicists identify exactly which particles are produced in high-energy collisions by the electronic signatures the particles leave behind in the detectors, known as decay patterns. Quarks, for instance, exist only for fractions of a second before they decay into other, secondary particles. Since each quark has many different ways of decaying, there are several possible signatures, and each must be carefully examined to determine which particles were present at the time of the collision.

Detectors like the Compact Muon Solenoid (CMS) filter these signals using so-called "triggers," which are set off when an event indicates a specific particle of interest, or a potentially new particle, among the tens of thousands of signals created every millionth of a second inside the accelerator.
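At heart, a trigger is a fast yes/no decision applied to every event, and only events that pass are kept. The toy filter below is a minimal sketch of that idea, assuming a hypothetical event format (`Event`, `Particle`) and an invented 25 GeV muon threshold; real trigger systems are far more elaborate than a single predicate.

```python
from dataclasses import dataclass, field

@dataclass
class Particle:
    kind: str      # hypothetical labels such as "muon" or "jet"
    pt_gev: float  # transverse momentum in GeV

@dataclass
class Event:
    event_id: int
    particles: list = field(default_factory=list)

def single_muon_trigger(event, min_pt_gev=25.0):
    """Fire if the event contains at least one muon above the (invented) threshold."""
    return any(p.kind == "muon" and p.pt_gev >= min_pt_gev for p in event.particles)

def apply_trigger(events, trigger):
    """Keep only the events for which the trigger fires; the rest are discarded."""
    return [e for e in events if trigger(e)]

events = [
    Event(1, [Particle("jet", 80.0)]),
    Event(2, [Particle("muon", 30.0)]),
    Event(3, [Particle("muon", 10.0), Particle("jet", 50.0)]),
]
print([e.event_id for e in apply_trigger(events, single_muon_trigger)])  # [2]
```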


Here's an example: if a proton-antiproton collision produces a top quark and a top antiquark, these will almost instantly decay into two weak-force (W) bosons and two bottom quarks. One of the "offspring" W bosons turns into a muon and a neutrino, while the other decays into up and down quarks. The two bottom quarks each produce a jet of particles, as do the up and down quarks. So the signature of the collision is a muon, a neutrino (which escapes the detector and shows up only as missing energy), and four jets.
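The bookkeeping in this example can be sketched as a recursive walk through a decay table. The table below is hypothetical and covers only the single decay channel described above (a real top pair can decay in many other ways); each quark left at the end is counted as one jet.

```python
from collections import Counter

# Hypothetical decay table for the one channel described in the text.
DECAYS = {
    "top": ["W boson (to muon)", "bottom"],
    "antitop": ["W boson (to quarks)", "antibottom"],
    "W boson (to muon)": ["muon", "neutrino"],
    "W boson (to quarks)": ["up", "antidown"],
}

# Quarks cannot be seen in isolation, so each one shows up as a jet.
QUARKS = {"bottom", "antibottom", "up", "antidown"}

def final_state(particle):
    """Follow the decay table until only detector-level objects remain."""
    if particle in DECAYS:
        products = []
        for daughter in DECAYS[particle]:
            products.extend(final_state(daughter))
        return products
    if particle in QUARKS:
        return ["jet"]
    return [particle]

def signature(initial_particles):
    """Count the detector-level objects produced by the initial particles."""
    counts = Counter()
    for particle in initial_particles:
        counts.update(final_state(particle))
    return counts

print(signature(["top", "antitop"]))  # Counter({'jet': 4, 'muon': 1, 'neutrino': 1})
```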

"Jets" appear because quarks can't exist in isolation; they must be bound inside hadrons. So whenever a quark is knocked loose in a collision, it produces a spray of hadrons, all traveling in roughly the same direction. Studying that spray enables physicists to determine what kind of quark produced it.

Back in 2017, Thaler and his colleagues applied some of their novel analytical methods to a huge dataset from the CMS detector. The dataset, some 29 terabytes covering about 300 million proton-proton collisions at the LHC, had been released on the CERN Open Data Portal. The idea was to demonstrate how such methods can make sense of that mountain of information.

This latest work builds on that effort. It is especially well suited to hunting for new physics that falls outside existing theories: in other words, cases where physicists wouldn't know ahead of time what signatures they're looking for.

The basic idea is to compare many different events to each other, rather than analyzing each one individually. The spray of particles produced in a collision is modeled as a point cloud, like those used in computer vision to represent objects. Comparing every event against every other lets physicists identify typical behaviors and more easily pick out outliers lurking at the fringes of the collision network.
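The "compare everything with everything" idea can be sketched in a few lines: treat each event as a list of (energy, rapidity, azimuth) points, compute all pairwise distances, and score each event by how far it sits from its nearest neighbors. This is a minimal sketch under stated assumptions: `toy_distance` is a hypothetical, deliberately crude stand-in for a real event metric such as the EMD, and the k-nearest-neighbor score is one simple choice of outlier measure, not the MIT team's procedure.

```python
import math

def centroid(event):
    """Total energy and energy-weighted centroid (y, phi) of an event.

    Each event is a list of (energy, y, phi) tuples. Azimuthal wrap-around
    is ignored, which is acceptable only for this toy sketch.
    """
    total = sum(e for e, _, _ in event)
    y = sum(e * yi for e, yi, _ in event) / total
    phi = sum(e * ph for e, _, ph in event) / total
    return total, y, phi

def toy_distance(a, b):
    """Hypothetical stand-in for a real event metric such as the EMD.

    It compares only total energy and centroid position, so it is far
    coarser than a true point-cloud distance.
    """
    ea, ya, pa = centroid(a)
    eb, yb, pb = centroid(b)
    return abs(ea - eb) + min(ea, eb) * math.hypot(ya - yb, pa - pb)

def outlier_scores(events, k=2, distance=toy_distance):
    """Mean distance from each event to its k nearest neighbors.

    Events in dense regions score low; isolated events on the fringes score high.
    """
    n = len(events)
    dist = [[distance(events[i], events[j]) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

events = [
    [(10.0, 0.0, 0.0)],
    [(10.0, 0.1, 0.0)],
    [(10.0, 0.0, 0.1)],
    [(50.0, 2.0, 2.0)],  # deliberately unusual toy event
]
scores = outlier_scores(events)
print(scores.index(max(scores)))  # 3
```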

"What we're trying to do is to be agnostic about what we think is new physics or not," said co-author Eric Metodiev. "We want to let the data speak for itself."

Key to this novel analytical method is an algorithm that calculates the minimum amount of "work," in the physics sense, required to rearrange the energy deposits of one point cloud in a pair into those of the other: roughly, how much energy has to be moved and how far. This quantity is dubbed the "earth mover's distance," or EMD. A pair of point clouds is deemed farther apart the more work it takes to rearrange one into the other.
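Written as a formula, this is an optimal-transport problem. The block below is a hedged sketch of the general form such an event distance takes; the notation is ours and should be checked against the PRL paper: $E_i$ and $E'_j$ are the particle energies in the two events, $f_{ij}$ is the energy moved from particle $i$ to particle $j$, $\theta_{ij}$ is their angular separation, and $R$ is a fixed angular scale.

```latex
\mathrm{EMD}(\mathcal{E}, \mathcal{E}') =
  \min_{\{f_{ij} \ge 0\}} \sum_{i} \sum_{j} f_{ij} \, \frac{\theta_{ij}}{R}
  + \Bigl| \sum_i E_i - \sum_j E'_j \Bigr|

\text{subject to}\quad
  \sum_j f_{ij} \le E_i, \qquad
  \sum_i f_{ij} \le E'_j, \qquad
  \sum_{i,j} f_{ij} = \min\Bigl(\sum_i E_i,\ \sum_j E'_j\Bigr).
```

The first term is the earth-moving cost proper, energy times distance moved; the second charges for any energy that must be created or destroyed because the two events have different total energies.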

"You can imagine deposits of energy as being dirt, and you're the earth mover who has to move that dirt from one place to another," Thaler said. "The amount of sweat that you expend getting from one configuration to another is the notion of distance that we're calculating."
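The "sweat" can be computed directly for small events. The following is a minimal, self-contained sketch, not the MIT team's code, assuming each event is a list of (energy, rapidity, azimuth) tuples, a distance measured in the rapidity-azimuth plane, and a hypothetical default angular scale `R = 0.4`. It treats the dirt-moving as a min-cost flow (source to particles of one event, to particles of the other, to sink) and solves it with textbook successive shortest paths, which is fine for a handful of particles but far too slow for real datasets.

```python
import math

def angular_distance(y1, phi1, y2, phi2):
    """Distance in the rapidity-azimuth plane, wrapping the azimuth difference into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def emd(event_a, event_b, R=0.4, eps=1e-12):
    """Earth mover's distance between two events given as lists of (energy, y, phi)."""
    n, m = len(event_a), len(event_b)
    source, sink, size = 0, n + m + 1, n + m + 2
    graph = [[] for _ in range(size)]  # outgoing edge indices for each node
    head, cap, cost = [], [], []       # per-edge target node, residual capacity, unit cost

    def add_edge(u, v, capacity, unit_cost):
        # Forward edge at an even index, its residual twin at the next odd index.
        for a, b, c, w in ((u, v, capacity, unit_cost), (v, u, 0.0, -unit_cost)):
            graph[a].append(len(head))
            head.append(b)
            cap.append(c)
            cost.append(w)

    for i, (e_i, y_i, phi_i) in enumerate(event_a):
        add_edge(source, 1 + i, e_i, 0.0)
        for j, (_, y_j, phi_j) in enumerate(event_b):
            add_edge(1 + i, 1 + n + j, math.inf, angular_distance(y_i, phi_i, y_j, phi_j) / R)
    for j, (e_j, _, _) in enumerate(event_b):
        add_edge(1 + n + j, sink, e_j, 0.0)

    transport_cost = 0.0
    while True:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist = [math.inf] * size
        via = [-1] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    if cap[e] > eps and dist[u] + cost[e] < dist[head[e]] - eps:
                        dist[head[e]] = dist[u] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == math.inf:
            break  # all of the smaller event's energy has been moved
        # Push as much energy as the narrowest edge on the path allows.
        push, v = math.inf, sink
        while v != source:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = sink
        while v != source:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        transport_cost += push * dist[sink]

    total_a = sum(e for e, _, _ in event_a)
    total_b = sum(e for e, _, _ in event_b)
    # Energy that cannot be moved must be created or destroyed, one unit of cost per unit of energy.
    return transport_cost + abs(total_a - total_b)

print(emd([(10.0, 0.0, 0.0)], [(10.0, 0.4, 0.0)]))  # 10.0
```

Because the routine can reroute energy along residual edges, it finds the globally cheapest rearrangement rather than greedily matching nearby particles.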

Using public data from the LHC, the MIT team constructed a social network of 100,000 pairs of collision events, assigning each pair a number, its "distance," that measures how dissimilar the two events are. Thaler would like to test the technique further on known historical data, for instance by rediscovering the top quark (first observed in 1995).
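Once every pair of events has a distance, the "social network" picture follows naturally: link each event to its closest neighbors, then ask how many links separate two events, much like Thaler's "how many neighbors you need before you get from one friend to another." The sketch below assumes the pairwise distances have already been computed (for example with an EMD routine); the k-nearest-neighbor linking rule and the toy distance matrices are hypothetical illustrations, not LHC data or the paper's construction.

```python
from collections import deque

def knn_graph(dist, k):
    """Link every event to its k nearest neighbors, treating links as undirected."""
    n = len(dist)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

def hops_between(adjacency, start, goal):
    """Breadth-first search: number of links separating two events, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None

# Toy distance matrix: four events that sit one unit apart along a line.
positions = (0, 1, 2, 3)
dist = [[abs(a - b) for b in positions] for a in positions]
graph = knn_graph(dist, k=1)
print(hops_between(graph, 0, 3))  # 3
```

Raising `k` adds more links and shortens typical paths; events that end up weakly connected to the rest of the network are natural candidates for a closer look.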

"If we could rediscover the top quark in this archival data, with this technique that doesn't need to know what new physics it is looking for, it would be very exciting and could give us confidence in applying this to current datasets, to find more exotic objects," said Thaler.

"It will be interesting to see where the ideas and techniques presented in this short and thought-provoking paper will bring us," wrote Michael Schmitt, who was not involved in the new paper, at APS Physics. "The new EMD-based metric may well lead to better event classification techniques that enable experimenters to discover new physics beyond the Standard Model."

DOI: Physical Review Letters, 2019. 10.1103/PhysRevLett.123.041801.