There’s an enormous volume of statistics and metrics available to quantify golf performance, including the specifics of each player’s game. Perhaps we need to start looking at these individual attributes to identify the best two guys for the job. The PGA Tour tracks everything from driving distance, to strokes gained on approach, to bouncebackability; there’s almost too much to choose from to reach a decisive answer. To address this, we construct a feature vector for each player consisting of the following key performance measures (with the intention of covering as many pillars of ‘good play’ as possible):

Strokes gained: off tee, approach, around the green, putting, tee-to-green, total

Driving: distance, accuracy

Percentages: greens in regulation, sand saves

Averages: eagles (holes per), birdie, scoring

This 13-component vector acts as a performance signature for every player. Using this, we can then begin to directly compare players, measure similarities and cluster similar playing styles together.

Here’s an example of Dustin Johnson’s features prior to processing:

Pre-scaling feature vector for Dustin Johnson

Each of these is scaled relative to the rest of the active players on tour (not just the Americans, as was the case with the previous analyses) in order to make them directly comparable in a global feature space. All this involves is standardising the values in each column: subtracting the column mean and dividing by the column standard deviation.
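As a rough illustration of that scaling step, here is a minimal sketch using NumPy. The player values below are invented for the example (they are not real PGA Tour figures), and the `standardise` helper is a hypothetical name, not something taken from the original analysis.

```python
import numpy as np

# Hypothetical raw values for three players and three of the 13 features
# (driving distance in yards, greens in regulation %, scoring average).
# These numbers are made up for illustration only.
raw = np.array([
    [310.0, 70.0, 69.5],
    [295.0, 66.0, 70.5],
    [301.0, 68.0, 70.0],
])


def standardise(features: np.ndarray) -> np.ndarray:
    """Subtract each column's mean and divide by its standard deviation."""
    mean = features.mean(axis=0)  # one mean per feature column
    std = features.std(axis=0)    # one standard deviation per feature column
    return (features - mean) / std


scaled = standardise(raw)
print(scaled)
```

After this step every column has zero mean and unit standard deviation, so a yard of driving distance no longer outweighs a fraction of a stroke simply because of its units.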

This creates a less manually interpretable but more statistically robust signature for each player, which allows us to measure how far apart two given players are across any set of features, e.g. who is most like DJ in driving and putting? There are a few ways of computing distances in feature space, but we use a technique called cosine similarity to measure how alike two players are. This provides a value in the range [-1, 1], with 1 being the maximum similarity (the two vectors point in the same direction) and -1 being completely unalike (they point in opposite directions).
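For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows one way to compute it with NumPy; the three player vectors are hypothetical, chosen so that the two extreme cases are easy to see.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical scaled feature vectors (illustrative values only).
player_a = [1.2, -0.4, 0.8]
player_b = [2.4, -0.8, 1.6]   # same direction as player_a, twice the length
player_c = [-1.2, 0.4, -0.8]  # exactly opposite direction to player_a

same_direction = cosine_similarity(player_a, player_b)      # approximately 1
opposite_direction = cosine_similarity(player_a, player_c)  # approximately -1
print(same_direction, opposite_direction)
```

Note that the measure ignores vector length: `player_b` is "more extreme" than `player_a` in every feature, yet the two are treated as maximally similar because their profiles have the same shape.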

If we assume the auto-qualified team members represent the ideal qualities of a US Ryder Cup player, we can build a combination of them by taking the average of their post-processed feature vectors. This acts as a sort of ‘centre of gravity’ for the eight players already in the team, and is an anchor point in the feature vector space around which the team exists. We can then look at whose feature vector is most similar to this.

Overview of ‘most like US Ryder Cup team’ method
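The anchor method can be sketched in a few lines: average the qualifiers' scaled vectors, then rank the remaining players by cosine similarity to that average. Everything in the example is an assumption made for illustration, including the three-feature vectors, the placeholder candidate names and the `rank_by_similarity` helper; it is not the code or data behind the results in this post.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_by_similarity(anchor, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {
        name: cosine_similarity(vector, anchor)
        for name, vector in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scaled vectors for three auto-qualifiers (the real team has
# eight, each with 13 features); values are invented for illustration.
qualified = np.array([
    [1.0, 0.5, 0.8],
    [0.8, 0.9, 0.4],
    [1.2, 0.1, 0.6],
])

# The team 'anchor' is the element-wise mean of the qualifiers' vectors.
anchor = qualified.mean(axis=0)

# Hypothetical non-qualified players to compare against the anchor.
candidates = {
    "candidate_x": np.array([0.9, 0.6, 0.5]),
    "candidate_y": np.array([-0.5, 1.0, 0.1]),
    "candidate_z": np.array([0.2, -0.8, -0.3]),
}

ranking = rank_by_similarity(anchor, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Averaging first and comparing second keeps the method simple, but it does assume that a single ‘typical’ qualifier is a meaningful target; a team made up of very different playing styles could produce an anchor that resembles none of its members.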

So which of the three candidates is most similar to this ‘ideal US Ryder Cup player’ or ‘US team anchor’? Calculating the cosine similarity across all the non-qualified US players:

DeChambeau, Cantlay and Finau are all still on top! This is probably unsurprising as good tournament results are built on good play, but note that our feature vector representation doesn’t consider tournament position or strokes to the winner in the same manner as the first two metrics (though strokes gained is definitely related).

From this result, we can conclude that Bryson is statistically the non-qualifier most similar to the auto-qualifiers, and that from the data he’d be a superb addition to the team (no surprises there).

As for Finau vs. Cantlay, that’s a really tough choice. The momentum is probably with Finau, given his more recent results in higher-profile tour events, but as we’ve seen here Cantlay does have the figures to back up his case (and actually out-performs Tony in some aspects).

Best of luck to Jim Furyk with his selections, and to the US. Let’s go Europe!

Acknowledgements