Keys to the match: In tennis, is it an advantage to serve first?

Tennis, like many other sports, is rife with commonly accepted knowledge that experts and novices alike repeat. Some of these “universal truths” have never been tested against the data that is now available. The data used in this article comes from the four Grand Slam tournaments: the Australian Open, Roland‑Garros, Wimbledon and the US Open. These are the pinnacle of modern tennis and by far the tournaments that draw the largest crowds, both in the stadiums and in front of TVs around the world.

It is believed by many that there is an advantage to be serving in the first game of a set and, in particular, in the deciding set of a match. It supposedly gives the player a chance to easily get ahead and puts the pressure on the opponent, who is left to play catch-up throughout the set. It sounds compelling, but can it be supported by the rich data that is now available from the professional tennis tournaments?

Who gets to serve in the first game?

The first thing to investigate is how a player gets to serve in the first game of a set. The serve in the first set is decided by a coin toss. The chair umpire tosses a coin at the net, the players call heads or tails, and the winner of the coin toss decides whether to serve, receive, choose a side of the court, or defer the decision to the opponent. If the player who won the coin toss chooses to either serve or receive, then the opponent gets to choose a side of the court. If the winner of the coin toss selects a side of the court, then the opponent gets the choice of whether to serve or receive. A more detailed explanation can be found on USTA’s Improve Your Game website. Since there is a belief that serving first represents an advantage, it is thought that most players who win the coin toss elect to serve first. For most purposes, we can treat the first server as decided by the coin toss, which is inherently random.

From the first game onwards, the serve alternates between the two players, which means that the player who serves in the first game of the second set is the player who did not serve in the last and deciding game of the first set (not considering sets that end with a tiebreaker). A player will then serve the first game of the second set if:

The player won the set by breaking the opponent’s serve in the last game of the first set

The player lost the set as the opponent held serve to win the last game of the first set

Either way, the set was decided by a game where the opponent was serving. The serve is therefore no longer random but dependent on the outcome of the previous set, and any advantage can no longer be separated from that outcome. The exception is when the first set is decided by a tiebreaker; in that case, the player who was receiving in the first game of the first set will be serving in the first game of the second set. When the first set is decided by a tiebreaker, the serve in the second set is independent of the result of the first set. In later sets, the serve will again depend on the result of the last game of the previous set, unless that set is also decided by a tiebreaker.
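The alternation rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article; the function name, the player labels "A" and "B", and the parameters are hypothetical and not taken from the data set.

```python
def first_server_next_set(first_server, games_played, tiebreak):
    """Return the label ("A" or "B") of the player who serves first in the next set.

    first_server: who served game one of the set that just ended.
    games_played: number of regular games in that set (tiebreak not counted).
    tiebreak:     True if the set was decided by a tiebreaker.
    """
    other = {"A": "B", "B": "A"}
    if tiebreak:
        # After a tiebreak set, the player who received in game one opens the next set.
        return other[first_server]
    # Odd-numbered games are served by the first server, even-numbered by the other player.
    last_server = first_server if games_played % 2 == 1 else other[first_server]
    # The serve keeps alternating across the set boundary.
    return other[last_server]


# A 6-4 set has ten games, so the same player opens the next set.
print(first_server_next_set("A", 10, False))
# A 6-3 set has nine games, so the other player opens the next set.
print(first_server_next_set("A", 9, False))
```

Note that the result depends only on how many games were played, which in turn depends on how the set was won or lost.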

Table 1 – The result of the previous set, for the player serving the first game of a set other than the first set

Since tiebreakers are relatively rare, a player who serves the first game of a set other than the first has a 65.9 percent probability of having lost the previous set.
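To see why the first server of a later set is more likely to have lost the previous set, a small simulation can help. The sketch below is not the analysis behind Table 1; it assumes that every game is independent and that both players hold serve with the same fixed probability (0.75 here, an illustrative value), and all function names are hypothetical.

```python
import random


def simulate_set(hold_prob, rng):
    """Play one set between players 0 and 1, with player 0 serving first.

    Returns (winner, last_server, tiebreak). The winner is None when the set
    reaches 6-6, because tiebreak sets are excluded from the tally below.
    """
    games = [0, 0]
    server = 0
    while True:
        game_winner = server if rng.random() < hold_prob else 1 - server
        games[game_winner] += 1
        last_server = server
        server = 1 - server
        if max(games) >= 6 and abs(games[0] - games[1]) >= 2:
            return (0 if games[0] > games[1] else 1), last_server, False
        if games == [6, 6]:
            return None, last_server, True


def share_of_first_servers_who_lost(hold_prob=0.75, n_sets=20000, seed=1):
    """Estimate how often the first server of the next set lost the set just played."""
    rng = random.Random(seed)
    lost = decided = 0
    for _ in range(n_sets):
        winner, last_server, tiebreak = simulate_set(hold_prob, rng)
        if tiebreak:
            continue
        next_first_server = 1 - last_server  # the serve alternates across sets
        decided += 1
        if winner != next_first_server:
            lost += 1
    return lost / decided


print(share_of_first_servers_who_lost())
```

Because holding serve is more common than breaking, most sets end on a hold, and the player who then opens the next set is the one who just lost. With a hold probability of 0.5 the effect disappears in this simplified model.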

This essentially reduces the question to whether it is an advantage to be serving in the first game of the first set.

Serving in the first game of the first set

In the first set of a match, the player serving in the first game is as close to random as we can get and there is no dependency on the player’s seeding in the tournament or on the results of the previous set.

Table 2 – Outcome of the first set for the player who served the first game of the set

It appears that there is a statistically significant advantage to serving the first game in the first set, and that the advantage is the same for both Gentlemen’s and Ladies’ singles, as the difference between the two is not statistically significant. Though the advantage gained is very small, it could be the advantage that might decide an otherwise very close set.

There is a very similar relationship between serving in the first game of the first set and winning the match, but the data also shows that the player who wins the first set goes on to win the match in around 80 percent of the matches.

Table 3 – Outcome of the match for the player who won the first set

Serving in the first game of the deciding set

For the last and deciding set of a match it is thought to be especially advantageous to be the player serving in the first game of the set and, while the deciding set is still subject to the same lack of randomness as described above, it is worth examining. After all, getting to a deciding third or fifth set does require that the two players have won the same number of sets leading up to this point and, therefore, they should have an equal chance of winning the set and the match.

Table 4 – Outcome of the match for the player who served the first game of the deciding set

The numbers show neither an advantage nor a disadvantage to serving in the first game of the deciding set at the four Grand Slam tournaments. This holds for both the Gentlemen’s and the Ladies’ Singles.

Three of the four Grand Slam tournaments do not use a tiebreaker in the deciding set but play the set out until one player has a two‑game margin. The US Open is the exception, which sets it apart from the other three tournaments. Since this is a major difference in the rules for the deciding set, it is worth investigating the difference between the tournaments that do not use a tiebreaker in the deciding set and the tournament that does.

Figure 1 – The need for a tiebreaker in the deciding set at the US Open

The number of deciding sets from the US Open, however, is small compared to that of the other three tournaments combined. The difference between the US Open and the other three tournaments is therefore not significant, because of the higher level of uncertainty attached to the numbers from the US Open. Of the small number of matches that required a deciding set at the US Open, even fewer required a tiebreaker to determine the outcome of that set. This reduces the data set to fewer than 50 sets for each of the two events, or less than 15 percent of the US Open matches that required a deciding set.
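To get a feel for how much uncertainty a small sample carries, the sketch below computes an approximate 95 percent margin of error for an observed proportion, using the normal approximation. The proportion of 0.5 and the sample sizes of 50 and 500 are illustrative assumptions, not figures from the data set, and the function name is hypothetical.

```python
import math


def proportion_margin(p_hat, n, z=1.96):
    """Approximate margin of error for a proportion (normal approximation).

    p_hat: observed proportion, for example the share of sets won.
    n:     number of observations behind that proportion.
    z:     critical value; 1.96 corresponds to roughly 95 percent confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Compare a sample of 50 sets with a sample of 500 sets (hypothetical sizes).
for n in (50, 500):
    print(n, round(proportion_margin(0.5, n), 3))
```

With only around 50 observations the margin is well over ten percentage points, so even a sizable difference between tournaments can fail to reach significance.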

Matches between similarly ranked players

In the first few rounds of any Grand Slam tournament, you will often see matches where the two players are not evenly matched; this is by design through the seeding of players and the way that the draw is organized. When a player who is seeded in the top four for the tournament, such as Novak Djokovic, meets an unseeded player who is playing his first US Open, such as Ricardas Berankis, you would not expect the outcome of the match to depend on whether Berankis serves in the first game of any set. It would be very surprising if Berankis were able to push the match into a deciding fifth set. This raises the question: what if we only consider matches between players who are evenly matched? If either player has a realistic chance of winning, is there an advantage to serving in the first game of the first set?

It is not straightforward to determine if two players are evenly matched, but the available data does contain the seeding of each player in every tournament, so we can use that as the basis and trust that the clubs did a good job when assigning a seeding to each player.

For the first four rounds (round one through the round of 16), we will consider the players evenly matched if:

The two players are within six seedings of each other, or

Both players are unseeded and they gained entry to the tournament in the same way: either by having a high ATP or WTA ranking, making it through the qualifying tournament, being granted a “wild card” or by being a “lucky loser”

A seeded and an unseeded player are never considered to be evenly matched in this context. This leaves us exclusively with matches between two unseeded players in the first two rounds, and this is also the case for the majority of the matches in round three. Once a player makes it through the round of 16 and enters the quarterfinals of a tournament, we will consider any two players to be evenly matched.
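The definition above can be written down as a short function. This is a sketch under a few assumptions: rounds are numbered from 1, with 4 being the round of 16; an unseeded player has a seed of None; “within six seedings” is read as a difference of at most six; and the entry labels ("ranking", "qualifier", "wild_card", "lucky_loser") are hypothetical names for the four routes into the draw.

```python
EARLY_ROUNDS = {1, 2, 3, 4}  # round one through the round of 16


def evenly_matched(round_number, seed_a, seed_b, entry_a, entry_b, max_seed_gap=6):
    """Return True if two players count as evenly matched under the definition above.

    seed_a, seed_b:   seeding as an int, or None for an unseeded player.
    entry_a, entry_b: how each player entered the tournament.
    """
    if round_number not in EARLY_ROUNDS:
        # From the quarterfinals onward, any two players are treated as evenly matched.
        return True
    if seed_a is not None and seed_b is not None:
        return abs(seed_a - seed_b) <= max_seed_gap
    if seed_a is None and seed_b is None:
        return entry_a == entry_b
    # A seeded and an unseeded player are never considered evenly matched.
    return False


# Two unseeded players who both entered on ranking, meeting in round one.
print(evenly_matched(1, None, None, "ranking", "ranking"))
# A top seed against an unseeded qualifier in round three.
print(evenly_matched(3, 1, None, "ranking", "qualifier"))
```

Keeping the seed gap as a parameter makes it easy to rerun the analysis with a looser definition, such as a gap of 10.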

There are some obvious issues with this definition, as there might be a huge difference between two unseeded players. One example is the first-round match at the 2013 US Open where Gael Monfils met Adrian Ungur. Both players were unseeded and were admitted into the tournament through their ATP singles ranking, but Monfils had a ranking of 39 before the tournament and was ranked as high as seven in July 2011, while Ungur was ranked 105 going into the tournament.

At the other end of the spectrum, there is the question of whether anyone currently has a realistic chance of winning when playing against Serena Williams regardless of how her opponent is seeded.

We will first re‑examine the outcome of the first set based on whether a player served the first game of the match against a similarly seeded opponent.

Table 5 – Outcome of the first set for the player who served the first game of the match against an evenly matched opponent

Looking only at matches between players who are evenly matched reduces the data set dramatically, and it now appears that there is no advantage from serving in the first game. As it turns out, the slight advantage that we found earlier is entirely down to matches between players who are not evenly matched.

When it comes to the deciding set, we would generally expect to see more matches go to a third or a fifth set when the two players are evenly matched, as in the recent epic battles between Rafael Nadal and Novak Djokovic in the 2013 Roland-Garros semifinal and the 2012 Australian Open final.

Table 6 – Outcome of the match for the player who served the first game of the deciding set against an evenly matched opponent

For the Gentlemen, there is no significant advantage from serving in the first game of the fifth set, but for the Ladies, there might be a small disadvantage from serving in the first game of the deciding set, which is contrary to popular belief. There are, however, a few additional things to consider: an evenly matched Ladies’ Singles will go to a third and deciding set 35.6 percent of the time compared to just 20.4 percent of the time for the Gentlemen, and since the Ladies play only a maximum of three sets, the momentum going into the third and deciding set might be more heavily influenced by the outcome of the second set. The effect that we are seeing might therefore be a ghost of the effect that was alluded to in Table 1, where the player serving the first game of a set is likely to have just lost the previous set.

Figure 2 – The number of matches that required a deciding third or fifth set

Conclusion

In conclusion, there does appear to be a small advantage to be gained from serving in the first game of the first set, but the advantage evaporates when the two players are evenly matched.

In the deciding set, the result is independent of whether a player serves in the first game of the set, and this holds true for matches between players who are evenly matched as well.

Although not shown in this post, these conclusions also hold true for each of the four tournaments (Australian Open, Roland‑Garros, Wimbledon and US Open), but there might still be specific situations or individual players who do gain a psychological advantage from serving in the first game of a set. It would be interesting to utilize the detailed and rich data set to examine this further.

For more from Keys to the Match, watch Keys to the Match: IBM Big Data and Analytics Powering Predictions and visit the corresponding infographic.

Follow @IBMBigData

Follow @IBMAnalytics

Data

The data used in this paper includes a total of 7,262 matches from the four major tennis tournaments from 2005 through 2013 (nine years). The data only includes matches that were completed normally; matches that ended in a retirement were excluded. Only matches played on courts designated as “show courts,” where detailed data is collected by IBM, are included.

Table 7 – Number of matches included in the data set by year and tournament

Since each match consists of at least two and as many as five sets, the total number of sets included in the data is 8,440 from the Ladies’ Singles and 13,099 from the Gentlemen’s Singles.

Significance testing:

For all of the significance testing done in this paper, the null hypothesis has been rejected or not rejected at a 5 percent significance level unless otherwise specified.
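The specific test used is not stated here. As one possible illustration, the sketch below runs Pearson's chi-square test of independence on a 2x2 table (without continuity correction) and compares the p-value with the 5 percent level. The choice of test, the function name, and the counts are all assumptions made for this example; they are not taken from the data set.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value). With one degree of freedom the upper tail
    probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are "served first" and "received first",
# columns are "won the set" and "lost the set".
chi2, p_value = chi_square_2x2(60, 40, 40, 60)
alpha = 0.05  # the 5 percent significance level
print("reject H0" if p_value < alpha else "do not reject H0")
```

A p-value reported as 0 in the lists below would, under this kind of test, simply be a value too small to show at the number of decimals used.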

Figures & tables:

Table 1: The result of the previous set, for the player serving the first game of a set other than the first set. The table includes each set from the matches listed above except for the first set of each match and any set where the previous set was decided by a tiebreaker.

Table 2: Outcome of the first set for the player who served the first game of the set. Since each match must have a first set, you would expect the total number of sets included to be equal to the total number of matches included in the data, but there is one match in the data set where we are missing the information about who served the first game in the first set, so this is excluded from the data for this table.

The missing set is from a Gentlemen’s Singles match in the 2nd round of the 2010 Wimbledon tournament.

H0: Independence between serving the first game of the first set and the outcome of the first set

Gentlemen’s Singles: p-value = 0 Null hypothesis rejected

Ladies’ Singles: p-value = 0 Null hypothesis rejected

H0: Independence between the event and the outcome of the first set, when serving in the first game of the first set

p-value = 0.399 Null hypothesis not rejected

Table 3: Outcome of the match for the player who won the first set. The sets included in this table are the same as those included in Table 2 above.

H0: Independence between the outcome of the first set and the outcome of the match

Gentlemen’s Singles: p-value = 0 Null hypothesis rejected

Ladies’ Singles: p-value = 0 Null hypothesis rejected

H0: Independence between the event and the outcome of the match, when having won the first set

p-value = 0 Null hypothesis rejected

Table 4: Outcome of the match for the player who served the first game of the deciding set. The deciding set is defined as the third set for the Ladies’ Singles and the fifth set for the Gentlemen’s Singles. Only about 29.5 percent of the Ladies’ Singles and 18.4 percent of the Gentlemen’s Singles matches in the data set reached a deciding third or fifth set.

H0: Independence between serving the first game of the deciding set and the outcome of the match

Gentlemen’s Singles: p-value = 0.659 Null hypothesis not rejected

Ladies’ Singles: p-value = 0.184 Null hypothesis not rejected

H0: Independence between the event and the outcome of the match, when serving in the first game of the deciding set

p-value = 0.740 Null hypothesis not rejected

Figure 1: The need for a tiebreaker in the deciding set at the US Open. The data set is reduced to only the deciding sets (3rd set for the Ladies’ Singles and 5th set for the Gentlemen’s Singles) played at the US Open since this is the only tournament of the four that uses a tiebreaker in the deciding set.

H0: Independence between the event and the need for a tiebreaker to determine the outcome of the deciding set

p-value = 0.096 Null hypothesis rejected at a 10% significance level

Table 5: Outcome of the first set for the player who served the first game of the match against an evenly matched opponent. The definition of evenly matched players dramatically reduces the data set to just about 25% of the original data, but there are notable differences from round to round that might be worth pointing out.

Minor changes to the definition, such as allowing for a difference of 10 in the seeding instead of 6, would include more matches from Round 3 and from the Round of 16, but would not change any of the other rounds or the conclusions drawn from the data.

H0: Independence between serving the first game of the first set and the outcome of the first set

Gentlemen’s Singles: p-value = 0.144 Null hypothesis not rejected

Ladies’ Singles: p-value = 0.707 Null hypothesis not rejected

H0: Independence between the event and the outcome of the first set, when serving in the first game of the first set

p-value = 0.353 Null hypothesis not rejected

Table 6: Outcome of the match for the player who served the first game of the deciding set against an evenly matched opponent. When the two players are evenly matched, a higher proportion of matches require a deciding third or fifth set to determine the outcome of the match: 35.6 percent of the Ladies’ Singles matches and 20.4 percent of the Gentlemen’s Singles matches.

H0: Independence between serving the first game of the deciding set and the outcome of the match

Gentlemen’s Singles: p-value = 0.388 Null hypothesis not rejected

Ladies’ Singles: p-value = 0.059 Null hypothesis rejected at a 10% significance level

H0: Independence between the event and the outcome of the match, when serving in the first game of the deciding set

p-value = 0.200 Null hypothesis not rejected

Figure 2: The number of matches that required a deciding 3rd or 5th set. The figure only includes matches between players who were evenly matched using the same definition as described for Table 5 above.



H0: Independence between the event and requiring a deciding set

p-value = 0 Null hypothesis rejected

References:

Magnus, Jan R. and Klaassen, Franc J.G.M. (1999), “On the advantage of serving first in tennis: four years at Wimbledon”, The Statistician

http://www1.fee.uva.nl/pp/klaassen/index_files/service_statistician_web.pdf

Robson, Douglas (2012), “Is serving first big advantage? Maybe not, stats say”, USA Today

http://usatoday30.usatoday.com/sports/tennis/story/2012-06-28/wimbledon-serving-first/55901966/1

Sackmann, Jeff (2010), “First Server Advantage? – Now With Data!”, The Summer of Jeff

http://summerofjeff.wordpress.com/2010/12/07/first-server-advantage-now-with-data/

USTA, “Improve Your Game | Winning the Toss”

http://www.usta.com/Improve-Your-Game/Rules/Serving-and-Receiving/Winning_the_toss/

Read more on my background and interests, and follow me on Twitter:

Follow @KennethAJensen
