KingFut dives into shot statistics for the Egyptian Premier League to look at who has been the best team through 8 games in Egypt using advanced stats.

Which four teams do advanced stats have as the best in Egypt? Which teams have underperformed/over-performed? Who will win the league?!

Full disclosure: I’ve watched a grand total of zero domestic games this season. I live abroad and I haven’t been able to find a live stream of the Egyptian League. As a result, all I’ve had to go on are our excellent match reports here at KingFut. After 8 weeks of getting restless, however, I decided to dig into some data to try and make sense of the Egyptian League table. Were Smouha really the best side in the country? What about Petrojet? Are both Ahly and Zamalek on the decline & perhaps going to miss the playoffs? All numbers are as of Friday, February 21, 2014.

What statistic shows the best teams?

The problem I was facing was that there was no single statistic that could accurately sum up performance. How could I tell which teams were performing better?

There are two parts to finding a useful statistic:

1) Is it useful? (how linked it is to winning)

2) Is it repeatable? (can teams consistently do it or is it luck)

There are a lot of statistics that are useful, but not repeatable. For example, if I told you that Team X was awarded 3 penalties this match, the probability of them winning that match would probably be close to 100%. That is a useful statistic, as it is directly linked to winning. However, what are the chances of Team X getting 3 penalties in another match? Is winning penalties a skill that can be repeated every match? Winning a penalty is mostly down to luck rather than skill, so this statistic is not repeatable. The fact that Team X won 3 penalties tells us nothing about their skill level and gives us no basis for predicting their next matches! Since we want to predict the future, repeatability is key.

Repeatability

My first thought was to look at goals. Goals are definitely useful, as they lead to winning, and they seem pretty skill-based. The better goal difference a team has the better it is, right? Well, not really. It turns out that although Goal Difference is the most useful statistic in football (it is the best predictor of points), it is not the most repeatable, as we might have expected. It is actually tied for 2nd most repeatable! Goals don’t happen often enough in football matches, and therefore they are controlled a little bit by luck. A silly own-goal or a defending error will lead to wins, but neither is a repeatable skill! How do I know all this? Below is a chart by the excellent James Grayson (whose blog can be found here) showing how repeatable each statistic is, using data from the EPL over more than one season.

| Metric | % Skill | % Luck |
| --- | --- | --- |
| Total shots ratio | 86 | 14 |
| Total shots differential | 86 | 14 |
| Goal ratio | 83 | 17 |
| Goal difference | 83 | 17 |
| Total shots against | 82 | 18 |
| Total shots for | 80 | 20 |
| Goals for | 75 | 25 |
| Goals against | 66 | 34 |
| % of total shots that are on target (%TSOT) for | 53 | 47 |
| %TSOT for + %TSOT against | 52 | 48 |
| PDO (penalties excluded) (1) | 46 | 54 |
| % of total shots that are on target (%TSOT) against | 44 | 56 |
| PDO | 44 | 56 |
| sh% | 43 | 57 |
| sv% | 38 | 62 |
| sh% on shots from inside the box (2) | 37 | 63 |
| sh% (penalties excluded) (1) | 36 | 64 |
| sv% (penalties excluded) (1) | 32 | 68 |
| sv% on shots from inside the box (2) | 24 | 76 |
| sv% on shots from outside the box (2) | 23 | 77 |
| Penalties awarded differential (penalties awarded for minus penalties awarded against) (1) | 9 | 91 |
| Having penalties awarded against (1) | 9 | 91 |
| Penalty differential (penalty goals for minus penalty goals against) (1) | 8 | 92 |
| sh% on shots from outside the box (2) | 8 | 92 |
| Being awarded penalties (1) | 4 | 96 |
| Penalty goals conceded (1) | 3 | 97 |
| Penalty goals scored (1) | <1 | >99 |

So Goal Difference & Goal Ratio are 83% skill, or 83% repeatable! The only statistics that improve upon that are TSR (Total Shots Ratio) and total shots differential, both at 86%. Penalty goals scored, despite being useful, are less than 1% skill and more than 99% luck!

So we have two statistics that are heavily repeatable to look at. This will give us a statistic that is good at predicting future performances.

Usefulness

So are these statistics useful?

Any football fan can tell you that goal difference is obviously related to points. A quick look by Grayson for Goal Difference vs Points in the last ten years leads to this chart:

For those of you familiar with statistics, the R-squared value shows how strongly two factors are related to each other. The R-squared here is .9281, which means that GD explains 92.81% of the variation in points, a very strong number!

What about TSR?

Although it is more repeatable, TSR is actually less useful than goal difference when it comes to explaining points, according to Grayson’s research.

So TSR is actually 66% correlated with points, which sounds like a low number but is actually quite significant. Both statistics have been proven to be useful & repeatable.

TSR vs. Goal Difference; which will we use to predict?

TSR is more repeatable, goals are more useful. So which should we use to predict? Goals look like the better choice, since they are far more useful and almost as repeatable, but there is one factor that sways the argument towards TSR. At the time we collected data, most clubs in the Egyptian Premier League had only played 8 games. So we need to figure out which statistic is more accurate after 8 games.

Again, the brilliant Mr. Grayson comes to the rescue with his heavy lifting:

After around 8 games, TSR is a lot more accurate than Goals Ratio (Goal Difference). TSR is at 80% accuracy after 10 games, which is really remarkable. As a result, it is more useful to use as a predictive tool than GD, especially so early in the season!

What is TSR?

I realize it’s taken me a while to get here, so thanks for the patience! I’m trying my best to keep the math to a minimum and the football to a maximum! So let’s get down to business. What is TSR?

TSR stands for Total Shots Ratio. I gave a brief introduction in my piece on the new advanced statistics in football.

Total Shots Ratio = (Total Shots For) / (Total Shots For + Total Shots Against).

For example, if a team took 9 shots in a match and its opponent took 1, then the team's TSR would be 9 / (9 + 1) = 0.9, or 90%.
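The definition can be written as a small function. This is only a minimal sketch: the function name `total_shots_ratio` and the choice to reject a match with no shots at all are my own assumptions, not part of Grayson's definition.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a match (or run of matches) taken by one team."""
    total_shots = shots_for + shots_against
    if total_shots == 0:
        # Assumption: TSR is undefined when neither side has taken a shot.
        raise ValueError("TSR is undefined when no shots were taken")
    return shots_for / total_shots


# The worked example from the text: 9 shots for, 1 shot against.
print(total_shots_ratio(9, 1))  # 0.9
```

A team that exactly matches its opponents shot for shot ends up at 0.5, which is why values above 0.5 in the tables below mean a team is out-shooting its opposition.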

We’ve already covered the good things about TSR. It’s useful, and related to points. It’s very repeatable, with only 14% being luck. It also stabilizes fast, going above 80% after only 10 games!

So what are some of the bad things?

Well, it’s obviously not bullet-proof! Using TSR to predict, Tottenham would’ve won the league this year, which really doesn’t look like happening. The biggest drawback of TSR is that it ignores shot quality. A team that takes 20 shots from the halfway line will have a higher TSR than a team that takes one shot from inside the six-yard box! However, since players are rational (and don’t want to get dropped!), the differences are rarely so drastic. Even so, a closer look at Tottenham this year shows that a ton of their shots come from outside the box, which inflates their TSR without giving them goals! TSR also does not take into account the skill of the opposition goalkeeper.

So if a team has a higher points total than its TSR would suggest, then there are two possible explanations:

1) The team has been lucky and will start dropping more points soon!

2) The team is taking higher quality shots than other teams.

A look at the Egyptian Premier League

Using data from Koora.com, I’ve aggregated every team's TSR over the first 8 games. Gouna were omitted due to there being a lack of data for their matches!

Here are the results:

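| Team | Total Shots Ratio | TSR Rank | Points Per Game | PPG Rank |
| --- | --- | --- | --- | --- |
| ElMinya | 0.35 | 21 | 0.5 | 21 |
| GhazlMahala | 0.41 | 19 | 0.63 | 20 |
| ElGeish | 0.45 | 14 | 0.63 | 19 |
| Entag | 0.40 | 20 | 0.78 | 18 |
| Telephonat | 0.43 | 18 | 0.86 | 17 |
| Raja | 0.47 | 12 | 0.89 | 16 |
| Dakhleyah | 0.48 | 11 | 1 | 15 |
| Qanah | 0.44 | 17 | 1.11 | 14 |
| ENPI | 0.47 | 13 | 1.11 | 13 |
| Haras-ElHdood | 0.44 | 16 | 1.13 | 12 |
| Elshorta | 0.55 | 7 | 1.25 | 11 |
| Makasa | 0.52 | 8 | 1.34 | 10 |
| AlMasry | 0.45 | 15 | 1.38 | 9 |
| Itihad | 0.61 | 3 | 1.67 | 8 |
| Wadi Degla | 0.49 | 10 | 1.75 | 7 |
| Ahly | 0.62 | 2 | 1.78 | 6 |
| Zamalek | 0.68 | 1 | 1.86 | 5 |
| Ismaily | 0.60 | 4 | 1.88 | 4 |
| Mokawleen | 0.56 | 6 | 1.89 | 3 |
| Petrojet | 0.58 | 5 | 2.22 | 2 |
| Smouha | 0.49 | 9 | 2.38 | 1 |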

This table takes a closer look at the results, team by team:

Points Per Game (PPG) are used because some teams have played only 7 games. Again, Gouna are not in the table due to a lack of shots data!
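As a quick illustration of why PPG makes teams with different numbers of games comparable, here is a minimal sketch. The function name is hypothetical, and the games-played figures (7 for Zamalek, 8 for Smouha) are my own inference from the points totals in the projection table later in this piece and the PPG values above, not numbers taken from the shot data.

```python
def points_per_game(points: int, games_played: int) -> float:
    """Average points collected per match played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return points / games_played


# Games played are inferred (assumption), not taken from the source data.
zamalek_ppg = points_per_game(13, 7)
smouha_ppg = points_per_game(19, 8)

print(f"Zamalek: {zamalek_ppg:.2f} PPG")
print(f"Smouha:  {smouha_ppg:.2f} PPG")
```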

Observations:

The Top Six:

The most obvious observation from the table is that there is a clear top six in Egypt by TSR: Al Zamalek, Al Ahly, Ittihad of Alexandria, Ismaily, Petrojet and Arab Contractors.

Smouha:

The most surprising data point would be Smouha, who lead the league in PPG but are not part of these dominant six in TSR. This could mean one of two things:

1) Smouha have been lucky and will probably not keep up this league-leading form

2) Smouha take better quality shots than everyone else

Even if Smouha were to take better quality shots, the disparity seems too large and Smouha seems to be an ideal candidate for drop-off according to TSR!

The Mido Effect:

Another important observation is the dominance of Al Zamalek, the only club with a TSR above 0.65. It seems as though Zamalek have been performing well and have been a little unlucky, or perhaps they need to improve their shot quality!

Teams under the line have been relatively unlucky: Zamalek, Tala’a El-Gaish

Teams above the line have been relatively lucky: Smouha, Wadi Degla
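One way to reproduce the "above the line / under the line" split is to fit a straight line of PPG against TSR and look at each team's distance from it. The sketch below does that with an ordinary least-squares fit on the figures from the table above. Treat it as an approximation: fitting the line to these 21 teams is my assumption, and the line in the original chart may have been drawn differently.

```python
# (team, TSR, points per game), copied from the table above.
TEAMS = [
    ("ElMinya", 0.35, 0.50), ("GhazlMahala", 0.41, 0.63), ("ElGeish", 0.45, 0.63),
    ("Entag", 0.40, 0.78), ("Telephonat", 0.43, 0.86), ("Raja", 0.47, 0.89),
    ("Dakhleyah", 0.48, 1.00), ("Qanah", 0.44, 1.11), ("ENPI", 0.47, 1.11),
    ("Haras-ElHdood", 0.44, 1.13), ("Elshorta", 0.55, 1.25), ("Makasa", 0.52, 1.34),
    ("AlMasry", 0.45, 1.38), ("Itihad", 0.61, 1.67), ("Wadi Degla", 0.49, 1.75),
    ("Ahly", 0.62, 1.78), ("Zamalek", 0.68, 1.86), ("Ismaily", 0.60, 1.88),
    ("Mokawleen", 0.56, 1.89), ("Petrojet", 0.58, 2.22), ("Smouha", 0.49, 2.38),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([t[1] for t in TEAMS], [t[2] for t in TEAMS])

# Residual = actual PPG minus the PPG the line predicts from TSR.
# Positive means "above the line" (more points than the shots suggest).
residuals = {name: ppg - (intercept + slope * tsr) for name, tsr, ppg in TEAMS}

for name in sorted(residuals, key=residuals.get, reverse=True):
    side = "above" if residuals[name] > 0 else "below"
    print(f"{name:14s} {residuals[name]:+.2f} ({side} the line)")
```

On these numbers the fit agrees with the observations above: Smouha and Wadi Degla sit above the line, while Zamalek and ElGeish sit below it.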

Predicting the Final Table:

Using the formula Grayson has for TSR correlation with points (instead of our own formula, since he has substantially more data), I built a model of what the final Egyptian Premier League table would look like, taking into account how many games each team has left:

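| Team | Points Now | Expected Points (Remaining Games) | Total Points |
| --- | --- | --- | --- |
| Zamalek | 13 | 28.68 | 41.7 |
| Petrojet | 20 | 18.61 | 38.6 |
| Ahly | 16 | 21.09 | 37.1 |
| Ismaily | 15 | 21.69 | 36.7 |
| Itihad | 15 | 20.52 | 35.5 |
| Mokawleen | 17 | 17.87 | 34.9 |
| Smouha | 19 | 15.43 | 34.4 |
| Wadi Degla | 14 | 15.38 | 29.4 |
| ElShorta | 10 | 19.03 | 29.0 |
| Makasa | 12 | 15.85 | 27.9 |
| AlMasry | 11 | 13.12 | 24.1 |
| ENPI | 10 | 13.04 | 23 |
| Dakhleyah | 9 | 13.45 | 22.5 |
| Haras El Hedood | 9 | 12.70 | 21.7 |
| Qanah | 10 | 11.48 | 21.5 |
| Raja | 8 | 13.35 | 21.4 |
| Telephonat | 6 | 13.11 | 19.1 |
| ElGeish | 5 | 13.14 | 18.1 |
| Entag | 7 | 9.43 | 16.4 |
| GhazlMahala | 5 | 10.64 | 15.6 |
| ElMinya | 4 | 7.35 | 11.4 |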

As you can see, this table does not take into account the fact that the league is actually split into two groups. The model does not take strength of schedule into account either. Furthermore, the margins between teams are extremely small, so a small error could easily change who qualifies.

The model has Zamalek and Petrojet qualifying from Group 2, and Itihad rebounding and qualifying with Al Ahly in Group 1. The model has Smouha collecting around 15 points from its last 14 games, and missing the playoffs due to their statistical weakness! Zamalek & Ismaily capitalize on their game-in-hand and TSR dominance and collect 28.7 & 21.7 points respectively in their last 13 matches!
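The totals in the table are simply current points plus the points the model expects from the remaining games. A minimal sketch of that last step, using the top seven rows of the table, is below. The expected-points figures are taken as given, since Grayson's formula itself is not reproduced in this piece, and the names in the code are my own.

```python
# (team, points now, expected points from remaining games), from the table above.
PROJECTION_INPUT = [
    ("Smouha", 19, 15.43),
    ("Petrojet", 20, 18.61),
    ("Mokawleen", 17, 17.87),
    ("Ahly", 16, 21.09),
    ("Ismaily", 15, 21.69),
    ("Itihad", 15, 20.52),
    ("Zamalek", 13, 28.68),
]


def project_totals(rows):
    """Add expected remaining points to current points and sort best-first."""
    totals = [(team, now + expected) for team, now, expected in rows]
    return sorted(totals, key=lambda row: row[1], reverse=True)


for team, total in project_totals(PROJECTION_INPUT):
    print(f"{team:10s} {total:.1f}")
```

Sorting by the projected total rather than by current points is what moves Zamalek from the bottom of this group of seven to the top, and Smouha from second to last.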

The playoffs would be:

Zamalek vs Itihad

AlAhly vs. Petrojet

TSR is actually terrible at predicting single games, and so we won’t use it to predict who the final winner is!

Ismaily get the short end of the stick here. Despite finishing with more points than Itihad, they are in an extremely competitive Group 2 and miss out by around 2 points. As for Group 1, Arab Contractors have a real shot, as the model has them missing out by only a point!

As you can tell, the margins are tiny so I shouldn’t be held accountable for any results not falling the way I predicted them to! TSR is still a limited tool, but we’ll check back at the end of the season to see how accurate our model was!

Leave your predictions for the end of season table in the comments & join the discussion! I apologize if there was too much math for your taste!