Data Discovery - Football Statistics

After a day at work extracting and rationalising datasets from sources such as SAP, Oracle and social media, a nice way to unwind is, of course, to look at another dataset: football statistics (my wife does not agree!).

Over the last 5 years, the amount of data collected and analysed in the football world has increased vastly, with organisations such as Opta heavily used by the media, fans and the clubs themselves to drive performance. In this blog I will briefly look at the performance of the 'top 6' sides in the Premier League so far this season (after 20 games) in terms of shots and goals.


Shots vs Goals

A good starting point is to look at the number of shots against the number of goals that a team has scored. 

It's not a huge surprise to see that the 'top 6' are well ahead in terms of shots vs goals, and it appears that Man Utd have under-performed so far this season when it comes to goals scored.


Shots vs Expected Goals

A relatively new statistic is Expected Goals (xG), which estimates the number of goals a team would be expected to score based on factors such as shot location, shot type and assist type. The above scatter chart shows the total number of shots; however, it does not distinguish between a shot taken inside the 6 yard box and a 40 yard attempt. The latter obviously has less chance of becoming a goal than the former, and this is where expected goals come into play. Thanks to @MC_of_A for the data behind the expected goals.

As you can see, the above chart has the 'top 6' much closer together in terms of shots vs expected goals, with a significant gap to the rest of the league.
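To make the idea concrete, here is a minimal Python sketch of how a team's expected goals total can be built up: each shot is given a probability of being scored, and the probabilities are added together. The per-shot probabilities below are made up purely for illustration; a real model would estimate them from factors such as shot location, shot type and assist type, and this is not the model behind the data used in this blog.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities into an expected goals total."""
    return sum(shot_probabilities)


# Hypothetical probabilities, for illustration only.
close_range = [0.40, 0.35, 0.30]  # three shots from close to goal
long_range = [0.02, 0.03, 0.02]   # three speculative long-range attempts

# Both teams have taken three shots, but the xG totals differ a lot.
print(round(expected_goals(close_range), 2))
print(round(expected_goals(long_range), 2))
```

Both lists contain the same number of shots, which is exactly why a plain shot count can be misleading on its own.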


Variance in Goals and Expected Goals


Looking at the variance between goals and expected goals (goals scored minus expected goals), it is clear that Liverpool, Arsenal and Chelsea are significantly ahead of their expected goals total, whereas Man Utd are actually behind. This suggests that Man Utd are not converting their chances as efficiently as the other teams. In the league overall, only 4 teams are currently below their expected goals total (Man Utd, West Ham, Everton and Southampton).
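The comparison is simple arithmetic, and a rough Python sketch of it might look like the following. The team names and totals are placeholders I have invented for illustration, not the actual figures behind the chart, and the function names are my own.

```python
def xg_difference(goals, expected_goals):
    """Goals scored minus expected goals; positive means ahead of xG."""
    return goals - expected_goals


def label(diff):
    """Describe whether a team is ahead of, behind or level with its xG."""
    if diff > 0:
        return "ahead of xG"
    if diff < 0:
        return "behind xG"
    return "level with xG"


# Hypothetical (goals, expected goals) totals, for illustration only.
teams = {"Team A": (45, 38.2), "Team B": (28, 31.5)}

for name, (goals, xg) in teams.items():
    diff = xg_difference(goals, xg)
    print(f"{name}: {diff:+.1f} ({label(diff)})")
```

A positive difference suggests a team is converting chances better than the average shot of that type would imply; a negative one suggests the opposite.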


Shooting Accuracy

Looking deeper into Man Utd's efficiency problem, it is interesting to compare their shooting accuracy (shots on goal / total shots) in the first 10 games with the second 10 games.

The overall trend for the first 10 games is negative, with some impressive performances mixed in with some below-average ones. This inconsistency ties in with their results over the period (W, W, W, L, L, W, D, D, L, D).

The second 10 games paint a more positive picture, with the team consistently more accurate with their shooting, and this has led to improved results. This turnaround is perfectly highlighted by Zlatan Ibrahimovic's shots per goal ratio in the two periods. In the first 10 games it was over 10 shots per goal, whereas in the second 10 games it is an impressive 2.4 shots per goal.
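For anyone who wants to reproduce the two ratios used in this section, here is a minimal Python sketch. The shot and goal counts are hypothetical, chosen only so that the shots per goal figures resemble the ones quoted above; they are not the real match data, and the function names are my own.

```python
def shooting_accuracy(shots_on_goal, total_shots):
    """Share of all shots that were on goal (0.0 if no shots were taken)."""
    if total_shots == 0:
        return 0.0
    return shots_on_goal / total_shots


def shots_per_goal(total_shots, goals):
    """Average number of shots needed per goal (None if no goals were scored)."""
    if goals == 0:
        return None
    return total_shots / goals


# Hypothetical counts for one player in two 10-game periods.
first_period = {"shots": 41, "on_goal": 12, "goals": 4}
second_period = {"shots": 24, "on_goal": 13, "goals": 10}

for name, p in (("first 10", first_period), ("second 10", second_period)):
    accuracy = shooting_accuracy(p["on_goal"], p["shots"])
    ratio = shots_per_goal(p["shots"], p["goals"])
    print(f"{name}: accuracy {accuracy:.1%}, {ratio:.2f} shots per goal")
```

Lower is better for shots per goal, while higher is better for shooting accuracy, so the two are worth reading side by side.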

Data Discovery

I could write about football statistics all day long, with a mountain of data now available. However, I hope the above has highlighted that it is not just about looking at the high-level figures: by delving into the detail you will discover information that you never expected, or that can help justify a decision you need to make.

Someone once told me "Data is data", which of course it is. However, it gives people the ability to tell stories and understand where efficiencies can be made for the greater good, whether it is Zlatan's shots per goal ratio or the average cost of apples.


Some other random football stats

  • Curtis Davies, Michael Dawson, Ben Mee and Michael Keane have accounted for over 8% of all blocked shots (1,380 across the league) #blockers
  • West Brom have the second worst shooting accuracy in the league (36.4%) but the best shots on goal per goal ratio (2) in the league, so they essentially score with every other shot that is on target #efficient
  • Joshua King commits on average 34 fouls per card, whereas Danny Simpson commits 2.5 fouls per card #disproportionate
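
The ratios in the stats above can also be combined. As a rough Python sketch using only the figures quoted in the list, dividing West Brom's shots on goal per goal by their shooting accuracy gives an implied number of total shots per goal, and dividing the two fouls per card figures shows how far apart the two players are. The variable names are my own, and the results are only as precise as the rounded figures they start from.

```python
# Figures quoted in the list above.
west_brom_accuracy = 0.364           # shots on goal / total shots
west_brom_on_goal_per_goal = 2       # shots on goal / goals

# (shots on goal / goals) / (shots on goal / total shots) = total shots / goals
west_brom_shots_per_goal = west_brom_on_goal_per_goal / west_brom_accuracy
print(f"West Brom: about {west_brom_shots_per_goal:.1f} shots per goal")

king_fouls_per_card = 34
simpson_fouls_per_card = 2.5

# How many times more fouls per card one player commits than the other.
king_vs_simpson = king_fouls_per_card / simpson_fouls_per_card
print(f"Fouls per card ratio: {king_vs_simpson:.1f}x")
```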