Breaking down the Ballon d’Or shortlist: Messi, Ronaldo and Neuer

ESPN FC’s Paul Mariner and Alexis Nunes weigh in on the Ballon d’Or and FIFPro Top Forwards shortlists.

The three finalists for the FIFA Ballon d’Or were announced on Monday: Lionel Messi, Manuel Neuer and Cristiano Ronaldo. Here’s a look at the stats that made them great in 2014 and the possible accomplishments that lie before them…

* After one goal in eight previous World Cup games, Lionel Messi scored four times at the 2014 World Cup and won the Golden Ball as the tournament’s best player. He’s had 52 goals in 62 games for club and country this year, also adding the La Liga and Champions League… Read more ›

Massey Ratings For Football Part One



We all know the league table can lie, and one of the common causes is strength of schedule. Take Southampton: at the time of writing they are second in the Premier League after twelve matches, yet they still haven’t played Chelsea, Manchester City, Manchester United or Arsenal. Without wishing to be dismissive of Southampton, who are undoubtedly a very talented team, there’s a pretty decent chance that they’d be lower down the league table had these fixtures come up earlier in the season instead of Leicester, Hull or Aston Villa.

Massey Ratings

So if we can’t rely on the league table to tell us which teams are performing best, what do we do? One alternative is to use Massey Ratings, a method devised by Ken Massey back in 1997 for his honours thesis that rates teams while accounting for the opposition they’ve played. The system was originally designed for American football but it can be adapted to football fairly trivially.

The idea behind Massey Ratings is that they rate teams such that the difference between any two teams' ratings is equal to the expected margin of victory between them, as shown in Equation One below:

$$y = r_a - r_b$$

where $y$ is the margin of victory for the fixture, $r_a$ is the rating of team a and $r_b$ is the rating of team b.


In an ideal world we’d have enough data to calculate true ratings for each team. However, with players moving from one team to another and football seasons typically lasting just 38 matches, we never have sufficient data for that, so we have to settle for approximating ratings based on previous match results. This means we need to modify Equation One by adding an error term to account for any unexplained variation in the outcome of games (Equation Two below):

$$y = r_a - r_b + e$$

where $y$ is the margin of victory, $r_a$ is the rating of team a, $r_b$ is the rating of team b and $e$ is the remaining error in the model.

So far so good, but how do we know what $r_a$ and $r_b$ should equal? Well, to start with we want the error term we added in Equation Two to be as small as possible, so we use a technique called least squares: we look for the set of team ratings that minimises the sum of the squared errors over the past matches we have data for.

The Matrix

Things get slightly trickier here, but let’s say our past data comprises m matches involving n teams. We know what the margin of victory was for each match and who won, but not the ratings for each team, so we have m equations to solve for the n unknown rating values, which we can write as Equation Three below:

$$y = Xr + e$$

where $y$ is the vector of margins of victory, $r$ is the vector of ratings we are trying to find, $e$ is the remaining error and $X$ is an m x n matrix of coefficients in which each row represents a match-up, containing a 1 for the winning team and -1 for the losing team. Unfortunately, this gives us a very sparse matrix and a system that is likely to be highly over-determined, so we cannot simply solve it directly for a unique set of ratings.
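To make the layout of the coefficient matrix concrete, here is a small illustration using made-up data (the three teams and their results are hypothetical, not taken from the Premier League). Suppose team a beats team b 2-0, team b beats team c 1-0 and team a beats team c 3-0. With the columns ordered a, b, c, the three match equations stack up as:

$$
\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}
=
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} r_a \\ r_b \\ r_c \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix}
$$

Each row has exactly two non-zero entries however many teams there are, which is why the matrix becomes so sparse for a full league.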

The Massey Matrix

Thankfully, Massey showed that you can transform the system so that the matrix has diagonal elements equal to the number of games each team has played and off-diagonal elements equal to the negation of the number of match-ups the two teams have played against each other, giving Equation Four below:

$$Mr = p$$

where $M$ is the modified n x n Massey Matrix, $p$ is a vector of each team's cumulative score differential and $r$ is the vector of unknown ratings. In terms of the match-up matrix $X$ and margin vector $y$, $M = X^T X$ and $p = X^T y$.

We are getting closer now, but the matrix still doesn’t give a unique set of ratings: every row of M sums to zero, so adding the same constant to every rating changes nothing. Massey therefore modifies it further, setting every element of the bottom row to one and the corresponding element of p to zero. This constraint creates a full-rank matrix for us and forces the ratings to sum to zero.
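As a rough sketch of the steps described so far, the Python below builds the Massey Matrix and the score-differential vector from a list of results, replaces the bottom row with ones (the constraint that makes the ratings sum to zero) and solves the system. It is only an illustration under my own assumptions: the function names `massey_ratings` and `solve_linear_system`, the result tuple layout and the three toy results are all hypothetical, and the hand-rolled Gaussian elimination is there purely to keep the example dependency-free. In practice a linear algebra library would normally do the solve.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; is the schedule connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def massey_ratings(teams, results):
    """Return {team: rating} from (team_a, team_b, goals_a, goals_b) tuples."""
    n = len(teams)
    index = {team: i for i, team in enumerate(teams)}
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for team_a, team_b, goals_a, goals_b in results:
        i, j = index[team_a], index[team_b]
        M[i][i] += 1  # diagonal: games played by each team
        M[j][j] += 1
        M[i][j] -= 1  # off-diagonal: minus the number of meetings
        M[j][i] -= 1
        p[i] += goals_a - goals_b  # cumulative score differential
        p[j] += goals_b - goals_a
    # Constraint: bottom row of ones, matching element of p set to zero,
    # so the ratings are forced to sum to zero.
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve_linear_system(M, p)))


if __name__ == "__main__":
    # Made-up results purely for illustration.
    toy_results = [("a", "b", 2, 0), ("b", "c", 1, 0), ("a", "c", 3, 0)]
    for team, rating in massey_ratings(["a", "b", "c"], toy_results).items():
        print(f"{team}: {rating:+.2f}")
```

Draws need no special handling here: a drawn match still adds a game to both teams' diagonal entries and contributes a score differential of zero.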

Massey Ratings For The English Premier League

Finally, using some linear algebra we can solve the system and get the ratings for each team, shown below in Figure One.

Figure One: EPL Massey Ratings

It’s no surprise that Chelsea are ranked far ahead of anybody else in first place but Southampton do actually get ranked in second place, showing that even accounting for their easier schedule to date they deserve to be second in the league at the moment.

Interestingly, Swansea get ranked fourth rather than their current position of seventh in the league. However, Swansea have already played five of the six teams above them so their Massey Rating shows they are performing better than their raw points tally would suggest.

At the bottom of the table it’s not looking good for Aston Villa. I showed in my last article how their Pythagorean expectation suggested they were over-performing to be even as high as they are, and this is now backed up by their Massey Rating, which ranks them in one of the relegation spots.

Next Steps

In my next article I’ll show how we can take Massey Ratings a step further and decompose teams’ overall ratings into separate ratings for attack and defence. I’ll also add some example code so you can have a go at calculating them yourself.

In the meantime, if you are interested in finding out more about the maths behind Massey Ratings then take a look at Ken Massey’s honours thesis which goes into the theory in much more depth than my brief overview here.



Why Uttoxeter Probably Isn’t A Hotbed of Swimming Talent.

Occasionally the newspapers publish stats-based articles that do not relate to sport but do serve to highlight some of the dubious conclusions that can be drawn from such studies.
In the run-up to Christmas, a raft of newspapers, including the Daily Telegraph, reported that the drink-driving capital of Britain was Llandrindod Wells (LW), a small rural town in mid Wales.

Over the last 12 months LW had 1.98 convictions per 1,000 drivers, ahead of second-placed Blackpool with 1.85. Having established the drink-driving hotspot, the articles then devised a couple of reasons to explain the results: a lack of public transport, for example, and a belief that an offender will not be caught in a rural setting.

However, studies comprising very different sample sizes can easily lead to conclusions that fail to represent the true picture. Most famously, a study quoted in Daniel Kahneman’s book “Thinking, Fast and Slow” decided that small schools are inherently better than large ones because they appeared in disproportionate numbers at the top of a performance table.

In short, sometimes samples are too small to come to a reliable conclusion.

LW has a population of just over 5,000. If the town follows national trends, around 80% of the population will be able to legally hold a driving licence, giving roughly 4,000 drivers. So 1.98 convictions per 1,000 drivers implies that 8 cases of drink driving were successfully caught and prosecuted in LW over the previous 12 months.

Imagine that one such case had gone undetected. LW would now have a conviction rate of 1.75 per 1,000 and would fall to 4th in the table. Blackpool would be top, and it might seem that seaside towns lead to drink driving.

If convictions drop to 6, LW falls to the middle of the roll of shame with an entirely unexceptional conviction rate per 1,000 drivers. However, two extra cases added to the actual total catapult the town to 2.5 cases per 1,000, well above the next worst, Blackpool.

So it is possibly the size of LW’s population that has contributed to making the town a headline in the national press. Blackpool, in contrast, has around 118,000 drivers, so its headline rate is much less susceptible to large swings caused by small numerical changes in convicted or non-convicted cases. At 1.85 convictions per 1,000 drivers, Blackpool has probably prosecuted around 220 drink drivers.

Percentages derived from small sample sizes can bounce around if the raw number of cases alters by just one or two. Just as small schools can be shown to be the best, as in the study quoted in Kahneman’s book, they can also quickly become the worst if just a handful of students produce poor results rather than excellent ones.
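The sensitivity described above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes LW has exactly 4,000 drivers (80% of a population of 5,000, whereas the quoted 1.98 rate implies slightly more), it takes the 118,000 figure for Blackpool at face value, and the function name `rate_per_1000` is my own.

```python
def rate_per_1000(cases, drivers):
    """Convictions per 1,000 drivers."""
    return cases * 1000 / drivers


# Assumed driver counts: 80% of LW's roughly 5,000 residents, and the
# figure for Blackpool quoted in the text.
LW_DRIVERS = 5_000 * 80 // 100
BLACKPOOL_DRIVERS = 118_000

if __name__ == "__main__":
    # How LW's headline rate moves as the number of convictions changes.
    for cases in (6, 7, 8, 10):
        print(f"LW with {cases:>2} cases: {rate_per_1000(cases, LW_DRIVERS):.2f} per 1,000")

    # The same two extra cases barely register in a town the size of Blackpool.
    shift = rate_per_1000(2, BLACKPOOL_DRIVERS)
    print(f"Two extra cases in Blackpool move its rate by {shift:.3f} per 1,000")
```

Two extra convictions move LW’s rate by half a conviction per 1,000 drivers, while the same two cases shift Blackpool’s rate by less than 0.02.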

To keep the blog sports-orientated, let’s use this dubious method to “prove” that Uttoxeter (population 12,000), a small town on the correct side of the Staffordshire/Derbyshire border, is a hotbed of swimming world records.

Around 12% of the population are in the age group that would typically hold a world swimming record. So Uttoxeter has around 1,400 potential champions. They currently have one actual world record holder, Adam Peaty (100/1 to be Sports Personality of the Year, but don’t let that put you off voting for Adam).

Therefore, Uttoxeter has 0.7 world record swimmers per 1,000 likely candidates. This would of course double if we made the conditions gender-specific, but it is still good enough to give the town the best headline rate in the country.

So Uttoxeter can be shown to be the place for swimming excellence, but only by using percentages applied to small sample sizes, which obscure rather than illuminate the less startling reality of the situation.

Sadly, it is a flawed conclusion, based on the exploits of a single outstanding swimmer, especially as the town doesn’t currently have a swimming pool!



Five Aside: Messi sets second scoring record in a week with hat trick

Barcelona’s Lionel Messi becomes the Champions League all-time goals leader with 72.

Lionel Messi scored his 72nd, 73rd and 74th Champions League goals Tuesday against APOEL Nicosia in Cyprus, breaking Raul’s all-time record of 71 goals in the tournament.

Here are five stats on the Argentine’s scoring record.

* The hat trick scored against APOEL was Messi’s fifth career Champions League hat trick, two more than any other player in tournament history.

* In the 47 Champions League games where Messi has scored at least one goal, Barcelona is 38-2-7, losing to Celtic in 2012… Read more ›

Is the Sloan Sports Analytics Conference still worth attending?

I have been attending the MIT Sloan Sports Analytics Conference since 2010, right at the time that the Conference was transitioning from an intimate meeting of professional sports insiders and academics to a slick and polished cultural event.  As happens every year at this time, I receive notifications from the organizers of the conference that encourage […] Read more ›

Who plays and gets paid more? Player participation in MLS 2014

I’ve been working on front-office efficiency figures for Major League Soccer in 2014, and while I’ve been putting those results together I wanted to see what kind of information could be shown with the data that I have.  One idea that came to mind was to take minutes played data of MLS players in the […] Read more ›