One of the most familiar crossover stats from hockey to football is total shot ratio. It is often used as a proxy for “dominance” of possession and supposes that a side that constantly out-shoots, or is out-shot by, its opponent will eventually rea…
Tom Tango has recently presented an alternative to Corsi in hockey that weights shots differently depending on whether they resulted in goals, saves, misses or blocks.
One of the logical tests of the new metric is to see how well it correlates to useful team information, such as future goal difference, compared to projecting from previously used metrics, such as unweighted shot differentials or ratios.
The expectation voiced in many hockey circles was that because the “Tango” correlated almost perfectly to the traditional Corsi metric, the added information hoped for by weighting different types of shots would be negligible, at best.
In a typically concise and insightful post, here, Tango addresses the issue of the virtually perfect correlation between the two metrics, pointing out that using basic shot data from identical samples to test the correlation to out-of-sample data, such as future goal difference, gave different correlation coefficients depending on whether Corsi or Tango was used.
In short, weighted shots showed higher r values, despite the strong correlation between the two metrics.
r Values for Weighted & Unweighted Shot Differential and Ratios when Correlating to Future Premiership Goal Difference.
| After X Games | r for TSR | r for Shot Differential | r for Weighted Shot Differential |
Tango’s defence of his new metric can be summed up in this extract from the linked post.
“But more amazing is that even though the correlation of Corsi to Tango (both based on the same samples) was close to r=1, when we correlate each to out-of-sample data (in this case, goal differential from OTHER games), Tango correlated at r=.50, while Corsi was r=.44. Or if you prefer r-squared, it’s .25 to .19, respectively.”
I have therefore repeated the exercise for the Premiership, using three flavours of shot-based metrics from one part of the season and testing their correlation, at an individual team level, with goal difference for teams in the remainder of the season.
The weighting of shots appears to make a difference in soccer as well as in hockey. Correlation peaks around mid-season, but at every stage the weighted differential correlated more strongly with goal difference in the remainder of the season than the unweighted metrics did.
It also makes intuitive sense to reflect the extra information present in a goal compared to just a shot.
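The split-sample test described above can be sketched in a few lines of Python. Everything here is an invented placeholder: the team totals are synthetic, the weight of 5 on goals is a stand-in rather than Tango's actual weighting, and the "future" goal difference is generated only so the comparison runs end to end. The point is the mechanics of the comparison, not the empirical result.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic per-team totals for the first half of a season
# (entirely made-up numbers; real data would come from match records).
n_teams = 20
shots_for = rng.poisson(180, n_teams)
shots_against = rng.poisson(180, n_teams)
goals_for = rng.binomial(shots_for, 0.10)
goals_against = rng.binomial(shots_against, 0.10)

# Unweighted metrics.
tsr = shots_for / (shots_for + shots_against)   # total shot ratio
shot_diff = shots_for - shots_against           # raw shot differential

# A weighted differential: goals count more than non-goal shots.
# The weight of 5 is a placeholder, not a published value.
w = 5
weighted_diff = (w * goals_for + (shots_for - goals_for)) \
              - (w * goals_against + (shots_against - goals_against))

# "Future" goal difference for the rest of the season (synthetic,
# loosely tied to the weighted differential so there is a signal).
future_gd = 0.1 * weighted_diff + rng.normal(0, 5, n_teams)

# Correlate each in-sample metric against the out-of-sample target.
for name, metric in [("TSR", tsr), ("Shot diff", shot_diff),
                     ("Weighted diff", weighted_diff)]:
    r = np.corrcoef(metric, future_gd)[0, 1]
    print(f"{name}: r = {r:.2f}")
```

With real data the metrics would be computed from one set of matches and the goal difference from the remaining matches of the same season, exactly as in the table above.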
As a follow-up to the previous post, here is the changing relationship between a side's weighted shot differential relative to their opponents (for goals, shots that went wide and shots that were saved) after a certain number of matches and the goal dif…
Interesting news from OptaPro, Opta Sports' professional services arm, that player tracking data from English Premier League matches will be made available for successful proposals to their Analytics Forum next year. The data comes from Opta's partner Tracab, which uses their image-tracking technology to generate 3D sports data for media customers (they are […]
The Scoreboard Journalism challenge for points and place predictions from prominent media, stats modellers, fans and online publishers for both the Eredivisie and the Premier League has attracted considerable interest and @JamesWGrayson has been publishing regular assessments of how the …
If you’ve been following Soccermetrics on Twitter this week, you would have seen infographics on the Front-Office Efficiencies of all nineteen Major League Soccer teams. You could have worked out the ranking yourself, but if not, below is the full infographic for posterity. A few observations: The six teams with the most efficient front offices […]
Shot counts versus goal counts as a predictor of future performance is a debate that is being fought out not only in football but also in hockey. Sample size is at the heart of the issue. Goals are obviously more important in terms of who wins th…
In part one I introduced Massey Ratings and how they can be used to rank football teams in a way that accounts for their strength of schedule. Next, we'll take a look at how Massey Ratings can be extended further to look at a team's attack and defence strengths separately.
The idea behind Massey Ratings is that they rate teams such that the difference between any two teams is equal to the expected margin of victory between them. For example, if a team rated -1.0 played a team rated +1.0 then we’d expect the average goal difference between them to be two goals.
Since Massey Ratings look at goal difference rather than goals scored or conceded they account for a team’s overall strength and combine both their attack and defence strengths together into a single value. This means with a bit of mathematics we should be able to decompose a Massey Rating to split out these two constituent parts.
Attack And Defence
In part one we originally defined the Massey Rating as shown below in Equation One:

y = ra - rb

where y is the margin of victory for the fixture, ra is the rating of team a and rb is the rating of team b. Let's take this a step further and define the goals a team should score in a match as Equation Two below:

ya = oa - db
where ya is the number of goals team a is expected to score, oa is team a’s attack strength and db is team b’s defence strength.
Extending this further, we can say the total goals a given team should score over the course of a season is equal to its attack strength multiplied by the number of matches played, minus the sum of the defence strengths of all its opponents. Since we know what each team's overall rating is, how many matches they've played, how many goals were scored and who their opponents were, we're getting pretty close to having what we need.
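To make that aggregation step concrete, here is a tiny Python check of the identity; the attack and defence numbers below are invented purely for illustration.

```python
# Toy check of the season-total identity implied by Equation Two
# (the attack and defence strengths here are made up).
o_a = 1.4                      # team a's attack strength
opponent_d = [0.2, -0.1, 0.3]  # defence strengths of a's three opponents

# Per-match expected goals are o_a - d_b; summing over the season...
per_match_total = sum(o_a - d for d in opponent_d)

# ...equals attack strength times matches played, minus the sum of
# the opponents' defence strengths.
season_total = o_a * len(opponent_d) - sum(opponent_d)

print(per_match_total, season_total)
```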
Decompose The Massey Matrix
Next we need to decompose the Massey matrix we created in part one into its diagonal and off-diagonal elements to give us two new matrices, G and P, which we use in Equation Three below:

(G - P) r = p

where G is a diagonal matrix of total games played, P holds the number of pairwise matchups between each pair of teams, r is the vector of the teams' Massey Ratings and p is a vector of the teams' goal differentials.
From here, Ken Massey uses some clever algebra to derive the equivalent of Equation Four below:

(G + P) d = G r - f

where G and P are as before, d is the vector of defensive ratings and f is the vector of goals scored. The offensive ratings then follow from o = r - d.
If you are interested in finding out more about the mathematics behind this then I heartily recommend taking a look through Ken Massey's thesis, where he explains it in much more detail than I have gone into here.
Calculating The Ratings
Finally, we can now solve this linear system to get the attack and defence ratings for each team.
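As a rough sketch of that calculation, here is a self-contained Python version (the code linked at the end of the post is in R). The fixture list is a toy example, and because the Massey matrix is singular, one row is replaced with a sum-to-zero constraint on the ratings; d is then solved from Equation Four and the attack ratings fall out as o = r - d.

```python
import numpy as np

# Hypothetical fixture list: (home, away, home_goals, away_goals).
fixtures = [
    ("A", "B", 2, 0),
    ("B", "C", 1, 1),
    ("C", "A", 0, 3),
    ("A", "B", 1, 2),
    ("B", "C", 0, 2),
    ("C", "A", 1, 1),
]

teams = sorted({t for fx in fixtures for t in fx[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

G = np.zeros((n, n))   # diagonal: games played
P = np.zeros((n, n))   # off-diagonal: pairwise matchups
p = np.zeros(n)        # goal differential
f = np.zeros(n)        # goals scored ("for")

for home, away, hg, ag in fixtures:
    i, j = idx[home], idx[away]
    G[i, i] += 1
    G[j, j] += 1
    P[i, j] += 1
    P[j, i] += 1
    p[i] += hg - ag
    p[j] += ag - hg
    f[i] += hg
    f[j] += ag

# Massey system (Equation Three): (G - P) r = p. The matrix is
# singular, so pin the ratings to sum to zero via the last row.
M = G - P
M[-1, :] = 1.0
b = p.copy()
b[-1] = 0.0
r = np.linalg.solve(M, b)

# Defensive ratings from Equation Four: (G + P) d = G r - f,
# then offensive ratings from o = r - d.
d = np.linalg.solve(G + P, G @ r - f)
o = r - d

for t in teams:
    print(t, round(o[idx[t]], 3), round(d[idx[t]], 3))
```

Since o + d recovers each team's overall rating, the decomposition is easy to sanity-check against the ratings from part one.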
Figure One: Defensive Massey Ratings
Figure Two: Offensive Massey Ratings
It’s no surprise that Manchester City and Chelsea rate highly for offensive strength, but Everton are somewhat surprisingly rated the third-best offensive team even though they only rank mid-table in the league. Everton may only have a goal difference of +2 at the moment, but they are actually the joint-third-highest goal scorers in the Premier League. They are performing well offensively; it’s their defence that is letting them down, and it is actually ranked worse than relegation-threatened Burnley’s.
QPR also rate pretty highly in terms of attacking strength for a team in the relegation zone. Looking at their results for this season, though, they managed to score two against Manchester City, scored against Chelsea and are one of the few teams to actually get a goal against Southampton, so they are performing well offensively against the league’s stronger teams. Like Everton, though, their defence is performing poorly and dragging down their overall performance.
What’s that at the bottom of the offensive chart in red? Why, it’s Aston Villa, whose attack is so poor it actually gets a negative rating! I’ve mentioned in my last two articles how Aston Villa’s Pythagorean and Massey Ratings show them to be seriously over-placed in the league, and once again here’s another metric showing how poor they are. Bizarrely, Villa are somehow in twelfth place having managed a pitiful eight goals from fourteen matches. Although their defensive rating is pretty good, from an offensive point of view Aston Villa’s numbers suggest they are perhaps rather fortunate to be so far away from the relegation zone…
So far the Massey Ratings have considered each match a team plays equally but Ken Massey suggests they can be improved further by weighting matches based on their importance. For example, playing a cup match against a team from a lower division is probably less relevant to calculating the ratings than say a league match against a close rival. By weighting matches appropriately we can reduce the influence less relevant matches have on a team’s ratings and potentially improve their accuracy.
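One simple way this weighting might be applied, assuming each fixture just carries a multiplicative weight, is to scale every contribution a match makes to the system by that weight. The fixtures and weights below are invented for illustration.

```python
import numpy as np

# Each fixture: (home, away, home_goals, away_goals, weight).
# The weights are invented, e.g. down-weighting a hypothetical cup tie.
fixtures = [
    ("A", "B", 2, 0, 1.0),   # league match, full weight
    ("B", "C", 1, 1, 0.5),   # cup match, half weight
    ("C", "A", 0, 3, 1.0),
]

teams = sorted({t for fx in fixtures for t in fx[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

G, P, p = np.zeros((n, n)), np.zeros((n, n)), np.zeros(n)
for home, away, hg, ag, w in fixtures:
    i, j = idx[home], idx[away]
    G[i, i] += w            # weighted games played
    G[j, j] += w
    P[i, j] += w            # weighted pairwise matchups
    P[j, i] += w
    p[i] += w * (hg - ag)   # weighted goal differential
    p[j] += w * (ag - hg)

# Solve the weighted Massey system, pinning ratings to sum to zero.
M = G - P
M[-1, :] = 1.0
b = p.copy()
b[-1] = 0.0
r = np.linalg.solve(M, b)

print(dict(zip(teams, np.round(r, 3))))
```

Down-weighted matches then pull the ratings around less than full-weight league fixtures do.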
If you are interested in having a go with Massey Ratings then I’ve put some example R code on GitHub. You’ll need to add your own data though as I’ve stripped out the section where it connects to my database for security reasons.
Peter – December 4, 2014
Great read as always.
Currently in the process of teaching myself R. Just wondering if you could give me a pointer as I’m really interested in giving this a go! What headers should the data be ordered in? Is this all taken from a league table, or from the results csv on the football-data website?
Martin Eastwood – December 5, 2014
It was all taken from my PostgreSQL database so you’ll need to make sure your data matches the naming conventions used in the code or change the code to match your data.
Kevin – December 5, 2014
Have you thought about improving the ratings by using expected goals, rather than goals, in your matrices?
Martin Eastwood – December 5, 2014
Not tried, but it’s an interesting idea!
Peter – December 8, 2014
I have given it a go (through Excel, not R) and while I have taken a different approach, things seem to look fairly consistent regarding the overall ratings. I’ll cautiously refer to it as an Adjusted Massey… I’m thinking decomposing these attack/defense ratings may prove a challenge however. I’m using it in conjunction with Pythagorean Expectation to gauge overall performance, and will have a blog post up fairly soon (with due reference to pena.lt/y/ for lighting the way of course)!
Martin Eastwood – December 9, 2014
Cool, look forward to reading it Peter!