The Math Behind Premier League Score Prediction
I recently developed an Excel spread sheet with macros that simulates the Premier League table. This is quite an interesting process. I takes me some time to figure out the math behind Premier League score prediction. The overall predicting steps are well documented by other sources. However, most of the details, especially detailed algorithms, are not widely discussed or documented. I have to come up with my own methods or algorithms. I will document these details here. Hopefully, it may be useful for me or others in the future.
Before you proceed, you may want to download the spread sheet here. Most of the terms and concepts discussed below are used in the spread sheet. The spread sheet is written with Excel VBA macros. It runs in Office 2010 and newer.
Introduction to SPI (Soccer Power Index)
The first step toward predicting the Premier League is to set up the Soccer Power Index (SPI) for each team in the league. Luckily, the folks at fivethirtyeight.com have already done it. They post updated SPI values for each team in the Premier League on their website.
SPI is a pair of numbers that measures a team’s offense and defense strength. The attacking index is the expected number of goals a team scores over an average team in the league. The defending index is the expected number of goals a team concedes to an average team in the league. Detailed explanation of SPI and high level score estimation procedure is documented here.
Adjust League Average Based SPI to SPI Against Opposing Team of a Match
The next step is to convert SPI to the number of goals a team is expected to score or concede over its opponent which may be stronger or weaker than an average team in the league. I used the following formulas:
offhome_adj = Offhome + (Defaway - Def_ave)
defhome_adj = Defhome + (Offaway - Off_ave)
For example, the first game in the 2018/19 season is Manchester United vs Leicester City. The SPI of Man U is 2.2(attacking), 0.5(defending). The SPI of Leiceter City is 1.9 (attacking), 0.8 (defending). The average SPI of all teams in the Premier League is 1.97 (attacking), 0.75 (defending). To calculate the attacking index of Man U (the home team), we take the difference between the defending index of Leicester City (the away team) and League average, which is the extra goals that Leicester City may give up against the average team in the league, and add it to the Attacking index of Man U. This assumes that Man U can capitalize on the goals that Leicester’s defense is likely to give up. In this case, we get
We calculate the Defending Index for Man U. in the similar way.
This strengthens Man U’s defending index, making them conceding fewer goals, since Leicester’s attacking index is slightly below League average. Adjusted SPI for Leicester City
(offway, defaway)=(1.65,1.03) is calculated in the same way.
Obtain Projected Score of a Match
Once we have the adjusted SPI values, we can go on to find out the projected score of a particular match. For this I simply average the adjusted attacking index of the home team and the adjusted defending index of the away team to get the projected number of goals scored by the home team (
spiGF). The number goals the away team is expected to score (
spiGA) is calculated in the same way.
spiGF = (offhome_adj + defaway_adj) / 2
spiGA = (defhome_adj + offaway_adj) / 2
Again, take Man U vs Leicester City as an example, the projected score is
spiGF = (2.25 + 1.03) / 2 = 1.64
spiGA = (0.43 + 1.65) / 2= 1.04
Scale to The Number of Average League Goals
In order to generate more realistic scores, after we calculate projected GF and projected GA for all teams, we scale the number of projected goals to the number of average league goals. For this simulation, we assumed that number of league average goal is 2.85 per game.
Consider Other Factors: Home Field Advantage. Days of Rest before a Game. SPI Adjustment Factor.
For home field advantage, I assume the home team can score an extra 10% more goals.
I also assume that a team will give up an extra goal if they have zero rest days before a game and a team will not give up any extra goals, if they receive full 7 day rest.
In the case of Man U. vs Leicester City, projected after adjusting for home field advantage is
The SPI adjustment factor (SAF) allows the user to override the given SPI values. It’s a whole number between -15 to +15.
SAF=0 will not alther the SPI values or the projected game score. If you set SAF to a number larger than 10, you literally can boost a team from bottom of the table to top of the table. If you set SAF to a number less than -10, you can sink a top team to the bottom of the table. A moderate value between -5 to +5 is recommended.
The projected game score is the most likely outcome of the game. But in reality, anything can happen. We must calculate the probability of every possible score. It turns out Poisson Distribution closely describes the goals are scored in a soccer match.
P = (Lamda ^ k * Exp(-Lamda)) / Factorial(k)
Lamdais the number of projected goals of the match.
P is the probability of having
k number of goals in that match.
To qualify for Poisson Distribution, a soccer game must have the following characteristics:
- k is the number of goals that can be scored. k =0, 1, 2, …
- The occurrence of a goal does not affect the probability that a second goal is scored. That is, goals occur independently.
- The rate at which goals are scored is constant. The rate cannot be higher in some intervals and lower in other intervals.
- Two goals cannot be scored at exactly the same instant.
- The actual probability distribution is given by a binomial distribution and the number of trials on goal is sufficiently bigger than the number of goals scored.
I would not say that the above is a perfect description of a soccer match, but I guess it is close enough.
Below is table of goal probability of the Man U. vs Leicester City match.
As an example, the very top row in red shows that Man U has 26.8% of chance of scoring two goals. The very left column in blue shows that Leicester has 36.8% of chance of scoring one goal. Combined, there’s a 9.8% of chance of having a 2:1 score. This is the most likely score, since it has the highest probability of all possible scores. Incidentally, 2:1 is the actual score of the match.
Also from the table, all the green cells add up to the probability (55.0%) of Man U. winning the game. All the orange cells add up to the probability (11.7%) of Leicester City winning the game. All the yellow cells add up to the probability (23.1%) of a draw.
Time to Throw the Dice
So what’s the final guess of the score? It’s time to throw the dice. At this point, the program generates two random numbers, one for Man U. and one for Leicester. Depending on the value of the random numbers, we guess the final score. Although 2:1 is the most likely score, there’s only 9.9% of the time we can get that score.
Auto Fill Feature
Let’s go back to the Excel program. The “Auto Fill” function executes the above sequences for all fixtures. It runs through the entire season once and generates the “guessed scores” of every game. The League table in the “Table” tab is updated using the “guessed scores”. Keep in mind, this is the result from the throw of a dice, the “guessed score” may not always be the “projected score”.
A few things about Auto Fill:
- It will only update the empty cells in the score columns in “fixture” tab.
- It will show the projected scores, the win, draw, loss probabilities of the home team and the dice values in the neighboring columns.
- It will not erase the scores that are manually entered.
Adjust SPI Based on Game Results
To make the simulations more realistic, as Auto Fill runs through the season, the SPI values of each team is updated based on the guessed results. For instance, if a weak team somehow beat a strong team, the SPI of the weak team is raised. The SPI of the strong team may be lowered.
In the program, we only updates the SPI, if there is a difference of 1.7 goals between the “projected” goal and the “actual” or “guessed” goal. For every 1.7 goal difference, we boost the SPI value by 0.1. The offensive SPI caps at 3.2 and bottoms at 0.5. The defensive SPI bottoms at 0.1 and caps at 2.
To predict the team standings in the table, we need to run “Monte Carlo simulation”. We basically run the “Auto Fill” process many times, say 10,000 times. We then calculate the average points and average goal difference of the teams from the 10,000 tables. The teams are then ranked based on the average points and average GD.
Below is my Monte Carlo simulation results. It was run on December 24, 2018.
|Rank||Club||SPI Off||SPI Def||GD||Pts||Win League||Top 4||Rel.|
|Rank||Club||SPI Off||SPI Def||GD||Pts||Win League||Top 4||Rel.|
As the season progresses, I expect each team will change and grow. The table will get more interesting. From time to time, I will post updated table predictions here.
May your favorite team win!