EPL Wages Revisited: Fun with Statistics, Part II

By Susan

Soccer Orb’s resident statistician, Steve, shared an interesting bit of research with us a couple of days ago (See Premiership Ratings: Biggest Bang for the Buck from June 6). I’m going to swim into some dangerous waters and interpret his findings with my quasi-layperson’s eyes.

His model was simple, but very revealing. Using EPL wage data from the 2006-7 season, he built a model that, in words, looks something like this:

For each of the twenty clubs, the total number of points earned in Premier League matches is a function of the total (player) wages paid by the club for that season.

In other words, how much of any given club’s success can be explained by the quality of its roster? “Quality” cannot be perfectly measured by a number, but a player’s salary is the proxy variable that Steve used, mostly because wages were a major factor in the recent Deloitte report of the financial condition of European football.

Steve’s results were quite strong, given that he was trying to learn how much one factor–wages–explained team success. Also, he was forced to use a small sample–twenty observations, confined to just one league. It would be great to have this data over a period of five or so years, and to have the numbers not just for the EPL but also for La Liga, Serie A, and the Bundesliga. Steve, the cheapskate, wasn’t willing to cough up £600 for this year’s full report.

According to his model, about two-thirds of the variation in a team’s points could be explained by the variation in the wages that it paid to its players. It is important to understand that, although wages are a powerful predictor of team success, they’re not the only predictor. One-third of the variation in team points is due to other factors besides wages, factors that were not included in this model. Why weren’t they included? Because Steve wanted to isolate the effect of wages, since they were given a lot of attention by the Deloitte analysts.

If you look at his graph (again, it’s in the post before this one), you’ll see the strong positive relationship between salaries and Premiership points. To be more precise, each £1 million increase in salaries is associated with an increase of about 0.46 points in the season’s total.
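To make the slope and the "two-thirds" figure concrete, here is a minimal sketch of a one-variable least-squares fit in Python. The function name `fit_line` and the five wage/points pairs are made up for illustration; they are not Steve's data or the Deloitte figures, and this is not necessarily how he ran his regression.

```python
def fit_line(wages, points):
    """Ordinary least squares for: points = intercept + slope * wages."""
    n = len(wages)
    mean_w = sum(wages) / n
    mean_p = sum(points) / n
    # Spread of wages, and how wages and points move together.
    sxx = sum((w - mean_w) ** 2 for w in wages)
    sxy = sum((w - mean_w) * (p - mean_p) for w, p in zip(wages, points))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_w
    # R-squared: share of the variation in points explained by wages.
    ss_tot = sum((p - mean_p) ** 2 for p in points)
    ss_res = sum((p - (intercept + slope * w)) ** 2
                 for w, p in zip(wages, points))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical clubs: wages in GBP millions, league points (NOT real data).
wages = [30, 45, 60, 90, 130]
points = [38, 42, 55, 60, 83]
slope, intercept, r_squared = fit_line(wages, points)
print(f"slope = {slope:.2f} points per GBP 1m, R^2 = {r_squared:.2f}")
```

The slope plays the role of the .46 figure above, and R-squared plays the role of the "about two-thirds" figure.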

Steve noted that Manchester United’s point total for the season was actually seventeen points higher than the simple wage model predicted. I immediately jumped on this as evidence of the “Sir Alex Ferguson” effect. His long experience and eye for youthful, underpriced talent contributed to his ability to get more points out of his players, regardless of their quality as proxied by salaries.

What about Chelsea? If you look at Steve’s graph, you’ll see that its data point lies below the regression line. (Insert gloating smirk-face here). That is, given the considerable amount by which the Chelsea roster lightened Abramovich’s checkbook, the team finished the season with fewer points than the model predicted. Why? Well, you tell me. What variables would help to improve the explanatory power of the model? No matter what you come up with, it’s clear that Jose Mourinho & Co. didn’t manage their costly resources very well.
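The "above the line" and "below the line" comparison is just a residual: actual points minus the points the wage model predicts. A small sketch, assuming the .46 slope quoted above and an intercept of 30 that is purely hypothetical (the post does not report one); the two clubs and their numbers are invented as well.

```python
def residual(actual_points, wages, slope, intercept):
    """Actual points minus the points predicted from wages alone."""
    predicted = intercept + slope * wages
    return actual_points - predicted


# Slope from the post; the intercept is an ASSUMPTION for illustration.
slope, intercept = 0.46, 30.0
# Hypothetical clubs (NOT the real 2006-7 figures): name -> (wages GBP m, points)
clubs = {"Club A": (90, 85), "Club B": (130, 80)}
for name, (club_wages, club_points) in clubs.items():
    r = residual(club_points, club_wages, slope, intercept)
    label = "over-performed" if r > 0 else "under-performed"
    print(f"{name}: {label} by {abs(r):.1f} points")
```

A positive residual is the Manchester United case; a negative one is the Chelsea case.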

Models like this are a good starting point in helping to isolate the factors associated with a club’s success. But you have to remember that the analyst is always constrained by the data that are available. It would have been great to have had several years’ worth of data so that a time-series analysis could be performed. Then you could “build a more dynamic model,” according to Steve. For example, you could measure the extent to which last year’s success influences the current year’s salaries (because a good year means more money from shirt sales, or whatever), which as we’ve seen have a big impact on the current year’s success.
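If several seasons of data were available, the first step toward that dynamic model would be lining up each season's wages with the previous season's points. A rough sketch of that pairing, with a hypothetical helper `lagged_pairs` and an invented three-season history for a single club:

```python
def lagged_pairs(seasons):
    """Pair each season's wages with the previous season's points.

    `seasons` is a list of (points, wages) tuples in chronological order
    for one club. The first season is dropped because it has no predecessor.
    """
    return [
        (prev_points, wages)
        for (prev_points, _), (_, wages) in zip(seasons, seasons[1:])
    ]


# Hypothetical history for one club (NOT real data): (points, wages in GBP m)
history = [(50, 40), (62, 48), (58, 55)]
print(lagged_pairs(history))
```

Each resulting pair could then be fed into a regression of current wages on last year's points.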

Many thanks to Steve for putting his statistical expertise to work on the salary-performance relationship. (I won’t thank him too much because I know that he’d gladly spend all his waking hours doing sports stats analysis instead of that options volatility stuff).

A final word: fortunately, no model can completely explain success on the pitch. As someone who followed baseball and (American) football first, to my eyes soccer is inherently difficult to quantify. I hope it stays that way. Soccer’s beauty would be diminished if all of its mysteries were revealed through regression analysis.

And now it’s time for Germany-Poland. Then, a few hours on my knees praying that Argentina beats us by no more than three or four goals…

20 Responses to “EPL Wages Revisited: Fun with Statistics, Part II”

  1. Steve Says:

    Thanks, Susan, for giving us an insightful interpretation of the results. You were right that my goal was not to come up with the best possible model to explain a team’s points. That would have taken data that I couldn’t possibly have found. My real purpose was more modest: What can you say about the variation in points among clubs using player wages alone; then how did the clubs perform given this wage-determined norm.

    You’re also right that the statistics of soccer tell little of the story. They say almost nothing about the flow and beauty of the play, the quality of the chances, and the relative contributions of those on the field. Baseball is so different in that regard. Sure, there’s often only an inch of difference between a hit and an out, but there are so many repetitions that the positive noise in the numbers cancels with the negative noise in the averages. A box score tells most of the story of a game, in my view. With soccer, there’s so much more than just the statistics. Still, since statistics are kind of my thing, I’m drawn to them even when I know they’re limited in what they say.

    Didn’t we have a discussion once about the system used in the Ukraine where they determined scientifically the optimal positions of the players? I think Franklin Foer said it even kind of worked for a while. We both decided, though, that this was not good for the game. I only mention this in case anyone thinks we’d ever attempt to reduce the game to less than it is through any of our own analyses.

  2. Susan Says:

    How generous of you not to poke holes in my interpretation of your results. Since I’ve forgotten most of what I once knew about regression analysis (which wasn’t very much), it helps me to simplify it as much as possible. Of course, it’s far more interesting to apply those techniques to something I actually care about, like sports. Macroeconomic analysis seems like such a terrible waste of perfectly good techniques.

    Yes, I should re-read that chapter from Foer’s book about the Ukraine–that system seemed as pointless and unappealing as one of those old Leontief input-output models that tried to explain the macroeconomy. Soccer, like life, is better when it’s unpredictable.

  3. Steve Says:

    I like your Leontief analogy, Susan. You not only kept to the economics theme, you also mentioned a model that is clunky, outdated, and over-engineered. Did you also know that Leontief’s mother was from Odessa? You even got the Ukrainian tie-in. (We can find linkages just about anywhere we look now with Wikipedia just a few clicks away.)

    Also, as you already know, I’m like Gomez Addams when Morticia speaks French whenever you talk about R-squareds and regressions.

  4. Susan Says:

    Too funny–I had no idea that Leontief was connected with Ukraine!

    Fortunately, that’s the only way in which you resemble Gomez Addams.

  5. turkish Says:

    “it’s clear that Jose Mourinho & Co. didn’t manage their costly resources very well”

    Unfortunately, financial clout cannot deal with a number of injuries to key players at key times, outside the scope of the relevant transfer windows. Mr Abramovich may have several billions liquid and ready to invest in players, but as soon as the transfer window closes he’s as good as the man outside on King’s Road selling pies.

    The African Cup of Nations would also have an effect on player availability, which would have an adverse effect on the occasional year.

    Therefore a statistical analysis of the effect of wages should be measured per club on a game-by-game basis, with the total wages apportioned according to the minutes played by each player. A case in point would be Mr Ballack, who spent around 10 weeks out injured in the early stages of the season. His £120-130k per week is, in terms of points gained during this time, a complete waste. Equally, John Terry was sidelined for a third of the season; again, one of the club’s top earners.

    Other factors should be considered when dealing with the effect of wages too. The North/South divide, for example, would suggest that London clubs pay a percentage on top of what non-London clubs pay.
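The minutes-weighted idea in this comment can be sketched as a "wages on the pitch" figure for a single match. This is only one possible reading of the suggestion: the helper name `wages_on_pitch`, the 90-minute denominator, and the player names and wages are all assumptions for illustration.

```python
def wages_on_pitch(weekly_wages, minutes_played, match_minutes=90):
    """Weekly wages scaled by each player's share of the match played.

    A player missing from `minutes_played` (injured, away on international
    duty, or simply not selected) contributes nothing for that match.
    """
    return sum(
        wage * minutes_played.get(player, 0) / match_minutes
        for player, wage in weekly_wages.items()
    )


# Hypothetical squad (NOT real wage data): weekly wages in GBP.
weekly_wages = {"Player A": 120_000, "Player B": 60_000, "Player C": 30_000}
# Player A is injured and does not appear; Player C plays half the match.
minutes_played = {"Player B": 90, "Player C": 45}
print(wages_on_pitch(weekly_wages, minutes_played))
```

Summing this figure over a season, instead of using the total wage bill, would discount salaries paid to players who were unavailable.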

  6. Susan Says:

    You are quite right that a wage/payoff analysis should consider more factors than the simple relationship between the two variables in this model. This was a simple regression (just one independent variable), that was performed on a small sample. The model was constructed in response to a Wall Street Journal article that I had referenced in an earlier post. (http://online.wsj.com/article/SB121210088060330885.html) The article noted that Chelsea had the highest wages in the EPL, which led to the questions regarding what they got for their money.

    It would be best to check this model’s performance over time because it’s more meaningful to identify a consistent pattern of over- or under-performance. And that is a good point you made about the North-South difference in cost of living. I’m assuming that London is substantially higher than Manchester…I know it’s much higher than a place like Portsmouth.

    Wouldn’t a player’s potential participation in the African Cup of Nations or any other tournament be factored into his negotiated salary? If not, it should. And a similar market adjustment must be made for players who are more injury-prone (or likely to be injured, as in the case of a workhorse like John Terry). Actually, since any player on any club can be injured and thus unproductive, then all wages must reflect that possibility.

    Thanks for your comment! Somehow this type of analysis is more fun when it’s all about football.

  7. turkish Says:

    thank you for your reply..

    “Wouldn’t a player’s potential participation in the African Cup of Nations or any other tournament be factored into his negotiated salary? If not, it should.”

    I agree it should, although I see it as extremely unlikely that conditions and bonuses, or in fact any detailed financial information relating to wages, would be given out by the EPL’s clubs, so it would be extremely difficult to factor this in in terms of finance. But the difference on the field of play was quite noticeable, in my opinion at least.

    I work in statistics daily, but football statistics and analysis is the most fun. I would love to set up a football statistics agency of some kind and put my knowledge and passion for football together… interested?

  8. Susan Says:

    Why yes indeed, I am interested. I think the main issue, to which you allude, is gaining access to detailed financial information. Many (most?) of the EPL clubs are publicly traded–here in the US that means that their financial statements are available. I don’t know about UK financial reporting laws, though.

    But the person who is far more qualified is Steve, who is an econometrician by training and temperament (haha). We’ve often wondered if statistical analysis could be applied to football in any meaningful way. It’s gained quite a following in our (US) baseball over the years, due to the nature of that sport.

    Got to run, but I’ll give this some more thought!

  9. Steve Says:

    With his astute comments about statistical biases and possible remedies, Turkish is clearly a member of the footymetric club. Good to see. Susan, you do well to remind us of the gray areas that remain, though. Our R-squareds are never 100%, it seems.

    Anyway, I’d love to pursue anything to do with that scant overlap between football and statistics. As evidence of my interest, I dug through the archives of your old Soccer Orb site, Susan, and pulled out a forecast model I built to predict the winner of the last World Cup:

    http://soccerorb.spaces.live.com/blog/cns!FFBDF659300A9C15!325.entry

    The primary determinants were home field advantage, goals scored and goals allowed in prior matches, and the amount of rest since the last match. It did predict the winner (not that a sample of 1 is anything to shout about), but then also did well over the sample of games used to estimate the strength of each effect.

    Turkish, if you’re at all serious about your interest here, maybe we could find some joint projects. Data is often a problem, of course, but there must be a few interesting topics to study given what’s currently at hand. Why don’t you shoot off an e-mail to Susan’s address, clio562003@yahoo.com , if you’d like to follow up. There’s at least a chance we could get past your apparent Chelsea sympathies. ;-)

  10. turkish Says:

    Sorry for the delay in getting back to you… I’ll drop an email to you soon; annual holiday and a hectic lifestyle have kept me away for longer than I would care for.

    did you enjoy Spain vs Germany?

  11. Volodya Says:

    I believe it makes sense to count managers’ wages and transfer expenses.

    I am also doing a similar project as part of my MBA business-analytics class.

    It is really difficult to find any information; I would appreciate it if you could point me the way.

  12. Susan Says:

    Hi Turkish,
    No problem…blogging has actually become low-priority for me as well these days.

    Hmm…Spain v. Germany…I was supporting Germany (since Holland were knocked out)…but the Spaniards were a pleasure to watch. They outplayed the Germans on every level and I think their victory was well-deserved. El Nino’s goal was in part due to some poor goalkeeping by Lehmann. That said, Lehmann and the Germans were lucky, as the score quite easily could have been 3-0 or 4-0 in favor of Spain. Had Torres not scored when he did, he would surely have had a goal later in the match, as he was on fire.

    Spain were the class of the tournament from the first day out and are very deserving champions.

    Volodya,

    Good point about the manager’s wages and transfer expenses. However, I am not sure if those data were included with the data from the Deloitte report that Steve used to run his regression. (I am sure that you saw the original blog post that presented his regression results). I will ask Steve if he remembers the website where he found the player salary data. I will warn you that what was freely available on the web was very limited. The full cost of the Deloitte report for those interested in buying it was something like 600 pounds! Have you already tried the published financial reports of publicly traded clubs?

    If anything, including managers’ salaries and transfer fees would only show that Chelsea underperformed even more, given how active that club has been in the transfer market in recent years and their high-profile, presumably expensive, manager.

  13. Volodya Says:

    I am from Ukraine and was a Russia fan at the EURO, especially after Holland’s demolition in the quarterfinals. Anyway, Spain is definitely a deserved champion this time.

    For the 2006/7 season I have found and am playing with:
    - team turnover
    - total team wages
    - wages/turnover ratio (you never know :) and it looks like there is a negative dependence)
    - Coach wages
    - Number of foreigners in the team
    - Home stadium capacity
    - Average attendance (rubbish factor)

    Can you think of any other factors to consider?

    Yes, I am a Chelsea fan – should we also include the wealth of the owner as a factor? :)

  14. Steve Says:

    It looks like Volodya has got a very interesting data set to analyze. I can imagine several of those variables being significant explanators of league success.

    One thing you might find, Volodya, is difficulty in sorting out the individual influences of these variables when considered in combination with one another. For instance, if turnover and wages and the ratio between them are all included in the same regression, it will be difficult for the least squares algorithm to parse the effects. Whenever explanatory variables are highly correlated, this can be a challenge, as you may know.

    I can only hope that if you are successful in your modelling, that you don’t share your results with Chelsea to improve their chances. They’ve been too strong a rival for my side, Man U, even without additional help. ;-)
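One quick way to see the problem described in this comment is to check how strongly two candidate explanatory variables move together before putting both into the same regression. A minimal sketch using a hand-rolled Pearson correlation; the turnover and wage figures are hypothetical, not the data discussed in this thread.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical clubs (NOT real data), both series in GBP millions.
turnover = [60, 90, 120, 200]
wages = [35, 50, 70, 120]
print(f"corr(turnover, wages) = {pearson(turnover, wages):.3f}")
```

A value close to 1 (or -1) suggests the two variables carry nearly the same information, which is when least squares struggles to separate their effects.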

  15. Susan Says:

    Volodya,
    I’m excited to hear from someone from Ukraine…I’m not sure that anyone from so far away has dropped by before.

    I think Steve is talking about multicollinearity? It’s been a while since I’ve done anything with regression myself, but Steve is a pro. What exactly is your dependent variable? Number of wins per season, or winning percentage? And you’re using only data from the EPL? So was the # of foreigners variable statistically significant? I assume the sign on the coefficient was positive?

    As for a suggestion, what about a lagged variable? Something from the previous season…maybe the prior year’s turnover or a dummy variable that was one if the team participated in Champions League/UEFA Cup or made it to the quarters of the FA Cup and zero otherwise? Because of the effect of that prior year’s revenue (that’s turnover, I think) on the ability to buy talent for the current year?

  16. Volodya Says:

    Yes, multicollinearity is an issue. Had to throw out “team turnover” and “average attendance”. Not sure yet on “wages/turnover ratio”

    For now I have this set of parameters:
    - total team wages
    - Coach wages
    - Number of foreigners in the team
    - Home stadium capacity

    Here is an idea for other parameters to analyze:
    - Manager experience (in years)
    - Max individual salary in the team
    - average team age
    - number of individual wins per club (trophies won by squad members)

    Obviously, I have difficulty finding data. Managers’ wages were tough to get.

    Susan, yes, I am planning to use previous-year data. So far it’s not coming along. Previous success influences the current year’s budget in much the same way as owner investments, etc., and I cannot find a clear dependency.

    Steve, with all due respect, MU won too much last year. Our turn now :)
    Shevchenko (from my home city) should wake up with Scolari…

  17. Steve Says:

    This probably won’t make you happy, Volodya, but Susan has been saying for a few years now that Shevchenko should come play here in Chicago where we live. Every once in a while the US league is able to entice a star to play the latter stages of his career for one of our teams. We figured Chicago would be a good place for him because we have a fairly sizable Ukrainian population living here. That, plus we’re always glad to see a star of his caliber playing close to home.

  18. Volodya Says:

    Steve, this world is small. I lived in Chicago for almost 2 years, coming back to Ukraine a year ago. There is a good Chicago Fire soccer team and they have a very nice stadium. I saw Chelsea play the US Clubs team there in 2006.
    Shevchenko’s wife is American, so there is a big chance he will come to play his last year or two in the US. Lucky you :)

    I’ve completed my analysis and was able to eliminate the multicollinearity and build a regression. For EPL 2006/7, points depend on team wages and manager experience, with a good coefficient of determination of 77%.

    If you are interested in more details let me know your email.

  19. Susan Says:

    Since you’ve lived in Chicago, Volodya, you know that there’s some great shopping on the Magnificent Mile–that should tempt Shevchenko’s wife, don’t you think? We Chicagoans don’t think it’s fair if all the big European stars go to LA and NY.

    We were at that game in 2006 as well! I was very impressed that Chelsea’s top stars all played in that match–John Terry, Lampard, and I think Shevchenko (he’d just joined the club). I was delighted when the MLS stars defeated them.

  20. Steve Says:

    Hi Volodya. Small world, indeed. Like Susan said, we did see that same game. My recollection was that Shevchenko played for one half.

    If it’s not an inconvenience for you, I would be curious to see your regression results. My email is steve at matcap dot com. Thanks!
