The Correlation Analysis of Scored Goals and Red Cards


Data analysis is becoming increasingly important in the contemporary sports industry. Its rapid growth stems from the ability to process many kinds of data and extract insight that can change how a sport is perceived. The main topic of this research is the effect of red cards on goal productivity over the years.

The idea of using penalty cards to communicate a referee’s intentions was invented by British referee Ken Aston in 1966 (FIFA, 2002). In the quarter-finals of the 1966 World Cup, England met Argentina at Wembley Stadium. During the match, referee Rudolf Kreitlein sent off the Argentinian player Antonio Rattin. However, the referee could not communicate his decision to Rattin due to language barriers. This incident pushed FIFA to adopt language-neutral penalty cards for every competition, and they remain in use to this day (FIFA, 2002).

This research includes three major parts:

  • The first part compares the use of red cards over the last ten seasons. It also uses a statistical forecasting method to predict the number of red cards for the two upcoming seasons.
  • The second part examines whether there is a statistically significant relationship between goals scored and red cards in the four major soccer leagues included in the sample.
  • The last part goes deeper. The sample is split not only by year and league but also into home and away goal productivity. The aim is to test whether playing at home or away has a statistically significant impact on goal productivity in matches with and without red cards.

The datasets that have been gathered, cleaned and used cover the last 10 seasons (from 2007/08 up to 2016/17) of the 4 major championships in Europe: the English Premier League, the Italian Serie A, the Spanish La Liga and the German Bundesliga. The total number of matches included in the sample is 15,200.

Some factors that have to be considered before proceeding into the analysis are the following:

  • The data analysis techniques have been applied regardless of the number of the total red cards in a match. If a red card is given in a match, that match is included in the sample.
  • The data that has been used in the process contains the matches that have at least one red card penalty regardless of the minute that this incident took place in.
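The inclusion rule above can be sketched in Python as follows. The record layout (the field names `league`, `home_reds` and `away_reds`) and the sample rows are hypothetical and chosen only for illustration; they are not taken from the research dataset.

```python
from collections import defaultdict

# Hypothetical match records; field names and values are illustrative only.
matches = [
    {"league": "La Liga", "home_reds": 1, "away_reds": 0},
    {"league": "La Liga", "home_reds": 0, "away_reds": 0},
    {"league": "Bundesliga", "home_reds": 0, "away_reds": 2},
    {"league": "Bundesliga", "home_reds": 0, "away_reds": 0},
]

def has_red_card(match):
    """True if at least one red card was shown in the match,
    regardless of how many were shown or in which minute."""
    return match["home_reds"] + match["away_reds"] >= 1

def red_cards_per_game(records):
    """Average number of red cards per match, grouped by league."""
    cards = defaultdict(int)
    games = defaultdict(int)
    for m in records:
        cards[m["league"]] += m["home_reds"] + m["away_reds"]
        games[m["league"]] += 1
    return {league: cards[league] / games[league] for league in games}

flagged = [has_red_card(m) for m in matches]
averages = red_cards_per_game(matches)
```

The flag is deliberately binary: a match with two red cards falls into the same group as a match with one, which mirrors the rule stated above.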

The graph below illustrates the average number of red cards per game in these four major championships during the last ten seasons. The Spanish La Liga leads with an average of 0.268 red cards per game, and the Italian Serie A comes second with 0.25. The German Bundesliga is third with 0.152 red cards per game, and the English Premier League is fourth with 0.141.

The graph shows that over these ten seasons, from 2007/08 up to 2016/17, the Spanish La Liga and the Italian Serie A averaged considerably more red cards than the German Bundesliga and the English Premier League. The biggest difference occurred in the 2012/13 season, when Spanish referees showed 74% more red cards than their colleagues in Germany.

Using the historical data illustrated in the graph above, the analysis can estimate the approximate number of red cards per game for the next two seasons in these four European soccer leagues.

The technique used is forecasting, an important data analysis tool for statistical prediction (Duke University, 2012). The confidence level used is 95%.
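The text does not state which forecasting model was applied, so the following is only a minimal sketch under an assumed model: a straight-line trend fitted by least squares, with a rough 95% band of 1.96 residual standard deviations around each forecast. The function name `linear_trend_forecast` and the input values are hypothetical, not the study's data or method.

```python
import math

def linear_trend_forecast(values, steps=2, z=1.96):
    """Fit y = a + b*t by ordinary least squares and extrapolate.

    Returns a list of (forecast, lower, upper) tuples. The bounds are a rough
    95% band of +/- z residual standard deviations; they ignore the extra
    uncertainty that comes from estimating a and b.
    """
    n = len(values)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, values)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    out = []
    for k in range(steps):
        point = intercept + slope * (n + k)
        out.append((point, point - z * sigma, point + z * sigma))
    return out

# Hypothetical red cards per game for ten past seasons (not the study's data).
history = [0.18, 0.17, 0.16, 0.17, 0.15, 0.14, 0.15, 0.13, 0.13, 0.12]
seasons = ["2017/18", "2018/19"]
for season, (point, low, high) in zip(seasons, linear_trend_forecast(history)):
    print(f"{season}: {point:.3f} (lower {low:.3f}, upper {high:.3f})")
```

A smoothing-based model would produce different point forecasts and bounds; the sketch only shows how a forecast value, a Lower Bound and an Upper Bound relate to one another.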

The Premier League’s graph shows a predicted decrease as well as a slight increase in red cards for the upcoming seasons. The forecast for the next season (2017/18) is 0.124 red cards per game, with a Lower Bound of 0.084 and an Upper Bound of 0.164. For the 2018/19 season the forecast average is 0.122 red cards per game, with a Lower Bound of 0.082 and an Upper Bound of 0.162.

For Serie A, a slight increase in red cards is also expected for the next two seasons according to the historical data. More specifically, for the 2017/18 season the forecast is 0.213 red cards per match, with a Lower Bound of 0.186 and an Upper Bound of 0.240. For the 2018/19 season the forecast increases further to 0.245 red cards per match, with a Lower Bound of 0.209 and an Upper Bound of 0.282.

For the Spanish La Liga the forecast includes both a slight increase and a slight drop. For the 2017/18 season the forecast is 0.213 red cards per match, with a Lower Bound of 0.157 and an Upper Bound of 0.269. For the 2018/19 season, by contrast, the forecast drops to 0.202 red cards per match, with a Lower Bound of 0.146 and an Upper Bound of 0.258.

The largest forecast changes for the next two seasons apply to the German Bundesliga. For the 2017/18 season the forecast drops to 0.119 red cards per match, with a Lower Bound of 0.078 and an Upper Bound of 0.160. In the 2018/19 season, on the other hand, the forecast rises to 0.157 red cards per match, with a Lower Bound of 0.115 and an Upper Bound of 0.200.

The second part of the research tests whether goal scoring differs significantly between the matches that had at least one red card and the matches that had none.

The graph above illustrates the average goals scored per match in matches where at least one red card was shown. The Premier League average over the last 10 seasons is 1.914, by far the lowest among the leagues in the sample. Under the same condition the average is 2.169 goals for the Italian Serie A, 2.181 goals for the German Bundesliga and 2.297 goals for the Spanish La Liga.

The graph above shows the goal productivity of the same leagues in matches where no red card was shown. The results differ considerably from the previous graph. The German Bundesliga ranks first with almost 3 goals (2.937) per match, the Spanish La Liga follows with 2.847 goals per match, the Premier League stands at 2.727 goals and Serie A comes last with 2.7 goals per match under the same criteria.

The t-test is used to check whether the difference in goals scored between matches with red cards and matches without red cards is statistically significant. The t-test (also called Student’s t-test) compares two averages (means) and indicates whether they differ from each other. In other words, it indicates how likely it is that the observed difference happened by chance (Statistics How to, 2012).

The variances of the samples are unequal, so the appropriate test is the heteroscedastic two-tailed t-test. The confidence level is 95%. For every test in this part the Null Hypothesis (H0) is the following: mean1 = mean2. The table below shows the p-value for every league in the sample.

Premier League    Serie A     La Liga     Bundesliga
5.80E-05          6.43E-07    1.12E-06    4.46E-08

The results are clear for all the sampled leagues: every p-value is far below 0.05, so the Null Hypothesis is rejected. The difference between goals scored in matches with at least one red card and in matches with no red cards is statistically significant at the 95% confidence level.
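The heteroscedastic two-tailed t-test described above (often called Welch's t-test) can be sketched with the Python standard library. The function name `welch_t_test` and the goal counts are hypothetical. The p-value below relies on a normal approximation to the t-distribution, which is an assumption that only holds well for large samples such as the thousands of matches in this study; the tiny lists here merely keep the example short.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(sample1, sample2):
    """Two-tailed t-test for two samples with unequal variances (Welch).

    Returns (t, df, p). The p-value uses a normal approximation to the
    t-distribution, which is reasonable only when df is large.
    """
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (mean(sample1) - mean(sample2)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p

# Hypothetical goals per match, for illustration only.
goals_with_red = [1, 2, 0, 3, 2, 1, 2, 4, 1, 2]
goals_without_red = [3, 2, 4, 3, 1, 5, 2, 3, 4, 3]

t, df, p = welch_t_test(goals_with_red, goals_without_red)
reject_h0 = p < 0.05  # 95% confidence level, H0: mean1 = mean2
```

For small samples an exact t-distribution p-value would be preferable to the normal approximation used here.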

The last section of the research splits the two categories analyzed above further: home goal productivity and away goal productivity, each for matches in which that side had at least one player sent off and for matches in which it had none.

The graph above shows the average goals scored by the home team in matches where at least one home player was sent off. The averages per match were 1.345 goals for the Spanish La Liga, 1.216 goals for the Italian Serie A, 1.189 goals for the German Bundesliga and 1.093 goals for the English Premier League.

By contrast, over the last ten seasons the average goals scored by home teams that had no player sent off were as follows: 1.664 goals per match in the Spanish La Liga, 1.662 goals in the German Bundesliga, 1.545 goals in the English Premier League and 1.545 goals in the Italian Serie A.

The t-test is used again to check whether the two situations above differ statistically. The confidence level is set at 95%. For these tests the Null Hypothesis (H0) is the following: mean1 = mean2. The table below shows the p-value for every league in the sample.

Premier League    Serie A     La Liga     Bundesliga
1.49E-04          4.39E-05    9.20E-05    9.85E-10

The results are once again clear. At the 95% confidence level, there is a statistically significant difference between the goals the home team scored in matches where at least one home player was sent off and in matches where no home player was sent off.

The second half of this section examines the goal productivity of away teams in matches where at least one away player was sent off and in matches where no away player was sent off.

The graph above illustrates the average goals per match for away teams that had at least one player sent off. The numbers are, as expected, lower than those of the home side: the German Bundesliga had 0.992 goals, the Italian Serie A 0.954 goals, the Spanish La Liga 0.952 goals and the English Premier League 0.822 average away goals over the last ten seasons under the same criteria.

The values in the graph above are slightly higher than in the previous one. In this situation, no player from the away side was sent off. The German Bundesliga had 1.275 goals, the Spanish La Liga 1.183 goals, the Italian Serie A 1.155 goals and the English Premier League 1.147 away goals on average over the last ten seasons under the same criteria.

The t-test is once again used to check whether the means differ statistically. The confidence level is set at 95%. The Null Hypothesis (H0) remains the same: mean1 = mean2. The table below shows the p-value for every league in the sample.

Premier League    Serie A     La Liga     Bundesliga
2.38E-04          5.46E-04    1.46E-06    8.37E-04

Once again, according to the t-test at the 95% confidence level, there is a statistically significant difference between the goals away teams scored when they had at least one player sent off and when they kept all their players on the pitch until the end of the match.
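The home and away split used in this last section can be sketched as follows. The record layout (the field names `home_goals`, `away_goals`, `home_reds` and `away_reds`), the helper `split_goals` and the sample rows are hypothetical, chosen only to illustrate how each side's goals are grouped by that same side's red cards.

```python
from statistics import mean

# Hypothetical match records; field names and values are illustrative only.
matches = [
    {"home_goals": 1, "away_goals": 2, "home_reds": 1, "away_reds": 0},
    {"home_goals": 3, "away_goals": 0, "home_reds": 0, "away_reds": 1},
    {"home_goals": 2, "away_goals": 1, "home_reds": 0, "away_reds": 0},
    {"home_goals": 0, "away_goals": 1, "home_reds": 1, "away_reds": 1},
]

def split_goals(records, side):
    """Split one side's goals by whether that same side had a player sent off.

    side is "home" or "away". Returns (goals_with_red, goals_without_red).
    """
    with_red, without_red = [], []
    for m in records:
        target = with_red if m[f"{side}_reds"] >= 1 else without_red
        target.append(m[f"{side}_goals"])
    return with_red, without_red

home_with, home_without = split_goals(matches, "home")
away_with, away_without = split_goals(matches, "away")

# Average goals per group, i.e. the kind of means compared by the t-test.
home_means = (mean(home_with), mean(home_without))
away_means = (mean(away_with), mean(away_without))
```

Each pair of lists would then be passed to the same two-sample test used throughout the article.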

This research produced several notable results. First, the comparison of historical red card data showed that the four leagues fall into two groups with a large gap between them. Second, the forecasting method produced the predicted number of red cards per match for every league over the next two seasons. Third, the t-test confirmed a statistically significant difference in goal productivity between matches with and without a red card. Finally, the last part showed that red cards were a scoring drawback for the teams that received them, whether they played at home or away.

BIBLIOGRAPHY

Duke University, 2012. Steps in choosing a forecasting model. [Online]
Available at: https://goo.gl/XqWzgS

FIFA, 2002. Ken Aston – the inventor of yellow and red cards. [Online]
Available at: https://goo.gl/5GMHWj

Statistics How to, 2012. T Test (Student’s T-Test): Definition and Examples. [Online]
Available at: https://goo.gl/n6e6pM

Statistics How to, 2016. Moving Average: What it is and How to Calculate it. [Online]
Available at: https://goo.gl/hKVkCG
