Skip to the content.

Analyzing the Relationship Between Player’s Character Pool Size and Winning Rate

This is the Final Project for DSC80 at UCSD

website

Author(s):Junhao Yang,Zifan Luo

Introduction

League of Legends (LoL) is one of the most popular multiplayer online battle arena (MOBA) games globally, with millions of players and a thriving professional esports community. The dataset used in this project, sourced from Oracle’s Elixir, contains detailed match data from professional LoL esports matches throughout 2022. The main goal of our project is to explore the relationship between a player’s character pool size and their winning rate. By exploring this relationship, we aim to question the common assumption that mastering a diverse range of characters potentially leads to higher match success. Our analysis will provide valuable insights into whether a player’s character pool size plays a more significant role in determining match outcomes.

The dataset contains 150180 rows and 161 columns, offering comprehensive insights into gameplay metrics, player performance, and team dynamics in professional matches.The first column is the ‘gameid’, which uniquely identifies each match. The remaining 160 columns contain various game metrics including individual player statistics, match outcomes, and in-game events. Among all the rows in the dataframe, each ‘gameid’ corresponds to up to 12 rows – one for each of the 5 players on both teams and 2 containing summary data for the two teams.

Here’re the relavent columns:

Data Cleaning and EDA

Data Cleaning

First of all, we only get the columns needed for the hypothesis testing(position,playername,playeris,champion,result) and rows that only contain players’ information and store them in df1. Next, we also get the columns needed for the prediciton model(gameid,league,teamname,teamid,gamelength,dpm,eanred gpm,vspm,pick1-5)and only include rows about teams and store everything in df2.

For df1, we remove all the rows that contain NaN and add a column played showing the number of games played and group the dataset by playerid and playername. Besides, we also add a column num_characters that represents the number of unique characters each player used. Another column weighted_num_characters(calculated by shannon_diversity)is added to represent the weighted number of characters of each player. Players that played for more than 20 games are kept in the dataframe. Lastly, an additional columnwinning_rateis calculated to display the winning rate of each player and the dataframe is sorted from the highest weighted_num_characters. Below is the final cleaned df1_temp(first 5 rows):

position played result num_characters weighted_num_characters winning rate
sup 77 58 22 4.11723 0.753247
jng 59 32 15 3.27323 0.542373
jng 46 29 14 3.30532 0.630435
top 77 59 16 3.66457 0.766234
jng 35 24 8 2.77841 0.685714

For df2, we just remove all the rows that contain NaN. Here’s the final cleaned dataframe:

gameid league teamname teamid gamelength dpm earned gpm vspm pick1 pick2 pick3 pick4 pick5
8401-8401_game_1 LPL Oh My God oe:team:f4c4528c6981e104a11ea7548630c23 1365 1762.02 1326.02 7.1209 Jinx Jarvan IV Nautilus Syndra Gwen
8401-8401_game_1 LPL ThunderTalk Gaming oe:team:df80f468a3f9a722df056fe9104f052 1365 1337.01 1021.41 6.8132 Xin Zhao Thresh Aphelios Vex Jax
8401-8401_game_2 LPL Oh My God oe:team:f4c4528c6981e104a11ea7548630c23 1444 2482.52 1586.26 7.4792 Jinx Xin Zhao Rakan Rumble Corki
8401-8401_game_2 LPL ThunderTalk Gaming oe:team:df80f468a3f9a722df056fe9104f052 1444 1459.65 1040.78 6.6066 Lee Sin Leona Ziggs Gangplank Twisted Fate
8402-8402_game_1 LPL FunPlus Phoenix oe:team:33d17f3717f58e12a3da80b377221fb 1893 1719.94 1373.19 8.6846 Jinx Viego Thresh Corki Graves

Univariate Analysis

We performed Univariate Analysis on weighted_num_characters column of df1_temp to examine its distribution.

The distribution appears to be approximately normal, with most values clustering around 3.0–3.5. There are fewer players with extremely low or high values of weighted_num_characters, indicating that most players maintain a moderate range of champion diversity.

We also look at distribution of gamelength of df2

The distribution is slightly skewed to the right, with most games lasting around 1800–2000 units. There are fewer games that are very short or very long,most matches have standard length.

Bivariate Analysis

We performed Bivariate Analysis on the weighted_num_charactersand winning_rate of df1_temp.

The points are widely spread, and there doesn’t appear to be a strong positive or negative linear correlation between the two variables. This suggests that having a higher or lower weighted number of characters may do not strongly influence winning rate.

We also performed the Analysis on the played and num_characters to see the relationship between the number of games played and the variety of characters utilized by the players.

There is a weak positive trend where players who have played more games tend to use a slightly larger variety of characters. However, this relationship is not strong, as there is considerable variance in the number of characters used for any given number of games played.

Interesting Aggregates

We aggregated the df1_temp dataset by position. We summed the number column and averaged the remaining columns. Here’s the outcome:

position played winning rate num_characters weighted_num_characters number
bot 51.4418 0.494947 13.4299 3.26211 421
jng 49.8433 0.496166 13.0853 3.24513 434
mid 50.7281 0.493927 15.3783 3.5254 423
sup 48.9165 0.49916 13.7007 3.29368 431
top 50.9108 0.500978 14.9859 3.44844 426

We can tell from this dataframe that the midlane players(mid) stand out for their large character pool,highlighting the need for flexibility in this role. Also, toplane players have the highest winning rate, suggesting that success in this position may be influenced by factors such as individual performance.

Assessment of Missingness

NMAR Analysis

In our dataset, we believe that the missingness of patch column can be seen as Not Missing At Random(NMAR)** because its absence may be inherently tied to the nature of the games being hosted. Specifically, the missingness could arise due to the fact that some games might be played or recorded on patches that have not been officially released or documented at the time of data collection.

Missingness Dependency

In this part, we’re going to test if the missingness of splitis depend on league. The significance level we chose would be 0.05 and the test statistic would be the Total Variance Distance (TVD).

league split_missing = False split_missing = True
ASCI 0 0.0221094
CBLOL 0.0265052 0
CBLOLA 0.0235602 0
CDF 0 0.0227069
CT 0 0.00776815

After we performed a permutation test, we found the observed test statistic(TVD) for the test is 0.814 and the p-value is 0. Here’s the visualization that shows the empirical distribution of the TVD for the test:

As the p-value is less than the significance level(0.05), so we can reject the null hypothesis. Thus, the missingness of split column actully depends on the league column.

The second permutation test we performed is to determine whether the the missingness of split depends on position

position split_missing = False split_missing = True
bot 0.166667 0.166667
jng 0.166667 0.166667
mid 0.166667 0.166667
sup 0.166667 0.166667
team 0.166667 0.166667

The observed tes statstic(TVD) was found to be 0 and p-value equals to 1. We can visualize the empirical distribution:

As the p-value is bigger than the significance level, so we fail to reject the null hypothesis that the missingness of split does not depend on position.

Hypothesis Testing

In the hypothesis testing, we aim to assess whether there is a correlation between the weighted_num_characters and winning rate.

Since the p-value is smaller than the significance level 0.05, so we can reject the null hypothesis.This result indicates that the observed correlation is statistically significant, and there is likely a true relationship between the two variables.

Framing a Prediction Problem

In the competitive world of League of Legends (LoL), each league, such as LCK (Korea), LCS (North America), and LPL (China), represents not only a geographical region but also a unique identity, defined by its playstyle, champion preferences, and strategic approach. In this last section, we want to build a model which could predit the league a team belongs to based on gameplay statistics and performance metrics(dpm,earned gpm,gamelength,vspm,pick1-5). This is a classification problem, specifically a multiclass classification task, as teams can belong to multiple leagues such as LCK, LCS, LPL or LEC. The response variable would be league as understanding a team’s league based on gameplay metrics can provide insights into regional gameplay styles, strategies, and performance characteristics. This prediction can also help in analyzing trends and differences across leagues. The evaluation metric that we chose for our model is Macro F1 score as it balances well the two important aspects of classification(precision&recall) and also it handles the class imbalance well,which is critical for our dataset, as the distribution of classes is uneven. It is also well-suited for our multiclassifer probelm as it is calculated by taking the average of the F1 scores for each class in a multi-class problem. By focusing on the Macro F1 score, we ensure that our model evaluates performance fairly across all league.

Baseline Model

We used Random Forest Classifier as our baseline model following 4 features including: dpm,earned gpm,gamelength,vspm. Among these four features, all of them are quantitative. Since they are all quantitative variables, we did not perform any encoding at this stage. After fitting the model,we evaluated our model using the Macro F1 score, which provides a better measure of performance.The Macro F1 score for our model is 0.309. This relatively low Macro F1 score tells us that the model’s challenges with correctly classifying less-represented leagues, particularly due to a lower recall for these classes, which indicates that many true positives were missed. We would consider this model is mot yet ‘good’ enough for practical use(but at least it’s slightly better than random wild guess). Improvements in feature selection&engineering and model optimization are necessary to create a more reliable classifier.

Final Model

For our final model, we added another 5 features: pick1,pick2,pick3,pick4,pick5(These features represent the champion picks for a team in a match). Champion picks are a key component of a team’s strategy and are often region-specific. Certain champions may be prioritized or avoided in specific leagues due to meta trends, player preferences, or regional playstyles. Including these features allows the model to capture strategic differences across leagues, which is critical for predicting a team’s league affiliation. We still used a Random Forest Classifier. Ad the added features are categorical variables, so we first OneHotEncoded them. And then we performed 5-fold cross-validation using Macro F1-score as the evaluation metric and tested the following hyperparameter max_depth values: [2, 3, 5, 7, 10, 20, 50, 100, 200, 500, None]. We found out that the best-performing max_depth was 50, achieving a mean cross-validation Macro F1 score of 0.33. It was still a pretty low number but definitely a slight imporvement from our baseline model with an imporved Macro F1 score. Let’s take a look at the confusion matrix:

As we can tell from the graph, the model struggles to predict LCS and LEC accurately. We need to improve our model so it could perform better on minority classes (e.g., LCS and LEC). Addressing class imbalance through techniques like oversampling or class weighting could improve predictions for these leagues. Additional features specific to LCS and LEC gameplay strategies may help reduce misclassifications.

Fairness Analysis

We want to see if this model is fair among different groups, especially for teams in LCK, LPL and teams in LEC, LCS. GroupX is LCK and LPL (Major Asian leagues) and GroupY is LEC and LCS(Western leagues). Evaluation Metric is accuracy, specifically the difference in accuracy between Group X (LCK and LPL) and Group Y (LEC and LCS).

The followings are our hypothesis:

After conducting the test, we found out that the Observed accuracy difference is about 0.6686 and p-value is 0. The p-value is smaller than the significance level. Thus, we can reject the null hypothesis and conclude that our model is Fair!