INTRODUCTION
One of the key challenges faced by basketball coaching, scouting, and other staff is identifying players with similar characteristics. Building a successful team often involves targeting and signing players with specific traits that complement one another, creating a versatile offensive lineup. Similarly, it is crucial to form lineups that exploit the opponent’s defensive weaknesses. In this study, we analyze player data from the 2023-2024 EuroLeague season to cluster players based on the similarities in their performance and identify the common characteristics shared within these groups.
METHODOLOGY
K-MEANS CLUSTERING
In this research, we employ an unsupervised machine learning technique known as K-means clustering. This algorithm groups unlabelled observations in a dataset into clusters by adhering to two key principles: a) All data points within the same cluster should be highly similar and b) Different clusters should be clearly distinct from one another.
K-means achieves this by iteratively assigning each data point to its nearest centroid and then recomputing each centroid as the mean of the points assigned to it. A crucial step in this process is determining “K,” the number of clusters into which the dataset will be divided. Several methods exist to identify a suitable value of K. For this study, we use the popular Elbow Method. This approach calculates the Within-Cluster Sum of Squares (WSS) for various values of K (ranging from 1 to 10 in this study). Because WSS generally keeps decreasing as K grows, the chosen K is not the one with the lowest WSS but the “elbow” of the WSS-versus-K plot: the point beyond which adding another cluster yields only a marginal further decrease.
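The assignment of points to their nearest centroid and the WSS computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the random initialization, and the toy points are all invented here, and in practice a library implementation such as scikit-learn's KMeans would typically be used instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means (Lloyd's algorithm). Returns (centroids, labels, wss)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct observations picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    # WSS: total squared distance of every point to its own centroid.
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


# Toy usage: two obvious groups of 2-D points (invented for illustration).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k in (1, 2, 3):
    _, _, wss = kmeans(points, k)
    print(k, round(wss, 2))
```

Because the result depends on the starting centroids, library implementations usually rerun the algorithm from several initializations and keep the run with the lowest WSS.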
DATA
This research focuses on analyzing players’ offensive performance during the 2023-2024 EuroLeague Basketball season. Various indices across different categories are considered, including scoring efficiency inside and beyond the arc, foul-drawing ability, ball handling, and offensive rebounding. Our objective is to create multiple clusters of players, allowing us to identify and present common characteristics within each cluster. Additionally, we aim to uncover insights such as whether players in similar positions tend to belong to the same cluster and how each cluster can be characterized. Notably, attributes such as age, height, nationality, or position are excluded from the clustering features, as the focus is purely on on-court performance; position is used only afterwards, to interpret the resulting clusters.
DATA CLEANSING AND PREPARATION
Excluding outliers and duplicates
Our initial dataset consists of 306 observations, corresponding to all players who were part of a game squad at least once during the regular season. Of these, 47 players had less than 25 minutes of total playing time and were excluded as outliers. Among the 267 remaining observations, a few correspond to players who appeared for more than one team (e.g. Shabazz Napier, Ante Zizic, Martin Hermannsson). For those players, we keep only the rows containing the stats for the team with which they finished the season.
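The filtering described above can be sketched as follows. The row layout is hypothetical: the keys "player", "team", "minutes", and "last_game" (an ISO date used to decide which team a player finished the season with) are assumptions made for this example, as is the choice to sum minutes across teams before applying the 25-minute threshold.

```python
MIN_MINUTES = 25  # threshold stated in the text


def clean_rows(rows):
    """Drop low-minute players, then keep one row per remaining player."""
    # Total playing time per player, summed over all of their teams (assumption).
    total_minutes = {}
    for row in rows:
        name = row["player"]
        total_minutes[name] = total_minutes.get(name, 0) + row["minutes"]

    latest = {}
    for row in rows:
        name = row["player"]
        if total_minutes[name] < MIN_MINUTES:
            continue  # excluded as an outlier
        # ISO dates compare correctly as strings, so ">" means "more recent".
        if name not in latest or row["last_game"] > latest[name]["last_game"]:
            latest[name] = row
    return list(latest.values())


# Toy rows with placeholder names (not real data).
rows = [
    {"player": "Player A", "team": "Team X", "minutes": 300, "last_game": "2023-12-01"},
    {"player": "Player A", "team": "Team Y", "minutes": 200, "last_game": "2024-04-12"},
    {"player": "Player B", "team": "Team X", "minutes": 12, "last_game": "2024-01-05"},
    {"player": "Player C", "team": "Team Z", "minutes": 640, "last_game": "2024-04-11"},
]
cleaned = clean_rows(rows)
```

Here Player B is dropped for insufficient minutes, and Player A is represented only by the Team Y row.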
DATA NORMALIZATION
Next, all statistics are scaled to a per-40-minute basis, since total playing time differs between players. Each feature is then standardized to have a mean of 0 and a standard deviation of 1. This step is essential to ensure that features such as EFG% or Offensive Rating have the same impact as features such as Assists or 3-pointers attempted, because K-means relies on distances between observations and is therefore sensitive to features measured on different scales.
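Both transformations are simple enough to show directly. The sketch below is illustrative only: the helper names are invented, the population standard deviation is an assumed choice, and it assumes that per-40 scaling is applied to counting totals while features that are already rates (such as EFG%) only need the standardization step.

```python
from statistics import mean, pstdev


def per_40(total, minutes):
    """Rescale a season total to a per-40-minute rate."""
    return total * 40 / minutes


def standardize(values):
    """Z-score a feature: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - mu) / sigma for v in values]


# Toy example: assists and minutes for three hypothetical players.
assists = [120, 45, 200]
minutes = [800, 450, 1000]
assists_per_40 = [per_40(a, m) for a, m in zip(assists, minutes)]
assists_z = standardize(assists_per_40)
```

After standardization, a value of 0 corresponds to the average player in the dataset, and positive values indicate above-average output in that feature.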
APPLICATION OF K-MEANS CLUSTERING METHOD
After completing the prerequisites for applying the K-means algorithm, we proceed to the analysis, which consists of two steps: a) determining the optimal number of clusters, known as “K”, and b) splitting the players into K clusters.
DETERMINING THE OPTIMAL “K”
The Elbow Method was employed to determine the optimal value of K, with the results illustrated in Figure 1. As shown, the curve becomes noticeably flatter beyond K = 6, indicating only a minimal decrease in the Within-Cluster Sum of Squares (WSS) between consecutive points. Consequently, dividing the data into six clusters strikes a balance between within-cluster similarity and keeping the number of clusters small enough to interpret.
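In this study the elbow was read off the plot, but the same idea can be approximated numerically. The sketch below uses one possible heuristic, which is an assumption and not the authors' procedure: it returns the first K after which adding a cluster reduces WSS by less than a chosen fraction of its current value. The function name, the 10% threshold, and the WSS values are all invented for illustration and are not the study's results.

```python
def elbow_k(k_values, wss_values, min_relative_drop=0.10):
    """Return the first K after which the relative WSS decrease is below the threshold."""
    for i in range(len(k_values) - 1):
        if wss_values[i] == 0:
            return k_values[i]  # WSS cannot improve any further
        drop = (wss_values[i] - wss_values[i + 1]) / wss_values[i]
        if drop < min_relative_drop:
            return k_values[i]
    return k_values[-1]  # the curve never flattened within the tested range


# Made-up WSS curve for K = 1..6 (toy numbers, not from the study).
toy_k = [1, 2, 3, 4, 5, 6]
toy_wss = [100, 60, 30, 28, 26, 25]
chosen = elbow_k(toy_k, toy_wss)
```

The threshold is a judgment call, which is one reason the elbow is often confirmed visually or with a second criterion.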

RESULTS
The result is six clusters of different sizes, ranging from 20 to 68 players. The clusters are visualized in Figure 2, which shows the distribution of players along with a few named examples. Players closer to the centroid of a cluster are considered representative of that cluster’s defining characteristics. Some clusters appear closely aligned or partially overlapping (e.g., the fourth cluster shares similarities with three others), while the second cluster stands out as the most distinct. This overlap highlights players who could plausibly be classified into more than one cluster, typically versatile “All-Around” contributors who do well in several areas such as scoring, rebounding, and passing.
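Two of the ideas above, representativeness and overlap, can be made concrete with distances to the centroids. The following is a rough sketch with invented helper names and placeholder data: the first function ranks a cluster's players by distance to its centroid, and the second measures how much closer a player is to the nearest centroid than to the second nearest (a small margin suggests a player who sits between clusters).

```python
import math


def most_representative(players, centroid, top_n=3):
    """Names of the players closest to the centroid.

    `players` maps a name to its standardized feature vector.
    """
    ranked = sorted(players, key=lambda name: math.dist(players[name], centroid))
    return ranked[:top_n]


def assignment_margin(point, centroids):
    """Gap between the distances to the second-nearest and nearest centroids."""
    distances = sorted(math.dist(point, c) for c in centroids)
    return distances[1] - distances[0]


# Placeholder players in a 2-feature space (not real data).
cluster = {"Player A": [0.1, 0.2], "Player B": [1.5, -0.4], "Player C": [0.0, -0.1]}
centroid = [0.0, 0.0]
top_two = most_representative(cluster, centroid, top_n=2)
```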
Red Cluster
The first cluster (Red) is the smallest. It contains players with very limited involvement on the offensive end (only 15% of them have an above-average USG%) who rarely made their presence felt. Moreover, for the majority of these players, 2FG%, 3FG%, and contribution through assists are low or very low. Probably the most famous names are Pangos and Baron, two players who were key contributors for their teams some years ago.
Yellow Cluster
The second cluster (Yellow) consists mostly of Centers who are good finishers inside the paint (usually above 60% in 2FG%), perform above average in Offensive Rebounding, and often get to the free-throw line, but who rarely attempt 3-pointers (or shoot a low percentage on them) and contribute little in assists. Notable examples include Milutinov, Hines, Vesely and Tavares, along with a handful of Power Forwards such as Moneke and Sikma.
Green Cluster
The third cluster (Green) includes Forwards and Centers who can threaten the opposing defense both inside and beyond the arc (Peters, Pleiss, Nunnally, Yabusele) and who occasionally add a few rebounds or assists.
Dark Blue Cluster
Most of the players in the fourth cluster (Dark Blue) are role players who are usually the 3rd or 4th offensive option for their teams. This is also the largest cluster and contains a variety of roles (such as 3&D players or Stretch 4’s). They are relatively efficient scorers and do not commit many turnovers, but they also do not grab many offensive rebounds or hand out many assists. Some names in this cluster are J. Hernangomez, Papanikolaou, Melli, Abrines and Kalinic.
Light Blue Cluster
The fifth cluster (Light Blue) could be described as Guards who are decent in most categories without excelling in any of them. They are good at creating points through assists and can also make some 3-point shots, but almost two thirds of the players in this cluster are below average in Offensive Rating, indicating that they are not particularly efficient scorers. In this group we find names such as Jokubaitis, Williams-Goss and Miller-McIntyre, as well as defensive specialists (Calathes, J. Grant and Walkup).
Pink Cluster
Finally, in the sixth cluster (Pink) we find players who run their team’s offense and are the go-to guys when it comes to scoring: players with high Usage Rate, Points, Floor Percentage and Scoring Possessions, among other indices. As expected, stars like Nunn, Sloukas, Mirotic, Larkin and James belong to this cluster.
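The cluster descriptions above rely on summaries such as the share of a cluster's players who are above average in a given index. A minimal sketch of that kind of profile is shown below; the function name, the "usg" key, and the numbers are placeholders, and "average" is assumed to mean the mean over all players in the dataset.

```python
from statistics import mean


def cluster_profile(rows, labels, feature):
    """Per cluster: mean of `feature` and share of members above the overall mean."""
    overall = mean(row[feature] for row in rows)
    profile = {}
    for label in sorted(set(labels)):
        values = [row[feature] for row, lab in zip(rows, labels) if lab == label]
        profile[label] = {
            "mean": mean(values),
            "share_above_avg": sum(v > overall for v in values) / len(values),
        }
    return profile


# Placeholder usage rates for four players split into two clusters.
rows = [{"usg": 10}, {"usg": 22}, {"usg": 25}, {"usg": 30}]
labels = [0, 0, 1, 1]
profile = cluster_profile(rows, labels, "usg")
```

Running the same function over every feature gives a compact table from which each cluster's defining traits can be read.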

CONCLUSION
In this research we saw that players in EuroLeague Basketball can be categorized into 6 clusters (or 5 if we exclude the smallest) based on their offensive characteristics. A few interesting conclusions emerged: the uniqueness of some players (e.g. S. Rodriguez, M. Howard and C. Moneke lie at the edges of their clusters); the similarity between others who fall into the same cluster (such as F. Campazzo, K. Nunn, K. Sloukas and S. Larkin, who are not only stars for their teams but also appear to perform at a comparable level); and the fact that we barely find any Power Forwards or Centers with a high Usage Rate.
Image source: http://eurohoops.net


