The purpose of our research is to ascertain the key drivers of professional soccer player valuation in the transfer market.
Drawing on sports economics, finance and management literature, we connect data-driven approaches to player valuation in the context of organizational decision-making. We evaluate the performance of four predictive models using over 800 real-world transfer fee records and an extensive set of features in the “Big Five” leagues from seasons 2017–2018 to 2019–2020. Subsequently, we leverage Shapley Additive Explanations (SHAP) values, an interpretable machine learning (ML) technique, to identify important features and quantify their contributions to transfer fees.
A few fundamental human capital factors (e.g. age) and labor market variables (e.g. contract remaining) emerge as the key value drivers, outweighing technical capabilities (e.g. goal-scoring). Sport-general features (e.g. composure and reaction) hold greater predictive power than soccer-specific skills (e.g. dribbling).
Our research enhances the explainability and transparency of a reasonably accurate player valuation model in two ways. First, we utilize a rich set of interpretable, fine-grained player features. Second and more importantly, SHAP values allow us to deconstruct player valuation and provide economic interpretations of feature importance at both individual and aggregate levels. We also outline the practical implications of adopting interpretable ML in sports organization decision-making.
Introduction
The transfer is a distinctive economic mechanism of the professional soccer market, in contrast to U.S. franchise systems (e.g. National Football League, National Basketball Association), which are characterized by player-for-player trades. Soccer player transfers play a pivotal role in boosting team performance (Carmichael and Thomas, 1993) and are a pillar of club business models (Neri et al., 2023). A transfer refers to the movement of a player between two clubs, specifically the acquisition of their “performance rights” (Majewski and Majewska, 2017), and often incurs a transfer fee. Table 1 summarizes the transfer mechanisms that introduce heterogeneity into acquisition costs. Free transfers and player loans do not require transfer fees. The market has a relatively low turnover rate: only a modest number of players are transferred for fees in each transfer window. Total costs sometimes also include bonuses and agent commissions.
Comparable to yet conceptually different from the transfer fee, market value is a theoretical construct: an estimated price a club would be willing to pay for a player rather than a realized price, independent of an actual transfer (Herm et al., 2014; Balliauw et al., 2022). The underlying assumption is that every player has an “objective” market value determined by various characteristics such as age, reputation, past performance, and future potential. However, such a value seldom exists in an imperfect market with information asymmetries, manifested in behind-the-scenes negotiations and hidden player personality issues (Follert and Gleißner, 2024). A widely adopted benchmark in transfer negotiations is the crowd-sourced market value published on Transfermarkt.com. Some studies have raised concerns about the validity and reproducibility of this “wisdom of the crowd” approach (for a systematic critique of market values on Transfermarkt.com, see Coates and Parshakov, 2022).
The challenges of predicting a soccer player’s transfer fee or market value are inherent in the valuation of multi-attribute strategic assets under uncertainty, be it human capital, venture capital, or real estate. Decision-makers need to mentally process a vast amount of information. Some turn to the “availability heuristic” (Tversky and Kahneman, 1973) or “fast-and-frugal heuristics” (e.g. emphasis on recent performance or name recognition) (Gigerenzer, 2023; Raab and Gigerenzer, 2015). These can be effective cognitive shortcuts. While clubs also solicit expert opinions, the precise valuation criteria remain opaque. Such valuations may be subject to an anchoring bias (e.g. overreliance on record fees or media speculation) and have room for improvement.
Against this backdrop, our overarching research question is:
What are the key drivers of player valuation in the soccer transfer market?
Our motivation is twofold. First, the bulk of prior sports analytics research has focused on performance evaluation, tactical analysis, fan engagement, or match scheduling (Katz et al., 2020; McHale et al., 2012; Pappalardo et al., 2019; Stambulova et al., 2007). However, player performance metrics and transfer fees have not been fully integrated, and many player features remain unexplored (Gerrard, 2016; Watanabe et al., 2021). This underutilization of available information may cause inefficiencies in the transfer market (Gerrard, 2014). Second, achieving both predictive accuracy and transparent explanations is a non-trivial empirical undertaking. Traditional multiple regression may oversimplify the relationships between transfer fee drivers (Herm et al., 2014; Shmueli and Koppius, 2011), whereas black-box predictive models lack the transparency necessary for building trust with stakeholders (Bauer et al., 2023; Lundberg et al., 2018; Wanless and Naraine, 2023).
By addressing these gaps, we make two major contributions. First, we curate a rich dataset comprising fine-grained player skills (e.g. reaction, composure) and advanced performance metrics, and we match these features with real-world transfer fees rather than estimated market values, ensuring data validity and practical relevance. Our research also differs from the closely related study by McHale and Holmes (2023). We focus on the “Big Five” leagues (English Premier League, Italian Serie A, Spanish La Liga, German Bundesliga, and French Ligue 1), which account for approximately two-thirds of global transfer spending (Poli et al., 2022). In addition to several coarse performance metrics, we use a wide set of granular, interpretable features such as composure and ball control, adding specificity to our results. McHale and Holmes (2023) identify generic overall player ratings as a strong predictor of transfer fees, but which specific player features contribute to these ratings remains unclear.
Second, we leverage Shapley Additive Explanations (SHAP) values, an interpretable Machine Learning (ML) technique, to identify key drivers of player valuation at the individual and market levels. SHAP values explain player valuation by visualizing the most important features and quantifying their marginal contributions, a novel addition to sports management (Garnica-Caparrós and Memmert, 2021). Moreover, feature importance measured by SHAP values translates directly into monetary values and therefore has an economic interpretation. By contrast, McHale and Holmes (2023) quantify feature importance with the gain score, a variance-based metric that is hard to interpret in practice and does not indicate the direction of a feature’s influence (i.e. positive or negative).
Figure 1 presents our framework connecting data-driven approaches to strategic decision-making in sports organizations; the rest of the article is organized around it. From the top down, the Related Research section primarily focuses on the two theoretical lenses (human capital and pricing), with a brief discussion of analytics-based management paradigms. From the bottom up, the Data and Methodology section describes our observed features and predictive models coupled with an interpretable ML technique. The Results section presents the empirical results. Finally, we lay out managerial implications, limitations, and future research directions.
Related Research
Players are the most valuable assets of a soccer club through the lens of human capital (Breuer et al., 2021). Their knowledge, skills, and abilities are intellectual capital and value drivers (Ployhart et al., 2014; Rubio Martin et al., 2022; Wright et al., 1995). Players’ human capital is of entrepreneurial value to clubs (Radaelli et al., 2018). The identification and acquisition of human capital are the means by which sports organizations succeed athletically and financially.
A strand of literature has identified a handful of human capital factors as salient value drivers: age, nationality, popularity, injury history, match appearances (employee seniority), and playing position (Bryson et al., 2013; Carreras-Simó and García, 2022; Franck and Nüesch, 2008, 2012; Frick, 2007; Herm et al., 2014; Montanari et al., 2008; Pedace, 2008; Shapiro et al., 2017; Stambulova et al., 2007; Kuper and Szymanski, 2009). Popularity can stem either from intrinsic on-field performance or from extrinsic off-field celebrity status on social media (Franck and Nüesch, 2008, 2012; Garcia-del-Barrio and Pujol, 2007; Herm et al., 2014; Shapiro et al., 2017; Rai et al., 2021). The latter often leads to star players earning disproportionately more than their peers, a phenomenon known as the “superstar effect” (Garcia-del-Barrio and Pujol, 2021; Hofmann et al., 2021).
Human capital extends to physiological (e.g. speed, stamina) and psychological (e.g. game intelligence) features (Ali, 2011; Rein and Memmert, 2016; Williams, 2000; Reilly et al., 2000). Ambidexterity (i.e. two-footedness), a special skill, is associated with higher transfer premiums and wages (Bryson et al., 2013; Frick, 2007). These features interact synergistically to affect player value (Ployhart et al., 2014).
At the market level, several contextual factors influence transfer fees. Bargaining power is arguably the most critical theoretical construct, and the time remaining on a player’s contract is a simple measure of it. More years remaining in a player contract afford the seller club stronger bargaining power (Campa, 2022), because the Bosman ruling eliminates transfer fees upon contract expiration. Bargaining power could also be assessed by clubs’ league rankings or financial resources (Franks et al., 2016; Frick, 2007; Serna Rodríguez et al., 2019; Tunaru et al., 2005). Clubs in wealthy leagues (e.g. the English Premier League) often acquire talent from development-oriented “farm” leagues like the German Bundesliga (Matesanz et al., 2018). Following a recent systematic review (Franceschi et al., 2024), we summarize player value drivers into individual-level human capital factors and market-level contextual factors in Table 2.
Three classic pricing theories assign monetary values to human capital: value-based hedonic pricing, risk-aware option pricing, and auction theory. Rosen (1974, p. 34) states that hedonic prices “are defined as the implicit prices of attributes and are revealed to economic agents from observed prices of differentiated products and the specific amounts of characteristics associated with them”. A soccer player possesses a repertoire of features (e.g. human capital factors) that differentiate them from peers, so their transfer fee would be the aggregate of the implicit, hedonic values of these utility-generating features. Option pricing conceptualizes a player as a risky asset (Coluccia et al., 2018; Kedar-Levy and Bar-Eli, 2008; Majewski and Majewska, 2017). From this perspective, age is analogous to the asset lifecycle: as a player matures and gains experience, their value appreciates, peaks in the middle or late career, and then begins to decline. Injuries are uncertain risks that could rapidly erode a player’s value. According to auction theory, transfer negotiations resemble asymmetric bidding processes (Rottenberg, 2000): transfer fees are the outcome of a bargaining process in which bargaining power is unevenly distributed among buyers. Multiple buyers can submit bids, and the seller can accept its preferred one. If a buyer activates a player’s release clause, however, the seller is obliged to approve the transfer and no auction occurs.
To identify key performance indicators (KPIs), many sports organizations adopt data-driven, analytics-based paradigms in strategic decision-making, especially in consequential human resource management processes such as hiring (Alamar and Mehrotra, 2011; Davenport, 2014; Gavião et al., 2023; Gerrard, 2007; Fry and Ohlmann, 2012). ML-assisted recruitment in organizational contexts entails an interplay of technical and social aspects (Sturm et al., 2023). Some sports organizations are reluctant to shed light on proprietary models and certain KPIs for fear of losing a competitive edge (Coleman, 2012; Memmert and Raabe, 2023; Watanabe et al., 2021). Proponents of analytics also often clash culturally with traditionalists who have a deep appreciation of the game (Alamar and Mehrotra, 2011). Transparency of ML predictions, enhanced by interpretable ML techniques, helps analytics departments communicate the rationale behind predictions to stakeholders (e.g. general managers, sporting directors, executives), facilitating ML adoption and fostering trust in algorithms (Coussement et al., 2024; Lolli et al., 2025; Zhou et al., 2025).
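To make the hedonic view concrete, one common empirical specification is log-linear. It is shown here only as an illustrative assumption, not as the model estimated in this study; $F_i$ denotes player $i$'s transfer fee, $x_{ik}$ the $k$-th attribute, and $\beta_k$ its implicit price on the log scale:

```latex
\ln F_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i
\quad\Longleftrightarrow\quad
F_i = e^{\beta_0} \left( \prod_{k=1}^{K} e^{\beta_k x_{ik}} \right) e^{\varepsilon_i}
```

Under this form, a one-unit increase in $x_{ik}$ multiplies the fee by $e^{\beta_k}$ (roughly a $100\beta_k$ percent change when $\beta_k$ is small), so the fee is an aggregate, here a product, of attribute-specific implicit values.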
Data and Methodology
Data sources
We combine three widely used, publicly available soccer data sources, Transfermarkt.com, FBRef.com, and Sofifa, to curate a comprehensive dataset for player valuation (Herm et al., 2014; Müller et al., 2017; Payyappalli and Zhuang, 2019). Real-world transfer fees provided by Transfermarkt.com are our target variable. FBRef.com offers real-world performance data such as playing time, goals, assists, and the advanced performance metrics expected goals (xG) and expected assists (xA) (Rathke, 2017). Sofifa supplies physiological, psychological, and technical skill ratings from the FIFA video game series. Known for their fidelity, Sofifa data are commonly repurposed for research tasks such as ML-based valuation (Al-Asadi and Tasdemır, 2022), fairness assessment (Awasthi et al., 2021), and player potential prediction (Vroonen et al., 2017; Carpita et al., 2021; McHale and Holmes, 2023; Wakelam et al., 2022).
We match players across data sources by name and date of birth. Because player names are often recorded inconsistently or irregularly, we then apply a probabilistic record linkage algorithm to non-exact matches, accepting matches with a similarity score above 0.5 (Stanojevic and Gyarmati, 2016). This results in a final dataset of 831 matched player transfers across the “Big Five” leagues from seasons 2017–2018 to 2019–2020. Appendix 1 provides the full list of features used in our research. Our feature selection combines theoretical justifications, domain expertise, and data availability. Following McHale and Holmes (2023), we exclude free transfers from the dataset.
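The two-stage matching step can be sketched as follows, under stated assumptions: records are compared only when their dates of birth agree, and names are then scored with Python's `difflib.SequenceMatcher` ratio after accent stripping. This string-similarity score is a simpler stand-in, not the probabilistic record linkage model used in the study; only the 0.5 threshold comes from the text, and the helper names and records are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.5  # acceptance cut-off stated in the text


def normalize(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Name similarity in [0, 1]; difflib ratio as a hypothetical stand-in
    for a probabilistic record-linkage score."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=SIMILARITY_THRESHOLD):
    """Link (name, date_of_birth) records from two sources.

    Stage 1 (blocking): only records with the same date of birth are compared.
    Stage 2: the most similar name is accepted if its score exceeds the
    threshold; identical normalized names score 1.0.
    """
    links = []
    for name_l, dob_l in left:
        candidates = [rec for rec in right if rec[1] == dob_l]
        if not candidates:
            continue
        best = max(candidates, key=lambda rec: similarity(name_l, rec[0]))
        score = similarity(name_l, best[0])
        if score > threshold:
            links.append(((name_l, dob_l), best, score))
    return links


# Hypothetical records whose spelling differs only by an accent.
source_a = [("José Exemplo", "1999-01-01")]
source_b = [("Jose Exemplo", "1999-01-01"), ("Jose Exemplo", "1998-05-05")]
print(link_records(source_a, source_b))
```

Blocking on date of birth keeps the number of name comparisons small and prevents two different players with similar names from being linked.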
Predictive modeling and experimental setup
Our objective is to approximate the underlying player valuation function $\mathbf{y} = f(\mathbf{X})$, where $\mathbf{y}$ is a vector of the target variable (i.e. transfer fees, log-transformed for modeling) and $\mathbf{X}$ represents a vector of features (i.e. value drivers). We apply four regression-based predictive models: Decision Tree (DT), Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost). We evaluate model performance by Root Mean Square Error (RMSE) and the coefficient of determination ($R^2$). We split the dataset into a training set (80%), on which we apply five-fold cross-validation, and a held-out testing set (20%).
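The evaluation protocol (80/20 split, five-fold cross-validation on the training portion, RMSE and $R^2$) can be sketched with the standard library alone. The helper names and the random seed are hypothetical; in practice a library such as scikit-learn (`train_test_split`, `KFold`, `mean_squared_error`, `r2_score`) would typically be used, with each candidate model fitted inside the fold loop.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train_test_split(n, test_share=0.2, seed=0):
    """Shuffle row indices once and hold out `test_share` of them for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_share)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split(831)  # 831 transfers, as in the text
# Each candidate model would be fitted on every fold's training indices and
# scored on its validation indices with rmse / r_squared.
fold_sizes = [len(validation) for _, validation in k_fold(train_idx)]
```

The testing indices are never touched during cross-validation, so the final RMSE and $R^2$ reported on them remain an out-of-sample estimate.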
Model interpretation
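We employ SHAP values to interpret feature contributions (Bauer and Anzer, 2021). SHAP values treat a model prediction as a cooperative game payout distributed among features based on their marginal contributions (Antwarg et al., 2021; Lundberg et al., 2018; Rudin, 2019; Garnica-Caparrós and Memmert, 2021). In general, the explanation model is defined as the following linear function:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j \tag{1}$$

where $z' \in \{0,1\}^M$ and $\phi_j \in \mathbb{R}$. $M$ denotes the number of all features. $z'_j$ is a binary variable indicating whether feature $j$ is present ($z'_j = 1$) or absent ($z'_j = 0$). $\phi_j$ is derived from Equation (2):

$$\phi_j = \sum_{S \subseteq F \setminus \{x_j\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x\left(S \cup \{x_j\}\right) - f_x(S) \right] \tag{2}$$

where $F$ is the set of all $M$ feature values, $x_j$ is a feature value of the instance (i.e. a player) being explained, $S$ is a subset of feature values excluding $x_j$, and $f_x(S) = E[f(x) \mid x_S]$ denotes the expected value of the function conditioned on $S$. SHAP values combine these conditional expectations to attribute to each feature its share $\phi_j$ of the prediction.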
The sum of SHAP values for an instance equals the difference between the prediction and the baseline, as stated in Equation (3):

$$\sum_{j=1}^{M} \phi_j = f(x) - \phi_0, \qquad \phi_0 = E[f(x)] \tag{3}$$

This additivity (local accuracy) property ensures that SHAP explanations are complete: the feature contributions exactly account for each prediction’s deviation from the baseline. SHAP values provide both aggregate-level and individual player-level interpretability to enhance transparency in the complex, high-stakes decision-making of player valuation.
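For a handful of features, Equation (2) can be evaluated exactly by enumerating every subset $S$, which also makes the additivity in Equation (3) easy to check. The sketch below is an assumption-laden illustration: it approximates $f_x(S)$ interventionally (features in $S$ fixed to the player's values, the rest drawn from background rows), and the toy model, feature names, and coefficients are hypothetical. Enumeration is exponential in $M$; for tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` computes these values in polynomial time instead.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Estimate f_x(S): fix features in `subset` to the explained instance's
    values, fill the rest from each background row, and average the model
    output (an interventional approximation of E[f(x) | x_S])."""
    total = 0.0
    for row in background:
        z = [x[k] if k in subset else row[k] for k in range(len(x))]
        total += model(z)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values per Equation (2), enumerating every subset S."""
    m = len(x)
    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):  # |S| runs from 0 to M - 1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = set(s)
                phi[j] += weight * (
                    coalition_value(model, x, background, s | {j})
                    - coalition_value(model, x, background, s)
                )
    return phi


# Hypothetical toy log-fee model on three features, with an interaction term.
def toy_model(z):
    contract_years, age, team_rating = z
    return 14.0 + 0.4 * contract_years - 0.08 * age + 0.01 * contract_years * team_rating


background = [[1, 28, 70], [3, 24, 75], [4, 22, 82]]  # hypothetical rows
player = [5, 19, 85]
phi0 = coalition_value(toy_model, player, background, set())  # baseline
phi = shapley_values(toy_model, player, background)
# Equation (3): baseline plus SHAP values reproduces the prediction.
assert abs(phi0 + sum(phi) - toy_model(player)) < 1e-9
```

For a purely linear model, this interventional version reduces to $\phi_j = \beta_j (x_j - \bar{x}_j)$, which is a quick sanity check on any implementation.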
Results
Table 3 presents the performance metrics for all four predictive models. XGBoost achieves the lowest RMSE and the highest $R^2$. Therefore, we base the subsequent SHAP value analysis on this model. Appendix 2 documents the tuning parameters of XGBoost.
Figure 2 decomposes the predicted log-transformed transfer fee of 18.162 (£77,206,947) for Matthijs de Ligt into its SHAP values. The base value of 15.422 (£4,985,279) is the average predicted log transfer fee in the testing set. Red arrows indicate features with positive SHAP values that increase the predicted transfer fee from the base value, while blue arrows represent features with negative SHAP values that decrease it. In de Ligt’s case, no top feature has a negative impact. The length of each arrow is proportional to the magnitude of its SHAP value. Taken together, these SHAP values explain the difference between the base and predicted values. Specifically, Team Rating and Age increase the predicted transfer fee by the widest margins. In the summer of 2019, de Ligt transferred to Juventus, a highly competitive and financially well-endowed club in the Italian Serie A. His young age also elevates the predicted fee. Other features with positive SHAP values include Movement Reactions, Games, Contract Remaining, Mentality Composure, Attacking Heading Accuracy, and International Reputation.
Next, we present SHAP values for the entire testing set to provide aggregate-level explanations. Figure 3 is a bar chart ranking features by their mean absolute SHAP values in descending order; features with higher average contributions are more influential in predicting transfer fees. Figure 4 attributes predictions to each feature on the vertical axis by mapping feature values to the corresponding SHAP values on the horizontal axis. In this plot, each dot is a player, its color reflects the magnitude of the feature value, and dots with similar SHAP values are stacked together. Overall, Contract Remaining emerges as the most important feature, followed by Team Rating and Age. More years remaining on a contract are associated with marked rises in the predicted transfer fee, as evidenced by the visible gap in the SHAP value distribution. Team Rating, almost on a par with Contract Remaining in importance, has a more continuous effect. A younger Age contributes to a noticeable increase in the predicted fee, and an older Age to a decrease. Mentality Composure has a narrow SHAP value distribution, and its positive relationship with the predicted fees is roughly proportional. Skill Ball Control displays a mixed result: most players have modestly negative SHAP values, and only a few high values are associated with disproportionately large increases in the predicted fees. Higher Minutes and Games values raise the predicted fees. Premier League (a dummy variable) has a clear positive effect. Akin to Skill Ball Control, some high Movement Reactions values boost the predicted fees. Lower Movement Sprint Speed values typically have negative SHAP values. xG makes a meager contribution to predicted transfer fees, and so does Skill Dribbling.
In addition, we use SHAP dependence plots (Figures 5–12) to further examine these relationships by visualizing the marginal effects of selected features on the predicted transfer fees, ceteris paribus. These plots highlight the nuanced and nonlinear nature of feature contributions. Players with less than three years remaining on their contracts tend to have negative SHAP values. In contrast, predicted transfer fees increase substantially when Contract Remaining exceeds three years. The nonlinear effect of Age turns negative beyond 26 and declines steadily thereafter. For Team Rating, SHAP values rise continuously between 70 and 80 and more slowly beyond 80. At the upper end of their ranges, especially between 70 and 80, Mentality Composure, Skill Ball Control, and Movement Reactions all have high SHAP values. SHAP values for Minutes jump between 1,500 and 2,000 and then fluctuate within a small range. A similar pattern holds for Games, where SHAP values turn positive only after 20 matches.
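Because the model predicts log fees, SHAP values combine additively on the log scale but multiplicatively in pounds. The short calculation below uses only the two values reported for de Ligt; the helper `fee_multiplier` is a hypothetical name for illustration. Note that exponentiating an average log fee yields a geometric mean, so the base value in pounds is not the arithmetic mean of predicted fees.

```python
import math

BASE_LOG = 15.422       # base value (average predicted log fee), from the text
PREDICTED_LOG = 18.162  # de Ligt's predicted log fee, from the text

base_fee = math.exp(BASE_LOG)            # about £4.99 million
predicted_fee = math.exp(PREDICTED_LOG)  # about £77.2 million

# Additivity (Equation 3): the SHAP values sum to the log-scale gap.
shap_total = PREDICTED_LOG - BASE_LOG
# On the pound scale the same gap acts as a multiplicative factor.
overall_multiplier = math.exp(shap_total)


def fee_multiplier(phi):
    """Factor by which one log-scale SHAP value scales the fee in the
    multiplicative decomposition fee = exp(base) * prod(exp(phi_j))."""
    return math.exp(phi)


print(f"base £{base_fee:,.0f}, predicted £{predicted_fee:,.0f}, x{overall_multiplier:.2f}")
```

Here the SHAP values jointly sum to 2.74 log points, a factor of roughly 15.5 over the base value; a single feature with a SHAP value of 0.1 contributes a factor of about 1.105, i.e. roughly a 10.5% uplift within that decomposition.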
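The global ranking in a bar chart of this kind reduces to averaging absolute SHAP values per feature across players. A minimal sketch, assuming a player-by-feature matrix of SHAP values; the numbers below are hypothetical and are not the study's results:

```python
def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean |SHAP| across players (rows), largest first."""
    n_players = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_players
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values (log-fee units) for three players.
features = ["Age", "Contract Remaining", "xG"]
shap_matrix = [
    [0.40, -0.90, 0.01],
    [-0.30, 0.80, -0.02],
    [0.20, 0.70, 0.00],
]
ranking = rank_by_mean_abs_shap(shap_matrix, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values matters: positive and negative contributions would otherwise cancel, and a feature such as Age in this toy matrix would look far less influential (signed mean 0.1 instead of 0.3).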
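Thresholds such as "more than three years remaining" or "beyond 26" can be read off a dependence plot by averaging SHAP values at each feature value and locating where the average changes sign. A minimal sketch with hypothetical data; for continuous features such as Minutes, values would first need to be binned, and real SHAP values are noisier than these:

```python
from collections import defaultdict


def mean_shap_by_value(feature_values, shap_values):
    """Average SHAP value at each distinct feature value, in ascending order."""
    groups = defaultdict(list)
    for value, phi in zip(feature_values, shap_values):
        groups[value].append(phi)
    return {value: sum(phis) / len(phis) for value, phis in sorted(groups.items())}


def sign_change_points(feature_values, shap_values):
    """Feature values where the mean SHAP value switches between positive and
    non-positive relative to the next smaller feature value."""
    means = list(mean_shap_by_value(feature_values, shap_values).items())
    return [
        value
        for (_, previous), (value, current) in zip(means, means[1:])
        if (previous > 0) != (current > 0)
    ]


# Hypothetical dependence-plot data: (years remaining, SHAP value) pairs.
contract_years = [1, 1, 2, 2, 3, 4, 5]
contract_shap = [-0.40, -0.50, -0.30, -0.20, -0.05, 0.30, 0.50]
print(sign_change_points(contract_years, contract_shap))
```

The same function works for decreasing relationships such as Age, where it reports the first value at which the average contribution turns non-positive.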
Discussion
We draw several key findings from our analysis. First, the prominence of fundamental human capital factors and labor market variables in predicting transfer fees contradicts the prima facie expectation that technical capabilities would have a larger influence (Dubois and Walzak, 2025). Notably, Age is among the three most influential value drivers. Games and Minutes, also key value drivers, measure on-field involvement and signal player experience, underscoring the importance of employee seniority (Franceschi et al., 2024; Frick, 2007; McHale and Holmes, 2023). Contract Remaining has the most substantial influence on predicted transfer fees. When a contract approaches its final stage, particularly the last two years, the incumbent club’s bargaining power weakens significantly, putting downward pressure on transfer fees. Team Rating measures the sporting strength of the acquiring club and might be a proxy for its financial capacity. Premier League is also a key driver, presumably due to the league’s strong purchasing power and club visibility (Matesanz et al., 2018).
Second, sport-general physiological and psychological features are more impactful than soccer-specific skills. Mentality Composure ranks as the fourth most important value driver, although high composure (e.g. above 80) yields diminishing returns. The physiological features Movement Reactions and Power Strength contribute more to predicted transfer fees than the domain-specific technique Skill Dribbling. This implies that dribbling, notwithstanding its hedonic value, may not generate commensurate utility from a club’s perspective. Among soccer-specific skills, Skill Ball Control is the largest contributor. While low to average values have a minor negative effect, exceptionally high Skill Ball Control has a disproportionately positive influence on transfer fees. This suggests that this skill, though less visible than dribbling or goal scoring, may be an undervalued core competency. The rank of xG highlights the usefulness of a more nuanced measure of attacking contribution in player valuation: goals are rare events, and raw goal counts are inflated in some leagues (Memmert and Raabe, 2023). In line with Franceschi et al. (2024), yellow or red cards and footedness do not significantly affect predicted transfer fees. We do not intend to downplay the importance of soccer-specific skills. Rather, these features individually have a minuscule influence on player valuation but collectively demonstrate considerable predictive power.
Practical implications
The adoption of SHAP values can be a starting point for human-AI collaboration in financial and strategic decision-making in sports organizations (Dubois and Walzak, 2025). Transparent SHAP-based explanations of multi-attribute asset valuations may not only engage stakeholders’ “System 2” thinking (slower, more deliberate, and analytical reasoning) but also reshape their information processing (Bauer et al., 2023). Analytics departments can use SHAP visualizations to illustrate how various features contribute to a valuation and to communicate with non-technical stakeholders. General managers and sporting directors can then determine whether data-driven assessments substantiate their intuitions or prompt them to adjust the weight given to specific information during due diligence. This human-in-the-loop approach could help recalibrate judgment and augment bounded rationality (i.e. domain-specific, idiosyncratic prior knowledge gained from experience).
On the demand side, a buyer club may estimate how much the features of a target player that align with its tactical or strategic needs would drive the transfer fee. Furthermore, a buyer can generate contrastive explanations of the valuations of similar players and implement “blind scouting” for personnel selection (Dubois and Walzak, 2025), comparing SHAP values of anonymized candidates to reduce popularity or nationality biases. Accordingly, a buyer can allocate its budget either to star players whose desirable features warrant a premium or to cost-effective players whose key features are above average (Beiderbeck et al., 2021; Toma and Campobasso, 2023). On the supply side, a seller club may better understand the diminishing returns or outlier effects of certain player features (e.g. Mentality Composure or Skill Ball Control). Clubs whose business model depends on selling academy graduates or key players could tailor talent development schemes to prioritize the features that would improve player brand image and help command higher fees (Hofmann et al., 2021).
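A contrastive explanation between two candidates can be sketched as the per-feature difference of their SHAP values. When both come from the same model and base value, these differences sum to the gap between the two predicted log fees. The candidate labels and values below are hypothetical. Hiding names in the comparison addresses display-level bias only; if the model itself uses nationality or popularity features, those would also have to be excluded from the model.

```python
def contrast(shap_a, shap_b):
    """Per-feature SHAP differences between candidates A and B, largest first.

    Both dicts map feature name -> SHAP value from the same model and base
    value, so the differences sum to prediction(A) - prediction(B).
    """
    names = set(shap_a) | set(shap_b)
    diffs = {n: shap_a.get(n, 0.0) - shap_b.get(n, 0.0) for n in names}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical anonymized candidates: only labels and SHAP values are shown.
candidate_1 = {"Age": 0.40, "Contract Remaining": 0.60, "Skill Ball Control": 0.05}
candidate_2 = {"Age": 0.10, "Contract Remaining": 1.00, "Skill Ball Control": 0.25}
result = contrast(candidate_1, candidate_2)
for feature, diff in result:
    print(f"{feature}: {diff:+.2f}")
```

Sorting by absolute difference surfaces the features that most separate the two valuations, regardless of which candidate they favor.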
Limitations and future research
Due to data availability, our feature selection omits contextual factors such as club-specific business strategies and market asymmetries. Inflation and the timing of transfers are outside the purview of our present research (Yang et al., 2024). Our methodology is most relevant to modeling negotiated transfer fees. A notable exception is the activation of release clauses, where the clause amount explains the transfer fee more directly than player features do. Controlling for such cases would add rigor and open a new research direction. Future research could generalize our methodology to free transfers, loans, or player swaps, exploring key drivers of wages instead of transfer fees. A more holistic approach would model not only transfer fees but also wages, performance bonuses, and even agent commissions. This could help soccer’s regulatory bodies investigate transparency and fairness in club expenditure from a financial fair play standpoint (Neri et al., 2023).
Another future research direction is to operationalize different dimensions of bargaining power through a transfer market network analysis (Liu et al., 2016; Matesanz et al., 2018). In such a network, nodes are clubs and directed edges show the flow of players or capital, with edge weights indicating the number of players moving from one club to another or the associated fees. Network centrality measures (e.g. in-degree, out-degree, closeness) could capture bargaining power beyond contract remaining. Lastly, future research could examine whether organizational fit (e.g. squad chemistry), that is, the alignment between player and team style, is a key driver in valuation (Al-Madi et al., 2016; Taylor and Giannantonio, 1993).
We acknowledge that theory-driven approaches, such as structural modeling and causal inference, are well-established research traditions and are integral to hypothesis testing. Our choice of predictive modeling as a research design reflects the growing trend in analytics-based sports management (Bogaert et al., 2017; McHale and Holmes, 2023; Watanabe et al., 2021; Yang et al., 2024), given the multidimensional nature of the data (e.g. a large number of features). Our methodology does not substitute for or preclude structural modeling or causal inference. Rather, it complements them and advances methodological pluralism. Structural models offer retrospective, theory-grounded insights, whereas predictive models provide proactive forecasts of transfer fees based on current player characteristics. By using SHAP values to explain feature influence on historical transfer fees, we bridge the gap between black-box predictions and economic interpretations.
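The simplest of these network measures can be sketched directly from a list of transfers. The sketch assumes each transfer is a directed edge from selling club to buying club weighted by its fee; the club names and fees are hypothetical. Path-based measures such as closeness would typically come from a graph library, for example NetworkX's `closeness_centrality`.

```python
from collections import defaultdict


def club_degrees(transfers):
    """Degree-style statistics from (selling_club, buying_club, fee) edges.

    sold / bought are the unweighted out- / in-degrees of the player-flow
    graph; fees_in / fees_out are the fee-weighted versions (money received
    and money paid).
    """
    stats = defaultdict(lambda: {"sold": 0, "bought": 0, "fees_in": 0.0, "fees_out": 0.0})
    for seller, buyer, fee in transfers:
        stats[seller]["sold"] += 1        # out-degree (players leaving)
        stats[seller]["fees_in"] += fee   # weighted out-degree (fees received)
        stats[buyer]["bought"] += 1       # in-degree (players arriving)
        stats[buyer]["fees_out"] += fee   # weighted in-degree (fees paid)
    return dict(stats)


# Hypothetical transfers, fees in millions.
transfers = [("Club A", "Club B", 50.0), ("Club A", "Club C", 20.0), ("Club C", "Club B", 30.0)]
for club, s in sorted(club_degrees(transfers).items()):
    print(club, s)
```

In this toy network, Club A acts purely as a seller and Club B purely as a buyer, the kind of asymmetry that such measures could link to bargaining power.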