General managers (GMs) and coaches of NBA teams make high-stakes decisions when assembling their rosters. With the average NBA player salary topping $6.62 million for the 2022-23 season ([1]), the need for data-driven insight is higher than ever. Our goal is to describe how NBA player salaries relate to performance metrics during the 2022-23 regular season. We focus on Points Per Game (PTS) and Minutes Played (MP) as primary predictors, along with other statistics such as FG, TRB, 3P%, TOV, WS, AST, and GP. This exploratory analysis aims to give key stakeholders a descriptive basis for data-driven decisions.
While prior studies use machine learning for salary prediction ([2, 3]), our Ordinary Least Squares (OLS) regression model prioritizes interpretability, to identify which performance indicators are most strongly associated with compensation in a descriptive (not causal) sense. Because basketball is a team sport, observations are likely dependent on one another, which violates the IID (independent and identically distributed) assumption. We nevertheless proceed with the analysis given our focus on descriptive insights. We will revisit these hypotheses in our conclusion to confirm or refine their validity.
For data wrangling, we began by obtaining the main dataset from Kaggle and iteratively scraping roster data for each team from Basketball Reference. We then added All-Star selections and merged the dataset with market size information. Duplicate entries, primarily from players traded mid-season, were identified and removed. The cleaned data was split into two parts: 30% for exploratory data analysis (EDA) and the remaining 70% for confirmatory analysis. The current analysis focuses on the EDA portion.
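A minimal sketch of the 30/70 split described above, assuming a seeded random shuffle (the text does not state how rows were assigned to each part). The function name, the seed, and the placeholder player list are hypothetical; the total of 466 is only the sum of the observation counts reported in the EDA tables (140) and the final model table (326).

```python
import random

def split_eda_confirmatory(rows, eda_fraction=0.30, seed=42):
    """Shuffle rows reproducibly and return (eda_rows, confirmatory_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary, hypothetical choice
    n_eda = round(len(shuffled) * eda_fraction)
    return shuffled[:n_eda], shuffled[n_eda:]

# Placeholder identifiers standing in for the cleaned player rows (assumption).
players = [f"player_{i}" for i in range(466)]
eda, confirmatory = split_eda_confirmatory(players)
print(len(eda), len(confirmatory))
```

Fixing the seed keeps the EDA and confirmatory sets stable across reruns, so no confirmatory rows leak into the exploratory work.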
The raw NBA dataset contains 140 players and 62 variables for the 2022-23 season. After cleaning, the dataset contains 140 active players with valid salary data. We selected 24 key variables, including performance metrics, player characteristics, and team information. Because Salary is heavily right-skewed, we created a log-transformed variable (LogSalary) to bring the distribution closer to normal. We also created a Box-Cox transformed variable (BoxCoxSalary) with a lambda of 0.1455141 to further improve normality.
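The two transformations can be sketched as follows, assuming the standard one-parameter Box-Cox form (y^lambda - 1) / lambda and the natural logarithm for LogSalary. The function names are hypothetical; only the lambda value comes from the text.

```python
import math

BOXCOX_LAMBDA = 0.1455141  # lambda reported in the text

def log_salary(salary):
    """Natural-log transform of a positive salary."""
    return math.log(salary)

def boxcox_salary(salary, lam=BOXCOX_LAMBDA):
    """One-parameter Box-Cox transform; reduces to log when lam == 0."""
    if lam == 0:
        return math.log(salary)
    return (salary ** lam - 1.0) / lam

def inverse_boxcox(value, lam=BOXCOX_LAMBDA):
    """Map a Box-Cox value back to the dollar scale."""
    if lam == 0:
        return math.exp(value)
    return (lam * value + 1.0) ** (1.0 / lam)

# Hypothetical salary, used only to show the round trip.
example = 5_000_000.0
print(log_salary(example), boxcox_salary(example))
print(inverse_boxcox(boxcox_salary(example)))
```

The inverse is useful when fitted values from a Box-Cox model need to be reported in dollars.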
Other selected metrics include 3P% (3-point shooting percentage), TOV (turnovers), WS (win shares), AST (assists per game), and GP (games played).
The salary distribution is heavily right-skewed (Figure 5.1). Applying a log transformation makes the distribution more symmetric and closer to a normal shape (Figure 5.1). Additionally, rookies earn clearly lower salaries than more experienced players (Figure 5.1).
Figure 5.1: Salary distributions and boxplots by player category
PTS and FG are both right-skewed and highly correlated, while MP shows a fairly uniform distribution (Figure 5.2). True Shooting Percentage (TS%), which integrates scoring from field goals and free throws and is calculated as TS% = PTS / (2 * (FGA + 0.44 * FTA)), tends to cluster around 0.6 regardless of salary (Figure 5.2). Notably, one outlier has a TS% of 0, and one player with a TS% of 0.8 earns a comparatively low salary.
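As a small worked example of the TS% formula above, the sketch below computes it for a hypothetical stat line. The function name and the input numbers are illustrative assumptions, not values from the dataset.

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); None when no shots were attempted."""
    attempts = 2 * (fga + 0.44 * fta)
    if attempts == 0:
        return None  # undefined without any field goal or free throw attempts
    return pts / attempts

# Hypothetical per-game line: 25 points on 18 field goal attempts and 6 free throw attempts.
print(round(true_shooting_pct(25, 18, 6), 3))

# A player who attempts shots but never scores has a TS% of exactly 0.
print(true_shooting_pct(0, 3, 0))
```

Players with very few attempts can produce extreme TS% values in either direction, which is one possible reading of the outliers noted above.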
Figure 5.2: Performance metric distributions
Strong correlations among metrics such as PTS, MP, WS, and FG suggest these variables tend to move together (see Appendix Figure 8.1). Additionally, histograms reveal significant right skewness, with TVMS displaying an almost bimodal distribution. We therefore compute variance inflation factors (VIF) before constructing the models to check that multicollinearity is not an issue.
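A minimal sketch of a VIF computation, using the identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. This is a standard-library illustration, not necessarily the routine used for Appendix Table 8.3; the function names and the toy columns are hypothetical, and every column is assumed to be non-constant.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Toy predictors with correlation 0.8 (illustrative values only).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vif([x1, x2]))
```

With two predictors the result reduces to 1 / (1 - r^2), so a correlation of 0.8 gives a VIF of about 2.78, below the threshold of 5 used in this analysis.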
There is a noticeable positive correlation between points (PTS) and other performance metrics such as minutes played (MP), field goals (FG), turnovers (TOV), and win shares (WS) (see Appendix Figure 8.2). This suggests that market-driven factors captured by TVMS are distinct from on-court performance.
The analysis indicates that performance metrics such as points per game, minutes played, and win shares are strongly correlated with salary, although there are exceptions, such as a rookie with high minutes but a relatively low salary (Appendix Figure 8.6). Salaries also tend to rise with years of experience before leveling off, highlighting the role of tenure and potentially age. There are clear positional differences as well: centers and shooting guards tend to have lower median salaries than power forwards, who have higher median earnings. Players on larger-market teams may earn more on average, but this relationship appears complex and warrants further evaluation.
To explore linear relationships between NBA player salaries and various performance metrics, we formulated nine models: three specifications for each of three salary scales. These models assess how different variables are associated with player compensation during the 2022-23 season.
The three dependent variables are:
Models 1-3: Raw Salary
Models 4-6: Log-transformed Salary (LogSalary)
Models 7-9: Box-Cox transformed Salary (BoxCoxSalary)
For each transformation, we test three specifications:
Basic Model: Only Points Per Game (PTS)
Performance Model: PTS, Total Rebounds (TRB), and Assists (AST)
Comprehensive Model: Performance metrics plus Experience (Exp_num) and TV Market Size (TVMS)
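The nine-model grid above can be enumerated mechanically. The sketch below builds R-style formula strings for Models 1 to 9 from the three dependent variables and three specifications; the constant and function names are hypothetical, and the strings are for illustration only (no model is fitted here).

```python
from itertools import product

RESPONSES = ["Salary", "LogSalary", "BoxCoxSalary"]
SPECIFICATIONS = {
    "Basic": ["PTS"],
    "Performance": ["PTS", "TRB", "AST"],
    "Comprehensive": ["PTS", "TRB", "AST", "Exp_num", "TVMS"],
}

def build_model_formulas():
    """Return {'Model 1': 'Salary ~ PTS', ...} in the numbering used in the text."""
    formulas = {}
    grid = product(RESPONSES, SPECIFICATIONS.items())
    for number, (response, (_label, predictors)) in enumerate(grid, start=1):
        formulas[f"Model {number}"] = f"{response} ~ {' + '.join(predictors)}"
    return formulas

for name, formula in build_model_formulas().items():
    print(f"{name}: {formula}")
```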
All VIF values are below 5, so multicollinearity is not a concern (see Appendix Table 8.3).
The fitted-value vs. residual plots show that the first three models exhibit heteroscedasticity, whereas the next six (using log and Box-Cox transformations) are homoscedastic. These observations are further supported by the Breusch-Pagan test results (see Appendix Figure 8.4 and Appendix Table 8.5).
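For intuition, here is a hedged sketch of a Breusch-Pagan test for the single-predictor case (as in Models 1, 4, and 7), assuming the studentized (Koenker) variant: regress the squared OLS residuals on the predictor and use LM = n * R^2, referred to a chi-square distribution with 1 degree of freedom. The text does not say which variant or software was used, so the function names and toy data are assumptions.

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def breusch_pagan_single(x, y):
    """Studentized Breusch-Pagan test for y ~ x; returns (LM statistic, p-value)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)  # assumes x is not constant
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sq_resid = [(b - intercept - slope * a) ** 2 for a, b in zip(x, y)]
    # Auxiliary regression of squared residuals on x: R^2 is the squared correlation.
    mean_r = sum(sq_resid) / n
    srr = sum((r - mean_r) ** 2 for r in sq_resid)
    sxr = sum((a - mean_x) * (r - mean_r) for a, r in zip(x, sq_resid))
    r_squared = (sxr ** 2) / (sxx * srr) if srr > 0 else 0.0
    lm = n * r_squared
    return lm, chi2_sf_df1(lm)

# Toy data (illustrative only): the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.5, 2.0, 5.5, 3.0, 8.5]
print(breusch_pagan_single(x, y))

# The reported Model 4 statistic of 0.274 maps to a p-value close to the tabulated 0.6006.
print(round(chi2_sf_df1(0.274), 3))
```

A large p-value, as for Models 4 to 9, means the test finds no evidence against constant residual variance.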
Based on the exploratory models (Appendix Tables 8.7, 8.8, and 8.9), Model 3 (Raw Salary ~ PTS + TRB + AST + Exp_num + TVMS), which adds experience and market size, improves R^2 (0.718) over Model 1 (0.504) and Model 2 (0.600). While the log and Box-Cox transformations (Models 4-9) successfully reduced heteroscedasticity (Appendix Figure 8.4), they also lowered R^2 and make the coefficients less direct to interpret. Model 3 includes key performance metrics, experience, and market size; under robust standard errors, PTS, AST, and Exp_num are significant, while TRB and TVMS are not (Appendix Table 8.10). Since robust standard errors address heteroscedasticity, we can still use the raw salary model (Model 3) for interpretability and significance testing. Therefore, we proceed with Model 3 as the final model and apply it to the confirmatory dataset to validate our findings.
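To illustrate why log-scale coefficients are less direct to read, the sketch below converts a coefficient from a log-salary model into an approximate percentage change in salary per one-unit increase, using 100 * (exp(b) - 1). The two coefficients are the Model 6 estimates for PTS and Expnum from Appendix Table 8.8; the function name is hypothetical.

```python
import math

def pct_change_per_unit(log_coef):
    """Percent change in salary per one-unit increase, for a log-salary model."""
    return (math.exp(log_coef) - 1.0) * 100.0

# Model 6 coefficients (LogSalary): PTS = 0.069, Expnum = 0.124.
print(round(pct_change_per_unit(0.069), 1))  # about 7.1 percent per additional point per game
print(round(pct_change_per_unit(0.124), 1))  # about 13.2 percent per additional year
```

In the raw salary model the same quantities are read directly in dollars, which is the interpretability advantage cited above.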
This regression explains about 69% of the variation in NBA salaries, with scoring (PTS), experience (Expnum), and TV market size (TVMS) emerging as the predictors most strongly associated with salary. Specifically, each additional point per game is associated with roughly $942K more in salary, each additional year in the league with about $1.01M, and each one-unit increase in TV market size with roughly $642K, all else held equal. Rebounds and assists do not carry significant weight once scoring is in the model, and visual diagnostics show that predictions align well with actual salaries for most players, though superstar contracts at the high end exhibit greater dispersion.
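As a worked illustration of these coefficients, the sketch below evaluates the fitted equation of the final model for a hypothetical player. The coefficients are copied from the final model table; the function name, the player's stat line, and the TVMS value of 2 (whose units the text does not specify) are assumptions.

```python
# Point estimates from the final model (dependent variable: Salary, in USD).
FINAL_MODEL = {
    "Constant": -7_211_120.0,
    "PTS": 942_429.5,
    "TRB": 225_295.1,
    "AST": 19_140.65,
    "Expnum": 1_014_302.0,
    "TVMS": 641_778.9,
}

def fitted_salary(pts, trb, ast, expnum, tvms, coef=FINAL_MODEL):
    """Linear prediction: constant plus coefficient-weighted predictors."""
    return (coef["Constant"] + coef["PTS"] * pts + coef["TRB"] * trb
            + coef["AST"] * ast + coef["Expnum"] * expnum + coef["TVMS"] * tvms)

# Hypothetical player: 20 PTS, 5 TRB, 4 AST, 5 years of experience, TVMS = 2.
base = fitted_salary(20, 5, 4, 5, 2)
print(f"{base:,.0f}")

# One more point per game shifts the fitted salary by exactly the PTS coefficient.
print(f"{fitted_salary(21, 5, 4, 5, 2) - base:,.0f}")
```

Because the intercept is negative, fitted values for low-scoring, inexperienced players can fall below zero, which is one limitation of the raw-scale linear fit.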
As potential next steps, our model could be refined by incorporating longitudinal data to capture temporal trends in player performance. Additionally, addressing omitted variable bias by including contextual factors (such as changes in league policy, evolving salary structures, the influence of agents, and injury history) may further enhance explanatory power. Stratified modeling, such as developing team-specific or region-specific models, could also improve fit and interpretability. Lastly, exploring nonlinear relationships through quadratic or interaction terms may better capture the complexity inherent in the data.
Dependent variable: Salary | Final Model |
---|---|
PTS | 942,429.500*** (122,593.400) |
TRB | 225,295.100 (255,167.000) |
AST | 19,140.650 (404,417.300) |
Expnum | 1,014,302.000*** (118,423.100) |
TVMS | 641,778.900*** (222,584.200) |
Constant | -7,211,120.000*** (848,045.000) |
Robust p-val: PTS | 0 |
Robust p-val: TRB | 0.378 |
Robust p-val: AST | 0.962 |
Robust p-val: Exp | 0 |
Robust p-val: TVMS | 0.004 |
Observations | 326 |
R2 | 0.689 |
Adjusted R2 | 0.684 |
Residual Std. Error | 6,083,581.000 (df = 320) |
F Statistic | 141.784*** (df = 5; 320) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |

Variable | Model 2 | Model 3 | Model 5 | Model 6 | Model 8 | Model 9 |
---|---|---|---|---|---|---|
PTS | 3.30 | 3.41 | 3.30 | 3.41 | 3.30 | 3.41 |
TRB | 1.63 | 1.73 | 1.63 | 1.73 | 1.63 | 1.73 |
AST | 2.54 | 2.83 | 2.54 | 2.83 | 2.54 | 2.83 |
Exp_num | NA | 1.29 | NA | 1.29 | NA | 1.29 |
TVMS | NA | 1.04 | NA | 1.04 | NA | 1.04 |
Figure 8.4: Residual plots for preliminary models
Model | BP Statistic | P-Value | Conclusion |
---|---|---|---|
Model 1 | 31.826 | 0.0000 | Heteroscedastic |
Model 2 | 53.128 | 0.0000 | Heteroscedastic |
Model 3 | 47.785 | 0.0000 | Heteroscedastic |
Model 4 | 0.274 | 0.6006 | Homoscedastic |
Model 5 | 7.171 | 0.0667 | Homoscedastic |
Model 6 | 6.457 | 0.2642 | Homoscedastic |
Model 7 | 0.284 | 0.5938 | Homoscedastic |
Model 8 | 7.392 | 0.0604 | Homoscedastic |
Model 9 | 6.957 | 0.2239 | Homoscedastic |
Figure 8.6: Salary vs key predictors
Dependent variable: Salary | Model 1 | Model 2 | Model 3 |
---|---|---|---|
PTS | 1,065,213.000*** (89,908.840) | 371,271.300** (147,836.500) | 536,860.500*** (127,087.400) |
TRB | | 816,232.800** (325,873.700) | 313,391.500 (283,988.600) |
AST | | 2,401,759.000*** (454,548.400) | 1,444,860.000*** (405,766.900) |
Expnum | | | 902,796.800*** (126,761.900) |
TVMS | | | 242,708.300 (261,655.200) |
Constant | -1,226,388.000 (1,024,934.000) | -2,545,865.000** (1,064,685.000) | -4,754,681.000*** (1,085,961.000) |
Observations | 140 | 140 | 140 |
R2 | 0.504 | 0.600 | 0.718 |
Adjusted R2 | 0.501 | 0.591 | 0.707 |
Residual Std. Error | 7,409,796.000 (df = 138) | 6,706,668.000 (df = 136) | 5,673,890.000 (df = 134) |
F Statistic | 140.368*** (df = 1; 138) | 67.932*** (df = 3; 136) | 68.151*** (df = 5; 134) |
Note: | *p<0.1; **p<0.05; ***p<0.01 | | |

Dependent variable: LogSalary | Model 4 | Model 5 | Model 6 |
---|---|---|---|
PTS | 0.133*** (0.014) | 0.047* (0.024) | 0.069*** (0.022) |
TRB | | 0.219*** (0.053) | 0.150*** (0.049) |
AST | | 0.192** (0.074) | 0.061 (0.070) |
Expnum | | | 0.124*** (0.022) |
TVMS | | | 0.039 (0.045) |
Constant | 13.911*** (0.162) | 13.559*** (0.173) | 13.244*** (0.187) |
Observations | 140 | 140 | 140 |
R2 | 0.387 | 0.474 | 0.585 |
Adjusted R2 | 0.383 | 0.462 | 0.569 |
Residual Std. Error | 1.170 (df = 138) | 1.092 (df = 136) | 0.977 (df = 134) |
F Statistic | 87.166*** (df = 1; 138) | 40.775*** (df = 3; 136) | 37.716*** (df = 5; 134) |
Note: | *p<0.1; **p<0.05; ***p<0.01 | | |

Dependent variable: BoxCoxSalary | Model 7 | Model 8 | Model 9 |
---|---|---|---|
PTS | 1.245*** (0.118) | 0.461** (0.198) | 0.669*** (0.174) |
TRB | | 1.828*** (0.436) | 1.197*** (0.389) |
AST | | 1.913*** (0.608) | 0.712 (0.556) |
Expnum | | | 1.131*** (0.174) |
TVMS | | | 0.323 (0.358) |
Constant | 45.203*** (1.343) | 42.265*** (1.423) | 39.453*** (1.487) |
Observations | 140 | 140 | 140 |
R2 | 0.448 | 0.536 | 0.656 |
Adjusted R2 | 0.444 | 0.525 | 0.644 |
Residual Std. Error | 9.706 (df = 138) | 8.965 (df = 136) | 7.768 (df = 134) |
F Statistic | 111.830*** (df = 1; 138) | 52.273*** (df = 3; 136) | 51.199*** (df = 5; 134) |
Note: | *p<0.1; **p<0.05; ***p<0.01 | | |

Dependent variable | Model 3 (Salary) | Model 6 (LogSalary) | Model 9 (BoxCoxSalary) |
---|---|---|---|
PTS | 536,860.500*** (165,364.900) | 0.069*** (0.023) | 0.669*** (0.200) |
TRB | 313,391.500 (320,133.600) | 0.150*** (0.039) | 1.197*** (0.334) |
AST | 1,444,860.000** (597,919.800) | 0.061 (0.063) | 0.712 (0.558) |
Expnum | 902,796.800*** (183,344.700) | 0.124*** (0.021) | 1.131*** (0.188) |
TVMS | 242,708.300 (297,389.300) | 0.039 (0.037) | 0.323 (0.324) |
Constant | -4,754,681.000*** (997,226.400) | 13.244*** (0.219) | 39.453*** (1.626) |
Robust p-val: PTS | 0.001 | 0.003 | 0.001 |
Robust p-val: TRB | 0.329 | 0 | 0 |
Robust p-val: AST | 0.017 | 0.339 | 0.204 |
Robust p-val: Expnum | 0 | 0 | 0 |
Robust p-val: TVMS | 0.416 | 0.29 | 0.321 |
Observations | 140 | 140 | 140 |
R2 | 0.718 | 0.585 | 0.656 |
Adjusted R2 | 0.707 | 0.569 | 0.644 |
Residual Std. Error (df = 134) | 5,673,890.000 | 0.977 | 7.768 |
F Statistic (df = 5; 134) | 68.151*** | 37.716*** | 51.199*** |
Note: | *p<0.1; **p<0.05; ***p<0.01 | | |

Variable | Description |
---|---|
Salary | Player salary in USD |
LogSalary | Natural logarithm of salary |
PTS | Points per game |
MP | Minutes played per game |
FG | Field goals made per game |
TRB | Total rebounds per game |
3P% | 3-point shooting percentage |
TOV | Turnovers per game |
WS | Win shares |
GP | Games played |
AST | Assists per game |
TVMS | TV Market Size |
TS% | True Shooting Percentage |
star_player | All-Star status (1=Yes, 0=No) |
Position | Player position |
Age | Player age |
Exp_num | Years of NBA experience |
[1] Southwest Journal. (n.d.). NBA Players’ Salaries: How Much Do NBA Players Make? Retrieved from https://www.southwestjournal.com/sport/nba/nba-players-salaries/
[2] Papadaki, I., & Tsagris, M. (2020). Are NBA Players’ Salaries in Accordance with Their Performance on Court? In Advances in Econometrics, Operational Research, Data Science and Actuarial Studies (pp. 405–428). https://doi.org/10.1007/978-3-030-85254-2_25
[3] Sigler, K.J., & Sackley, W.H. (2000). NBA Players: Are They Paid for Performance? Managerial Finance, 26(7), 46–51. https://doi.org/10.1108/03074350010766783