NBA Player Rating Systems: Survey, Comparison, and Combination

2025–26 season testbed

Author

Justin Pietsch

Published

July 4, 2026

Abstract

Data: NBA.com via nba_api; Basketball-Reference; FiveThirtyEight (RAPTOR)

Draft

Among the ratings that never use who won, the one that best describes a finished NBA season is one of the worst at predicting the next. In a typical season (the middle of the 30 tested), PER, the oldest and simplest box score, rebuilds about 68% of the gaps between teams in a season just played, more than any other rating that never looks at team results, yet it forecasts the next at only about 10%. That split is one thread of a larger problem: every “top players” list uses a different method, and they disagree in ways that matter for players on losing teams, defensive specialists, and high-usage scorers.

This report surveys how NBA players get rated and puts three questions to the systems:

Do the systems agree?
What does each system uniquely capture?
How should they be combined into one rating?

A few things to settle up front:

Is there a single best system? No: the one that best sums up a finished season (PER) is among the weakest at forecasting the next, so “best” depends on the question you are asking.
Do the systems agree? At the very top, closely; below the top tier they diverge enough to change who you would pick.
Is a higher rank a proportionally bigger gap? No: value is top-heavy, so a 14-spot jump near the top of the list (in VORP, rank 1 to rank 15) is worth roughly 15.6x as much as a jump of similar size lower down (rank 50 to rank 65).

The box-score systems move together but only loosely track the from-scratch RAPM, the one lineup-based impact metric computed here, agreeing with the box scores at about 0.40 on a 0-to-1 scale. Each system tilts toward a player type, and the all-in-one summaries (PER, Game Score, BPM, VORP) turn out largely redundant, nearly rebuilt from the others, while Win Shares per 48 minutes (WS/48) and the from-scratch lineup metric carry the most information of their own. And two combined ratings, a plain consensus and a wins-predictive blend, agree almost exactly (0.98 on a 0-to-1 scale) on the best players while parting on role players whose production did not turn into team wins. One more thread runs through it: whether a player’s rating holds up when the games matter most, the Jalen Brunson question the report started from (Section 8).

The testbed is the 2025-26 season, built on the eight box-score systems that can be recomputed directly from public data: Game Score, PER, Win Shares, WS/48, BPM, Offensive BPM, Defensive BPM, and VORP. It also includes a lineup-based impact metric computed from scratch for this report, RAPM, in two forms: a bare single-season version, the noisier of the two, and a prior-informed, multi-year version (RAPM+prior) built the way the published metrics are. Two findings reach beyond the single season, across 30 seasons back to 1996-97 (29 season-to-season pairs for the forecast test): the describe-versus-forecast test above and each rating’s year-to-year stability. Because the cross-system comparison rests on one season while those panels span decades, treat the single-season orderings as a snapshot and the panel findings as the firmer, repeated pattern. The other impact metrics and the human rankings are surveyed as part of the landscape (Section 1) but are not recomputed or included in the comparison here.

1. The landscape of player rating

Box-score systems

The oldest and most available systems work entirely from the standard box score: points, rebounds, assists, steals, blocks, turnovers, and shot attempts. They can be computed for any season back to the 1980s.

Player Efficiency Rating (PER) was John Hollinger’s attempt to fold everything into one per-minute number normalized so the league average is always 15. A 20 PER is well above average; 30 or above is an all-time great season.

Win Shares (WS) takes a different angle. Rather than measuring efficiency per minute, it allocates the team’s actual wins back to individual players based on their offensive production and a defensive credit. The cumulative version (total Win Shares) grows with playing time; WS/48 rescales it to a rate per 48 minutes (one full game), so players with different playing time can be compared.

Box Plus/Minus (BPM) tries to estimate what a player’s presence adds per 100 possessions compared to an average player, derived from box-score rates and adjusted for team quality. It splits into Offensive BPM and Defensive BPM. VORP extends it into a cumulative value by multiplying by playing time and comparing to a replacement-level player rather than an average one. BPM exists in two versions; this report uses the current one, BPM 2.0 (see Appendix B).
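VORP’s arithmetic is short enough to show directly. The sketch below follows the formula Basketball-Reference publishes, VORP = [BPM − (−2.0)] × (share of team possessions played) × (team games / 82), with −2.0 as the replacement level; the function and argument names are illustrative, not taken from this report’s pipeline.

```python
REPLACEMENT_BPM = -2.0  # replacement level in Basketball-Reference's VORP definition


def vorp(bpm: float, share_of_team_possessions: float, team_games: int) -> float:
    """Value over replacement player from BPM and playing time.

    bpm: points per 100 possessions above an average player.
    share_of_team_possessions: fraction (0 to 1) of team possessions played.
    team_games: games the team has played; dividing by 82 scales to a season.
    """
    return (bpm - REPLACEMENT_BPM) * share_of_team_possessions * (team_games / 82)


# A +6.0 BPM player on the floor for 70% of possessions over a full season.
print(round(vorp(6.0, 0.70, 82), 2))  # → 5.6
```

Because replacement level sits 2 points below average, an exactly average player who plays heavy minutes still banks positive VORP, which is why VORP rewards availability in a way BPM alone does not.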

Game Score (also Hollinger) is a simpler per-game summary that weights each box-score category by its approximate value. It is not normalized.
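Game Score’s weights are public, so a sketch fits in one function. The coefficients below are the ones Hollinger published; the keyword names and the sample stat line are illustrative.

```python
def game_score(*, pts, fg, fga, ft, fta, orb, drb, stl, ast, blk, pf, tov):
    """Hollinger's Game Score for a single box-score line (not normalized)."""
    return (
        pts
        + 0.4 * fg
        - 0.7 * fga
        - 0.4 * (fta - ft)  # missed free throws
        + 0.7 * orb
        + 0.3 * drb
        + stl
        + 0.7 * ast
        + 0.7 * blk
        - 0.4 * pf
        - tov
    )


line = dict(pts=30, fg=11, fga=20, ft=6, fta=8, orb=2, drb=8,
            stl=1, ast=5, blk=1, pf=3, tov=3)
print(round(game_score(**line), 1))  # → 24.4
```

Misses are what pull the number down: every field-goal attempt costs 0.7, so a made shot nets its points plus 0.4 minus 0.7, while a miss simply costs 0.7.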

Impact metrics

Box-score systems share a blind spot: they do not directly measure whether a player made their team better or worse. A prolific scorer who forces bad shots and plays poor defense can rate well on PER and WS. Lineup-based “impact” metrics try to fix this by measuring how the team’s scoring margin changes when a player is on the floor versus off it, with various forms of smoothing to handle the noise.

All of the major impact metrics share the same technical backbone: Regularized Adjusted Plus/Minus (RAPM). The basic idea is to ignore individual stats entirely and look only at the scoreboard. Every time a lineup is on the floor, track whether the team outscores or gets outscored. Do that for every lineup combination across an entire season, then use a statistical model to work backward and assign each player their share of credit or blame, adjusting for the quality of teammates and opponents. The result is a per-possession estimate of how much the scoring margin changes when a player is on the floor versus an average player.
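The “work backward” step is a weighted ridge regression. The sketch below is a minimal version under stated assumptions, not the report’s implementation: each row is a stint with the home players coded +1 and the away players −1, the target is the home margin per 100 possessions, rows are weighted by possessions, and the penalty `lam` is a hypothetical value (real fits tune it by cross-validation and usually add a home-court intercept, omitted here). `fit_rapm` is an illustrative name.

```python
import numpy as np


def fit_rapm(stints, n_players, possessions, margins, lam=2000.0):
    """Weighted ridge regression of stint margins on who was on the floor.

    stints: list of (home_ids, away_ids) pairs of player indices.
    possessions: possessions in each stint, used as regression weights.
    margins: home-minus-away points per 100 possessions in each stint.
    lam: ridge penalty (hypothetical); larger values shrink ratings toward 0.
    """
    X = np.zeros((len(stints), n_players))
    for row, (home, away) in enumerate(stints):
        X[row, list(home)] = 1.0   # home players share credit for the margin
        X[row, list(away)] = -1.0  # away players share the blame
    w = np.asarray(possessions, dtype=float)
    y = np.asarray(margins, dtype=float)
    XtW = X.T * w  # X^T W without forming the diagonal weight matrix
    A = XtW @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, XtW @ y)  # per-100 impact vs an average player


# Toy check: one stint, player 0 outscores player 1 by 10 per 100 over 100 possessions.
print(fit_rapm([([0], [1])], 2, [100], [10.0], lam=1.0))
```

The penalty keeps two players who always share the floor from splitting their credit arbitrarily, but it also pulls every estimate toward zero, which is the gap a box-score prior is meant to fill.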

RAPM is considered the best available signal for true player impact for three reasons. First, it measures the right thing: winning is about outscoring the opponent, and RAPM captures everything that contributes to that (off-ball movement, defensive positioning, communication), not just what shows up in the box score. A player who creates open shots for teammates without recording the assist, or whose defensive presence deters drives two passes away from the ball, shows up in RAPM but not in PER or Win Shares. Second, it passes predictive tests better than box-score metrics: in studies comparing how well first-half stats predict second-half game outcomes, RAPM-based metrics consistently outperform box-score-only approaches. Third, it does not assume what matters. Box-score metrics assign fixed weights to assists, steals, rebounds, and so on based on historical averages. RAPM does not; it lets actual game outcomes determine each player’s value.

The catch is noise. One season of lineup data is not enough to reliably separate a player’s true impact from the randomness of which lineups they happened to share court time with. Raw RAPM estimates for bench players or players in unusual lineup situations can swing wildly. Every serious system handles this by adding a “prior”: a box-score estimate of what a player should be worth, used to pull noisy RAPM estimates toward something stable. The systems mostly differ in how that prior is built and what data feeds it.
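One common way to build a prior into the regression is to shrink each rating toward its box-score estimate instead of toward zero. The sketch assumes a prebuilt stint-by-player design matrix `X` (+1 home, −1 away, 0 off the floor) and a prior such as BPM; the function name and penalty value are illustrative, and it is not a reproduction of the report’s own RAPM+prior recipe.

```python
import numpy as np


def fit_rapm_with_prior(X, possessions, margins, prior, lam=2000.0):
    """Ridge regression shrunk toward a prior rather than toward zero.

    Minimizes sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta - prior||^2.
    X: stint-by-player design matrix (+1 home, -1 away, 0 off the floor).
    prior: one box-score estimate per player, in the same per-100 units.
    """
    w = np.asarray(possessions, dtype=float)
    y = np.asarray(margins, dtype=float)
    prior = np.asarray(prior, dtype=float)
    XtW = X.T * w
    A = XtW @ X + lam * np.eye(X.shape[1])
    # Fit only the part of each stint margin the prior does not already explain.
    delta = np.linalg.solve(A, XtW @ (y - X @ prior))
    return prior + delta


X = np.array([[1.0, -1.0, 0.0]])  # player 2 never appears in the lineup data
print(fit_rapm_with_prior(X, [100], [10.0], prior=[2.0, -1.0, 5.0], lam=1.0))
```

A player with no lineup data (player 2 in the toy call) simply keeps the prior; as the lineup sample grows, the data term outweighs the penalty and the estimate moves toward what the lineups show.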

RAPTOR (FiveThirtyEight, 2013-14 through 2022-23) combined on/off lineup data with player-tracking data (speed, distance, shot quality). FiveThirtyEight shut down its sports coverage in April 2023, so RAPTOR has no 2025-26 values. Historical data is downloadable from their GitHub.

EPM (dunksandthrees) uses a RAPM calculation with a prior built from a highly optimized Statistical Plus/Minus model that incorporates player-tracking data. EPM is the only public metric that directly optimizes the weighting of each underlying stat by how quickly it stabilizes, which is one reason it tends to perform well in retrodiction tests (checking each rating against games it has not yet seen).

LEBRON (BBall-Index) also uses a luck-adjusted RAPM with a box-score prior. “Luck adjustment” means the on/off data strips out the swing from whether shots happened to fall, rather than crediting the player for it. The prior’s starting values come from PIPM (Player Impact Plus/Minus), an earlier metric by Jacob Goldstein that is no longer updated.

DARKO DPM is best thought of as a projection system rather than a season average. Like baseball projection systems (PECOTA, Steamer), DARKO weights recent games more heavily and updates daily, so it answers “what is this player’s current true talent?” more than “how did this player perform this season?” That distinction matters when comparing across systems.

DRIP (Daily Updated Rating of Individual Performance, Opta Analyst) is similar to DARKO in structure: it now-casts current player talent by weighting recent performance more heavily, rather than averaging over the full season.

ESPN RPM used RAPM with a box-score prior from Jeremias Engelmann, who also created the foundational RAPM dataset that most other systems calibrate against. ESPN stopped publishing RPM publicly around 2023.

The core limitation across all these systems is sample size: a player’s on/off data in one season is much noisier than their box-score totals. The prior’s smoothing pulls noisy estimates toward the box-score baseline, which means the baseline’s assumptions matter a great deal, especially for players with limited minutes.

The prior-plus-RAPM structure is a way of weighing a noisy measurement against a steadier starting estimate. The box-score prior is a starting belief about a player’s value before seeing the lineup data. The on/off data is the evidence that updates that belief. The final estimate blends the two, with the weight on the data side growing as the sample of lineup possessions grows. A player with 3,000 possessions of lineup data gets pulled strongly toward what the data shows; a bench player with 400 possessions barely moves from the prior. This is why the best-performing metrics in retrodiction tests outperform simpler ones: the improvement comes not from more variables in the box-score formula, but from more carefully balancing how much to trust the prior versus the data.
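In its simplest textbook form, that balancing is a precision-weighted average: the data’s weight is n / (n + k), where n is the lineup sample and k (a hypothetical constant here) is the sample size at which data and prior count equally. This scalar version ignores how teammates’ overlapping minutes blur the data, which a full regression handles, but it shows why 3,000 possessions move an estimate much further than 400.

```python
def blend(prior: float, lineup_estimate: float, possessions: float, k: float = 2000.0) -> float:
    """Precision-weighted blend of a box-score prior and a lineup estimate.

    k is a hypothetical constant: the possessions at which data and prior get
    equal weight (per-possession noise variance divided by prior variance).
    """
    weight = possessions / (possessions + k)
    return weight * lineup_estimate + (1 - weight) * prior


# Same prior (+2.0) and same raw lineup reading (+6.0), different samples.
print(round(blend(2.0, 6.0, 3000), 2))  # starter → 4.4
print(round(blend(2.0, 6.0, 400), 2))   # bench player → 2.67
```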

The one thing this framework does not yet deliver is uncertainty. Every metric publishes a single number (“EPM: +8.2 per 100”), with no range around it. Carrying the uncertainty through would produce something like “+8.2, likely between +6.8 and +9.6, with the range reflecting how much one season of lineup data can vary by chance.” That range would make visible what is currently hidden: the apparent disagreements between systems about who ranks 8th versus 12th are often smaller than the uncertainty band of any single metric. The rankings are noisier than they look.
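A percentile bootstrap is one standard way such a range could be produced, offered here as an illustration rather than something these metrics publish: refit the rating on many resamples of the data and read off the middle of the spread. In the sketch, `fit` is a hypothetical callable that refits on a set of stint indices and returns one player’s value, the 80% level is an arbitrary choice, and the toy data is synthetic; resampling whole games rather than stints would better respect the correlation between stints in the same game.

```python
import numpy as np


def bootstrap_interval(n_stints, fit, n_boot=200, level=0.80, seed=0):
    """Percentile bootstrap range for one player's rating.

    fit: hypothetical callable mapping an array of stint indices (drawn with
         replacement) to the player's refitted rating.
    """
    rng = np.random.default_rng(seed)
    draws = [fit(rng.integers(0, n_stints, size=n_stints)) for _ in range(n_boot)]
    tail = (1 - level) / 2 * 100  # percent cut from each end
    low, high = np.percentile(draws, [tail, 100 - tail])
    return float(low), float(high)


# Synthetic stand-in: noisy per-stint readings whose mean plays the role of the rating.
rng = np.random.default_rng(1)
stint_margins = rng.normal(8.2, 30.0, size=400)
low, high = bootstrap_interval(len(stint_margins), lambda idx: stint_margins[idx].mean())
print(f"{stint_margins.mean():+.1f}, likely between {low:+.1f} and {high:+.1f}")
```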

Which metrics do practitioners trust? A HoopsHype survey of 29 NBA front-office executives (2021) found DARKO DPM was the most preferred catch-all metric (8 respondents), followed by EPM and LEBRON. A retrodiction study by Dunks & Threes, comparing how well each metric predicts future game outcomes, put EPM first, followed by RPM, RAPTOR, and BPM 2.0 in that order. EPM and RPM were the only two metrics using RAPM directly with a box-score prior at the time of that study, which appears to be the structural feature behind their edge over box-score-only approaches. (Both the survey and the retrodiction study are external published work, not recomputed here; full citations are in the resources bibliography.)

Human rankings

MVP vote share, All-NBA selections, and media top-100 lists measure something neither box-score nor impact models capture directly: reputation and narrative. A player with a great story can receive more MVP votes than their on-court numbers warrant; a player on a losing team can be underrated.

Human rankings sit outside this report’s current testbed: the pipeline was narrowed to systems that can be recomputed directly from box-score data. They are described here because they are part of how players actually get rated, and adding a reputation dataset to compare model-based and reputation-based consensus is a natural extension.

2. Do the systems agree?

Figure 1: Agreement between each pair of systems: darker squares mean the two systems rank players more alike.

The box-score systems mostly move in the same direction, but how tightly varies a lot by pair. PER and Game Score track closely (0.86 on a 0-to-1 scale, among qualified 2025-26 players). PER and Win Shares agree fairly well too (0.76), as do Win Shares and WS/48 (0.78). A per-minute efficiency score and a team-adjusted plus/minus estimate can also track closely: PER and BPM agree at 0.90, tighter than any of the pairs above. The loosest pairing shown here is Game Score and BPM, at 0.74. The tightest is BPM and VORP (0.96), which is no surprise, since VORP is built directly from BPM.
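The agreement measure is not spelled out here; assuming it is a rank correlation such as Spearman’s, a minimal version looks like this. Spearman’s coefficient is the ordinary correlation of the two rank lists, so it depends only on order, not on each system’s scale, which is what lets PER (centered on 15) be compared with BPM (centered on 0). It can dip below 0 for systems that disagree outright. The player ratings in the example are hypothetical.

```python
import numpy as np


def ranks(values):
    """1-based ranks from low to high; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        r[tied] = r[tied].mean()
    return r


def spearman(a, b):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


per = [27.1, 22.4, 18.0, 15.2, 12.9]  # hypothetical ratings for five players
bpm = [9.0, 4.1, 4.5, 0.3, -1.8]      # the same players; two swap places
print(round(spearman(per, bpm), 2))   # → 0.9
```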

Even the pairs that agree overall still hand out value differently player by player. Win Shares, which divides a team’s actual wins among its players, favors efficient bigs on good teams: Amen Thompson and Donovan Clingan both rate well above their consensus rank in it. PER, a per-minute efficiency score, leans toward high-usage scorers, with Giannis Antetokounmpo and Lauri Markkanen among its biggest risers. Two systems can land near each other in the overall order while still disagreeing about which individual players to credit.

RAPM, the one impact metric computed here, sits apart from all of them. It is built only from which lineups outscored their opponents, with no box-score inputs at all, so it shares little with the rest: its rank order agrees with the box-score systems at about 0.40 on the same 0-to-1 scale, well below how tightly those systems track each other. That low agreement now reflects genuine independent signal, the thing an impact metric exists to catch, rather than noise. It used to be noise, for a dull reason: a bug in the step that turns play-by-play into lineups was quietly throwing out most of the season’s games (one mistracked substitution was enough to void an entire game), and the sliver of RAPM that survived was close to random. With that reconstruction fixed and most of the games restored, RAPM steadied. Fit separately on two random halves of the possessions (pooling three seasons of lineups), the halves now agree at about 0.49 on a 0-to-1 scale, up from about 0.10 while the bug was live, and a single season of bare RAPM now carries from one year to the next at about 0.38, where before it barely held at all.
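The split-half check can be written generically. The sketch assumes the agreement figure is a correlation between the two halves’ player ratings; `fit` is a hypothetical callable that refits the rating on a subset of rows (stints or possessions) and returns one value per player in a fixed order.

```python
import numpy as np


def split_half_agreement(n_rows, fit, seed=0):
    """Correlation between ratings fitted on two random halves of the data.

    n_rows: number of rows (stints or possessions) available.
    fit: hypothetical callable taking an array of row indices and returning
         one rating per player, always in the same player order.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    half_a, half_b = shuffled[: n_rows // 2], shuffled[n_rows // 2 :]
    return float(np.corrcoef(fit(half_a), fit(half_b))[0, 1])
```

With a well-measured rating the two halves line up closely; with noisy lineup data the halves drift apart, and the split-half number falls accordingly.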

The bare version is still the noisier of the two. Its 2025-26 leader is Kawhi Leonard, a genuine star rather than a low-minute reserve, yet the box-score consensus ranks him only 6th. Swings like that are why the prior-informed RAPM+prior, not the bare version, feeds the consensus: it is the same move every published impact metric (EPM, LEBRON, DARKO) makes when it anchors its RAPM to a box-score prior.

RAPM+prior pools three seasons of lineup data and anchors each player to a box-score prior (their BPM), the same recipe the published metrics use. It now tops out at the MVP tier: Nikola Jokić, Shai Gilgeous-Alexander, and Giannis Antetokounmpo lead it. Its order tracks the consensus rating at about 0.93, up from 0.49 for the bare version (a separate measure from the box-score agreement above), and it holds from one season to the next at least as firmly as BPM (0.85 against BPM’s 0.79 on the 0-to-1 scale). A rating that repeats as reliably as BPM but is built from lineup results rather than the box score is adding real impact signal on top of it, not just restating it. How it is built is covered in Section 12 (Limitations); whether it forecasts team results any better is tested in Section 3.

The exact figures are in docs/player_rating_overview_results.md.

3. What the field has learned about evaluating these metrics

Which of these systems should you trust? The field has two main tests for that, and this report ran both: the answer splits, because describing a finished season and forecasting the next turn out to be different jobs.

Retrodiction uses the first half of a season’s games to predict game outcomes in the second half. A metric that genuinely captures player impact should let you predict which team wins when you know each team’s lineup. This is the test used in the Dunks & Threes comparison study, which found EPM and RPM at the top, followed by RAPTOR and BPM 2.0. The top metrics shared a structural feature the others lacked: both used RAPM with a box-score prior, while lower-ranked metrics either skipped the prior or used only box-score data without the RAPM step.

Team wins prediction aggregates player ratings to the team level and asks how well the total predicts actual wins. This is the logic behind the wins-predictive rating built in this report.

Both tests reward the same thing: a RAPM backbone stabilized by a well-calibrated box-score prior. That combination handles the two main failure modes: pure box-score metrics miss what a player does off the ball, and pure RAPM is too noisy with one season of data.

One important result from the academic literature runs against the intuition that more sophisticated is better: peer-reviewed research has found that complex metrics do not reliably outperform simpler ones when used to predict salaries or wins. That is not a flaw in the metrics themselves. They are built to estimate player impact per possession, not to serve as inputs in every downstream analysis. The lesson is that a metric’s predictive accuracy in a retrodiction test is not the same as its usefulness for a specific application, and the right tool depends on the question being asked.

A direct test on the systems here

A related test grades the systems one at a time rather than combining them. Add up a single system’s player ratings across a team’s roster, weighted by minutes played, and check how closely that total matches which teams actually outscored their opponents over 2025-26. Each system is graded on teams held out of the fit, so its score cannot come from fitting the very teams it is graded on.
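The grading step can be sketched as two small functions, under assumptions the text does not pin down: a team’s score is the minutes-weighted average of its players’ ratings, the mapping to point differential is a straight line, and “held out” means leave-one-team-out. The share of the gaps between teams that a system “rebuilds” is then read as an out-of-sample R², which can go negative for a rating that does worse than guessing the league average.

```python
import numpy as np


def team_scores(ratings, minutes, team_of_player, n_teams):
    """Minutes-weighted average of player ratings for each team."""
    ratings = np.asarray(ratings, dtype=float)
    minutes = np.asarray(minutes, dtype=float)
    team_of_player = np.asarray(team_of_player)
    weighted = np.bincount(team_of_player, weights=ratings * minutes, minlength=n_teams)
    team_minutes = np.bincount(team_of_player, weights=minutes, minlength=n_teams)
    return weighted / team_minutes


def held_out_r2(team_score, point_diff):
    """Leave-one-team-out R^2 of a straight-line fit of point differential on team score."""
    x = np.asarray(team_score, dtype=float)
    y = np.asarray(point_diff, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # fit on every team except team i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        preds[i] = slope * x[i] + intercept
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
```

In this sketch, the forecast test reuses the same two functions, swapping in each player’s prior-season rating while keeping the current season’s minutes and rosters.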

Read it the honest way, by separating the metrics built from team results from those that never saw them. BPM is anchored to each team’s point differential by design, so adding it back up to the team rebuilds that differential almost perfectly (BPM reaches essentially 100%), and RAPM+prior, RAPM, and VORP are close behind it (96%, 94%, and 94% respectively), since all four are built from, or anchored to, the same lineup and team-margin data. That is mechanical, not a mark of quality, so the real question is which rating that never used who won comes closest. There a useful surprise holds up: PER, the oldest and simplest per-minute box score, rebuilds about 73% of the differences between teams, ahead of every other outcome-blind rating. One caution before reading that as a verdict: rebuilding the season just played is only half the test.

Figure 2: How well each system, summed to the team, rebuilds 2025-26 team point differential. Blue systems never use who won; grey systems are built from team or lineup results, where a high score is partly mechanical.

The other half is forecasting. Take each player’s rating from the prior season (2024-25), spread it across this season’s rosters, and see how well that predicts which teams outscored their opponents in 2025-26. This is a stricter test: a metric’s team adjustment is tuned to its own season, so it gives the metric no head start on a season it has not seen. The order flips. PER, the best description of the season, becomes one of the weakest forecasts of the next, falling from about 73% to roughly 22%. In this one pair the lineup metric holds up best, with RAPM+prior on top at about 58%, ahead of every box score. (About 88% of this season’s minutes came from players who also rated the year before; rookies have no prior mark.)

The lesson is the one every projection system runs into: describing what happened and forecasting what comes next are different jobs. PER is a faithful scoreboard of a finished season but a poor crystal ball. Which rating is “better” depends on the question.

Figure 3: Each system’s same-season fit (grey) against how well its prior-season version forecasts this season (blue). PER describes best but forecasts among the worst; the plus/minus metrics hold more of their predictive signal.

One pair of seasons could be a fluke, so we ran the same two tests on every season back to 1996-97: 30 seasons for the describe test and 29 year-to-year handoffs for the forecast test. The shape repeats. Among the ratings that never use who won, PER describes best in every era, rebuilding about 68% of the gaps between teams in a typical season, and forecasts near the bottom: about 10% in a typical handoff, and in a couple of handoffs its forecast collapses entirely. The team-anchored box metrics, BPM and VORP, sit above it on the describe test, but mechanically: they are built to reproduce team point differential, so rebuilding it is no test of them. The best forecaster among these box scores across all those handoffs is VORP (which, unlike PER and Game Score, is anchored to team results), at about 45%, a box-built rating that carries more of its signal from one season to the next than PER does. The single 2025-26 handoff agrees: VORP is the best-forecasting box score there too, and across the full panel BPM still beats PER as a forecaster in 28 of the 29 handoffs. What holds across every era is the gap itself: PER, the best plain description of a finished season, is consistently among the worst bets on the next one.

Figure 4: Average same-season fit (grey) against next-season forecast (blue) for each box-score system, pooled over 30 seasons and 29 season-pairs. Whiskers span the season-to-season range. BPM and VORP describe near-perfectly because they are anchored to team results; among the ratings that never use who won, PER’s describe-forecast gap is the widest and never closes.

The impact panel spans the 29 seasons from 1997-98 where both RAPM versions can be computed (play-by-play reaches 1996-97, and RAPM+prior needs the season before it), so a fair test scores RAPM against the box scores on those seasons rather than against their full 30-year history. On that even footing the bare single-season RAPM is a middling forecaster of next-season team results (ranked 6 of the 10 systems tested, rebuilding about 32% of the gaps between teams): no longer the near-random metric the reconstruction bug produced, but still the weaker of the two RAPM versions. Its same-season describe score is much higher, about 92%, which fits how it is built: RAPM comes from the very lineup margins the describe test rebuilds.

RAPM+prior is the standout on this panel: it forecasts next-season team point differential better than any box score (ranked 1 of the 10, at about 47%). Pooling three seasons and leaning on a box-score prior does more than sharpen which players rate highly (Section 12); on the seasons where the lineup data exists, it edges past the box-built BPM and VORP that top the wider panel. That is the payoff of the fix: with the games restored and the prior in place, the lineup signal now forecasts team results better than the box score does, not worse.

Figure 5: The same describe-versus-forecast test on the 29 seasons where RAPM can be computed, box scores and both RAPM versions scored together. Sorted by forecast strength: the prior-informed RAPM+prior forecasts next-season team results better than any box score, while the bare single-season RAPM sits mid-pack.

Which ratings hold steady year to year

There is a third way to judge a rating: how much a player’s own number carries from one season to the next. Match the players who logged real minutes in back-to-back seasons, and ask how many of the top 20 by each system one year are still in the top 20 the next. The simplest box scores are the stickiest. Game Score keeps 68% of its top 20 from one year to the next, and PER keeps 64%, against a chance level near 5%. On a 0-to-1 stickiness scale, where 1 would mean a player’s rating repeats exactly, Game Score is the steadiest at 0.85 and DBPM the jumpiest at 0.67, pooled over 29 season-to-season handoffs.
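The top-20 retention figure is simple to reproduce. In the sketch both arrays hold the same players in the same order (those who qualified in both seasons). A random reshuffle keeps about k/N of the list; if roughly 400 players qualify, an assumption consistent with the 5% chance level quoted above, that is 20/400. The function names are illustrative.

```python
import numpy as np


def top_k_retention(year1, year2, k=20):
    """Share of year-1 top-k players (by one system) still top-k in year 2."""
    top1 = set(np.argsort(year1)[::-1][:k].tolist())
    top2 = set(np.argsort(year2)[::-1][:k].tolist())
    return len(top1 & top2) / k


def chance_retention(n_players, k=20):
    """Expected retention if year-2 ranks were a random shuffle: k / N."""
    return k / n_players
```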

Stickiness is not the same as quality. A rating can repeat year to year because it captures a real, lasting skill or because it is slow to notice a player changing. What stands out is how this lens cuts against the forecasting one. PER is among the stickiest individual ratings yet one of the weakest forecasters of team results; the plus/minus metrics are jumpier yet forecast team point differential better than PER does. The plain reading is that the box-score rate metrics measure a stable individual trait, the scoring and production a player carries with them, while the plus/minus metrics move around more because they fold in team and role, the very context that helps them forecast a team. Only Game Score sits near the top of both lists: steadiest year to year (0.85) and still one of the better forecasters, at about 33% of next season’s team gaps rebuilt.

Figure 6: Left: how much a player’s rating carries from one season to the next, on a 0-to-1 scale. Right: the share of each system’s top 20 who are still top 20 the next season, with the chance level marked. Pooled over 29 season-pairs among players who qualified in both.

4. What each system uniquely sees

Not all systems add independent information, and they differ widely in how much they add. The all-in-one summaries are the most redundant: about 95% of PER, 89% of Game Score, and roughly 94% of BPM and 89% of VORP can be rebuilt from the other systems, so each mostly repeats what the field already says. The metrics that carry the most information of their own are WS/48 (only about 66% of it can be rebuilt from the rest) and the lineup-impact metrics: bare RAPM sits near that independent end too, about 68% rebuildable, and its offensive half alone, O-RAPM, is the single most independent metric in the whole comparison, at about 63% rebuildable (it is a bare-RAPM split, not part of the consensus). That the raw impact metric now lands among the most independent is the payoff of the reconstruction fix (Section 2): with RAPM carrying real lineup signal instead of noise, it catches what the box scores miss, which is the whole point of an impact metric. Defensive BPM, which used to look the most independent here, has slid toward the middle (about 82% rebuildable), consistent with the fixed RAPM now accounting for much of the defensive impact it was previously alone in catching. BPM and VORP are validated against Basketball-Reference (a value agreement of 0.93 on a 0-to-1 scale), yet they sit with the redundant systems, not the independent ones: a validated rating can still be redundant if it says what the others already do. The measure holds each metric’s own components out of the comparison, so BPM is never counted as “explained” by its own offensive and defensive halves (see results for the full breakdown).
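A natural reading of “rebuilt from the other systems” is the R² of a regression of one system on all the rest, with the target’s own components left out. The sketch assumes that reading and an ordinary in-sample least-squares fit; the report’s exact procedure may differ, and the function name is illustrative.

```python
import numpy as np


def rebuildable_share(table, target, exclude=()):
    """In-sample R^2 from a least-squares fit of one system on the others.

    table: dict mapping system name -> player ratings (same player order).
    exclude: systems kept out of the predictors, e.g. the target's own components.
    """
    y = np.asarray(table[target], dtype=float)
    predictors = [name for name in table if name != target and name not in exclude]
    X = np.column_stack(
        [np.ones(len(y))] + [np.asarray(table[name], dtype=float) for name in predictors]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
```

For BPM the call would look like `rebuildable_share(ratings, "BPM", exclude=("OBPM", "DBPM"))`, with `ratings` a hypothetical table of every system, so BPM is never explained by its own halves. In-sample R² flatters a fit slightly; a held-out version would be stricter.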

The “system outliers” chart shows the players each system rates most above and below the consensus. This is where methodological differences become visible: a system that captures defensive value heavily will love rim protectors; a system that penalizes inefficient volume scoring will discount high-usage players with middling true shooting.

Figure 7: Players each system rates most above or below the consensus.

Five players make the divergence concrete. The fastest way to feel what each system catches is to watch where a handful of players land.

Start with the one everyone agrees on. Nikola Jokić ranks first in nearly every system at once: his worst finish across the five box scores is 1, and even the impact metrics put him within a hair of the top. That is the rare player box scores, impact metrics, and human voters all point at together, and he is the ceiling against which every disagreement below gets measured.

Now the defense-first star. Victor Wembanyama grades out at an all-around BPM of +8.9, and a large share of it is defense: +3.9 of that number, more than 40%, comes from his Defensive BPM alone, the kind of value a pure scoring metric never records. The lineup metric agrees and then some: RAPM ranks his net on-court impact 2nd, above the 9th place his box-score offense earns, catching two-way value the box score only partly sees.

Now the high-volume scorer the systems split on. Counting stats love Pascal Siakam, with Player Efficiency Rating placing him 45th. The impact-aware metrics are cooler: BPM has him down at 126th, because heavy scoring counts for less once efficiency and team results enter the picture. The lineup data breaks the tie toward the counting stats, though: RAPM puts him back up at 80th, so what his lineups do on the floor backs his volume more than his efficiency alone would.

Then Jalen Brunson, the player the playoffs make interesting (Section 8). His offense rates with the best, an offensive BPM of +4.3 that ranks 12th. His defense pulls the other way, a defensive BPM of -0.9, so his all-around BPM of +3.4 sits at 29th. The same player is a top-fifteen offensive player and a top-thirty contributor overall, and the gap between those two is exactly what his defense costs him. The impact metrics, now that they are fixed and trustworthy, tell the same story rather than a kinder one: the corrected RAPM puts his net on-court impact at merely average (169th), and RAPM+prior at 57th, so two independent methods, box and lineup, agree that his scoring outruns what his team actually does with him on the floor. That is a sturdier verdict than either method alone: the earlier, broken RAPM had buried him near the bottom, which was noise, and the corrected read lands him where the box score already suggested, good but not a hidden star.

And one player who answers a different question, OG Anunoby: he looks ordinary here but is the biggest riser once the playoffs start, which is exactly where Section 8 picks up.

5. The two uber ratings

Combine the systems two ways, a plain average or a weighting by how well each predicts team wins, and the two rankings land almost on top of each other at the top (0.98 on a 0-to-1 scale).

Each system uses a different scale (PER is centered around 15, BPM around 0, Win Shares in the single digits), so you cannot average them directly. Putting every system on one common scale fixes that, by asking of each player: how many typical-player gaps above or below average is this? A 0 means exactly average; +2 is well into star territory; −1 is a step below average. Once every system is on this common scale, they can be combined.
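Putting a system on the common scale is a z-score: subtract the system’s average and divide by its standard deviation (the “typical-player gap”). The sketch uses the population standard deviation over whatever player pool is passed in; restricting the pool, for example to qualified players only, would shift the numbers slightly. The sample values are hypothetical.

```python
import numpy as np


def to_common_scale(ratings):
    """z-scores: 0 = average player, +1 = one standard deviation above."""
    r = np.asarray(ratings, dtype=float)
    return (r - r.mean()) / r.std()


per = [15.0, 20.0, 25.0, 10.0]  # centered near 15
bpm = [0.0, 2.5, 5.0, -2.5]     # same players, centered near 0
print(np.allclose(to_common_scale(per), to_common_scale(bpm)))  # → True
```

Any system that is a stretched and shifted copy of another lands on identical z-scores, which is exactly the property that makes averaging across systems meaningful.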

Consensus rating: the average normalized score across all systems. This measures what the crowd of methodologies agrees on, not what is best-supported by any one theory. The 2025-26 consensus top five: Nikola Jokić, Shai Gilgeous-Alexander, Victor Wembanyama, Giannis Antetokounmpo, and Luka Dončić.

Wins-predictive rating: a combination of those scores weighted by how well each system (aggregated to team level) predicts actual team wins. The two reach the same conclusions about the very best players; where they part is further down the list. The wins-predictive rating pushes the stars on winning teams higher still. Giannis Antetokounmpo rises the most, with Kawhi Leonard close behind, and Shai Gilgeous-Alexander, Victor Wembanyama, and Luka Dončić also move up. Players on losing teams slide the other way: Jericho Sims and other deep-rotation players on poor teams rate lower on the wins-predictive scale than on the consensus, because their on-court production did not translate into team wins.
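Both ratings are weighted averages of the standardized scores; they differ only in the weights. The sketch treats the consensus as equal weights and, as an assumption, takes each system’s wins-predictive weight to be proportional to a non-negative measure of how well it predicted team results (for instance its team-level R²). The z-scores and weights in the example are hypothetical, and the report’s exact weighting may differ.

```python
import numpy as np


def combine(z_scores, weights=None):
    """Blend standardized systems into one rating per player.

    z_scores: dict mapping system name -> z-scores (same player order).
    weights: dict mapping system name -> weight; None gives the plain consensus.
             Negative weights are clipped to 0 (an assumption of this sketch).
    """
    names = list(z_scores)
    Z = np.column_stack([np.asarray(z_scores[name], dtype=float) for name in names])
    if weights is None:
        w = np.ones(len(names))
    else:
        w = np.array([max(weights[name], 0.0) for name in names])
    if w.sum() == 0:
        raise ValueError("at least one weight must be positive")
    return Z @ (w / w.sum())


z = {"PER": [1.8, -0.4], "BPM": [2.2, -1.0]}           # two players
print(combine(z))                                       # plain consensus
print(combine(z, weights={"PER": 0.2, "BPM": 0.6}))     # predictive weighting
```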

Aggregating across systems is not unique to this report. HoopsHype publishes periodic “Analytics MVP” posts that combine EPM, LEBRON, DARKO, RAPTOR, and BPM into a single ranking, typically using a simple equal-weighted average. ESPN’s #NBArank is a different kind of aggregate: journalists vote rather than models, making it closer to the human-reputation category than to a model combination. Some metrics do the aggregation internally: LEBRON, for example, blends a box-score prior with luck-adjusted on/off data as part of its own formula rather than publishing both separately and combining them downstream. What distinguishes the wins-predictive rating here is that how much each system counts is estimated from data (how well each system actually predicted team wins) rather than assigned by hand. The practical difference is small (the two ratings agree at 0.98 on a 0-to-1 scale), but it tells you which systems carried the most predictive signal for 2025-26.

The 2025-26 season shows how this plays out. Nikola Jokić tops both ratings and rises under the wins-predictive weighting, from 3.99 on the consensus to 4.52. Denver was one of the league's stronger teams, and that rating rewards strong production that lines up with team success. His lead over Shai Gilgeous-Alexander narrows, though, because the stars behind him rise more. The single biggest gainer is Giannis Antetokounmpo, up 0.94 on his consensus rating. The deep-rotation players on losing teams move the opposite way: solid rate metrics, but no team wins behind them.

6. Ranking throws away how much better someone is

Figure 9: A rank-1-to-15 jump in VORP costs far more value than a same-sized jump at rank 50-65, even though a ranked list shows both as 14 spots.

A ranked list moves in even steps: #1, #2, #3, each one spot from the next. The value behind it does not. In VORP, the gap between Nikola Jokić at rank 1 and Jalen Brunson at rank 15 is 4.18 points of value. The same 14-spot jump between Ausar Thompson at rank 50 and Ayo Dosunmu at rank 65 costs only 0.27: the top jump is worth roughly 15.6x as much as the bottom one. A rank list can’t see that difference; it treats both jumps as identical.
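The arithmetic behind that comparison is two differences and a ratio. On the rounded gaps quoted above, 4.18 / 0.27 ≈ 15.5, and the small difference from 15.6x is consistent with the two gaps being rounded. The sketch below computes the same ratio for any list of ratings. The function names and the 1/rank toy curve are illustrative, not taken from the report's code.

```python
def rank_gap(values, hi_rank, lo_rank):
    """Value separating two ranks (1 = best) in a list of ratings."""
    ordered = sorted(values, reverse=True)
    return ordered[hi_rank - 1] - ordered[lo_rank - 1]


def top_vs_middle_ratio(values, top=(1, 15), middle=(50, 65)):
    """How many times larger the jump at the top is than the jump in the middle."""
    return rank_gap(values, *top) / rank_gap(values, *middle)


# Toy heavy-tailed curve: the player at rank r is worth 100 / r.
toy = [100 / r for r in range(1, 101)]
print(round(top_vs_middle_ratio(toy), 1))  # 202.2
```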

And the smaller the gap, the less of it is even real. Down in the crowded middle the steps between players are so small they sit inside the normal season-to-season bounce of any single system, so the exact order of two mid-tier players is more noise than signal. The information lives at the top, where the gaps are wide enough to mean something; the middle is a pile-up a ranked list only pretends to sort.

Figure 10: In BPM, built from the box score, the top few players stand clearly apart, but through the crowded middle 153 players sit within one season's normal swing of each other, so their order reshuffles from year to year. One season of RAPM, read off lineup results, is noisier still: even near the top, 36 players fall within a single swing of each other, where BPM has just 3.

This is not a quirk of VORP. Every system checked shows the same shape, just to different degrees: the smallest ratio, in WS/48, is still 3.6x; the largest, in D-RAPM, reaches 19.9x. What changes system to system is how top-heavy the underlying value is, and that is the question the rest of this section answers.

Figure 11: Rating systems normalized to the same scale: value falls steeply in the top tier for every methodology, but at very different rates.

PER spreads relatively evenly among qualified 2025-26 players: the top 5% account for only 8.6% of total value. The gap between a middle-of-the-pack player and one in the top 5% in PER is smaller than it looks on a ranked list, which is why its rank-1-to-15-vs-50-to-65 ratio (11.6x) sits below the cumulative metrics.

Win Shares and VORP tell a different story. The top 5% of players hold 13.5% of total Win Shares, and VORP concentrates far more steeply still: its top 5% hold 24.5% of all positive value. Both lean toward the top because they multiply a rate by minutes played, and the best players lead in both, which is exactly why the rank gap at the top of a cumulative metric dwarfs the gap further down.
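The top-5% share is a one-line calculation once the ratings are sorted. This is a minimal sketch with two assumptions:

- the top 5% is rounded to the nearest whole player;
- as in the VORP figure, only positive value counts, so negative ratings are clipped to zero (this changes nothing for a metric with no negative values).

The function name is hypothetical.

```python
def top_share(values, fraction=0.05):
    """Share of all positive value held by the top `fraction` of players."""
    positive = sorted((max(v, 0.0) for v in values), reverse=True)
    k = max(1, round(fraction * len(positive)))  # players in the top slice
    return sum(positive[:k]) / sum(positive)


# 20 hypothetical players: one star worth 10, nineteen worth 1 each.
print(top_share([10.0] + [1.0] * 19))  # 10 / 29, about 0.345
```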

Statisticians call a curve like this heavy-tailed: a few values at the top run far higher than the rest, instead of tapering off evenly. The classic, cleanest form of a heavy tail is a power law. In a power law, value falls by a roughly constant percentage each time the rank is multiplied by the same factor. Proportionally, then, the drop from rank 1 to rank 2 matches the drop from rank 10 to rank 20. The test is simple: stretch both axes onto a log scale, and a power law turns into a straight line.
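In symbols, a power law and its log-log form are written below. The notation is introduced here for illustration: v(r) is the value of the player at rank r, C is a scale constant, and α > 0 is the exponent.

```latex
v(r) = C\,r^{-\alpha}
\qquad\Longrightarrow\qquad
\log v(r) = \log C - \alpha \log r,
\qquad
\frac{v(2r)}{v(r)} = 2^{-\alpha}\ \text{for every } r .
```

The middle expression is the straight line of slope −α on log-log axes. The last identity is why the proportional drop from rank 1 to rank 2 matches the drop from rank 10 to rank 20.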

By that test most systems hold close to a straight line (Game Score, PER, Win Shares, BPM, OBPM, DBPM, VORP, O-RAPM, D-RAPM, RAPM+prior, O-RAPM+prior, D-RAPM+prior), and two bend instead (WS/48 and RAPM). The cutoff is a convention, not a hard boundary, and systems sitting right at it (PER clears the line, Win Shares per-48 just misses) are really the same shape. The steadier read is the exponent itself. PER's is shallowest at 0.14, VORP's is steeper at 0.37, and the two combined ratings run steeper still (Consensus 0.38, Wins-Predictive 0.43). A bigger exponent means value falls faster with rank, so the top is heavier and the gap at the top of that system's rank list is bigger.
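Here is a minimal sketch of how such an exponent can be estimated. It fits an ordinary least-squares line through log value against log rank for the top 50 and reads α off as minus the slope. The report does not say which fitting method it uses; maximum-likelihood fits are a common alternative. The sketch also drops non-positive values, because their logarithm is undefined. The function name is hypothetical.

```python
import math


def power_law_exponent(values, top_n=50):
    """Estimate alpha in v(r) = C * r**(-alpha) from the top `top_n` values."""
    ordered = sorted((v for v in values if v > 0), reverse=True)[:top_n]
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # larger alpha: value falls faster with rank


# A synthetic curve built with alpha = 0.37 recovers that exponent.
curve = [3.0 * r ** -0.37 for r in range(1, 101)]
print(round(power_law_exponent(curve), 2))  # 0.37
```

Deciding whether a curve counts as "straight enough" would add a goodness-of-fit measure, such as the R² of the same regression. The report does not state which criterion it applies.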

What bends are the rate metrics. RAPM, a per-possession impact metric, scores how far above or below average a player is, given the lineups he shares the floor with. That quantity is roughly even on both sides of the middle and has a natural size to it, so its best player is not a runaway.

The impact metrics are the subtler case: several of them clear the straight-line test above, but that test reads only the top 50 players, and their full distributions say otherwise. RAPM, the impact metric built for this report, is the clearest example, and its whole distribution settles it: it is not heavy-tailed at all. Pooled across all 29 seasons with play-by-play (9820 player-seasons, enough to read the shape cleanly), it is a symmetric hump centered on zero, about as many players below average as above (47% sit below zero), and it tracks a plain bell curve closely. A heavy tail needs a long one-sided run of standout values; RAPM has none, because a per-possession impact is scored against the average player and runs about as far into the minus as the plus. That is exactly why RAPM’s rank-1-15-vs-50-65 ratio (7.3x) sits well below Win Shares, VORP, and BPM: with no runaway top, a same-sized rank jump costs closer to the same wherever it falls on the list. Only WS/48, a per-minute rate bounded to a narrow range, runs flatter still, for a different reason: not RAPM’s symmetry, but simply not much room at the top to begin with. VORP, set beside RAPM, leans to the right: a handful of stars trail a long tail above the pack, the shape a heavy tail needs.
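Two quick numbers separate a symmetric, zero-centered metric from a right-leaning one:

- the share of players below zero;
- the skewness: the average cubed z-score, near 0 for a symmetric hump and positive when a one-sided tail of stars runs to the right.

This is a minimal sketch with toy inputs, not the report's distribution test, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def shape_summary(values):
    """Share of ratings below zero, and skewness of the distribution."""
    mu, sd = mean(values), pstdev(values)
    skewness = mean(((v - mu) / sd) ** 3 for v in values)
    share_below = sum(v < 0 for v in values) / len(values)
    return share_below, skewness


print(shape_summary([-2.0, -1.0, 0.0, 1.0, 2.0]))  # symmetric: skew ≈ 0
print(shape_summary([0.0, 0.0, 0.0, 0.0, 10.0]))   # one runaway: skew > 0
```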

Figure 12: RAPM’s full distribution against VORP’s. RAPM is a symmetric bell with no heavy tail; VORP leans right, with the one-sided tail of stars a heavy-tailed metric needs.

So the shape tells you what the metric measures, and that shape is what the rank list hides. A metric that piles up accumulated value tilts toward a few players at the top, so its rank list understates the top and overstates the middle; a metric that scores distance from average has a built-in size and does not run away, so its rank list is closer to the truth. The small panels below make the difference visible: the blue curves hold a straight line, the grey ones bow.

Figure 13: One small panel per system: blue curves are power laws (a straight line on a log scale), grey curves bend. Ordered by how steeply value falls, steepest first.

One caution: this is a description of 50 players in one season, not a formal test. It shows which curves are straight and which bend, not a proven law, so the reliable read is the grouping and the order of the exponent across systems, not the label on any single borderline system.

Figure 14: Every system on one log-log chart, with each fitted power law drawn through it. Useful for comparing the slopes directly.

An older, more familiar way to put a single number on concentration is the Gini coefficient (0 means everyone is rated the same, 1 means one player holds all the value). It is kept here only as a cross-check, because it has a real limit: it works for metrics that pile up a quantity that cannot drop below zero, like Win Shares or VORP, but not for the 0-centered metrics. On those (the BPM family and the two combined ratings) it counts every below-average player as a zero and inflates the score, which is why Consensus shows a Gini of 0.753, above Win Shares at 0.353, an ordering that is not real. The steepness read above is the one to trust.
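The distortion is easy to reproduce. The sketch below uses a standard sorted-rank formula for the Gini coefficient and first clips below-average ratings to zero, the step described above. A perfectly symmetric toy set centered on zero scores about 0.67 after clipping. The same five values shifted up so none is negative score about 0.27, even though the spread between players is identical. Function names and values are illustrative.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = all equal, near 1 = one holds all.

    Sorted-rank form: G = sum_i (2i - n - 1) * x_i / (n * sum(x)),
    with the x_i sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def gini_clipped(values):
    """Gini after counting every below-average (negative) rating as zero."""
    return gini([max(v, 0.0) for v in values])


centered = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(round(gini_clipped(centered), 2))             # 0.67
print(round(gini([v + 3.0 for v in centered]), 2))  # 0.27
```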

Figure 15: Gini coefficient by system, kept as a cross-check. It ranks the 0-centered metrics (the BPM family and the two combined ratings, outlined) near the top, but that is an artifact of how Gini handles below-average players, not real top-heaviness.

Figure 16: Each line shows a system’s value as a percentage of its rank-1 player. Win Shares and VORP fall steeply; PER and BPM stay much flatter.

Put together, this is the case that elite talent is worth more than its rank implies. The step down from the very best player is far larger than a one-spot step further down the list, yet a ranked list treats every step as equal when the value behind it never is.

7. What each system rewards, and who tops it

Every rating tilts toward a player type. The fastest way to see the tilt is to read who each system rates furthest above the field: one row per system, the kind of player it rewards, and the two players it lifts highest above the consensus.

System	What it rewards	Rates highest above the field
Game Score	raw scoring volume	Lauri Markkanen, Keyonte George
PER	high-usage scoring efficiency	Giannis Antetokounmpo, Joel Embiid
WS/48	per-minute efficiency, low-usage bigs	Steven Adams, Mitchell Robinson
Win Shares	efficient bigs on winning teams	Amen Thompson, Donovan Clingan
BPM	all-around per-100 impact	Giannis Antetokounmpo, Paul Reed
Off BPM	shot creation and shooting	Giannis Antetokounmpo, Stephen Curry
Def BPM	perimeter defense	Alex Caruso, Cason Wallace
VORP	cumulative value, heavy-minute stars	Shai Gilgeous-Alexander, Nikola Jokić

That tilt is why the same player can top one list and sit mid-pack in another. The table below lets you see it player by player: every player who cracks any system’s top 20, with their score in each rating. Sort it by a column and that system’s own leaders rise to the top.

That last move is a smell test, not just a sort. When a rating's top names don't pass your gut check, something is often wrong under the hood. The from-scratch RAPM here is the case study: an early version filled its top with low-minute players who plainly did not belong, which is how its reconstruction bug first surfaced (Section 2). Sort by RAPM now and the names read mostly like a list of stars, a sign the fix took hold.

Player	Game Score	PER	WS/48	Win Shares	BPM	VORP	RAPM	RAPM+prior	Consensus	Wins-Pred
Nikola Jokić	28.7	32.2	0.332	15.7	11.4	7.7	6.9	10.9	3.99	4.52
Shai Gilgeous-Alexander	26.4	31.0	0.226	10.6	10.4	7.1	8.6	10.5	3.34	4.24
Victor Wembanyama	22.4	30.1	0.202	7.8	8.9	5.1	9.8	9.5	2.82	3.67
Giannis Antetokounmpo	23.8	31.1	0.278	6.0	10.0	3.2	4.9	10.3	2.70	3.64
Luka Dončić	26.4	28.7	0.172	8.2	8.0	5.8	3.9	8.1	2.58	3.33
Kawhi Leonard	22.5	27.9	0.157	6.8	7.7	5.1	11.0	8.9	2.36	3.28
Jalen Duren	18.3	26.6	0.278	11.5	5.9	3.9	5.2	4.9	2.22	2.39
Cade Cunningham	20.0	22.6	0.206	9.3	5.3	4.0	5.7	3.8	1.84	2.11
Paul Reed	8.1	23.3	0.288	5.4	7.0	2.0	2.5	6.4	1.78	2.12
Alperen Sengun	17.7	22.2	0.232	11.6	3.9	3.6	-1.1	3.0	1.72	1.57
Donovan Mitchell	20.9	24.2	0.140	6.8	5.2	4.3	5.5	6.0	1.72	2.31
Karl-Anthony Towns	17.2	23.3	0.216	10.5	3.9	3.5	0.5	3.2	1.64	1.59
Donovan Clingan	13.7	20.8	0.296	12.9	2.9	2.6	0.9	2.9	1.59	1.23
Tyrese Maxey	22.1	22.5	0.134	7.4	4.4	4.3	1.9	3.9	1.58	1.99
LaMelo Ball	15.0	20.8	0.185	7.8	5.3	3.7	2.1	4.4	1.57	1.79
Neemias Queta	11.5	20.7	0.257	10.3	3.3	2.6	6.9	5.0	1.52	1.51
Mitchell Robinson	9.0	20.5	0.351	8.6	3.6	1.7	2.3	3.6	1.48	1.20
Jamal Murray	20.6	22.5	0.146	8.1	4.3	4.2	1.9	3.1	1.48	1.80
Chet Holmgren	15.5	21.8	0.170	7.1	4.1	3.1	9.7	5.6	1.48	1.96
Jalen Johnson	19.9	21.4	0.179	9.4	3.4	3.5	2.1	2.9	1.48	1.59
Amen Thompson	17.3	19.2	0.214	13.1	2.2	3.1	6.4	3.2	1.46	1.30
Jimmy Butler III	19.1	22.4	0.245	6.0	4.0	1.8	7.0	5.5	1.44	1.81
Scottie Barnes	16.7	20.4	0.184	10.3	2.8	3.3	1.9	2.4	1.39	1.33
Kevin Durant	19.7	21.7	0.134	7.9	3.5	4.0	1.1	3.8	1.39	1.69
Jaylen Brown	19.7	23.4	0.130	6.6	4.0	3.7	-3.0	1.7	1.34	1.56
Robert Williams III	8.9	20.7	0.277	5.8	4.0	1.5	4.4	4.5	1.29	1.40
Mark Williams	12.1	21.6	0.257	7.6	3.6	2.0	1.5	1.8	1.25	1.26
Jalen Brunson	18.9	21.6	0.144	7.8	3.4	3.6	0.9	2.5	1.23	1.49
Isaiah Hartenstein	11.4	19.3	0.287	6.8	3.2	1.5	2.0	3.6	1.23	1.13
Stephen Curry	19.3	22.5	0.124	3.4	5.0	2.4	2.7	5.3	1.22	1.84
Joel Embiid	20.6	24.0	0.168	4.2	4.0	1.8	5.7	4.2	1.21	1.76
Day’Ron Sharpe	9.2	20.6	0.296	7.2	3.0	1.5	1.2	4.2	1.21	1.13
Jarrett Allen	15.0	21.9	0.202	6.4	3.2	2.0	4.0	4.2	1.20	1.44
Anthony Edwards	20.7	22.5	0.098	4.3	4.0	3.3	-1.7	3.2	1.19	1.60
James Harden	19.2	20.6	0.162	8.2	2.6	2.9	-1.9	1.6	1.12	1.12
Zion Williamson	17.3	23.0	0.175	6.7	2.7	2.2	2.6	3.3	1.08	1.37
Moussa Diabaté	10.3	17.3	0.273	10.8	1.1	1.5	6.9	3.3	1.05	0.71
Goga Bitadze	7.2	19.7	0.285	5.8	3.3	1.3	0.4	3.1	1.04	1.01
Luka Garza	7.4	20.1	0.275	6.4	3.6	1.6	0.6	2.7	1.02	0.99
Stephon Castle	14.1	18.6	0.197	8.4	2.2	2.2	2.9	2.3	1.01	1.00
Ausar Thompson	10.4	16.9	0.187	7.4	2.2	2.0	7.8	3.4	1.00	1.01
Luke Kornet	8.9	17.6	0.270	8.0	2.3	1.6	0.9	3.5	1.00	0.85
T.J. McConnell	9.4	19.5	0.214	4.3	4.6	1.6	0.1	3.1	0.97	1.26
Dyson Daniels	13.3	16.1	0.200	10.5	1.2	2.0	6.3	2.2	0.96	0.73
Rudy Gobert	13.1	17.8	0.224	11.1	0.2	1.3	6.4	3.0	0.92	0.74
Julius Randle	16.4	19.7	0.153	8.3	1.7	2.5	2.1	1.4	0.91	0.92
Payton Pritchard	13.8	17.5	0.164	8.7	1.9	2.6	2.3	2.0	0.87	0.88
Derrick White	13.8	16.3	0.150	8.2	1.2	2.2	6.2	3.2	0.82	0.90
OG Anunoby	13.2	16.6	0.112	5.2	1.6	2.1	6.9	3.4	0.66	0.94
Jrue Holiday	13.2	16.4	0.176	5.7	1.6	1.4	6.1	2.3	0.62	0.86
Clint Capela	4.9	16.8	0.274	5.3	1.2	0.8	4.5	1.5	0.59	0.42
Lauri Markkanen	20.2	21.2	0.110	3.3	1.3	1.2	6.2	3.5	0.59	1.19
Nic Claxton	12.0	18.2	0.218	8.7	0.3	1.1	-2.6	0.0	0.58	0.25
Andre Drummond	7.5	16.2	0.248	6.4	-0.2	0.6	1.8	-0.6	0.33	0.00
RJ Barrett	14.0	18.2	0.108	3.9	0.8	1.2	6.0	1.1	0.32	0.57
Steven Adams	8.0	14.5	0.304	4.6	-1.3	0.1	3.4	0.5	0.22	-0.19
Dylan Cardwell	7.3	16.4	0.263	5.0	-0.9	0.3	1.3	-0.2	0.21	-0.12
Hugo González	3.4	10.8	0.116	2.6	-1.5	0.1	6.5	0.2	-0.40	-0.42
Josh Green	3.7	9.5	0.116	2.2	-1.8	0.1	7.3	-0.8	-0.57	-0.59
Marcus Smart	7.3	10.8	0.073	2.7	-3.0	-0.5	8.2	-0.4	-0.67	-0.66

8. Who rose and fell in the playoffs

Recompute the box-score rate metrics on playoff games only, and a direct before-and-after read falls out: who climbed once the postseason started, and who sank. The biggest risers were OG Anunoby, Cason Wallace, and Jayson Tatum. The biggest fallers were Jalen Duren, Nikola Jokić, and Nickeil Alexander-Walker. The drop reached even the regular-season consensus number one, Nikola Jokić, whose box-score rates slipped as Denver lost in the first round.

This read works only for the box-score systems, which we recompute ourselves and can therefore run on playoff games alone. The impact metrics need far more games than a playoff run provides (a first-round loss is about six games), so they cannot be split this way, and neither can the once-a-season award votes. (The inventory lays out in full which systems can be split, and why.)

Among the 103 players who logged at least 150 playoff minutes in 2025-26, each player's change is measured against the rest of that group. That strips out the leaguewide dip that comes from tougher defense and from facing the same opponent night after night. What is left is who rose or fell relative to the other rotation players who also advanced. A short run is still noisy, so two guardrails keep the list honest:

- Each shift is pulled toward zero the fewer playoff minutes a player logged, so a lucky handful of games cannot top the list.
- Each shift carries a range, built by resampling that player's games at random and re-running the read many times.

A shift counts as real only when that whole range stays on one side of zero. On the chart, that is a bar whose whisker does not cross zero. By that test only 16 of the 103 players moved more than the games alone could explain.
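The two guardrails can be sketched as a shrinkage step and a bootstrap. This is a minimal illustration under stated assumptions, not the report's code:

- each input value is one playoff game's shift for a player, already measured relative to the 150-minute group;
- the shift is pulled toward zero by minutes / (minutes + half_minutes), where `half_minutes` is a hypothetical constant: the playoff minutes at which a shift is cut in half;
- the range is a 95% percentile interval from resampling the player's games with replacement;
- games are averaged without weighting by minutes.

```python
import random
from statistics import mean


def shrink(shift, minutes, half_minutes=300.0):
    """Pull a shift toward zero; the fewer minutes, the stronger the pull."""
    return shift * minutes / (minutes + half_minutes)


def bootstrap_range(game_shifts, minutes, n_boot=2000, half_minutes=300.0, seed=0):
    """95% percentile range of the shrunk mean shift over resampled games."""
    rng = random.Random(seed)
    draws = sorted(
        shrink(mean(rng.choices(game_shifts, k=len(game_shifts))), minutes, half_minutes)
        for _ in range(n_boot)
    )
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]


def is_real_shift(game_shifts, minutes, **kwargs):
    """A shift counts only if its whole range sits on one side of zero."""
    low, high = bootstrap_range(game_shifts, minutes, **kwargs)
    return low > 0 or high < 0


# Hypothetical players: a steady riser, and one whose games cancel out.
print(is_real_shift([3.0, 2.5, 4.0, 3.5, 2.0, 3.0], minutes=220))  # True
print(is_real_shift([5.0, -5.0] * 4, minutes=250))                 # False
```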

The list rewards players whose game travels into a grind-it-out playoff series and marks down those who leaned on production that dried up against a focused defense. It describes one postseason, not proof that any of these players is reliably better or worse when the stakes rise: playoff samples are small, and only about half the league reaches them.

Figure 17: Who rose and who fell in the 2025-26 playoffs, by box-score rate metrics, for players with at least 150 playoff minutes. Green rose, red fell; the whisker is the range from re-drawing each player’s games at random.

Putting the two halves together: who delivered the most value across the whole season once the playoffs are weighted in? The riser-and-faller read says who changed, not who held the most value. For that, blend each playoff player's regular-season and playoff BPM into one number, counting each playoff minute twice as heavily as a regular-season one. Call it Playoff-Weighted Value. Because it is built on BPM, which is validated against Basketball-Reference, it stays on a real scale: points per 100 possessions above an average player.

The top of that list is the expected company: Nikola Jokić, Victor Wembanyama, and Shai Gilgeous-Alexander. Brunson is the interesting case, and the one this report started from: did he really jump from the regular season to the playoffs, and does any metric catch it? Yes, and the size holds up. His BPM rose from +3.4 in the regular season to +4.6 in the playoffs, a real step up. Weighted only by minutes, that leaves him 15th of the 103 playoff players; lean on the postseason and he climbs to 7th; lean on it harder still and he holds at 7th, never quite breaking the top five. The straightforward reading is a real top-ten playoff performer who got better when it counted, not a hidden top-five star the regular-season numbers were missing: the players ahead of him were either better all-around or rose more.

Figure 18: Brunson raised his game in the playoffs, but not into the top five. Regular-season versus playoff BPM for the top 12 by playoff-weighted value.
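Playoff-Weighted Value, as described above, is a minutes-weighted average in which each playoff minute counts extra. The minimal sketch below plugs in Brunson's regular-season and playoff BPM quoted above (+3.4 and +4.6). His minute totals here are hypothetical placeholders, so the outputs illustrate the mechanics rather than reproduce his rank.

```python
def playoff_weighted_bpm(rs_bpm, rs_minutes, po_bpm, po_minutes, playoff_weight=2.0):
    """Blend regular-season and playoff BPM, counting each playoff minute
    `playoff_weight` times as heavily as a regular-season minute."""
    po_weight = playoff_weight * po_minutes
    return (rs_bpm * rs_minutes + po_bpm * po_weight) / (rs_minutes + po_weight)


# BPM values from the text; 2,000 and 500 minutes are made-up placeholders.
for weight in (1.0, 2.0, 4.0):
    print(weight, round(playoff_weighted_bpm(3.4, 2000, 4.6, 500, weight), 2))
```

The result always lies between the two BPMs, so it stays on BPM's points-per-100 scale. Raising the playoff weight only slides it toward the playoff figure: with these placeholders, from 3.64 (minutes only) to 3.80 (playoff minutes doubled).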

9. What changed from 2024-25 to 2025-26

The whole cross-system comparison rests on one season, so it is fair to ask how much of it survives when the season turns over. The top barely moves, the middle churns, and the systems disagree by about the same amount as before.

Across the 290 players who qualified in both seasons, the consensus order agrees from 2024-25 to 2025-26 at about 0.75 on a 0-to-1 scale: a steady spine with a moving body. Nikola Jokić and Shai Gilgeous-Alexander hold the top two spots in both seasons. Giannis Antetokounmpo and Luka Dončić stay in the top five as well, so 4 of the top five carry over; Victor Wembanyama is the one new arrival.

The biggest one-year swings are exactly what a single season cannot tell apart from real change. Stephon Castle climbed the most on the common scale (+1.38) and Ivica Zubac fell the most (-1.59); injuries, role changes, and roster moves drive most of these, and the rating systems only record them. This is why the report treats the single-season orderings as a snapshot and rests its findings on the 30-season panels instead.

Figure 19: Who climbed and who slid in the consensus rating from 2024-25 to 2025-26, among players qualified in both seasons. Green rose, red fell.

What did not change is how much the systems agree with one another: the mean rank agreement among the box-score systems was 0.71 in 2024-25 and 0.75 in 2025-26. The disagreement this report describes is built into how the metrics are made, not an accident of one season.

10. Summary

Do the systems agree? On the best handful of players, closely; below the top tier, less than their reputations suggest. The box-score systems move together but only loosely track the from-scratch RAPM, the one lineup-based impact metric measured here, agreeing with the box scores at about 0.40 on a 0-to-1 scale. That low agreement is the point of an impact metric: with the reconstruction bug fixed, it is real independent signal. The prior-informed RAPM+prior, not the raw version, is the one that feeds the consensus. No system is best at everything. In a typical season PER describes a finished season better than any other rating that never looks at team results (about 68% of the gaps between teams), yet forecasts the next among the worst (about 10%), a split that holds across all 30 seasons tested.

What does each system uniquely capture? It depends on the system. Each tilts toward a player type: Win Shares toward efficient bigs on winning teams, PER toward high-usage scorers, the impact metrics toward off-ball and defensive value the box score misses. How much each adds beyond the rest varies widely. The all-in-one box scores, PER and BPM among them, are nearly reconstructable from the other systems. Win Shares per minute and the bare RAPM family carry the most signal of their own, and with the RAPM reconstruction bug fixed, that lineup independence now reads as genuine signal rather than an artifact of noise.

How should they be combined? Into two ratings that answer different questions. A plain consensus averages every system; a wins-predictive blend weights each by how well it tracked team wins. They agree almost exactly (0.98 on a 0-to-1 scale) on the best players and part mainly on role players whose production did not turn into team wins.

One pattern runs under all three answers: in the cumulative metrics, value is top-heavy. In VORP, the drop from the best player to the fifteenth is roughly 15.6 times the same-sized drop from the fiftieth to the sixty-fifth, so a ranked list understates how much a single elite player is worth.

The cross-system comparison rests on the 2025-26 season alone, so its exact orderings are a snapshot; year to year the order holds at the top (Nikola Jokić and Shai Gilgeous-Alexander lead both seasons, rank agreement about 0.75) but churns below it, while the describe-versus-forecast and stability findings span 30 seasons and are the firmer results. The recomputed BPM and VORP are validated against Basketball-Reference (Section 11) and best read for rank; their very top is slightly compressed.

11. A note on the recomputed formulas

Four of the box-score systems here are recomputed from raw totals (PER, Win Shares, BPM, VORP) rather than copied from a published source, and getting that recompute right mattered: an earlier version of this report had a reserve grading as the league’s best player and VORP running thirty times too large. PER and Win Shares land in the expected range (PER averages 15, with Nikola Jokić in the low 30s) and are used as is. BPM and VORP needed real repair, and the fix is worth walking through.

The bug. The first recompute built BPM’s per-100 rates on each player’s own possessions instead of the team’s, which inflated the rates for low-minute and high-steal players. The symptoms: a reserve, John Konchar, graded as the league’s best player by BPM; VORP ran roughly thirty times too large, with a single-season leader near 250 where the real scale tops out under 10; and usage rate read close to 100% for a starter instead of the correct 30%. The numbers were not merely approximate: they were wrong in scale and in order.
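The usage-rate symptom shows why the denominator matters. Below is a minimal sketch of the usage-rate formula Basketball-Reference publishes. A player's plays (field-goal attempts, 0.44 × free-throw attempts, turnovers) are divided by his team's plays over the minutes he was on the floor. Feeding in the player's own totals in place of the team's, the kind of own-possessions base described above, forces the answer to 100 for every player. The box-score line in the usage example is made up.

```python
def usage_rate(fga, fta, tov, mp, team_fga, team_fta, team_tov, team_mp):
    """USG%: share of team plays a player used while on the floor.

    team_mp is total team minutes (five players' worth), so team_mp / 5
    is the length of the team's games in minutes.
    """
    player_plays = fga + 0.44 * fta + tov
    team_plays = team_fga + 0.44 * team_fta + team_tov
    return 100 * player_plays * (team_mp / 5) / (mp * team_plays)


# Hypothetical single game: 36 minutes, 20 FGA, 6 FTA, 3 turnovers.
correct = usage_rate(20, 6, 3, 36, team_fga=88, team_fta=22, team_tov=14, team_mp=240)
# The bug in miniature: the player's own totals stand in for the team's.
buggy = usage_rate(20, 6, 3, 36, team_fga=20, team_fta=6, team_tov=3, team_mp=36 * 5)
print(round(correct, 1), round(buggy, 1))  # 30.6 100.0
```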

The fix. BPM is now built in two steps. First, the standard advanced rates (usage, assist, turnover, rebound, steal, and block percentages, each computed against team and league context the way Basketball-Reference does) feed a model tuned to reproduce Basketball-Reference’s published offensive and defensive BPM. Second, each team’s ratings are anchored so its players, weighted by minutes, sum to the team’s actual point differential, which is the defining property of BPM and what sets the scale. VORP was rebuilt on the corrected BPM with the right minutes base.
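The anchoring step and the VORP rebuild can be sketched as below, as an illustration only:

- `anchor_to_team` shifts a team's raw BPMs by one shared constant, so that their minutes-weighted sum equals the team's point margin per 100 possessions. Basketball-Reference's own version may apply further adjustments to that target; this sketch uses the plain per-100 margin.
- `vorp` uses the commonly cited form: BPM above a −2.0 replacement level, times the share of team minutes played (standing in for share of possessions), scaled to an 82-game season.

Both function names and the toy roster are hypothetical.

```python
def anchor_to_team(raw_bpm, minutes, team_margin_per100):
    """Shift all of a team's raw BPMs by one constant so that their
    minutes-weighted sum equals the team's margin per 100 possessions.

    Each weight is minutes / (team minutes / 5), a player's share of one
    floor slot, so the weights across a full roster sum to 5.
    """
    slot_minutes = sum(minutes.values()) / 5
    weights = {p: m / slot_minutes for p, m in minutes.items()}
    current = sum(weights[p] * raw_bpm[p] for p in raw_bpm)
    shift = (team_margin_per100 - current) / sum(weights.values())
    return {p: b + shift for p, b in raw_bpm.items()}


def vorp(bpm, minutes, team_minutes, team_games, season_games=82):
    """BPM above a -2.0 replacement level, times the share of team minutes
    played, scaled to an 82-game season."""
    return (bpm + 2.0) * (minutes / (team_minutes / 5)) * (team_games / season_games)


# Hypothetical three-man "roster" whose raw BPMs overshoot a +3.75 team.
adjusted = anchor_to_team({"a": 5.0, "b": 0.0, "c": -5.0},
                          {"a": 120, "b": 60, "c": 60}, team_margin_per100=3.75)
print(adjusted)  # every player shifted down by 0.5
```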

The check. Against Basketball-Reference’s published 2025-26 figures, across 361 qualified players, the recomputed values now line up: a value agreement of 0.93 for BPM, 0.95 for offensive BPM, and 0.96 for VORP, on a 0-to-1 scale. Defensive BPM is weaker at 0.88, which tracks expectations: box-score defense is hard, and Basketball-Reference’s own defensive BPM is limited for the same reason. Two caveats remain. The recompute slightly compresses the very top, grading the best player a notch below his Basketball-Reference mark. And the RAPM family had its own repair: a reconstruction bug was quietly discarding most of the season’s games, and the bare RAPM that survived was close to noise. With the bug fixed and the games restored, its reliability came back (Section 2), and RAPM+prior is now a usable impact metric; the bare single-season version remains the noisier of the two, and its absolute per-100 values are still approximate, so read the RAPM family for rank more than for exact scale.

12. Limitations

The cross-system comparison (rank agreement, the two combined ratings, the playoff risers and fallers) is built on the 2025-26 season alone. Two tests reach further, both across 30 seasons back to 1996-97: the describe-versus-forecast panel and the year-over-year stability of each rating. Extending the cross-system comparison itself across all those seasons, rather than the single testbed year, is the natural next step.

The crosswalk matching rate for each third-party source is reported in docs/player_rating_overview_results.md. Players who could not be matched are listed there; they are excluded from the cross-system comparison but retained in the unified table.

RAPM (regularized adjusted plus/minus) is the technical backbone of every serious modern impact metric (EPM, LEBRON, DARKO, RAPTOR, and RPM all build on it). This report now computes it directly from play-by-play, for 2025-26 and back through 1997-98. Within a season, every possession is reconstructed into the five-on-five lineup that was on the floor, and a single statistical model estimates all the players’ contributions to the scoring margin at once (582 players in 2025-26). One thing still makes the bare version weaker than the published metrics: it uses a single season of lineup data with no box-score prior, so it carries more noise than the pooled, prior-anchored version. Bare RAPM rates its leader (Section 2) well above where the box scores place him, and across the seasons it forecasts next-season team results less well than RAPM+prior does (Section 3). The bare version’s offensive and defensive halves are also kept out of the consensus, so one single-season split cannot swing the order at the top.

The report also computes the stabilized version, RAPM+prior. It pools three seasons of possessions, weighting recent ones more heavily, and shrinks each player toward a box-score prior: the offensive half toward Offensive BPM, the defensive half toward Defensive BPM. A player with few possessions stays near his box score, while a heavy-minute player moves toward what the lineup data shows. This is the prior-plus-RAPM recipe the published metrics use, and it helps where expected: its order agrees with the consensus at 0.93, up from 0.49 for the bare version, nearly double the agreement. The gain shows at the top as well as in the overall order: RAPM+prior now leads with the MVP tier, Nikola Jokić at its head, rather than the low-minute names a raw one-season estimate can float up. Its absolute per-100 values are still approximate (an in-house recompute), so read it for rank more than exact scale. Only the combined RAPM+prior feeds the consensus; its offensive and defensive halves stay out. It also forecasts team results well: across the seasons where it can be computed, RAPM+prior predicts next-season team point differential better than the box scores do (Section 3). The caveat that remains is at the player level, where heavy-minute role players who spend their court time in strong lineups, the hardest case for any plus/minus method, can still rate too high. Matching a published metric like EPM or LEBRON would need the player-tracking data they use, which we do not have. A public RAPM snapshot can be dropped into the cache schema to validate the computed values when one is available.
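In the standard RAPM recipe, the single statistical model behind both versions is a weighted ridge regression over stints. The pure-Python sketch below works under these assumptions:

- each stint is a (home players, away players, margin per 100 possessions, possessions) tuple;
- a player's column is +1 at home and −1 away, and stints are weighted by possessions;
- the penalty pulls each player toward a prior rather than simply toward zero.

With an all-zero prior this is bare RAPM; passing box-score values as the prior gives the RAPM+prior idea. The penalty strength `lam` is a hypothetical value, and there is no home-court term. Pooling three seasons with recency weights would scale each stint's possessions by its season weight. None of this is the report's code.

```python
def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rapm(stints, players, lam=2000.0, prior=None):
    """Ridge estimate of each player's per-100 impact, shrunk toward `prior`.

    Minimizes sum(poss * (margin - x.beta)**2) + lam * |beta - prior|**2,
    solved as beta = prior + (X'WX + lam*I)^-1 X'W (y - X prior).
    """
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    beta0 = [(prior or {}).get(p, 0.0) for p in players]
    xtwx = [[0.0] * n for _ in range(n)]
    xtwr = [0.0] * n
    for home, away, margin, poss in stints:
        x = {idx[p]: 1.0 for p in home} | {idx[p]: -1.0 for p in away}
        resid = margin - sum(v * beta0[i] for i, v in x.items())
        for i, vi in x.items():
            xtwr[i] += poss * vi * resid
            for j, vj in x.items():
                xtwx[i][j] += poss * vi * vj
    for i in range(n):
        xtwx[i][i] += lam
    delta = solve(xtwx, xtwr)
    return {p: beta0[idx[p]] + delta[idx[p]] for p in players}


# Toy example: one 100-possession stint of A (home) against B (away), +10 per 100.
stints = [(["A"], ["B"], 10.0, 100)]
print(rapm(stints, ["A", "B"], lam=100.0))                               # A ≈ +3.33
print(rapm(stints, ["A", "B"], lam=100.0, prior={"A": 2.0, "B": -2.0}))  # A = +4.0
```

Real stints carry five players a side; the one-on-one toy just keeps the arithmetic checkable.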

The tracking-based and team-internal systems (Second Spectrum, franchise models, Synergy) are documented in the inventory but not accessible: a blind spot this report can name but not fill.

Comparing playoff performance to regular-season performance is done for the box-score systems in Section 8, where the rate metrics are recomputed from postseason games and compared directly. The limit is the impact metrics: RAPM-based systems cannot do this reliably. A first-round exit provides roughly 6 games of lineup data; even a Finals run provides only 20 to 24 games. That sample is too small for the lineup-based estimate to stabilize, regardless of how good the prior is. The small-sample problem is not a solvable data-engineering issue; it is a fundamental limit of how RAPM works.

13. What weighing the evidence more carefully would add

The impact metrics already weigh evidence this way in one place: Section 1 described how the box-score prior and the lineup on/off data are blended, with the weight shifting toward the data as a player’s sample of possessions grows. That structure is the skeleton of EPM, LEBRON, DARKO, and every other serious impact metric.

Three things a more complete version would add:

Uncertainty ranges. Section 1 already noted what the single number hides: an “EPM: +8.2” carries no range, though the uncertainty band around a one-season RAPM-based estimate is about ±3.0 points per 100 possessions for a rotation player, measured here from how far the two split-half fits diverge. This does not mean the metrics are unreliable. It means the precision they imply is a display choice, not a statistical one.
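One way to turn two split-half fits into an uncertainty figure is sketched below. It assumes each half-season estimate is the player's true value plus independent noise of equal size. Under that assumption, the difference between halves has twice the single-half variance, while the full-season estimate has half of it. The full-season standard deviation is then SD(difference) / 2. The text does not say whether ±3.0 is one standard deviation or a wider interval, so this sketch stops at the standard deviation. The function name and the four players' values are hypothetical.

```python
from statistics import pstdev


def full_season_sd(first_half, second_half):
    """Noise SD of a full-season estimate, from two independent half-season fits.

    If each half = truth + noise with variance s**2, then
    Var(first - second) = 2 * s**2 and Var(full season) ~= s**2 / 2,
    so SD(full season) ~= SD(first - second) / 2. pstdev also removes
    any average shift between the two halves.
    """
    players = sorted(set(first_half) & set(second_half))
    diffs = [first_half[p] - second_half[p] for p in players]
    return pstdev(diffs) / 2


# Hypothetical split-half RAPM values for four players.
first = {"a": 4.0, "b": -1.0, "c": 2.0, "d": 0.5}
second = {"a": 1.0, "b": 1.0, "c": 2.5, "d": -0.5}
print(round(full_season_sd(first, second), 2))
```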

Playoff versus regular season. Section 8 already compares regular season and playoffs for the box-score rate metrics directly, which is the practical version. The addition is for the RAPM-based metrics, where the right approach is different: treat the full-season estimate as the starting belief and update it with playoff lineup data. The updated estimate barely shifts for a player who exits in the first round (roughly 6 games of new evidence), and moves more for a player who reaches the Finals (20-24 games). The shift (in which direction and by how much) is the honest answer to “did this player hold up in the playoffs?” without pretending there is more data than there is. A player whose estimate barely moves played to his regular-season picture; one whose estimate shifts meaningfully showed something the season had not.
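In its simplest form, the update described above is a precision-weighted average. The full-season estimate and the playoff lineup estimate each count in proportion to 1 / variance, and the playoff estimate's variance shrinks as games accumulate. Below is a minimal sketch assuming normal uncertainty on both sides. The per-game noise of 12 points per 100 possessions and the player's numbers are hypothetical, chosen only to show a first-round exit moving the estimate less than a Finals run.

```python
import math


def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Combine two normal estimates, each weighted by its precision (1/variance)."""
    w_prior, w_data = 1 / prior_sd ** 2, 1 / data_sd ** 2
    mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    return mean, math.sqrt(1 / (w_prior + w_data))


def playoff_update(season_mean, season_sd, playoff_mean, games, per_game_sd=12.0):
    """Update a full-season estimate with `games` of playoff lineup evidence."""
    return normal_update(season_mean, season_sd, playoff_mean,
                         per_game_sd / math.sqrt(games))


# Hypothetical player: +4.0 ± 1.5 in the season, lineups at +10 in the playoffs.
for games in (6, 22):
    mean, sd = playoff_update(4.0, 1.5, 10.0, games)
    print(games, round(mean, 2), round(sd, 2))
```

With these placeholders the estimate moves about half a point after 6 games and about one and a half points after 22. That is the "barely shifts" versus "moves more" contrast in numbers.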

Better consensus weighting. The wins-predictive rating in this report is a step toward what is sometimes called model averaging: instead of treating all systems as equally reliable, weight them by how well they predicted team wins. A more complete version would estimate those weights from retrodiction performance across multiple seasons and update them as new data arrives. The practical difference at the top of the rankings is small, because the systems that predict team wins best already carry the most weight. But the principled foundation matters when choosing which metrics to trust for a specific question, particularly for players at the margins of the top tier.

Appendix A: Companion Documents

Appendix B: The two versions of Box Plus/Minus

Box Plus/Minus comes in two versions, both built by Daniel Myers for Basketball-Reference. The original, BPM 1.0, was published in 2014 and was the first widely available attempt to estimate a player’s plus/minus impact from the box score alone. Basketball-Reference later replaced it with a revised version, BPM 2.0, and recomputed every season in its database with the new formula. The “BPM” throughout this report is BPM 2.0, the version in current use.

The two differ in method, not just in tuning. BPM 1.0 first guessed each player’s position (point guard through center) and offensive role from their stats, then applied weights that shifted with that position and role. BPM 2.0 reworked how it infers role and recalibrated its weights against a larger set of lineup plus/minus data, a revision meant to improve accuracy, particularly for players whose value comes from defense or from a role the box score describes poorly. Because 2.0 is the version Basketball-Reference now publishes, and the one its BPM and VORP figures reflect, it is the one this report recomputes.

We looked at adding BPM 1.0 alongside it, to show how much a single system’s verdicts move when only the formula changes, but did not. Basketball-Reference retired the original and no longer publishes its values, so there is nothing to import. And the position-and-role part of the formula is not cleanly documented in public sources: the reconstructions that circulate drop it, which collapses the metric into something close to a minutes-played ranking rather than a measure of skill. A faithful recompute would need the original full specification. If that becomes available, BPM 1.0 versus 2.0 is a natural addition.

Data: NBA.com via nba_api, Basketball-Reference, FiveThirtyEight (RAPTOR). See Appendix A for companion documents.