Player Rating Overview: Findings Outline
Draft
Internal outline. Cross-referenced to player_rating_overview_results.md. Technical shorthand fine here. Reflects the current pipeline: 14 recomputed box-score systems plus a from-scratch RAPM (bare single-season + prior-informed multi-year RAPM+prior); other impact metrics and human rankings surveyed but not recomputed.
Structure (matches the 13 sections in the findings)
Intro (question-driven; before §1)
- Hook: the rating that best describes a finished season forecasts the next worst. PER ~68% describe vs ~10% forecast in the median season, across 30 tested.
- Three questions: (1) do the systems agree? (2) what does each uniquely capture? (3) how should they be combined?
- Settled up front: no single best system; agree at the top, diverge below; value is top-heavy (gap #1→#10 >> #10→#50).
- Testbed 2025-26; 14 box-score systems + RAPM/RAPM+prior recomputed; other impact + human rankings surveyed only. Two panels span 30 seasons back to 1996-97.
§ 1. The landscape of player rating
- Box-score (recomputed, present): Game Score, PER, Win Shares, WS/48, BPM, OBPM, DBPM, VORP.
- Impact metrics (surveyed): RAPTOR, EPM, LEBRON, DARKO, DRIP, ESPN RPM. RAPM backbone + box-score prior; noise/regularization.
- Human rankings (surveyed, not included): MVP votes, All-NBA, media top-100.
- Practitioner trust (external, cited in resources bibliography): HoopsHype survey of 29 execs, DARKO top at 8 votes; Dunks & Threes retrodiction order EPM > RPM > RAPTOR > BPM 2.0.
§ 2. Do the systems agree?
- Spearman matrix. Tight: PER-Game Score 0.862, BPM-VORP 0.964. Moderate: PER-WS 0.764, WS-WS/48 0.781. Loose: PER-BPM 0.901, Game Score-BPM 0.739.
- Player-level divergence even where overall correlation is high: WS favors efficient bigs (Amen Thompson, Donovan Clingan); PER favors high-usage scorers (Giannis Antetokounmpo, Lauri Markkanen).
- RAPM sits apart: agrees with box scores at 0.40; bare 2025-26 puts reserve Kawhi Leonard at #6 in consensus. Fixed RAPM+prior agrees with consensus 0.93 (up from 0.49, ~double) but very top still catches low-minute noise (Nikola Jokić); RAPM+prior (combined only) feeds the consensus.
- Chart: rank-agreement heatmap.
§ 3. What the field has learned about evaluating these metrics
- Two community tests: retrodiction (1st-half → 2nd-half outcomes) and team-wins prediction; both reward a RAPM backbone + calibrated box-score prior. Academic caveat: complex metrics don’t reliably beat simple ones as inputs to downstream salary/wins models.
- Direct describe test (held-out teams, 2025-26): PER best, ~73% of team point-diff, above the plus/minus metrics.
- Forecast test (prior-season ratings predict this season): order flips, PER ~22%, best forecaster RAPM+prior ~58%. Coverage ~88%.
- Multi-season panel (30 seasons / 29 handoffs back to 1996-97): PER describes ~68%, forecasts ~10% (median handoff); best forecaster VORP ~45%; BPM beats PER forecasting 28/29 handoffs.
- Impact-era panel (29 seasons from 1997-98, box scores vs RAPM): bare RAPM forecasts 6/10 (~32%, describe ~92% but mechanical), RAPM+prior tops the forecast (1/10, ~47%) above every box score, while bare RAPM sits mid-pack; forecast leader RAPM+prior.
- Stability (year-over-year): Game Score retains 68% of its top 20, PER 64% (chance ~5%); steadiest Game Score 0.85, jumpiest DBPM 0.67; 29 pairs. Stickiness cuts against forecasting (PER sticky but weak forecaster; plus/minus metrics jumpier but better forecasters).
- Charts: retrodiction, next-season retrodiction, panel describe-vs-forecast, impact panel, rating stability.
§ 4. What each system uniquely sees
- Overlap R² (own kin held out; high = redundant): most redundant PER 0.945, BPM 0.943; least redundant WS/48 0.661, bare RAPM 0.677, WS 0.709; DBPM slid to mid 0.823 after the RAPM fix. Caveat: BPM/VORP validated vs BBR; RAPM now carries real signal, so its low overlap reads as genuine independence.
- System-outliers chart: who each system rates above/below consensus.
§ 5. The two uber ratings
- Consensus (average normalized score). Top 5: Nikola Jokić 3.99, Shai Gilgeous-Alexander 3.34, Victor Wembanyama 2.82, Giannis Antetokounmpo 2.70, Luka Dončić 2.58.
- Wins-predictive (weighted by team-wins prediction). Top: Nikola Jokić 4.52, Shai Gilgeous-Alexander 4.24.
- Consensus vs. wins-predictive Spearman 0.981. Top riser Giannis Antetokounmpo +0.94; top faller Jericho Sims -0.67. Wins-predictive lifts stars on winning teams, discounts production on losing teams.
§ 6. Ranking throws away how much better someone is (distribution section)
- Headline: rank-1-to-15 vs rank-50-to-65 gap, same 14-spot jump, very different value. VORP: Nikola Jokić to Jalen Brunson = 4.18, vs. Ausar Thompson to Ayo Dosunmu = 0.27 (ratio 15.6x). Universal: every system’s ratio > 1, smallest WS/48 3.6x, largest D-RAPM 19.9x. Guard:
every_system_top_gap_exceeds_mid_gap. - Why the ratio varies by system: rate metrics spread evenly (PER top-5% share 8.6%, PER gap ratio 11.6x); cumulative metrics top-heavy (WS top-5% 13.5%, VORP 24.5%).
- Power-law steepness (alpha) as the continuous concentration read, not a binary label: PER 0.14 (shallowest); VORP 0.37, Consensus 0.38, Wins-Pred 0.43. Straight-line power laws (incl. WS, BPM) vs. benders (per-possession metrics) is a convention-driven grouping, kept secondary to alpha.
- RAPM full distribution (29 seasons, 9820 player-seasons, 47% below 0) is a symmetric bell, not heavy-tailed (lacks the one-sided tail) — rank-gap ratio (7.3x) well below the cumulative metrics (VORP, WS, BPM), guard
rapm_smaller_ratio_than_cumulative. WS/48 (a bounded per-minute rate, not symmetry) posts the true smallest ratio of any system. VORP leans right (the precondition a heavy tail needs), though its own top-50 curve bends just short of the power-law line. - Gini kept only as a cross-check; it misleads on 0-centered metrics (Consensus Gini 0.753 > WS 0.353, an artifact, not real top-heaviness).
- Charts: ordinal-vs-value gap (opens the section), all-systems distributions, distribution shape, power-law small multiples, power-law fits, gini, rank-value curves.
§ 7. Who lands in the top 20
- Top-20-by-system table across the 14 systems; rank-1-to-rank-20 gap as a concentration read.
§ 8. Who rose and fell in the playoffs
- Box-score rate metrics recomputed on playoff games only (impact metrics can’t: a playoff run is too few games; only the recompute family splits by season type, see inventory). 103 players ≥ 150 playoff minutes; change measured against that peer group (strips the leaguewide playoff dip).
- Two reliability guards: shift shrunk toward zero by playoff minutes (half weight at 200) so a short sample can’t top the list; game-level bootstrap range (re-draw games 1000x), a shift is “clear” when its range excludes zero; only 16 of 103 clear it.
- Risers: OG Anunoby, Cason Wallace, Jayson Tatum. Fallers: Jalen Duren, Nikola Jokić, Nickeil Alexander-Walker, incl. regular-season consensus #1 Nikola Jokić. One postseason, not proof.
§ 9. What changed from 2024-25 to 2025-26
- Snapshot read of the most recent full-season pair (290 players qualified in both). Consensus order agrees 0.75 year-over-year: top stable (Nikola Jokić, Shai Gilgeous-Alexander lead both; only 4 of top 5 carry over), middle churns.
- Biggest mover up Stephon Castle (+1.38), down Ivica Zubac (-1.59); one-year swings, not signal. What did NOT change: box-score agreement 0.71 → 0.75, disagreement is structural. Chart: season comparison (consensus movers).
§ 10. Summary
- Re-answers the three questions: agree at the top / each captures less uniquely than it looks (heavy overlap) / two combined ratings agreeing at 0.981.
- Through-line: value is top-heavy. Caveat: single-season cross-system comparison is a snapshot; the describe-vs-forecast and stability panels (30 seasons) are the firmer results.
§ 11. A note on the recomputed formulas
- PER cleanest (mean ~15, Jokić low 30s). WS/BPM/VORP are approximations; absolute values differ from BBR.
- VORP inflation for high-steal defenders: VORP rates Shai Gilgeous-Alexander further above consensus than any player (per-100 denominator uses player-possessions not team-possessions). Also Ausar Thompson, Dunn.
§ 12. Limitations
- Single-season cross-system testbed; only the describe-vs-forecast and stability panels reach across 30 seasons.
- RAPM now computed from play-by-play (2025-26 back to 1997-98, 582 players in 2025-26): bare = noisy, no prior; RAPM+prior = 3 pooled seasons + BPM prior (consensus corr 0.93); over the 29-season panel RAPM+prior tops the forecast (above every box score), bare RAPM does not. No tracking data, so can’t match EPM/LEBRON.
- Tracking/team-internal blind spot (Second Spectrum, Synergy, franchise models).
- Playoff small-sample limit for RAPM-based metrics.
§ 13. What a Bayesian lens would add
- Uncertainty ranges: single-number metrics hide a ~2-3 pts/100 band; apparent 8th-vs-12th disputes often within it.
- Playoff vs. regular season: update the full-season prior with playoff lineup data; posterior barely shifts on 6 games, more on a Finals run.
- Better consensus weighting: estimate system weights from multi-season retrodiction.
Key numbers (from results.md)
- Players: 582 total, 378 qualified (≥ 500 min). Systems present: 14.
- Correlations: PER-Game Score 0.862, PER-WS 0.764, WS-WS/48 0.781, PER-BPM 0.901, Game Score-BPM 0.739, BPM-VORP 0.964.
- RAPM: box-score corr 0.40 (bare), consensus corr 0.49 (bare) → 0.93 (RAPM+prior).
- Describe vs forecast (panel, 30 seasons): PER 68% / 10% (median handoff); best forecaster VORP 45%.
- Consensus top 5: Nikola Jokić 3.99, Shai Gilgeous-Alexander 3.34, Victor Wembanyama 2.82, Giannis Antetokounmpo 2.70, Luka Dončić 2.58.
- Consensus vs. wins-predictive Spearman: 0.981. Top riser Giannis Antetokounmpo +0.94; top faller Jericho Sims -0.67.
- Concentration: PER top-5% 8.6%; Gini cross-check Consensus 0.753 vs WS 0.353 (artifact).
- Overlap R² (own kin held out; high = redundant): PER 0.945, BPM 0.943 most redundant; WS/48 0.661, bare RAPM 0.677, WS 0.709 least redundant; DBPM 0.823 mid after the RAPM fix.
- Playoffs: 103 players ≥ 150 min; top riser OG Anunoby, top faller Jalen Duren.