Posterior Research Desk · Methodology v1 · Updated May 2026
Posterior publishes the architecture so the record can be evaluated, not merely believed. Every actionable entry is graded against the actual outcome by the next morning.
The composition by category. Each row counts how many signals contribute, with a one-sentence description of what that category models. The specific weights and interactions are proprietary and stay private.
Plate-appearance density, recent contact rate, expected vs. actual outcomes.
Arsenal mix, platoon splits, recent workload, opener vs. starter context.
Park handedness, weather, lineup slot, bullpen state, opposing defense.
De-vigged sharp consensus, line movement, steam, opening vs. current.
Late scratch detection, lineup confirmation timing, publish-window scheduling.
Each pillar is the result of a dozen smaller engineering decisions in the codebase. The proof they work lives in the calibration plot on the homepage — not in any one signal we'd describe in marketing copy.
Every batter has a full posterior distribution, not a point estimate. The model shrinks toward the league mean with strength inverse to plate-appearance count — new call-ups don't get over-confident projections from a 30-PA hot streak; veterans don't drown in week-to-week noise.
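A minimal sketch of that shrinkage, assuming a simple Beta-Binomial model of a per-plate-appearance rate. The function name `shrink`, the `prior_strength` of 200 pseudo-plate-appearances, and the sample stat lines are illustrative assumptions, not the production model, which is hierarchical and proprietary.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) posterior over a batter's true per-PA rate."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        n = self.alpha + self.beta
        return sqrt(self.alpha * self.beta / (n * n * (n + 1)))


def shrink(successes: int, trials: int, league_mean: float,
           prior_strength: float = 200.0) -> BetaPosterior:
    """Update a league-mean prior with one batter's observed outcomes.

    prior_strength is a hypothetical count of pseudo-PAs carried by the
    prior: the fewer real PAs a batter has, the more the prior dominates.
    """
    alpha0 = league_mean * prior_strength
    beta0 = (1.0 - league_mean) * prior_strength
    return BetaPosterior(alpha0 + successes, beta0 + (trials - successes))


# Illustrative inputs only: a 30-PA hot streak vs. a 600-PA veteran.
rookie = shrink(successes=12, trials=30, league_mean=0.250)
veteran = shrink(successes=180, trials=600, league_mean=0.250)
```

Under these assumed numbers the rookie's raw .400 is pulled to roughly .270 with a wide posterior, while the veteran's raw .300 moves only to .2875 with a much tighter one.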
The opposing pitcher's pitch mix is weighted against the specific batter's split against each pitch type. A slider-heavy pitcher facing a slider-weak hitter compounds correctly. Most public models can't see this — they wash the matchup into an aggregate ERA or wOBA.
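One simple way to read the pitch-mix weighting is as a usage-weighted average of the batter's results against each pitch type. The sketch below assumes that linear form; the function name `matchup_value`, the pitch labels, and every number are hypothetical, and the production interaction may be richer than a weighted sum.

```python
def matchup_value(pitch_mix: dict[str, float],
                  batter_vs_pitch: dict[str, float],
                  fallback: float) -> float:
    """Weight the batter's per-pitch-type value by the pitcher's usage share.

    pitch_mix        usage share per pitch type (normalized here if needed)
    batter_vs_pitch  batter's value (e.g. a wOBA-like rate) vs. each pitch type
    fallback         value used when the batter has no split for a pitch type
    """
    total = sum(pitch_mix.values())
    return sum(
        (share / total) * batter_vs_pitch.get(pitch, fallback)
        for pitch, share in pitch_mix.items()
    )


# Illustrative numbers: a slider-weak hitter against two different arsenals.
hitter = {"slider": 0.250, "fastball": 0.360, "changeup": 0.320}
slider_heavy = {"slider": 0.50, "fastball": 0.40, "changeup": 0.10}
fastball_heavy = {"slider": 0.15, "fastball": 0.70, "changeup": 0.15}

vs_slider_heavy = matchup_value(slider_heavy, hitter, fallback=0.320)
vs_fastball_heavy = matchup_value(fastball_heavy, hitter, fallback=0.320)
```

The same hitter projects worse against the slider-heavy arsenal than against the fastball-heavy one, a difference an aggregate ERA or wOBA would average away.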
Park × handedness. Wind alignment to outfield bearing. Catcher framing from the posted lineup. Opposing team defense. Bullpen fatigue. Each small in isolation. Stacked correctly, they're the gap between a 0.59 probability and a 0.63 one — the difference between a SKIP and a STRONG.
STRONG / LEAN / WATCH are graded by the GAP against the no-vig consensus across five sharp books, not by the model's raw probability. A 22% model price on a +400 line can be STRONG; a 50% model price on a -150 line can be SKIP. The grade describes model conviction — it is not a betting instruction. On game-level moneylines, Polymarket's peer-to-peer price runs alongside as an independent second opinion (see /mlb/live); it does not enter the grading math until an audit window of resolved outcomes earns it the weight.
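To make the gap-based grading concrete, here is a hedged sketch. It assumes the consensus is a plain average of five already de-vigged book probabilities and uses placeholder thresholds; the names `STRONG_GAP`, `LEAN_GAP`, `WATCH_GAP`, and `grade` are hypothetical, and the real cutoffs and blending weights are not published.

```python
from statistics import mean

# Placeholder thresholds in probability points (assumptions, not the real cutoffs).
STRONG_GAP = 0.03
LEAN_GAP = 0.015
WATCH_GAP = 0.0


def grade(model_prob: float, no_vig_probs: list[float]) -> str:
    """Grade an entry by its gap over the no-vig consensus, not by raw probability."""
    consensus = mean(no_vig_probs)  # assumed: simple average across books
    gap = model_prob - consensus
    if gap >= STRONG_GAP:
        return "STRONG"
    if gap >= LEAN_GAP:
        return "LEAN"
    if gap >= WATCH_GAP:
        return "WATCH"
    return "SKIP"


# A long shot the model likes more than the market does.
long_shot = grade(0.22, [0.185, 0.185, 0.185, 0.185, 0.185])
# A coin-flip model price on a side the market has near 58 percent.
favorite = grade(0.50, [0.58, 0.58, 0.58, 0.58, 0.58])
```

With these assumed inputs the 22 percent entry grades STRONG and the 50 percent entry grades SKIP, matching the asymmetry described above: a +400 line implies 20 percent before the vig is removed, and a -150 line implies 60 percent.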
Resolved entries feed the calibration audit. The active production calibrator is conservative and versioned. Candidate calibrators (isotonic, Beta, and BBQ, i.e. Bayesian Binning into Quantiles) are evaluated daily against resolved outcomes, but a candidate is promoted only after minimum-sample, holdout, and stability gates clear; none is ever auto-promoted on a short-window Brier race. The calibration curve lives on the homepage.
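For readers unfamiliar with the first candidate, isotonic calibration fits a non-decreasing map from raw model probability to observed frequency. The sketch below is a plain pool-adjacent-violators pass in standard-library Python; it is a generic illustration of the technique, and the function name `isotonic_fit` and the toy data are assumptions, not the desk's code.

```python
def isotonic_fit(probs: list[float], outcomes: list[int]) -> list[float]:
    """Return calibrated values, non-decreasing in the raw probability.

    Pool-adjacent-violators: walk the outcomes in order of raw probability
    and merge neighbouring blocks whenever their means would decrease.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    blocks: list[list[float]] = []  # each block is [sum_of_outcomes, count]
    for i in order:
        blocks.append([float(outcomes[i]), 1.0])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n

    fitted_sorted: list[float] = []
    for s, n in blocks:
        fitted_sorted.extend([s / n] * int(n))

    # Map the fitted values back to the original input order.
    fitted = [0.0] * len(probs)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


# Toy data: the out-of-order middle pair gets pooled to a shared value.
calibrated = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5], [0, 1, 0, 1, 1])
```

Because a fit like this can chase noise on small samples, it is easy to see why promotion would sit behind minimum-sample and holdout gates.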
Hit rate, units, closing-movement comparison, and Brier score update on the public dashboard the morning after each slate resolves. The published numbers cover every STRONG and LEAN entry the model surfaces, wins and losses both, never a curated subset. WATCH and SKIP rows, which the model generates as internal signal but does not surface, are persisted alongside them (downloadable at /data) so the full model record stays auditable.
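The Brier score in that list is the mean squared difference between each forecast probability and the 0/1 outcome, so lower is better. A minimal sketch, with a hypothetical function name and toy inputs:

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy example: two entries priced at 60 percent, one hit and one miss.
score = brier_score([0.6, 0.6], [1, 0])
# Reference point: always forecasting 50 percent scores 0.25 on any binary record.
baseline = brier_score([0.5, 0.5], [1, 0])
```

A forecaster who always says 50 percent scores exactly 0.25 on any binary record, which gives the published number a natural reference point.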
Best-effort survey of the public MLB prediction landscape. Counts are estimates from what each service documents publicly. If you run one of these and we've undercounted, we'd genuinely love to see your methodology page.
| Service | Signals | Calibrated | Public log |
|---|---|---|---|
| NumberFire | ~10 | — | — |
| Stokastic | ~12 | — | partial |
| Pickwise | ~5 (opaque) | — | — |
| OddsShark | ~6 | — | — |
| Action Network PRO | ~8 | — | — |
| PropOdds | EV tool | n/a | n/a |
| Capper marketplaces | 0 (vibes) | — | cherry-picked |
| Posterior | 42 | ✓ daily, isotonic / Beta / BBQ | ✓ aggregates + 3 free picks + full CSV |
Aggregate accuracy (hit rate by grade, Brier score, calibration plot, per-grade expected vs. actual) is always public and updates live as entries resolve. Tonight's top STRONG entries are free at /today, and the complete historical model ledger (every archived entry, win or loss) is downloadable at /data. The full live nightly slate at publish time is the subscriber product: every STRONG and LEAN with conservative risk reference and real-time sharp consensus, plus WATCH entries surfaced as tracking-only context. The proof of model quality is the calibration math, not the inventory: anyone can claim a hit rate, but almost nobody publishes a calibration curve showing that the 60% model entries actually hit 60% of the time.
The vocabulary behind the stack, defined the way the desk uses it — not the way a glossary would. Read these before you read anyone's hit-rate claim.
A Bayesian MLB prediction model treats every quantity it cares about — a hitter's true contact rate, a pitcher's strikeout tendency, a park's run environment — as a probability distribution rather than a single number. It starts from a prior belief, then updates that belief as fresh outcomes arrive, producing a posterior distribution that carries its own uncertainty. Posterior fits a hierarchical version of this nightly: individual players are pooled toward league and cohort means, so a small sample shrinks toward the crowd and a large sample speaks for itself. The advantage over a point-estimate model is honesty about what is not yet known — a thirty-plate-appearance rookie is not handed a veteran's confidence. The full architecture, including the partial-pooling logic and the nightly refit, is laid out in detail across the six pillars on this page at posterior.pro/methodology.
A calibration curve plots predicted probability against observed frequency. You bucket every forecast — say, all the entries the model priced near 60 percent — then check what share of them actually happened. A perfectly calibrated model lands on the forty-five-degree diagonal: things it calls 60 percent occur 60 percent of the time. Bowing above the line means under-confidence; bowing below means the model is overselling. This matters more than raw accuracy, because a confident model that is systematically wrong will bankrupt a staking plan that trusts its numbers. Posterior publishes its calibration curve on the homepage and refreshes it daily against every resolved STRONG and LEAN entry, wins and losses alike. Anyone can post a hit rate; almost nobody shows the curve proving their stated probabilities mean what they say. See it live at posterior.pro.
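The bucketing step can be sketched in a few lines. This assumes equal-width probability bins; the function name `calibration_curve`, the bin count, and the toy data are illustrative, and the homepage plot may bin differently.

```python
from collections import defaultdict


def calibration_curve(probs: list[float], outcomes: list[int],
                      n_bins: int = 10) -> list[tuple[float, float, int]]:
    """Return (mean predicted prob, observed frequency, count) per non-empty bin."""
    bins: dict[int, list[tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the top bin
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows


# Toy record: ten entries priced at 62 percent, six of which hit.
rows = calibration_curve([0.62] * 10, [1] * 6 + [0] * 4)
```

Each row is one point on the plot: the first value is the x-coordinate, the second the y-coordinate, and the count indicates how much weight the point deserves. Points near the diagonal indicate that stated probabilities match observed frequencies.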
A sportsbook's posted odds bake in a margin — the vig, or juice — so the implied probabilities of both sides add up to more than 100 percent. A standard -110 / -110 market implies roughly 52.4 percent on each side, summing to about 104.8 percent; that extra 4.8 points is the house's built-in edge. To de-vig is to strip that margin out and renormalize the two implied probabilities back to a true 100 percent, recovering the book's honest estimate of each outcome. Posterior de-vigs across five sharp books and blends them into a no-vig consensus, then grades each model entry by the GAP between its own posterior price and that consensus — never against the raw posted line. A 22 percent model price on a +400 line can clear the bar; a 50 percent price on a -150 line may not. The grading logic sits in pillar IV at posterior.pro/methodology.
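The arithmetic above can be reproduced directly. The sketch uses proportional renormalization, the simplest de-vig method and the one the paragraph describes; the function names are hypothetical, and how the five books are blended afterwards is not specified here.

```python
def american_to_implied(odds: int) -> float:
    """Implied probability of an American price, vig still included."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def devig_two_way(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Strip the margin from a two-way market by renormalizing to 100 percent."""
    p_a = american_to_implied(odds_a)
    p_b = american_to_implied(odds_b)
    total = p_a + p_b  # exceeds 1.0 by the book's margin
    return p_a / total, p_b / total


raw_side = american_to_implied(-110)           # about 0.524 per side
overround = 2 * raw_side - 1                   # about 0.048, the built-in margin
fair_a, fair_b = devig_two_way(-110, -110)     # 0.5 and 0.5 after de-vigging
```

The same helper confirms the other prices quoted in the text: +400 implies 20 percent and -150 implies 60 percent before the margin is removed.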
The Kelly criterion is a formula for sizing a wager to maximize the long-run growth rate of a bankroll, given an estimated edge and the offered odds. Full Kelly is mathematically optimal only when your probability estimate is exactly right — and it never is. Because real edges are uncertain and mis-estimation is punished severely, full Kelly is dangerously aggressive in practice, prone to gut-wrenching swings and ruin if the model is even slightly overconfident. Fractional Kelly stakes a fixed share of the full recommendation — a half or a quarter — trading a little theoretical growth for a large reduction in variance. Posterior attaches a conservative fractional-Kelly risk reference to subscriber entries as context, not as a betting instruction, and the grade itself describes model conviction rather than a stake. Nothing on the public surface is an actionable wager. The risk framing is documented at posterior.pro/methodology.
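For reference, the standard Kelly formula for a binary wager is f* = (b·p − q) / b, where b is the net decimal payout, p the estimated win probability, and q = 1 − p. The sketch below only makes that formula concrete; the function name and the quarter-Kelly default are assumptions, and the fraction actually used in the subscriber risk reference is not stated.

```python
def kelly_stake(prob: float, decimal_odds: float, fraction: float = 0.25) -> float:
    """Fraction of bankroll suggested by fractional Kelly (0.0 when there is no edge).

    prob          estimated win probability
    decimal_odds  total return per unit staked (e.g. +400 American is 5.0)
    fraction      share of full Kelly to use; 0.25 is an assumed default
    """
    b = decimal_odds - 1.0          # net payout per unit staked
    full = (b * prob - (1.0 - prob)) / b
    return max(0.0, full) * fraction


full_kelly = kelly_stake(0.22, 5.0, fraction=1.0)      # full Kelly at +400
quarter_kelly = kelly_stake(0.22, 5.0)                 # one quarter of that
no_edge = kelly_stake(0.50, 1.0 + 100 / 150)           # 50 percent at -150
```

At a 22 percent estimate on a +400 price, full Kelly is 2.5 percent of bankroll and quarter Kelly is 0.625 percent; if the true probability were 20 percent, the correct stake would be zero, which is why a small overestimate is so costly at full Kelly.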
Closing line value measures whether the price you took beat the line at the moment the market closed. The closing line is the sharpest number a market produces — it has absorbed every late injury, lineup change, and dollar of sharp money — so it is the best available proxy for the true probability. If you consistently take a better number than the close, you are on the right side of where the market settles, and positive CLV is the strongest leading indicator of a genuine edge, visible long before a hit rate stabilizes over a large sample. Posterior logs the closing-movement comparison for every resolved STRONG and LEAN entry and publishes it on the dashboard the morning after each slate, alongside hit rate, units, and Brier score. A model can run cold for a week and still show positive CLV — which is exactly why the desk grades against the no-vig consensus and tracks the close. The full settled ledger is downloadable at posterior.pro/data.
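One common convention expresses closing line value as the ratio of the decimal price taken to the decimal closing price, minus one. The sketch assumes that convention; the function names and prices are illustrative, and the dashboard's closing-movement comparison may be computed differently.

```python
def american_to_decimal(odds: int) -> float:
    """Convert an American price to decimal odds (total return per unit staked)."""
    if odds > 0:
        return 1.0 + odds / 100.0
    return 1.0 + 100.0 / (-odds)


def closing_line_value(taken_odds: int, closing_odds: int) -> float:
    """Positive when the price taken paid better than the closing price."""
    return american_to_decimal(taken_odds) / american_to_decimal(closing_odds) - 1.0


beat_the_close = closing_line_value(taken_odds=150, closing_odds=130)   # positive
lost_to_close = closing_line_value(taken_odds=-120, closing_odds=-105)  # negative
```

Averaged over many entries, a persistently positive value indicates the market tends to move toward the model's side after publication, independent of whether any single entry won.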
Sources & methods
Forty-two signals, compounded. A conservative production calibrator audited daily. Graded against the sharp consensus. Resolved publicly, archived forever.