POSTERIOR · METHODOLOGY

Forty-two signals.
Stacked.

Posterior Research Desk · Methodology v1 · Updated May 2026

Posterior publishes the architecture so the record can be evaluated, not merely believed. Every actionable entry is graded against the actual outcome by the next morning.

01 / Stack

Forty-two distinct signals, across five categories.

The composition by category. Each row counts how many signals contribute, with a one-sentence description of what that category models. The specific weights and interactions are proprietary and stay private.

9
Batter

Plate-appearance density, recent contact rate, expected vs. actual outcomes.

7
Pitcher

Arsenal mix, platoon splits, recent workload, opener vs. starter context.

12
Team / context

Park handedness, weather, lineup slot, bullpen state, opposing defense.

8
Market

De-vigged sharp consensus, line movement, steam, opening vs. current.

6
Operational

Late scratch detection, lineup confirmation timing, publish-window scheduling.

Total · 42 signals

02 / Pillars

Six architectural pillars.

Each pillar is the result of a dozen smaller engineering decisions in the codebase. The proof they work lives in the calibration plot on the homepage — not in any one signal we'd describe in marketing copy.

I.

Hierarchical priors, partial pooling.

Every batter has a full posterior distribution, not a point estimate. Each estimate is shrunk toward the league mean with a strength that falls as plate appearances accumulate, so new call-ups don't get over-confident projections from a 30-PA hot streak and veterans don't drown in week-to-week noise.
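
As a rough illustration of the shrinkage described above, here is a minimal Beta-Binomial sketch. It is not the desk's model (the real one is hierarchical and fit in PyMC); the league rate, the prior strength, and the function names are all assumptions chosen for the example.

```python
def shrunk_rate(successes: int, plate_appearances: int,
                league_rate: float = 0.250,
                prior_strength: float = 200.0) -> float:
    """Posterior mean of a Beta-Binomial model.

    league_rate and prior_strength are illustrative assumptions,
    not published values.
    """
    if not 0 <= successes <= plate_appearances:
        raise ValueError("successes must be between 0 and plate_appearances")
    alpha = league_rate * prior_strength + successes
    beta = (1.0 - league_rate) * prior_strength + (plate_appearances - successes)
    return alpha / (alpha + beta)


def pooling_weight(plate_appearances: int, prior_strength: float = 200.0) -> float:
    """Share of the estimate that still comes from the league mean."""
    return prior_strength / (prior_strength + plate_appearances)


# Same raw .400 rate, very different sample sizes.
rookie = shrunk_rate(successes=12, plate_appearances=30)
veteran = shrunk_rate(successes=240, plate_appearances=600)
print(f"rookie  {rookie:.3f}  league weight {pooling_weight(30):.2f}")
print(f"veteran {veteran:.3f}  league weight {pooling_weight(600):.2f}")
```

With these assumed numbers the 30-PA hot streak is pulled most of the way back to the league rate, while the 600-PA sample keeps most of its own signal.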

II.

Pitch-level matchup, not just FIP.

The opposing pitcher's pitch mix is weighted against the specific batter's split against each pitch type. A slider-heavy pitcher facing a slider-weak hitter compounds correctly. Most public models can't see this — they wash the matchup into an aggregate ERA or wOBA.
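
One simple way to picture the pitch-level weighting is a usage-weighted average of the batter's results against each pitch type. The sketch below assumes that form; the pitch names, the sample numbers, and the fallback to the batter's overall figure are hypothetical, and the production model may combine these inputs quite differently.

```python
def matchup_value(pitch_mix: dict[str, float],
                  batter_by_pitch: dict[str, float],
                  batter_overall: float) -> float:
    """Usage-weighted batter value against one pitcher's arsenal.

    pitch_mix maps pitch type to usage share; batter_by_pitch maps pitch
    type to the batter's value (for example wOBA) against that pitch.
    Pitch types the batter has no split for fall back to batter_overall.
    """
    total_usage = sum(pitch_mix.values())
    if total_usage <= 0:
        raise ValueError("pitch_mix must have positive total usage")
    value = 0.0
    for pitch, usage in pitch_mix.items():
        share = usage / total_usage  # normalize in case shares do not sum to 1
        value += share * batter_by_pitch.get(pitch, batter_overall)
    return value


# Hypothetical slider-weak hitter.
batter = {"slider": 0.220, "four_seam": 0.380, "changeup": 0.300}
slider_heavy = matchup_value(
    {"slider": 0.5, "four_seam": 0.4, "changeup": 0.1}, batter, batter_overall=0.330)
fastball_heavy = matchup_value(
    {"four_seam": 0.7, "slider": 0.2, "changeup": 0.1}, batter, batter_overall=0.330)
print(f"{slider_heavy:.3f} vs {fastball_heavy:.3f}")
```

The same hitter projects noticeably worse against the slider-heavy arsenal, which is the effect an aggregate ERA or wOBA would average away.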

III.

Context that actually moves outcomes.

Park × handedness. Wind alignment to outfield bearing. Catcher framing from the posted lineup. Opposing team defense. Bullpen fatigue. Each is small in isolation. Stacked correctly, they're the gap between a 0.59 probability and a 0.63 one, which against the same consensus line can be the difference between a SKIP and a STRONG.
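
Wind alignment, for instance, can be reduced to the component of the wind that blows along the line from home plate to center field. The sketch below assumes the wind direction is given in the meteorological convention (degrees the wind blows from) and that the park's outfield bearing is known in compass degrees; both the function name and the sample values are hypothetical.

```python
import math


def wind_out_component(wind_speed: float, wind_from_deg: float,
                       outfield_bearing_deg: float) -> float:
    """Wind speed along the home-plate-to-center-field line.

    Positive means blowing out, negative means blowing in.
    wind_from_deg is assumed to be the direction the wind comes FROM.
    """
    wind_toward_deg = (wind_from_deg + 180.0) % 360.0
    angle = math.radians(wind_toward_deg - outfield_bearing_deg)
    return wind_speed * math.cos(angle)


# Hypothetical park whose center field sits due north of home plate.
blowing_out = wind_out_component(10.0, wind_from_deg=180.0, outfield_bearing_deg=0.0)
blowing_in = wind_out_component(10.0, wind_from_deg=0.0, outfield_bearing_deg=0.0)
crosswind = wind_out_component(10.0, wind_from_deg=90.0, outfield_bearing_deg=0.0)
```

A pure crosswind contributes roughly nothing on this axis, which is why raw wind speed alone is a weaker signal than speed aligned to the park.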

IV.

Graded against the line, not the absolute probability.

STRONG / LEAN / WATCH are graded by the GAP against the no-vig consensus across five sharp books, not by the model's raw probability. A 22% model price on a +400 line can be STRONG; a 50% model price on a -150 line can be SKIP. The grade describes model conviction — it is not a betting instruction. On game-level moneylines, Polymarket's peer-to-peer price runs alongside as an independent second opinion (see /mlb/live); it does not enter the grading math until an audit window of resolved outcomes earns it the weight.
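
The gap-based grading can be sketched as a lookup on the difference between the model price and the no-vig consensus. The thresholds below are hypothetical placeholders (the real cutoffs are not published), and the function takes the consensus probability as a ready-made input.

```python
def grade_entry(model_prob: float, consensus_prob: float,
                strong_gap: float = 0.03,
                lean_gap: float = 0.015,
                watch_gap: float = 0.005) -> str:
    """Map the model-vs-consensus gap to a grade.

    All three thresholds are hypothetical placeholders for illustration.
    """
    for name, p in (("model_prob", model_prob), ("consensus_prob", consensus_prob)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    gap = model_prob - consensus_prob
    if gap >= strong_gap:
        return "STRONG"
    if gap >= lean_gap:
        return "LEAN"
    if gap >= watch_gap:
        return "WATCH"
    return "SKIP"


# A low raw probability can still grade well; a high one can still be skipped.
longshot = grade_entry(model_prob=0.22, consensus_prob=0.185)
favorite = grade_entry(model_prob=0.50, consensus_prob=0.58)
print(longshot, favorite)
```

The grade depends only on the gap, so the 22% longshot outranks the 50% favorite whenever the market has priced the favorite above the model.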

V.

Calibrated conservatively, audited publicly.

Resolved entries feed the calibration audit. The active production calibrator is conservative and versioned. Candidate calibrators (isotonic, Beta, and BBQ, short for Bayesian Binning into Quantiles) are evaluated daily against resolved outcomes, but a candidate is promoted only after minimum-sample, holdout, and stability gates clear. No candidate is ever auto-promoted on a short-window Brier race. The calibration curve lives on the homepage.
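
A promotion gate of this kind might look like the sketch below. The Brier score is the standard mean squared error between forecast and outcome; everything else (the report fields, the threshold values, and the reading of "stability" as winning most rolling windows) is an assumption made for illustration.

```python
from dataclasses import dataclass


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


@dataclass
class CandidateReport:
    n_resolved: int                        # resolved entries behind the comparison
    holdout_brier: float                   # candidate, on held-out entries
    production_holdout_brier: float        # production calibrator, same holdout
    window_briers: list[float]             # candidate, per rolling window
    production_window_briers: list[float]  # production, same windows


def clears_gates(r: CandidateReport,
                 min_samples: int = 1000,          # hypothetical threshold
                 min_holdout_gain: float = 0.002,  # hypothetical threshold
                 min_window_win_share: float = 0.7) -> bool:
    """True only if the sample, holdout, and stability gates all pass."""
    enough_data = r.n_resolved >= min_samples
    holdout_ok = (r.production_holdout_brier - r.holdout_brier) >= min_holdout_gain
    wins = sum(c < p for c, p in zip(r.window_briers, r.production_window_briers))
    stable = bool(r.window_briers) and wins / len(r.window_briers) >= min_window_win_share
    return enough_data and holdout_ok and stable


mature = CandidateReport(1500, 0.238, 0.242,
                         [0.23, 0.24, 0.25, 0.22], [0.24, 0.25, 0.24, 0.23])
short_window = CandidateReport(120, 0.200, 0.242, [0.20], [0.24])
print(clears_gates(mature), clears_gates(short_window))
```

The second report wins its only window by a wide margin and still fails, because a short-window lead does not satisfy the minimum-sample gate.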

VI.

Settlement in public, even the losing days.

Hit rate, units, closing-movement comparison, and Brier score update on the public dashboard the morning after each slate resolves. The published numbers cover every STRONG and LEAN entry the model surfaces, wins and losses both, never a curated subset. WATCH and SKIP rows, which the model generates as internal signal but does not surface, are persisted alongside (downloadable at /data) so the full model record stays auditable.

03 / Landscape

What the other desks run.

Best-effort survey of the public MLB prediction landscape. Counts are estimates from what each service documents publicly. If you run one of these and we've undercounted, we'd genuinely love to see your methodology page.

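NumberFire · Signals: ~10 · Calibrated: not marked · Public log: not marked
Stokastic · Signals: ~12 · Calibrated: not marked · Public log: partial
Pickwise · Signals: ~5 (opaque) · Calibrated: not marked · Public log: not marked
OddsShark · Signals: ~6 · Calibrated: not marked · Public log: not marked
Action Network PRO · Signals: ~8 · Calibrated: not marked · Public log: not marked
PropOdds · Signals: EV tool · Calibrated: n/a · Public log: n/a
Capper marketplaces · Signals: 0 (vibes) · Calibrated: not marked · Public log: cherry-picked
Posterior · Signals: 42 · Calibrated: ✓ daily, isotonic / Beta / BBQ · Public log: ✓ aggregates + 3 free picks + full CSV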

Aggregate accuracy (hit rate by grade, Brier score, calibration plot, per-grade expected vs. actual) is always public and updates live as entries resolve. Tonight's top STRONG entries are free at /today, and the complete historical model ledger (every archived entry, win or loss) is downloadable at /data. The subscriber product is the full live nightly slate at publish time: every STRONG and LEAN with conservative risk reference and real-time sharp consensus, plus WATCH entries surfaced as tracking-only context. The proof of model quality is the calibration math, not the inventory: anyone can claim a hit rate, but almost nobody publishes a calibration curve showing that the 60% model entries actually hit 60% of the time.

04 / Definitions

The terms, plainly.

The vocabulary behind the stack, defined the way the desk uses it — not the way a glossary would. Read these before you read anyone's hit-rate claim.

What is a Bayesian MLB prediction model?

A Bayesian MLB prediction model treats every quantity it cares about — a hitter's true contact rate, a pitcher's strikeout tendency, a park's run environment — as a probability distribution rather than a single number. It starts from a prior belief, then updates that belief as fresh outcomes arrive, producing a posterior distribution that carries its own uncertainty. Posterior fits a hierarchical version of this nightly: individual players are pooled toward league and cohort means, so a small sample shrinks toward the crowd and a large sample speaks for itself. The advantage over a point-estimate model is honesty about what is not yet known — a thirty-plate-appearance rookie is not handed a veteran's confidence. The full architecture, including the partial-pooling logic and the nightly refit, is laid out in detail across the six pillars on this page at posterior.pro/methodology.

What is a calibration curve in sports betting?

A calibration curve plots predicted probability against observed frequency. You bucket every forecast — say, all the entries the model priced near 60 percent — then check what share of them actually happened. A perfectly calibrated model lands on the forty-five-degree diagonal: things it calls 60 percent occur 60 percent of the time. Bowing above the line means under-confidence; bowing below means the model is overselling. This matters more than raw accuracy, because a confident model that is systematically wrong will bankrupt a staking plan that trusts its numbers. Posterior publishes its calibration curve on the homepage and refreshes it daily against every resolved STRONG and LEAN entry, wins and losses alike. Anyone can post a hit rate; almost nobody shows the curve proving their stated probabilities mean what they say. See it live at posterior.pro.
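
The bucketing step can be sketched in a few lines. This is a generic equal-width binning, not necessarily the binning behind the homepage plot; the function name, bin count, and sample data are assumptions.

```python
def calibration_table(probs: list[float], outcomes: list[int],
                      n_bins: int = 10) -> list[tuple[float, float, int]]:
    """Return (mean predicted, observed frequency, count) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        rows.append((mean_pred, observed, len(bucket)))
    return rows


# Ten hypothetical entries priced at 62%, six of which hit.
rows = calibration_table([0.62] * 10, [1] * 6 + [0] * 4)
for mean_pred, observed, count in rows:
    side = "above" if observed > mean_pred else "at or below"
    print(f"pred {mean_pred:.2f}  obs {observed:.2f}  n={count}  ({side} the diagonal)")
```

Plotting observed frequency against mean prediction for each row gives the curve; points above the diagonal are the under-confident buckets.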

What does it mean to de-vig a betting line?

A sportsbook's posted odds bake in a margin (the vig, or juice), so the implied probabilities of both sides add up to more than 100 percent. A standard -110 / -110 market implies roughly 52.4 percent on each side, summing to about 104.8 percent; that extra 4.8 points is the overround, the margin from which the house draws its built-in edge. To de-vig is to strip that margin out and renormalize the two implied probabilities back to a true 100 percent, recovering the book's honest estimate of each outcome. Posterior de-vigs across five sharp books and blends them into a no-vig consensus, then grades each model entry by the GAP between its own posterior price and that consensus, never against the raw posted line. A 22 percent model price on a +400 line can clear the bar; a 50 percent price on a -150 line may not. The grading logic sits in pillar IV at posterior.pro/methodology.
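
The arithmetic above can be checked with a short sketch. It uses proportional renormalization, which matches the description here, and a plain average across books as the blend; the averaging rule and the sample prices are assumptions, since the actual consensus weighting is not published.

```python
def american_to_implied(odds: int) -> float:
    """Implied probability (vig included) from American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be zero")
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return -odds / (-odds + 100.0)


def devig_two_way(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Renormalize both sides of a two-way market so they sum to 1."""
    p_a, p_b = american_to_implied(odds_a), american_to_implied(odds_b)
    overround = p_a + p_b  # about 1.048 for a -110 / -110 market
    return p_a / overround, p_b / overround


def no_vig_consensus(books: list[tuple[int, int]]) -> float:
    """Plain average of side A's de-vigged probability across books."""
    if not books:
        raise ValueError("need at least one book")
    fair = [devig_two_way(a, b)[0] for a, b in books]
    return sum(fair) / len(fair)


standard = devig_two_way(-110, -110)
# Hypothetical prices for one side at five books.
consensus = no_vig_consensus([(-150, 130), (-148, 128), (-152, 132), (-150, 128), (-145, 125)])
print(standard, round(consensus, 4))
```

The raw implied probabilities for the worked lines come out at 20 percent for +400 and 60 percent for -150, which is why a 22 percent model price can show a positive gap and a 50 percent price a negative one.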

What is fractional Kelly staking?

The Kelly criterion is a formula for sizing a wager to maximize the long-run growth rate of a bankroll, given an estimated edge and the offered odds. Full Kelly is mathematically optimal only when your probability estimate is exactly right — and it never is. Because real edges are uncertain and mis-estimation is punished severely, full Kelly is dangerously aggressive in practice, prone to gut-wrenching swings and ruin if the model is even slightly overconfident. Fractional Kelly stakes a fixed share of the full recommendation — a half or a quarter — trading a little theoretical growth for a large reduction in variance. Posterior attaches a conservative fractional-Kelly risk reference to subscriber entries as context, not as a betting instruction, and the grade itself describes model conviction rather than a stake. Nothing on the public surface is an actionable wager. The risk framing is documented at posterior.pro/methodology.
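
For a single binary wager the Kelly formula is f* = (b·p − q) / b, where b is the net decimal payout, p the estimated win probability, and q = 1 − p. The sketch below applies a fixed fraction to that result; the quarter-Kelly default and the sample prices are assumptions for illustration, not the desk's risk reference.

```python
def fractional_kelly(prob: float, decimal_odds: float,
                     fraction: float = 0.25) -> float:
    """Bankroll share suggested by a fraction of full Kelly.

    Returns 0.0 when the estimated edge is zero or negative.
    fraction=0.25 (quarter Kelly) is an illustrative default.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    b = decimal_odds - 1.0  # net payout per unit staked
    if b <= 0:
        raise ValueError("decimal_odds must be greater than 1")
    full_kelly = (b * prob - (1.0 - prob)) / b
    return max(0.0, full_kelly) * fraction


# 22% estimate at decimal 5.00 (+400): full Kelly 2.5%, quarter Kelly 0.625%.
longshot_stake = fractional_kelly(prob=0.22, decimal_odds=5.0)
# 50% estimate at decimal 1.6667 (about -150): no edge, so no stake.
favorite_stake = fractional_kelly(prob=0.50, decimal_odds=1.6667)
print(f"{longshot_stake:.5f} {favorite_stake:.5f}")
```

Scaling the stake down by a constant fraction is what trades a little expected growth for a large cut in variance when the probability estimate is noisy.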

What is closing line value (CLV)?

Closing line value measures whether the price you took beat the line at the moment the market closed. The closing line is the sharpest number a market produces — it has absorbed every late injury, lineup change, and dollar of sharp money — so it is the best available proxy for the true probability. If you consistently take a better number than the close, you are on the right side of where the market settles, and positive CLV is the strongest leading indicator of a genuine edge, visible long before a hit rate stabilizes over a large sample. Posterior logs the closing-movement comparison for every resolved STRONG and LEAN entry and publishes it on the dashboard the morning after each slate, alongside hit rate, units, and Brier score. A model can run cold for a week and still show positive CLV — which is exactly why the desk grades against the no-vig consensus and tracks the close. The full settled ledger is downloadable at posterior.pro/data.
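
One common way to express CLV is the ratio of the decimal price taken to the closing decimal price, minus one. The sketch below uses that convention; it is an assumption about how the closing-movement comparison might be computed, and the ledger rows are invented.

```python
def clv(decimal_taken: float, decimal_close: float) -> float:
    """Closing line value as a fraction; positive means the price taken beat the close."""
    if decimal_taken <= 1.0 or decimal_close <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return decimal_taken / decimal_close - 1.0


def mean_clv(ledger: list[tuple[float, float]]) -> float:
    """Average CLV over (price taken, closing price) pairs."""
    if not ledger:
        raise ValueError("ledger is empty")
    return sum(clv(taken, close) for taken, close in ledger) / len(ledger)


single = clv(decimal_taken=2.10, decimal_close=2.00)  # took +110, closed at +100
# Hypothetical week: results aside, most prices beat the close.
week = mean_clv([(2.10, 2.00), (1.95, 1.90), (3.40, 3.50), (2.50, 2.40)])
print(f"{single:+.3f} {week:+.4f}")
```

Because this number depends only on prices, it can be positive during a losing week, which is the sense in which CLV leads hit rate.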

Sources & methods

  1. Hierarchical Bayesian inference — PyMC.
  2. Pitch-level and player tracking data — Baseball Savant.
  3. BBQ calibration — Naeini, Cooper & Hauskrecht, “Obtaining Well Calibrated Probabilities Using Bayesian Binning” (AAAI 2015).
  4. Wind, temperature, and park weather — Open-Meteo.

The model knows what it knows.

Forty-two signals, compounded. A conservative production calibrator audited daily. Graded against the sharp consensus. Resolved publicly, archived forever.
