POSTERIOR · METHODOLOGY

Forty-two signals.
Stacked.

Posterior Research Desk · Methodology v1 · Updated July 2026

Posterior publishes the architecture so the record can be evaluated, not merely believed. Every actionable entry is graded against the actual outcome by the next morning.

01 / Stack

Forty-two distinct signals, across five categories.

The composition by category. Each row counts how many signals contribute, with a one-sentence description of what that category models. The specific weights and interactions are proprietary and stay private.

Batter

Plate-appearance density, recent contact rate, expected vs. actual outcomes.

Pitcher

Arsenal mix, platoon splits, recent workload, opener vs. starter context.

Team / context

Park handedness, weather, lineup slot, bullpen state, opposing defense.

Market

De-vigged sharp consensus, line movement, steam, opening vs. current.

Operational

Late scratch detection, lineup confirmation timing, publish-window scheduling.

Total · 42 signals

Backtest

Calibration context, labeled separately from the live record.

The historical calibration snapshot is a backtest, not a live betting claim. It exists so readers can see whether the probability scale behaves sensibly at volume before reading the live season record. Live performance remains on /mlb/accuracy.

Calibration error

~1.2pp

Brier score

0.178

Backtest rows

157,304

Games analyzed

5,881

Backtested on 2025–26 historical data. Simulated results, not live wagers. Past performance does not guarantee future results. Not betting advice.
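For readers who want to check a Brier figure against the downloadable ledger, here is a minimal sketch of the standard definition: the mean squared gap between each forecast probability and the 0/1 outcome. The function name and the toy data are illustrative assumptions, not Posterior's code or results.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy data, invented for illustration only.
probs = [0.7, 0.6, 0.55, 0.8]
outcomes = [1, 0, 1, 1]

score = brier_score(probs, outcomes)
# Reference point: a forecaster who always says 50%.
baseline = brier_score([0.5] * len(outcomes), outcomes)
```

A forecaster who always says 50% scores exactly 0.25, which is a useful reference point when reading the figures above. Lower is better.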

Ledger note

2026 season record.

Posterior evaluates point-in-time market rows before outcomes are summarized. The primary public record is the 2026 season ledger, which covers season start to Jul 12, 2026, and contains 2,175 qualified STRONG and LEAN rows. The 2025 season is shown separately as a historical benchmark covering Aug 16-Nov 1, 2025, because it uses a different training cutoff.

2026 season record · 1410-765 · 64.8% · Brier 0.2205 · Audited price-reference ROI +3.27%

2025 historical benchmark · 766-530 · 59.1% · Audited price-reference ROI -2.68%

These historical ROI figures use the audited ledger's stored price references. Forward executable ROI is reported separately and only includes settled picks with one atomic, allowlisted price + sportsbook + quote-time tuple; unpriced picks remain in the public W–L record.
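The split between the W-L record and the executable ROI can be sketched as follows. This is an assumption-heavy illustration: the class names, the allowlist contents, flat one-unit stakes, and the omission of pushes are all hypothetical choices, not the ledger's actual schema or accounting.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical allowlist; the real list of books is not published on this page.
ALLOWLISTED_BOOKS = {"book_a", "book_b"}


@dataclass(frozen=True)
class Quote:
    american_odds: int  # e.g. +120 or -110
    sportsbook: str
    quoted_at: str      # ISO-8601 timestamp of the quote


@dataclass(frozen=True)
class SettledPick:
    won: bool
    quote: Optional[Quote]  # None when no executable price was captured


def profit_per_unit(american_odds: int, won: bool) -> float:
    """Profit on a 1-unit stake at American odds (pushes are ignored here)."""
    if not won:
        return -1.0
    if american_odds > 0:
        return american_odds / 100
    return 100 / -american_odds


def summarize(picks: Sequence[SettledPick]) -> Tuple[int, int, Optional[float]]:
    """Return (wins, losses, executable ROI or None if nothing is priced)."""
    wins = sum(1 for p in picks if p.won)
    losses = len(picks) - wins
    # Only picks with a complete quote from an allowlisted book count toward ROI.
    priced = [p for p in picks
              if p.quote is not None and p.quote.sportsbook in ALLOWLISTED_BOOKS]
    if not priced:
        return wins, losses, None
    profit = sum(profit_per_unit(p.quote.american_odds, p.won) for p in priced)
    return wins, losses, profit / len(priced)


picks = [
    SettledPick(True, Quote(120, "book_a", "2026-07-01T17:05:00Z")),
    SettledPick(False, Quote(-110, "book_b", "2026-07-01T17:06:00Z")),
    SettledPick(True, None),                                          # unpriced
    SettledPick(True, Quote(-150, "book_z", "2026-07-01T17:07:00Z")),  # not allowlisted
]
wins, losses, roi = summarize(picks)
```

All four picks count toward the record, but only the first two feed the ROI figure, so the two numbers can legitimately cover different row counts.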

02 / Pillars

Six architectural pillars.

Each pillar is the result of a dozen smaller engineering decisions in the codebase. The proof they work lives in the calibration plot on /mlb/accuracy — not in any one signal we'd describe in marketing copy.

I.

Hierarchical priors, partial pooling.

Every batter has a full posterior distribution, not a point estimate. The model shrinks toward the league mean with strength inverse to plate-appearance count — new call-ups don't get over-confident projections from a 30-PA hot streak; veterans don't drown in week-to-week noise.
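The shrinkage idea can be illustrated with a closed-form Beta-Binomial update. This is a deliberate simplification, not the production model (which the sources below describe as hierarchical Bayesian inference); the function name and the `prior_strength` value of 200 pseudo plate appearances are hypothetical.

```python
from math import sqrt
from typing import Tuple


def shrunk_posterior(successes: int, trials: int, league_mean: float,
                     prior_strength: float = 200.0) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a rate under a Beta prior.

    The prior is centered on league_mean and worth `prior_strength`
    pseudo-trials, so small samples are pulled harder toward the league.
    """
    alpha = prior_strength * league_mean + successes
    beta = prior_strength * (1.0 - league_mean) + (trials - successes)
    total = alpha + beta
    mean = alpha / total
    sd = sqrt(alpha * beta / (total ** 2 * (total + 1.0)))
    return mean, sd


# Invented examples: a call-up hitting .400 over 30 PA, a veteran at .300 over 600 PA.
rookie_mean, rookie_sd = shrunk_posterior(12, 30, league_mean=0.250)
veteran_mean, veteran_sd = shrunk_posterior(180, 600, league_mean=0.250)
```

With these inputs the rookie's .400 is pulled most of the way back toward .250 and keeps a wider posterior, while the veteran's .300 barely moves. That is the behavior the pillar describes, reduced to one line of arithmetic.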

II.

Pitch-level matchup, not just FIP.

The opposing pitcher's pitch mix is weighted against the specific batter's split against each pitch type. A slider-heavy pitcher facing a slider-weak hitter compounds correctly. Most public models can't see this — they wash the matchup into an aggregate ERA or wOBA.
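One way to picture the matchup weighting is a usage-weighted average of the batter's per-pitch-type rates. The sketch below assumes that simple form; the function name, the pitch labels, and every number are invented, and the real model's interactions are not published.

```python
from typing import Mapping


def matchup_rate(pitch_mix: Mapping[str, float],
                 batter_rate_by_pitch: Mapping[str, float]) -> float:
    """Weight the batter's rate against each pitch type by how often it is thrown."""
    total = sum(pitch_mix.values())
    if total <= 0:
        raise ValueError("pitch mix must have positive total usage")
    return sum(share / total * batter_rate_by_pitch[pitch]
               for pitch, share in pitch_mix.items())


# Invented batter who struggles against sliders.
batter = {"slider": 0.18, "fastball": 0.30, "changeup": 0.25}

slider_heavy = {"slider": 0.5, "fastball": 0.4, "changeup": 0.1}
typical_mix = {"slider": 0.2, "fastball": 0.6, "changeup": 0.2}

vs_slider_heavy = matchup_rate(slider_heavy, batter)
vs_typical = matchup_rate(typical_mix, batter)
```

The same batter projects noticeably lower against the slider-heavy arm than against a typical mix, a difference that disappears if the matchup is collapsed into one aggregate rate.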

III.

Context that actually moves outcomes.

Park × handedness. Wind alignment to outfield bearing. Catcher framing from the posted lineup. Opposing team defense. Bullpen fatigue. Each small in isolation. Stacked correctly, they're the gap between a 0.59 probability and a 0.63 one — the difference between a SKIP and a STRONG.
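As an illustration of wind alignment, the sketch below projects wind speed onto the line from home plate to center field. It assumes the meteorological convention that wind direction is the compass bearing the wind blows from; the function name and the example park bearing are hypothetical, and how the result enters the model is not published.

```python
from math import cos, radians


def wind_out_component(speed: float, wind_from_deg: float,
                       outfield_bearing_deg: float) -> float:
    """Signed wind speed along the home-plate-to-center-field line.

    Positive means blowing out toward the outfield, negative means blowing in.
    """
    # Convert "blowing from" into "blowing toward".
    wind_toward_deg = (wind_from_deg + 180.0) % 360.0
    return speed * cos(radians(wind_toward_deg - outfield_bearing_deg))


# Invented park whose center field lies to the northeast (bearing 45 degrees).
blowing_out = wind_out_component(10.0, wind_from_deg=225.0, outfield_bearing_deg=45.0)
blowing_in = wind_out_component(10.0, wind_from_deg=45.0, outfield_bearing_deg=45.0)
crosswind = wind_out_component(10.0, wind_from_deg=135.0, outfield_bearing_deg=45.0)
```

The same 10-unit wind contributes its full speed when it blows straight out, the negative of that when it blows straight in, and essentially nothing as a pure crosswind.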

IV.

Graded against the line, not the absolute probability.

STRONG / LEAN / WATCH are graded by the GAP against the no-vig consensus across five sharp books, not by the model's raw probability. A 22% model price on a +400 line can be STRONG; a 50% model price on a -150 line can be SKIP. The grade describes model conviction — it is not a betting instruction. On game-level moneylines, Polymarket's peer-to-peer price runs alongside as an independent second opinion (see /mlb/live); it does not enter the grading math until an audit window of resolved outcomes earns it the weight.
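Grading by gap rather than by raw probability can be sketched in a few lines. The thresholds below are invented for illustration (the real cutoffs are not published), and the consensus inputs are assumed to be already de-vigged.

```python
# Hypothetical thresholds in probability points; the real cutoffs are not published.
STRONG_GAP = 0.03
LEAN_GAP = 0.015
WATCH_GAP = 0.0


def grade(model_prob: float, consensus_prob: float) -> str:
    """Grade an entry by model probability minus no-vig consensus probability."""
    gap = model_prob - consensus_prob
    if gap >= STRONG_GAP:
        return "STRONG"
    if gap >= LEAN_GAP:
        return "LEAN"
    if gap >= WATCH_GAP:
        return "WATCH"
    return "SKIP"


# Invented inputs: a longshot the model likes, and a favorite it does not.
longshot = grade(model_prob=0.22, consensus_prob=0.185)
favorite = grade(model_prob=0.50, consensus_prob=0.58)
```

Under these assumed thresholds the 22% longshot grades STRONG and the 50% favorite grades SKIP, because only the distance from consensus matters.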

V.

Calibrated conservatively, audited publicly.

Resolved entries feed the calibration audit. The active production calibrator is conservative and versioned. Candidate calibrators — isotonic, Beta, and BBQ — are evaluated daily against resolved outcomes, but a candidate is only promoted after minimum-sample, holdout, and stability gates clear. Never auto-promoted on a short-window Brier race. The calibration curve lives on /mlb/accuracy, with /calibration kept as the public entry point.
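The promotion rule can be pictured as a set of gates that must all clear at once. The sketch below is one possible shape for that check; the report fields, the gate values, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReport:
    name: str
    resolved_rows: int               # resolved outcomes the candidate was scored on
    holdout_brier: float             # candidate Brier on held-out rows
    production_holdout_brier: float  # production calibrator on the same rows
    window_wins: int                 # rolling windows where the candidate beat production
    windows: int                     # rolling windows evaluated


def clears_gates(report: CandidateReport,
                 min_rows: int = 2000,
                 min_improvement: float = 0.001,
                 min_win_share: float = 0.7) -> bool:
    """Promote only if the sample, holdout, and stability gates all clear."""
    enough_sample = report.resolved_rows >= min_rows
    improvement = report.production_holdout_brier - report.holdout_brier
    better_on_holdout = improvement >= min_improvement
    stable = (report.windows > 0
              and report.window_wins / report.windows >= min_win_share)
    return enough_sample and better_on_holdout and stable


# Invented reports: a hot short-window candidate and a slower, steadier one.
hot_streak = CandidateReport("isotonic", 450, 0.2100, 0.2205, 3, 3)
steady = CandidateReport("beta", 3000, 0.2190, 0.2205, 9, 12)

promote_hot_streak = clears_gates(hot_streak)
promote_steady = clears_gates(steady)
```

The first candidate has the better Brier score but too few resolved rows, so it is held back; that is the short-window race the pillar warns against.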

VI.

Settlement in public, even the losing days.

Hit rate, units, closing-movement comparison, and Brier score update on the public dashboard the morning after each slate resolves. The published numbers cover every STRONG and LEAN entry the model surfaces, wins and losses both, never a curated subset. WATCH and SKIP rows, which the model generates as internal signal but does not surface, are persisted alongside (downloadable at /data) so the full model record stays auditable.

03 / Landscape

What the other desks run.

Best-effort survey of the public MLB prediction landscape. Counts are estimates from what each service documents publicly. If you run one of these and we've undercounted, we'd genuinely love to see your methodology page.

Service	Signals	Calibrated	Public log
NumberFire	~10	—	—
Stokastic	~12	—	partial
Pickwise	~5 (opaque)	—	—
OddsShark	~6	—	—
Action Network PRO	~8	—	—
PropOdds	EV tool	n/a	n/a
Capper marketplaces	0 (vibes)	—	cherry-picked
Posterior	42	✓ daily, isotonic / Beta / BBQ	✓ aggregates + resolved examples + full CSV

Aggregate accuracy (hit rate by grade, Brier score, calibration plot, per-grade expected vs. actual) is always public and updates live as entries resolve. Tonight's top STRONG entries are free at /today, and the complete historical model ledger (every archived entry, win or loss) is downloadable at /data. The full live nightly slate at publish time is the subscriber product: every STRONG and LEAN with conservative risk reference and real-time sharp consensus, plus WATCH entries surfaced as tracking-only context. The proof of model quality is the calibration math, not the inventory: anyone can claim a hit rate, but almost nobody publishes a calibration curve showing that the 60% model entries actually hit 60% of the time.

04 / Definitions

The terms, plainly.

The vocabulary behind the stack, defined the way the desk uses it — not the way a glossary would. Read these before you read anyone's hit-rate claim.

What is a Bayesian MLB prediction model?

A Bayesian MLB prediction model treats every quantity it cares about — a hitter's true contact rate, a pitcher's strikeout tendency, a park's run environment — as a probability distribution rather than a single number. It starts from a prior belief, then updates that belief as fresh outcomes arrive, producing a posterior distribution that carries its own uncertainty. Posterior fits a hierarchical version of this nightly: individual players are pooled toward league and cohort means, so a small sample shrinks toward the crowd and a large sample speaks for itself. The advantage over a point-estimate model is honesty about what is not yet known — a thirty-plate-appearance rookie is not handed a veteran's confidence. The full architecture, including the partial-pooling logic and the nightly refit, is laid out in detail across the six pillars on this page at posterior.pro/methodology.

What is a calibration curve in sports betting?

A calibration curve plots predicted probability against observed frequency. You bucket every forecast — say, all the entries the model priced near 60 percent — then check what share of them actually happened. A perfectly calibrated model lands on the forty-five-degree diagonal: things it calls 60 percent occur 60 percent of the time. Bowing above the line means under-confidence; bowing below means the model is overselling. This matters more than raw accuracy, because a confident model that is systematically wrong will bankrupt a staking plan that trusts its numbers. Posterior publishes its calibration curve at posterior.pro/mlb/accuracy and refreshes it daily against every resolved STRONG and LEAN entry, wins and losses alike. Anyone can post a hit rate; almost nobody shows the curve proving their stated probabilities mean what they say. See the entry page at posterior.pro/calibration.
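The bucketing step is simple enough to reproduce from the public CSV. The sketch below builds the curve's points and summarizes them with expected calibration error, one common summary statistic; whether it matches the desk's own calibration-error figure is an assumption. The function names and toy data are illustrative.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, int]  # (mean predicted prob, observed frequency, count)


def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Row]:
    """Bucket forecasts into equal-width bins and compare predicted vs. observed."""
    counts = [0] * n_bins
    sum_p = [0.0] * n_bins
    sum_y = [0] * n_bins
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        counts[i] += 1
        sum_p[i] += p
        sum_y[i] += y
    return [(sum_p[i] / counts[i], sum_y[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i] > 0]


def expected_calibration_error(rows: Sequence[Row]) -> float:
    """Count-weighted mean absolute gap between predicted and observed."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(pred - obs) for pred, obs, n in rows) / total


# Toy data: ten forecasts at 60% (six hit), five at 80% (three hit).
probs = [0.6] * 10 + [0.8] * 5
outcomes = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 2

rows = calibration_table(probs, outcomes)
ece = expected_calibration_error(rows)
```

In the toy data the 60% bucket sits on the diagonal, while the 80% bucket hits only 60% of the time and falls below it: the overselling case described above.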

What does it mean to de-vig a betting line?

A sportsbook's posted odds bake in a margin (the vig, or juice), so the implied probabilities of both sides add up to more than 100 percent. A standard -110 / -110 market implies roughly 52.4 percent on each side, summing to about 104.8 percent; that extra 4.8 points is the overround, which works out to a house hold of about 4.5 percent of the money wagered. To de-vig is to strip that margin out and renormalize the two implied probabilities back to a true 100 percent, recovering the book's honest estimate of each outcome. Posterior de-vigs across five sharp books and blends them into a no-vig consensus, then grades each model entry by the GAP between its own posterior price and that consensus, never against the raw posted line. A 22 percent model price on a +400 line can clear the bar; a 50 percent price on a -150 line may not. The grading logic sits in pillar IV at posterior.pro/methodology.
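The arithmetic is short enough to check by hand or in code. The sketch below uses proportional renormalization, as described above, and blends books with a plain average; the averaging rule, the function names, and the example quotes are assumptions, since the actual blend is not published.

```python
from typing import Sequence, Tuple


def implied_prob(american: int) -> float:
    """Implied probability of American odds, vig included."""
    if american > 0:
        return 100 / (american + 100)
    return -american / (-american + 100)


def devig_two_way(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Strip the margin from a two-way market by proportional renormalization."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    total = pa + pb  # greater than 1.0 when the book holds a margin
    return pa / total, pb / total


def consensus_prob(markets: Sequence[Tuple[int, int]]) -> float:
    """No-vig probability for side A, averaged across books (assumed blend)."""
    no_vig = [devig_two_way(a, b)[0] for a, b in markets]
    return sum(no_vig) / len(no_vig)


# The standard -110 / -110 market from the text.
overround = implied_prob(-110) + implied_prob(-110) - 1.0
fair_a, fair_b = devig_two_way(-110, -110)

# Invented quotes from three books for the same two-way market.
blend = consensus_prob([(-150, 130), (-145, 125), (-155, 135)])
```

Proportional renormalization is the simplest de-vig method; others exist and can give slightly different fair prices on lopsided markets.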

What is fractional Kelly staking?

The Kelly criterion is a formula for sizing a wager to maximize the long-run growth rate of a bankroll, given an estimated edge and the offered odds. Full Kelly is mathematically optimal only when your probability estimate is exactly right — and it never is. Because real edges are uncertain and mis-estimation is punished severely, full Kelly is dangerously aggressive in practice, prone to gut-wrenching swings and ruin if the model is even slightly overconfident. Fractional Kelly stakes a fixed share of the full recommendation — a half or a quarter — trading a little theoretical growth for a large reduction in variance. Posterior attaches a conservative fractional-Kelly risk reference to subscriber entries as context, not as a betting instruction, and the grade itself describes model conviction rather than a stake. Nothing on the public surface is an actionable wager. The risk framing is documented at posterior.pro/methodology.
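For a simple win-or-lose bet, the Kelly stake is f* = (b·p − q) / b, where b is the net payout per unit staked, p the win probability, and q = 1 − p. The sketch below applies a fixed fraction to it; the quarter-Kelly default and the example prices are illustrative assumptions, not the desk's risk reference.

```python
def kelly_fraction(prob: float, decimal_odds: float) -> float:
    """Full-Kelly share of bankroll for a binary bet, floored at zero."""
    b = decimal_odds - 1.0  # net payout per unit staked
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1.0")
    full = (b * prob - (1.0 - prob)) / b
    return max(full, 0.0)  # a negative edge means no stake


def fractional_kelly(prob: float, decimal_odds: float,
                     fraction: float = 0.25) -> float:
    """Stake a fixed share of the full-Kelly recommendation."""
    return fraction * kelly_fraction(prob, decimal_odds)


# 22% model price at +400 (decimal 5.0): a small positive edge.
longshot = fractional_kelly(0.22, 5.0)

# 50% model price at -150 (decimal 1 + 100/150): a negative edge, so zero.
favorite = fractional_kelly(0.50, 1.0 + 100 / 150)
```

Even at full Kelly the longshot calls for only 2.5% of bankroll; the quarter fraction cuts that to about 0.6%, which shows how conservative the sizing becomes once estimation error is respected.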

What is closing line value (CLV)?

Closing line value measures whether the price you took beat the line at the moment the market closed. The closing line is the sharpest number a market produces — it has absorbed every late injury, lineup change, and dollar of sharp money — so it is the best available proxy for the true probability. If you consistently take a better number than the close, you are on the right side of where the market settles, and positive CLV is the strongest leading indicator of a genuine edge, visible long before a hit rate stabilizes over a large sample. Posterior logs the closing-movement comparison for every resolved STRONG and LEAN entry and publishes it on the dashboard the morning after each slate, alongside hit rate, units, and Brier score. A model can run cold for a week and still show positive CLV — which is exactly why the desk grades against the no-vig consensus and tracks the close. The full settled ledger is downloadable at posterior.pro/data.
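One common way to express CLV is the ratio of the decimal price taken to the decimal closing price, minus one. The sketch below uses that definition with raw (not de-vigged) closing prices; whether the dashboard's closing-movement comparison is computed this way is an assumption, and the example prices are invented.

```python
def american_to_decimal(american: int) -> float:
    """Convert American odds to decimal odds."""
    if american > 0:
        return 1.0 + american / 100
    return 1.0 + 100 / -american


def closing_line_value(taken_american: int, closing_american: int) -> float:
    """Relative price advantage over the close; positive means the close was beaten."""
    taken = american_to_decimal(taken_american)
    closing = american_to_decimal(closing_american)
    return taken / closing - 1.0


# Took +120, market closed at +105: a better price than the close.
beat_close = closing_line_value(120, 105)

# Took -130, market closed at -115: a worse price than the close.
missed_close = closing_line_value(-130, -115)
```

A positive value says the entry was priced before the market moved toward it, regardless of whether that particular bet won.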

Sources & methods

Hierarchical Bayesian inference — PyMC.
Pitch-level and player tracking data — Baseball Savant.
BBQ calibration — Naeini, Cooper & Hauskrecht, “Obtaining Well Calibrated Probabilities Using Bayesian Binning” (AAAI 2015).
Wind, temperature, and park weather — Open-Meteo.

The model knows what it knows.

Forty-two signals, compounded. A conservative production calibrator audited daily. Graded against the sharp consensus. Resolved publicly, archived forever.

Subscribe — $29/mo

Cancel anytime · Stripe-secured · no app to install
