Data
Pitch-by-pitch Statcast data comes from pybaseball; challenge events are tagged from the MLB Stats API play-by-play feed, which catches every challenge (the Statcast description field alone misses most). That yields 2,629 challenges, within a fraction of a percent of the league’s published count — a strong validation of the tagging. The window here is 643 games, March 25 to May 13, 2026.
Zone geometry
The ABS zone is a rectangle: 17 inches wide, top at 53.5% and bottom at 27% of the batter’s measured height, evaluated at the middle of the plate. A pitch is a strike if any part of the ball touches the zone, so we treat the ball as a sphere and compute a signed distance, in inches, from the ball’s nearest edge to the zone’s nearest edge: negative inside, positive outside. Hawk-Eye is treated as ground truth: its measurement is the ruling, so there is no measurement-error term to model.
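A minimal sketch of that geometry: take the signed distance from the ball's center to the rectangle, then subtract the ball's radius. The function names, the 1.45-inch radius (roughly a regulation baseball), and the use of Euclidean distance around the corners are assumptions for illustration, not details confirmed by the method described here.

```python
import math

ZONE_HALF_WIDTH_IN = 17.0 / 2.0       # the zone is 17 inches wide
TOP_FRAC, BOTTOM_FRAC = 0.535, 0.27   # fractions of the batter's measured height
BALL_RADIUS_IN = 1.45                 # assumption: approximate baseball radius


def signed_edge_distance(x_in, z_in, batter_height_in, radius_in=BALL_RADIUS_IN):
    """Inches from the ball's nearest edge to the zone's nearest edge.

    x_in: horizontal offset of the ball center from the middle of the plate.
    z_in: height of the ball center above the ground.
    Negative means the ball overlaps the zone; positive means it misses.
    """
    top = TOP_FRAC * batter_height_in
    bottom = BOTTOM_FRAC * batter_height_in
    center_z = (top + bottom) / 2.0
    half_height = (top - bottom) / 2.0

    # Per-axis offsets of the ball center from the rectangle boundary.
    dx = abs(x_in) - ZONE_HALF_WIDTH_IN
    dz = abs(z_in - center_z) - half_height

    outside = math.hypot(max(dx, 0.0), max(dz, 0.0))  # > 0 only outside the box
    inside = min(max(dx, dz), 0.0)                    # < 0 only inside the box
    return outside + inside - radius_in


def is_strike(x_in, z_in, batter_height_in):
    """A pitch is a strike if any part of the ball touches the zone."""
    return signed_edge_distance(x_in, z_in, batter_height_in) <= 0.0
```

For a 72-inch batter, a ball centered one inch off the side edge at mid-zone height still clips the zone under these assumptions, while one centered two inches off does not.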
Run expectancy
We build an RE288 table (12 counts × 3 out states × 8 base states) from historical play-by-play, and from it the run value of flipping a call — the run-expectancy swing between the correct and incorrect ruling — for every pitch. This is the upside of a successful challenge.
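The flip value can be sketched as the difference between the run expectancy after a called ball and after a called strike from the same state. The state encoding (a bitmask for bases), the function names, and the `re_lookup` callable standing in for the RE288 table are all hypothetical, and the sign convention (positive favors the batting team) is an assumption.

```python
from typing import Callable, Optional, Tuple

# (balls, strikes, outs, bases) with bases as a bitmask: 1 = first, 2 = second, 4 = third
State = Tuple[int, int, int, int]


def after_called_strike(state: State) -> Tuple[Optional[State], int]:
    """State and runs scored after a called strike; None means the inning is over."""
    balls, strikes, outs, bases = state
    if strikes < 2:
        return (balls, strikes + 1, outs, bases), 0
    outs += 1  # strikeout
    if outs >= 3:
        return None, 0
    return (0, 0, outs, bases), 0


def after_called_ball(state: State) -> Tuple[Optional[State], int]:
    """State and runs scored after a called ball, including forced advances on a walk."""
    balls, strikes, outs, bases = state
    if balls < 3:
        return (balls + 1, strikes, outs, bases), 0
    runs = 0
    if bases & 1:              # runner on first is forced to second
        if bases & 2:          # runner on second is forced to third
            if bases & 4:      # runner on third is forced home
                runs = 1
            bases |= 4
        bases |= 2
    bases |= 1                 # batter takes first
    return (0, 0, outs, bases), runs


def flip_run_value(state: State, re_lookup: Callable[[State], float]) -> float:
    """Run-expectancy swing between a ball call and a strike call on this pitch."""
    def value(next_state: Optional[State], runs: int) -> float:
        return runs + (0.0 if next_state is None else re_lookup(next_state))

    return value(*after_called_ball(state)) - value(*after_called_strike(state))
```

With a real table, `re_lookup` would simply index the 288 cells; a full-count pitch with two outs and the bases loaded is the extreme case, where the ball call scores a run and the strike call ends the inning.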
Probability curves
Rather than bucket pitches into a yes/no “shadow zone,” we fit smooth continuous functions: the probability an umpire calls a strike given distance, and the probability a challenge is overturned given distance and the original call. Each catcher’s perception sharpness (sigma) is fit by maximum likelihood from their challenge record.
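To illustrate what fitting a smooth curve by maximum likelihood looks like, the sketch below fits a single scale parameter sigma to binary called-strike outcomes as a function of signed distance. The functional form (a normal CDF crossing 0.5 at the zone edge), the golden-section search, and the synthetic data generator are assumptions chosen for a self-contained example; the source does not specify the curve family or the optimizer, and the per-catcher fit may differ.

```python
import math
import random


def strike_prob(distance_in, sigma):
    """Assumed curve: P(called strike | signed distance), 0.5 at the zone edge."""
    return 0.5 * (1.0 + math.erf(-distance_in / (sigma * math.sqrt(2.0))))


def neg_log_likelihood(sigma, distances, called_strike):
    """Bernoulli negative log-likelihood of the observed calls under sigma."""
    eps = 1e-12  # clip to avoid log(0)
    total = 0.0
    for d, y in zip(distances, called_strike):
        p = min(max(strike_prob(d, sigma), eps), 1.0 - eps)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total


def fit_sigma(distances, called_strike, lo=0.05, hi=10.0, tol=1e-4):
    """Golden-section search for the sigma that minimizes the NLL."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc = neg_log_likelihood(c, distances, called_strike)
    fd = neg_log_likelihood(d, distances, called_strike)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = neg_log_likelihood(c, distances, called_strike)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = neg_log_likelihood(d, distances, called_strike)
    return (a + b) / 2.0


def simulate_calls(n, sigma, seed):
    """Synthetic data for demonstration only; not real pitch data."""
    rng = random.Random(seed)
    distances = [rng.uniform(-4.0, 4.0) for _ in range(n)]
    calls = [rng.random() < strike_prob(d, sigma) for d in distances]
    return distances, calls
```

Calling `fit_sigma(*simulate_calls(4000, 1.0, seed=0))` should recover a value close to the sigma used to generate the data. A smaller sigma means a sharper transition at the zone edge.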
Option value (dynamic programming)
The cost of losing a challenge is the loss of a future option. We solve for it by backward induction over the remaining pitches and challenges in a game, producing a dynamic cost that is highest early (many innings left) and falls late. This replaces the flat run-cost approximation used elsewhere.
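The backward induction can be sketched with a deliberately simplified model. The assumptions here are illustrative, not the source's actual specification: on each remaining pitch a challenge opportunity arises with probability `q`, succeeds with probability `p`, is worth a fixed `gain` in runs when it succeeds, and a successful challenge is retained.

```python
def challenge_values(n_pitches, n_challenges, q, p, gain):
    """V[t][k]: expected future runs gained with t pitches left and k challenges.

    Simplified model (all parameters are assumptions): each pitch offers a
    challenge opportunity with probability q; a challenge succeeds with
    probability p and then adds `gain` runs while keeping the challenge.
    """
    V = [[0.0] * (n_challenges + 1) for _ in range(n_pitches + 1)]
    for t in range(1, n_pitches + 1):
        for k in range(1, n_challenges + 1):
            keep = V[t - 1][k]  # decline to challenge
            # Challenge: win keeps k challenges, loss drops to k - 1.
            challenge = p * (gain + V[t - 1][k]) + (1.0 - p) * V[t - 1][k - 1]
            V[t][k] = (1.0 - q) * keep + q * max(keep, challenge)
    return V


def cost_of_losing(V, t, k):
    """Option value lost when a team drops from k to k - 1 challenges with t pitches left."""
    return V[t][k] - V[t][k - 1]
```

Under these assumptions, `cost_of_losing(V, 200, 1)` exceeds `cost_of_losing(V, 10, 1)` because a challenge held early has more future pitches on which it could pay off. A fuller model would draw the gain and success probability from the pitch-level distributions rather than fixing them.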
Win-probability engine
For the highest-leverage questions, run value is not enough: a run matters more in a tie game in the ninth than in a blowout. We compile a win-probability table (validated against observed win rates to a mean absolute error of ~0.008) and express the break-even as a confidence in win-probability terms, conditioned on count, outs, runners, inning, score, challenges remaining, batter and pitcher quality, and the catcher’s perception.
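One way to read the break-even is as the confidence at which the expected win-probability gain from an overturn equals the expected cost of a failed challenge. The sketch below assumes that simple two-outcome decision rule; the function and parameter names are hypothetical, and the inputs would come from the win-probability table and the option-value calculation.

```python
def break_even_confidence(wp_gain, wp_cost):
    """Minimum probability of an overturn at which challenging breaks even.

    wp_gain: win probability added if the call is overturned.
    wp_cost: win probability lost (option value) if the challenge fails.
    Solves p * wp_gain = (1 - p) * wp_cost for p.
    """
    if wp_gain < 0 or wp_cost < 0 or wp_gain + wp_cost == 0:
        raise ValueError("wp_gain and wp_cost must be non-negative and not both zero")
    return wp_cost / (wp_gain + wp_cost)
```

For example, if an overturn is worth 0.03 of win probability and a lost challenge costs 0.01, the break-even confidence under this rule is 0.25: challenge whenever the perceived chance of an overturn is above one in four.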
Caveats
This is roughly seven weeks of one season — enough to be directional, not final. Per-catcher and per-umpire numbers firm up as the sample grows. Public ABS metrics are used only to sanity-check our independent numbers, never as inputs.