Our Estimated Umpire Zone
As of @UmpScorecards v2 (2021 MLB season), we use an umpire’s established zone – or what we call the Estimated Umpire Zone (EUZ) – as a part of our consistency calculations.
The latest formulation of the EUZ follows a procedure laid out by Dr. David Hunter, professor of Mathematics at Westmont College. For an in-depth explanation of the math, read his 2018 paper on umpire consistency here, or the 2021 paper he co-authored with a colleague on segregation surfaces (a surprisingly analogous subject) here. For a briefer (but still math-reliant) overview, continue reading below.
Our algorithm for estimating an EUZ leverages a smoothing technique known as kernel density estimation (KDE), which you can read more about here. In general, KDE lets us make inferences about a population based on a finite data sample. For our purposes, we begin by generating a two-dimensional KDE (a density surface over the (x, z) plane) of all taken pitches in a game; we can call this function pitch(x, z). pitch(x, z) gives the probability density of a taken pitch of any kind – either a called strike or a called ball – at any point in (x, z) space, where x is the horizontal axis and z is the vertical axis. We then generate another two-dimensional KDE, this time considering only the pitches in a game that are called strikes; we can call this function strike(x, z). strike(x, z) gives the probability density of a called strike at any point in (x, z) space. Finally, using Bayes’ Theorem along with our KDEs, we can compute the probability that a pitch at any given location in (x, z) space would be called a strike; we can call this function prob_strike(x, z). Written out, prob_strike(x, z) = P(strike) × strike(x, z) / pitch(x, z), where P(strike) is the share of taken pitches in the game that were called strikes.
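As a rough illustration of the steps above, here is a minimal Python sketch that builds the two KDEs and combines them with Bayes’ Theorem. It is not the production @UmpScorecards code: the helper gaussian_kde_2d, the fixed BANDWIDTH, the synthetic pitch locations, and the rule deciding which pitches are called strikes are all assumptions made for the example.

```python
import math
import random


def gaussian_kde_2d(points, bandwidth):
    """Return density(x, z): the mean of isotropic Gaussian kernels centred on each point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))

    def density(x, z):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (z - pz) ** 2) / (2.0 * bandwidth ** 2))
            for px, pz in points
        )

    return density


# Synthetic taken pitches (in feet); a made-up rule decides which are called strikes.
random.seed(0)
taken = [(random.uniform(-2.0, 2.0), random.uniform(0.0, 5.0)) for _ in range(300)]
called_strikes = [(x, z) for x, z in taken if abs(x) < 0.83 and 1.5 < z < 3.5]

BANDWIDTH = 0.3  # assumed smoothing width in feet, not a value from the article
pitch = gaussian_kde_2d(taken, BANDWIDTH)            # density of all taken pitches
strike = gaussian_kde_2d(called_strikes, BANDWIDTH)  # density of called strikes only
p_strike = len(called_strikes) / len(taken)          # prior P(strike)


def prob_strike(x, z):
    """Bayes' Theorem: P(strike | x, z) = P(strike) * strike(x, z) / pitch(x, z)."""
    denom = pitch(x, z)
    if denom == 0.0:
        return 0.0  # no taken pitches anywhere near this location
    return p_strike * strike(x, z) / denom
```

Evaluating prob_strike on a grid of (x, z) locations gives the probability surface used in the next step. Using the same bandwidth for both KDEs is a choice made here for simplicity; it has the convenient side effect of keeping the ratio between 0 and 1.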
Once we determine this probability space, finding the EUZ for a particular game is simply a matter of finding the contiguous 50% contour line of prob_strike(x, z). Here, the 50% contour line represents the boundary at which the umpire’s more likely call switches between strike and ball – pitches inside the contour line have a greater than 50% chance of being called a strike, and pitches outside the contour line have a less than 50% chance of being called a strike.
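One way to picture the contour step is on a grid. The sketch below is a simplified stand-in rather than the actual implementation: prob_strike here is a hypothetical smooth surface (an elliptical zone plus a small stray bump), the grid spacing is arbitrary, and keeping the largest connected region is an assumed way of enforcing contiguity.

```python
import math
from collections import deque

STEP = 0.05        # grid spacing in feet (an assumed resolution)
NX, NZ = 101, 101  # grid covers x in [-2.5, 2.5] and z in [0, 5]


def prob_strike(x, z):
    """Hypothetical stand-in surface: one main zone plus a small stray bump."""
    main = 1.0 / (1.0 + math.exp(8.0 * (math.hypot(x / 0.9, z - 2.5) - 1.0)))
    stray = 0.7 * math.exp(-((x - 1.8) ** 2 + (z - 4.5) ** 2) / (2 * 0.1 ** 2))
    return max(main, stray)


def neighbours(i, j):
    """The four grid cells sharing an edge with cell (i, j)."""
    return ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))


def connected_regions(cells):
    """Split a set of grid cells into 4-connected regions via breadth-first search."""
    remaining = set(cells)
    regions = []
    while remaining:
        start = remaining.pop()
        region, queue = {start}, deque([start])
        while queue:
            for nb in neighbours(*queue.popleft()):
                if nb in remaining:
                    remaining.remove(nb)
                    region.add(nb)
                    queue.append(nb)
        regions.append(region)
    return regions


# Cells where a strike call is more likely than not.
above_half = {
    (i, j)
    for i in range(NX)
    for j in range(NZ)
    if prob_strike(-2.5 + i * STEP, j * STEP) > 0.5
}
regions = connected_regions(above_half)
euz = max(regions, key=len)  # keep the largest contiguous region (an assumption)

# Zone cells that touch at least one cell outside the zone approximate the 50% contour.
contour = {cell for cell in euz if any(nb not in euz for nb in neighbours(*cell))}
euz_area = len(euz) * STEP ** 2  # zone area in square feet
```

The stray bump rises above 50% but is disconnected from the main zone, so it is left out of euz. A finer grid, or a proper contour-tracing routine, would give a smoother boundary than these grid cells.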
The team at @UmpScorecards believes this method is a significant improvement over the last iteration of established zones, which involved finding the smallest convex polygon that contains all of the umpire’s called strikes, otherwise known as the convex hull. That method was extremely sensitive to outliers, since a single stray called strike stretched the whole zone. It also counted only called balls inside the established zone – a called strike outside the established zone was impossible by definition, because the hull contains every called strike.
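To make the outlier problem concrete, the following sketch computes a convex hull with the standard monotone chain algorithm and compares the zone area with and without one stray called strike. The pitch coordinates are invented for illustration, and this is not the code the previous version used.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for k, (x1, z1) in enumerate(vertices):
        x2, z2 = vertices[(k + 1) % len(vertices)]
        total += x1 * z2 - x2 * z1
    return abs(total) / 2.0


# Invented called-strike locations (feet): a tidy rectangle plus two interior pitches.
strikes = [(-0.8, 1.6), (0.8, 1.6), (0.8, 3.4), (-0.8, 3.4), (0.0, 2.5), (0.3, 2.0)]
outlier = (1.6, 4.2)  # one generous strike call well off the plate

area_clean = polygon_area(convex_hull(strikes))
area_with_outlier = polygon_area(convex_hull(strikes + [outlier]))
```

With these made-up numbers, a single extra called strike enlarges the hull-based zone substantially, whereas one isolated pitch has far less influence on a smoothed 50% contour.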