Modeling Pitch Count States as a Markov Process

The rates of strikes and balls thrown in each count (the state transition probabilities) are computed for the league average and for individual pitchers, allowing us to see how pitchers compare to the league.

Jan 20, 2024

WORK IN PROGRESS

The 1906 World Series is perhaps one of the best examples of the adage "good pitching beats good hitting". This historic Windy City showdown pitted arguably the most dominant team in MLB history, the 116-36 Chicago Cubs, against their city rivals, the Chicago White Sox. The contrast was stark: the White Sox hit a mere 7 home runs through the entire season. Yes, the White Sox hit 7 home runs as a team in 1906, albeit during the deadball era. Even so, this was roughly a third of the home runs the Cubs hit that year. Meanwhile, to put in perspective how good this Cubs team was, they still hold the modern-era record for the highest single-season winning percentage (.763). It would take 95 years before the 2001 Seattle Mariners would match their win total, though the Mariners required 10 more games to do it. And poetically, just like the 2001 Mariners, the 1906 Cubs would not win the World Series in their record-setting year.

So how did the unimposing White Sox upset the Cubs? Pitching. Well, probably some luck too, but that's not as fun to talk about. The Sox held the Cubs to an average of 2.83 runs/game over the course of the series, down from the Cubs' season average of 4.57 runs/game, eventually beating them 8-3 in Game 6 to seal the championship. The Sox accomplished this through a young, spitballer-fueled rotation headlined by Hall of Famer Ed Walsh. You may know Ed Walsh as the all-time record holder in career ERA at 1.82 (insane!). Less well known, but equally impressive, he also holds the record for career FIP at 2.02. Relative to today's standards, Ed Walsh's 5.5 SO/9 in 1906 may not seem very impressive, but compared to the league average of 3.6 SO/9, Ed Walsh was a punchout master. And he did this while allowing almost a full walk and a full hit fewer per 9 than his contemporaries. Given these stats, I find it interesting how we can start to create a picture of the type of pitcher Ed Walsh was: a dominant strikeout specialist with exceptional control. He was a Statcast legend 100 years before Statcast existed.

In the modern era, we create these pitcher typologies all the time, and eventually they come to define the way we think about players. "Kyle Hendricks is a contact-first pitcher with great control" or "Blake Snell is great at striking batters out but has a propensity to allow too many walks". While part of these assessments is gleaned from the eye test, by and large these conclusions are reinforced using stats like BB/9, K/9, WHIP, or FIP. I love that we can start crafting characters and storylines using just a few stats like this. It's one of the beautiful parts of baseball. But I wanted to go even deeper and ask: can we get more specific and paint an even more detailed picture? I wanted to see if we could look beyond outcome-based stats like ERA, FIP, and K/9 to understand differences between pitcher typologies before the outcome. How do pitchers compare during and within a Plate Appearance? And how would we even measure that? The answer: Markov Chains.

Markov Chains and Pitch Counts

OK, so we want to model more specific tendencies in pitching approach during Plate Appearances. What does that have to do with the 1906 World Series, Ed Walsh, and the title of this post? I will admit the connection is a bit weak, so bear with me. 1906 is not only the year that Ed Walsh and the White Sox took down the Cubs, but it also happens to be the year that Andrei Markov published his first paper on Markov Processes, a new way to model sequences of stochastically occurring events. While Andrei Markov almost certainly did not know or care about the 1906 World Series, he provided a tool in Markov Processes that we can use to create more specific pitcher profiles. His innovation is captured in an interesting property of Markov Processes, namely that the probability of the next event depends only on the state you're currently in. In other words, it doesn't matter how you got to where you are; what happens next is only a function of where you are now. While it would be easy to get philosophical about this being a metaphor for life, this is a baseball blog, so instead we're going to apply it to baseball statistics. A pitch count fits this framework neatly, with counts (i.e., 0-0, 1-0, etc.) representing states and the pitches representing events that move us between states. If we can agree that it generally doesn't matter whether you got to a 2-2 count by fouling off the first two pitches and then taking two balls, or by taking the two balls first and then swinging through the next two, then we can begin to model this process as a Markov Chain. I will admit this is an assumption that can be challenged; pitch sequencing does matter, and pitchers will change their approach based on how the batter got into a specific count. But if we can accept that this is an approximation, it opens up some very cool modeling opportunities.
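To make the memoryless assumption concrete, here is a minimal Python sketch (not the code behind this post). A count is a (balls, strikes) pair and each pitch moves it forward. The names `advance` and `replay` and the event labels are illustrative assumptions, and walks, strikeouts, and balls in play are left out to keep it short.

```python
def advance(count, event):
    """Return the count after one pitch; `count` is a (balls, strikes) tuple."""
    balls, strikes = count
    if event == "ball":
        return (balls + 1, strikes)
    if event == "strike":
        return (balls, strikes + 1)
    if event == "foul":
        # A foul adds a strike unless the batter already has two.
        return (balls, min(strikes + 1, 2))
    raise ValueError(f"unknown event: {event}")


def replay(events):
    """Apply a sequence of pitch events starting from an 0-0 count."""
    count = (0, 0)
    for event in events:
        count = advance(count, event)
    return count


# Two different histories that end in the same state. Under the Markov
# assumption, the next pitch is modeled identically in both cases.
fouls_first = replay(["foul", "foul", "ball", "ball"])
balls_first = replay(["ball", "ball", "strike", "strike"])
print(fouls_first, balls_first)  # (2, 2) (2, 2)
```

Both paths land on 2-2, and from that point on the model treats them as the same situation.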

The State Transition Matrix

We can intuit from Ed Walsh's baseball reference page that he probably got into many pitcher-friendly counts given his low walk rate, low number of balls in play, and high strikeout rate, but we can get a bit more specific if we employ Markov Chains. In fact, using the principles that Andrei Markov developed, we can map how pitchers move through counts by treating different counts as states, and the strikes and balls that are thrown as transitions between those states. For instance, if I want to know the rate at which a pitcher moves from a 1-1 count to a 1-2 count, I can find all the pitches that pitcher threw in a 1-1 count and compute the fraction that moved the count to 1-2. If I repeat this for all possible pitch count states, I can create a matrix of all the possible state transitions for a pitcher. This matrix is referred to as a state transition matrix and is key for any Markov Process. It's a square matrix with the rows and columns representing all the possible states of the system. In our case, each possible count state will represent both a row and column (i.e., 0-0, 1-0, 0-1, 1-1...). The cells within the matrix tell us the probability of moving between states, so each row sums to 1. For example, the intersection of the 0-0 row with the 1-0 column represents the probability that a pitcher throws a ball on the first pitch. Notice that some transitions are impossible. I can't go from a 1-0 count back to an 0-0 count, thus this probability is 0. We also have terminating states. When we reach these, we leave the system. In the context of a pitch count, these termination states would be a walk, strikeout, or ball-in-play. Once you enter one of these states, you can't go anywhere else.
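As a rough illustration of the matrix's shape, the sketch below lists the states and marks which cells can ever be non-zero. It is a simplification under stated assumptions: the state labels are my own naming, hit-by-pitches and other rare endings are ignored, and terminal states are modeled as looping onto themselves (the usual convention for absorbing states).

```python
COUNTS = [f"{b}-{s}" for b in range(4) for s in range(3)]  # 0-0 through 3-2
TERMINAL = ["Walk", "Strikeout", "In play"]
STATES = COUNTS + TERMINAL


def allowed_next(state):
    """States reachable from `state` in one pitch (simplified rules)."""
    if state in TERMINAL:
        return {state}  # absorbing: once entered, never left
    balls, strikes = (int(n) for n in state.split("-"))
    nxt = {"In play"}
    nxt.add("Walk" if balls == 3 else f"{balls + 1}-{strikes}")
    nxt.add("Strikeout" if strikes == 2 else f"{balls}-{strikes + 1}")
    if strikes == 2:
        nxt.add(state)  # a two-strike foul keeps the count unchanged
    return nxt


# 1 where a transition can happen, 0 where its probability must be zero.
possible = {a: {b: int(b in allowed_next(a)) for b in STATES} for a in STATES}

print(len(STATES))             # 15 states, so a 15 x 15 matrix
print(possible["0-0"]["1-0"])  # 1: a first-pitch ball
print(possible["1-0"]["0-0"])  # 0: counts never move backward
```

Most of the 225 cells are structurally zero, which is part of why a graph ends up being a more natural picture of this matrix than a table.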

In order to create this matrix, we need to do some data engineering. Essentially, we need to look at play-by-play data to see how a pitcher moved from state to state for a game, season, or career, depending on the scale we want to analyze. I did this using play-by-play data from retrosheet.org. They record pitch sequence data using sequences of letters such as BBCSX, which in this case would represent ball, ball, called strike, swinging strike, ball in play. I parsed these Retrosheet sequences in Python and then computed the rates at which a pitcher moved to each state relative to the total number of pitches thrown from the previous state. Unfortunately, we don't have this play-by-play data going back to 1906, so we'll never know what Ed Walsh's championship state transition matrix looked like. So, let's go more contemporary. In 2023, across all pitchers league-wide, the state transition matrix looks like this:
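The parsing step can be sketched roughly as follows. This is not the parser used for this post: it handles only a handful of common Retrosheet pitch codes (B, I, C, S, F, X) and silently skips everything else, so a real parser would need the rest of the code list (foul tips, bunts, pitchouts, hit batters, pickoff markers). The function names and the toy input are assumptions for illustration.

```python
from collections import Counter, defaultdict

TERMINAL = {"Walk", "Strikeout", "In play"}


def transitions(pitch_seq):
    """Turn one plate appearance's pitch string into (from, to) state pairs."""
    balls = strikes = 0
    pairs = []
    for code in pitch_seq:
        start = f"{balls}-{strikes}"
        if code in "BI":    # ball, intentional ball
            balls += 1
            end = "Walk" if balls == 4 else f"{balls}-{strikes}"
        elif code in "CS":  # called strike, swinging strike
            strikes += 1
            end = "Strikeout" if strikes == 3 else f"{balls}-{strikes}"
        elif code == "F":   # foul: adds a strike only below two strikes
            strikes = min(strikes + 1, 2)
            end = f"{balls}-{strikes}"
        elif code == "X":   # ball put in play
            end = "In play"
        else:
            continue        # codes outside this sketch are skipped
        pairs.append((start, end))
        if end in TERMINAL:
            break
    return pairs


def transition_rates(pitch_seqs):
    """Rate of each transition relative to pitches thrown from its start state."""
    counts = defaultdict(Counter)
    for seq in pitch_seqs:
        for start, end in transitions(seq):
            counts[start][end] += 1
    return {
        start: {end: n / sum(ends.values()) for end, n in ends.items()}
        for start, ends in counts.items()
    }


print(transitions("BBCSX"))
rates = transition_rates(["BBCSX", "CFFS", "BX"])  # toy input, not real data
print(rates["0-0"])
```

Filtering the input sequences by pitcher, season, or game before calling `transition_rates` is what changes the scale of the analysis.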

INSERT TABLE

This matrix tells us that in 2023 across the league a ball was thrown 38% of the time on the first pitch, a strike 51% of the time, and 11% of the time it was put into play by the batter. There are some other interesting takeaways to be found here. As one might expect, a 3-0 count is when a strike is thrown at the highest rate, with only 4% of pitches in this count being put in play, undoubtedly because batters almost always take in this count. A full count has remarkably even probabilities of any outcome happening: walk, strikeout, foul ball, or a ball put into play. This is the more detailed representation of pitcher profiles we were looking for. With this approach, we can start to create maps of how the league, or specific pitchers, move through pitch counts.

And, now that we have the league-wide state transition probabilities, we can see how individual pitchers' maps compare to the average outcomes. We use the same process of parsing play-by-play data, this time filtering for specific pitchers. Here's what Spencer Strider's 2023 state transition matrix looks like:

INSERT TABLE

When we compare Strider's state transition matrix to the league-wide matrix, we can see Strider's colloquial typology reflected in the data. He strikes a lot of dudes out. I mean, he moves batters to an 0-2 count from an 0-1 count at a 13% higher rate than league average. That is elite. And his strikeout rates from any two-strike count are 5-10% higher than league average. While he does walk batters at a slightly higher rate than average, it's not indicative of a "wild pitcher with good stuff" profile. Strider is a precise killer. If we had Ed Walsh pitch-by-pitch data from 1906, I imagine his profile would look similar to Strider's.
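The comparison itself is just a cell-by-cell subtraction. A minimal sketch, assuming each matrix is stored as a nested dict of rates (the post's app uses pandas dataframes instead) and reporting absolute differences: the league row below uses the first-pitch rates quoted earlier, while the pitcher row is made up for illustration and is not Strider's data.

```python
def rate_deltas(pitcher, league):
    """Pitcher rate minus league rate for every transition in either matrix."""
    deltas = {}
    for start in set(pitcher) | set(league):
        p_row = pitcher.get(start, {})
        l_row = league.get(start, {})
        deltas[start] = {
            end: p_row.get(end, 0.0) - l_row.get(end, 0.0)
            for end in set(p_row) | set(l_row)
        }
    return deltas


# League first-pitch rates from the text; the pitcher row is hypothetical.
league = {"0-0": {"1-0": 0.38, "0-1": 0.51, "In play": 0.11}}
example_pitcher = {"0-0": {"1-0": 0.33, "0-1": 0.57, "In play": 0.10}}

first_pitch = rate_deltas(example_pitcher, league)["0-0"]
for end, diff in sorted(first_pitch.items()):
    print(f"0-0 -> {end}: {diff:+.2f}")
```

A positive delta on a strike transition or a negative delta on a ball transition is what reads as "better than league average" for the pitcher.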

How Can We Visualize this Information?

Now, while we could stop here, tables aren't necessarily the best representation of this data. We can do better. Let's create a visualization that graphically depicts this "map" as a set of states and pathways between states. For this we'll need a bit of graph theory. All you need to know for the purpose of this article is that states will be represented as "nodes" and the pathways as "edges". It will be a "directed" graph because we can only move one direction through a count. In other words, we can only add balls and strikes, or stay in the same count in the event of a foul ball in a two-strike count. We cannot remove balls or strikes. The nice thing is that since we already did the difficult work of parsing the play-by-play data and creating our state transition matrices, we can simply use these for the visualization. Some other nice features that I'd like to have in this visualization are as follows:

  1. The size of nodes should be relative to the proportion of pitches thrown in that count vs the total.
  2. We should color-code edges based on the degree to which a pitcher created a "good" or "bad" outcome relative to league average. Good will be red, and bad will be blue.
  3. We should be able to toggle between absolute rates and the delta between a pitcher's rates and league average.
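For the first feature, node sizes can be derived from raw transition counts rather than rates. Here is a minimal sketch, assuming the counts are stored as a nested dict; the function names and the pixel range are arbitrary illustrative choices, and the numbers are toy values.

```python
def node_shares(transition_counts):
    """Share of all pitches thrown from each count.

    `transition_counts` maps a count to {next_state: number_of_pitches}.
    """
    per_count = {s: sum(ends.values()) for s, ends in transition_counts.items()}
    total = sum(per_count.values())
    return {s: n / total for s, n in per_count.items()}


def node_sizes(shares, min_px=10, max_px=60):
    """Scale shares linearly to marker diameters, largest share = max_px."""
    biggest = max(shares.values())
    return {s: min_px + (max_px - min_px) * v / biggest for s, v in shares.items()}


# Toy counts, not real data.
counts = {
    "0-0": {"1-0": 4, "0-1": 5, "In play": 1},
    "0-1": {"0-2": 3, "1-1": 2},
}
shares = node_shares(counts)
sizes = node_sizes(shares)
print(shares)
print(sizes)
```

Scaling against the largest share (rather than the total) keeps the 0-0 node from dwarfing everything else while still preserving the ordering between counts.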

With these features in mind, I got to writing code. I implemented this in Python using the Plotly library, storing the state transition matrices as pandas dataframes. I also added some interactive features, a pitcher select dropdown and a toggle button, via Dash and deployed it on Heroku as a web application. You can view the full interactive app yourself here. For the purpose of this post, I will just be taking screenshots of the app.
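The app's source is not reproduced in this post, so the following is only a sketch of how a transition matrix could be turned into graph inputs: a list of directed edges and an (x, y) position per node. The diamond layout, the terminal-state positions, and the function names are assumptions of this sketch, not a description of the actual app. It uses plain dicts so it runs without any plotting library.

```python
# Hypothetical positions for the three terminal states, below the count grid.
TERMINAL_POS = {"Strikeout": (-3, -6), "In play": (0, -6), "Walk": (3, -6)}


def node_position(state):
    """Place a state: balls push right, strikes push left, each pitch moves down."""
    if state in TERMINAL_POS:
        return TERMINAL_POS[state]
    balls, strikes = (int(n) for n in state.split("-"))
    return (balls - strikes, -(balls + strikes))


def directed_edges(matrix, threshold=0.0):
    """List (start, end, probability) for every transition above `threshold`."""
    return [
        (start, end, p)
        for start, row in matrix.items()
        for end, p in row.items()
        if p > threshold
    ]


# League first-pitch rates from the text, used here as a one-row matrix.
league = {"0-0": {"1-0": 0.38, "0-1": 0.51, "In play": 0.11}}
edge_list = directed_edges(league)
for start, end, p in edge_list:
    print(start, node_position(start), "->", end, node_position(end), p)
```

From here, the positions and edges could be handed to whatever library draws the markers and arrows, with the probability mapped to line width or a label.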

Results

First, let's show the visualization for that league average state transition matrix I showed earlier in the post. I'll keep everything gray since these are averaged outcomes across all pitchers.

INSERT IMAGE

And now, let's bring up Spencer Strider's, adding in that color coding for good and bad outcomes relative to league average. For the purpose of this application, red lines indicate that the pitcher induces good outcomes at a higher rate and bad outcomes at a lower rate compared to league average. Blue lines indicate that the pitcher induces good outcomes at a lower rate and bad outcomes at a higher rate compared to league average. A good outcome is moving the count in the pitcher's favor, or inducing an out. A bad outcome is moving the count in the batter's favor, or inducing a walk.
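The color rule above can be written down as a small function. This is a sketch under stated assumptions rather than the app's code: strike-adding transitions and strikeouts count as good, ball-adding transitions and walks count as bad, and balls in play and two-strike fouls are left neutral (gray) because the rule as described does not pin down how they are scored.

```python
def outcome_kind(start, end):
    """Classify a transition as 'good' or 'bad' for the pitcher, else 'neutral'."""
    if end == "Strikeout":
        return "good"
    if end == "Walk":
        return "bad"
    if end == "In play" or end == start:
        return "neutral"  # assumption: left unscored in this sketch
    strikes_before = int(start.split("-")[1])
    strikes_after = int(end.split("-")[1])
    # Any other count-to-count move adds either a strike or a ball.
    return "good" if strikes_after > strikes_before else "bad"


def edge_color(start, end, delta):
    """Red when the pitcher beats league average on this edge, blue when worse."""
    kind = outcome_kind(start, end)
    if kind == "neutral" or delta == 0:
        return "gray"
    better = (delta > 0) if kind == "good" else (delta < 0)
    return "red" if better else "blue"


print(edge_color("0-1", "0-2", +0.05))   # red: more strikes than average
print(edge_color("0-1", "1-1", +0.05))   # blue: more balls than average
print(edge_color("3-2", "Walk", -0.03))  # red: fewer walks than average
```

Here `delta` is the pitcher's rate minus the league's rate for that edge, so the sign alone decides the color and the magnitude could drive line width or opacity.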

INSERT IMAGE

What's immediately evident to me is the amount of red on Strider's chart. He's simply elite at getting strikes. Also, notice how we can now represent node size as a function of pitches thrown in that count, and look at how many 0-2 counts Strider gets into. Let's compare his chart to someone else's, this time enabling the toggle to look at the difference from league average instead of the direct rates.

INSERT IMAGE

Here we have Lance Lynn. We can immediately notice a difference in typology between Lance Lynn and Spencer Strider. While Strider excels at getting to two-strike counts, Lynn is about average at getting to two strikes and subsequently striking batters out. I think there is an especially interesting feature of Lynn's profile. While he is decent at getting to two strikes, he doesn't really put batters away. Instead he throws balls in two-strike counts at a much higher rate than league average, ending up in a lot of full counts, as is evident from the large node size for 3-2. And once he gets to 3-2 he's about a league average pitcher. This type of takeaway is something you wouldn't necessarily conclude by simply looking at Lynn's aggregate statistics since those are almost always outcome oriented.

I love this way of modeling pitch counts and the graphical visualization because you can immediately identify both positive and negative trends and pitcher-specific pathways through counts that can improve our understanding of pitchers. Two pitchers may have the same peripheral stats, but reach those stats through completely different approaches, strengths, and weaknesses. Anyway, I had a lot of fun making this app and I encourage you to check it out yourself.

Thank you for reading!

Data sourced from https://www.retrosheet.org/