Is it just me, or does 5 to 3 feel like the most quintessential baseball score? Every time I open the MLB app, I swear there's another game ending 5-3. Ok, maybe it's just me. Maybe my pattern-recognizing brain is playing tricks on me. But it got me wondering: if 5-3 isn't the most frequent final score, then what is? Which scores are incredibly rare? Which have never even happened before? And, perhaps most interestingly, can we predict the next never-before-seen score (a scorigami)? I sought to answer these questions the other day, and what follows is a tour of that deep, dark rabbit hole.
What you're looking at below is a visualization of every baseball game ever played, grouped by final score of the winning and losing team. The color intensity of each bar and cell corresponds to the number of occurrences of each score combination. The bar chart shows the relative frequency of each score combination with over 500 occurrences.
Let's explore our first question: what is the most common MLB final score? Well, it turns out my intuition was wrong. While 5-3 is a common score, with about 7,200 occurrences all time, the most common baseball score is actually 3 to 2, with over 12,600 occurrences all time, meaning that approximately 5.5% of all baseball games end with this score.
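Under the hood, a chart like this starts as a simple tally. Here is a minimal Python sketch, assuming each game is available as a (home_runs, away_runs) pair (a hypothetical input format, not the article's actual dataset). It folds each game into a (winning, losing) score and reports each score's share of all games, which is the same arithmetic behind the 5.5% figure for 3-2.

```python
from collections import Counter


def score_frequencies(games):
    """Tally final scores as (winning_runs, losing_runs) pairs.

    `games` is assumed to be an iterable of (home_runs, away_runs) tuples.
    This input shape is hypothetical. The rare tied games are kept as (k, k).
    """
    counts = Counter()
    for home_runs, away_runs in games:
        counts[(max(home_runs, away_runs), min(home_runs, away_runs))] += 1
    return counts, sum(counts.values())


def most_common_share(games, n=3):
    """Return the n most common scores with their counts and share of all games."""
    counts, total = score_frequencies(games)
    return [(score, count, count / total) for score, count in counts.most_common(n)]


# Toy records, made up for illustration.
games = [(3, 2), (2, 3), (5, 3), (3, 2), (7, 1)]
print(most_common_share(games, n=1))  # [((3, 2), 3, 0.6)]
```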
Before we answer our next question, let's first discuss score rarity in the context of one of the most fun concepts in sports: scorigamis.
A scorigami, for those unacquainted with the term, is a final score that has never happened in the history of a sport. The concept was invented by Jon Bois back in 2014 to describe and identify unique NFL scores, and scorigami tracking has since caught on, with sports fans monitoring games that have a shot at producing a new scorigami. The NFL is particularly well suited for scorigami tracking. Its diverse scoring events (3 points for a field goal, 6 for a touchdown, extra points, and so on) lead to interesting combinatorial patterns in historical NFL scores and some intriguing scorigamis yet to be had. For instance, no NFL game has ever ended 7-4. This is surprising to someone unfamiliar with the sport, but less surprising once you realize that essentially the only way to score 4 points is with two safeties, a relatively rare scoring event.
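The combinatorics are easy to check by brute force. This sketch enumerates every unordered combination of scoring plays that reaches a given total, using a simplified set of NFL scoring values that I'm assuming for illustration (it ignores oddities like one-point safeties and defensive two-point returns).

```python
def scoring_combinations(total, plays=(8, 7, 6, 3, 2)):
    """Return every unordered combination of scoring plays that sums to `total`.

    Assumed simplified NFL scoring values: touchdown + two-point conversion (8),
    touchdown + extra point (7), touchdown without a successful try (6),
    field goal (3), and safety (2). Rarer plays are ignored.
    """
    if total == 0:
        return [[]]
    combos = []
    for i, play in enumerate(plays):
        if play <= total:
            # Only reuse this play or smaller ones, so each combination
            # is generated exactly once, in non-increasing order.
            for rest in scoring_combinations(total - play, plays[i:]):
                combos.append([play] + rest)
    return combos


print(scoring_combinations(4))  # [[2, 2]]: two safeties is the only way
print(scoring_combinations(7))  # [[7], [3, 2, 2]]
```

Under these assumptions, 4 points can only come from two safeties, while 7 has a common route (a converted touchdown) plus a rare one.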
Scoring in baseball differs from football in that there is only one type of scoring event, a "run", achieved when a player crosses home plate. While multiple runs can be scored "at once" (on a two-RBI double or a three-run homer, for example), scoring a single run is just as likely, if not more likely, so baseball has essentially none of the obscure, hard-to-reach score combinations we see in football. Combine that with the fact that professional baseball has been played since 1871, with over 200,000 games completed in that time, and achieving a new scorigami is difficult, to say the least.
That said, let's return to our visualization and look at some of the most interesting outlier scores to date. The first thing you may notice is an illuminated cell in the top right of this chart. No, that is not a mistake. On June 28, 1871, the Philadelphia Athletics beat the Troy Haymakers by a score of 49 to 33 in a 4-hour game. The box score for this game is pretty hilarious. Fergy Malone led the Athletics with 7 RBIs on 5 hits in 9 at-bats. The Haymakers' pitcher, John McMullin, gave up 31 earned runs over his 9 IP. But hey, he only gave up two home runs! What I find equally amusing is that the Haymakers gave up 11 unearned runs. We should note that at the time, these two teams played in the National Association, whose status as a major league is disputed. It was the first professional league in the US and was supplanted by the National League in 1876, in part because of its loose enforcement of standards and rampant corruption and gambling. I think it would be justified to question the validity of this score. That said, we aren't here to scrutinize scorekeeping; we're here to find funny baseball stats!
Most of the crazy outliers occurred in the 1800s. In fact, since 1970 only 8 new scorigamis have occurred, the most recent on Sept 9, 2020, when the Atlanta Braves beat the Miami Marlins 29 to 9. I think it's fitting that the latest scorigami came in the weird year that was 2020, and it snapped what was then a 20-year scorigami drought. So, where does that leave us? Which scorigami will be the next to fall?
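Finding scorigamis in historical data amounts to walking through games in date order and flagging each score the first time it appears. A minimal sketch, assuming games come as (game_date, winning_runs, losing_runs) tuples (a hypothetical layout); the 1871 and 2020 rows below come from the article, and the 3-2 rows are made up.

```python
from datetime import date


def find_scorigamis(games):
    """Return the first game for each distinct final score, in date order.

    `games` is assumed to be an iterable of (game_date, winning_runs, losing_runs)
    tuples; this layout is hypothetical.
    """
    seen = set()
    firsts = []
    for game_date, winning_runs, losing_runs in sorted(games):
        score = (winning_runs, losing_runs)
        if score not in seen:  # first time this final score has ever appeared
            seen.add(score)
            firsts.append((game_date, winning_runs, losing_runs))
    return firsts


def scorigamis_since(games, year):
    """Scorigamis whose first occurrence falls in or after `year`."""
    return [g for g in find_scorigamis(games) if g[0].year >= year]


games = [
    (date(2020, 9, 9), 29, 9),
    (date(1871, 6, 28), 49, 33),
    (date(1999, 5, 1), 3, 2),   # made-up row
    (date(1901, 4, 20), 3, 2),  # made-up row
]
print(scorigamis_since(games, 1970))
```

Sorting first matters: a score only counts as a scorigami if no earlier game produced it, so the 1999 3-2 row is ignored once the 1901 row has been seen.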
The Next Scorigami
Before we predict the next scorigami, let's first explore the data a bit more. Shown below is the distribution of the total runs scored per game (winning_team_runs + losing_team_runs) for all 200k+ professional games throughout history.
The first observation is that 7 total runs is the most frequent run total in a baseball game. This makes sense to me. The second observation is that odd run totals occur much more frequently than even run totals. This makes sense when one considers that there are more ways for a baseball game to finish with an odd total than an even one. Rare exceptions aside, baseball games cannot end in a tie, which eliminates one combination (the tied score) from every even total. For instance, 7 total runs can be made with 7-0, 6-1, 5-2, or 4-3, while 6 total runs can only be made with 6-0, 5-1, or 4-2, since 3-3 is ruled out. So 7 total runs has an extra combination. After 7 runs, the data follows a right-skewed distribution where subsequent run totals become less and less frequent.
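The counting argument can be checked directly. This sketch (the function names are illustrative) computes each total's share of games from (winning, losing) pairs, lists the untied splits of a total, and measures how much of an even total's combinations the tie accounts for. An odd total T has the same number of untied splits as T + 1, and the tie's share of an even total shrinks as totals grow.

```python
from collections import Counter


def total_runs_distribution(scores):
    """Share of games at each total (winning_team_runs + losing_team_runs)."""
    totals = Counter(w + l for w, l in scores)
    n = sum(totals.values())
    return {total: count / n for total, count in sorted(totals.items())}


def untied_splits(total):
    """Every (winner, loser) split of `total` runs with winner > loser >= 0."""
    return [(total - loser, loser) for loser in range((total + 1) // 2)]


def tie_share(total):
    """Fraction of all (winner, loser) combinations, tie included, that is the tie."""
    if total % 2:
        return 0.0  # odd totals have no tied split
    return 1 / (total // 2 + 1)


print(untied_splits(7))  # [(7, 0), (6, 1), (5, 2), (4, 3)]
print(untied_splits(6))  # [(6, 0), (5, 1), (4, 2)]
print(tie_share(6))      # 0.25
```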
Here's another interesting observation when we compare this chart to our first one: 7 total runs, our most frequently occurring run total, does not include our most frequently occurring score, 3-2. It does, however, include our second most common score (4-3) and our ninth most common score (5-2). The only other run total with two score combinations in the top 10 is 9.
This gives us our first two pieces of intuition we'll use to predict the next scorigami:
The only remaining scorigamis occur to the right of our mean total runs. Since the distribution's right tail decays steadily, we expect that the larger the number of total runs, the less likely the scorigami is to occur.
Odd run totals are more likely to occur than even run totals. However, this matters less at higher run totals, because the missing tie becomes a smaller share of all the possible score combinations.
Another thing I want to look at before making our prediction is how the mean total runs per game has changed over time. Are there any noticeable recent trends toward a higher or lower run-scoring environment? To find out, I charted the mean total runs per game throughout history.
As you can see, the run-scoring environment was extremely inflated in baseball's infancy. It remained very volatile until around 1920, when it seems to have settled into a more stable range. Since 1920, the average total runs per game has oscillated between ~6.8 R/G and ~11 R/G. Anecdotally, something I've always heard is that Babe Ruth's career is even more remarkable when you consider that he played during the dead-ball era: a pitcher-friendly era in which spitballs and a large strike zone made hitting next to impossible. I am definitely not here to take away from any of Babe Ruth's accomplishments, but while Babe debuted in the heart of the dead-ball era (1914), his first season playing over 100 games wasn't until 1919, and all of his remarkable seasons through 1934 came during a relatively hitter-friendly era.
Ok, anecdotes aside: when you take out those very early seasons of professional baseball and only consider 1900 onward, you get an average of 8.81 runs per game. Interestingly, the 2024 season fell remarkably close to this average, at 8.79 R/G. There was a general uptrend from 2014 to 2019, when we went from 8.13 R/G to 9.66 R/G, followed by a decline to 8.57 R/G in 2022. These "micro trends" have still occurred well within our broader 1920 to 2024 range, and I don't want to read too much into very recent trends at the risk of mistaking noise for signal. I think it's generally safe to say that while baseball goes through temporary stretches of hitter and pitcher friendliness, these are oscillations within a macro range we've been in since about 1920.
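Season averages like these come from grouping games by season. This is a minimal sketch assuming (season, winning_runs, losing_runs) tuples, a hypothetical layout. One assumption to flag: the article doesn't say whether its 8.81 figure weights every game equally or averages the season means, so `average_since` here weights by game.

```python
from collections import defaultdict


def runs_per_game_by_season(games):
    """Mean total runs per game for each season.

    `games` is assumed to be an iterable of (season, winning_runs, losing_runs).
    """
    run_sums = defaultdict(int)
    game_counts = defaultdict(int)
    for season, winning_runs, losing_runs in games:
        run_sums[season] += winning_runs + losing_runs
        game_counts[season] += 1
    return {season: run_sums[season] / game_counts[season] for season in sorted(run_sums)}


def average_since(games, first_season):
    """Game-weighted mean total runs per game from `first_season` onward.

    Assumption: weighting by game rather than averaging season means;
    the two differ slightly when season lengths vary.
    """
    totals = [w + l for season, w, l in games if season >= first_season]
    return sum(totals) / len(totals)


games = [(1899, 10, 5), (1900, 5, 3), (1900, 4, 2), (2024, 6, 3)]  # toy rows
print(runs_per_game_by_season(games))  # {1899: 15.0, 1900: 7.0, 2024: 9.0}
```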
Scorigami Prediction
I'm going to use four different models to predict the next scorigami, hoping to find similar predictions among them before drawing my final conclusion. I'll start with two classical probability models, a Poisson model and a bivariate corrected Negative Binomial (NB) model. I'll also test a smoothing method (Good-Turing) and a Random Forest ML model. Each model gets the distribution of winning and losing scores across all previously played baseball games and uses it to approximate the probability of "unseen" baseball scores (scorigamis). For each model, I'll look at the scorigami with the highest probability along with its second and third highest candidates. At the end, I'll roll everything up into a table so that we can compare the outputs and make our final prediction for the next scorigami.
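To make the Poisson idea concrete, here is a minimal sketch under assumptions of my own, not necessarily the article's parameterization: winning and losing runs are treated as independent Poisson variables with rates set to the observed mean winning and losing scores, every cell with winning > losing (up to a `max_runs` cutoff) is scored and renormalized, and the never-observed cells are ranked. The function names and the cutoff are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson(lam) variable, computed in log space for large k.

    Assumes lam > 0.
    """
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_scorigami_candidates(scores, max_runs=40, top=3):
    """Rank never-seen (winning, losing) scores under an independent-Poisson sketch.

    Assumptions: winning and losing runs are independent Poisson variables whose
    rates are the observed mean winning and losing scores (both must be positive),
    and the joint probability is renormalized over cells with winning > losing.
    """
    lam_w = sum(w for w, _ in scores) / len(scores)
    lam_l = sum(l for _, l in scores) / len(scores)
    grid = {
        (w, l): poisson_pmf(w, lam_w) * poisson_pmf(l, lam_l)
        for w in range(1, max_runs + 1)
        for l in range(w)
    }
    norm = sum(grid.values())
    seen = set(scores)
    unseen = sorted(
        ((p / norm, score) for score, p in grid.items() if score not in seen),
        reverse=True,
    )
    return [(score, p) for p, score in unseen[:top]]


scores = [(3, 2), (4, 3), (5, 3), (2, 1), (6, 4), (3, 0), (7, 2)]  # toy data
for score, p in poisson_scorigami_candidates(scores):
    print(score, round(p, 4))
```

One consequence of fitting separate rates: a low losing-score rate pushes probability toward lopsided scores, so this kind of model tends to reward blowouts.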
First, though, I want to start with my own guess for the next scorigami based on intuition alone. Since the only remaining scorigamis occur to the right of our mean total runs, I'll value scorigamis with a low run total over those with a high run total. That said, for two scores with the same or similar total runs, I'll weight close games higher than blowouts. This is backed up by the data: 25 total runs has occurred in 456 games, yet there has never been a 25-0 game, while 118 games have ended 13-12. So I should pick a candidate with the lowest number of total runs that still has the losing team scoring some runs. More precisely, my intuition says to pick the lowest remaining winning score whose losing score is close to its typical value. This can be seen in the data: if we pick a winning score of, say, 18 runs, the losing scores in that column seem to be centered around 2 to 5 runs. So in blowouts, my general intuition is to pick a losing score around the average runs scored per game by a single team (about 4.5 runs).
If we look at our scorigami chart at the beginning of this article, there are a few candidates that immediately stand out to me. The remaining scorigami with the lowest total is 25-0. This is a tempting selection, since 25 total runs has been reached many times in baseball history, but I just have a hard time believing in this final score, especially in an era when position players enter the game to pitch for both teams as soon as the score gets remotely out of hand. With a position player likely pitching for most of the game, I doubt the losing team would be held to 0 runs. The remaining scorigami with the lowest winning run total is 21-18. This is an intriguing pick and, I think, not super outlandish: picture a slugfest with two teams going homer-for-homer into extra innings until the winning team walks it off in the 13th with a three-run shot. That said, both teams' run totals are so high in this scenario that it would require both teams to go well outside their respective run-scoring distributions.
Ok, for my first selection, I'll say 27-6 makes sense as the next scorigami. While this is an astronomically high run total for the winning team, it keeps the losing team near its mean runs scored. I'd expect position players to be pitching in this type of game, so I'll rank it ahead of our other 27-run candidate, 27-4, which I'll make my second pick. For my third pick, I'll go with a lower winning score: 22-12. While 34 total runs is quite high, the winning score is one of the lowest among our remaining scorigamis, and the losing score, while high, is again feasible in this type of game with position players pitching.
Okay, so here's the table we'll attempt to build up:
Model Overviews
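The Good-Turing idea fits in a few lines. This is a minimal sketch of the basic estimator, not necessarily the exact variant used here (practical versions such as Simple Good-Turing also smooth the N_r counts). On its own, Good-Turing estimates how much probability belongs to unseen scores as a group, N1 / N, where N1 is the number of scores seen exactly once and N is the number of games; splitting that mass among specific unseen scores needs an extra assumption.

```python
from collections import Counter


def good_turing(score_counts):
    """Basic (unsmoothed) Good-Turing estimates from {score: count}.

    Returns the total probability mass reserved for never-seen scores,
    N1 / N, and the adjusted count r* = (r + 1) * N_{r+1} / N_r for each
    observed count r whose N_{r+1} is nonzero.
    """
    n = sum(score_counts.values())
    freq_of_freq = Counter(score_counts.values())  # N_r: scores seen exactly r times
    unseen_mass = freq_of_freq[1] / n
    adjusted = {
        r: (r + 1) * freq_of_freq[r + 1] / n_r
        for r, n_r in freq_of_freq.items()
        if freq_of_freq[r + 1] > 0
    }
    return unseen_mass, adjusted


toy_counts = {(3, 2): 3, (4, 3): 2, (5, 3): 1, (21, 1): 1}  # made-up counts
print(good_turing(toy_counts))
```

For scale: with the 66 single-occurrence scores reported in the Summary and more than 200,000 games, N1 / N comes out below 0.04%, i.e. a small estimated chance that any given game produces a brand-new score.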
Summary
The tables below summarize the performance of each model against our empirically observed scores. The first table compares the predictions against the observed frequency of the top 10 most frequently occurring baseball scores. The columns on the left show the empirical data and the true ranking of the top ten scores, while the three columns on the right show each model's predicted probability for the same score. The number in parentheses next to each probability is that score's rank among all of the model's score predictions. I used a rough color code: green text if the model ranks the score in its own top ten, yellow if it falls in its top 20, and red otherwise.
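The ranking and color code are straightforward to express. A small sketch, with hypothetical model output and illustrative function names, that ranks every score by predicted probability and maps a rank to the green/yellow/red buckets:

```python
def rank_scores(model_probs):
    """Map each score to its rank under a model (1 = most probable)."""
    ordered = sorted(model_probs, key=model_probs.get, reverse=True)
    return {score: rank for rank, score in enumerate(ordered, start=1)}


def color_for(rank):
    """Rough color code: green for a top-10 rank, yellow for top 20, red otherwise."""
    if rank <= 10:
        return "green"
    if rank <= 20:
        return "yellow"
    return "red"


# Hypothetical model output, for illustration only.
model_probs = {(3, 2): 0.05, (4, 3): 0.04, (25, 0): 0.0001}
ranks = rank_scores(model_probs)
print({score: (rank, color_for(rank)) for score, rank in ranks.items()})
```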
The second table shows each model's performance against the least frequently occurring, yet still observed, scores. Since the least frequently observed scores are all single occurrences, I summarized performance by counting how many of the 66 single-occurrence scores fell among each model's 66 lowest-probability scores. Together, these two tables summarize each model's performance on both the most and least frequently occurring scores.
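The bottom-k check can be sketched as follows. One assumption to flag: the sketch ranks only scores that have been observed at least once, since it's not stated whether never-seen cells were included in each model's bottom 66. The function name and toy numbers are illustrative.

```python
def singleton_overlap(observed_counts, model_probs):
    """Count single-occurrence scores that land in a model's bottom k,
    where k is the number of single-occurrence scores.

    Assumption: the model's ranking is restricted to scores observed at
    least once, so never-seen cells don't crowd the bottom k.
    """
    singletons = {score for score, count in observed_counts.items() if count == 1}
    k = len(singletons)
    observed = [score for score in model_probs if score in observed_counts]
    bottom_k = set(sorted(observed, key=model_probs.get)[:k])
    return len(singletons & bottom_k), k


observed_counts = {(3, 2): 5, (4, 3): 3, (21, 1): 1, (30, 2): 1}  # toy counts
model_probs = {(3, 2): 0.3, (4, 3): 0.2, (21, 1): 0.001, (30, 2): 0.01, (40, 0): 0.0}
print(singleton_overlap(observed_counts, model_probs))  # (2, 2)
```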
It's pretty obvious that the Random Forest outperforms the other models on both the dense and sparse scores. It's wise to ask whether this is true outperformance or the model overfitting its training data. That's a valid question, but it's important to note that the test set for this model consisted of observed scores the model did not have access to during training. On this test set, the model achieved an RMSE of 0.00172 and an R² of 0.9555. While we weren't working with a very big dataset, these figures suggest the model generalizes quite well to unseen scores, which makes me think it can be trusted to a decent degree.
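For reference, these are the standard definitions of the two metrics quoted, written out in plain Python; the Random Forest itself is not reproduced here, and the inputs would be observed versus predicted score frequencies on the held-out set.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


print(rmse([1, 2, 3], [1, 2, 3]), r_squared([1, 2, 3], [1, 2, 3]))  # 0.0 1.0
```

An R² near 1 means the predictions explain most of the variation in observed frequencies, while RMSE is in the same units as the frequencies themselves, which is why it's so small here.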
The other question is why the Poisson model seems to slightly outperform the NB model. This surprised me, since the NB model is supposed to better capture the variance inherent to MLB scoring. The Poisson model does perform slightly better on the most and least frequent scores, but I will note that, as we'll see in a second, the NB model's scorigami predictions are similar to the Random Forest's, and both align more closely with my intuition than the Poisson model's. For a more detailed analysis of each model, I'd evaluate its performance against every score, computing the difference between observed and predicted frequency for each score and summarizing each model with an MSE.
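The variance point is worth making concrete. A Poisson distribution forces variance to equal the mean, while a negative binomial can match variance larger than the mean (overdispersion). This sketch fits a single, univariate NB marginal by the method of moments; it does not reproduce the bivariate corrected NB model used here, and the toy run counts are made up.

```python
import math


def fit_negative_binomial(runs):
    """Method-of-moments negative binomial fit to a list of run counts.

    With size r and success probability p, the NB mean is r(1 - p) / p and
    its variance is mean + mean**2 / r, so it can match variance > mean.
    """
    n = len(runs)
    mean = sum(runs) / n
    var = sum((x - mean) ** 2 for x in runs) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion; a Poisson fit already matches the variance")
    r = mean ** 2 / (var - mean)
    p = mean / var
    return r, p


def nb_pmf(k, r, p):
    """P(K = k) for the negative binomial with size r and success probability p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1 - p))


runs = [0, 1, 2, 3, 10]  # toy data: mean 3.2, sample variance 15.7
r, p = fit_negative_binomial(runs)
print(round(r, 4), round(p, 4))
```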
Scorigami Prediction
Ok, we're finally here. Let's make our final scorigami prediction.
Welp, my models vary quite a bit in their predictions for the next scorigami. The Poisson model strongly favors blowouts with few runs from the losing team. This isn't very consistent with my intuition for how a game like this would play out. As I discussed earlier, in a blowout you're likely to get at least some runs from the losing team, which will probably be facing position players or long relievers. The NB model seems to favor high-scoring close games, and the RF lands somewhere in between. I'm inclined to trust the RF more than the statistical models, but I want to be careful not to overvalue the predictive power of the ML approach, especially when it's trained on a relatively small dataset. That said, one score stands out: 22-12. It was my third pick based on intuition, and it is the top prediction of both the NB model and the Random Forest. While there's much deeper and more rigorous analysis we could apply to this problem, it will likely remain something of a shot in the dark regardless, so at this point I feel comfortable making my final prediction.
22-12 is my prediction for the next scorigami.
And based on what we know about how often scorigamis have occurred (on average every ~6.7 years since 1912, with the last one in 2020), I will call my shot and say there will be a baseball game that ends 22-12 by 2028.
Anyway, thanks for reading. I hope you found this article entertaining! Stay tuned for more useless baseball content.