Game Score Frequencies: Predicting the Next Scorigami

Is it just me, or does 5 to 3 feel like the quintessential baseball score? Every time I open the MLB app, it feels like there's another game ending 5-3. Ok, just me? Got it. But it got me wondering - if 5-3 isn't the most common final score, what is? And on the flip side, which scores are incredibly rare, or better yet, have never even happened at all? Can we predict the next never-before-seen score — the next scorigami? Well, I answered these questions so that you don't have to. What follows is a deep dive down a rabbit hole exploring baseball scores from the familiar to the weird.

What you're looking at below is a visualization of every baseball game ever played. The grid condenses 155 years of baseball and over 200,000 games into 49 by 33 cells with each cell representing a unique winning and losing score combination. Color intensity represents the number of occurrences of that score with unfilled cells representing score combinations that have never happened. The bar chart is another view of this data, highlighting the most common scores that appear at least 500 times.

I encourage you to explore this chart on your own; there's a lot here. But let's use it to answer the first question I posed at the beginning of this post. What is the most common MLB final score? Well, it turns out my intuition wasn't quite right. While my guess of 5-3 is indeed a common score, occurring around 7,200 times, the most frequent final score is actually 3-2 with over 12,600 occurrences (about 5.5% of all games). Rounding out the top five are 4-3 (5.3%), 2-1 (4.5%), 5-4 (4.2%), and 4-2 (3.6%).
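For anyone who wants to reproduce this kind of tally, here is a minimal sketch of how the counts and percentages could be computed. It assumes the games are available as simple pairs of team run totals; the function names and the sample data are made up for illustration and are not the code or data behind the chart above.

```python
from collections import Counter

def score_frequencies(games):
    """Tally final scores as (winning_runs, losing_runs) pairs.

    `games` is an iterable of (runs_a, runs_b) tuples; ties are skipped.
    """
    counts = Counter()
    for runs_a, runs_b in games:
        if runs_a == runs_b:
            continue  # a tied game has no winning/losing score pair
        counts[(max(runs_a, runs_b), min(runs_a, runs_b))] += 1
    return counts

def top_scores(counts, n=5):
    """Return the n most common scores with their count and share of games."""
    total = sum(counts.values())
    return [(score, k, k / total) for score, k in counts.most_common(n)]

# Made-up sample, not real MLB data.
sample_games = [(3, 2), (2, 3), (4, 3), (5, 3), (3, 2), (1, 2), (4, 4)]
sample_counts = score_frequencies(sample_games)
for (win, lose), k, share in top_scores(sample_counts, n=3):
    print(f"{win}-{lose}: {k} games ({share:.1%})")
```

Storing each score as (winner, loser) means 3-2 and 2-3 land in the same cell, which matches how the grid is laid out.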

Ok, but what about the least frequent scores? Before diving in, let's quickly discuss score rarity in the context of one of the most amusing concepts in sports: Scorigami.

For the unfamiliar, a scorigami is a final score that's never happened in the history of a sport. The concept was introduced by Jon Bois back in 2014 to describe and identify unique NFL scores. The NFL is especially suited for scorigamis due to its diverse scoring events (FG for 3 points, TD for 6, extra points, etc.), which produce unusual combinatorial patterns. For instance, no 7-4 final score has ever happened in the NFL - surprising at first glance, but less so when you realize scoring exactly 4 points typically requires two safeties, a rare scoring event.

Baseball scoring differs significantly from football. There is only one scoring event - a "run" - earned each time a player crosses home plate. While multiple runs can be scored "at once" through events like a two-RBI double or a three-run HR, scoring still occurs incrementally, one run at a time. Because of this linear progression and the huge sample size of professional baseball games played since 1871, it turns out that almost every feasible score outcome has already occurred, making new baseball scorigamis incredibly rare.

A new scorigami would require a game that is a huge outlier relative to the rest of baseball history. Returning to our visualization, potential new scorigamis (unfilled cells) only exist far to the right of our distribution of existing scores. These games require a massive run total from one or both teams. How about outliers that have occurred? One immediately stands out: an illuminated cell in the very top right of this chart. No, that's not a mistake. On June 28, 1871, the Philadelphia Athletics beat the Troy Haymakers 49-33 in a four-hour game with a hilariously absurd box score. The Athletics' Fergy Malone led the team with 7 RBIs on 5 hits in his 9, yes 9, at-bats. Troy pitcher John McMullin gave up 31 earned runs over his 9 IP (talk about hanging a pitcher out to dry). Remarkably, despite this abysmal performance, McMullin only gave up two home runs - glass half full, right? His teammates didn't help, though, allowing an additional 11 unearned runs.

How is a game like this possible? The answer almost certainly has to do with the era in which this game was played. 1871 marked the first (albeit disputed) season of professional baseball, played under the National Association, predecessor to the National League (1876). The National Association was known for its loose standards, corruption, and gambling, so I think it's justified to question the legitimacy of this score. That said, we aren't here to scrutinize historical scorekeeping; we're here for quirky baseball stats! If it's good enough for Retrosheet, it's good enough for us.

Many other extreme outliers also occurred in the 1800s. In fact, since 1970 only eight new scorigamis have occurred, most recently on Sept. 9, 2020, when the Atlanta Braves beat the Miami Marlins 29-9. This game also snapped the second-longest scorigami drought ever at 20 years. Since 1912, a new scorigami has occurred on average every 6.7 years.

Scorigami Trend

Occurrences of new scorigamis plotted over time
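Finding scorigamis in a game log comes down to walking through the games in date order and flagging the first time each score shows up. The sketch below shows one way to do that, assuming each game is a (date, winning runs, losing runs) tuple with ISO-formatted dates. The helper names and the sample games are hypothetical, and the gap calculation only looks at calendar years, which is a simplification.

```python
def find_scorigamis(games):
    """Return the games whose final score had never occurred before.

    `games` is a list of (iso_date, winning_runs, losing_runs) tuples.
    ISO dates sort chronologically as plain strings.
    """
    seen = set()
    firsts = []
    for date, win, lose in sorted(games):
        if (win, lose) not in seen:
            seen.add((win, lose))
            firsts.append((date, win, lose))
    return firsts

def average_gap_years(firsts):
    """Average number of calendar years between consecutive scorigamis."""
    years = [int(date[:4]) for date, _, _ in firsts]
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    return sum(gaps) / len(gaps) if gaps else None

# Made-up game log, not real results.
sample_log = [
    ("2011-09-01", 12, 4),
    ("1999-05-01", 3, 2),
    ("2000-06-01", 3, 2),   # repeat of 3-2, so not a scorigami
    ("2005-07-01", 10, 1),
    ("2011-08-01", 10, 1),  # repeat of 10-1
]
sample_firsts = find_scorigamis(sample_log)
print(sample_firsts)
print(average_gap_years(sample_firsts))
```

To measure the rate from a given year onward (like 1912 in this post), filter the list of first occurrences by date before computing the gaps.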

Okay, so we've explored existing scores and identified notable outliers, but how can we predict the next scorigami? Returning to our first visualization, you may notice some empty cells near scores that have occurred (such as 26-12, 25-0, and 27-4). These are promising candidates for the next scorigami, but is there a more quantitative approach we can apply to this problem?

Scoring Trends

We'll start by looking at the distribution of the total runs scored per game (winning_team_runs + losing_team_runs) for all 200k+ professional games throughout history.
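As a rough sketch of that calculation, the snippet below sums each game's two scores and counts how often each total appears. It assumes games are stored as (winning_team_runs, losing_team_runs) pairs; the function names and the sample values are illustrative only.

```python
from collections import Counter

def total_runs_distribution(games):
    """Count games by combined runs, given (winning_runs, losing_runs) pairs."""
    return Counter(win + lose for win, lose in games)

def mean_and_mode(dist):
    """Return the average total runs per game and the most frequent total."""
    n_games = sum(dist.values())
    mean = sum(total * k for total, k in dist.items()) / n_games
    mode = max(dist, key=dist.get)
    return mean, mode

# Made-up sample, not real MLB data.
sample_scores = [(3, 2), (4, 3), (5, 2), (4, 3), (10, 4), (2, 1)]
sample_dist = total_runs_distribution(sample_scores)
sample_mean, sample_mode = mean_and_mode(sample_dist)
print(sorted(sample_dist.items()))
print(f"mean {sample_mean:.2f}, mode {sample_mode}")
```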

Total Runs Distribution

Histogram of Total Runs Scored in a Baseball Game

The first observation from this chart is that baseball games average roughly 9 runs/game, with 7 total runs occurring most often. This makes sense to me. After all, my initial prediction for the most common score (5-3) had a run total of 8. Another observation is that the distribution follows a "sawtooth" shape, with odd-numbered run totals occurring more frequently than even numbers. This is simply because baseball games rarely ever end in ties, which removes some even-run combinations. For example, 7 runs can occur through four combinations (7-0, 6-1, 5-2, 4-3), whereas 6 runs can only occur in three after we remove a tie (6-0, 5-1, 4-2). Despite the sawtooth shape, the observed totals generally match a Poisson distribution, characterized by a right-skewed shape with a long tail after the mean.
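The combination counting behind the sawtooth is easy to check directly. This small sketch lists every non-tie final score for a given run total; the function name is my own.

```python
def non_tie_combinations(total):
    """List every (winning_runs, losing_runs) pair that sums to `total`."""
    return [
        (total - lose, lose)
        for lose in range(total + 1)
        if total - lose > lose  # the winner must outscore the loser
    ]

for total in range(5, 9):
    combos = non_tie_combinations(total)
    print(total, len(combos), combos)
```

An odd total and the even total just below it always have the same number of combinations (7 and 6 have four and three, 8 has four, 9 has five), because the even total loses its tie cell.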

Interestingly, our most frequent run total - seven - doesn't include the most common score combination (3-2). It does, however, include the second most common score (4-3) and the ninth most common (5-2). Only one other run total, nine, has two different combinations ranked in the top 10.

You may be wondering how the run scoring environment changed over time. Below is the average runs/game by year since 1871.

Runs per Game Trend

Year by year average number of runs scored per game over the course of a season

Early baseball saw wildly inflated and volatile scoring before stabilizing around 1920. Since then, average runs per game has settled into a range between 6.8 and 11.

Anecdotally, something I've always heard is that Babe Ruth's career is especially remarkable because he played during the dead-ball era - a pitcher-friendly era of spitballs and large strike zones. But while Ruth did debut during this era (1914), he didn't play a full season until 1919, and his historic peak (1919-1934) occurred during a relatively hitter-friendly period. This isn't to take away from any of his achievements (he's still the GOAT), but it adds a wrinkle to a very common narrative.

Ok, anecdotes aside, when you take out the earliest seasons of professional baseball and only consider years from 1900 onward, the average runs per game is 8.81. The 2024 season landed remarkably close to that mark at 8.79 R/G. Short-term fluctuations in recent years include an increase from 8.13 R/G in 2014 to 9.66 in 2019, followed by a decline to 8.57 R/G in 2022. These juiced/dead ball micro-trends still fall comfortably within baseball's long-term 1920-2024 range. So I think it's generally safe to say that while baseball experiences temporary stretches of hitter and pitcher friendliness, these are oscillations within a macro range that we've been in since about 1920.
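For reference, here is a minimal sketch of how per-season and long-run R/G figures like these can be computed. It assumes each game is a (year, winning runs, losing runs) tuple and weights the long-run average by games rather than by seasons, which is my assumption and not necessarily how the numbers above were derived. The sample games are invented.

```python
from collections import defaultdict

def runs_per_game_by_year(games):
    """Map each season to its average combined runs per game."""
    totals = defaultdict(lambda: [0, 0])  # year -> [total runs, games played]
    for year, win, lose in games:
        totals[year][0] += win + lose
        totals[year][1] += 1
    return {year: runs / n for year, (runs, n) in sorted(totals.items())}

def average_since(games, first_year):
    """Game-weighted average runs per game from `first_year` onward."""
    totals = [win + lose for year, win, lose in games if year >= first_year]
    return sum(totals) / len(totals)

# Made-up games, not real results.
sample_seasons = [
    (1875, 30, 28), (1875, 20, 14),
    (2019, 6, 4), (2019, 5, 4),
    (2022, 3, 2), (2022, 4, 3),
]
print(runs_per_game_by_year(sample_seasons))
print(average_since(sample_seasons, 1900))
```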

Let's take a step back. Based on all the information we've looked at so far, we can glean two important pieces of intuition to guide our scorigami prediction:

Remaining scorigami candidates are above the average total runs. Because the run totals roughly follow a Poisson-like distribution, higher totals become increasingly rare. Therefore, when comparing remaining scorigami candidates, we expect lower total run counts to be more likely.
Close scores occur more frequently than blowouts. For any given run total (e.g., 15), close games (like 8-7 and 9-6) happen far more often than blowouts (like 15-0 and 14-1). For instance, 8-7 has occurred 2,684 times, compared to just 70 occurrences of 15-0.

In practice, this leaves us with a trade-off: remaining low-scoring (relatively speaking) scorigamis are almost all blowouts (25-0, 26-1, etc.), while the remaining close-game scorigamis have extremely high combined run totals (like 21-19). This leaves us needing to choose between a low total that's unlikely because it's a blowout, or a close game that's unlikely because it will require a large number of total runs.
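To make the trade-off concrete, here is a toy sketch that lists the empty cells of a score grid and orders them by total runs, then by margin of victory. The grid size, the set of seen scores, and the function names are all made up for illustration; the real grid is far larger.

```python
def unseen_scores(seen, max_win, max_lose):
    """All (win, lose) cells with win > lose in the grid that are not in `seen`."""
    return [
        (win, lose)
        for win in range(1, max_win + 1)
        for lose in range(min(win, max_lose + 1))
        if (win, lose) not in seen
    ]

def rank_candidates(candidates):
    """Order candidates by total runs, breaking ties by margin of victory."""
    return sorted(candidates, key=lambda s: (s[0] + s[1], s[0] - s[1]))

# Toy 4-by-3 grid in which every score except 4-0 and 4-3 has been seen.
toy_seen = {(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 1), (4, 2)}
toy_candidates = rank_candidates(unseen_scores(toy_seen, max_win=4, max_lose=3))
for win, lose in toy_candidates:
    print(f"{win}-{lose}: total {win + lose}, margin {win - lose}")
```

Even in this tiny grid the same tension shows up: the candidate with the lowest total (4-0) is the biggest blowout, while the close candidate (4-3) has the highest total.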

How to Predict the Next Scorigami

Okay, let's get down to business. Let's predict the next scorigami. My methodology is simple - I used three models: two classical probability models - Poisson and Negative Binomial - and a Random Forest, a machine learning method that will hopefully capture some of the higher-dimensional patterns in our score data. Each model relies on the observed distribution of winning and losing scores to estimate the probability of "unseen" score combinations. I'll compare the highest probability predictions for each model to my intuition, ultimately making our final prediction for the next Scorigami.
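To give a feel for the two classical models, here is a simplified sketch. It is not necessarily the exact setup behind the results in this post. It assumes each team's runs are an independent draw from the same distribution, and it conditions on the game not ending in a tie. The per-team mean of 4.4 is roughly half the long-run runs per game quoted earlier, and the dispersion value is a hypothetical placeholder rather than a fitted parameter.

```python
import math
from functools import partial

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def neg_binomial_pmf(k, mean, r):
    """P(X = k) for a Negative Binomial with variance mean + mean**2 / r."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    p = r / (r + mean)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))

def score_probability(win, lose, pmf, max_runs=80):
    """Probability of a win-lose final score, given the game is not tied.

    Assumes both teams' run totals are independent draws from `pmf`.
    """
    if win <= lose:
        raise ValueError("winning score must exceed losing score")
    tie = sum(pmf(k) ** 2 for k in range(max_runs + 1))
    # Either team can be the winner, hence the factor of 2.
    return 2 * pmf(win) * pmf(lose) / (1 - tie)

TEAM_MEAN = 4.4   # assumption: about half of the ~8.8 runs per game
DISPERSION = 4.0  # hypothetical value, not fitted to any data

poisson = partial(poisson_pmf, mean=TEAM_MEAN)
neg_binomial = partial(neg_binomial_pmf, mean=TEAM_MEAN, r=DISPERSION)

for win, lose in [(3, 2), (25, 0), (22, 12)]:
    p_pois = score_probability(win, lose, poisson)
    p_nb = score_probability(win, lose, neg_binomial)
    print(f"{win}-{lose}: Poisson {p_pois:.3e}, NB {p_nb:.3e}")
```

The Negative Binomial's extra variance gives it a much fatter right tail, so under these toy parameters it assigns far more probability to very high scores than the Poisson does.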

I'll note my own predictions for the next scorigami before running any models. Intuition is no doubt fallible, but our brains are surprisingly good at recognizing high-dimensional patterns, consciously or not. And while we are of course biased, it's naive to believe our statistical methods aren't. These models make predictions by fitting simplifications or abstractions that inevitably lead to their own biases. So, I'll treat my intuition as just another model to factor into the final prediction. After all, this is a blog post, not a scientific study.

We'll go into the model methodologies in a moment, but let's start with my intuition. Applying my two earlier guidelines, I prioritize scorigamis with lower total runs, but for games with similar totals, I favor close scores over blowouts. At extremely high run totals though, it actually makes more sense to pick scores where the losing team lands near the historical team average of ~4-5 runs rather than chasing super close games. The data supports this: for a high winning score like 18, the losing team score most often falls between 2 and 6. In other words, for high-scoring blowouts, the best candidates are games where the losing team still scores some runs near the typical range.

If we look at our scorigami chart, a few candidates stand out. The lowest total score remaining is 25-0. While this is a tempting selection, since 25 total runs has been achieved many times in baseball history, I find this game profile unlikely in today's game. Modern blowouts often feature position players pitching, which makes a shutout improbable.

If we instead look at minimizing runs scored by the winning team, the remaining score with the lowest winning total is 21-18. This feels more plausible than 25-0. Still, for a 21-18 score, each team would have to have an extreme outlier performance.

Based on the heuristics I stated before, I'd rather choose a game where only one team needs to be an extreme outlier. My top candidate is 23-11. While this is a high run total for the winning team, it has our losing team closer to the mean team runs. So, while one team would have to accomplish a huge outlier score, the losing team just needs to score 11 runs, something that has happened hundreds of times. I favor this over a score like 25-0 because in a game profile where the winning team scores over 20 runs, position players would likely pitch, boosting the chance the losing team scores an above-average run total. 22-12 requires a similar game profile, so I'll choose this for my second pick. These picks are also surrounded by game scores that have occurred, making me confident that this type of game is possible. For instance, 21-12 has occurred four times and 23-10 five times. I'll pick 27-6 as my third pick. It's a different game profile than the others, requiring an even larger margin of victory, but it has fewer total runs.

Okay, so here's a start to the table we'll attempt to build up. We'll fill the rest of it in when we run our models.

Modeling Overview

Expand each of the following to see a detailed description of each of the statistical approaches used to predict the next scorigami. If you'd rather skip to the results, continue on to the "Summary" section.

Summary

The tables below summarize the performance of each model against our empirically observed scores. The first table compares the top 10 most common scores with the probabilities predicted by each model. The columns on the left show the empirical frequency and true ranking, while the columns on the right show each model's predicted probability and rank (in parentheses) for that same score among all possible scores. I highlighted model performance as follows: green if the score appears in the model's top 10, yellow if it appears in the top 20, and red if it falls outside the top 20.

The second table shows the performance of each model against the rarest observed scores - namely, those that have occurred exactly once in pro baseball history. I measure performance by counting how many of the 66 single-occurrence scores each model ranked among its 66 lowest-probability predictions. Note that only the observed scores are compared for this analysis, so none of the models have any unseen scores in their least-likely rankings. Together, these two tables show how well each model captures both the most common and rarest scores.
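The rare-score metric can be sketched in a few lines. The version below is my reading of the description above: rank only the observed scores by predicted probability, take as many of the least likely as there are single-occurrence scores, and count the overlap. The function name and the toy counts and probabilities are invented for illustration.

```python
def rare_score_hits(observed_counts, predicted_probs):
    """Count single-occurrence scores found in the model's least-likely set.

    Only observed scores are ranked, and the least-likely set is the same
    size as the set of scores that occurred exactly once.
    """
    singles = {score for score, k in observed_counts.items() if k == 1}
    ranked = sorted(observed_counts, key=lambda score: predicted_probs[score])
    least_likely = set(ranked[:len(singles)])
    return len(singles & least_likely), len(singles)

# Toy numbers, not real counts or model output.
toy_counts = {(3, 2): 50, (4, 3): 40, (12, 11): 2, (15, 0): 1, (20, 7): 1}
toy_probs = {(3, 2): 0.05, (4, 3): 0.045, (12, 11): 5e-5, (15, 0): 1e-4, (20, 7): 1e-6}
hits, n_singles = rare_score_hits(toy_counts, toy_probs)
print(f"{hits} of {n_singles} single-occurrence scores ranked among the least likely")
```

In the toy data the model rates 12-11 as less likely than 15-0, so only one of the two single-occurrence scores counts as a hit.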

The Random Forest clearly outperforms the other models on both common and rare score combinations. Since this is an ML approach, it's fair to question whether the outperformance reflects true predictive power or overfitting. While the test set is admittedly quite small, it was composed entirely of observed scores that the model did not have access to during training. On this test set, the model achieved an RMSE of 0.00172 and an R squared of 0.9555. These results suggest that the model can generalize to unseen scores.
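For readers unfamiliar with the two metrics, here is a small dependency-free sketch of how RMSE and R squared are defined. The inputs would be the observed and predicted frequencies for the held-out scores; the numbers below are toy values and are not the model's actual test set.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy frequencies, not real model output.
toy_actual = [0.050, 0.030, 0.010, 0.002]
toy_predicted = [0.048, 0.033, 0.009, 0.004]
print(f"RMSE {rmse(toy_actual, toy_predicted):.5f}")
print(f"R squared {r_squared(toy_actual, toy_predicted):.4f}")
```

RMSE is in the same units as the frequencies themselves, so a small value is only meaningful relative to how large the typical frequency is. R squared is scale-free, which makes it the easier of the two to read at a glance.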

A surprising result is that the Poisson model seems to slightly outperform the NB model, even though NB should better capture baseball's inherent scoring variance. On both the most and least frequent scores, Poisson performs a bit better. However, as we'll see, the NB model's scorigami predictions align much more closely with the Random Forest and with my own intuition than the Poisson model's do. If I wanted to do a more detailed analysis of each model, I'd look at their performance against every score, calculating the delta between observed and predicted frequency for each score and finding an MSE for each model. For the sake of this post, however, we'll leave the performance analysis here.

Scorigami Prediction

Ok, we're finally here. Let's make our final scorigami prediction.

The Poisson model strongly favors blowouts with shutouts or near-shutouts, which is expected behavior from the model but does not match my intuition. Modern blowouts usually yield at least a few runs for the losing team, often against position players or long relievers. The Negative Binomial leans toward high-scoring closer games, while the Random Forest is somewhere in between. Out of these three models, I'm inclined to trust the RF more than the statistical models, given its performance against the empirical data and consistency with my own intuition, though I want to be careful not to overstate its predictive power when trained on a small dataset. That said, there is one score that stands out: 22-12. It was my second choice based on intuition and the top prediction for both the NB and RF models. While a more rigorous analysis could refine this further, predicting the next scorigami will always involve uncertainty. For now, 22-12 is my pick.

Final Scorigami Prediction: 22-12

And, based on what we know about the rate at which scorigamis have occurred (~every 6.7 years since 1912), with the last one in 2020, I will call my shot and say there will be a baseball game that ends 22-12 by 2028.

Anyway, thanks for reading. I hope you found this article entertaining! Stay tuned for more random baseball content.