The Likelihood of Perfection

Back in 2003, a beat writer for the Giants named Dan Brown (no, not that guy) interviewed me for a piece on the unspectacular playing careers of the Giants’ coaches, the ranks of which included both Lenn Sakata and Fred Stanley. Naturally, he happened upon something I’d written, and the resulting article made for this site’s first mainstream media mention, both on the web and in print. In the article (now archived), Brown termed this site “the scrappiest place on earth,” a tag which I still proudly trumpet on my front page.

Last week, in the aftermath of Roy Halladay’s perfect game and Armando Galarraga’s near-perfecto (more on which here), Brown came out of the woodwork to query me for an article on the suddenly popular phenomenon of perfection: “Is there anything you or your Baseball Prospectus cohorts can add to the discussion?”

I took a stab, and wound up quoted in the article:

Baseball Prospectus writer Jay Jaffe, responding to an e-mail from the Mercury News, said it’s important to remember that there are roughly twice as many games in the 30-team, 162-game era as there were in the 16-team, 154-game era.

Perfect games happen about 0.005 percent of the time.

“Looking at the results of a single year, there’s really no sensible interpretation for the distribution of no-hitters and perfect games other than randomness,” Jaffe wrote. “The standard deviation for the percentage of no-hitters is 0.056 percent, which means that about two-thirds of the time we should expect to see between 0.3 and 5.7 no-hitters per year.”

The De-Juiced Theory: Former All-Star pitcher Bert Blyleven, whose 287 career victories included a no-hitter in 1977, subscribes to the Coincidence Theory.

But if he had to put his finger on another factor, Blyleven said the crackdown on performance-enhancing substances seems to be having an effect. “Maybe the ball isn’t jumping off the bat like it did during the steroid era,” he said.

Entering play Saturday, the American League was batting .260, the NL .256. Both marks were each league’s lowest since 1992.

“The game for the past few years, maybe longer, was all about the three-run home run,” said Blyleven, now a Twins broadcaster. “Now, we’re getting back to a time when it’s all about small ball and getting that runner in from third, just like it was in the 1960s, ’70s and most of the ’80s.”

Still, as Blyleven acknowledged, the math doesn’t add up. There were more perfect games during the muscled-up 1990s (four) than there were during any other decade. Jaffe, the Baseball Prospectus writer, argued that the offensive potency of an era doesn’t matter as much as people think. “If we check the correlation between scoring levels by decade and perfect game frequency,” he wrote, “we find that the relationship is essentially random.”

That was the take-home from a bit of spreadsheet jockeying I'd done. What follows is more or less what I wrote to Brown (with a few imperfections corrected), plus some more fun figures to chew on.

• • •

The basic thing to remember about no-hitters and perfect games is that there are roughly twice as many games in the 30-team, 162-game era as there were in the 16-team, 154-game era. Going back to 1901, when the AL began play in parallel with the NL, about 0.063% of games have been no-hitters, and 0.005% have been perfect games. That means that under current conditions we should expect to see about three no-hitters per year and a perfect game every four years. If you draw the line between 1960 and 1961, separating the pre-expansion and post-expansion eras, you'll find that the rates of no-hitters are pretty similar, but the rates of perfect games are not (note that we have to exclude Don Larsen's 1956 perfect game because it happened in postseason play):

Era          NH+PG   NH only      PG
1901-1960   0.067%    0.065%   0.002%
1961-2010   0.060%    0.053%   0.007%
1901-2010   0.063%    0.058%   0.005%

While the rate of no-hitters has gone down slightly in the post-expansion era, the rate of perfect games has skyrocketed, becoming 3.4 times as likely as it was in the pre-expansion era.
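For anyone who wants to check the arithmetic, here's a rough Python sketch that turns the long-run rates into per-season expectations. It assumes the percentages are figured per team game (each contest counted twice, once for each starting pitcher). That's an inference rather than something stated above, but it's what makes 0.063% come out to about three no-hitters a year. The `team_games` helper is a name chosen purely for illustration.

```python
# Back-of-the-envelope: convert long-run per-game rates into per-season counts.
# Assumption: rates are per team game (every contest counts twice, once per
# starting pitcher), which is what makes "about three no-hitters" work out.


def team_games(teams: int, schedule: int) -> int:
    """Team games (i.e., starts) in a season: each team plays `schedule` games."""
    return teams * schedule


modern = team_games(30, 162)   # 30-team, 162-game era
classic = team_games(16, 154)  # 16-team, 154-game era

NH_RATE = 0.063 / 100  # all no-hitters (perfect games included), 1901-2010
PG_RATE = 0.005 / 100  # perfect games only, 1901-2010

nh_per_year = NH_RATE * modern
pg_per_year = PG_RATE * modern

print(f"team games per season: {modern} vs. {classic} ({modern / classic:.2f}x)")
print(f"expected no-hitters per season: {nh_per_year:.2f}")
print(f"expected perfect games per season: {pg_per_year:.2f} "
      f"(one every {1 / pg_per_year:.1f} years)")
```

Run as is, it should report 4,860 team games against 2,464 (about 1.97 times as many), roughly 3.06 no-hitters per season, and a perfect game about every 4.1 years.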

Looking at the results of a single year, there's really no sensible interpretation for the distribution of no-hitters and perfect games other than randomness. The year-to-year standard deviation of the no-hitter rate is 0.056%, which means that about two-thirds of the time we should expect to see between 0.3 and 5.7 no-hitters per year. Because we're obviously bound by zero at the low end, it becomes clear that we need a few years' worth of data before we can fairly assess the frequency of no-hitters. As for perfect games, we need an even larger sample.
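The two-thirds band can be reproduced the same way. This sketch treats the annual rate as roughly normally distributed (an assumption, since only the mean and standard deviation are given above). It reuses the 0.063% long-run mean and again assumes 4,860 team games per season.

```python
import math

# Assumptions: a mean rate of 0.063% per team game (the 1901-2010 figure),
# a year-to-year standard deviation of 0.056%, and 4,860 team games per
# season (30 teams x 162 games).
MEAN_RATE = 0.063 / 100
SD_RATE = 0.056 / 100
TEAM_GAMES = 30 * 162

# One standard deviation either side of the mean, expressed as no-hitters.
low = (MEAN_RATE - SD_RATE) * TEAM_GAMES
high = (MEAN_RATE + SD_RATE) * TEAM_GAMES

# Share of a normal distribution lying within one standard deviation of its
# mean: the "about two-thirds" in the text.
within_one_sd = math.erf(1 / math.sqrt(2))

print(f"one-SD band: {low:.2f} to {high:.2f} no-hitters per season")
print(f"share of seasons expected inside the band: {within_one_sd:.3f}")
```

It should print a band of about 0.34 to 5.78 no-hitters, a hair wider at the top than the 5.7 quoted, presumably because of rounding in the inputs, along with the familiar 68.3% of a normal distribution that falls within one standard deviation.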

If we divide modern baseball history into decades starting in 1901 (so, 1901-1910… 2001-2010), we find that the frequency of no-hitters has a very strong inverse correlation with scoring levels (r = -0.8), which is to say that the lower the scoring rate, the more likely there is to be a no-hitter, and the higher, the less likely. The current decade (which is nearing its end) rates as the second-least-likely one in which to throw a no-hitter:

Decade       PG%       NH%     R/G
1911-20    0.000%    0.116%    4.04
1961-70    0.009%    0.105%    4.03
1901-10    0.008%    0.087%    3.92
1951-60    0.000%    0.081%    4.37
1971-80    0.000%    0.070%    4.21
1991-00    0.009%    0.055%    4.97
1941-50    0.000%    0.048%    4.35
1981-90    0.007%    0.044%    4.46
1931-40    0.000%    0.041%    5.20
2001-10    0.009%    0.037%    4.80
1921-30    0.004%    0.032%    4.96
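The r = -0.8 figure can be recomputed straight from the table above, assuming it's the standard Pearson coefficient (what a spreadsheet's CORREL function returns). The sketch below sticks to the standard library; Python 3.10 and later also ship `statistics.correlation`, which does the same job. The `pearson_r` helper is a name chosen for illustration.

```python
from math import sqrt

# (decade, NH%, runs per game) as listed in the table above.
DECADES = [
    ("1911-20", 0.116, 4.04),
    ("1961-70", 0.105, 4.03),
    ("1901-10", 0.087, 3.92),
    ("1951-60", 0.081, 4.37),
    ("1971-80", 0.070, 4.21),
    ("1991-00", 0.055, 4.97),
    ("1941-50", 0.048, 4.35),
    ("1981-90", 0.044, 4.46),
    ("1931-40", 0.041, 5.20),
    ("2001-10", 0.037, 4.80),
    ("1921-30", 0.032, 4.96),
]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    The 1/n factors in covariance and variance cancel, so raw sums suffice.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


nh_rates = [nh for _, nh, _ in DECADES]
runs_per_game = [rg for _, _, rg in DECADES]
r_nh = pearson_r(nh_rates, runs_per_game)
print(f"r(NH%, R/G) = {r_nh:.2f}")
```

Using the rounded figures in the table, it should come out at about -0.80.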

If we divide the data into even larger chunks, we get an even stronger correlation between scoring rates and no-hitter frequency. Splitting the 110 years into five 22-year chunks, we get a correlation of r = -.95, confirming that the larger our sample size, the more predictive scoring rates are of no-hitters.

On the other hand, if we check the correlation between scoring levels by decade and perfect game frequency, we find that the relationship is essentially random (r = .01), and even if we up it to the 22-year samples, we only get a correlation of r = -.23, which is fairly faint. Remember, each decade covers somewhere between 25,000 and 50,000 games, and yet there's really no pattern we can spot at that level, nothing to suggest that an era (to say nothing of a sliver of a season, as in 2010 to date) should be more or less likely to yield a perfect game.
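Running the same Pearson calculation on the PG% column of the decade table shows how little there is to find. This is again a standard-library sketch with an illustrative `pearson_r` helper. Because the table's perfect-game rates are rounded to three decimal places, the result may differ slightly from the r = .01 cited above.

```python
from math import sqrt

# (decade, PG%, runs per game) as listed in the decade table.
DECADES = [
    ("1911-20", 0.000, 4.04),
    ("1961-70", 0.009, 4.03),
    ("1901-10", 0.008, 3.92),
    ("1951-60", 0.000, 4.37),
    ("1971-80", 0.000, 4.21),
    ("1991-00", 0.009, 4.97),
    ("1941-50", 0.000, 4.35),
    ("1981-90", 0.007, 4.46),
    ("1931-40", 0.000, 5.20),
    ("2001-10", 0.009, 4.80),
    ("1921-30", 0.004, 4.96),
]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (1/n factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


pg_rates = [pg for _, pg, _ in DECADES]
runs_per_game = [rg for _, _, rg in DECADES]
r_pg = pearson_r(pg_rates, runs_per_game)
print(f"r(PG%, R/G) = {r_pg:.2f}")
```

It should land within a few hundredths of zero (about 0.02 with these rounded inputs): no meaningful relationship either way.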

Futility Infielder Blog

Brooklyn-based Sports Illustrated author Jay Jaffe's blog
