Hip to B Squared (and A Squared, Too)

No sooner had I dragged out the Yankees’ and Red Sox’s run differentials and expected (Pythagorean) winning percentages than the Yanks took a stumble back towards reality, being swept by the Mets in their three-game series at Shea Stadium. What little I saw of the series (about half of the second and third games) was a numbing succession of lead changes and lousy pitching that made for less than compelling baseball, unless of course you’re a Mets fan, in which case your team now has a pulse. The Yankee bullpen put up an 8.67 ERA in 9.1 innings, but it was only following the uninspiring lead of the starters: Mike Mussina, Jose Contreras, and Javier Vazquez combined for an ugly 8.40 ERA in 15 innings, and not one of them made it into the sixth. The sweep was dismaying, but also a much-needed reminder, coming off the high of the Boston series, that these Yanks aren’t nearly so invincible as some pundits would have you believe.

Speaking of Pythagorean winning percentages, Alan Schwarz had a neat piece in the New York Times on the stat and its use in front offices. Yes, Virginia, here is an example of one of the most basic sabermetric tenets, that we can predict a team’s record from its run differential, taking hold among old-school baseball men. Writes Schwarz:

For a more accurate forecast of which teams are contenders and which are pretenders, however, a different set of standings is catching the attention of fans and front offices. Known as the Pythagorean standings — more on the name later — they rank teams not by the more traditional measure of victories and losses, but by their building blocks: runs scored and runs allowed, which cumulatively prove to be a better indicator of future team performance than just about anything else.

“I use it every day,” said Brad Kullman, the Cincinnati Reds’ director of baseball operations. “It allows you to be a little more objective about your evaluation of where you’re at. You can ask yourself: ‘Do we have what it takes to stay in the race? Or are we going to have a hard time keeping this up?’ ”

…”Our differential says that things are better than they seem,” Gerry Hunsicker, Houston’s general manager, said, “but that’s not too practical in real life when your owner is asking why you’re struggling.”

Yet more and more clubs are using the Pythagorean standings to evaluate how their clubs might perform for the stretch drive.

“The Pythagorean records do help you formulate strategy for June and July,” said Josh Byrnes, assistant general manager for the Red Sox. “Especially when there’s a lot of clutter, like this year, they can show the hidden quality of a team. Some of it is looking at other clubs who might be overevaluating themselves, and exploiting that in a trade.”

The Reds, the Astros, and the Red Sox: an interesting cross-section of teams that isn’t overly reliant on the Moneyball whiz-kid axis of Oakland, Toronto, and L.A., though of course Boston has Theo Epstein and Bill James himself in its ranks. Somehow the electronic version of the article omits the actual formula, which, for those of you who are novices, is: Expected Winning Percentage = RS^2/(RS^2 + RA^2), where RS is Runs Scored and RA is Runs Allowed. Schwarz explains why James dragged an ancient Greek philosopher and mathematical genius into the equation: “The squares reminded James of the Pythagorean theorem — a^2 + b^2 = c^2 — so he borrowed the moniker, giving mathematicians the willies but readers an enduring mnemonic.”
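For anyone who would rather not punch it into a calculator, here is a minimal Python sketch of the squared formula above. The function name and the 450/400 run totals are made up for illustration.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float) -> float:
    """Bill James' expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("need at least one run scored or allowed")
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


# Hypothetical team that has scored 450 runs and allowed 400
pct = pythagorean_pct(450, 400)
print(f"{pct:.3f}")  # 0.559
```

Multiply the result by games played to turn it into expected wins.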

Further refinements of the Pythagorean formula use a slightly lower exponent (1.83) that improves accuracy, but that’s a bitch to compute unless you’ve got a scientific calculator or a spreadsheet; the squared version is generally good enough. Schwarz cites a study showing the formula’s power:
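With a spreadsheet-free sketch, the exponent becomes just a parameter. The snippet below (function name illustrative) compares 2 and 1.83 on the same hypothetical 450-scored, 400-allowed team; a lower exponent pulls the estimate slightly toward .500.

```python
def pythag_expectation(runs_scored: float, runs_allowed: float,
                       exponent: float = 2.0) -> float:
    """Generalized Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Same hypothetical team (450 scored, 400 allowed) under both exponents
for x in (2.0, 1.83):
    print(f"exponent {x}: {pythag_expectation(450, 400, x):.3f}")
```

Here the two estimates differ by only about five points (.559 vs. .554); the gap widens as run differentials get more lopsided.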

Sure enough, at the midpoint of any season, the Pythagorean records are much more accurate than real-life ones in picking which contenders (teams .500 or above) will keep it up or fade. Since the two leagues split into divisions in 1969, 68 teams have begun July with records more than three victories higher than their run differential warranted; these overachievers saw their second-half winning percentage plummet from .575 to .516. Meanwhile, looking at the underachievers on the other end of the spectrum, their winning percentage stayed about constant, .548 to .540. Both groups regressed, because all teams move toward .500 over time, but the overachievers ultimately succumbed to their foreboding run performance.
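To make the study’s “more than three victories higher than their run differential warranted” test concrete, here is a rough sketch that compares each contender’s actual wins with its squared-formula expected wins. The `TeamRecord` type and the standings are invented for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class TeamRecord:
    name: str          # hypothetical team label
    wins: int
    losses: int
    runs_scored: int
    runs_allowed: int


def expected_wins(team: TeamRecord) -> float:
    """Squared-Pythagorean expected wins over the games played so far."""
    games = team.wins + team.losses
    rs_sq = team.runs_scored ** 2
    ra_sq = team.runs_allowed ** 2
    return games * rs_sq / (rs_sq + ra_sq)


def overachievers(teams: list[TeamRecord], margin: float = 3.0) -> list[str]:
    """Contenders (.500 or better) whose actual wins beat expected wins by more than margin."""
    return [
        t.name
        for t in teams
        if t.wins >= t.losses and t.wins - expected_wins(t) > margin
    ]


# Invented July 1 standings, for illustration only
standings = [
    TeamRecord("Team A", 46, 34, 380, 370),
    TeamRecord("Team B", 44, 36, 400, 330),
    TeamRecord("Team C", 36, 44, 330, 400),
]
print(overachievers(standings))  # ['Team A']
```

Team C also beats its expected wins by more than three, but at 36-44 it isn’t a contender, so the filter drops it.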

Twenty-seven years after Bill James’ debut, it’s not that surprising that the Pythagorean concept has taken hold; sabermetric formulas don’t get much simpler or more useful than this one.

• • •

While we’re on the subject of our good Greek friend Pythagoras, he pops up in a study to be presented at the SABR Convention later this month by a man named Chris Jaffe (no relation, so far as I know). Building on another relatively simple and ancient Jamesian concept, pitcher run support, Chris has come up with a stat called Run Support Index (RSI), which measures how well a given starting pitcher was supported, park-adjusted and relative to his league. Think of it as the ERA+ of offensive support: a 110 RSI means a pitcher received 10 percent better support than the park-adjusted league average, while a 90 means he got 10 percent worse.
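Chris’s exact formula isn’t spelled out here, so this is only a sketch under an assumption: RSI as 100 times a pitcher’s run support per start, divided by the league’s runs per game, with the park adjustment modeled as a simple multiplier (`park_factor`, hypothetical, where 1.0 is a neutral park).

```python
def run_support_index(support_per_start: float,
                      league_runs_per_game: float,
                      park_factor: float = 1.0) -> float:
    """Sketch of a Run Support Index, scaled so 100 = league average.

    Assumption: the park adjustment is a single multiplier on the league
    average (park_factor > 1.0 for hitters' parks); the real stat may differ.
    """
    if league_runs_per_game <= 0 or park_factor <= 0:
        raise ValueError("league average and park factor must be positive")
    park_adjusted_average = league_runs_per_game * park_factor
    return 100.0 * support_per_start / park_adjusted_average


# Hypothetical pitcher: 5.5 runs of support per start, neutral park, 5.0 R/G league
print(run_support_index(5.5, 5.0))  # 110.0
```

That 110 lines up with the “10 percent better support” reading of the ERA+ analogy.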

“Run Support” is computed simply by adding up the runs scored by a team in a pitcher’s starts, without worrying about when the runs were put up, the thinking being that the total number of runs scored or allowed still has a pretty solid impact on a pitcher’s won-loss record whether or not he’s still in the game. ESPN and a few other sites now compute support on a per-nine-inning basis, using only the runs scored while the pitcher is in the game, but I’ve always found this old Jamesian method (introduced in an early Baseball Abstract) to be more useful. For one thing, it’s much easier to obtain: you can simply cull it from a list of games, such as the ones from Retrosheet, which go back practically to the ancient Romans, rather than needing a box score to determine how many innings a pitcher threw and when the runs were scored.
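The old Jamesian tally is easy to sketch from a game list. The `Game` record below is a hypothetical stand-in for whatever fields you would pull from a game log (it is not Retrosheet’s actual file layout); each start simply credits the starter with every run his team scored that day.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    starter: str    # the team's starting pitcher (hypothetical field)
    team_runs: int  # runs his team scored in the whole game


def run_support(games: list[Game]) -> dict[str, tuple[int, float]]:
    """Jamesian run support: every run the team scored in each pitcher's starts,
    regardless of whether he was still in the game. Returns (total, per start)."""
    totals: dict[str, int] = defaultdict(int)
    starts: dict[str, int] = defaultdict(int)
    for game in games:
        totals[game.starter] += game.team_runs
        starts[game.starter] += 1
    return {p: (totals[p], totals[p] / starts[p]) for p in totals}


game_log = [
    Game("Pitcher X", 5),
    Game("Pitcher Y", 2),
    Game("Pitcher X", 8),
    Game("Pitcher Y", 3),
]
print(run_support(game_log))  # {'Pitcher X': (13, 6.5), 'Pitcher Y': (5, 2.5)}
```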

Chris then uses the RSI concept to adjust the won-loss records of starting pitchers via (that’s right) the Pythagorean formula, finding the margin between what a pitcher would have done with league-average support and what he did with his actual support, and ascribing the remaining gap between actual and expected W/L, which in many cases is the larger one, to luck or underachievement. He’s got a blog devoted to his calculations, including a list of nearly 200 pitchers (Hall of Famers, 200-game winners or losers, and the Top 100 pitchers from The New Bill James Historical Baseball Abstract) whose entire careers he has measured for RSI and adjusted W/L. The top five by RSI, each followed by his adjusted W/L record:
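One plausible reading of that adjustment, sketched in Python and not necessarily Chris’s actual method: compute the pitcher’s squared-formula winning percentage with his actual support and with league-average support (both against the same runs allowed), then shift his actual record by the difference times his decisions. The function names and sample numbers are hypothetical; `league_support` could be backed out from RSI as support × 100 / RSI.

```python
def pyth_pct(runs_for: float, runs_against: float) -> float:
    """Squared Pythagorean winning percentage."""
    return runs_for ** 2 / (runs_for ** 2 + runs_against ** 2)


def support_adjusted_record(wins: int, losses: int, support: float,
                            runs_allowed: float, league_support: float) -> tuple[float, float]:
    """Hypothetical support adjustment of a pitcher's W/L record.

    support and league_support are runs per start scored by the pitcher's team
    and by a park-adjusted league-average team; runs_allowed is runs per start
    allowed in his starts. Whatever gap remains between actual and expected
    W/L (luck, etc.) is left in place.
    """
    decisions = wins + losses
    p_actual = pyth_pct(support, runs_allowed)
    p_neutral = pyth_pct(league_support, runs_allowed)
    adj_wins = wins + decisions * (p_neutral - p_actual)
    return adj_wins, decisions - adj_wins


# Made-up pitcher: 20-10 with 5.5 runs of support and 4.0 allowed per start,
# in a league where average support is 5.0 runs
w, l = support_adjusted_record(20, 10, 5.5, 4.0, 5.0)
print(f"{w:.1f}-{l:.1f}")  # 18.7-11.3
```

Because only the support term changes, a pitcher with exactly league-average support keeps his actual record.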

Al Spalding 133.60, 216-102

Allie Reynolds 121.96, 165-124

Don Newcombe 120.44, 133-106

Vic Raschi 115.56, 119-79

Juan Marichal 115.34, 220-165

Yankees pop up all over the upper reaches of the list; in addition to Reynolds and Raschi, Carl Mays and Andy Pettitte crack the top ten, Lefty Gomez, Whitey Ford, and Catfish Hunter are in the next ten, and Herb Pennock, Dwight Gooden, Red Ruffing, and David Wells are in the ten after that. Granted, these pitchers weren’t exclusively Yankees, but they certainly didn’t lack for run support during their time in pinstripes.

From the Dodger perspective, besides Newcombe, who pitched in the bandbox of Ebbets Field behind a mighty offense, many of the team’s notable hurlers are right around average for their careers: Jerry Reuss, Sandy Koufax, Don Sutton, Bob Welch, Rick Sutcliffe, Tommy John, Orel Hershiser, and Fernando Valenzuela are all in the 102-to-105 range of RSI, and Don Drysdale is at 100.02, as close to average as any pitcher. Some 300-game winners actually received lousy run support; Tom Seaver, Walter Johnson, Gaylord Perry, and Nolan Ryan are all around four or five percent below average, as is the eminently Hall-worthy Bert Blyleven.

Anyway, be sure to check out Chris’ presentation if you’re going to SABR (it’s on Thursday at 3 PM), and drop by his blog for more of this good stuff.
