The Mad MVP Scientist

This past week, I toiled in my laboratory attempting to build an MVP predictor (ESPN Insider part 1 / part 2) based upon past results, one that might lend some insight into who would win this year.

As the recent scrum between supporters of the candidacies of Joe Mauer and Mark Teixeira reminds us, nearly every Most Valuable Player award is capable of producing controversy. Not only do the Baseball Writers Association of America voters rarely elect the player who’s worth the most wins to his team via some objective formula, they appear to shift the standards from year to year, instead constructing narratives to fit whatever loosely-gathered facts are at hand. Particularly in recent years, defensive value is often minimized or entirely ignored in favor of heavy hitters with big Triple Crown stats, almost invariably from successful teams.

The question is whether the voters’ behavior can be predicted. Towards that end, I was tasked with building an MVP predictor in the spirit of a system such as Bill James’ Hall of Fame Monitor, one that awards points for various levels of achievement in an attempt to identify who will win, as opposed to who should win. My initial bursts of enthusiasm for the assignment were soon followed by endless hours of cowering in the fetal position before a massive spreadsheet, but in the end I emerged with a system — Jaffe’s Ugly MVP Predictor (JUMP) — which correctly identified 14 of the 28 winners during the Wild Card era (1995 onward), and put 27 of those winners among the league’s top three in its point totals.

I limited the scope of the system to that post-strike timeframe for three main reasons: none of the 28 winners were pitchers, only one (Alex Rodriguez in 2003) played for a team that finished below .500, and 22 of them played on teams that qualified for the expanded postseason — extremely strong tendencies that could help separate seemingly equal candidates. Instead of focusing on round-numbered benchmarks like James did (a .300 batting average, 100 RBI), I chose to dispense with actual stat totals and rates and focus on league rankings among batting title qualifiers (3.1 plate appearances per game) in 12 key offensive categories…

So anyway, I built a point system which rewarded top 10 placement in 12 categories (a few of which, OBP and hits among others, turned out to be insignificant in predicting voter behavior), added a very strong team success component which could be worth more than two or three category leads, and then gerrymandered the hell out of the thing to increase the number of successful hits and top threes, the latter a concession to the fact that at some point subjective elements take over for a number of voters. My maneuvers included positional bonuses for middle infielders, a penalty for being primarily a DH, a penalty for playing for the Rockies, and fractional weighting for a couple of categories, moves which through endless, tedious trial and error increased the system’s accuracy bit by bit.
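
To make the shape of such a system concrete, here is a minimal Python sketch. It is not the actual JUMP formula: the category subset, the 10-to-1 placement scale, the team and position codes, and every weight, bonus, and penalty value are placeholders invented for illustration. Only the kinds of components (top 10 placement points, a large team success bonus, a middle infield bonus, DH and Rockies penalties, fractional category weights) come from the description above.

```python
# Illustrative only: every number below is a made-up placeholder,
# not a value from the real JUMP system.
CATEGORY_WEIGHTS = {"HR": 1.0, "RBI": 1.0, "AVG": 0.5, "SLG": 0.5}  # hypothetical subset
PLAYOFF_BONUS = 25.0         # assumed to exceed two category leads (2 x 10 points)
MIDDLE_INFIELD_BONUS = 5.0   # assumed bonus for second basemen and shortstops
DH_PENALTY = 5.0             # assumed penalty for primary designated hitters
ROCKIES_PENALTY = 5.0        # assumed penalty for playing for Colorado


def placement_points(rank):
    """Return 10 points for a league lead, down to 1 for 10th place, else 0."""
    return 11 - rank if 1 <= rank <= 10 else 0


def jump_score(ranks, position, team, made_playoffs):
    """Score one player from league ranks (category -> rank) plus context."""
    # Categories missing from `ranks` are treated as outside the top 10.
    score = sum(
        weight * placement_points(ranks.get(category, 99))
        for category, weight in CATEGORY_WEIGHTS.items()
    )
    if made_playoffs:
        score += PLAYOFF_BONUS
    if position in ("2B", "SS"):
        score += MIDDLE_INFIELD_BONUS
    if position == "DH":
        score -= DH_PENALTY
    if team == "COL":
        score -= ROCKIES_PENALTY
    return score


# A hypothetical first baseman on a playoff team: 1st in RBI, 2nd in HR, 6th in SLG.
example = jump_score({"RBI": 1, "HR": 2, "SLG": 6}, "1B", "NYY", True)
print(example)
```

With placeholder values like these, the team bonus alone outweighs two category leads, which is the behavior the system leans on to separate otherwise similar candidates.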

Here’s how the actual award winners fared in JUMP, along with the players it flagged as the likely winners in years where they differed from the voting:

Year  AL Winner          Rank  System Winner
1995  Mo Vaughn          3     Albert Belle
1996  Juan Gonzalez      2     Albert Belle
1997  Ken Griffey        1
1998  Juan Gonzalez      1
1999  Ivan Rodriguez     10    Manny Ramirez
2000  Jason Giambi       1
2001  Ichiro Suzuki      2     Bret Boone
2002  Miguel Tejada      2     Alfonso Soriano
2003  Alex Rodriguez     1
2004  Vladimir Guerrero  1
2005  Alex Rodriguez     1
2006  Justin Morneau     3     Derek Jeter
2007  Alex Rodriguez     1
2008  Dustin Pedroia     1

Year  NL Winner          Rank  System Winner
1995  Barry Larkin       3     Dante Bichette
1996  Ken Caminiti       1
1997  Larry Walker       2     Jeff Bagwell
1998  Sammy Sosa         1
1999  Chipper Jones      1
2000  Jeff Kent          3     Barry Bonds
2001  Barry Bonds        3     Sammy Sosa
2002  Barry Bonds        1
2003  Barry Bonds        1
2004  Barry Bonds        3     Albert Pujols
2005  Albert Pujols      1
2006  Ryan Howard        2     Albert Pujols
2007  Jimmy Rollins      3     Matt Holliday
2008  Albert Pujols      2     Ryan Howard
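
As a quick check on the tallies, the Rank columns from the two tables above can be counted up in a few lines of Python. The ranks are copied straight from the tables; the variable names are just illustrative.

```python
# JUMP rank of each actual MVP winner, 1995 through 2008, copied from the tables.
al_ranks = [3, 2, 1, 1, 10, 1, 2, 2, 1, 1, 1, 3, 1, 1]
nl_ranks = [3, 1, 2, 1, 1, 3, 3, 1, 1, 3, 1, 2, 3, 2]

all_ranks = al_ranks + nl_ranks

exact_hits = sum(1 for rank in all_ranks if rank == 1)   # system winner matched the vote
top_three = sum(1 for rank in all_ranks if rank <= 3)    # actual winner in JUMP's top three

print(f"{exact_hits} of {len(all_ranks)} winners ranked first")
print(f"{top_three} of {len(all_ranks)} winners ranked in the top three")
```

That works out to 14 of 28 exact hits and 27 of 28 in the top three, with the 1999 AL award as the lone miss.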

Ivan Rodriguez’s 1999 victory, which still chafes my ass a decade on because Derek Jeter had a monster year (.349/.438/.552 with 24 homers, 134 runs and 102 RBI, all career highs), is the system’s big outlier, and Rodriguez is the only catcher to win during this era. That bodes poorly for Mauer, who as it is doesn’t rank in the top 10 in any counting stat category and plays for a team unlikely to make the playoffs; he ranked just 28th when I ran the numbers on Sunday, and even his team’s win since then to get right back to .500 pushes him only to 15th. Mind you, this isn’t a prediction that Mauer would finish 15th in the voting, or that he deserves to; as Mae West famously said, “Goodness has nothing to do with it.” Basically, what JUMP is saying is that history tells us that unless Mauer scores in the league’s top three in the system’s point totals, he’s got no chance of actually winning the award. Meanwhile, “Golden Boy” Teixeira leads the AL rankings thanks to running first in RBI, second in homers, and sixth in slugging while playing for a playoff-bound team.

In all, it was a fun and satisfying project. I’ve got a few ideas that might increase its accuracy a hair, and I’ll revisit the topic if they turn out to be worthwhile.
