Rich’s Weekend Baseball Beat continues its fine interview series this weekend, checking in with David Pinto of the prolific Baseball Musings blog. What’s most interesting about Pinto is that he has a lot more professional experience around baseball than most bloggers. He built software for STATS, Inc., working with Bill James and John Dewan and helping to develop the Zone Rating system. He worked as the head of research for ESPN’s Baseball Tonight TV show, creating graphics and putting numbers at the fingertips of the show’s hosts. He numbers Peter Gammons and Rob Neyer among his friends. Recently his name even came up in connection with an opening for a stat analyst in the New York Mets front office, though the job apparently went to somebody else. Rich Lederer touches bases with Pinto on all of these topics in his interview.
In addition to his short takes on the news of the day, Pinto has been working on a much larger and more complex project related to defensive statistics, which he calls the “Probabilistic Model of Range.” Here’s how he describes his work to Lederer:
Range is the Holy Grail of baseball stats. We all have a feeling for what range represents, but it’s really difficult to pin down with a number. Plays per game, plays per nine innings, and zone ratings were all attempts at measuring range, and they all have their flaws. UZR was the first probabilistic model that I know of. It looked at the probability of making a play in a particular zone (area) on the field. Mine is similar to that, although I eliminate the idea of a zone.

Basically, there is a probability distribution of balls put into play. The normal position of fielders should be where those probabilities are densest; in other words, the shortstop should stand where the most ground balls are hit in his area of responsibility. Ground balls hit in the densest region should be easier to field because that’s where the SS is usually standing. So if you field a ball there it’s no big deal, everyone does that. But as you move left or right from the region of highest density, the balls are more likely to get through for hits. So a SS who consistently fields those balls well should get more credit than someone who doesn’t. So the probabilistic model of range tries to model these probabilities and assign them to fielders based on where balls are hit.
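To make the "more credit for harder plays" idea concrete, here is a small Python sketch. It is not Pinto's code: it assumes each ball already carries an estimated probability that an average fielder turns it into an out (the hypothetical `p_out` field), and it uses a simple "actual outs minus expected outs" tally, which is one common way to express that idea.

```python
from dataclasses import dataclass


@dataclass
class Chance:
    """One ball hit into a fielder's area (hypothetical record type)."""
    p_out: float    # assumed probability an average fielder converts this ball into an out
    made_out: bool  # whether this fielder actually recorded the out


def range_credit(chances: list[Chance]) -> float:
    """Sum of (actual outcome - expected outcome) over a fielder's chances.

    An out on a routine ball (p_out = 0.95) earns only 0.05, an out on a
    ball in the hole (p_out = 0.20) earns 0.80, and a hit allowed costs p_out.
    """
    return sum((1.0 if c.made_out else 0.0) - c.p_out for c in chances)


# Two hypothetical shortstops: both handle four routine grounders,
# but only the second one gets to the ball hit far to his left.
ss_a = [Chance(0.95, True)] * 4 + [Chance(0.20, False)]
ss_b = [Chance(0.95, True)] * 4 + [Chance(0.20, True)]
print(round(range_credit(ss_a), 2), round(range_credit(ss_b), 2))
```

The routine plays are worth almost nothing to either player; the single difficult ball separates them by a full play, which matches the intuition in Pinto's description.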
For the uninitiated, UZR stands for Ultimate Zone Rating, a system by Mitchell Lichtman which examines defense using play-by-play data including the location and speed of batted balls. Basically, what both Lichtman’s and Pinto’s systems are asking is, What is the probability of a batted ball becoming an out, given the parameters (direction, how hard, and type) of that batted ball? From Pinto’s blog:
I’ve used the STATS, Inc. database to obtain three parameters for each ball; its direction (a slice of pie fanning out from home plate), its batted type (ground, fly, line, bunt or pop) and how hard the ball was hit (soft, medium or hard). I then did a maximum likelihood estimate of the probability of an out given those three parameters for each of the nine fielders.
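The estimation step in that passage can be sketched in Python. This is an illustration, not Pinto's implementation: the record layout, the integer direction slices, and the function name `estimate_out_probabilities` are hypothetical. It assumes each (direction, batted type, hardness) combination is treated as its own cell, with every ball's outcome being either an out by one of the fielders or a hit. Under that multinomial assumption, the maximum likelihood estimate of each fielder's out probability is simply the fraction of balls in the cell he converted into outs.

```python
from collections import Counter, defaultdict


def estimate_out_probabilities(balls):
    """MLE of P(out by fielder | direction, batted type, hardness).

    balls: iterable of (direction, batted_type, hardness, fielder) tuples,
    where fielder is the position that recorded the out, or None for a hit.
    Returns {cell: {fielder: probability}}; whatever probability is left
    over in a cell is the estimated chance of a hit.
    """
    cell_totals = Counter()          # balls in play per cell
    cell_outs = defaultdict(Counter)  # outs per cell, broken down by fielder
    for direction, batted_type, hardness, fielder in balls:
        cell = (direction, batted_type, hardness)
        cell_totals[cell] += 1
        if fielder is not None:
            cell_outs[cell][fielder] += 1
    # Observed share of balls in each cell converted by each fielder.
    return {
        cell: {f: n / total for f, n in cell_outs[cell].items()}
        for cell, total in cell_totals.items()
    }


# Hypothetical sample: slice 7 stands in for the hole between SS and 3B.
sample = [
    (7, "ground", "hard", "SS"),
    (7, "ground", "hard", "SS"),
    (7, "ground", "hard", "3B"),
    (7, "ground", "hard", None),
    (4, "fly", "soft", "CF"),
]
probs = estimate_out_probabilities(sample)
print(probs[(7, "ground", "hard")])
```

Cells with only a handful of balls give noisy estimates (the single soft fly ball above yields a 100% rate), which is one reason a real system might pool neighboring slices or smooth the estimates.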
In a follow-up post, Pinto explains the difference between the two systems. Ultimately, work such as this will give us a better understanding of just how much influence a pitcher has on the outcome of a ball in play, expanding upon the work of DIPS inventor Voros McCracken.
Pinto is definitely a prominent figure in the world of baseball blogging, one who’s clearly got the skills to be employed inside the game. Catch up with him before some team entices him to put his number-crunching skills to work for them.