Julien Headley was kind enough to respond to the three questions I posed about the statistical system he’s put forth at his weblog, which I just wrote about. He sent me a couple of very nice emails (“you did a better write-up of my system than i did!”) from which I’ve extracted his responses, pairing them up with my original questions. Again, I’ve taken the liberty of capitalizing any statistical abbreviations.
Where is the evidence that those numbers WAL, CON, and POW have meaning in small sample sizes, that, as you say, “walks, strikeouts, and home runs quickly normalize to a level representative of players’ abilities”?
the evidence right now is at the level of “strong suspicion”. what happened was i started paying attention to these numbers over the past few years in an attempt to predict performance for my fantasy team. what i noticed was that after about 100 AB for WAL & CON, 300 AB for POW, the numbers don’t change. i need to build a database. once that happens i’ll have rigorous answers to all kinds of things. the problem right now is i don’t have a computer! i’ve been blogging mostly at a university computer lab or at friends’ houses.
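For readers wondering what testing that “strong suspicion” might look like once a database exists, here is a minimal sketch in Python. It is not Julien’s method. It assumes each at-bat is recorded as 1 or 0 for some event (a strikeout, say), and the function name `stabilization_point` and the `tolerance` parameter are made up for illustration. It simply reports the at-bat count after which a player’s running rate never strays more than `tolerance` from his full-sample rate.

```python
from itertools import accumulate
import random


def stabilization_point(outcomes, tolerance=0.02):
    """Return the smallest at-bat count n such that the running rate stays
    within `tolerance` of the full-sample rate from n onward.

    `outcomes` is a list of 0/1 flags, one per at-bat (hypothetical format).
    Returns None for an empty list.
    """
    if not outcomes:
        return None
    total = len(outcomes)
    final_rate = sum(outcomes) / total
    # Running rate after each at-bat: cumulative events / at-bats so far.
    running = [events / n
               for n, events in enumerate(accumulate(outcomes), start=1)]
    point = total
    # Walk backward from the end until the running rate drifts too far.
    for n in range(total, 0, -1):
        if abs(running[n - 1] - final_rate) > tolerance:
            break
        point = n
    return point


if __name__ == "__main__":
    # Simulated season: 500 at-bats with an assumed true event rate of 20%.
    rng = random.Random(2003)
    season = [1 if rng.random() < 0.20 else 0 for _ in range(500)]
    print("running rate settles after AB #", stabilization_point(season))
```

Run over many real player-seasons, the distribution of these settling points would show whether figures like 100 AB and 300 AB hold up. Note that comparing against the full-sample rate is a simplification; a stricter test would compare against the following season.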
Those “major-league averages” you referred to for WAL, CON, and POW: do they refer to 2003, the last few years, or a longer-range time period?
major league averages came from 2002 data. i should note that on the site.
As far as the predictive value of this suite, can we see some comparisons based on prior seasons to see where these formulae worked and where they did not?
yes. that and more. all kinds of things will follow from the building of the database. for example, it’s clear that there are players who consistently outperform their predicted average. all who do are slap-and-run speedsters. i’ve got ideas for a speed factor that i think will correct the predictions for speedsters. also i plan to tighten POW up a little bit, using recent years to dampen the variance. and i want to do a historical study to see at what age each of the various skills peaks. right now it seems that CON peaks early, around 23, POW peaks around 28 (although some players (bonds) can still increase at late ages. typically they are power/speed types (like bonds)), and WAL increases throughout the career. with that data it will be possible to determine a player’s future career path with quite a bit of precision.
So now we have considerably more insight into Julien’s system, if not more evidence that he’s correct. Though Julien’s got confidence in the prospective uses of his system, he also understands that he’s got a ways to go, and that his database (not to mention a computer of his own) is the key. “i had been thinking that the building of the database and the analysis of data would have to wait til the offseason. now i’m going to make it my #1 priority,” he writes. Elsewhere, he adds, “i really want to make this system as good as it can be, so peer review is vital.”
Good answers. Cheers to Julien for opening my eyes to a new way of looking at baseball stats, and for being forthright about the shortcomings of his system. Again, keep an eye on this kid, because he’s onto something.