It requires a certain type of mind to excite itself over “fragments of fragments,” but the normally sober baseball analyst Rob Neyer exulted giddily over them in his column the other day.
The question at issue is how lucky the 2002 Detroit Tigers were. On the one hand, they lost 106 games. On the other, if you apply Pythagorean analysis to their run margin, they “should” have lost 112 games. So they were lucky. But on the third hand, as one of Neyer’s correspondents points out, they scored fewer runs than one would expect from their offensive components, and allowed more than one would expect from the offensive components of their opponents, and they really should have lost 98 games. So they were unlucky.
But why stop there?
All hits, for example, are not created equal. If two players hit 120 singles, we consider those accomplishments the same. But what if one of the players hit 80 line drives and 40 ground balls with eyes, and the other hit 120 line drives? Would we expect them to match performances the next season?
No, we wouldn’t. We’d expect the guy with 120 line drives to outperform the guy who got lucky with the grounders.
That is just one tiny example, of hundreds we could come up with. And for the people who care about such things, finding the fragments of the fragments of the fragments is the next great frontier.
Ah, fragments of fragments of fragments. Perennial employment for baseball analysts! More work for Rob Neyer!
Neyer analogizes this process to pricing financial derivatives, which I happen to know something about, having worked as a programmer for several years for a software company that did exactly that. On slow afternoons the analytics boys would quarrel over whether to construct the yield curve using a two- or three-factor Heath-Jarrow-Morton model. Sure, with a two-factor model you might be able to price the bond to four decimal places, but with a three-factor model you can price it to seven! Eventually someone, usually me, would have to rain on their parade by pointing out that bonds are priced in sixteenths (of a dollar), and that the bid/offer spread dwarfs anything beyond the first decimal place.
In baseball, granularity is not measured in sixteenths, but in wins. Since it takes about eight to ten additional runs for each additional win, any variance below five runs or so is a big, fat engineering zero. And I can assure Rob Neyer without even firing up a spreadsheet that a team’s line drive/ground ball ratio when hitting singles won’t get you anywhere near five runs. It’s barely conceivable that it could help you draft a fantasy team. Knock yourself out.
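For anyone who does want to fire up something, here is a minimal sketch of where the eight-to-ten-runs figure comes from. It assumes the classic Pythagorean formula with an exponent of 2 and a hypothetical average team that scores and allows 750 runs over 162 games; the function names and the 750-run figure are mine, chosen for illustration, not anything from Neyer's column.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the classic Pythagorean formula (exponent 2 assumed)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def runs_per_win(runs_scored, runs_allowed, games=162, extra_runs=10):
    """Estimate the runs needed for one more expected win by adding extra_runs."""
    base = pythagorean_wins(runs_scored, runs_allowed, games)
    bumped = pythagorean_wins(runs_scored + extra_runs, runs_allowed, games)
    return extra_runs / (bumped - base)


# Hypothetical average team: 750 runs scored, 750 allowed (illustrative numbers).
print(runs_per_win(750, 750))

# What five extra runs are worth in expected wins for the same team.
print(pythagorean_wins(755, 750) - pythagorean_wins(750, 750))
```

Under those assumptions the first figure lands at roughly nine runs per win, and five extra runs buy about half an expected win, which disappears the moment you round to whole games.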
Hitting has been well understood since John Thorn and Pete Palmer published The Hidden Game of Baseball twenty years ago. All work since has been on the margins. The new frontiers in baseball analysis lie elsewhere. Pitching is still imperfectly understood, because its results are mixed with fielding, which, until Bill James’s new book on Win Shares, was not understood at all. Voros McCracken (where do you sign up for a name like that?) recently demonstrated that a pitcher’s rate of hits allowed on balls in play is almost entirely random. That’s serious work. Fragments of fragments is masturbation.
The lesson here, which applies more broadly to the social sciences, is not to seek more precision than is proper to your subject. Fortunately Professors Mises and Hayek have already given this lecture, and I don’t have to.
(Update: Craig Henry comments.)