Posts Tagged ‘sampling’


Sports, perception and sample-size bias

New York Rangers goalie Henrik Lundqvist is the Tim Tebow of hockey. Actually, he's even better, having won nine of his last ten. Photo by Robert Kowal.

Apologies in advance, but if you don’t follow sports this post may not make much sense.

This afternoon, the Denver Broncos picked up their sixth win in seven games this season with Tim Tebow starting at quarterback. If you haven’t heard, Tebow has what might be called non-traditional passing mechanics, but as many commentators have noted, he “just wins.” There’s a lot that could be said, and already has been said, about the strange way in which quarterbacks are credited with team success in football, but that’s not really the point of this post. Rather, I want to point out how odd it is that seven games — even seven games that include six wins — can be considered so meaningful in football.

This stretch has turned Denver’s season around, to be sure. They lost four of their first five, but now find themselves tied for the lead in their division. But this is only possible because of the NFL’s relatively tiny schedule. Consider that, for a hockey goaltender — probably the only every-game player in North American major team sports with as much impact as a quarterback — six wins in seven games is barely noticeable; it’s less than a tenth of the season. For a baseball player (where there isn’t such a great analogue, since starting pitchers only go every fifth game), six wins in seven games is a good week. You could win your league’s player of the week award in May and be sent down to the minors in June. For Tebow, six wins in seven games is two months and half the season, and it’s especially significant when one of your divisional rivals (San Diego) is imploding at the same time.

But this is all perception. If we’re trying to judge what this seven-game sample predicts about the larger population of games that makes up a player’s career, seven games tell us very little. It doesn’t matter that an NFL season is only 16 games long; sampling error depends on the absolute number of observations, not on the share of the season they represent, and seven games aren’t enough to reduce that error to an acceptable level. If we were to look at a proportionally similar stretch of a 162-game baseball season (about 70 games), we’d keep the proportion the same, but reduce the sampling error simply by examining more cases in absolute terms.
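To put rough numbers on that, here is a minimal sketch that compares the uncertainty around a 6-1 record with the uncertainty around the same win rate over 70 games. It rests on assumptions that are mine, not anything established above: each game is treated as an independent draw with a fixed win probability, the Wilson score interval stands in for "sampling error," and the 60-10 record is a made-up baseball-sized comparison.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win proportion.

    Assumes games are independent with a constant win probability,
    which is a simplification for any real team.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    )
    return center - half_width, center + half_width


# The stretch described in the post: 6 wins in 7 starts.
low_7, high_7 = wilson_interval(6, 7)

# Hypothetical 60-10 record: the same win rate over a 70-game sample.
low_70, high_70 = wilson_interval(60, 70)

print(f"6-1:   {low_7:.3f} to {high_7:.3f}")
print(f"60-10: {low_70:.3f} to {high_70:.3f}")
```

Under these assumptions, the seven-game interval is wide enough to include a true win rate of .500, while the 70-game interval is much narrower and sits well above it. Same proportion, very different amounts of information.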

So where does this perception error come from? Is it just the kind of rank innumeracy we see in many contexts? Maybe, but I suspect there’s an important media effect as well. Sports media (both reporters and game broadcasters) and the sports culture they’re embedded in frequently express hostility toward data-driven strategy. Narratives and tradition rule in sports, and when data contradict them, the standard response is that data can’t possibly capture the relevant “intangibles.” Noting that a seven-game span isn’t really an illuminating sample gets in the way of a lot of narrative structure.

Filed: Science Is Real || 21:49, December 4


Gordian knots in research methods

Whenever I have a project in mind that involves Facebook, there’s a methodological stumbling block that almost always comes up: Most of what’s interesting isn’t accessible unless you are friends with the people you’re trying to study. So maybe you rework the research questions, or you come up with a way to address them using survey data, etc.

But now I see that I was overlooking the obvious solution: Just create fake profiles to friend people with, as a group of four researchers at the University of British Columbia did. For them, it was entirely necessary, as they were studying the vulnerability of online social networks to malicious bots, so they basically created their own benign bots and observed what they accomplished. The very first phase resulted in about a 20% friend-acceptance rate, so if you’ve got a good sampling method, this looks decent enough as a way of getting real, live Facebook content.

Filed: Science Is Real || 23:10, November 6