Andrew Gelman links to this nice paper by Nosek, Spies and Motyl, about an exciting “result” in psychological research: instead of rushing to publish, they scrupulously set out to replicate, and the result disappeared. The fairy-tale ending is that they got a nice publication out of using this experience to tell us what we already know: that “significant” results obtained from small, ad hoc experimental samples are pretty much worthless.
One reason this interests me is that, for my sins, I now read all the applications from would-be PhD students in my department. The big stumbling block for most applicants is the research proposal – in the UK, you are meant to apply for the PhD with a specific thesis topic well in mind. And, within the proposal, the biggest stumbling blocks are the data collection and data analysis methodologies: most applicants show only a vague idea of where and how they will get their data, and what they will do with it when they get it. All of which is not surprising: such proposals are hard to put together on your own, if you’ve never done such a project before.
Sometimes, the proposal includes a survey (sometimes, specifically, a “Web survey”), meant to gather a few hundred observations to test some collection of vaguely formulated hypotheses. Now, I like both data and the statistical analysis of same – really like them: a data set, some statistical software, that’s my computer game. But with significance bias in publication (only results that find statistically significant relationships become publicly known – those that find no relationship are ignored), in any field where it is easy to generate new small data sets, it is absurdly easy to produce statistically significant results that mean zilch, even in statistical terms. This is doubly so if the hypotheses in the field, or the empirical implications of those hypotheses, are vaguely formulated: if the hypothesis and its implications are vague enough to allow a bunch of different statistical tests, one of those tests is likely to come up “significant”, even if the relationships in the data are actually random. And if you’re unlucky enough not to find such a relationship, somebody else doing a similar survey with a different small sample will.
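The arithmetic behind this is easy to check for yourself. As a minimal sketch (my own illustration, not anything from the paper): suppose a vague hypothesis admits ten different statistical tests, and each test is run at the usual 5% significance level on pure noise. The chance that at least one comes up “significant” is about 1 − 0.95¹⁰ ≈ 40%. The simulation below uses a normal-approximation two-sample test on made-up random data; all names and parameters are mine.

```python
import math
import random

def two_sample_p(xs, ys):
    # Normal-approximation (z-test) p-value for a difference in means;
    # adequate for a simulation with ~100 observations per group.
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    # two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
trials = 1000
n_tests = 10  # ten loosely related ways to "test" the vague hypothesis
false_hits = 0
for _ in range(trials):
    # Pure noise: the two "groups" are drawn from the same distribution,
    # so any significant result is spurious by construction.
    if any(two_sample_p([random.gauss(0, 1) for _ in range(100)],
                        [random.gauss(0, 1) for _ in range(100)]) < 0.05
           for _ in range(n_tests)):
        false_hits += 1

print(false_hits / trials)
```

On a run like this, the printed rate comes out in the neighbourhood of 0.4 – that is, nearly half the time, a survey of random numbers hands you a publishable “finding”, provided you have enough latitude in how you test for it.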
One of the nice points made by Nosek et al is that while Web surveys make it easy to collect data, and so can drive spurious results, for a scrupulous researcher they also make it easy to replicate and check those results. Do you have the nerve to put your beautiful result at risk that way?
If you’ve read this far, you may also want to see a previous post (again, thanks to Gelman’s excellent blog for the material linked there) on “A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”.