[WikiEN-l] "How the Professor Who Fooled Wikipedia Got Caught by Reddit", _The Atlantic_

Gwern Branwen gwern0 at gmail.com
Mon May 21 22:02:21 UTC 2012


On Mon, May 21, 2012 at 5:32 PM, Anthony <wikimail at inbox.org> wrote:
> How could we do that?  You could have just cherrypicked the worst
> links that were last links which are not official or
> template-generated in External Link sections.  I'm not saying I think
> you did that.  But you certainly could have.

Cherrypicking even under this strategy would force me both to do >2x
as much work and to engage in conscious deception. If I were
consciously trying to deceive, I would have adopted an entirely
unverifiable strategy like 'roll a die' or 'pick a random integer from
0 to the number of links', and then I could have cherry-picked without
problem and with much less overall effort (as it was, I had to throw
out something like a third to half the pages with external links
because they did not meet one of the criteria).
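
To make the contrast concrete, here is a minimal sketch in Python. The
link records and the field names `official` and `template` are
hypothetical, chosen only for illustration; this is not the script or
data behind the actual sample. The point is that anyone can re-run the
deterministic rule on a page and check the pick, while an unseeded
random pick leaves nothing to check.

```python
import random

def last_qualifying_link(links):
    """Deterministic rule: the last link in the External Links section
    that is neither official nor template-generated (hypothetical fields)."""
    qualifying = [l for l in links if not l["official"] and not l["template"]]
    return qualifying[-1]["url"] if qualifying else None

def random_link(links, rng):
    """Unverifiable rule: pick any link at random. Without a published
    seed, a third party cannot confirm the pick was not hand-chosen."""
    return rng.choice(links)["url"] if links else None

# Hypothetical External Links section for one article.
links = [
    {"url": "http://example.org/official", "official": True, "template": False},
    {"url": "http://example.org/a", "official": False, "template": False},
    {"url": "http://example.org/b", "official": False, "template": False},
    {"url": "http://example.org/db", "official": False, "template": True},
]

print(last_qualifying_link(links))          # same answer for every checker
print(random_link(links, random.Random()))  # differs from run to run
```

A page with no qualifying link returns `None` under the deterministic
rule, which corresponds to the pages I had to throw out.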

> Anyway, the main thing I'd like to say about all of this is simply
> that your selection is not random.  Your sample is biased.  Biased in
> which direction, I don't know.  Biased intentionally, I doubt.  But
> your sample is biased.

Sheesh. Every sample is biased in many ways - but random samples are
biased in unpredictable ways, which is why randomizing was such a big
innovation when Fisher and his contemporaries introduced it. What's
next, PRNGs are unacceptable for any kind of study because you can
predict each output if you know the seed and run the PRNG
appropriately?
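
As a small illustration of the PRNG point, here is a sketch using
Python's standard `random` module. The population size, sample size,
and seed are arbitrary assumptions made for the example. The draw is
fully predictable to anyone who knows the seed, yet nobody would call
the resulting sample unusable on that ground alone.

```python
import random

def draw_sample(population_size, k, seed):
    """Draw k distinct indices from range(population_size) using a
    PRNG seeded with `seed` (all three values are illustrative)."""
    rng = random.Random(seed)
    return rng.sample(range(population_size), k)

# Same seed, same algorithm: the "random" sample is exactly reproducible.
first = draw_sample(1000, 10, seed=2012)
second = draw_sample(1000, 10, seed=2012)
print(first == second)
```

Publishing the seed is what turns that predictability into a feature:
others can regenerate the sample and verify it.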

-- 
gwern


