[WikiEN-l] Handling unreferenced but likely-valid material

Anthony wikilegal at inbox.org
Thu Dec 7 00:55:59 UTC 2006


On 12/6/06, dmehkeri at swi.com <dmehkeri at swi.com> wrote:
> Bah. I'll stick my neck out and say 50% of our articles have no sources or
> external links. Give or take 40%. 19 times out of 20. SCIENCE!!
>
> No really, I think people doing their own small samples of Special:Random is a
> great idea. Truly random sampling is more "scientific" than you might realize. I
> am not a statistician, so someone please correct me if I'm wrong, but if all
> you're trying to do is rule out the <20% and/or the >80% claim, then 25 clicks
> on Special:Random ought to be more than enough, 19 times out of 20.
>
After doing that little sampling I tried to figure out how to
calculate the actual margin of error.  I assume I was taking a
[[simple random sample]] (provided "random article" really is random).
I got hung up on the fact that the formula given in [[margin of error]]
doesn't include the population size - but I just read today that this
doesn't matter so long as the population is "sufficiently large".
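
To illustrate why population size drops out, here is a rough sketch of
the standard finite population correction factor, sqrt((N - n) / (N - 1)),
which multiplies the margin of error when sampling without replacement.
The population sizes below are illustrative assumptions only; 1,500,000
is just a rough stand-in for the article count, not a measured figure.

```python
import math

def fpc(population_size, sample_size):
    """Finite population correction for sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Hypothetical population sizes (assumptions for illustration only).
for population_size in (100, 10_000, 1_500_000):
    factor = fpc(population_size, sample_size=10)
    print(f"N={population_size:>9}: correction factor = {factor:.6f}")
```

For a sample of 10, the factor is noticeably below 1 only when the
population is tiny; for anything in the millions it is so close to 1
that leaving it out changes nothing.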

So for a sample size of 10, we get a 40.7% maximum margin of error at
99% confidence!  Even at a 90% confidence level the maximum margin of
error is 26%.  So I guess 10 is way too small.  Setting a 5% maximum
margin of error and a 90% confidence level, 269 random articles is
about the level I'd feel comfortable with.  Maybe my math is bad
though, or just my intuition about what margin of error is reasonable.
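
For anyone who wants to check the arithmetic, here is a minimal sketch
of the calculation, assuming a simple random sample and the normal
approximation.  The maximum margin of error is z * 0.5 / sqrt(n), taken
at the worst case p = 0.5.  The function names are my own, and
statistics.NormalDist needs Python 3.8 or later.

```python
import math
from statistics import NormalDist

def z_score(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def max_margin_of_error(sample_size, confidence):
    """Worst-case (p = 0.5) margin of error for a simple random sample."""
    return z_score(confidence) * 0.5 / math.sqrt(sample_size)

def required_sample_size(margin, confidence):
    """Smallest sample size whose worst-case margin is at most `margin`."""
    return math.ceil((z_score(confidence) * 0.5 / margin) ** 2)

print(f"n=10, 99%: {max_margin_of_error(10, 0.99):.1%}")
print(f"n=10, 90%: {max_margin_of_error(10, 0.90):.1%}")
print("n for 5% at 90%:", required_sample_size(0.05, 0.90))

# Same calculation with the rounded coefficient 0.82 / sqrt(n) for 90%.
print("n with rounded 0.82:", math.ceil((0.82 / 0.05) ** 2))
```

This agrees with the 40.7% and 26% figures for a sample of 10.  The
required sample size comes out as 271 with the unrounded z value and
269 with the rounded coefficient 0.82, so the two differ only by
rounding.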

Anthony


