And please stop asserting 80% of our articles are unsourced, when my informal check suggests that the number is more like 20%.
Ah, how great the random page is for sampling articles... Here's my 10-article "study":
[...]
40% no sources or external links, 30% one external link, 10% one broken external link, 10% one ISBN link, 10% two external links. Of the working external links, 3 were to primary sources, and 2 were to secondary sources.
Of course, 10 articles is hardly a scientific sample size.
Anthony
Bah. I'll stick my neck out and say 50% of our articles have no sources or external links. Give or take 40%. 19 times out of 20. SCIENCE!!
No really, I think people doing their own small samples of Special:Random is a great idea. Truly random sampling is more "scientific" than you might realize. I am not a statistician, so someone please correct me if I'm wrong, but if all you're trying to do is rule out the <20% and/or the >80% claim, then 25 clicks on Special:Random ought to be more than enough, 19 times out of 20.
Dan
dmehkeri@swi.com wrote:
No really, I think people doing their own small samples of Special:Random is a great idea. Truly random sampling is more "scientific" than you might realize. I am not a statistician, so someone please correct me if I'm wrong, but if all you're trying to do is rule out the <20% and/or the >80% claim, then 25 clicks on Special:Random ought to be more than enough, 19 times out of 20.
In the spirit of this, I did my own Special:Random sample of 30. I found:
Unsourced: 13
1 Working External link/reference: 6
Printed only: 2
Multiple References: 8
Of the 13 unsourced articles, 3 were lists (and I didn't check the articles they were linking to for sources), 9 were stubs, and only one was a full-blown article lacking sourcing. I may be in a minority on this one, but I find unsourced stubs much less problematic than unsourced ''articles'', so my personal findings gave me a lot more hope than I expected.
-Jeff
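For anyone who wants to put a rough error bar on a sample like the one above, here is a minimal sketch in Python. It assumes the usual normal approximation for a proportion, which is shaky at n = 30, and a 95% confidence level (z = 1.96). The function name is made up for illustration; only the 13-of-30 figure comes from the sample.

```python
import math

def proportion_with_margin(hits, n, z=1.96):
    """Return (observed proportion, margin of error).

    Uses the normal approximation p +/- z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    p_hat = hits / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# 13 unsourced articles out of a sample of 30
p_hat, margin = proportion_with_margin(13, 30)
print(f"unsourced: {p_hat:.1%} +/- {margin:.1%}")  # unsourced: 43.3% +/- 17.7%
```

Under those assumptions the interval runs from about 26% to 61%, so a sample this size would already sit outside both the under-20% and the over-80% claims.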
On 12/6/06, dmehkeri@swi.com wrote:
Bah. I'll stick my neck out and say 50% of our articles have no sources or external links. Give or take 40%. 19 times out of 20. SCIENCE!!
No really, I think people doing their own small samples of Special:Random is a great idea. Truly random sampling is more "scientific" than you might realize. I am not a statistician, so someone please correct me if I'm wrong, but if all you're trying to do is rule out the <20% and/or the >80% claim, then 25 clicks on Special:Random ought to be more than enough, 19 times out of 20.
After doing that little sampling I tried to figure out how to calculate the actual margin of error. I assume I was taking a [[simple random sample]] (assuming "random article" works). I got hung up on the fact that the formula given in [[margin of error]] didn't include the population size - but I just read today that this doesn't matter so long as the population is "sufficiently large".
So for a sample size of 10, we get a 40.7% maximum margin of error at 99% confidence! Even at a 90% confidence level the maximum margin of error is 26%. So I guess 10 is way too small. Setting a 5% maximum margin of error and a 90% confidence level, 269 random articles is about the level I'd feel comfortable with. Maybe my math is bad though, or just my intuition about what margin of error is reasonable.
Anthony
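The figures in the message above can be reproduced with a short Python sketch. It assumes a simple random sample from a large population and the worst case p = 0.5, so the maximum margin of error is z * 0.5 / sqrt(n). The helper names are made up for illustration, and the z values come from the standard library's normal distribution rather than from a rounded table.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def max_margin(n, confidence):
    """Worst-case (p = 0.5) margin of error for a sample of size n."""
    return z_for(confidence) * 0.5 / math.sqrt(n)

def sample_size(margin, confidence):
    """Smallest n whose worst-case margin is at most `margin`."""
    return math.ceil((z_for(confidence) * 0.5 / margin) ** 2)

print(f"n=10, 99%: {max_margin(10, 0.99):.1%}")  # n=10, 99%: 40.7%
print(f"n=10, 90%: {max_margin(10, 0.90):.1%}")  # n=10, 90%: 26.0%
print(f"n=25, 95%: {max_margin(25, 0.95):.1%}")  # n=25, 95%: 19.6%
print(sample_size(0.05, 0.90))                   # 271
```

With the unrounded z value, a 5% margin at 90% confidence comes out at 271 articles. The rounded shortcut 0.82 / sqrt(n) gives ceil((0.82 / 0.05)^2) = 269, which matches the figure quoted above, so the difference is just rounding.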