The editing sample was drawn with a random number generator from the articles (namespace=0, page_is_redirect false) in the September 8th dump of the page table. So it excludes articles created after that dump, in the last weeks of September, but is otherwise a random sample of everything in article space.
-Robert
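
For anyone who wants to see the selection spelled out, here is a minimal sketch of that kind of draw. It is not the script actually used. It assumes the page table rows have already been loaded as dictionaries keyed by the column names page_id, page_namespace and page_is_redirect; the function name and the seed parameter are made up for illustration.

```python
import random


def sample_articles(pages, k, seed=None):
    """Draw up to k page ids uniformly, without replacement, from article space.

    `pages` is assumed to be an iterable of dicts with the keys
    page_id, page_namespace and page_is_redirect (a hypothetical
    in-memory form of the page table dump).
    """
    # Keep only real articles: main namespace, not a redirect.
    eligible = [
        row["page_id"]
        for row in pages
        if row["page_namespace"] == 0 and not row["page_is_redirect"]
    ]
    # A seeded generator makes the draw repeatable.
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))
```

Every eligible article has the same chance of being picked, which is what makes the result a simple random sample of article space as of the dump date.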
On 10/10/07, Kwan Ting Chan ktc@ktchan.info wrote:
On Wed, 2007-10-10 at 08:43 -0700, Steven Walling wrote:
You're trying to make an accurate judgment based on 100k of articles from a 2 million article field? Don't insult our intelligence.
Exactly what is new with taking a sample of the whole to come up with trends in statistics?
Any concern should be whether 5% of 2 million is a large enough sample (I would say so), and whether the sample is representative of the whole sample space (all the articles). The second I can't comment on, as I do not know what those 100K articles are, among other reasons.
KTC
-- Experience is a good school but the fees are high.
- Heinrich Heine
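
To put a rough number on the sample-size question raised above, here is a back-of-the-envelope sketch. It assumes a simple random sample, a measured proportion at the worst case of 0.5, and a 95% normal approximation (z = 1.96); the function name is made up for illustration. It only speaks to sample size and says nothing about whether the sample is representative.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n drawn without replacement."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: the sample is 5% of the whole.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


# Figures from the thread: 100k articles out of about 2 million.
moe = margin_of_error(100_000, 2_000_000)
print(f"+/- {moe * 100:.2f} percentage points")
```

Under those assumptions, any percentage measured on the 100k sample should be within roughly 0.3 percentage points of the figure for all articles.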
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
http://lists.wikimedia.org/mailman/listinfo/wikien-l