On 10/10/07, Robert Rohde <rarohde(a)gmail.com> wrote:
The editing sample is based on a random number generator selection of
articles (namespace=0, page_is_redirect is false) in the September 8th dump
of the page table. So it excludes articles created in the last weeks of
September, but is otherwise a random sample of everything in article space.
As I alluded to before, though, there's a sort of [[survivorship
bias]] in the fact that any articles deleted before September 8 are
excluded from the sample.
And since you're not including redirects, there's also a (potentially
large) bias against articles that were heavily edited in the past and
later turned into redirects.
Off the top of my head I can't think of any other significant problems.
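
To make the two exclusions concrete, here is a rough Python sketch of the selection as I understand it. It is not the actual script used: the rows are made up, and the helper names (eligible_pages, sample_articles) are hypothetical. The field names follow the MediaWiki page table columns page_namespace and page_is_redirect.

```python
import random

# Made-up rows standing in for a page table dump (illustrative only).
# Deleted pages have no row in the page table at all, so they can never
# be drawn: that is the survivorship bias.
pages = [
    {"page_id": 1, "page_namespace": 0, "page_is_redirect": 0},
    {"page_id": 2, "page_namespace": 0, "page_is_redirect": 1},  # former article, now a redirect
    {"page_id": 3, "page_namespace": 1, "page_is_redirect": 0},  # talk page, not article space
    {"page_id": 4, "page_namespace": 0, "page_is_redirect": 0},
    {"page_id": 5, "page_namespace": 0, "page_is_redirect": 0},
]


def eligible_pages(rows):
    """Keep only article-space pages that are not redirects."""
    return [
        row for row in rows
        if row["page_namespace"] == 0 and not row["page_is_redirect"]
    ]


def sample_articles(rows, k, seed=None):
    """Draw up to k eligible pages uniformly at random, without replacement."""
    rng = random.Random(seed)
    pool = eligible_pages(rows)
    return rng.sample(pool, min(k, len(pool)))


sample = sample_articles(pages, k=2, seed=0)
print([row["page_id"] for row in sample])
```

Page 2 can never be selected here, however many edits it received before it became a redirect, which is the second bias described above.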