I would expect the survivorship bias to go in the wrong direction to explain
the drop, though. Presumably pages that will be deleted, but haven't yet been,
are more likely to be young. Hence the not-yet-deleted pages would tend to
push recent edit counts higher. I can't think of any reason why survivorship
effects would lead to a bump 6+ months ago.
And such biases would have no impact on the analysis of account creation,
protections, or blocking, all of which also show drops.
-Robert
On 10/10/07, Anthony <wikimail(a)inbox.org> wrote:
On 10/10/07, Robert Rohde <rarohde(a)gmail.com> wrote:
The editing sample is based on a random number generator selection of
articles (namespace=0, page_is_redirect is false) in the September 8th dump
of the page table. So it excludes articles created in the last weeks of
September, but is otherwise a random sample of everything in article space.
As I alluded to before, though, there's a sort of [[survivorship bias]] in
the fact that any articles deleted before September 8 are excluded from the
sample.
And since you're not including redirects, there's also a (potentially
large) bias against articles which were heavily edited in the past and
then later turned into redirects.
Off the top of my head I can't think of any other significant problems.
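
For reference, the selection described in the quoted message (namespace=0,
page_is_redirect false, drawn at random from the page table) could be
sketched roughly as below. This is only an illustration, not the script that
was actually used: the function name sample_articles, the in-memory list of
dicts, and the seed parameter are all assumptions made for the example.

```python
import random


def sample_articles(page_rows, k, seed=None):
    """Pick up to k random non-redirect articles from page-table rows.

    Assumption: each row is a dict with the keys page_id, page_namespace
    and page_is_redirect (a hypothetical in-memory form of the dump).
    """
    rng = random.Random(seed)
    # Only rows present in the dump can be chosen. Pages deleted before the
    # dump was taken are not in page_rows at all, and pages that have since
    # become redirects are filtered out here.
    eligible = [
        row["page_id"]
        for row in page_rows
        if row["page_namespace"] == 0 and not row["page_is_redirect"]
    ]
    return rng.sample(eligible, min(k, len(eligible)))


pages = [
    {"page_id": 1, "page_namespace": 0, "page_is_redirect": 0},
    {"page_id": 2, "page_namespace": 0, "page_is_redirect": 1},  # redirect, excluded
    {"page_id": 3, "page_namespace": 1, "page_is_redirect": 0},  # not article space, excluded
    {"page_id": 4, "page_namespace": 0, "page_is_redirect": 0},
]
print(sorted(sample_articles(pages, 2, seed=0)))  # prints [1, 4]
```

Both exclusions raised in the thread show up in the filter: deleted pages are
never in the input, and current redirects are dropped regardless of how
heavily they were edited before being redirected.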
_______________________________________________
WikiEN-l mailing list
WikiEN-l(a)lists.wikimedia.org
To unsubscribe from this mailing list, visit:
http://lists.wikimedia.org/mailman/listinfo/wikien-l