I would expect the survivorship bias to go in the wrong direction, though. Presumably pages that will be deleted, but haven't yet been, are more likely to be young. Hence the not-yet-deleted pages should, if anything, push recent edit counts higher. I can't think of any reason why survivorship effects would lead to a bump 6+ months ago.
And such biases would have no impact on the analysis of account creation, protections, or blocking, all of which also show drops.
-Robert
On 10/10/07, Anthony wikimail@inbox.org wrote:
On 10/10/07, Robert Rohde rarohde@gmail.com wrote:
The editing sample is based on a random number generator selection of articles (namespace=0, page_is_redirect is false) in the September 8th dump of the page table. So it excludes articles created in the last weeks of September, but is otherwise a random sample of everything in article space.
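For concreteness, the selection described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function name sample_articles, the seed, and the toy rows are hypothetical, and the rows are assumed to have been loaded from the page table dump as dictionaries keyed by the MediaWiki column names page_namespace and page_is_redirect.

```python
import random


def sample_articles(page_rows, k, seed=0):
    """Pick up to k random non-redirect pages from the article namespace."""
    # Keep only article-space pages (namespace 0) that are not redirects.
    eligible = [
        row for row in page_rows
        if row["page_namespace"] == 0 and not row["page_is_redirect"]
    ]
    # A seeded generator makes the draw repeatable (the seed is an assumption).
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))


# Hypothetical toy rows standing in for the page table dump.
pages = [
    {"page_id": 1, "page_namespace": 0, "page_is_redirect": False},
    {"page_id": 2, "page_namespace": 0, "page_is_redirect": True},
    {"page_id": 3, "page_namespace": 1, "page_is_redirect": False},
    {"page_id": 4, "page_namespace": 0, "page_is_redirect": False},
    {"page_id": 5, "page_namespace": 0, "page_is_redirect": False},
]

sample = sample_articles(pages, k=2)
```

Note that pages deleted before the dump date never appear in page_rows at all, and pages since turned into redirects are filtered out, so neither group can be drawn.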
As I alluded to before, though, there's a sort of [[survivorship bias]] in the fact that any articles deleted before September 8 are excluded from the sample.
And since you're not including redirects, there's also a (potentially large) bias against articles which were heavily edited in the past and then later turned into redirects.
Off the top of my head I can't think of any other significant problems.
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l