[WikiEN-l] The Statistical Decline of the English Wikipedia Community
Robert Rohde
rarohde at gmail.com
Wed Oct 10 17:00:45 UTC 2007
I would expect the survivorship bias to go in the wrong direction, though.
Presumably pages that will be deleted, but haven't been yet, are more likely
to be young. Hence the not-yet-deleted pages would tend to push recent edit
counts higher, not lower. I can't think of any reason why survivorship
effects would lead to a bump 6+ months ago.
And such biases would have no impact on the analysis of account creation,
protections, or blocking, all of which also show drops.
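To make the direction of the effect concrete, here is a toy model. It is a sketch under made-up assumptions, not the method behind the actual analysis. A fixed number of hypothetical articles is created each month, a fraction of them are "doomed" and deleted a fixed number of months after creation, and every live article gets one edit per month. Comparing edits to all articles against edits to only those still present at the dump shows which months a surviving-only sample under-counts. The function `edits_by_month` and all of its parameters are hypothetical.

```python
def edits_by_month(months, doomed_fraction, deletion_lag, pages_per_month=100):
    """Toy model of survivorship bias in a page-table sample.

    Hypothetical assumptions (not taken from the real analysis):
    - pages_per_month articles are created at the start of every month;
    - doomed_fraction of them are deleted deletion_lag months after creation;
    - every existing article receives one edit per month;
    - the dump is taken at the end of the last month (index months - 1).

    Returns (total, surviving): edits per month to all articles, and edits
    per month to only the articles still present in the dump.
    """
    total = [0.0] * months
    surviving = [0.0] * months
    for created in range(months):
        for doomed, share in ((True, doomed_fraction), (False, 1.0 - doomed_fraction)):
            count = pages_per_month * share
            # First month in which the article no longer exists (capped at the dump).
            end = min(created + deletion_lag, months) if doomed else months
            # Articles that still exist at the end of the last month appear in the dump.
            in_dump = end == months
            for month in range(created, end):
                total[month] += count
                if in_dump:
                    surviving[month] += count
    return total, surviving


total, surviving = edits_by_month(months=12, doomed_fraction=0.5, deletion_lag=2)
for month, (t, s) in enumerate(zip(total, surviving)):
    print(f"month {month:2d}: surviving/total = {s / t:.2f}")
```

With these illustrative parameters the surviving/total ratio starts at 0.50 in the first month and rises to 1.00 in the final month. Under these assumptions, sampling only surviving pages understates older activity relative to recent activity. That would exaggerate recent edit counts rather than create a bump 6+ months back.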
-Robert
On 10/10/07, Anthony <wikimail at inbox.org> wrote:
>
> On 10/10/07, Robert Rohde <rarohde at gmail.com> wrote:
> > The editing sample is based on a random number generator selection of
> > articles (namespace=0, page_is_redirect is false) in the September 8th
> > dump of the page table. So it excludes articles created in the last weeks
> > of September, but is otherwise a random sample of everything in article
> > space.
> >
> As I alluded to before, though, there's a sort of [[survivorship
> bias]] in the fact that any articles deleted before September 8 are
> excluded from the sample.
>
> And since you're not including redirects, there's also a (potentially
> large) bias against articles which were heavily edited in the past and
> then later turned into redirects.
>
> Off the top of my head I can't think of any other significant problems.