On 10/10/2007, Anthony <wikimail@inbox.org> wrote:
> And since you're not including redirects, there's also a (potentially large) bias against articles which were heavily edited in the past and then later turned into redirects.
Yeah, this is an interesting one.
Firstly, we're ignoring a population of pages which have edit history but no current content - much the same issue as with deleted pages.
Secondly, though, we have a problem that doesn't arise with deleted pages: merges. When an article is redirected, it's often because it has been merged into another page; the text is copied over and a note is left in the history. That is fair enough for our everyday purposes, but for automated analysis like this it causes a glitch: the many edits over a long period which actually created that text aren't counted, and we end up perceiving the content as having been created at another page, at a much later date, in a single edit, by (probably) an unrelated user.
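
To make the glitch concrete, here is a toy sketch in Python. It is not the script used for the analysis; the page names, users, dates and the `first_appearance` helper are all made up for illustration, and real histories would of course need proper diffing rather than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    timestamp: str  # ISO dates, so plain string comparison sorts correctly
    user: str
    text: str


def first_appearance(history, sentence):
    """Return the earliest revision in `history` whose text contains `sentence`."""
    for rev in sorted(history, key=lambda r: r.timestamp):
        if sentence in rev.text:
            return rev
    return None


# Hypothetical history of "Widget", later merged and turned into a redirect.
source_history = [
    Revision("2004-03-01", "Alice", "Widgets are small."),
    Revision("2004-09-15", "Bob", "Widgets are small. They are often blue."),
    Revision("2007-06-01", "Carol", "#REDIRECT [[Gadget]]"),
]

# Hypothetical history of "Gadget", the merge target.
target_history = [
    Revision("2005-01-10", "Dave", "Gadgets are tools."),
    Revision("2007-06-01", "Carol",
             "Gadgets are tools. Widgets are small. They are often blue."),
]

sentence = "They are often blue."

# What an analysis that skips redirects sees: only the surviving page.
naive = first_appearance(target_history, sentence)

# What actually happened, once the redirected page's history is included.
actual = first_appearance(source_history + target_history, sentence)

print("naive: ", naive.user, naive.timestamp)
print("actual:", actual.user, actual.timestamp)
```

With redirects skipped, the sentence is credited to the user who performed the merge, on the merge date; only when the redirected page's history is read as well does it trace back to the earlier edit that really added it.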
So how many merges are there? Difficult question. Juggling some numbers, I'd guess that about 0.5-1% of our total pages have at some point been merged...