[WikiEN-l] Longitudinal analysis

Brown, Darin Darin.Brown at enmu.edu
Wed Nov 9 16:27:05 UTC 2005



I think my original point has still gone unaddressed... I think we're still
using mistaken ideas when judging the current state of the articles
collectively. In particular, I strongly do *not* believe that hitting the
"random page" button 20 times (or even 20,000 times) is a valid way of
assessing the success of the project, for specific reasons. I don't mean to
sound like I'm attacking those who've done this... after all, who hasn't
played around with the random page button? I know *I* certainly have. But we
should remain critical.

There are many objections. First, as Charles pointed out, in any system that
is growing even roughly *exponentially*, you would expect the relative
*percentage* of a particular type of article to remain roughly constant over
time: at any given moment most articles are recent, and recent articles are
mostly stubs, so the overall mix is dominated by the newest cohort. In other
words, it should come as neither a great shock nor a great threat that the
*percentage* of stubs, bad articles, etc. has remained constant over time.
This is exactly what you would expect, given the growth, and given that
we're nowhere near an "equilibrium".
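
To make this concrete, here's a toy simulation in Python. Every rate in it
is simply made up for illustration (not measured from any dump): the article
count doubles every 12 months, every new article starts as a stub, and a
fixed fraction of existing stubs gets improved each month.

import math  # not strictly needed; kept for anyone who wants to tweak rates

# Hypothetical starting mix and rates -- all invented for illustration.
stubs, decent = 20_000, 10_000       # roughly 2/3 stubs to begin with
monthly_growth = 2 ** (1 / 12) - 1   # ~6% per month, i.e. doubling yearly
improve_rate = 0.03                  # 3% of existing stubs improved per month

print(f"start: {stubs + decent} articles, "
      f"{stubs / (stubs + decent):.0%} stubs")

for month in range(24):
    new_articles = int((stubs + decent) * monthly_growth)  # all start as stubs
    improved = int(stubs * improve_rate)                   # stubs -> decent
    stubs += new_articles - improved
    decent += improved

total = stubs + decent
print(f"after 24 months: {total} articles, {stubs / total:.0%} stubs, "
      f"and {decent} decent articles (up from 10,000)")
# The number of decent articles roughly quadruples, yet the *percentage*
# of stubs barely moves -- which is all a static random sample can see.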

A more equitable procedure would weight the sample differently. Hitting the
"random page" button essentially assigns the same probability to every
article. But is this really useful? It would be *more* accurate to assign
each article a probability proportional to the number of hits it actually
gets from flesh-and-blood readers. Assigning equal probability to every
article puts some obscure fan-fluff stub on a par with
[[United States of America]], even though the latter may command hundreds or
thousands of times the page views per day of the former.
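
If anyone had the page view logs handy, the weighting itself is trivial.
Here's a sketch in Python; the titles and hit counts are invented, and
random.choices simply draws in proportion to the weights given.

import random

# Hypothetical (article, daily page views) pairs -- these numbers are
# made up; a real study would take them from the server hit logs.
articles = [
    ("United States of America", 40_000),
    ("Exponential growth",          900),
    ("Some obscure fan-fluff stub",    3),
]

titles  = [title for title, hits in articles]
weights = [hits  for title, hits in articles]

# Uniform sampling: every article is equally likely, so the stub comes up
# just as often as [[United States of America]].
uniform_sample = random.choices(titles, k=20)

# Reader-weighted sampling: probability proportional to page views, which
# is much closer to what flesh-and-blood readers actually see.
weighted_sample = random.choices(titles, weights=weights, k=20)

print("uniform :", uniform_sample)
print("weighted:", weighted_sample)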

But an even stronger objection I have to the "hit the random page button 20
times" approach is that it captures an essentially *static* view of the
process. To put it mathematically, we are only looking at the 0th derivative
(the current state), instead of the *FIRST* derivative (the rate of change),
let alone the 2nd or higher-order derivatives. Of course, to consider
anything beyond the "0th derivative" you need more than a static snapshot:
you need to know what's going on over some region of time. You need a
*longitudinal* view.

Here's what I suggest would be more enlightening: go back 24 months, to when
the English wiki had roughly 200,000 articles. *Then* hit the random page
button 20 times, and follow those 20 articles to see how they have improved
or degraded over the following 24 months. If many of the worst of those 20
articles show little or negligible improvement over the past 24 months, then
that is something that needs to be thought about. But as it is, we're
ignoring the possibility that many of the good (or even brilliant) articles
of today might have been classified as stubs or atrocious 24 months ago, and
then pointing to the stubs and atrocious articles of *today* as evidence
that the project is failing. Do you see the irony?
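
Here is a rough sketch of that procedure in Python. The two helper
functions are placeholders returning fake data, just to show the shape of
the study; a real version would sample from an old database dump and rate
revisions by hand (or by some heuristic).

import random

def random_articles_as_of(date, n):
    # Placeholder: a real study would sample n titles uniformly from a
    # database dump taken on `date`. Fake titles for illustration only.
    return [f"Article {i}" for i in range(1, n + 1)]

def quality_score(title, date):
    # Placeholder: a real study would rate the revision of `title` that
    # was current at `date`. Here: a fake random rating.
    return random.choice(["stub", "start", "decent", "good"])

# Draw the sample as of 24 months ago, *then* follow those same articles.
then, now = "2003-11-09", "2005-11-09"
sample = random_articles_as_of(then, 20)

for title in sample:
    before = quality_score(title, then)
    after = quality_score(title, now)
    print(f"{title}: {before} -> {after}")
# The interesting quantity is the per-article *change* (the first
# derivative), not the snapshot quality of whatever exists today.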

There are many other issues that are completely ignored by taking a static
view. For instance, someone noted that general articles on basic subjects
are often in much worse shape than highly specialised articles on obscure
topics. This has sometimes been an issue in math. It's something that needs
to be thought about.

darin


