[Wikipedia-l] Re: Death to the comma count!
Daniel Mayer
maveric149 at yahoo.com
Mon Mar 10 23:06:41 UTC 2003
On Sunday 09 March 2003 12:01 pm, Brion Vibber wrote:
> Aha, again demonstrating the obsession over the count. Why was it
> important to hit or not hit 100,000? Because of an offhand remark made a
> couple years ago about "we hope to reach 100,000 articles"?
>
> When did this become our holy mission?
Round numbers, especially large ones, are milestones that get people's
attention. That is why x.0 is so important in the software world, why cities
celebrate the day they reach 1,000,000 inhabitants, why there was so much
mania when our calendars hit the year 2000, why the first billion-dollar
business and billionaires are mentioned in history books, and why we got a lot
of media attention after en.wiki hit the 100,000 count.
The article count is also a measure (however crude) of our progress. So there
is nothing wrong with trying to improve that measure and make it more
conservative where it makes sense (Jimbo has already stated that he wanted a
more conservative count. However, right after he said that, we had already hit
the 100,000 mark and were being slashdotted).
> Did the messianic age begin when the counter flipped into six digits?
> Have we all been betrayed by a sinister being who wants to make us look
> bad by leading us astray and "inflating our count"?
>
> What the *heck* does it matter?
Boy are you in a really bad mood today. See above.
> Bad to whom? Embarrassing to whom? Is it solely the use of the word
> "article" that throws us off? Are we obsessed with proving that our
> "articles" are so fricking wonderful that every single one of them must
> be the greatest pinnacle of writing prowess or we must lock it in the
> basement of shame and never admit its existence?
No - a simple automatic measure is all that is needed. We mention the
definition of the count on en.wiki's [[Wikipedia:What is an article]] page.
> Go open up a paper encyclopedia sometime. Look at it. A fair chunk of
> the articles are *one paragraph long*. Do their editors worry themselves
> over the metric they use to stamp "over 60,000 articles!" on the cover?
> Or do they just count the number of entries at some point and say "at
> least this many"?
Exactly - and how many bytes would a smallish complete paragraph be in such an
encyclopedia? Around 500 bytes. Then we could say that we *at least* have x
number of articles. Right now the count includes many entries that do not
consist of even one complete paragraph. A {{HEADLINEARTICLECOUNT}} whose
threshold is set per language would be flexible enough for both large and
small wikis. {{NUMBEROFARTICLES}} would be used for comparison purposes.
> Mav, thanks for proving my point again about count-mania. Are you
> seriously suggesting that the pseudo-random number spit out on the front
> page actually *defines* what articles are in a meaningful way?
Again, more unnecessary anger. Please calm down - we are not talking about
anything of such cosmic importance as to warrant such feelings. :-)
The answer to your question is above (the part talking about tracking our
progress and how the outside world sees our progress). So, yes it is
important to have a conservative estimate of the number of articles we have.
That's not to say that everything a computer would recognize as an article is
actually what a human would consider to be one. But since the computer will
also miss entries that /could/ be considered articles, then everything
averages out in the end (some really obscure subjects can, in fact, be
covered in a sub-500 byte entry).
In short, I'm not asking for an AI article count - I just would like to see a
more conservative crude method used on en.wiki that excludes more entries
that are probably not articles (however, we shouldn't go live with such a
count until after we have enough entries to still be above 100,000 -
otherwise we could get some negative media attention and a drop in morale).
IMO the best way to do that is to have a {{HEADLINEARTICLECOUNT}}, set per
wiki, in addition to {{NUMBEROFARTICLES}}. It would be up to each language to
define its own byte threshold for its own headline count (or it could choose
to ignore {{HEADLINEARTICLECOUNT}} and use the much less conservative
{{NUMBEROFARTICLES}}). Of course, each wiki that uses
{{HEADLINEARTICLECOUNT}} would then have to publicly document the threshold
for its own headline count.
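To make the proposal concrete, here is a rough sketch of how the two counts
could differ. Everything in it is an assumption for illustration: the function
names, the per-language threshold table, and the rule that the less
conservative count simply counts every entry are all made up here and are not
taken from the wiki software.

```python
# Hypothetical sketch only: names, thresholds, and counting rules are
# illustrative assumptions, not the wiki software's actual behaviour.

def number_of_articles(entries):
    """Stand-in for {{NUMBEROFARTICLES}}: counts every entry (assumed rule)."""
    return len(entries)


def headline_article_count(entries, min_bytes):
    """Stand-in for {{HEADLINEARTICLECOUNT}}.

    Counts only entries whose text is at least min_bytes long when
    encoded as UTF-8, so the result is a conservative "at least this
    many" figure.
    """
    return sum(1 for text in entries if len(text.encode("utf-8")) >= min_bytes)


# Each wiki would document its own threshold. 500 is the rough size of a
# smallish complete paragraph suggested in this thread; the "eo" value is
# invented purely for the example.
THRESHOLDS = {"en": 500, "eo": 200}

if __name__ == "__main__":
    sample = ["x" * 120, "y" * 500, "z" * 2300]
    print(number_of_articles(sample))
    print(headline_article_count(sample, THRESHOLDS["en"]))
```

Measuring in bytes rather than characters matches the "byte threshold"
wording, but it means the same threshold admits shorter texts in languages
whose characters take more than one byte, which is one more reason to let
each wiki pick its own value.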
-- Daniel Mayer (aka mav)
WikiKarma
The usual at [[March 8]] (I'm fresh out of WikiKarma so I need to work on
creating some more balance in the Universe before I respond to your
response).