On Sunday 09 March 2003 04:00 am, Brion Vibber wrote:
For months and months we've talked about revamping the article count system, but nothing's changed. The article count is still an extension of the "comma count" used to filter out empty articles in a search back in the UseMod days.
Currently, a page is counted as an "article" for "we have X articles" purposes if it is:
- in the article namespace (so excludes talk pages, user pages, Wikipedia: help and utility pages)
- not a redirect
- contains a comma (!)
This is true.
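For reference, the three conditions listed above can be sketched in Python. This is only an illustration, not the actual wiki code: the `Page` record, its field names, and the use of namespace 0 for the article namespace are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    namespace: int     # assumption: 0 stands for the article namespace
    is_redirect: bool  # assumption: redirect status is already known
    text: str          # raw page text


def counts_as_article(page: Page) -> bool:
    """Apply the three checks: article namespace, not a redirect, has a comma."""
    return page.namespace == 0 and not page.is_redirect and "," in page.text


def article_count(pages) -> int:
    """Count the pages that pass all three checks."""
    return sum(1 for page in pages if counts_as_article(page))


sample = [
    Page(0, False, "Paris is a city, in France."),  # counted
    Page(0, False, "Stub without the magic mark"),  # no comma, not counted
    Page(1, False, "Talk, talk, talk"),             # wrong namespace
    Page(0, True, "#REDIRECT [[Paris, France]]"),   # redirect
]
print(article_count(sample))
```

Note that the second sample page is a real (if short) entry that the comma test drops, which is the behaviour under dispute in this thread.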
Now, we are well aware that page-count fever has gripped Wikipedia for some time. The obsession with breaking the 100,000-page barrier on the English Wikipedia stifled any implementation of reforms for fear of reducing the count. Concerns about languages which don't use the ASCII comma character have been shrugged off. Well, today I've seen enough.
What? Why are you blaming the English Wikipedia for this? If anything *AT ALL*, people on en.wiki wanted a much more conservative count in order to put off hitting 100,000 articles. There was general agreement on that point. But the agreement stopped there: different people had different ideas on just what to do, and no consensus emerged. I and several others wanted the article count to include only pages in the article namespace that were more than 500 bytes. However, if this were implemented now, it would reduce en.wiki's count to below the 100,000 mark, and most other wikis would have their counts cut in half or worse.
We could have a separate count that has the more conservative definition (perhaps with a threshold that could be set per wiki). Call it {{HEADLINEARTICLECOUNT}} maybe. But the {{NUMBEROFARTICLES}} count should be the same for all wikis for comparison purposes (threshold=50 or 100 bytes perhaps - or we could simply keep the comma count for {{NUMBEROFARTICLES}}).
I think this method would be able to satisfy both the huge en.wiki, which is mildly embarrassed by its inflated count, and the many other wikis that are vying for second place.
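A rough Python sketch of the two-count idea, assuming page size is measured in UTF-8 bytes. The threshold table, the helper names, and the choice of 100 bytes for the shared count are placeholders for illustration; only the two variable names come from the proposal above.

```python
# Hypothetical per-wiki headline thresholds in bytes (illustrative values).
HEADLINE_THRESHOLD = {"en": 500, "fr": 500}

# One shared threshold for every wiki, so the counts stay comparable.
COMMON_THRESHOLD = 100


def size_in_bytes(text: str) -> int:
    """Size of the page text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def count_over(texts, threshold: int) -> int:
    """Number of pages strictly larger than the threshold."""
    return sum(1 for text in texts if size_in_bytes(text) > threshold)


def both_counts(wiki: str, texts) -> dict:
    """Return the shared count and the per-wiki headline count side by side."""
    return {
        "NUMBEROFARTICLES": count_over(texts, COMMON_THRESHOLD),
        "HEADLINEARTICLECOUNT": count_over(texts, HEADLINE_THRESHOLD[wiki]),
    }


texts = ["x" * 50, "x" * 150, "x" * 600]
print(both_counts("en", texts))
```

Counting bytes rather than characters is itself a choice that favours some languages over others, so the threshold would probably need tuning per wiki in any case.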
While the English wiki has galumphed along for ages, secure in its place as The World's Largest Damn Wiki, the smaller languages are in intense (though friendly) competition with one another for runner-up positions. "In real life," Youssefsan tells me, "people look for economic growth; here for page growth. Both use 'creative accounting.'"
On the francophone Wikipedia, we have been exposed as the slaves to the comma count that we all are but are ashamed to admit. See: http://fr.wikipedia.org/w/wiki.phtml?title=CULTe&action=edit&old... = 814
So you want to inflate the count then by removing the comma requirement? I don't think that's such a good idea for en.wiki since it further weakens our already weak article definition.
(Those who have trouble with my PGP-signed mail, go to fr.wikipedia.org, look up article 'CULTe', and hit 'Modifier cette page'.)
Yes that's right, people have started adding commas as hidden comments just to increase the stupid comma count. NO MORE, I say! Ils ne passeront pas!
That's not a good thing, especially for small articles. The whole point of the comma count is to exclude small articles, so adding commas in HTML comments is cheating for any language that uses ASCII commas as often as English does. Other languages don't use ASCII commas much, if at all, so the count is worthless for them.
Unless a better count system is proposed, I will replace the comma check with a greater-than-zero-size check within twelve hours.
And what about the people who get the digest after your 12 hour deadline? How about the other people who only check or respond to Wikipedia posts during the week? Shouldn't they have a say in this?
-- Daniel Mayer (aka mav)
WikiKarma The usual at [[March 7]]
On Mon, 2003-03-10 at 06:02, Daniel Mayer wrote:
Now, we are well aware that page-count fever has gripped Wikipedia for some time. The obsession with breaking the 100,000-page barrier on the English Wikipedia stifled any implementation of reforms for fear of reducing the count. Concerns about languages which don't use the ASCII comma character have been shrugged off. Well, today I've seen enough.
What? Why are you blaming the English Wikipedia for this?
I'm rather curious how you came up with that interpretation of partisan wrangling.
If anything *AT ALL*, people on en.wiki wanted a much more conservative count in order to put off hitting 100,000 articles. There was general agreement on that point.
Aha, again demonstrating the obsession over the count. Why was it important to hit or not hit 100,000? Because of an offhand remark made a couple years ago about "we hope to reach 100,000 articles"?
When did this become our holy mission?
Did the messianic age begin when the counter flipped into six digits? Have we all been betrayed by a sinister being who wants to make us look bad by leading us astray and "inflating our count"?
What the *heck* does it matter?
Bad to whom? Embarrassing to whom? Is it solely the use of the word "article" that throws us off? Are we obsessed with proving that our "articles" are so fricking wonderful that every single one of them must be the greatest pinnacle of writing prowess or we must lock it in the basement of shame and never admit its existence?
Go open up a paper encyclopedia sometime. Look at it. A fair chunk of the articles are *one paragraph long*. Do their editors worry themselves over the metric they use to stamp "over 60,000 articles!" on the cover? Or do they just count the number of entries at some point and say "at least this many"?
We could have a separate count that has the more conservative definition
Why? What good is yet another arbitrary number? Why do we want it? What is it for?
So you want to inflate the count then by removing the comma requirement? I don't think that's such a good idea for en.wiki since it further weakens our already weak article definition.
Mav, thanks for proving my point again about count-mania. Are you seriously suggesting that the pseudo-random number spit out on the front page actually *defines* what articles are in a meaningful way?
Incidentally, if I were to change the count right now on the English Wikipedia, we'd get:
the comma count : 109062
for length > 0  : 116199 <-
for length > 500: 90991
The "inflation" would be a meagre 6.5 percent.
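The arithmetic behind that figure, as a quick Python check using the three counts quoted above. The variable and function names are made up for the example.

```python
comma_count = 109062     # current comma count
nonblank_count = 116199  # pages with length > 0
over_500_count = 90991   # pages with length > 500


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


inflation = percent_change(comma_count, nonblank_count)
reduction = percent_change(comma_count, over_500_count)

print(f"non-blank vs comma count: {inflation:+.1f}%")  # +6.5%
print(f"over-500 vs comma count:  {reduction:+.1f}%")
```

Switching to the non-blank check adds 7,137 pages to the English total, while the 500-character cutoff would remove 18,071.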
The whole point of the comma count is to exclude small articles
Why? What's *wrong* with small articles?
Some "articles" *are* small.
Some non-"articles" are very long.
Some genuine "articles" of short to medium length don't contain a comma.
Many "junk pages", lists, and disambs do contain a comma.
Length and commas are just non-starters here. They provide no useful information.
Other languages don't use ASCII commas much if at all so the count is worthless for them.
You speak as though it has worth for English. It does not.
Unless a better count system is proposed, I will replace the comma check with a greater-than-zero-size check within twelve hours.
And what about the people who get the digest after your 12 hour deadline? How about the other people who only check or respond to Wikipedia posts during the week? Shouldn't they have a say in this?
They had their say months ago when no one was able to decide what to do. Do you really think a new consensus is going to come in 24 hours? 48? A week? A month? A year? I think you're sorely mistaken if so. But, please, feel free to prove me wrong.
Tell you what. I'll hold off until Wednesday night. Come up with a consensus on a better system by then, or comma-count shall be replaced with not-blank-count circa 07:00 UTC, 13 March. (11pm on the 12th here in PST.)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Aha, again demonstrating the obsession over the count. Why was it important to hit or not hit 100,000? Because of an offhand remark made a couple years ago about "we hope to reach 100,000 articles"?
When did this become our holy mission?
Well, as it turned out, a press release was created and sent out and we got press coverage about it. For better or worse.
the comma count : 109062
for length > 0  : 116199 <-
for length > 500: 90991
This is interesting.
Tell you what. I'll hold off until Wednesday night. Come up with a consensus on a better system by then, or comma-count shall be replaced with not-blank-count circa 07:00 UTC, 13 March. (11pm on the 12th here in PST.)
How about we wait a bit longer than that, but set a good deadline.
This seems like a good place for us to have an experiment in voting, since it's a relatively minor issue, since there are a variety of opinions, etc.
Erik, will you take the lead in designing a page to present the alternatives and to collate the votes?
(I nominate Erik because he's a strong advocate of voting, and this gives him an opportunity to show how a voting system can shine.)
--Jimbo
Jimmy Wales wrote:
the comma count : 109062
for length > 0  : 116199 <-
for length > 500: 90991
This is interesting.
What would be *really* useful is a "diff" between each two of these, so we could see which ">0" articles don't contain a comma, so we can fix them (well, I know how to do that, but SQL is not everyone's favourite;-)
We'd need to cache these diff pages, though, because of the full-text search :-(
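For those who would rather not write the SQL, the ">0 but no comma" diff can be sketched in Python. This assumes the page texts have already been pulled into a dict of title to text, restricted to non-redirect pages in the article namespace; the function name and sample titles are made up.

```python
def nonblank_without_comma(pages: dict) -> list:
    """Titles that pass the length > 0 check but fail the comma check."""
    return sorted(
        title
        for title, text in pages.items()
        if len(text) > 0 and "," not in text
    )


pages = {
    "Alpha": "First letter of the Greek alphabet, used in physics.",
    "Beta": "Second letter of the Greek alphabet.",
    "Gamma": "",
}
print(nonblank_without_comma(pages))
```

Run once and stored, the resulting list could serve as the cached diff page, since it needs a scan of every page text.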
Magnus
On Mon, 10 Mar 2003, Jimmy Wales wrote:
How about we wait a bit longer than that, but set a good deadline.
This seems like a good place for us to have an experiment in voting, since it's a relatively minor issue, since there are a variety of opinions, etc.
That's rather what I was hoping to avoid by setting an ultimatum. Now we'll never reach a firm decision. :P Very well, have a vote. Perhaps I'll be pleasantly surprised. :)
In the meantime, I'll go ahead and change the count system for the Japanese wiki specifically, as requested.
-- brion vibber (brion @ pobox.com)
WikiKarma: fixed the several-dozen-redundant-checks-of-new-talk-table-for-every-page-view-by-an-anon-user bug.
Brion Vibber wrote:
That's rather what I was hoping to avoid by setting an ultimatum. Now we'll never reach a firm decision. :P Very well, have a vote. Perhaps I'll be pleasantly surprised. :)
O.k., the firm deadline is next Monday, March 17, 2003. Brion has to give up all of his weapons of mass destruc... oh, wait, no, that's a different firm deadline.
We haven't heard from Erik about setting up a wiki page about this, but if he declines to take the lead on vote collation, etc., we'll do it anyway starting in a couple of days.
--Jimbo
On Mon, 10 Mar 2003 04:53:27 -0800, Jimmy Wales jwales@bomis.com wrote:
How about we wait a bit longer than that, but set a good deadline.
March 17th seems to be popular for deadlines at present!
Richard Grevers wrote:
On Mon, 10 Mar 2003 04:53:27 -0800, Jimmy Wales jwales@bomis.com wrote:
How about we wait a bit longer than that, but set a good deadline.
March 17th seems to be popular for deadlines at present!
It's the day when all the beer turns green. :-)
Ec.
This seems like a good place for us to have an experiment in voting, since it's a relatively minor issue, since there are a variety of opinions, etc.
Sure, I'll put something on meta tomorrow. However, without proper software support (i.e. a simple voting interface) the process of collecting votes will always be cumbersome and prone to error.
Regards,
Erik
Erik Moeller wrote:
Sure, I'll put something on meta tomorrow. However, without proper software support (i.e. a simple voting interface) the process of collecting votes will always be cumbersome and prone to error.
Great! I agree. I figure that if we set up a good voting system (Condorcet or similar) and this particular case works out o.k., we can learn from the process and improve it as we go along.
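To make "Condorcet" concrete, here is a minimal Python sketch of finding a Condorcet winner: the option that beats every other option in head-to-head comparisons. It assumes every ballot ranks every option, best first, and it simply returns None when there is a cycle or a tie. Real Condorcet methods differ in how they resolve those cases, and nothing here reflects an actual voting setup for this poll. The option names are placeholders.

```python
from itertools import combinations


def condorcet_winner(ballots, candidates):
    """Return the candidate who wins every pairwise contest, or None."""
    pairwise_wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # A ballot prefers a to b if a appears earlier in its ranking.
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            pairwise_wins[a] += 1
        elif b_over_a > a_over_b:
            pairwise_wins[b] += 1
    for c in candidates:
        if pairwise_wins[c] == len(candidates) - 1:
            return c
    return None


options = ["comma", "nonblank", "over500"]
ballots = (
    [["nonblank", "comma", "over500"]] * 3
    + [["over500", "comma", "nonblank"]] * 2
)
print(condorcet_winner(ballots, options))
```

With these five sample ballots, "nonblank" beats each of the other two options 3 to 2, so it is the Condorcet winner.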
--Jimbo
wikipedia-l@lists.wikimedia.org