[Wikipedia-l] Article count: Vote result

Tue Mar 18 14:19:00 UTC 2003

Hi,

we have a winner. Because we did everything manually, vote counting has been
a bit of work, but now it's done, and even allowing for minor errors, the
results are fairly clear. The new article count system will be based on two
criteria:

1) An article is counted if, trimmed of all trailing whitespace (blanks,
newlines etc.), it is longer than zero bytes (non-empty) AND
2) it contains at least one link.

Redirects, talk pages, user pages and Wikipedia: pages are not counted.
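The two criteria and the exclusions above fit into a short predicate. This is a minimal sketch, not the actual wiki software. It assumes that a "link" is any `[[...]]` wikitext link, that redirect pages begin with `#REDIRECT`, and that the excluded namespaces can be recognised by the title prefixes in `EXCLUDED_PREFIXES`. All names are hypothetical. Note that any text containing a link is necessarily non-empty, so criterion 2 implies criterion 1. The sketch checks both only to mirror the rule as voted.

```python
import re

# Assumption: a "link" is any [[...]] wikitext link.
LINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")

# Assumption: excluded namespaces are recognised by these title prefixes.
EXCLUDED_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia:", "Wikipedia talk:")


def is_redirect(text: str) -> bool:
    # Assumption: redirect pages begin with "#REDIRECT" (any letter case).
    return text.lstrip().upper().startswith("#REDIRECT")


def is_counted(title: str, text: str) -> bool:
    """Return True if a page counts as an article under the voted rule."""
    # Redirects, talk pages, user pages and Wikipedia: pages never count.
    if title.startswith(EXCLUDED_PREFIXES) or is_redirect(text):
        return False
    # Criterion 1: trimmed of trailing whitespace, the text must be non-empty.
    body = text.rstrip()
    if not body:
        return False
    # Criterion 2: the text must contain at least one link.
    return LINK_RE.search(body) is not None


print(is_counted("Paris", "Paris is the capital of [[France]].\n\n"))  # True
print(is_counted("Paris", "Paris is a city."))                        # False
print(is_counted("Talk:Paris", "See [[France]]."))                    # False
```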

The vote consisted of two parts, the main vote on article size and a vote on
further restrictions.

The results for the main vote, ranked by average score with the lowest (best)
first, were:

1) zero bytes:
   39 votes, averaging 3.1795
2) 100 bytes:
   37 votes, averaging 3.4595
3) 20 bytes:
   38 votes, averaging 3.8158
4) 250 bytes:
   37 votes, averaging 4.027
5) 5 bytes:
   36 votes, averaging 4.4167
6) language-dependent:
   33 votes, averaging 4.5758
7) 500 bytes:
   37 votes, averaging 4.7027
8) dynamic (e.g. depending on stub size):
   35 votes, averaging 5.1143

The votes for further restrictions, of which only one was to be picked, were:

1) at least one link:
   36 votes, averaging 3.1667
2) independent system for each wikipedia:
   31 votes, averaging 4.4516
3) stub flag (stubs excluded from count):
   34 votes, averaging 4.5882
4) minimum number of contributors:
   35 votes, averaging 4.6857
5) language-dependent punctuation (comma etc):
   36 votes, averaging 4.7778
6) two paragraphs minimum:
   33 votes, averaging 5
7) <article> tag:
   33 votes, averaging 5.1515
8) no further restriction (tied):
   32 votes, averaging 5.1875
8) minimum number of edits (tied):
   32 votes, averaging 5.1875
9) divide database size by byte size:
   30 votes, averaging 5.4333
10) existing comma requirement:
   38 votes, averaging 5.5789
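Both lists above are ordered by average score, lowest first. A tally of this kind takes only a few lines of Python. This is a minimal sketch. It assumes each ballot maps option names to scores from 1 (very good) to 6 (very bad), and that options a voter skipped are simply absent, which would account for vote counts differing between options. The `tally` function and the sample ballots are hypothetical. Options with exactly equal averages share a rank, and the next option takes the following rank. This matches the two options listed at 8th place followed by a 9th.

```python
from collections import defaultdict


def tally(ballots):
    """Rank options by average score, lowest (best) first.

    ballots: iterable of dicts mapping option name -> score (assumed 1..6,
    lower is better). Returns (rank, option, vote_count, average) tuples;
    exactly equal averages share a rank (dense ranking).
    """
    scores = defaultdict(list)
    for ballot in ballots:
        for option, score in ballot.items():
            scores[option].append(score)

    # int / int division is correctly rounded in Python, so mathematically
    # equal averages produce identical floats and compare equal.
    rows = sorted(
        (sum(s) / len(s), option, len(s)) for option, s in scores.items()
    )

    result, rank, previous = [], 0, None
    for avg, option, count in rows:
        if avg != previous:
            rank += 1
            previous = avg
        result.append((rank, option, count, round(avg, 4)))
    return result


# Hypothetical ballots; "edits" and "none" end up tied.
ballots = [
    {"link": 1, "comma": 6, "none": 4, "edits": 4},
    {"link": 2, "comma": 6, "none": 3},
    {"link": 6, "none": 5},
]
for row in tally(ballots):
    print(row)
```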

I have done my best to avoid errors, and for the first stage I compared my
figures with Tomos' count, but I cannot be certain. While an error is unlikely
to affect the result, pedants may want to double-check just in case. Please
note that votes added after yesterday's deadline should not be counted. I also
did not count the anonymous vote (a score of 6 for the dynamic option).
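The exclusions just described amount to a filter applied before tallying. This is a minimal sketch. It assumes each ballot records the voter's signature (empty for an anonymous vote) and the time the vote was added. The `Ballot` type, `valid_ballots`, and the sample data are hypothetical. The exact deadline timestamp is a placeholder, since the post only says "yesterday".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Ballot:
    voter: Optional[str]    # None (or "") for an anonymous, unsigned vote
    cast_at: datetime       # when the vote was added to the voting page
    scores: Dict[str, int]  # option name -> score


def valid_ballots(ballots: List[Ballot], deadline: datetime) -> List[Ballot]:
    """Drop anonymous ballots and ballots added after the deadline."""
    return [b for b in ballots if b.voter and b.cast_at <= deadline]


# Hypothetical data; the deadline time of day is a placeholder.
deadline = datetime(2003, 3, 17, 23, 59, tzinfo=timezone.utc)
ballots = [
    Ballot("VoterA", datetime(2003, 3, 16, 12, 0, tzinfo=timezone.utc), {"dynamic": 2}),
    Ballot(None, datetime(2003, 3, 16, 13, 0, tzinfo=timezone.utc), {"dynamic": 6}),
    Ballot("VoterB", datetime(2003, 3, 18, 9, 0, tzinfo=timezone.utc), {"dynamic": 1}),
]
print([b.voter for b in valid_ballots(ballots, deadline)])  # ['VoterA']
```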

Analysis of results
===================

Opinion on the main size restriction was divided into two camps: one group
thought simply counting non-blank articles would be enough, while the other
felt that excluding very short articles was also necessary. Consensus between
these two groups was unlikely. The non-blank camp won by a relatively narrow
margin, but those who wanted more restrictions won an important victory in
the second stage of the vote: only articles that include at least one link
are counted, which excludes most newbie experiments.

This, in my opinion, is an almost perfect result that everyone should be
able to live with. It demonstrates that voting systems can arrive at
compromises not just as well as simple discussions, but even better. How
long would we have needed to talk to agree on this solution, and then to
establish that we agreed? My guess is that we would never have arrived at it.

But it is a good solution. There can simply be no valid article in the
Wikipedia system without links: the whole wiki concept depends on a high
degree of interconnectedness. Material copied from somewhere else is not
"wikified", however, and neither are newbie experiments. Such "articles" are
now excluded from the count, as they should be. At the same time, we avoid
choosing an arbitrary byte-size limit that would always remain debatable.
And I do not foresee users adding HTML comments to an article just to get it
counted, as has happened with the comma.

Yet even with this solution, some people feel strongly that it is a bad
one: 8 people voted that counting only articles with at least one link is a
"very bad" idea (11 people thought it a very good idea). Once again, it is
unlikely that these groups would have reached consensus, which is further
evidence against the consensus model. With groups of 30 or more people, there
is simply never going to be "near unanimous consensus" about anything but the
most obvious questions.

But some results came as no surprise: keeping the comma count was almost
universally rejected, and it is debatable whether we should have included
that option in the first place. In second place in the restrictions stage,
though well behind the winner, was the option to let each Wikipedia decide on
its own. While this option clearly lost, its rank shows that many Wikipedians
want "their" Wikipedia to have room for independent decisions. A similar
option should therefore be included in future polls.

Finally, I would like to point out that the process produced a remarkable
number of ideas. Some of them were awful, sure, but some, like the link idea,
had never been mentioned on the mailing list. This, too, demonstrates the
advantages of a formalized brainstorming process.

Analysis of methodology
=======================

What have we learned from the process we used here?

1) The participation rate was very high. Concerns that an option might win
because most voters neglected it were unjustified. HOWEVER, options that were
added late tended to gather significantly fewer votes, as did options that
were apparently written off as unlikely to win.

2) Votes tended toward the extremes, i.e. 6 or 1. For most options, however,
the entire spectrum was used.

3) The system used therefore allowed us to gather a very large amount of
information about the opinions held. It would be difficult, if not
impossible, to gather as much information through a non-formalized process.

4) Deadlines and limits on options must be enforced even more strictly. The
period for proposing options should be longer, and options should be
discussed more carefully.

5) How options combine, and whether voting may need to be split into stages,
should be discussed in advance to avoid ambiguity (e.g. "can more than one
option win?").

6) The voting system used takes some effort to handle manually, but would be
relatively easy to implement in code. Until we have such a software-based
solution, using this system would probably be overkill for small decisions.
In any case, we should try to reach consensus first.
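Point 6 notes that the system would be easy to implement in code. Point 2 describes how the scores were distributed. The sketch below is a hypothetical helper that summarises one option's scores, so the tendency toward the extremes and the use of the whole spectrum can be checked mechanically rather than by hand. It assumes a 1-to-6 scale (1 = very good, 6 = very bad), as the extremes in point 2 suggest. `score_profile` and the sample scores are illustrative only.

```python
from collections import Counter
from typing import List, Tuple

SCALE = range(1, 7)  # assumed scale: 1 (very good) to 6 (very bad)


def score_profile(scores: List[int]) -> Tuple[List[int], float, bool]:
    """Summarise one option's scores.

    Returns the histogram over the scale, the share of extreme votes
    (1 or 6), and whether every point on the scale was used at least once.
    """
    if not scores:
        raise ValueError("an option needs at least one vote")
    counts = Counter(scores)
    histogram = [counts[value] for value in SCALE]
    extreme_share = (counts[1] + counts[6]) / len(scores)
    full_spectrum = all(counts[value] > 0 for value in SCALE)
    return histogram, extreme_share, full_spectrum


# Hypothetical scores for a single option.
print(score_profile([1, 1, 6, 6, 6, 3, 2]))
```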

These are my thoughts for now -- please add yours.

Regards,

Erik