[WikiEN-l] Re: Status of article rating feature?
Neil Harris
usenet at tonal.clara.co.uk
Wed Nov 16 14:34:44 UTC 2005
Brion Vibber wrote:
> David Gerard wrote:
>
>> That's because the maps are marked "ALPHA QUALITY - NOT FINAL
>> PRODUCTION VERSIONS" and are written in pencil. Never mind they're in
>> the top-40 maps in the world - the project has peaked way too early.
>>
>> Damn we need the rating feature. What's holding it up right now? List
>> please, referring to current version of code. (I know the servers are
>> creaking ...)
>>
>
> Keep in mind that the cute little survey thing doesn't have anything to
> do with a system for marking particular page revisions as 'public-ready'
> or 'approved' in any useful way.
>
> That's an entirely unrelated and separate issue (and one the project
> actually needs), which Special:Validate (which might, hypothetically,
> produce 'interesting data' of some sort to some one) does nothing to
> help directly.
>
> -- brion vibber (brion @ pobox.com)
>
>
I disagree. I think it is a mistake to dismiss the article rating system
_a priori_ as not being useful in the effort to create Wikipedia 1.0.
This certainly is not the opinion of many participants on this list.
In my opinion, a Nupedia-style review procedure for every single article
clearly can't scale to rate almost a million articles in any reasonable
timescale; the painfully slow featured article process and the
conflict-ridden AfD process are good examples of the limitations of
formal article-rating systems.
However, collective labelling systems certainly can work; look at the
category system, for example, which is almost magically self-organizing,
or Google, which uses simple link graph adjacency to do much of its work.
An article rating system can sort the possible wheat from the obvious
chaff, with a formal review system then being applied to those articles
where there is either a significant spread of article ratings, or where
an editor or external source has explicitly asserted significant
inaccuracy, insufficiency or bias in an article (and category tagging
could certainly be a useful way to indicate this).
By doing this, the formal review process can be applied in the places
it's most needed.
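To make the triage idea concrete, here is a minimal sketch of how ratings might route an article. The `ArticleRatings` structure, the 1-5 rating scale and every threshold are hypothetical, chosen only for illustration: explicit problem tags or a large spread of ratings send an article to formal review, consistently low ratings mark it as chaff, and the rest become wheat candidates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class ArticleRatings:
    """Ratings collected for one article (hypothetical structure, 1-5 scale)."""
    title: str
    ratings: list[float]
    # e.g. category tags asserting inaccuracy, insufficiency or bias
    problem_tags: set[str] = field(default_factory=set)


def triage(article: ArticleRatings,
           spread_threshold: float = 1.0,   # illustrative guess
           chaff_threshold: float = 2.0,    # illustrative guess
           min_ratings: int = 3) -> str:
    """Decide where an article goes next; thresholds are assumptions, not policy."""
    # An explicit claim of a problem always goes to formal review.
    if article.problem_tags:
        return "formal review"
    # Too few ratings to say anything yet.
    if len(article.ratings) < min_ratings:
        return "needs more ratings"
    # Raters disagree strongly, so a human should look at it.
    if pstdev(article.ratings) >= spread_threshold:
        return "formal review"
    # Raters agree the article is poor: obvious chaff.
    if mean(article.ratings) < chaff_threshold:
        return "chaff"
    return "wheat candidate"
```

For example, `triage(ArticleRatings("X", [1, 5, 1, 5]))` lands in formal review because the population standard deviation of those ratings is 2, while `[4, 4, 5, 4]` agrees closely enough to pass as a wheat candidate.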
If there are performance problems with writing to and reading from the
database, why not, during the initial blind rating period, simply write
the ratings using syslog over UDP, and allow (possibly anonymized)
versions of these flat-text logfiles to be downloaded by whoever is
interested in analyzing the results? Comma-separated values (CSV) files
with one line per entry would do just fine, and the occasional loss of a
rating due to packet loss would matter very little, provided that it is
uncorrelated with the contents of the ratings.
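A rough sketch of the logging side in Python, assuming the standard library's `logging.handlers.SysLogHandler` (which uses UDP by default) and a hypothetical CSV layout of timestamp, article, revision, anonymized user and rating. Anonymization here is a salted SHA-256 hash with a secret salt kept on the server, so that published tokens can't simply be checked against a list of usernames; the field order, salt handling and logger name are all assumptions.

```python
import csv
import hashlib
import io
import logging
import logging.handlers
import socket


def anonymize(user: str, salt: bytes) -> str:
    """Salted SHA-256, truncated: the same user maps to the same token for a given salt."""
    return hashlib.sha256(salt + user.encode("utf-8")).hexdigest()[:16]


def format_rating_line(timestamp: str, article: str, revision: int,
                       user: str, rating: int, salt: bytes) -> str:
    """One CSV line per rating; the csv module quotes titles containing commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(
        [timestamp, article, revision, anonymize(user, salt), rating])
    return buf.getvalue()


def make_rating_logger(host: str = "localhost", port: int = 514) -> logging.Logger:
    """Logger that ships each rating line to a syslog daemon over UDP (fire-and-forget)."""
    logger = logging.getLogger("ratings")  # hypothetical logger name
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.handlers.SysLogHandler(
            address=(host, port), socktype=socket.SOCK_DGRAM)
        handler.setFormatter(logging.Formatter("ratings: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage would be along the lines of `make_rating_logger().info(format_rating_line("2005-11-16T14:34:44Z", "Paris, France", 12345, "ExampleUser", 4, SECRET_SALT))`, where `SECRET_SALT` is a hypothetical server-side secret. A title such as "Paris, France" comes out quoted, so the published files stay parseable by any CSV reader.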
This would require very little extra server load and minimal rewriting
of the ratings code, and could quickly be made more scalable, for
example by running multiple logging processes on multiple servers and
sending each rating to one of them chosen at random.
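The random fan-out can be sketched in a few lines. The collector hostnames below are hypothetical placeholders, and the sketch sends the CSV line as a bare UDP datagram for simplicity (a real deployment would more likely keep syslog's message format); each collector could be a syslog daemon or a tiny UDP listener.

```python
import random
import socket
from typing import Sequence, Tuple

Address = Tuple[str, int]

# Hypothetical collector hosts; each would run its own logging process.
COLLECTORS: Sequence[Address] = [
    ("log1.example.org", 514),
    ("log2.example.org", 514),
    ("log3.example.org", 514),
]


def pick_collector(collectors: Sequence[Address], rng: random.Random) -> Address:
    """Choose a collector uniformly at random so the load spreads evenly."""
    return rng.choice(collectors)


def send_rating(line: str, collectors: Sequence[Address],
                rng: random.Random, sock: socket.socket) -> Address:
    """Fire-and-forget one rating line over UDP; a lost datagram is just a lost rating."""
    addr = pick_collector(collectors, rng)
    sock.sendto(line.encode("utf-8"), addr)
    return addr
```

Because every rating goes to exactly one collector, the analysis step only has to concatenate the collectors' files, and adding capacity means adding another entry to the list.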
I'm sure that a great many interested parties would be more than glad to
data-mine the results for us. As David Gerard has remarked, most of the
obvious objections to rating schemes, such as trolling or rating
advocacy campaigns by an article's supporters or opponents, can be
addressed quite easily with simple statistical analysis techniques (see,
for example, the techniques used to mine the SETI@home data after it has
been number-crunched).
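As one illustration of the kind of simple analysis meant here (not the SETI@home method, and not a proposal for a specific policy), two cheap defences are to keep only each user's most recent rating, which neutralizes one person voting repeatedly, and to aggregate with a trimmed mean, which limits the pull of extreme ratings. The event layout `(timestamp, user_token, rating)` and the 20% trim fraction are assumptions.

```python
from typing import Iterable, List, Tuple

# (ISO-8601 timestamp in one timezone, anonymized user token, rating); hypothetical layout
Event = Tuple[str, str, float]


def latest_per_user(events: Iterable[Event]) -> List[float]:
    """Collapse repeated votes: keep only each user's most recent rating."""
    latest = {}
    # ISO-8601 strings in a single timezone sort chronologically.
    for _timestamp, user, rating in sorted(events):
        latest[user] = rating  # later timestamps overwrite earlier ones
    return list(latest.values())


def trimmed_mean(values: List[float], trim: float = 0.2) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    if not values:
        raise ValueError("no ratings")
    xs = sorted(values)
    k = int(len(xs) * trim)
    kept = xs[k:len(xs) - k] if len(xs) > 2 * k else xs
    return sum(kept) / len(kept)
```

With one user rating an article 1 ten times and eight other users giving 4s and 5s, the naive mean over all 18 events is about 2.44; after collapsing to one vote per user and trimming 20% from each end, the result is 29/7, about 4.14. Coordinated campaigns by many distinct accounts need more than this (for instance, looking at rating bursts over time), but the point is that the raw logs make such analysis possible.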
Incidentally, a very similar technique could be used to analyze server
traffic; just syslog an entry (to a different log, of course) for one in
every 100, or one in every 1000, server hits.
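Sampled traffic logging could be as simple as the sketch below, assuming each server decides independently per hit with probability 1/N (so no coordination between servers is needed) and that totals are then estimated by scaling the sampled count back up by N. The function names are hypothetical.

```python
import random


def should_log(rng: random.Random, one_in: int) -> bool:
    """Log this hit with probability 1/one_in, e.g. one_in=100 or 1000."""
    return rng.random() < 1.0 / one_in


def estimate_total(sampled_count: int, one_in: int) -> int:
    """Unbiased estimate of total hits from the sampled count."""
    return sampled_count * one_in
```

Over 100,000 hits at one in 100 you would expect roughly 1,000 log entries, and multiplying the count you actually see by 100 recovers an estimate of the true traffic.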
-- Neil