Brion Vibber wrote:
David Gerard wrote:
That's because the maps are marked "ALPHA QUALITY - NOT FINAL PRODUCTION VERSIONS" and are written in pencil. Never mind they're in the top-40 maps in the world - the project has peaked way too early.
Damn, we need the rating feature. What's holding it up right now? A list, please, referring to the current version of the code. (I know the servers are creaking ...)
Keep in mind that the cute little survey thing doesn't have anything to do with a system for marking particular page revisions as 'public-ready' or 'approved' in any useful way.
That's an entirely unrelated and separate issue (and one the project actually needs), which Special:Validate (which might, hypothetically, produce 'interesting data' of some sort to someone) does nothing to help directly.
-- brion vibber (brion @ pobox.com)
I disagree. I think it is a mistake to dismiss the article rating system _a priori_ as not being useful in the effort to create Wikipedia 1.0. This certainly is not the opinion of many participants on this list.
In my opinion, a Nupedia-style review procedure for every single article cannot possibly scale to rating almost a million articles in any reasonable timescale; the painfully slow featured article process and the conflict-ridden AfD process are good examples of the limitations of formal article-rating systems.
However, collective labelling systems certainly can work; look at the category system, for example, which is almost magically self-organizing, or Google, which uses simple link graph adjacency to do much of its work.
An article rating system can sort the possible wheat from the obvious chaff. A formal review system could then be applied to those articles whose ratings show a significant spread, or which an editor or external source has explicitly flagged for significant inaccuracy, insufficiency or bias (category tagging could certainly be a useful way to indicate the latter).
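To make the triage idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 1-5 rating scale, the thresholds, the bucket names and the hypothetical `triage` function are not part of any existing ratings code. It uses the population standard deviation as the measure of "spread" and treats an explicit dispute flag (e.g. a category tag) as an automatic referral to formal review.

```python
from statistics import mean, pstdev

# Hypothetical thresholds; real values would have to be tuned from the data.
MIN_RATINGS = 5          # fewer ratings than this: not enough signal yet
SPREAD_THRESHOLD = 1.0   # population std. dev. on an assumed 1-5 scale
LOW_MEAN_THRESHOLD = 2.0 # consistently low average: likely chaff


def triage(ratings_by_article, disputed):
    """Split rated articles into buckets for the formal review process.

    ratings_by_article: dict mapping article title -> list of int ratings (1-5)
    disputed: set of titles explicitly flagged as inaccurate or biased
    Only titles present in ratings_by_article are considered.
    """
    buckets = {"review": [], "chaff": [], "candidate": [], "unrated": []}
    for title, ratings in ratings_by_article.items():
        if title in disputed:
            buckets["review"].append(title)      # explicit complaint: humans look
        elif len(ratings) < MIN_RATINGS:
            buckets["unrated"].append(title)     # too few ratings to judge
        elif pstdev(ratings) > SPREAD_THRESHOLD:
            buckets["review"].append(title)      # raters disagree: humans look
        elif mean(ratings) < LOW_MEAN_THRESHOLD:
            buckets["chaff"].append(title)       # raters agree it is poor
        else:
            buckets["candidate"].append(title)   # raters agree it is decent
    return buckets
```

Only the "review" bucket goes to the expensive formal process; "candidate" articles are the wheat, and "chaff" articles are obvious targets for ordinary editing rather than review.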
That way, the formal review process can be concentrated where it is most needed.
If there are performance problems with writing to and reading from the database, why not, during the initial blind rating period, simply write the ratings to syslog over UDP, and allow (possibly anonymized) versions of these flat-text logfiles to be downloaded by whoever is interested in analyzing the results? Comma-separated values (CSV) files with one line per entry would do just fine, and the occasional rating lost to a dropped packet would matter very little, provided that the losses are uncorrelated with the contents of the ratings.
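A minimal Python sketch of the logging side, assuming a syslog daemon listening on UDP port 514 and routing facility `local0` to a file. The CSV column layout, the salt, and the helper names `anonymize`, `rating_to_csv` and `make_rating_logger` are hypothetical. Anonymization here means replacing each user name with a salted HMAC, so published logs still let analysts group ratings by rater without revealing who the rater is. The standard library's `logging.handlers.SysLogHandler` sends over UDP by default.

```python
import csv
import hashlib
import hmac
import io
import logging
import logging.handlers

# Hypothetical secret salt, kept on the servers: without it, nobody can hash
# candidate user names and match them against the published pseudonyms.
SALT = b"replace-with-a-secret-salt"


def anonymize(user: str) -> str:
    """Replace a user name with a stable, salted pseudonym."""
    return hmac.new(SALT, user.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def rating_to_csv(timestamp: int, page_id: int, rev_id: int, user: str, score: int) -> str:
    """Format one rating as a single CSV line (no trailing newline)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(
        [timestamp, page_id, rev_id, anonymize(user), score])
    return buf.getvalue()


def make_rating_logger(host: str = "localhost", port: int = 514) -> logging.Logger:
    """Logger that fires each rating at a syslog daemon as one UDP datagram.

    Call once per process: getLogger returns the same object every time,
    so repeated calls would attach duplicate handlers.
    """
    logger = logging.getLogger("ratings")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # SysLogHandler defaults to socktype=SOCK_DGRAM (UDP). Send errors go to
    # handleError() instead of being raised, so a lost packet never breaks
    # the page request that produced the rating.
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0)
    logger.addHandler(handler)
    return logger
```

Usage would be one call per rating, e.g. `ratings_log.info(rating_to_csv(ts, page_id, rev_id, user_name, score))`; no database write is involved, and the resulting log file is already the downloadable CSV.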
This would add very little server load, require minimal rewriting of the ratings code, and could be made more scalable very quickly by, for example, running multiple logging processes on multiple servers and sending each rating to one of them at random.
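The random fan-out can be sketched in a few lines of Python. The collector list and the `RandomCollectorSender` class are hypothetical; the sketch prefixes each datagram with a bare BSD-style syslog priority field (`<134>` = facility local0 (16) × 8 + severity info (6)) and otherwise sends the CSV line as-is. Choosing a collector uniformly at random means each collector sees an unbiased sample of the rating stream, so losing one collector thins the data without skewing it.

```python
import random
import socket

# Syslog PRI value for facility local0 (16) and severity info (6): 16 * 8 + 6.
PRI = b"<134>"


class RandomCollectorSender:
    """Fire-and-forget UDP sender spreading rating lines over several collectors.

    collectors: list of (host, port) pairs, e.g. syslog receivers on
    different machines (hypothetical; no hosts are specified in the thread).
    """

    def __init__(self, collectors, rng=None):
        if not collectors:
            raise ValueError("need at least one collector")
        self.collectors = list(collectors)
        self.rng = rng or random.Random()
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, line: str) -> None:
        # Pick one collector uniformly at random for this rating.
        target = self.rng.choice(self.collectors)
        try:
            self.sock.sendto(PRI + line.encode("utf-8"), target)
        except OSError:
            pass  # a dropped rating is acceptable; never fail the web request

    def close(self) -> None:
        self.sock.close()
```

Adding capacity is then just a matter of appending another `(host, port)` pair to the list; no coordination between collectors is needed, because the analysis simply concatenates their logs.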
I'm sure that a great many interested parties would be more than glad to data-mine the results for us. As David Gerard has remarked, most of the obvious objections to rating schemes, such as trolling or rating-advocacy campaigns by an article's supporters or opponents, can be addressed quite easily with simple statistical analysis techniques (for example, see the analysis techniques used to mine the SETI@home data after it has been number-crunched).
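As one example of the kind of simple statistics meant here (a generic technique, not the SETI@home method), a Python sketch that resists ballot-stuffing: count only each rater's latest rating of a page, and report the median rather than the mean, so a small group of extreme raters cannot drag the score far. The row layout, the `robust_scores` name and the `min_raters` cut-off are hypothetical, and the per-rater deduplication assumes stable pseudonyms such as salted hashes of user names.

```python
from statistics import median


def robust_scores(rows, min_raters=5):
    """Per-page score that resists repeat voting and trolling.

    rows: iterable of (timestamp, page_id, rater, score) tuples, as parsed
    from the CSV logs. Each rater counts once per page (latest rating wins),
    and pages with fewer than min_raters distinct raters are omitted.
    """
    latest = {}  # (page_id, rater) -> (timestamp, score)
    for ts, page, rater, score in rows:
        key = (page, rater)
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, score)

    per_page = {}
    for (page, _rater), (_ts, score) in latest.items():
        per_page.setdefault(page, []).append(score)

    # The median ignores a minority of extreme ratings at either end.
    return {page: median(scores)
            for page, scores in per_page.items()
            if len(scores) >= min_raters}
```

In this sketch, a single account that rates a page "1" fifty times contributes exactly one vote, and that vote barely moves the median. Campaigns spread across many accounts need more than this, for instance looking for bursts of similar ratings from newly seen raters, but the input is the same flat log.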
Incidentally, a very similar technique could be used to analyze server traffic; just syslog an entry (to a different log, of course) for one in every 100, or one in every 1000, server hits.
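A minimal Python sketch of the sampling decision, with the hypothetical class name `HitSampler`. Instead of a strict every-Nth counter, it draws an independent 1-in-N coin flip per hit; that swap is a design choice of this sketch, made so the sample cannot fall into step with periodic traffic patterns. Multiplying sampled counts by N gives an estimate of the true totals.

```python
import random


class HitSampler:
    """Decide, per server hit, whether to write a traffic log entry (about 1 in n)."""

    def __init__(self, n: int, rng=None):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.rng = rng or random.Random()

    def should_log(self) -> bool:
        # Independent Bernoulli(1/n) draw for each hit.
        return self.rng.random() < 1.0 / self.n

    def scale(self, sampled_count: int) -> int:
        # Estimate the true number of hits from the number sampled.
        return sampled_count * self.n
```

A request handler would call `should_log()` once per hit and only then format and send a line to the traffic log, so the cost for the 99 (or 999) unsampled hits is a single random-number draw.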
-- Neil