At the moment there are automatic classification algorithms that do and do not use grammars and vocabularies, with and without self-learning. Some of those systems are very advanced, and some of them have been around for several years. There have been discussions about purely statistical systems versus purely lexical systems, whether such systems should be fully automatic or not, and whether they produce biased results or not. The short version of the results is that fully statistical systems give poor results and that they should include some form of lexical analysis at some level. Guided systems become very work-intensive and should be avoided if possible.
For examples of what automatic text analysis systems can do, see http://www1.cs.columbia.edu/nlp/projects.cgi
For advanced automatic classification engines, see http://www.autonomy.com/content/home/index.en.html and http://www.cyberwatcher.com/
If lexical analysis is not handled in some way or another, someone _will_ include it, and if someone runs a simulation and finds that the system can be improved by tweaking it, there _will_ be questions about why it hasn't been done before. There is other work on vandalism going on at the moment, and neglecting it would also be questionable. "Exploring the feasibility of automatic rating online article quality" and "Creating, destroying and restoring value in Wikipedia" are only two such papers.
I think it is wise to accept that the proposed system is a solution to a subproblem, and that an overall system would be considerably more advanced than this one. I am not saying it does not work, and I am not saying it should not be tested; I am saying it is only a small part of the overall solution. As such, it should be built so that it does not block further refinements. It should not be viewed as a final solution, and there should definitely be no claims that alternative systems do not work unless such claims are backed with hard evidence.
If someone wants to present this particular system as the ultimate one, good luck! I don't think it is the ultimate system. Still, I do think it can be a very good tool if used as what it is: a solution to one of several subproblems.
John E Blad
Daniel Arnold wrote:
On Saturday, 22 December 2007 19:49:10, John Erling Blad wrote:
Whether or not you base trust metrics on misspellings is a choice, but not correcting for misspellings will give a suboptimal solution. It is important to note that you do this, and how it changes the system. I have no doubt that trust metrics will incorporate this as an option in the future, no matter whether it is part of an official system or not. Likewise, I believe they will incorporate systems for weighting cooperation between users and the overall quality of articles. There is no easy single solution to this; the solution is a complex, connected multivariate system.
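To make concrete what "correcting for misspellings" could mean for a metric that counts changed words between two revisions, here is a minimal sketch. It is not part of any proposed system; the function names, the word-level comparison, and the 0.8 similarity cutoff are assumptions chosen only for illustration.

```python
import difflib


def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def changed_words(old, new, tolerate_spelling=False, cutoff=0.8):
    """Count words in `new` that have no counterpart in `old`.

    With tolerate_spelling=True, a word that closely resembles some word
    in the old revision is treated as a spelling fix, not as new content.
    The cutoff is an illustrative assumption, not a tuned value.
    """
    old_words = tokens(old)
    changed = 0
    for word in tokens(new):
        if word in old_words:
            continue
        if tolerate_spelling and difflib.get_close_matches(
            word, old_words, n=1, cutoff=cutoff
        ):
            continue
        changed += 1
    return changed


old_rev = "the goverment changed its ortography"
new_rev = "the government changed its orthography"

# The same edit is counted differently depending on the option.
print(changed_words(old_rev, new_rev))
print(changed_words(old_rev, new_rev, tolerate_spelling=True))
```

Whichever way the option is set, the result differs, which is why it matters to state which variant a given trust metric uses.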
An automated system (regardless of which one) should never care about spelling:
a) Many citations are in outdated or non-standard orthography. This is especially true for German, whose orthography was changed again just a few years ago. A system that gives an incentive to tamper with citations is bad.
b) There are assistive systems integrated into the browser (Konqueror has had spell checking for many years, and Firefox now has it as well). Furthermore, there is a Toolserver + JavaScript based solution that highlights probably misspelled words when reading an article (currently only for German, but it could be adapted for other languages): http://de.wikipedia.org/wiki/Wikipedia:Helferlein/Rechtschreibpr%C3%BCfung (see http://de.wikipedia.org/wiki/Bild:Rp_js_beispiel.png and http://de.wikipedia.org/wiki/MediaWiki:Gadget-Rechtschreibpruefung.js). A more advanced external tool is http://rupp.de/cgi-bin/WP-autoreview.pl. These tools are optionally integrated into the Wikipedia interface via the gadgets extension.
So if you make it obvious to editors that there is something they should check, they will very likely change it. If someone wrote a text with bad orthography, someone else gains reputation for spell-checking it, and since that person did a review, it is absolutely right that the text gets more trust afterwards. (I know you will come up with examples of rubbish text that was corrected to the right spelling, but there are nonsense texts with and without bad spelling.)
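The concern in point a) can be illustrated with a small sketch of a highlighter that skips quoted material before flagging words, so that citations in old orthography are never marked. This is not how the tools linked above work; the quote patterns, the function names, and the tiny word list are assumptions made only for this example.

```python
import re

# Assumption: citations appear in double quotes or <blockquote> tags.
QUOTED = re.compile(r'"[^"]*"|<blockquote>.*?</blockquote>', re.DOTALL)


def checkable_text(wikitext):
    """Blank out quoted spans so they are never spell-checked."""
    return QUOTED.sub(" ", wikitext)


def suspicious_words(wikitext, known_words):
    """Return words outside quotations that are not in the word list."""
    words = re.findall(r"[^\W\d_]+", checkable_text(wikitext).lower())
    return [w for w in words if w not in known_words]


# Hypothetical word list; a real checker would use a full dictionary.
known = {"he", "wrote", "in", "the", "letter"}
sample = 'He wrote "daß es so sey" in the leter.'

# Only words outside the quotation are candidates for highlighting.
print(suspicious_words(sample, known))
```

A highlighter built this way only points editors at text they are free to correct and leaves the quoted wording alone.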
Arnomane
Wikiquality-l mailing list Wikiquality-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikiquality-l