One thing that stood out for me in the small sample of articles I
examined was the flagging of innocuous changes by casual users to
correct spelling, grammar, etc. Thus a "nice-to-have" would be a
"smoothing" algorithm that ignores inconsequential changes such as
spelling corrections, etc. or the reordering of semantically-contained
units of text (for example, reordering the line items in a list w/o
changing the content of any particular line item, etc., or the
reordering of paragraphs and perhaps even sentences). I think this
would cover 90% or more of the changes that are immaterial to an
article's content.
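The reordering case could be sketched as follows. This is a minimal illustration, assuming whole lines stand in for the self-contained units (list items, paragraphs); a real implementation would need to split on actual paragraph boundaries and cope with wiki markup. The function name is my own.

```python
def is_pure_reordering(old_text: str, new_text: str) -> bool:
    """True if the edit only rearranges self-contained units
    (here, non-empty lines) without changing any unit's content."""
    old_units = [u.strip() for u in old_text.splitlines() if u.strip()]
    new_units = [u.strip() for u in new_text.splitlines() if u.strip()]
    # Same multiset of units, but in a different order.
    return sorted(old_units) == sorted(new_units) and old_units != new_units
```

A smoother could skip flagging any edit for which this returns True, since no unit's content changed.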
> Date: Fri, 21 Dec 2007 10:34:47 -0800
> From: "Luca de Alfaro" <luca(a)dealfaro.org>
> If you want to pick out the malicious changes, you need to also flag
> small changes such as:
> "Sen. Hillary Clinton did *not* vote in favor of war in Iraq"
> "John Doe, born in *1947*"
> The ** indicates changes.
Yes, and I did not mean to include cases such as this, which involve
the insertion of a few words that could radically alter the semantic
content of a unit of text. But legitimate spelling corrections (which
can be easily determined using any of the various spell-checker
databases to determine the set of common misspellings for a word) do
not. In short, I cannot imagine a case where someone changing
"Senater Clinton" to "Senator Clinton" could involve vandalism (the
"smoother" algorithm should of course also take into account that if a
"misspelling" appears repeatedly in an article, or even better,
related subject articles by different authors, it is probably a valid
technical term or a proper name). I also cannot imagine how moving a
large block of relatively self-contained text (i.e. a paragraph, since
even parsing at the level of sentences is problematic given all the
uses for the period '.') without modifying its interior could have any
large semantic repercussions (readability is, of course, a matter for
a different discussion ;-)
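The spelling-fix heuristic above could be sketched like this. The function name and threshold are my own assumptions, and difflib's similarity ratio stands in for a lookup against a real spell-checker database of common misspellings; the repeated-occurrence check implements the idea that a recurring "misspelling" is probably a technical term or proper name.

```python
from difflib import SequenceMatcher

def looks_like_spelling_fix(old_word, new_word, article_text, min_ratio=0.75):
    """Heuristic: a word-level change is a likely spelling fix if the
    old form is not a recurring token in the article and the two words
    are nearly identical."""
    # Recurring "misspellings" are probably valid terms or proper names.
    if article_text.split().count(old_word) > 1:
        return False
    # Require the two words to be nearly identical strings.
    return SequenceMatcher(None, old_word, new_word).ratio() >= min_ratio
```

For example, "Senater" -> "Senator" in an article where "Senater" appears only once would pass, while the same change in an article using "Senater" repeatedly would not.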
Again, these are mainly quibbles, but for the articles I sampled it
was quite annoying to have my eye repeatedly drawn to a single orange
word that represented nothing more than a minor, good-faith
correction. And overall the system seems to work well!
Is it possible to get some of English Wikibooks up as an experiment? If you
recall, I was concerned about how our slow editing rate and small editor
community would impact the utility of this implicit system of rating. I'd
also like to see how trust varies across modules of a book.
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole
English Wikipedia, as of its February 6, 2007 snapshot, colored according to
text trust.
This is the first time that even we can look at how the "trust coloring"
looks on the whole of the Wikipedia!
We would be very interested in feedback (the
wikiquality-l(a)lists.wikimedia.org mailing list is the best place).
If you find bugs, you can email us at
PS: yes, we know, some images look off. It is currently fairly difficult
for a site outside of the Wikipedia to fetch Wikipedia images correctly.
PPS: there are going to be a few planned power outages on our campus in the
next days, so if the demo is off, try again later.
The following brief was published yesterday in 20minutos
(www.20minutos.es), a free newspaper in Spain, in Spanish.
I can send you the original if you want.
The numbers are quite interesting. Does anyone know more about that study?
=What you need to know about...
==Many consult this site, but few venture to add new content==
The best online reference work, made in a disinterested way by
internet users, was born in its current identity on 15 January 2001.
Since then, more than 2 million articles have been created in its
English version, and more than 300,000 in the Spanish one. It is
currently available in 253 languages. Anyone can modify the articles
or create new ones, but according to an investigation by the
University of Minnesota, only a small percentage of those who visit it
'work' on it. Another curious fact is that a tiny percentage of the
users (1%) are responsible for half of the edits.
The fact that few people really feed the contents does not detract
from its quality. The same study points out that the probability that
a user arrives at an imprecise article or a vandalized one is very
small, only about 0.0037%. Moreover, 40% of the malicious changes are
reverted before the article is read by two different users.
The Wikimedia Quality portal <http://quality.wikimedia.org/> is being
mentioned more and more lately. That being said, we should probably
work to get more translations made of it so that more users will be
able to read and understand it.
I have made a list of what translations should be done at
However, translations in languages other than those on that page are
more than welcome. Currently, the following languages are especially
needed:
* Japanese - ja - 日本語
* Arabic - ar - العربية
* Bengali - bn - বাংলা
* Hindi - hi - हिन्दी
* Indonesian - id - Bahasa Indonesia
* Dutch - nl - Nederlands
If you have any questions about the meanings of the words used, feel
free to e-mail me or the list. :-)
Thanks in advance for any translations you can do or help with!
Note: This e-mail address is used for mailing lists. Personal emails sent to
this address will probably get lost.
Bah, I meant to send this here, not to just one person...
OK, in order to talk about pros vs. cons, we need to consider the uses first. Some main tasks are:
1) Selecting an unvandalized version (for AT, this will do "worse part" checks and such)
2) Selecting a quality, fact-checked version (German Wikipedia wants this)
3) Selecting a consensus version/marking featured pages
4) Selecting the best version and displaying it by *default*

Selecting an unvandalized version

Flagged Revisions (pros):
1) Templates and images are part of the review process, so vandalism to them will not show for reviewed pages
2) Users with review rights get the gratification of setting the latest unvandalized/"sighted" version
3) New users can look forward to getting these rights in short order, after being considered trusted
4) Edits by reviewers can be autoreviewed if they are to a page where the stable and current are synced
5) If 4 above is not possible, a diff of the changes to the stable version is shown to reviewers after the edit, with a review form with the tags preselected. It shouldn't take long at all to glance over the changes and click "review".

Flagged Revisions (cons):
1) Initial review takes noticeable time for non-stub pages
2) Revisions can fall out of date if not maintained, so people clicking to see a stable version may get a really old one. This is integrated with the RC patrolling system and with the autoreviewing/quick diffs to help, but it is still a possibility.

Article Trust (pros):
1) No workload added, all automatic. This is very nice.
2) Fast and fluid, since calculations for the sighted version are done on every edit without anyone having to do anything
3) Accounts for consensus, so no rogue reviewer can easily flag garbage. Still, a "trusted" user can go rogue and add garbage, even in several edits to bump the trust.
Article Trust (cons):
1) Template and image vandalism is still a problem
2) Bot and AWB edits flying through pages automatically make the trust of large chunks of text increase
3) No direct control over it by anyone -> incentive loss

Selecting a quality, fact-checked version

Flagged Revisions (pros):
1) Trusted users, who have some respect for consensus as well, can directly mark off solid revisions

Flagged Revisions (cons):
1) If a user goes rogue, they can flag garbage. Not as bad as rogue admins, and likely rare, but something to think about...
2) A rogue-ish user may ignore consensus and reasonable fact disputes. This could result in a small user group or cabal having a monopoly over the "best version". Good policy standards and respect should be enforced to avoid this.

Article Trust (pros):
1) Nearly all "white" pages have a good chance of being reasonably accurate
2) No work required
3) Harder to form cabals/monopolies over the "best version"

Article Trust (cons):
1) Again, bots and such
2) You cannot edit anything without vouching for it (bad for fixing typos) and the text around it. Either people get afraid to edit or dubious text gets more and more "trusted"
3) No one is necessarily committed to having fact-checked anything

Selecting a consensus version/marking featured pages

Flagged Revisions (pros):
1) Trusted reviewers (higher flagging rights than normal reviewers), like bureaucrats, look at debates and see if a consensus for a community-selected version exists
2) As long as the trusted reviewer acts like most "bureaucrats" on Wikipedia and just measures consensus, it is not easy to game

Flagged Revisions (cons):
1) Rogue trusted reviewer...blah...could end up at arbcom
2) Not automatic...I mean, look at how slow WP:FA stuff is...
Article Trust (pros):
1) It would take a lot of users trying to edit war to push the "consensus version" around, since it is automatic, and that would just add a bunch of red text to the current revision, which would cause it not to be selected.
2) Automatic; accounts for all edits, not just those "voting" on some talk page
3) Generally waaay faster

Article Trust (cons):
1) If there is consensus for a version clearly demonstrated on a talk page, a small group of editors can still edit war over it and drop the trust. It would be nice for some trusted reviewer to be able to see this and expediently flag it.

Selecting the best version and displaying it by *default*

Flagged Revisions (pros):
1) Again, template/image vandalism won't be such a problem, since those are set for each reviewed revision based on how it was when reviewed.
2) For bios of living people, we can easily and immediately set the stable version without having to fiddle around getting it autotrusted.
3) The incentive issue again: reviewers can set this, and editors can look forward to becoming reviewers.

Flagged Revisions (cons):
1) Rogue trusted reviewer...blah...could end up at arbcom
2) Spelling/grammar errors can get stuck if no one is around to review corrections (though reviewers' spelling fixes could be autoreviewed sometimes)

Article Trust (pros):
1) The default is BY FAR the most important revision, so giving direct control over it gives incentives to form evil cabals :)
2) As this is important, it helps to stay up to date with the workload; this requires none...so that's pretty easy...

Article Trust (cons):
1) Spelling/grammar errors can get stuck, since it's hard to directly control
2) The "highest least trusted" and "max age" and other heuristics will be confusing to new users. Default page selection will feel kind of random.

This is still probably an incomplete list...I should save this somewhere and build on it.
Also, for default revision selection on page view, I am just comparing the methods of selection by the two extensions. We could have it where Flagged Revisions does the overriding of the default revision, but grabs the Article Trust "most trusted" version rather than some reviewed one. This would just be to avoid duplicated code, though.

-Aaron Schulz
I am sure this has already been discussed, but just in case, here goes
my two cents:
The post in http://breasy.com/blog/2007/07/01/implicit-kicks-explicits-ass/
explains why implicit metadata (like Google's PageRank) is better
than explicit metadata (like Digg votes).
Making a comparison to Wikimedia, I'd say that Prof. Luca's trust
algorithm is a more reliable way to determine the quality of an
article's text than the Flagged Revision Extension.
However, the point of the latter is to provide a stable version to the
user who chooses that, while the former displays to which degree the
info can be trusted, but still showing the untrusted text.
What I'd like to suggest is the implementation of a filter based on
the trust calculations of Prof. Luca's algorithm, which would use the
editors' calculated reliability to automatically choose to display a
certain revision of an article. It could be implemented in 3 ways:
1. Show the last revision of an article made by an editor with a trust
score bigger than the value that the reader provided. The trusted
editor is implicitly setting a minimum quality flag in the article by
saving a revision without changing other parts of the text. This is
the simpler approach, but it doesn't prevent untrusted text from
showing up, in case the trusted editor leaves untrusted parts of the
text in place.
2. Filter the full history. Basically, the idea is to show the parts
of the article written by users with a trust score higher than the
value that the reader provided. This would work like Slashdot's
comment filtering system, for example. Evidently, this is the most
complicated approach, since it would require an automated conflict
resolution system which might not be possible.
3. A mixed option could be to try to hide revisions by editors with a
trust value lower than the threshold set. This could be done as far
back in the article history as possible, until a content conflict arises.
Instead of trust values, this could also work by setting the threshold
above unregistered users, or newbies (I think this is approximately
equivalent to accounts younger than 4 days)
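Option 1 above might be sketched as follows. All names and data structures here are hypothetical, not taken from either extension: the history is a list of (revision id, editor) pairs, and trust scores are a simple mapping from editor to a number.

```python
def select_revision(history, trust_scores, threshold):
    """Walk the history newest-first and return the id of the most
    recent revision saved by an editor whose trust score meets the
    reader's threshold.

    history: list of (revision_id, editor) pairs, oldest to newest.
    trust_scores: dict mapping editor name -> trust score.
    """
    for revision_id, editor in reversed(history):
        if trust_scores.get(editor, 0.0) >= threshold:
            return revision_id
    return None  # no revision meets the bar; fall back to the current one
```

The editor saving a revision implicitly vouches for the whole page, which is exactly the weakness noted in option 1: untrusted text left in place by a trusted editor still gets shown.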
Anyway, these are just rough ideas, on which I'd like to hear your thoughts.
I haven't had any press queries about this as yet, but if I do: what's
the status of putting something into place on de:wp as planned? I
understand it's delayed by devs being busy with the WMF fundraiser ...