On 07/14/11 5:56 AM, WereSpielChequers wrote:
> Do we have stats yet that measure whether this is encouraging editing, or diverting even more people from improving the pedia to critiquing it?
It's difficult to see any logical connection between an article rating system and encouraging new editors.
> Remember there is a risk that this could exacerbate the templating trend. Just as we need to value edits that fix problems and remove templates above edits that add to the hundreds of thousands of maintenance templates on the pedia;
If templates were subject to a rating system similar to the one for articles, we would soon see which ones users ignore, and which are thus of no value.
> So we need to value a talkpage comment that explains why someone has a specific concern about an article over a bunch of "feedback" that says people like or dislike an article without indicating why. Better still we should be encouraging readers to improve articles that they see as flawed.
This dream has been around since the stone age.
> So we need to measure this tool in terms of its success at getting readers to edit, not in terms of its success at getting readers to rate articles. I hope it is successful, and I'm happy to take the long view and measure a trial over months to see how effectively we convert article raters into article editors.
I seriously doubt that it will head in that direction.
> But we do need to be prepared to remove this if it has a net effect of diverting potential editors into merely rating articles for others to fix.
It's not a problem if they do. When many readers do this for a single article, that article is worth paying attention to; a claim from a single person can be suspected of being eccentric.
> We also need to be careful how we compare this 374k to the other "90%", not least because with 3,682,158 articles on En wiki as I write, 374k is about 6k more than a random 10% sample would be.
It's all a matter of statistical trends, and for this a 100-point scale would have been more useful than a 5-point scale. I actually suggested a 10-point scale many years ago. The first statistical measure we should develop is a cumulative rating across all articles. The mean of that distribution will be the measure of the average article, and any article falling within a certain deviation from it could be judged average. As the overall quality of WP increases, so too will the average rating, but only extremely slowly. Other measures could be developed from there.
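The two calculations above can be sketched in a few lines of Python: the sample-size arithmetic from the quoted paragraph, and the mean-plus-deviation classification proposed here. The article names and rating values are made up for illustration, and the one-standard-deviation band is an assumed choice, not a settled cutoff.

```python
import statistics

# Sanity check of the sample-size arithmetic: a random 10% of the
# 3,682,158 articles would be about 368,216, so 374k is roughly 6k more.
ten_percent_sample = 3_682_158 // 10
print(374_000 - ten_percent_sample)  # -> 5785, i.e. about 6k

# Hypothetical per-article mean ratings on the deployed 5-point scale.
article_ratings = {
    "Article A": 4.2,
    "Article B": 2.9,
    "Article C": 3.6,
}

corpus_mean = statistics.mean(article_ratings.values())
corpus_stdev = statistics.stdev(article_ratings.values())

def classify(rating, band=1.0):
    """Judge an article 'average' when it falls within `band` standard
    deviations of the corpus-wide mean, as proposed above."""
    if abs(rating - corpus_mean) <= band * corpus_stdev:
        return "average"
    return "above average" if rating > corpus_mean else "below average"
```

With more articles in the corpus, the same `classify` function would partition the whole encyclopedia into below-average, average, and above-average bands around the slowly moving mean.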
> We also need to learn from one of the lessons of the Strategy wiki where we had a similar rating system. Many of the proposals there had so few ratings that they were close to being individual views, and few had sufficient responses to be genuinely collective to the point where one maverick couldn't skew them - even without sockpuppetry. On average our articles get one or two edits a month, many get far fewer. I would not be surprised if 100,000 of the 374k in the trial had fewer than ten ratings even if trialled for a couple of months.
This isn't a problem either. The number of ratings given is just as important as what those ratings are. It should be reported right along with the rating on the article page. Users could then be reminded that a small number of ratings is simply not statistically significant; ratings could even be color-coded to that effect. Small samples are also more volatile: they can easily be driven into the top or bottom decile of the data, and that alone would bring attention to them.
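A minimal sketch of reporting the rating count alongside the rating itself, as suggested above. The function name and the threshold of 30 ratings are assumptions for illustration; a deployed version would pick the cutoff on statistical grounds and could color-code the label rather than print it.

```python
# Assumed threshold below which a rating is treated as not yet meaningful.
MIN_RATINGS = 30

def rating_summary(mean_rating, n_ratings):
    """Report a rating together with the number of ratings behind it,
    flagging (in a real UI, perhaps via color) small samples."""
    reliability = "significant" if n_ratings >= MIN_RATINGS else "too few ratings"
    return f"{mean_rating:.1f}/5 from {n_ratings} ratings ({reliability})"

print(rating_summary(4.6, 3))    # a volatile small sample, flagged as such
print(rating_summary(3.4, 212))  # enough ratings to report with confidence
```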
> Lastly we need to be prepared for sockpuppetry, especially as these are random unsigned votes with no rationale. Can we have assurances that something is being built into the scheme to combat this?
This FUD gives undue weight to sockpuppetry and other hostile editing. Ideally such practices should be marginalised to the point where they don't matter. Mounting a successful campaign to influence the rating of an article would take a tremendous amount of sustained effort. In the early days I played with trying to affect the page views of one of the Bomis girl articles by visiting that page repeatedly; the effects were minimal. Now, with a much bigger encyclopedic corpus, this would be proportionally more difficult. "Random unsigned votes" are perfectly consistent with wikiness, and will also trend toward statistical norms. Building safeguards against agenda-based ratings would be a waste of time and effort.
Ec