On Jan 13, 2009, at 12:10 AM, WJhonson@aol.com wrote:
These sub-surface articles would not be googleable, let's say, so readers wouldn't get side-tracked into thinking they are "acceptable" in the mainstream,
This already exists with GA/FA ratings. Creating a new public/internal division just adds a new front for controversy.
On Tue, Jan 13, 2009 at 3:22 AM, Noah Salzman noah@salzman.net wrote:
... what does the step-by-step process look like for making this change happen? I imagine there is more than one path: grass roots consensus building vs lobbying The Powers That Be?
The Powers That Be would be needed to change what search engines are told to ignore. (Presumably in robots.txt.)
The grass roots would be needed to ramp up GA/FA effort considerably. EN currently has about 5800 Good Articles (as rated), and 2400 Featured. Current article count is over 2.5 million.
sources: http://en.wikipedia.org/wiki/Template:GA_number http://en.wikipedia.org/wiki/Template:FA_number http://en.wikipedia.org/wiki/Special:Statistics
Es.
edgarde wrote:
On Jan 13, 2009, at 12:10 AM, WJhonson@aol.com wrote:
These sub-surface articles would not be googleable, let's say, so readers wouldn't get side-tracked into thinking they are "acceptable" in the mainstream,
This already exists with GA/FA ratings. Creating a new public/internal division just adds a new front for controversy.
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
On Tue, Jan 13, 2009 at 3:22 AM, Noah Salzman wrote:
... what does the step-by-step process look like for making this change happen? I imagine there is more than one path: grass roots consensus building vs lobbying The Powers That Be?
The grass roots would be needed to ramp up GA/FA effort considerably. EN currently has about 5800 Good Articles (as rated), and 2400 Featured. Current article count is over 2.5 million.
That's an unrealistic expectation. How long has it taken to build up this list of 8200 articles? While GA/FA has its usefulness, it is neither scalable nor equal to the task of being a general rating mechanism.
Ec
On Wed, Jan 14, 2009 at 6:22 PM, Ray Saintonge saintonge@telus.net wrote:
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
<snip>
We have something similar at the moment, done by editors, not readers (a big difference):
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment
That has about 6 or 7 levels, depending on whether you include both GA and A-class.
http://en.wikipedia.org/wiki/Category:Articles_by_quality
Stats are here:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Index
Estimates as to the reliability of the assessments vary, but the stats at the time of writing are:
1,489 projects (WikiProjects); 1,960,650 articles tagged; 1,607,658 articles assessed
Total number of articles: 2,698,457
See also the talk page of that index for some more stats.
Carcharoth
On 14/01/2009, Ray Saintonge saintonge@telus.net wrote:
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
Possibly not. The experience with these kinds of systems at Amazon, for example, shows that interpreting votes is not simple. A lot of people give consistently high, middle, or low votes, and there are many pathologies; averaging them out gives much worse results than you might expect.
Ian Woollard wrote:
On 14/01/2009, Ray Saintonge saintonge@telus.net wrote:
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
Possibly not. The experience with these kinds of systems at Amazon, for example, shows that interpreting votes is not simple. A lot of people give consistently high, middle, or low votes, and there are many pathologies; averaging them out gives much worse results than you might expect.
That sounds like an interesting hidden-variable Bayesian estimation problem: given n reviewers of various propensities and m reviewed objects, and # of reviews >> n+m, make a joint maximum likelihood estimate of the "true" properties of both reviewers and reviewed objects. Bogus "outlier" editors could just be modeled as all variance, with their mean irrelevant.
I'd be quite surprised if someone hasn't solved this already, and written a paper about it.
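For concreteness, here is a minimal sketch of one way the estimation could go, under an assumed Gaussian model where each rating is the item's quality plus a rater bias plus rater-specific noise. The alternating updates below approximate the joint maximum-likelihood estimate, and the "all variance" outlier raters simply end up with large variance and near-zero weight; all names are illustrative:

from collections import defaultdict

def estimate(ratings, iters=50):
    """ratings: list of (rater, item, score) triples."""
    raters = {r for r, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    quality = {i: 5.0 for i in items}   # item "true" scores
    bias = {r: 0.0 for r in raters}     # rater propensity (optimist/pessimist)
    var = {r: 1.0 for r in raters}      # rater noisiness

    for _ in range(iters):
        # Item quality: precision-weighted mean of bias-corrected
        # ratings, so noisy raters count for less.
        num, den = defaultdict(float), defaultdict(float)
        for r, i, s in ratings:
            w = 1.0 / var[r]
            num[i] += w * (s - bias[r])
            den[i] += w
        for i in items:
            quality[i] = num[i] / den[i]
        # Rater bias and variance: mean and spread of the residuals.
        resid = defaultdict(list)
        for r, i, s in ratings:
            resid[r].append(s - quality[i])
        for r in raters:
            rs = resid[r]
            bias[r] = sum(rs) / len(rs)
            var[r] = max(1e-6, sum((x - bias[r]) ** 2 for x in rs) / len(rs))
    return quality, bias, var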
-- Neil
Ian Woollard wrote:
On 14/01/2009, Ray Saintonge wrote:
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
Possibly not. The experience with these kinds of systems at Amazon, for example, shows that interpreting votes is not simple. A lot of people give consistently high, middle, or low votes, and there are many pathologies; averaging them out gives much worse results than you might expect.
Sure, optimists may very well score everything high, and pessimists may score everything low. Still, the overall results will tend toward some mean value, probably higher than the expected value of 5.0 that one might anticipate before we have any real data. If the overall mean migrates to, say, 5.7, other interpretations of the data can be adjusted accordingly. We don't interpret individual votes, but the overall data.
In our involvement with Wikipedia we have accepted the principle that anybody can write an encyclopedia article. Choosing a number between 0 and 10 is a somewhat easier task. Can we not accept that the vast majority will approach such a task with the same level of responsibility?
Yes, there will be some individuals determined to vote stupidly, but one of the wonders of a statistical approach is that those efforts are soon marginalized.
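A toy simulation makes the point; the numbers here are entirely made up, but with enough votes the per-article mean, recentred on the global mean, still tracks the articles' underlying quality despite biased raters and a handful of determined stupid voters:

import random

random.seed(1)
true_quality = [3.0, 5.0, 8.0]                # assumed underlying values
votes = {j: [] for j in range(len(true_quality))}
for _ in range(500):                          # honest but biased raters
    bias = random.gauss(0.7, 1.0)             # optimists and pessimists
    for j, q in enumerate(true_quality):
        votes[j].append(min(10, max(0, q + bias + random.gauss(0, 1))))
for _ in range(25):                           # determined stupid voters
    for j in range(len(true_quality)):
        votes[j].append(random.choice([0, 10]))

global_mean = (sum(sum(v) for v in votes.values())
               / sum(len(v) for v in votes.values()))
for j, v in votes.items():
    mean = sum(v) / len(v)
    print(f"article {j}: raw mean {mean:.2f}, recentred {mean - global_mean:+.2f}")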
Ec
The "X out of X readers found this review useful" is very helpful. Using the same Amazon example, when you click See Reviews on a product, they show you a great thing: they put the most helpful and higher review aside the most helpful and lower review.
-- Alvaro
On 14-01-2009, at 21:59, Ray Saintonge saintonge@telus.net wrote:
Ian Woollard wrote:
On 14/01/2009, Ray Saintonge wrote:
Not everybody pays attention to GA/FA. A public rating system where anyone can rate each article on a 0-10 scale might be controversial to implement, but on a cumulative basis would give a good statistically based valuation of the article.
Possibly not. The experience with these kinds of systems at Amazon, for example, shows that interpreting votes is not simple. A lot of people give consistently high, middle, or low votes, and there are many pathologies; averaging them out gives much worse results than you might expect.
Sure, optimists may very well score everything high, and pessimists may score everything low. Still, the overall results will tend toward some mean value, probably higher than the expected value of 5.0 that one might anticipate before we have any real data. If the overall mean migrates to, say, 5.7, other interpretations of the data can be adjusted accordingly. We don't interpret individual votes, but the overall data.
In our involvement with Wikipedia we have accepted the principle that anybody can write an encyclopedia article. Choosing a number between 0 and 10 is a somewhat easier task. Can we not accept that the vast majority will approach such a task with the same level of responsibility?
Yes, there will be some individuals determined to vote stupidly, but one of the wonders of a statistical approach is that those efforts are soon marginalized.
Ec
With the right hand, you rate the raters. So each of us gets a clue stick and goes around whacking good editors' "good rater" ratings up a notch by voting for them as raters.
With the left hand, you rate the articles, and when other editors agree with you, they whack you and your "good rater" score goes up.
Now, with the giant nose of Zenobia, you multiply the article rating by the rater's rating, and average.
Thusly and so, articles get a good rating based on the best raters rating them good, and the nasty bad evil raters' ratings fall into the first circle (i.e., they are weighted as nothing).
Will Johnson
wjhonson@aol.com wrote:
With the right hand, you rate the raters. So each of us gets a clue stick and goes around whacking good editors' "good rater" ratings up a notch by voting for them as raters.
With the left hand, you rate the articles, and when other editors agree with you, they whack you and your "good rater" score goes up.
Now, with the giant nose of Zenobia, you multiply the article rating by the rater's rating, and average.
Thusly and so, articles get a good rating based on the best raters rating them good, and the nasty bad evil raters' ratings fall into the first circle (i.e., they are weighted as nothing).
Will Johnson
You should be able to do the whole thing in one go.
It's a bit like clocks; accurate clocks are defined to be those which tend to give similar times to other accurate clocks. Inaccurate clocks do not have this property.
In this case, good raters are defined to be those who give ratings which tend to correlate well with true ratings, which are in turn extracted by ratings given by other good raters. Even though this is necessarily a recursive definition, it can still be used to generate a tractable set of simultaneous equations.
"True" ratings being a matter of subjectivity, and people being people, there may also be more than one mutually-coherent cluster of raters. If you're worried about active coordinated attacks by ratings spammers, or want to try to average across political viewpoints, you can seed things with a core of users known to be likely to be both good and impartial raters.
-- Neil
Neil Harris wrote:
You should be able to do the whole thing in one go.
It's a bit like clocks; accurate clocks are defined to be those which tend to give similar times to other accurate clocks. Inaccurate clocks do not have this property.
With standards now depending on atomic clocks, we need to add the occasional leap second to ensure that the earth behaves as it should in its journey around the sun. How would we clean the raters' clock?
In this case, good raters are defined to be those who give ratings which tend to correlate well with true ratings, which are in turn extracted by ratings given by other good raters. Even though this is necessarily a recursive definition, it can still be used to generate a tractable set of simultaneous equations.
"True" ratings being a matter of subjectivity, and people being people, there may also be more than one mutually-coherent cluster of raters. If you're worried about active coordinated attacks by ratings spammers, or want to try to average across political viewpoints, you can seed things with a core of users known to be likely to be both good and impartial raters.
Proceeding in this vain [sic!], we are ultimately led to the great Rooto-Rater, and that problem is theological. :-)
Ec
wjhonson@aol.com wrote:
With the right hand, you rate the raters. So each of us gets a clue stick and goes around whacking good editors' "good rater" ratings up a notch by voting for them as raters.
With the left hand, you rate the articles, and when other editors agree with you, they whack you and your "good rater" score goes up.
Now, with the giant nose of Zenobia, you multiply the article rating by the rater's rating, and average.
Thusly and so, articles get a good rating based on the best raters rating them good, and the nasty bad evil raters' ratings fall into the first circle (i.e., they are weighted as nothing).
By rating the raters you are finding a different way of introducing the same kind of subjectivity that we want to avoid. Our most persistent battles over the years at Wikipedia have been those that involve a key subjective factor such as notability. If we had such a concept as "good raters", it's easy to see that the race to be "good" would yield the same nonsense as we see at Requests for Adminship.
Ec
2009/1/15 Ray Saintonge saintonge@telus.net:
Yes, there will be some individuals determined to vote stupidly, but one of the wonders of a statistical approach is that those efforts are soon marginalized.
Just ignoring the top and bottom 10% of ratings can do wonders for this sort of thing, by the way.
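That is a 10% trimmed mean. A throwaway sketch (made-up votes) that also lets you compare the trimmed and raw figures directly, which would speak to whether the two converge as the number of raters grows:

def trimmed_mean(scores, trim=0.10):
    s = sorted(scores)
    k = int(len(s) * trim)            # votes to drop at each end
    s = s[k:len(s) - k] if k else s
    return sum(s) / len(s)

votes = [7, 8, 6, 7, 0, 10, 7, 8, 10, 0, 7, 6]       # made-up votes
print(trimmed_mean(votes), sum(votes) / len(votes))  # trimmed vs raw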
- d.
David Gerard wrote:
2009/1/15 Ray Saintonge:
Yes, there will be some individuals determined to vote stupidly, but one of the wonders of a statistical approach is that those efforts are soon marginalized.
Just ignoring the top and bottom 10% of ratings can do wonders for this sort of thing, by the way.
That would work, but may not be necessary if the number of raters is large. I suspect that the figures with and without truncation will tend to converge. If a rating system of this sort were implemented it should be an easily tested hypothesis.
Ec
2009/1/15 Ray Saintonge saintonge@telus.net:
David Gerard wrote:
Just ignoring the top and bottom 10% of ratings can do wonders for this sort of thing, by the way.
That would work, but may not be necessary if the number of raters is large. I suspect that the figures with and without truncation will tend to converge. If a rating system of this sort were implemented it should be an easily tested hypothesis.
Another idea: make all ratings public information, because they're part of the process of working on the encyclopedia so should be viewable for transparency.
- d.
On Thu, Jan 15, 2009 at 7:44 PM, David Gerard dgerard@gmail.com wrote:
2009/1/15 Ray Saintonge saintonge@telus.net:
David Gerard wrote:
Just ignoring the top and bottom 10% of ratings can do wonders for this sort of thing, by the way.
That would work, but may not be necessary if the number of raters is large. I suspect that the figures with and without truncation will tend to converge. If a rating system of this sort were implemented it should be an easily tested hypothesis.
Another idea: make all ratings public information, because they're part of the process of working on the encyclopedia so should be viewable for transparency.
There is an option in preferences to switch on a gadget that displays the FA/A/GA/B/C/Start/Stub/Unassessed ratings (and the other categories, such as lists) on the article page itself, rather than hidden away on the talk page. But as I said before, that is an editorial rating system, not a reader rating system.
Carcharoth
David Gerard wrote:
Another idea: make all ratings public information, because they're part of the process of working on the encyclopedia so should be viewable for transparency.
- d.
...which is also the correct answer to the problem of selecting any one particular rating algorithm in advance. By publishing the raw ratings data, with associated timestamp and userid, anyone will be able to analyze the data any way they like.
It would be fairly easy to institute such a rating system, given that we already have an integrated login and edittoken verification system: we just need a small form that generates an appropriate GET query on each page, and an extra table to stash the results into.
Dump that table into a downloadable text file at regular intervals, and you're done. Armies of programmers and statisticians will descend on the data to see what they can do with it. If they can do something useful with it, we could eventually integrate that analysis into the software. If not, the experiment can eventually be abandoned.
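To make the shape of the thing concrete, here is a back-of-the-envelope sketch using SQLite in Python purely for illustration; the real implementation would live in MediaWiki's PHP and MySQL, and every name below is hypothetical. The 0-5 importance and quality parameters follow the suggestion in [1]:

import sqlite3, time

db = sqlite3.connect("ratings.db")
db.execute("""CREATE TABLE IF NOT EXISTS rating (
    user_id    INTEGER,
    page_id    INTEGER,
    importance INTEGER,   -- 0 to 5
    quality    INTEGER,   -- 0 to 5
    ts         INTEGER    -- unix timestamp
)""")

def record_rating(user_id, page_id, importance, quality):
    # Called once the edittoken check on the GET query has passed.
    db.execute("INSERT INTO rating VALUES (?, ?, ?, ?, ?)",
               (user_id, page_id, importance, quality, int(time.time())))
    db.commit()

def dump(path="ratings.tsv"):
    # The regular dump: raw rows with timestamp and userid, so anyone
    # can analyze the data any way they like.
    with open(path, "w") as f:
        for row in db.execute("SELECT * FROM rating ORDER BY ts"):
            f.write("\t".join(map(str, row)) + "\n")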
The last time we tried something like this, it degenerated into a massive discussion of which ratings parameters and rating methodology should be used[1], and nothing ever happened.
-- N.
[1] my suggestion: two rating parameters: importance and overall quality, each judged from 0 to 5. Matches current article rating system, has an even number of options. Three clicks, and you're done.
2009/1/15 Neil Harris usenet@tonal.clara.co.uk:
The last time we tried something like this, it degenerated into a massive discussion of which ratings parameters and rating methodology should be used[1], and nothing ever happened.
Yes, but that was a side issue - the reason it didn't happen was that Brion didn't like the extension to implement it, so it fell by the wayside.
Your mission, should you choose to accept it:
0. Get Brion and Tim to agree that this is something they wouldn't actually be horrified to put on the site if it passed technical muster and the community were going "hell yes" loud enough.
1. Get the community support in place.
2. Write something for MediaWiki that does this but that Brion and Tim would be willing to put on the live servers.
- d.
David Gerard wrote:
2009/1/15 Neil Harris:
The last time we tried something like this, it degenerated into a massive discussion of which ratings parameters and rating methodology should be used[1], and nothing ever happened.
Yes, but that was a side issue - the reason it didn't happen was that Brion didn't like the extension to implement it, so it fell by the wayside.
Your mission, should you choose to accept it:
- Get Brion and Tim to agree that this is something they wouldn't actually be horrified to put on the site if it passed technical muster and the community were going "hell yes" loud enough.
- Get the community support in place.
- Write something for MediaWiki that does this but that Brion and Tim would be willing to put on the live servers.
Brion and Tim are reasonable people, and from the technical side they are likely open to solid arguments that the idea won't cause the system to crash, or that it will be reversible if it doesn't accomplish its objectives.
Community support is a different matter. A no-brainer like flagged versions has been under discussion for nearly three years, but the opponents have developed a huge vocal fan club. I wouldn't want to give that crowd any more respect than they deserve.
Parameters and methodology are details that can be adjusted after we have more experience with a system. It can be extremely difficult to widen the perspective of those who are focused on a narrow subset of the issues. Preconceptions are not a valid substitute for hypothesis testing.
Ec
David Gerard wrote:
2009/1/15 Ray Saintonge:
David Gerard wrote:
Just ignoring the top and bottom 10% of ratings can do wonders for this sort of thing, by the way.
That would work, but may not be necessary if the number of raters is large. I suspect that the figures with and without truncation will tend to converge. If a rating system of this sort were implemented it should be an easily tested hypothesis.
Another idea: make all ratings public information, because they're part of the process of working on the encyclopedia so should be viewable for transparency.
I don't think it's necessary information, but if that's what it takes to get a useful system in place, I wouldn't stand in the way.
Ec
2009/1/14 Ray Saintonge saintonge@telus.net:
That's an unrealistic expectation. How long has it taken to build up this list of 8200 articles? While GA/FA has its usefulness, it is neither scalable nor equal to the task of being a general rating mechanism.
Particularly as the FAC regulars expressly raise the bar higher whenever the rate of FAs seems to be going up, as they want it to be for the really very very very best articles. (Or they did last time I asked on WT:FAC about this.)
- d.
FAC seeks quality, not quantity, which is fine. FA-ness has no bearing on an article's worthiness. - White Cat
On Wed, Jan 14, 2009 at 11:53 PM, David Gerard dgerard@gmail.com wrote:
2009/1/14 Ray Saintonge saintonge@telus.net:
That's an unrealistic expectation. How long has it taken to build up this list of 8200 articles? While GA/FA has its usefulness, it is neither scalable nor equal to the task of being a general rating mechanism.
Particularly as the FAC regulars expressly raise the bar higher whenever the rate of FAs seems to be going up, as they want it to be for the really very very very best articles. (Or they did last time I asked on WT:FAC about this.)
- d.