On 26 Mar 2014, at 21:35, Andrew Gray andrew.gray@dunelm.org.uk wrote:
<snip>
It would be great if this sort of rating was being systematically checked - but at a vague estimate of thirty seconds to scan, grade, and tag, aggregated across all pages on enwiki, that's about fifteen or twenty person-years of work to do it as a one-off, let alone as a rolling process.
Andrew.
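For scale, that estimate can be checked with a quick back-of-envelope calculation. The figures below are illustrative assumptions only (roughly 4.5 million enwiki articles at the time, and a 2,000-hour working year):

    # Rough sanity check of the estimate above; the article count and the
    # working year are illustrative assumptions, not official figures.
    articles = 4_500_000          # approximate enwiki article count, 2014
    seconds_per_article = 30      # scan, grade, and tag
    hours = articles * seconds_per_article / 3600
    person_years = hours / 2000   # ~2,000 working hours per person-year
    print(f"{hours:,.0f} hours is about {person_years:.1f} person-years")
    # -> 37,500 hours is about 18.8 person-years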
On 25 March 2014 23:35, Pete Forsyth peteforsyth@gmail.com wrote:
Philippe,
The Public Policy Initiative produced strong validation for the Wikipedia 1.0 approach to assessing article quality. Was Amy Roth's research ever published, and are there any plans to repeat it with a larger sample size etc.? I'd say we're closer than you think to having a good way to measure article quality.
Pete [[User:Peteforsyth]]
There is at present no comprehensive automated tool that can be used to measure article and media file quality. Measuring quantity is easy; quality much more difficult.
At the Wikimedia Conference over the weekend I presented some thoughts about a possible software project, to be led by Wikimedia UK, to tackle this.
A review of the presentation, and slides, can be seen at: https://meta.wikimedia.org/wiki/Wikimedia_Conference_2014/Documentation/24#M...
The WMUK wiki page is here: https://wikimedia.org.uk/wiki/Technology_Committee/Project_requests/WikiRate...
Comments and feedback are most welcome. In particular, we would like to know whether creating such tools would be considered a useful thing to do by the community.
Best regards
Michael
____________
Michael Maggs
Chair, Wikimedia UK
Something similar was tried at the Wikimedia Foundation, with the Article Feedback Tool. This all happened before I joined the WMF, and I wasn't aware of it in my time as a volunteer, so I'm mostly reciting what others have told me here. Take what I say with a pinch of salt.
There were two major problems that occurred with the Article Feedback Tool:
1) Way, way more feedback was generated than could be handled.
2) People often ended up rating the subject of the page rather than the content in it (e.g. [[Justin Bieber]] got lots of 1s and 5s).
It would be wise to think about ways to mitigate these problems from the very start, so that they don't occur again. I'm happy to work with you on these, as much as my time allows.
Thanks, Dan
On 16 April 2014 10:25, Michael Maggs Michael@maggs.name wrote:
<snip>
Thanks Dan, to be clear, the proposal is not to develop another manual rating system (such as the AFT or the project rating systems); it’s to develop some automated quality assessments. Those might include some manual elements as inputs, particularly for any machine-learning approach, but generating new manual rating methods is not the aim of the project.
Cheers
Simon
On 16 April 2014 19:24, Deskana wrote:
<snip>
On 16 April 2014 10:25, Michael Maggs Michael@maggs.name wrote:
<snip>
On 16 April 2014 19:28, Simon Knight sjgknight@gmail.com wrote:
<snip>
There's the old DREWS acronym from How Wikipedia Works, to which I'd now add T for traffic. In other words, there are six factors that an experienced human would use to analyse quality, looking in particular for warning signs.

D = Discussion: crunch the talk page (20 archives = controversial, while no comments indicates possible neglect)
R = WikiProject rating, FWIW, if there is one.
E = Edit history. A single editor, or essentially only one editor with tweaking, is a warning sign. (Though not if it is me, obviously)
W = Writing. This would take some sort of text analysis. Work to do here. Includes detection of non-standard format, which would suggest neglect by experienced editors.
S = Sources. Count footnotes and so on.
T = Traffic. Pages at 100 hits per month are not getting many eyeballs. Warning sign. Very high traffic is another issue.
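A minimal sketch of how those six signals might be turned into automatic warning flags rather than a single score. The thresholds and field names below are illustrative assumptions, not part of any existing tool:

    from dataclasses import dataclass

    @dataclass
    class PageSignals:
        talk_archives: int           # D: number of talk-page archives
        wikiproject_rating: str      # R: e.g. "Stub", "B", "GA", or "" if none
        distinct_editors: int        # E: editors with substantive contributions
        standard_footer_order: bool  # W: crude check of layout conventions
        footnotes: int               # S: count of footnotes
        monthly_views: int           # T: page views per month

    def drews_flags(p):
        # Purely illustrative thresholds; a real tool would tune these against
        # pages that experienced editors have already assessed.
        flags = []
        if p.talk_archives >= 20:
            flags.append("D: heavily discussed, possibly controversial")
        elif p.talk_archives == 0:
            flags.append("D: no talk-page activity, possible neglect")
        if not p.wikiproject_rating:
            flags.append("R: no WikiProject rating")
        if p.distinct_editors <= 1:
            flags.append("E: essentially a single-author article")
        if not p.standard_footer_order:
            flags.append("W: non-standard footer order")
        if p.footnotes < 5:
            flags.append("S: few footnotes")
        if p.monthly_views < 100:
            flags.append("T: under 100 views per month")
        return flags

Feeding it is the hard part, of course: each field needs its own extraction step over the talk page, revision history, wikitext and page-view statistics.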
Seems to me that there is enough to bite on, here.
Charles
For media files there are two things we already have: the size of the file and who created it.
Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail.
Assessing the possible quality of images by looking at how many other quality images have come from the same user is likely to be helpful. Another image by someone who has had multiple featured pictures should be a positive on any rating system.
We now have hundreds of thousands of images that are categorised as being of various important objects such as listed buildings in the UK.
Other variables we could consider include the camera used, although, again, a good camera with the settings mucked up is likely to take a worse photograph than a cheap camera in the hands of someone who knows how to coax the best out of it.
Several other things may or may not be possible, though if there isn't already open-source software that can do it, it could be expensive to create. Technology exists out there that can compare different human faces, so it should be possible to write software that classifies images as blurred, washed out or poorly lit. Also we could have software that identifies near duplicates and in some fashion threads similar images behind the "best" one.
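On the blurred/washed out/poorly lit point, open-source tooling already gets part of the way there. A minimal sketch using OpenCV; the thresholds are illustrative guesses that would need tuning against images humans have already judged:

    import cv2

    def triage_image(path, blur_threshold=100.0, dark=50, bright=205, low_contrast=30):
        grey = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if grey is None:
            return ["unreadable"]
        flags = []
        # Variance of the Laplacian is a common sharpness proxy: few edges
        # (low variance) usually indicates blur.
        if cv2.Laplacian(grey, cv2.CV_64F).var() < blur_threshold:
            flags.append("possibly blurred")
        if grey.mean() < dark:
            flags.append("possibly underexposed")
        elif grey.mean() > bright:
            flags.append("possibly washed out")
        if grey.std() < low_contrast:
            flags.append("low contrast")
        return flags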
The AFT was always vulnerable to just creating extra work for the community, diverting people from improving articles themselves to asking others to fix them. But software could be written that encouraged, rather than undermined, the SoFixIt culture of our heyday. If we had an app that invited people to check "Is this still the best image to illustrate this Wikipedia article?", then we could feed it with articles where we believed we had more images on Commons than we were using, or where the same image had been used for several years, or where the image used was small or otherwise suspect, or indeed where we didn't have an image at all.

Leutha and I ran a couple of sessions a year or so back showing donors how to add images to articles, and we found that even people who were quite hesitant about editing were very confident deciding which of several images could be used to illustrate an article, even if they had never been to the village or river in question. As well as improving articles and acting as a much-needed new entry point for editors, this could give us another metric on media quality: "one of x images considered but not used to illustrate the article on that subject", or indeed "image z was replaced in article x by image y", which should usually mean that image y is higher "quality" than image z.
Some sort of weighted score that combines the above could be what we need, though of course many of the criteria are subjective and scores could drop over time as better media files are uploaded.
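For concreteness, one way such a weighting might look. The inputs, normalisation and weights below are purely illustrative assumptions, not a proposed formula:

    def media_score(megapixels, uploader_featured_pics, in_use_on_articles,
                    size_percentile_in_category):
        # Normalise each signal to roughly 0..1 before weighting; every constant
        # here is a placeholder needing calibration against human ratings.
        resolution = min(megapixels / 12.0, 1.0)
        track_record = min(uploader_featured_pics / 10.0, 1.0)
        usage = min(in_use_on_articles / 5.0, 1.0)
        relative_size = size_percentile_in_category / 100.0
        return (0.3 * resolution + 0.2 * track_record
                + 0.3 * usage + 0.2 * relative_size)

    # e.g. media_score(8, 3, 2, 75) gives about 0.53 on a 0..1 scale

Because some of the inputs are relative to what else exists in a category, the score would need recomputing periodically rather than being stored once.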
Jonathan Cardy
GLAM (Galleries, Libraries, Archives & Museums) Organiser/Trefnydd GLAM (Galeriau, Llyfrgelloedd, Archifdai a llawer Mwy!)
Wikimedia UK
0207 065 0990
Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects).
Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents.
On 16 April 2014 19:53, Charles Matthews charles.r.matthews@ntlworld.com wrote:
<snip>
Thanks Jonathan, I really like the photo-selection idea you suggest to onboard new editors.
I wonder if anyone has thoughts on where this conversation should happen? I was going to say it'd be better on-wiki than on the mailing list, but really both places suffer from the same problem. It may be very difficult to keep track of the various discussions wherever they are, so I also wonder if we should suggest organising a workshop of some kind. Perhaps we could have some sort of informal fringe thing at Wikimania, when lots of people (including e.g. Aaron Halfaker) will also be over.
Thanks for all the thoughts!
Simon
On 17 April 2014 08:12, Jonathan Cardy wrote:
<snip>
On 16 April 2014 19:53, Charles Matthews charles.r.matthews@ntlworld.com wrote:
<snip>
One possible model of where this could head: Wikisource's ribbon system for indicating degree of proof-reading. Basically this is a traffic-light colour code. Pages that draw on several pages of an original book may be a mixture of text that is unproofed (red/pink), text that has been proof-read once (amber/yellow), and text that has been validated by a second proof-reader (green).
See for example https://en.wikisource.org/wiki/Edward_VI_(DNB00).
A Wikipedia page system has to take account of more, and more disparate, factors if it is based on the sort of mechanical processing under discussion. As I described, certain things would "raise a flag". For example, if the footer sections weren't in the standard order per the manual, that would be a low-level indicator of possible neglect. On the other hand, a WikiProject rating, if present, is better news: someone apparently cares.
Rather than try to render different straws in the wind down into a single score, this kind of ribbon system would keep several dimensions going.
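As a sketch of what keeping the dimensions separate might look like in data terms: one red/amber/green value per dimension rather than one combined number. The dimension names reuse the DREWS+T factors; everything else below is an illustrative assumption, not an existing system:

    from dataclasses import dataclass

    RAG = ("red", "amber", "green")   # traffic-light values, as on Wikisource

    @dataclass
    class QualityRibbon:
        discussion: str
        rating: str
        editors: str
        writing: str
        sources: str
        traffic: str

        def __post_init__(self):
            for name, value in vars(self).items():
                if value not in RAG:
                    raise ValueError(f"{name} must be one of {RAG}")

        def as_banner(self):
            # One cell per dimension, for a template or gadget to colour in.
            return " | ".join(f"{k}:{v}" for k, v in vars(self).items())

    # e.g. QualityRibbon("green", "amber", "red", "amber", "green", "red").as_banner()
    # -> "discussion:green | rating:amber | editors:red | writing:amber | sources:green | traffic:red"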
Charles
On 17 April 2014 08:11, Jonathan Cardy jonathan.cardy@wikimedia.org.uk wrote:
For media files there are two things we already have, size of the file and who created it.
Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail.
Have a play with http://tools.wmflabs.org/faebot/cgi-bin/TARDIS.py?file=TARDIS.jpg&category=TARDIS; it would not be hard to adapt it into reports.
This gave a way of solving arguments in Commons Deletion Requests by comparing a file's size and pixel resolution to others in similar categories. There was no easy way of doing this on-wiki. Knowing that a file is in the top 25%, even by this crude measure, suddenly makes it appear more valuable, while a doubtful file in the bottom 10% seems a good candidate for deletion if it is a marginal out-of-scope case.
Setting "hard" measures for size or resolution is not always meaningful. Many small images may have educational use and have no higher resolution equivalent, though in my size comparison report (off-line) I do have a version that plucks out the smallest images in a category and passes them back as a re-paste-able gallery for review.
Fae
That looks like a very interesting tool. Definitely worth adding to the wiki page as related work that we could potentially make use of (if you permit, Fae; the source code is not open, is it?)
Michael
<snip>
On Thu, 17 Apr 2014, Jonathan Cardy wrote:
For media files there are two things we already have, size of the file and who created it.
Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail.
This is only true of photographic images. Quality of SVG files is entirely independent of filesize and dimensions.
Assessing the possible quality of images by looking at how many other quality images have come from the same user is likely to be helpful. Another image by someone who has had multiple featured pictures should be a positive on any rating system.
There is a danger in this of selecting images only from those who submit their work for review. One user on Commons has possibly hundreds of Quality Images of rolling stock; I have none. I know I am biased, but some (not all, by any means) of my images are of equal or greater quality, yet I have never submitted my images for review in this way.
This is not to say the approaches are without merit, just that it is important to be aware of potential limitations.
---- Chris McKenna
cmckenna@sucs.org www.sucs.org/~cmckenna
The essential things in life are seen not with the eyes, but with the heart
Antoine de Saint Exupery
Charles
Thanks, some excellent points there. To ensure they are not overlooked I’ve copied your entire text, below, to the talk page [1]. Hope that’s OK.
Michael
[1] https://wikimedia.org.uk/wiki/Talk:Technology_Committee/Project_requests/Wik...
On 16 Apr 2014, at 19:53, Charles Matthews charles.r.matthews@ntlworld.com wrote:
<snip>