Re: [Wikimediauk-l] [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

List overview All Threads
Download

newer

older

Contractor scoping brief: call for...

Re: [Wikimediauk-l] Rating...

Michael Maggs

16 Apr 2014 16 Apr '14

5:25 p.m.

On 26 Mar 2014, at 21:35, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:

...

<snip> It would be great if this sort of rating was being systematically checked - but at a vague estimate of thirty seconds to scan, grade, and tag, aggregated across all pages on enwiki, that's about fifteen or twenty person-years of work to do it as a once-off, much less a rolling process. Andrew. On 25 March 2014 23:35, Pete Forsyth <peteforsyth(a)gmail.com> wrote: > Philippe, > > The Public Policy Initiative produced strong validation for the Wikipedia > 1.0 approach to assessing article quality. Was Amy Roth's research ever > published, and are there any plans to repeat it with a larger sample size > etc.? I'd say we're closer than you think to having a good way to measure > article quality. > > Pete > [[User:Peteforsyth]]

There is at present no comprehensive automated tool that can be used to measure article and media file quality. Measuring quantity is easy; quality much more difficult. At the Wikimedia Conference over the weekend I presented some thoughts about a possible software project, to be lead by Wikimedia UK, to tackle this. A review of the presentation, and slides, can be seen at: https://meta.wikimedia.org/wiki/Wikimedia_Conference_2014/Documentation/24#… The WMUK wiki page is here: https://wikimedia.org.uk/wiki/Technology_Committee/Project_requests/WikiRat… Comments and feedback are most welcome. In particular, we would like to know whether creating such tools would be considered a useful thing to do by the community. Best regards Michael ____________ Michael Maggs Chair, Wikimedia UK

Attachments:

attachment.htm (text/html — 2.7 KB)

Show replies by date

Deskana

16 Apr 16 Apr

6:24 p.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

Something similar was tried at the Wikimedia Foundation, with the Article Feedback Tool. This all happened before I joined the WMF, and I wasn't aware of it in my time as a volunteer, so I'm mostly reciting what others have told me here. Take what I say with a pinch of salt. There were two major problems that occurred with the Article Feedback Tool: 1) Way, way more feedback was generated than could be handled. 2) People often ended up rating the subject of the page rather than the content in it (e.g. [[Justin Bieber]] got lots of 1s and 5s). It would be wise to think about ways to mitigate these problems from the very start, so that they don't occur again. I'm happy to work with you on these, as much as my time allows. Thanks, Dan On 16 April 2014 10:25, Michael Maggs <Michael(a)maggs.name> wrote:

...

On 26 Mar 2014, at 21:35, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote: <snip> It would be great if this sort of rating was being systematically checked - but at a vague estimate of thirty seconds to scan, grade, and tag, aggregated across all pages on enwiki, that's about fifteen or twenty person-years of work to do it as a once-off, much less a rolling process. Andrew. On 25 March 2014 23:35, Pete Forsyth <peteforsyth(a)gmail.com> wrote: Philippe, The Public Policy Initiative produced strong validation for the Wikipedia 1.0 approach to assessing article quality. Was Amy Roth's research ever published, and are there any plans to repeat it with a larger sample size etc.? I'd say we're closer than you think to having a good way to measure article quality. Pete [[User:Peteforsyth]] There is at present no comprehensive automated tool that can be used to measure article and media file quality. Measuring *quantity* is easy; *quality* much more difficult. At the Wikimedia Conference over the weekend I presented some thoughts about a possible software project, to be lead by Wikimedia UK, to tackle this. A review of the presentation, and slides, can be seen at: https://meta.wikimedia.org/wiki/Wikimedia_Conference_2014/Documentation/24#… The WMUK wiki page is here: https://wikimedia.org.uk/wiki/Technology_Committee/Project_requests/WikiRat… Comments and feedback are most welcome. In particular, we would like to know whether creating such tools would be considered a useful thing to do by the community. Best regards Michael ____________ Michael Maggs Chair, Wikimedia UK _______________________________________________ Wikimedia UK mailing list wikimediauk-l(a)wikimedia.org http://mail.wikimedia.org/mailman/listinfo/wikimediauk-l WMUK: https://wikimedia.org.uk

Simon Knight

6:28 p.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

Thanks Dan, to be clear, the proposal is not to develop another manual rating system (such as the AFT or the project rating systems), it’s to develop some automated quality assessments. Those might include some manual elements as inputs particularly for any machine learning approach, but generating new methods there is not the aim of the project. Cheers Simon From: wikimediauk-l-bounces(a)lists.wikimedia.org [mailto:wikimediauk-l-bounces@lists.wikimedia.org] On Behalf Of Deskana Sent: 16 April 2014 19:24 To: UK Wikimedia mailing list Subject: Re: [Wikimediauk-l] [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing) Something similar was tried at the Wikimedia Foundation, with the Article Feedback Tool. This all happened before I joined the WMF, and I wasn't aware of it in my time as a volunteer, so I'm mostly reciting what others have told me here. Take what I say with a pinch of salt. There were two major problems that occurred with the Article Feedback Tool: 1) Way, way more feedback was generated than could be handled. 2) People often ended up rating the subject of the page rather than the content in it (e.g. [[Justin Bieber]] got lots of 1s and 5s). It would be wise to think about ways to mitigate these problems from the very start, so that they don't occur again. I'm happy to work with you on these, as much as my time allows. Thanks, Dan On 16 April 2014 10:25, Michael Maggs <Michael(a)maggs.name> wrote: On 26 Mar 2014, at 21:35, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote: <snip> It would be great if this sort of rating was being systematically checked - but at a vague estimate of thirty seconds to scan, grade, and tag, aggregated across all pages on enwiki, that's about fifteen or twenty person-years of work to do it as a once-off, much less a rolling process. Andrew. On 25 March 2014 23:35, Pete Forsyth <peteforsyth(a)gmail.com> wrote: Philippe, The Public Policy Initiative produced strong validation for the Wikipedia 1.0 approach to assessing article quality. Was Amy Roth's research ever published, and are there any plans to repeat it with a larger sample size etc.? I'd say we're closer than you think to having a good way to measure article quality. Pete [[User:Peteforsyth]] There is at present no comprehensive automated tool that can be used to measure article and media file quality. Measuring quantity is easy; quality much more difficult. At the Wikimedia Conference over the weekend I presented some thoughts about a possible software project, to be lead by Wikimedia UK, to tackle this. A review of the presentation, and slides, can be seen at: https://meta.wikimedia.org/wiki/Wikimedia_Conference_2014/Documentation/24#… The WMUK wiki page is here: https://wikimedia.org.uk/wiki/Technology_Committee/Project_requests/WikiRat… Comments and feedback are most welcome. In particular, we would like to know whether creating such tools would be considered a useful thing to do by the community. Best regards Michael ____________ Michael Maggs Chair, Wikimedia UK _______________________________________________ Wikimedia UK mailing list wikimediauk-l(a)wikimedia.org http://mail.wikimedia.org/mailman/listinfo/wikimediauk-l WMUK: https://wikimedia.org.uk

Charles Matthews

6:53 p.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

On 16 April 2014 19:28, Simon Knight <sjgknight(a)gmail.com> wrote:

...

Thanks Dan, to be clear, the proposal is not to develop another manual rating system (such as the AFT or the project rating systems), it's to develop some automated quality assessments. Those might include some manual elements as inputs particularly for any machine learning approach, but generating new methods there is not the aim of the project. There's the old DREWS acronym from How Wikipedia Works. to which I'd now

add T for traffic. In other words there are six factors that an experienced human would use to analyse quality, looking in particular for warning signs. D = Discussion: crunch the talk page (20 archives = controversial, while no comments indicates possible neglect) R = WikiProject rating, FWIW, if there is one. E = Edit history. A single editor, or essentially only one editor with tweaking, is a warning sign. (Though not if it is me, obviously) W = Writing. This would take some sort of text analysis. Work to do here. Includes detection of non-standard format, which would suggest neglect by experienced editors. S = Sources. Count footnotes and so on. T = Traffic. Pages at 100 hits per month are not getting many eyeballs. Warning sign. Very high traffic is another issue. Seems to me that there is enough to bite on, here. Charles

Jonathan Cardy

17 Apr 17 Apr

7:11 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

For media files there are two things we already have, size of the file and who created it. Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail. Assessing the possible quality of images by looking at how many other quality images have come from the same user is likely to be helpful. Another image by someone who has had multiple featured pictures should be a positive on any rating system. We now have hundreds of thousands of images that are categorised as being of various important objects such as listed buildings in the UK. Other variables we could consider include the camera used, again a good camera with the settings mucked up is likely to take a worse photograph than a cheap camera in the hands of someone who knows how to coax the bet out of it. Several other things may or may not be possible, though if there isn't already open sourced software that can do it it could be expensive to create it: Technology exists out there that can compare different human faces, so it should be possible to write software that classifies images as blurred, washed out or poorly lit. Also we could have software that identifies near duplicates and in some fashion threads similar images behind the "best" one. The AFT was always vulnerable to just creating extra work for the community and diverting people from possibly improving articles to asking others to fix them. But software could be written that encouraged rather than undermined the SoFixIt culture of our heyday. If we had an app that invited people to check "Is this still the best image to illustrate this Wikipedia article?" Then we could feed it with articles where we believed we had more images on commons than we were using, or where the same image had been used for several years, or the image used was small or otherwise suspect, or indeed where we didn't have an image. Leutha and I ran a couple of sessions a year or so back showing donors how to add images to articles, and we found that even people who were quite hesitant about editing were very confident deciding which of several images could be used to illustrate an article, even if they had never been to the village or river in question. As well as improving articles and acting as a much needed new entrypoint for editors, this could give us another metric on media quality - "one of x images considered but not used to illustrate the article on that subject" or indeed "image z was replaced in article x by image y" which should usually mean that image y is higher "quality" than image z. Some sort of weighted score that combines the above could be what we need, though of course many of the criteria are subjective and scores could drop over time as better media files are uploaded. Jonathan Cardy GLAM (Galleries, Libraries, Archives & Museums) Organiser/Trefnydd GLAM (Galeriau, Llyfrgelloedd, Archifdai a llawer Mwy!) Wikimedia UK 0207 065 0990 Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects). Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents. Press Enter to send your message. On 16 April 2014 19:53, Charles Matthews <charles.r.matthews(a)ntlworld.com>wrote;wrote:

...

On 16 April 2014 19:28, Simon Knight <sjgknight(a)gmail.com> wrote:

Simon Knight

7:34 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

Thanks Jonathan, I really like the photo-selection idea you suggest to onboard new editors. I wonder if anyone has thoughts on where this conversation should happen? I was going to say it'd be better on wiki than mailing list, but really both places suffer from the same problem. It may be very difficult to keep track of the various discussions wherever they are, so I also wonder if we should suggest organising a workshop of some kind. Perhaps we could have some sort of informal fringe thing at Wikimania when lots of people (including e.g. Aaron Halfaker) will also be over. Thanks for all the thoughts! Simon From: wikimediauk-l-bounces(a)lists.wikimedia.org [mailto:wikimediauk-l-bounces@lists.wikimedia.org] On Behalf Of Jonathan Cardy Sent: 17 April 2014 08:12 To: UK Wikimedia mailing list Subject: Re: [Wikimediauk-l] [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing) For media files there are two things we already have, size of the file and who created it. Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail. Assessing the possible quality of images by looking at how many other quality images have come from the same user is likely to be helpful. Another image by someone who has had multiple featured pictures should be a positive on any rating system. We now have hundreds of thousands of images that are categorised as being of various important objects such as listed buildings in the UK. Other variables we could consider include the camera used, again a good camera with the settings mucked up is likely to take a worse photograph than a cheap camera in the hands of someone who knows how to coax the bet out of it. Several other things may or may not be possible, though if there isn't already open sourced software that can do it it could be expensive to create it: Technology exists out there that can compare different human faces, so it should be possible to write software that classifies images as blurred, washed out or poorly lit. Also we could have software that identifies near duplicates and in some fashion threads similar images behind the "best" one. The AFT was always vulnerable to just creating extra work for the community and diverting people from possibly improving articles to asking others to fix them. But software could be written that encouraged rather than undermined the SoFixIt culture of our heyday. If we had an app that invited people to check "Is this still the best image to illustrate this Wikipedia article?" Then we could feed it with articles where we believed we had more images on commons than we were using, or where the same image had been used for several years, or the image used was small or otherwise suspect, or indeed where we didn't have an image. Leutha and I ran a couple of sessions a year or so back showing donors how to add images to articles, and we found that even people who were quite hesitant about editing were very confident deciding which of several images could be used to illustrate an article, even if they had never been to the village or river in question. As well as improving articles and acting as a much needed new entrypoint for editors, this could give us another metric on media quality - "one of x images considered but not used to illustrate the article on that subject" or indeed "image z was replaced in article x by image y" which should usually mean that image y is higher "quality" than image z. Some sort of weighted score that combines the above could be what we need, though of course many of the criteria are subjective and scores could drop over time as better media files are uploaded. Jonathan Cardy GLAM (Galleries, Libraries, Archives & Museums) Organiser/Trefnydd GLAM (Galeriau, Llyfrgelloedd, Archifdai a llawer Mwy!) Wikimedia UK 0207 065 0990 Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects). Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents. Press Enter to send your message. On 16 April 2014 19:53, Charles Matthews <charles.r.matthews(a)ntlworld.com> wrote: On 16 April 2014 19:28, Simon Knight <sjgknight(a)gmail.com> wrote: Thanks Dan, to be clear, the proposal is not to develop another manual rating system (such as the AFT or the project rating systems), it's to develop some automated quality assessments. Those might include some manual elements as inputs particularly for any machine learning approach, but generating new methods there is not the aim of the project. There's the old DREWS acronym from How Wikipedia Works. to which I'd now add T for traffic. In other words there are six factors that an experienced human would use to analyse quality, looking in particular for warning signs. D = Discussion: crunch the talk page (20 archives = controversial, while no comments indicates possible neglect) R = WikiProject rating, FWIW, if there is one. E = Edit history. A single editor, or essentially only one editor with tweaking, is a warning sign. (Though not if it is me, obviously) W = Writing. This would take some sort of text analysis. Work to do here. Includes detection of non-standard format, which would suggest neglect by experienced editors. S = Sources. Count footnotes and so on. T = Traffic. Pages at 100 hits per month are not getting many eyeballs. Warning sign. Very high traffic is another issue. Seems to me that there is enough to bite on, here. Charles _______________________________________________ Wikimedia UK mailing list wikimediauk-l(a)wikimedia.org http://mail.wikimedia.org/mailman/listinfo/wikimediauk-l WMUK: https://wikimedia.org.uk

Charles Matthews

7:38 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

One possible model of where this could head: Wikisource's ribbon system for indicating degree of proof-reading. Basically this is a traffic-light colour code. Pages that draw on several pages of an original book may be a mixture of text that is unproofed (red/pink), text that has been proof-read once (amber/yellow), and text that has been validated by a second proof-reader (green). See for example https://en.wikisource.org/wiki/Edward_VI_(DNB00). A Wikipedia page system has to take account of more and disparate factors, if it is based on the sort of mechanical processing under discussion. As I described, certain things would "raise a flag". For example, if the footer sections weren't in the standard order per the manual, that would be a low-level indicator of possible neglect. On the other hand a WikiProject rating, if present, is better news: someone apparently cares. Rather than try to render different straws in the wind down into a single score, this kind of ribbon system would keep several dimensions going. Charles

Fæ

8:22 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

On 17 April 2014 08:11, Jonathan Cardy <jonathan.cardy(a)wikimedia.org.uk> wrote:

...

For media files there are two things we already have, size of the file and who created it. Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail.

Have a play with <http://tools.wmflabs.org/faebot/cgi-bin/TARDIS.py?file=TARDIS.jpg&category=TARDIS>, it would not be hard to adapt into reports. This gave a way of solving arguments in Commons Deletion Requests by comparing a file's size and pixel resolution to others in similar categories. There was no easy way of doing this on-wiki. Knowing that a file is in the top 25% even by this crude measure, suddenly makes it appear more valuable, while a doubtful file in the bottom 10% seems a good candidate for deletion if it is a marginal out of scope case. Setting "hard" measures for size or resolution is not always meaningful. Many small images may have educational use and have no higher resolution equivalent, though in my size comparison report (off-line) I do have a version that plucks out the smallest images in a category and passes them back as a re-paste-able gallery for review. Fae -- faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae

Michael Maggs

8:54 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

That looks a very interesting tool. Definitely worth adding to the wiki page as related work that we could potentially make use of (if you permit, Fae; the source code is not open is it?) Michael

...

Chris McKenna

9:41 a.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

On Thu, 17 Apr 2014, Jonathan Cardy wrote:

...

For media files there are two things we already have, size of the file and who created it. Filesize doesn't guarantee quality, but there must be a size below which we just have a blurry thumbnail.

This is only true of photographic images. Quality of SVG files is entirely independent of filesize and dimensions.

...

Assessing the possible quality of images by looking at how many other quality images have come from the same user is likely to be helpful. Another image by someone who has had multiple featured pictures should be a positive on any rating system.

There is a danger in this of selecting images only from those who submit their work for review. One user on Commons has possibly hundreds of Quality Images of rolling stock, I have none. I know I am biased but some (not all by any means) of my images are of equal or greater quality but I have never submitted my images for review in this way. This is not to say the approaches are without merit, just that it is important to be aware of potential limitations. ---- Chris McKenna cmckenna(a)sucs.org www.sucs.org/~cmckenna The essential things in life are seen not with the eyes, but with the heart Antoine de Saint Exupery

Michael Maggs

21 Apr 21 Apr

3:19 p.m.

New subject: [Wikimedia-l] Rating Wikimedia content (was Our next strategy plan-Paid editing)

Charles Thanks, some excellent points there. To ensure they are not overlooked I’ve copied your entire text, below, to the talk page [1]. Hope that’s OK. Michael [1] https://wikimedia.org.uk/wiki/Talk:Technology_Committee/Project_requests/Wi… On 16 Apr 2014, at 19:53, Charles Matthews <charles.r.matthews(a)ntlworld.com> wrote:

...

There's the old DREWS acronym from How Wikipedia Works. to which I'd now add T for traffic. In other words there are six factors that an experienced human would use to analyse quality, looking in particular for warning signs. D = Discussion: crunch the talk page (20 archives = controversial, while no comments indicates possible neglect) R = WikiProject rating, FWIW, if there is one. E = Edit history. A single editor, or essentially only one editor with tweaking, is a warning sign. (Though not if it is me, obviously) W = Writing. This would take some sort of text analysis. Work to do here. Includes detection of non-standard format, which would suggest neglect by experienced editors. S = Sources. Count footnotes and so on. T = Traffic. Pages at 100 hits per month are not getting many eyeballs. Warning sign. Very high traffic is another issue. Seems to me that there is enough to bite on, here. Charles

3680

days inactive

3685

days old

wikimediauk-l@lists.wikimedia.org

Manage subscription

10 comments

7 participants

tags (0)

participants (7)

Charles Matthews
Chris McKenna
Deskana
Fæ
Jonathan Cardy
Michael Maggs
Simon Knight