Wiki Research Junkies,
I am investigating the comparative quality of articles about Cote d'Ivoire and Uganda versus other countries. I want to answer the question: what makes a high-quality article? Can anyone point me to existing research on heuristics of article quality? That is, determining an article's quality from its wikitext properties, without human rating? I would also consider using data from the Article Feedback Tool, if there were dumps available for each article in the English, French, and Swahili Wikipedias. This is all the raw data I can seem to find: http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian filter based on:
* Per section:
  * Text length in each section
  * Infoboxes in each section
  * Filled parameters in each infobox
  * Images in each section
* Good Article / Featured Article status
* Then normalize on page views per population / speakers of the native language
Can you also think of any other dimensions or heuristics to rate programmatically?
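For concreteness, here is a minimal sketch of that feature extraction and classifier in Python (regex-based wikitext parsing plus scikit-learn's GaussianNB; the section-splitting heuristic, the feature aggregation and the labels are illustrative choices for the example, not the exact pipeline):

```python
import re
from sklearn.naive_bayes import GaussianNB

def section_features(wikitext):
    """Very rough per-section features from raw wikitext (regex-based, not a full parser)."""
    sections = re.split(r'\n==+[^=]+==+\n', wikitext)  # crude split on section headings
    features = []
    for sec in sections:
        features.append([
            len(sec),                                          # text length
            len(re.findall(r'\{\{Infobox', sec, re.I)),        # infoboxes
            len(re.findall(r'\n\s*\|\s*\w+\s*=\s*\S', sec)),   # filled template parameters (approx.)
            len(re.findall(r'\[\[(File|Image):', sec, re.I)),  # images
        ])
    # collapse per-section features into a fixed-length article vector
    return [sum(col) for col in zip(*features)] + [len(features)]

def train(articles):
    """articles: list of (wikitext, label) pairs, e.g. label 1 for GA/FA, 0 otherwise."""
    X = [section_features(text) for text, _ in articles]
    y = [label for _, label in articles]
    model = GaussianNB()
    model.fit(X, y)
    return model
```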
Best,
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
Re other dimensions or heuristics:
Very few articles are rated as Featured, and not that many as Good. If you are going to use that rating system (https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment), I'd suggest also including the lower levels, and indeed whether an article has been assessed and typically how long it takes for a new article to be assessed. Uganda, for example, has 1 Featured Article, 3 Good Articles and nearly 400 unassessed articles on the English-language Wikipedia (https://en.wikipedia.org/wiki/Wikipedia:UGANDA#Recognized_content).
For a crowdsourced project like Wikipedia the size of the crowd is crucial, and it varies hugely per article. So I'd suggest counting the number of different editors, other than bots, who have contributed to the article. It might also be worth getting some measure of local internet speed or usage level as context; there was a big upgrade to East Africa's Internet connection a few years ago. For Wikipedia the crucial metric is the size of the Internet-comfortable population with some free time and ready access to PCs. I'm not sure we've yet measured how long it takes from people getting internet access to their being sufficiently confident to edit Wikipedia articles; I suspect the answer is age related, but it would be worth checking the various editor surveys to see if this has been collected yet. My understanding is that in much of Africa many people are bypassing PCs altogether and going straight to smartphones, and of course for mobile-phone users Wikipedia is essentially a queryable medium rather than an interactive, editable one.
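To illustrate (this is only a sketch, and filtering bots by username suffix is a crude heuristic; checking the bot user group would be more reliable), distinct non-bot editors can be counted from the revision history via the MediaWiki API:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def distinct_editors(title):
    """Count distinct editors of an article, excluding obvious bots by username."""
    editors = set()
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "user", "rvlimit": "max", "format": "json", "continue": "",
    }
    while True:
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        for rev in page.get("revisions", []):
            user = rev.get("user", "")
            if user and not user.lower().endswith("bot"):  # name heuristic only
                editors.add(user)
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation
    return len(editors)

print(distinct_editors("Uganda"))
```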
Whether or not a Wikipedia article has references is a quality dimension you might want to look at. At least on EN it is widely assumed to be a measure of quality, though I don't recall ever seeing a study of the relative accuracy of cited and uncited Wikipedia information.
Thankfully the Article Feedback Tool has been almost eradicated from the English-language Wikipedia; I don't know if it is still on French or Swahili. I don't see it as being connected to the quality of the article, though it should be an interesting measure of how loved or hated a given celebrity was during the time the tool was deployed. So I'd suggest ignoring it in your research on article quality.
Hope that helps
Jonathan
On Sun, Dec 15, 2013 at 9:53 AM, WereSpielChequers < werespielchequers@gmail.com> wrote:
For a crowd sourced project like Wikipedia the size of the crowd is crucial and varies hugely per article. So I'd suggest counting the number of different editors other than bots who have contributed to the article.
Except why would this be an indicator of quality? I've done an analysis recently of football player biographies where I looked at the total volume of edits, date created, total number of citations and total number of pictures, and none of these factors correlates with article quality. You can have an article with 1,400 editors and still have it assessed as Start class. Indeed, some of the lesser-known articles may actually attract specialist contributors who almost exclusively write on one topic and then take the article to DYK, GA, A or FA. The end result is that you have articles with low page views that are really great and are maintained by one or two writers.
Whether or not a Wikipedia article has references is a quality dimension you might want to look at. At least on EN it is widely assumed to be a measure of quality, though I don't recall ever seeing a study of the relative accuracy of cited and uncited Wikipedia information.
Yeah, I'd be skeptical of this overall, though a complete lack of references is probably a bad sign. The problem is that you could get, say, one contentious section of the article that ends up fully cited or over-cited while the rest of the article ends up poorly cited. At the same time, you can get B articles that really should be GAs, but people have been burned by that process, so they just take it to B and leave it there. I have heard this quite a few times from female Wikipedians operating in certain places: the process actually puts them off.
Re Laura's comment.
I don't dispute that there are plenty of high-quality articles which have had only one or two contributors. However, my assumption and experience is that, in general, the more editors the better the quality, and I'd love to see that assumption tested by research. There may be some maximum above which quality does not rise, and there are clearly a number of gifted members of the community whose work is as good as our best crowdsourced work, especially when the crowdsourcing element is to address the minor imperfections that come from their own blind spots. It would be well worthwhile to learn whether women's football is an exception to this, or indeed whether my own confidence in crowdsourcing is mistaken.
I should also add that while I wouldn't filter out minor edits, you might as well filter out reverted edits and their reversions. Some of our articles are notorious vandal targets and their quality is usually unaffected by a hundred vandalisms and reversions of vandalism per annum, Beaver before it was semi-protected in Autumn 2011 (https://en.wikipedia.org/w/index.php?title=Beaver&offset=20111211084232&action=history) being a case in point. This also feeds into Kerry's point that many assessments are outdated. An article that has been a vandalism target might have been edited a hundred times since it was assessed, and yet it is likely to have changed less than one with only half a dozen edits, all of which added content.
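One rough way to do that filtering is via identity reverts: the API exposes a SHA1 per revision, and a revision whose hash matches an earlier one marks everything in between, plus itself, as revert churn. A sketch (an approximation only; it misses partial reverts):

```python
def mark_reverted(revisions):
    """revisions: list of (revid, sha1) in chronological order.
    Returns the set of revids that were reverted or are themselves identity reverts,
    so they can be excluded when counting 'real' contributions."""
    excluded = set()
    seen = {}  # sha1 -> index of the most recent revision with that content
    for i, (revid, sha1) in enumerate(revisions):
        if sha1 in seen:
            start = seen[sha1]
            # everything after the matching revision, up to and including this revert
            for rid, _ in revisions[start + 1 : i + 1]:
                excluded.add(rid)
        seen[sha1] = i
    return excluded
```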
Jonathan
Hello everybody,
I've been doing quite some work on article quality in Wikipedia - many heuristics have been mentioned here already. In my opinion, a set of universal indicators for quality that works for all of Wikipedia does not exist. This is mainly because the perception of quality is so different across various WikiProjects and subject areas in a single Wikipedia and even more so across different Wikipedia language versions. On a theoretical level, some universals can be identified. But as soon as concrete heuristics are to be identified, you will always have a bias towards the articles you used to identify these heuristics.
This aspect aside, having an abstract quality score that tells you how good an article is according to your heuristics doesn't help a lot in most cases. I much prefer the approach of identifying quality problems, which also gives you an idea of the quality of an article. I have done some work on this [1], [2], and there was a recent dissertation on the same topic [3].
I'm currently writing my dissertation on language technology methods to assist quality management in collaborative environments like Wikipedia. There, I start with a theoretical model, but as soon as the concrete heuristics come into play, the model has to be grounded according to the concrete quality standards that have been established in a particular sub-community of Wikipedia. I'm still wrapping up my work, but if anybody wants to talk, I'll be happy to.
Regards, Oliver
[1] Oliver Ferschke, Iryna Gurevych, and Marc Rittberger. The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 721-730, August 2013, Sofia, Bulgaria.
[2] Oliver Ferschke, Iryna Gurevych, and Marc Rittberger. FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia. Notebook for PAN at CLEF 2012. In: CLEF 2012 Labs and Workshop, Notebook Papers, September 2012, Rome, Italy.
[3] Maik Anderka. Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia. Dissertation, Bauhaus-Universität Weimar, June 2013.
-- Oliver Ferschke, Ubiquitous Knowledge Processing Lab (UKP), Technische Universität Darmstadt, ferschke@cs.tu-darmstadt.de, www.ukp.tu-darmstadt.de
Hi!
Oliver already mentioned my dissertation [3] on analyzing and predicting quality flaws in Wikipedia. Instead of classifying articles into some quality grading scheme (e.g. featured, non-featured, etc.), the main idea is to investigate specific quality flaws, thus providing indications of the respects in which low-quality content needs improvement. We proposed this idea in [1] and pushed it further in [2]. The second paper comprises a listing of more than 100 article features (heuristics) that have been used in previous research on automated quality assessment in Wikipedia. An in-depth description and implementation details of these features can be found in my dissertation [3] (Appendix B).
Best regards, Maik
[1] Maik Anderka, Benno Stein, and Nedim Lipka. Towards Automatic Quality Assurance in Wikipedia. In Proceedings of the 20th International Conference on World Wide Web (WWW 2011), Hyderabad, India, pages 5-6, 2011. ACM. http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011d.pdf
[2] Maik Anderka, Benno Stein, and Nedim Lipka. Predicting Quality Flaws in User-generated Content: The Case of Wikipedia. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), Portland, USA, pages 981-990, 2012. ACM. http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012i.pdf
[3] Maik Anderka. Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia. Dissertation, Bauhaus-Universität Weimar, June 2013. http://www.uni-weimar.de/medien/webis/publications/papers/anderka_2013.pdf
Hello Everybody,
Thanks so much for the fantastic suggestions.
Morten,
Thank you for the "Tell Me More" paper; those kinds of features were exactly what I was looking for. I will report my results to let you know how they compare.
Maik,
Thanks for introducing the idea of flaw-based assessment. I will see which are the most frequent clean-up tags I come across in the different languages; I hadn't thought of using flaws as actionable features.
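As a first pass (illustrative only; the template names below are a tiny English-language sample and every language edition uses its own), I may just count cleanup templates directly in the wikitext:

```python
import re
from collections import Counter

# Illustrative only: a handful of English-language cleanup templates.
# Each language edition has its own names, so this list would need localizing.
CLEANUP_TEMPLATES = ["citation needed", "unreferenced", "refimprove",
                     "cleanup", "pov", "orphan"]

def cleanup_tags(wikitext):
    """Count occurrences of known cleanup templates in one article's wikitext."""
    counts = Counter()
    for name in CLEANUP_TEMPLATES:
        pattern = r"\{\{\s*" + name.replace(" ", "[ _]") + r"\s*[|}]"
        counts[name] = len(re.findall(pattern, wikitext, flags=re.IGNORECASE))
    return counts
```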
Laura,
Your conclusions about football player biographies are instructive; I will see whether diversity of editorship relates to quality in country articles.
Oliver,
Thanks for the warning about topic bias; I see that the problem affects Wikipedias as a whole. Since I am only looking at articles from one category at a time, my guiding assumption is that topic bias is not an issue within that category, but it is good to keep in mind.
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
I agree that the existing ratings of articles are not very useful. Many articles are unassessed. Others were assessed as Stub/Start within days of being created and have not been revisited since, despite considerable further development of the article. Some people work very hard to get an article to GA (or whatever) and explicitly request assessment. I would think most "high quality" articles have had people actively working to achieve a high rating and explicitly requesting assessment. I don't know how many articles get to high levels of quality just through the uncoordinated contributions of the crowd, but I suspect there are hardly any. Indeed, I suspect that if you train on high-quality articles, you'll learn that having a small number of editors doing a lot of work in its recent history is the best indicator of quality.
If you are going to train your heuristics, I'd suggest collecting articles which have had little or no further development since their last rating, so that you know the assessments have some chance of being accurate.
I doubt there is any single metric that is a predictor of quality, but I think citations are probably a good proxy. Of course there are probably counter-examples, but generally an article with lots of citations suggests a sincere effort at a better-quality article. Of course, if any tool is deployed to automatically assess article quality, then we can expect people to "game" it, but at this stage one would assume that people are not actively gaming the rating system while it has a manual assessment process. However, people probably are "gaming" NPOV in specific articles by adding lots of citations that support their views; I doubt any metric will allow you to easily spot this kind of behaviour without doing some kind of analysis of the sources and the interrelationships between them.
But, as Laura comments, there may be a lot of citations clustered in a small part of the article and few elsewhere. Also, the number of sources is relevant: I can cite the same source 1,000 times in one article and that's probably not quality either. I'd be inclined to reduce the influence both of multiple citations at the same point of the text (or very close in the text) and of repeated citations to the same source. It's not that either is bad, but there should be some limit to how much they influence any conclusions.
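Something along these lines might do it (a sketch only; the window size, the per-source cap and the use of the literal ref text as a source key are arbitrary placeholder choices):

```python
import re

def weighted_citation_score(wikitext, cluster_window=200, per_source_cap=5):
    """Score citations so that repeats of the same source and dense clusters
    of footnotes count for less. Window and cap values are arbitrary choices."""
    refs = [(m.start(), m.group(0))
            for m in re.finditer(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", wikitext, re.S)]
    score, last_pos, per_source = 0.0, -10**9, {}
    for pos, ref in refs:
        key = ref  # crude source identity: the literal ref text (named refs repeat verbatim)
        per_source[key] = per_source.get(key, 0) + 1
        clustered = (pos - last_pos) <= cluster_window
        last_pos = pos
        if per_source[key] > per_source_cap:
            continue  # the same source cited many more times adds nothing further
        score += 0.5 if clustered else 1.0  # discount tightly clustered footnotes
    return score
```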
Kerry
On Sun, Dec 15, 2013 at 11:36 AM, Kerry Raymond kerry.raymond@gmail.comwrote:
But, as Laura comments, there may be a lot of citations clustered in a small part of the article, but few elsewhere. Also, the number of sources is relevant – I can cite the same source 1000 times in one article and that’s probably not quality either. I’d be inclined to reduce the influence of both multiple citations at the same point of the text (or very close in the text) as well as repeated citations to the same source. It’s not that either is bad but there should be some limit to how much they influence any conclusions.
The issue of volume of citations can also be subject specific. An article about the Sudan women's national football team, which is a Good Article, has 26 total citations. Topically, this makes a lot of sense. Sioma, an article about a town in Zambia, has 23 citations and is a Start. I would expect an article about a town to potentially have more sources. The more well known a topic is, the more page views it gets; it seems a sliding scale for sources should be used when trying to assess relative quality.
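One hedged way to implement such a sliding scale is to score an article's citation count relative to a peer group (for example other articles in the same category or page-view band) rather than in absolute terms; the peer-group definition here is an assumption for illustration, not an established method:

```python
from statistics import median

def relative_citation_score(article_citations, peer_citations):
    """Compare an article's citation count to the median of a peer group
    (e.g. other articles in the same category or page-view band).
    1.0 means 'typical for its peers'; above 1 means better cited than peers."""
    baseline = median(peer_citations) or 1
    return article_citations / baseline

# e.g. a national-team article compared against other national-team articles,
# rather than against town articles
print(relative_citation_score(26, [10, 18, 26, 30, 44]))
```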
I just started to look into Zambia, and it seems to be a horrible mess. The subdivisions were changed in 2011 and again in 2013, and until yesterday this was not reflected in the articles. There are very few sources available, and I had to spend a lot of time yesterday understanding which districts now belong to which province. I also created a missing article on a province (the first level of the administrative subdivision); I thought all such articles had been done by 2004. I will clean up at least all the articles on Zambian subdivisions, but it could well take several weeks. Welcome to the Global South.
Cheers Yaroslav
On 15/12/2013, at 23:36, "Kerry Raymond" kerry.raymond@gmail.com wrote:
I doubt there is any single metric that is a predictor of quality but I think citations is probably a good proxy. Of course, there are probably counter-examples but generally an article with lots of citations suggests a sincere effort at a better-quality article.
We are currently having the opposite experience in the education field. The problem is tertiary-level students who have learned that references are important, but not that the nature of the thing referenced is important. So when a lecturer sets a class assignment of writing a Wikipedia article, they include the list of 50 primary sources they've been building up in Zotero, with little to no consideration of their appropriateness.
Then they wonder why the article gets PRODd.
Cheers Stuart
As I said:
Of course if any tool is deployed to automatically assess article quality, then we can expect people to "game" it
This is exactly what the students are doing. They obviously suspect that their Wikipedia assignments are being assessed by trivial checks like the number of references and are now trying to "game" the system. I presume this is their experience with assignment writing more generally, in which case it says more about their teachers than it does about Wikipedia.
But generally Wikipedia articles are not written by students for academic credit, so most Wikipedia articles should not exhibit this sort of reference-gaming behaviour.
Kerry
Max,
With regards to quality assessment features, I recommend reading through our paper from WikiSym this year: http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf
The related work section contains quite a lot of the previous research on predicting article quality, so there should be plenty of useful reading. As James points out, content and number of footnote references are a good start.
There are a lot of dependencies when it comes to predicting article quality. If you're trying to predict high quality vs. everything else, the task isn't overly difficult. Otherwise it can be more challenging; for instance, there is quite a bit of difference between the FAs and GAs on English Wikipedia, and in your case you'll probably find the A-class articles mess things up, because their length tends to be somewhere between the other two and they're of high quality. I'm currently of the opinion that an A-class article is simply an FAC that hasn't been submitted for FA review yet.
You might of course run into problems with different citation traditions if you're working across language editions. English uses footnotes heavily; others might instead use bibliography sections and not really cite specific claims in the article text. (An issue we mention in our article when we tried to get our model to work on Norwegian (bokmål) and Swedish Wikipedia.)
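A sketch of how one might at least detect the two styles before comparing across languages (the heading lists are illustrative guesses and would need checking against each wiki's actual conventions):

```python
import re

# Section headings that typically hold bibliographies; illustrative and certainly
# incomplete -- each wiki's conventions would need to be verified.
BIBLIOGRAPHY_HEADINGS = {
    "en": ["References", "Bibliography", "Sources", "Further reading"],
    "fr": ["Références", "Bibliographie", "Notes et références"],
    "sw": ["Marejeo", "Tanbihi"],
}

def referencing_style(wikitext, lang="en"):
    """Return counts of inline footnotes and bibliography-style sections,
    so articles that cite via a literature list aren't scored as 'unreferenced'."""
    footnotes = len(re.findall(r"<ref[ >]", wikitext))
    headings = BIBLIOGRAPHY_HEADINGS.get(lang, [])
    biblio_sections = sum(
        bool(re.search(r"^==+\s*" + re.escape(h) + r"\s*==+\s*$", wikitext, re.M | re.I))
        for h in headings
    )
    return {"footnotes": footnotes, "bibliography_sections": biblio_sections}
```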
My $.02, if you'd like to discuss this more, feel free to get in touch.
Cheers, Morten