On Thu, May 8, 2014 at 8:26 PM, phoebe ayers <phoebe.wiki@gmail.com> wrote:
---------- Forwarded message ----------
From: David Gerard <dgerard@gmail.com>
While acknowledging the likely truth of the flaws in scientific knowledge production as it stands (single studies in medicine being literally useless, as 80% are actually wrong) ... I think you'll have a bit of an uphill battle attempting to enforce stronger standards in Wikipedia than exist in the field itself. We could go to requiring all medical sources to be Cochrane-level studies of studies of studies,
That actually is the current best practice for medical articles in English, I believe, and I think it's a good one: https://en.wikipedia.org/wiki/Wikipedia:MEDRS
Indeed so, and I agree it is a good idea.
Sourcing to reviews when possible is particularly relevant for a field (like medicine) that has a well-established tradition of conducting and publishing systematic reviews -- but I find it a useful practice in lots of areas, on the theory that reviews are generally more helpful for someone trying to find out more about a topic.
This is of course part of the same scholarly system that I was referring to earlier in this discussion.
Within Wikipedia, peer-reviewed publications and/or systematic reviews of such studies are considered among the most valuable and high-quality sources. They're a vital building block of the knowledge that Wikipedia seeks to disseminate. We know that all human methods are imperfect; but we're also agreed that the scholarly method is, by and large, superior to other methods of knowledge production.
Now, when I suggested that the Foundation bring these established methods to bear on Wikipedia itself, you (and one or two others) chimed in with concerns about real and potential flaws of scholarly studies and the peer review system. It seemed to me as though underlying these comments there was some sense that, while scholarly methods were good to illuminate any topic under the sun that Wikipedia writes about, they wouldn't be welcome as a method to illuminate Wikipedia itself.
I am well aware of the various documented problems with peer review, and its occasional failures. They haven't led Wikipedia to abandon its view that, by and large, peer-reviewed studies are among the best sources available. So I didn't think your raising problems with aspects of the scholarly method was particularly germane to this discussion of content quality studies. If we didn't believe in the scholarly method, we wouldn't privilege its output in Wikipedia.
Anthony: I hear you about veracity being particularly important in medical articles; and I don't mean to get us too far in the weeds about what quality means -- there's lots to do on lots of articles that I think would be pretty obvious quality improvement, including straight-up fact-checking.
I think any research programme evaluating the quality of Wikipedia content should first and foremost focus on such basics: veracity and fact checking.
Given that the post that started this thread referenced medical content, are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the process would be "opaque, messy, prone to failure and doesn't always support innovation"?
No, that is not what I am saying; and leaping to that conclusion seems like a rather pointy and bad-faith approach, which makes it just that much more of an effort to participate in this conversation -- if you want to have a dialog with other people, please try to be more generous in your assumptions.
I hope I have explained why I reacted the way I did. Your comments led me to believe that you were simply not very keen on Wikipedia being subjected to a test, using the most objective method available.
What I was trying to say is that I don't think your implication that there is already a well-designed solution that will fix all our problems is correct -- both because it's difficult to apply peer review in this context, and because peer review has plenty of problems itself. I think blind-review quality studies can be useful, but I don't think they're a panacea, any more than scholarly peer review is itself a panacea for making sure good scholarly work gets published.
There are well-established methods for assessing the quality of written work. I should think that a team composed of both academics well-versed in study design and statistics and Wikimedians familiar with Wikipedia content would over time be able to come up with a methodology that produces good results in assessing project content in various topic areas against the Wikimedia vision.
Once the basic framework has been established, the academics concerned should be given full intellectual freedom to assess the content as they see fit.
I think such efforts would demonstrate leadership, and reflect well on the Foundation.
Anyway, reviewer studies are one tool for assessing quality, but imho they are mostly good for raising awareness of Wikipedia within a particular field (thus possibly gaining new editors), and occasionally for correcting the few articles that do get reviewed.
Article quality has lots of dimensions, including those that reviewers might look for, and others that might not be apparent:
- factual accuracy -- that seems pretty straightforward, though of course it's not always -- cf historical debates, new evidence coming to light, etc.
- important facets of the topic being highlighted and appropriate coverage -- also pretty straightforward, except when it's not: what if a new and emerging theory isn't noted, or a historical one given short shrift? More to the point for reviewers, what if *my* theory isn't highlighted?
- A good bibliography and references -- I think experts can particularly weigh in on this, though standards vary widely across fields and articles for what gets cited, and what's good/seminal/classic is of course never easy to determine and is always under debate.
For some of these aspects, the Wikimedia movement has standards that could be communicated to reviewers. For example, the requirement that content be neutral, reflect prevalent opinions in proportion to their prevalence in the best sources, and so on – a reviewer should not complain that a theory she or he doesn't like, but which is part of scholarly discourse, is given due visibility in Wikipedia. Failures might occur, but we know no system is perfect. All you can do is impress upon reviewers what ideal you are pursuing, and trust in their intellectual honesty to assess articles in terms of their being an effort to meet that goal.
- clear writing -- sometimes we get accused of being too dry or pedantic, when that's our house style. What to do with this?
- Accessibility -- depends entirely on who is reading it, doesn't it? Are our physics articles accessible to grad students? Usually. Accessible to laypeople or 10th graders? Rarely.
In other areas, like the one you mention here, standards are lacking. I cannot recall the Wikimedia Foundation board ever having provided guidance on whether maths content, say, should be written so that it is helpful to kids doing their schoolwork, to maths students doing their coursework, or to maths professors looking to brush up on an area. This is a point Anne touched upon, and there have been many complaints over the years that some Wikipedia content is not written in a way that would be helpful to the average reader.*
I think this reflects a lack of vision on the part of the Foundation as to what kind of reference work Wikipedia should be. And I believe the reason is that opinions on the matter vary, and that people in the Foundation feel that whatever guidance they might provide on such an issue would be disliked by some section of the Wikipedia community.
(Personally, I think that every maths article should at least have an introduction that a 9th-grader would be able to understand.)
- Answers readers' questions -- hard to know without something like article feedback or another measuring mechanism. The questions of a new student are rarely those of an expert. Using medicine as an example: does the article on cancer answer the questions of doctors, or of newly-diagnosed patients (who are likely to be reading it)? Or the patients' relatives and caregivers? (Or none of the above?)
So yes, we should do reviewer studies to review for "objective" quality. And if we're serious about seeing how our articles meet reader needs [certainly one dimension of quality], we should also do reviewer studies with lots of groups of reviewers (medical experts, high school students, cancer patients!). And we should look at automated quality metrics, since reviewing 31 million articles by hand does not necessarily scale. And we should look into ways to follow up on quality studies with things like to-do lists generated from reviewers, getting people in societies and universities engaged in editing based on the outcome of reviewing, etc. -- so that all of this work has the outcome of measurably improved quality.
Personally (not speaking for the WMF or other trustees here) I think the best thing the WMF can do is provide a platform for this kind of work: yes, we can (and do) fund research studies, but in line with our general mission to provide the infrastructure for the projects to grow on, we can also help build tools to make this work easier, so that groups like Wiki Project Med etc. can get studies done easily as well. And we (the community) should develop a list of tools that those interested in doing this work need and want -- and those tools could be developed anywhere, under the aegis of the WMF or not.
I disagree. I do think the Foundation, and you as a board member, have a responsibility here. The provision of "high quality content" is part of the Foundation's core aims and values.
You have money – about ten times as much as five or six years ago. I would urge you to invest some of it in seeing how good the present system of content production is at delivering content that meets the aspirations expressed in the Foundation's core values.
Gaining objective data on this would be instructive to both the movement and the public, and provide an important stimulus for quality improvement and measurement efforts, and the recruitment of qualified editors. I understand that quantitative metrics (number of edits, editors, articles and page views) are easier to collect, but still find it disappointing that you haven't made more vigorous efforts to evaluate quality, using the input of subject matter experts.
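(To illustrate the gap: even a very crude automated content signal is cheap to compute. The sketch below -- assuming only the public MediaWiki API, with feature weights invented purely for illustration -- counts references, sections and length; its very crudeness shows both why such metrics are tempting to collect and why they are no substitute for expert review.)

import re
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(title):
    """Fetch the current wikitext of an article via the public MediaWiki API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    page = requests.get(API, params=params).json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

def crude_quality_score(title):
    """Combine a few cheap wikitext features into one number.
    The weights are invented for illustration only; a real metric would
    need calibrating against human assessments (e.g. WikiProject ratings)."""
    text = fetch_wikitext(title)
    n_refs = len(re.findall(r"<ref[ >]", text))           # inline citations
    n_sections = len(re.findall(r"^==[^=]", text, re.M))  # level-2 headings
    length_kb = len(text) / 1024
    return 2.0 * n_refs + 3.0 * n_sections + 0.5 * min(length_kb, 50)

print(crude_quality_score("Cancer"))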
Identifying problems is useful. Neither the Foundation nor the community should be afraid of some being found, and becoming public. They should be glad, because whenever a problem is brought into focus, it presents the Wikimedia movement with a chance to overcome it, and become a better and more effective movement as a result.
Likewise, when Wikipedians get an article through scholarly peer review, as WikiProject Medicine/Wiki Project Med Foundation have just now managed to do, this motivates further such efforts and fosters learning within the community, based on outside expert input. That is a really good thing, and I would like you to support such grassroots efforts to the best of your ability.
(Off the top of my head, these could include: tools to pull a random/blind sample from a category, perhaps across already-rated articles, that could be replicated across topics to do multiple comparable reviewer studies. Tools to consolidate editor-rating metrics from across languages; maybe representing those ratings in Wikidata. A strong to-do list functionality, and a strong category/quality rating intersection functionality, so that, say, an oncologist interested in working on poor-quality cancer articles could easily get to editing. Displaying all this data easily in the projects, by article. etc. etc.)
These are good ideas that would complement any research initiative undertaken by the Foundation itself. I for one would be happy to see resources invested in pursuing them.
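To take just the first of these tool ideas: pulling a random, blind sample of articles from a category is only a few lines against the standard MediaWiki API. A minimal sketch follows -- the category name and sample size are placeholders, and a full tool would page through the API's continue tokens to exhaust large categories:

import random
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category, limit=500):
    """List article titles in a category (one API page only; a full tool
    would follow 'continue' tokens to cover large categories)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": "0",   # main/article namespace only
        "cmlimit": str(limit),
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    return [m["title"] for m in data["query"]["categorymembers"]]

# Draw a blind sample for reviewers; a fixed seed makes the draw
# reproducible, so parallel studies can review the same articles.
random.seed(42)
sample = random.sample(category_members("Oncology"), 10)
print("\n".join(sample))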
But please consider funding your own content quality research programme, and supporting and encouraging such research being done. The aim in this should not be to be given a clean bill of health that validates the status quo, but the identification of things gone wrong, and improvement opportunities.
Content quality is not the only area worth studying. Wikipedian anthropology and sociology – investigating behavioural patterns, interaction patterns and the effectiveness of administrative structures in the community – would be another worthwhile topic of study. There are plenty of anecdotes of social dysfunction in the community (evidence can be found any day at AN/I, or in documented failures such as that of the Croatian Wikipedia highlighted in the press last autumn).** There have been a handful of studies focused on this area, but I would love to see the Foundation do more to advance and publicise such research.
Basically, I believe there are unexploited potentials here. Academic research programmes would create synergies, lead to an influx of new people and ideas, and vitalise the movement. The media coverage that Wiki Project Med Foundation/WikiProject Medicine has generated is a good example of that. Such public debates have multiple benefits: they make Wikipedia less opaque, they explain to the public how Wikipedia content is generated and why relying on Wikipedia content may sometimes be a bad idea, they provide visibility for the quality improvement efforts that are underway, and demonstrate social responsibility.
* See the discussion of Wikipedia's maths articles by Alan Riskin of the Maths Department at Whittier College: http://wikipediocracy.com/2013/10/20/elementary-mathematics-on-wikipedia-2/
** http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controv...