Hello everyone,
I think Wikimedia UK has an example project, related to medical articles, that may be of interest. John Byrne is the Wikimedian in Residence at Cancer Research UK, one of the UK's largest charities. He's put together the below message but isn't subscribed to this list so can't post. I am posting on his behalf. I'm happy to answer any questions about this and those I can't, I shall pass on to John.
Thanks and regards,
Stevie
John's message:
Cancer Research UK (CRUK), the world’s largest cancer research charity, have just taken me on as Wikipedian in Residence until mid-December 2014 (4/5 part time).
Parts of the plan for the role are very relevant to this thread. We are aiming to improve WP articles on cancer to ensure they are accurate, up-to-date and accessible to the full range of WP’s readership, working closely with the existing English WP medical editing community, many of whom have already been supportive of the project. With the medical translation project also underway, this is great timing for us to improve important content across large numbers of language versions.
We will be able to draw upon the expertise of both the medical research staff funded by CRUK (over 4,000 in the UK) and the various kinds of staff they have with professional expertise in writing for a range of audiences, from patients to scientists (see their editorial policy: http://www.cancerresearchuk.org/cancer-help/utilities/about-cancerhelp-uk/cancerhelp-uk-policies/editorial-policy/).
We are also planning to do research with the public into what they think of specific WP articles, perhaps before and after improvement, and into how they use WP and other sites at the top of search pages when looking for medical information on the internet. There has been little research into this area, and the results should be very useful in focusing the ways medical content generally can be improved.
The CRUK position is funded by the Wellcome Trust and supported by Wikimedia UK, and the budget includes an element for this research. I will be https://en.wikipedia.org/wiki/User:Wiki_CRUK_John in this role (usually I am https://en.wikipedia.org/wiki/User:Johnbod). Until early July I will also continue my role (1/5) as Wikimedian in Residence at the Royal Society, the UK's national academy for the sciences.
John Byrne
On 8 May 2014 22:43, edward edward@logicmuseum.com wrote:
On 08/05/2014 22:29, Andrew Gray wrote:
Section 3.3 of the report covers article selection. They went about it backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use topics relevant to the individual specialists.
Even this selective method didn't work as well as might be hoped, because the mechanism of the study required a minimum level of content - the articles had to be substantial enough to be useful for a comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.
After it was published I emailed both the EPIC and Oxford teams to ask why they chose the articles they did, but I was unable to get a satisfactory answer.
Selecting the most notable philosopher-theologians from a certain period is a good method. There is no reason the sample has to be random, so long as there is a clearly defined selection method. However, they were unable to explain why, of the most notable subjects, they chose Aquinas and Anselm. I suspect there was a selection bias: those were the articles which 'looked' the best. (The ones on Ockham and Scotus were so obviously vandalised that even a novice would have spotted the problem.)
Even then, as I have already pointed out above, they missed the fact that the Anselm article was plagiarised from the 1911 Britannica, so that instead of comparing Britannica to Wikipedia, they were comparing Britannica 2011 with Britannica 1911. And they missed some bad errors that had been introduced by Wikipedia editors when they attempted to modernise the old Britannica prose.
To give a simple example that even Geni will have to concede is not 'subjectively wrong', the Wikipedia article on Anselm said
"Anselm wrote many proofs within Monologion and Proslogion. In the first proof, Anselm relies on the ordinary grounds of realism, which coincide to some extent with the theory of Augustine."
This is a mangled version of the B1911 which reads
"This demonstration is the substance of the Monologion and Proslogion. In the first of these the proof rests on the ordinary grounds of realism"
You see what went wrong? 'First of these' should refer to the first book, namely the Monologion. But one editor removed "This demonstration is the substance of the Monologion and Proslogion" as being too difficult for ordinary readers, leaving 'first of these' dangling. Another editor came along and thought it referred to the first proof. This is quite incorrect.
I am still amazed the Oxford team didn't spot this. Even if you don't know the article was lifted from B1911, the oddity of the assertion should have rung alarm bells. There are about 9 other mistakes of differing severity.
On 08/05/2014 22:29, Andrew Gray wrote:
On 8 May 2014 01:56, Andreas Kolbe jayen466@gmail.com wrote:
(However, this study does not seem to have been based on a random sample
– at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)
https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf
Section 3.3 of the report covers article selection. They went about it backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.
Even this selective method didn't work as well as might be hoped, because the mechanism of the study required a minimum level of content - the articles had to be substantial enough to be useful for a comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.
(This is a key point to remember: the study effectively assesses the quality of a subset of "developed" articles in Wikipedia, rather than the presumably less-good fragmentary ones. It's a valid question to ask, but not always the one people think it's answering...)
"Thus the selection of articles was constrained by two important factors: one, the need to find topics appropriate for the academics whom we were able to recruit to the project; secondly, that articles from different online encyclopaedias were of comparable substance and focus. (Such factors would need to be taken carefully into account when embarking on a future large-scale study, where the demands of finding large numbers of comparable articles are likely to be considerable.)"
You'd need to adopt a fairly different methodology if you wanted a random sampling; I suppose you could prefilter a sample by "likely to be suitable" metrics (eg minimum size, article title matching a title list from the other reference works) and randomly select from within *those*, but of course you would still have the fundamental issue that you're essentially reviewing a selected portion of the project.
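The prefilter-then-sample approach sketched above could look something like the following. This is a minimal illustration, not anything from the study itself: the article records, the word-count threshold, and the reference-title list are all hypothetical placeholders.

```python
import random

# Hypothetical article records: title and word count (illustrative only).
articles = [
    {"title": "Anselm of Canterbury", "words": 4200},
    {"title": "Thomas Aquinas", "words": 6100},
    {"title": "William of Ockham", "words": 900},
    {"title": "Duns Scotus", "words": 350},
    {"title": "Peter Abelard", "words": 2800},
]

# Titles also covered by the comparison reference work (assumed list).
reference_titles = {
    "Anselm of Canterbury", "Thomas Aquinas",
    "Peter Abelard", "Duns Scotus",
}

MIN_WORDS = 1000  # assumed "minimum substance" threshold


def prefiltered_sample(pool, k, seed=None):
    """Keep only articles that pass the 'likely to be suitable' metrics
    (minimum size, title present in the other reference work), then draw
    a random sample from that eligible subset."""
    eligible = [
        a for a in pool
        if a["words"] >= MIN_WORDS and a["title"] in reference_titles
    ]
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))


sample = prefiltered_sample(articles, k=2, seed=42)
```

Note that, as the paragraph above says, the randomness only applies within the prefiltered pool: every article in `sample` has already passed the suitability filters, so the study would still be reviewing a selected portion of the project.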
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe