Re: [Wikimedia-l] Metrics - accuracy of Wikipedia articles

8 May 2014


On 8 May 2014 01:56, Andreas Kolbe <jayen466@gmail.com> wrote:
> ...
> (However, this study does not seem to have been based on a random sample –
> at least I cannot find any mention of the sample selection method in the
> study's write-up. The selection of a random sample is key to any such
> effort, and the method used to select the sample should be described in
> detail in any resulting report.)
https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf
Section 3.3 of the report covers article selection. They went about it
backwards (at least, backwards from the way you might expect): they
recruited reviewers first and then manually identified relevant
articles, as the original goal was to match topics to individual
specialists.
Even this selective method didn't work as well as might be hoped,
because the mechanism of the study required a minimum level of content
- the articles had to be substantial enough to be useful for a
comparison, and of sufficient length and comparable scope in both sets
of sources - which ruled out many of the initial selections.
(This is a key point to remember: the study effectively assesses the
quality of a subset of "developed" articles in Wikipedia, rather than
the presumably less-good fragmentary ones. That's a valid question to
ask, but not always the one people think the study is answering...)
"Thus the selection of articles was constrained by two important
factors: one, the need to find topics appropriate for the academics
whom we were able to recruit to the project; secondly, that articles
from different online encyclopaedias were of comparable substance and
focus. (Such factors would need to be taken carefully into account
when embarking on a future large-scale study, where the demands of
finding large numbers of comparable articles are likely to be
considerable.)"
You'd need to adopt a fairly different methodology if you wanted a
random sample; I suppose you could prefilter the candidates by "likely
to be suitable" metrics (e.g. minimum size, article title matching a
title list from the other reference works) and randomly select from
within *those*, but of course you would still have the fundamental
issue that you're essentially reviewing a selected portion of the
project.
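
The prefilter-then-sample idea could be sketched roughly as below, in
Python. This is only an illustration, not anything the study did: the
function names, the "title" and "bytes" fields, and the 5000-byte
default threshold are all hypothetical, and a real title match against
other reference works would need far more careful normalisation than a
case-insensitive comparison.

```python
import random


def prefilter(articles, reference_titles, min_bytes=5000):
    """Keep articles that look suitable for a side-by-side comparison.

    `articles` is a list of dicts with hypothetical keys "title" and
    "bytes". An article is kept if it is at least `min_bytes` long and
    its title also appears in `reference_titles` (case-insensitive).
    """
    wanted = {title.casefold() for title in reference_titles}
    return [
        article
        for article in articles
        if article["bytes"] >= min_bytes
        and article["title"].casefold() in wanted
    ]


def sample_suitable(articles, reference_titles, k, min_bytes=5000, seed=None):
    """Draw up to `k` articles uniformly at random from the prefiltered pool."""
    pool = prefilter(articles, reference_titles, min_bytes)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(pool, min(k, len(pool)))
```

Recording the seed and the filter settings alongside the results would
let the selection be described and repeated, though the sample is still
only random within the prefiltered subset, not across the whole project.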
-- 
- Andrew Gray
  andrew.gray@dunelm.org.uk
