Hoi,
Work is being done on software that compares data from Wikidata against other
external sources. It is by someone connected to Wikidata; the details are not
clear to me. It is supposed to become available in 2015.
What I am looking for is a way to learn what the level of quality actually
is. The problem we face is that Wikipedians are not convinced of the quality
of Wikidata because there are no sources. Their observation is correct: there
is hardly any credible source information. But that does not necessarily
imply that the quality is worse than the information in other sources or in
Wikipedia itself; the two things are not really related. My blog post is an
approach to quality. What it does is make it plausible that comparing data
with data in linked sources may help improve quality. It does not, however,
provide an argument that is easy to digest. It does not rate quality in
percentages, and it does not indicate in numbers how quality improves when
this approach is taken. Those are the kinds of arguments that may convince
Wikipedians that Wikidata is safe to use even without the sources they seek.
I am NOT saying that sources on statements are not good to have. What I am
saying is that it is unlikely that the many millions of statements will have
credible sources any time soon. Consequently, it is best to work on sourcing
potentially problematic statements and to have statistics on statements
flagged as problematic by comparison with other sources. With numbers like
these, we can encourage people to do the hard work by showing how much of a
difference they make.
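To make the idea concrete, a minimal Python sketch of such a comparison; the
statement data and the external source here are invented stand-ins, not real
Wikidata content:

# Sketch: flag Wikidata statements that disagree with an external source
# and report simple agreement statistics. Both data sets are hypothetical
# stand-ins for real dump and source data.

wikidata_statements = {
    # (item, property) -> value
    ("Q42", "P569"): "1952-03-11",  # Douglas Adams, date of birth
    ("Q42", "P19"): "Q350",         # place of birth: Cambridge
}

external_source = {
    ("Q42", "P569"): "1952-03-11",
    ("Q42", "P19"): "Q84",          # invented disagreement, for illustration
}

matches, mismatches, unchecked = 0, 0, 0
problematic = []
for key, wd_value in wikidata_statements.items():
    ext_value = external_source.get(key)
    if ext_value is None:
        unchecked += 1  # the external source is silent on this statement
    elif ext_value == wd_value:
        matches += 1
    else:
        mismatches += 1
        problematic.append((key, wd_value, ext_value))

checked = matches + mismatches
if checked:
    print(f"checked {checked} statements, agreement {matches / checked:.0%}")
print("statements to prioritise for sourcing:", problematic)

Agreement percentages like this, tracked over time, are exactly the kind of
digestible number mentioned above.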
Finding such numbers is exactly what research is about. This is why I put
this challenge to you: I am not a scientist, nor do I have the right skills.
Thanks,
GerardM
On 25 August 2015 at 23:11, Ellery Wulczyn <ewulczyn@wikimedia.org> wrote:
Hi Gerard,
Your blog post got me thinking about designing a Wikidata fact-checking
tool. The idea would be to rank facts for human review by some
combination of a fact importance score and a fact uncertainty score. Do you
know of any work that has already been done in this space? Do you think
such a tool would be used? What are the current systems for quality control
in Wikidata?
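A minimal sketch of that ranking, with invented scores; how importance and
uncertainty would actually be estimated is exactly the open question:

# Sketch of the proposed review queue: facts that are both important and
# uncertain float to the top. All scores here are invented for illustration.

facts = [
    # (statement, importance in [0, 1], uncertainty in [0, 1])
    ("Q42 P569 1952-03-11", 0.9, 0.1),
    ("Q64 P1082 3500000",   0.7, 0.8),
    ("Q5 P31 Q28640",       0.2, 0.5),
]

def review_priority(fact):
    _, importance, uncertainty = fact
    # One possible combination: the product, so a fact must score on both
    # dimensions to be worth a human's time.
    return importance * uncertainty

for statement, importance, uncertainty in sorted(
        facts, key=review_priority, reverse=True):
    print(f"{importance * uncertainty:.2f}  {statement}")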
As an aside, estimating fact uncertainty may reduce to estimating Wikidata
quality as a whole.
Best,
Ellery
On Mon, Aug 24, 2015 at 11:24 PM, Gerard Meijssen
<gerard.meijssen@gmail.com> wrote:
Hoi,
There is a lot of knowledge about quality in online databases. It is known
that all of them have a certain error rate. This is as true for Wikidata as
for any other source.
My question is: is there a way to track Wikidata quality improvements over
time? I blogged about one approach [1]. It is, however, only an approach to
improving quality, not an approach to determining quality and tracking its
improvement.
The good news is that there are many dumps of Wikidata, so it is possible
to compare current Wikidata with how it was in the past.
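A rough sketch of what such a dump comparison could look like, treating each
dump as a set of (item, property, value) triples; the parsing of the actual
dump files is elided, and the two small sets stand in for real snapshots:

# Sketch: measure how the statement set changed between two Wikidata dump
# snapshots taken at different times.

def compare_dumps(old: set, new: set) -> dict:
    """Each set holds (item, property, value) triples."""
    return {
        "added": len(new - old),
        "removed": len(old - new),
        "kept": len(old & new),
    }

old_snapshot = {("Q42", "P19", "Q350"), ("Q42", "P569", "1952-03-11")}
new_snapshot = {("Q42", "P19", "Q350"), ("Q42", "P106", "Q36180")}

print(compare_dumps(old_snapshot, new_snapshot))
# -> {'added': 1, 'removed': 1, 'kept': 1}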
Would this be something that makes sense for Wikimedia research to get
into, particularly in light of Wikidata becoming more easily available to
Wikipedia?
Thanks,
GerardM
[1]
http://ultimategerardm.blogspot.nl/2015/08/wikidata-quality-probability-and…
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l