Hey Jane,
Yes. Exactly :)
Best
On Tue, Dec 8, 2015 at 9:37 PM Jane Darnell <jane023(a)gmail.com> wrote:
Very useful, Amir, thanks! I just ran it for
occupation=painter
(p=P106&q=Q1028181)
Am I correct in my interpretation that in general painters have fewer
claims than the entire population of items with the property occupation?
On Tue, Dec 8, 2015 at 6:48 PM, Amir Ladsgroup <ladsgroup(a)gmail.com>
wrote:
Hey,
There have been several discussions regarding the quality of information in
Wikidata. I wanted to work on the quality of Wikidata, but we don't have any
good source of information to see where we are ahead and where we are
behind. So I thought the best thing I could do is to make something that
shows people in detail how well sourced our data is. So here we have
http://tools.wmflabs.org/wd-analyst/index.php
You can give only a property (let's say P31) and it gives you the four
most used values plus an analysis of sources and overall quality (check this
out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
and then you can see that about 33% of them are sourced, of which 29.1%
are based on Wikipedia.
You can also give a property and multiple values of your choice. Let's say
you want to compare P27:Q183 (country of citizenship: Germany) and P27:Q30
(US). Check this out
<http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30%7CQ183>. And
you can see US biographies are more abundant (300K versus 200K) but German
biographies are more descriptive (3.8 descriptions per item versus 3.2
descriptions per item).
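
As a rough illustration of how these links are put together, here is a
minimal Python sketch. It assumes only what the example links above show:
the property goes in "p" and the values go in "q", joined with "|" (which
appears URL-encoded as %7C). The names BASE_URL and build_query_url are
made up for this example and are not part of the tool.

```python
from urllib.parse import urlencode

# Base address of the tool, taken from the links in this message.
BASE_URL = "http://tools.wmflabs.org/wd-analyst/index.php"


def build_query_url(prop, values=()):
    """Build a wd-analyst link for a property and optional values.

    Assumption: the tool reads the property from "p" and the values
    from "q", separated by "|", as the example links suggest.
    """
    params = {"p": prop}
    if values:
        # urlencode percent-encodes the "|" separator as %7C.
        params["q"] = "|".join(values)
    return BASE_URL + "?" + urlencode(params)


# Property only: the tool picks the most used values itself.
print(build_query_url("P31"))

# Country of citizenship: US versus Germany.
print(build_query_url("P27", ["Q30", "Q183"]))
```

The second call should reproduce the US/Germany comparison link given
above.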
One important note: take P31:Q5 (a trivial statement). 46% of them are
not sourced at all and 49% of them are based on Wikipedia, *but* look at
the statistics for the population property (P1082
<http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
trivial statement and we need to be careful about it. It turns out there
is slightly more than one reference per statement and only 4% of them are
based on Wikipedia. So we can relax and enjoy this highly sourced data.
Requests:
- Please tell me whether you want this tool at all
- Please suggest more ways to analyze and catch unsourced materials
Future plans (if you agree to keep using this tool):
- Support more datatypes (e.g. date of birth based on year,
coordinates)
- Sitelink-based and reference-based analysis (to check how many of the
articles of, let's say, Chinese Wikipedia are unsourced)
- Free-style analysis: There is a database for this tool that can be
used for many more applications. You can get the most unsourced statements
of P31 and then you can go and fix them. I'm trying to build a playground
for these kinds of tasks.
I hope you like this and rock on!
<http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
Best
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata