Re: [Wikidata] Wikidata Analyst, a tool to comprehensively analyze quality of Wikidata

16 Dec 2015

Obviously data can't be licensed, but graphs and other parts can be
copyrighted. I'm just trying to make reuse easier.

Best

On Wed, Dec 16, 2015 at 4:14 PM Gerard Meijssen <gerard.meijssen(a)gmail.com>
wrote:

  Hoi,
 What is achieved this way, and on what basis can you license the output
 of a tool?
 Thanks,
     GerardM

 On 16 December 2015 at 12:58, Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:

 Content created by this tool is licensed under CC-BY v4.0. I made it
 explicit now :)

 Best

 On Wed, Dec 16, 2015 at 3:11 PM Jane Darnell <jane023(a)gmail.com> wrote:

  Amir,
 Thanks for your work! I like this one showing how our
 Sum-of-all-Paintings project is doing compared to sculptures (which have
 many copyright issues, but you could still put the data on Wikidata)
 http://tools.wmflabs.org/wd-analyst/index.php?p=p31&q=Q3305213%7CQ860861

 Jane

 On Wed, Dec 16, 2015 at 12:23 PM, Amir Ladsgroup <ladsgroup(a)gmail.com>
 wrote:

  Hey,
 Thanks for your feedback. That's exactly what I'm looking for.

 On Mon, Dec 14, 2015 at 5:29 PM Paul Houle <ontology2(a)gmail.com> wrote:

> It's a step in the right direction,  but it took a very long time to
> load on my computer.
>
  It may be related to the recent Labs issues. Now I get a reasonable load time:
 http://tools.pingdom.com/fpt/#!/eq1i3s/http://tools.wmflabs.org/wd-analyst/…

>
> After the initial load,  it was pretty peppy,  then I ran the default
> example that is grayed in but not active (I had to retype it)
>

 I made some modifications that might help;

> Then I get the page that says "results are ready" and how cool they
> are,  then it takes me a while to figure out what I am looking at and
> finally realize it is a comparison of data quality metrics (which I think
> are all fact counts) between all of the P31 predicates and the Q5.
>
 I made some changes so things are easier to see. I would appreciate it if
 you could suggest some wording I can put in the description.

> The use of the graphic on the first row complicated this for me.
>
 Please suggest something I could write there for people :)

> There are a lot of broken links on this page too such as
>
> http://tools.wmflabs.org/wd-analyst/sitelink.php
> https://www.wikidata.org/wiki/P31
>

 The broken property link should be fixed by now, and the sitelink page is
 broken because it's not there yet. I'll make it very soon.

>
>
> and of course no merged in documentation about what P31 and Q5 are.
> Opaque identifiers are necessary for your project,  but
>
> Also some way to find the P's and Q's hooked up to this would be most
> welcome.
>
 Done. Now we have labels for everything.

> It's a great start and is completely in the right direction but it
> could take many sprints of improvement.
>
> On Wed, Dec 9, 2015 at 4:36 AM, Gerard Meijssen <
> gerard.meijssen(a)gmail.com> wrote:
>
>> Hoi,
>> What would be nice is to have an option to understand progress from
>> one dump to the next like you can with the Statistics by Magnus. Magnus
>> also has data on sources but this is more global.
>> Thanks,
>>      GerardM
>>
>> On 8 December 2015 at 21:41, Markus Krötzsch <
>> markus(a)semantic-mediawiki.org> wrote:
>>
>>> Hi Amir,
>>>
>>> Very nice, thanks! I like the general approach of having a
>>> stand-alone tool for analysing the data, and maybe pointing you to issues.
>>> Like a dashboard for Wikidata editors.
>>>
>>> What backend technology are you using to produce these results? Is
>>> this live data or dumped data? One could also get those numbers from the
>>> SPARQL endpoint, but performance might be problematic (since you compute
>>> averages over all items; a custom approach would of course be much faster
>>> but then you have the data update problem).
>>>
>>> An obvious feature request would be to display entity ids as links
>>> to the appropriate page, and maybe with their labels (in a language of your
>>> choice).
>>>
>>> But overall very nice.
>>>
>>> Regards,
>>>
>>> Markus
>>>
>>>
>>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>>
>>>> Hey,
>>>> There have been several discussions regarding the quality of information
>>>> in Wikidata. I wanted to work on the quality of Wikidata, but we don't
>>>> have any good source of information to see where we are ahead and where
>>>> we are behind. So I thought the best thing I could do is to make
>>>> something to show people exactly how well sourced our data is, in
>>>> detail. So here we have *http://tools.wmflabs.org/wd-analyst/index.php*
>>>>
>>>> You can give just a property (let's say P31) and it gives you the four
>>>> most used values plus an analysis of sources and quality overall (check
>>>> this out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
>>>> and then you can see that about 33% of them are sourced, with 29.1% of
>>>> them based on Wikipedia.
>>>> You can also give a property and multiple values. Let's say you want to
>>>> compare P27:Q183 (country of citizenship: Germany) and P27:Q30 (US).
>>>> Check this out
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>.
>>>> You can see that US biographies are more abundant (300K versus 200K) but
>>>> German biographies are more descriptive (3.8 descriptions per item versus
>>>> 3.2 descriptions per item).
>>>>
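As a rough illustration of how such comparison links are put together, here is a minimal Python sketch that builds a wd-analyst URL from a property and an optional list of values. It assumes, based only on the links in this thread, that `p` carries the property ID and `q` carries item IDs separated by `|`; the helper name `analyst_url` is made up for this example and is not part of the tool.

```python
from urllib.parse import urlencode

# Base address of the tool as given in this thread.
BASE_URL = "http://tools.wmflabs.org/wd-analyst/index.php"


def analyst_url(prop, values=()):
    """Build a wd-analyst link (hypothetical helper, not part of the tool).

    prop   -- property ID such as "P27"
    values -- optional item IDs such as ["Q30", "Q183"]; assumed to be
              joined with "|" in the "q" parameter, as in the example links
    """
    params = {"p": prop}
    if values:
        params["q"] = "|".join(values)
    # urlencode percent-encodes the pipe separator as %7C.
    return BASE_URL + "?" + urlencode(params)


print(analyst_url("P31"))
print(analyst_url("P27", ["Q30", "Q183"]))
```

The second call yields the US versus Germany comparison link, with the pipe written as `%7C`, the same form used in the paintings versus sculptures link elsewhere in this thread.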
>>>> One important note: take P31:Q5 (a trivial statement). 46% of them are
>>>> not sourced at all and 49% of them are based on Wikipedia. *But* compare
>>>> these statistics with those for the population property (P1082
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
>>>> trivial statement and we need to be careful about them. It turns out
>>>> there is slightly more than one reference per statement and only 4% of
>>>> them are based on Wikipedia. So we can relax and enjoy this
>>>> highly sourced data.
>>>>
>>>> Requests:
>>>>
>>>>   * Please tell me whether you want this tool at all
>>>>   * Please suggest more ways to analyze and catch unsourced material
>>>>
>>>> Future plan (if you agree to keep using this tool):
>>>>
>>>>   * Support more datatypes (e.g. date of birth based on year,
>>>>     coordinates)
>>>>   * Sitelink-based and reference-based analysis (to check how many of
>>>>     the articles of, let's say, Chinese Wikipedia are unsourced)
>>>>   * Free-style analysis: there is a database for this tool that can be
>>>>     used for many more applications. You can get the most unsourced
>>>>     statements of P31 and then go fix them. I'm trying to build a
>>>>     playground for this kind of task.
>>>>
>>>> I hope you like this and rock on!
>>>>
>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>>> Best
>>>>
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> Wikidata(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
> --
> Paul Houle
>
> *Applying Schemas for Natural Language Processing, Distributed
> Systems, Classification and Text Mining and Data Lakes*
>
> (607) 539 6254    paul.houle on Skype   ontology2(a)gmail.com
>
> :BaseKB -- Query Freebase Data With SPARQL
> http://basekb.com/gold/
>
> Legal Entity Identifier Lookup
> https://legalentityidentifier.info/lei/lookup/
> <http://legalentityidentifier.info/lei/lookup/>
>
> Join our Data Lakes group on LinkedIn
> https://www.linkedin.com/grp/home?gid=8267275
>