Amir,
Thanks for your work! I like this one, showing how our
Sum-of-all-Paintings project is doing compared to sculptures (which have
many copyright issues, but whose data you can still put on Wikidata).
Jane
On Wed, Dec 16, 2015 at 12:23 PM, Amir Ladsgroup <ladsgroup(a)gmail.com>
wrote:
Hey,
Thanks for your feedback. That's exactly what I'm looking for.
On Mon, Dec 14, 2015 at 5:29 PM Paul Houle <ontology2(a)gmail.com> wrote:
It's a step in the right direction, but it took a very long time to
load on my computer.
It may be related to Labs' recent issues. Now I get reasonable load times:
http://tools.pingdom.com/fpt/#!/eq1i3s/http://tools.wmflabs.org/wd-analyst/…
After the initial load, it was pretty peppy. Then I ran the default
example that is grayed in but not active (I had to retype it).
I made some modifications that might help.
Then I get the page that says "results are ready" and how cool they
are; then it takes me a while to figure out what I am looking at, and I
finally realize it is a comparison of data-quality metrics (which I think
are all fact counts) between all of the P31 predicates and Q5.
I made some changes so you can see things more easily. I would appreciate
it if you could suggest some wording for the description.
The use of the graphic on the first row complicated this for me.
Please suggest something I can write there for people :)
The broken property should be fixed by now, and sitelinks are broken
because they're not there yet. I'll add them very soon.
and of course there is no merged-in documentation about what P31 and Q5
are. Opaque identifiers are necessary for your project, but some way to
find the P's and Q's hooked up to this would also be most welcome.
Done. Now we have labels for everything.
It's a great start and is completely in the right direction, but it
could take many sprints of improvement.
On Wed, Dec 9, 2015 at 4:36 AM, Gerard Meijssen <
gerard.meijssen(a)gmail.com> wrote:
> Hoi,
> What would be nice is an option to track progress from one dump to
> the next, as you can with the statistics by Magnus. Magnus also has
> data on sources, but this is more global.
> Thanks,
> GerardM
>
> On 8 December 2015 at 21:41, Markus Krötzsch <
> markus(a)semantic-mediawiki.org> wrote:
>
>> Hi Amir,
>>
>> Very nice, thanks! I like the general approach of having a
>> stand-alone tool for analysing the data, and maybe pointing you to issues.
>> Like a dashboard for Wikidata editors.
>>
>> What backend technology are you using to produce these results? Is
>> this live data or dumped data? One could also get those numbers from the
>> SPARQL endpoint, but performance might be problematic (since you compute
>> averages over all items; a custom approach would of course be much faster
>> but then you have the data update problem).
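A sketch of that SPARQL route (hypothetical: the query shape and the use of the `wikibase:sitelinks` count predicate are assumptions, not the tool's actual backend, and averaging over all ~2.8M P31:Q5 items may well time out):

```python
# Sketch: averaging a per-item statistic over all items with a given
# P31 value via the Wikidata SPARQL endpoint. Hypothetical -- this is
# not wd-analyst's actual backend.
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def build_avg_sitelinks_query(pid: str, qid: str) -> str:
    """Build a query averaging sitelink counts over all items with pid:qid."""
    return (
        "SELECT (AVG(?links) AS ?avgLinks) WHERE { "
        f"?item wdt:{pid} wd:{qid} ; wikibase:sitelinks ?links . }}"
    )

def run(query: str) -> bytes:
    """POST-free GET against the public endpoint (network call)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "wd-analyst-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    # Print the query rather than hitting the endpoint.
    print(build_avg_sitelinks_query("P31", "Q5"))
```

A custom dump-based approach would be much faster, as Markus notes, at the cost of staleness between dumps.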
>>
>> An obvious feature request would be to display entity ids as links to
>> the appropriate page, and maybe with their labels (in a language of your
>> choice).
>>
>> But overall very nice.
>>
>> Regards,
>>
>> Markus
>>
>>
>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>
>>> Hey,
>>> There have been several discussions regarding the quality of information
>>> in Wikidata. I wanted to work on the quality of Wikidata, but we don't
>>> have any good source of information to see where we are ahead and where
>>> we are behind. So I thought the best thing I could do is make something
>>> to show people exactly how well sourced our data is, with details. So
>>> here we have *http://tools.wmflabs.org/wd-analyst/index.php*
>>>
>>> You can give just a property (let's say P31) and it gives you the four
>>> most used values plus an analysis of sources and overall quality (check
>>> this out <http://tools.wmflabs.org/wd-analyst/index.php?p=P31>),
>>> and then you can see that about ~33% of them are sourced, of which 29.1%
>>> are based on Wikipedia.
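The kind of breakdown above can be sketched from dump-derived statement records; a minimal illustration (the record shape and sample numbers are made up, not the tool's actual schema):

```python
# Sketch of the source breakdown: fraction of statements with at least
# one reference, the share of references "imported from Wikipedia", and
# references per statement. Record shape and sample data are hypothetical.

def source_breakdown(statements):
    """statements: list of dicts with a 'refs' list; each ref is a dict
    whose 'wikipedia' flag marks an imported-from-Wikipedia reference."""
    total = len(statements)
    sourced = [s for s in statements if s["refs"]]
    refs = [r for s in statements for r in s["refs"]]
    wiki_refs = [r for r in refs if r.get("wikipedia")]
    return {
        "pct_sourced": 100.0 * len(sourced) / total if total else 0.0,
        "pct_wikipedia_refs": 100.0 * len(wiki_refs) / len(refs) if refs else 0.0,
        "refs_per_statement": len(refs) / total if total else 0.0,
    }

# Toy sample: 3 statements, 2 sourced, 1 of 3 references from Wikipedia.
sample = [
    {"refs": [{"wikipedia": True}]},
    {"refs": [{"wikipedia": False}, {"wikipedia": False}]},
    {"refs": []},
]
stats = source_breakdown(sample)
```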
>>> You can give a property and multiple values you want to compare. Let's
>>> say you want to compare P27:Q183 (country of citizenship: Germany) and
>>> P27:Q30 (US). Check this out
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183>.
>>> You can see US biographies are more abundant (300K vs. 200K), but German
>>> biographies are more descriptive (3.8 descriptions per item vs. 3.2).
>>>
>>> One important note: compare P31:Q5 (a trivial statement), where 46% of
>>> the statements are not sourced at all and 49% are based on Wikipedia,
>>> *but* get these statistics for the population property (P1082
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P1082>). It's not a
>>> trivial statement and we need to be careful about it. It turns out
>>> there is slightly more than one reference per statement, and only 4% of
>>> them are based on Wikipedia. So we can relax and enjoy this
>>> highly-sourced data.
>>>
>>> Requests:
>>>
>>> * Please tell me whether you want this tool at all
>>> * Please suggest more ways to analyze the data and catch unsourced material
>>>
>>> Future plan (if you agree to keep using this tool):
>>>
>>> * Support more datatypes (e.g. date of birth by year, coordinates)
>>> * Sitelink-based and reference-based analysis (to check how many
>>> articles of, let's say, Chinese Wikipedia are unsourced)
>>> * Free-style analysis: there is a database behind this tool that can
>>> be used for many more applications. You can get the most unsourced
>>> statements of P31 and then go and fix them. I'm trying to build a
>>> playground for these kinds of tasks.
>>>
>>> I hope you like this and rock on!
>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>> Best
>>>
>>>
>>> _______________________________________________
>>> Wikidata mailing list
>>> Wikidata(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>>
>
>
>
>
--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/
Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275