I fed the Wikidata dump into a JSON profiling tool; in the first stage it
identified the unique paths one can follow through the JSON data structures.
The table below shows a count of the literal data items that can be found
behind each path. We're not counting how many P31[] claims have been
made; we're counting all of the literals inside the node, so the more
information qualifying a claim, the bigger this number gets.
Path             Literals  Property
/claims/P31[]   144350720  instance of
/claims/P625[]   35948377  geographic coordinates
/claims/P17[]    35165614  country: sovereign state
/claims/P646[]   31095359  freebase identifier
/claims/P569[]   30881885  date of birth
/claims/P21[]    30466476  sex or gender
/claims/P105[]   29234406  taxon rank
/claims/P225[]   27808194  taxon name
/claims/P131[]   27806448  located in administrative div
/claims/P171[]   25159278  parent taxon
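The profiling tool's internals aren't described here, so the following is only a minimal sketch of the counting idea. It assumes each entity is already parsed into a Python dict shaped like a Wikidata JSON record. Every literal (string, number, boolean, null) is reported under its full path, with `[]` marking an array step, and the counts are then rolled up to a path prefix of a chosen depth. That way everything under a P31 statement, qualifiers included, lands on /claims/P31[]. The function names and the toy entity are hypothetical.

```python
from collections import Counter
from typing import Any, Iterator


def literal_paths(node: Any, path: str = "") -> Iterator[str]:
    """Yield one path per literal (str, number, bool, None) found under node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from literal_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        # Array elements share one path; "[]" marks the array step, as in /claims/P31[]
        for item in node:
            yield from literal_paths(item, f"{path}[]")
    else:
        yield path


def count_by_prefix(entities, depth: int) -> Counter:
    """Count the literals behind each path prefix of `depth` segments."""
    counts: Counter = Counter()
    for entity in entities:
        for p in literal_paths(entity):
            segments = p.split("/")[1 : depth + 1]
            counts["/" + "/".join(segments)] += 1
    return counts


# Toy entity in the shape of a Wikidata JSON record (heavily abbreviated).
toy = {
    "id": "Q42",
    "type": "item",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "value": {"entity-type": "item", "numeric-id": 5},
                        "type": "wikibase-entityid",
                    },
                },
                "type": "statement",
                "rank": "normal",
            }
        ]
    },
}

print(count_by_prefix([toy], depth=1))  # keys: /id, /type, /labels, /claims
print(count_by_prefix([toy], depth=2)["/claims/P31[]"])  # 7
```

Even one unqualified statement contributes seven literals, which is why these counts grow with qualifying information. As far as I know the real dump is one large JSON array with one entity per line, so in practice you would stream it line by line instead of loading it whole.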
None of those are a surprise at all: the two great hierarchies (spatial
and biological) are represented, and there are properties about people.
Oddly, though, the most documented property connected with creative works
is P161 (cast member), which comes in at #20.
Anyhow, it is not all claims; at the top level you see:
Path            Literals
/datatype           1328
/id             16647896
/type           16647896
/aliases        17824992
/sitelinks      82865847
/descriptions  112796452
/labels        120721644
/claims        772152821
Everything listed above /claims is part of what I have been calling the
"taxonomic core". There are quite a few reasons to treat this data
specially, and I'd guess this solved a chicken-and-egg problem for WD.
In Freebase the taxonomic core is roughly half the mass of the whole
thing; the claims are certainly bulked up in Wikidata because of the
qualifying information.
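As a rough cross-check of the "roughly half" comparison with Freebase, the top-level counts can be summed to see what share of Wikidata's literals falls in the taxonomic core. This treats every top-level path except /claims as core, which is one reading of "everything above /claims". The figure counts literals, not bytes.

```python
# Top-level literal counts, copied from the top-level table.
TOP_LEVEL = {
    "/datatype": 1328,
    "/id": 16647896,
    "/type": 16647896,
    "/aliases": 17824992,
    "/sitelinks": 82865847,
    "/descriptions": 112796452,
    "/labels": 120721644,
    "/claims": 772152821,
}

# Treat everything except /claims as the "taxonomic core".
core = sum(n for path, n in TOP_LEVEL.items() if path != "/claims")
total = sum(TOP_LEVEL.values())
core_share = core / total

print(f"core {core:,} of {total:,} literals = {core_share:.1%}")
# core 367,506,055 of 1,139,658,876 literals = 32.2%
```

On these numbers the core is about a third of the literals, consistent with the claims being bulked up by qualifiers.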
If anything is weak about the fundamental data model, it is that aliases
and labels are not reified the way claims are. This is a big deal if you
want a usable lexical database. For instance, labels should be taggable
for:
* being potentially offensive (e.g. insults that start with "N")
* generic versus brand names for drugs
* Japanese labels should be available in kanji, hiragana, and romanized
form, each identifiable as such
* in English we have it easy: you can generate "Mad Lib" style texts
correctly if you (1) know which article to use and (2) know how to make
both the plural and singular forms. (1) is easy to guess if you have
semantic data, and you can get away with being imperfect at (2).
* for German, however, you need to tag by grammatical gender, and the
choice of article is a function of that gender, of the relationship
between the concept and the predicate, and of the verb tense
* similar things exist for most of the other languages...
* various organizations have defined viewpoints on terminology; for
instance, firefighters want you to say 'flammable' because people might
get the morphology wrong on 'inflammable', and in the army you could be
sexually harassed if you call your "Rifle" your "Gun"
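To make the reification idea concrete, here is a minimal Python sketch of what a qualified label could look like. Nothing here is part of Wikidata's data model: `Label`, its fields, and the tag names are hypothetical. It illustrates two of the uses listed above. The first is filtering labels by tag (brand vs. generic drug names). The second is choosing a German definite article from a gender tag plus grammatical case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical reified label: a value plus qualifiers, the way a claim has them."""
    language: str
    value: str
    tags: set[str] = field(default_factory=set)  # e.g. "offensive", "brand-name"
    script: str | None = None  # e.g. "kanji", "hiragana", "romaji"
    gender: str | None = None  # German grammatical gender: "m", "f" or "n"


# German definite articles by grammatical case and gender (singular only).
DEFINITE_ARTICLE = {
    "nom": {"m": "der", "f": "die", "n": "das"},
    "acc": {"m": "den", "f": "die", "n": "das"},
    "dat": {"m": "dem", "f": "der", "n": "dem"},
    "gen": {"m": "des", "f": "der", "n": "des"},
}


def with_article(label: Label, case: str) -> str:
    """Prefix a German label with its definite article (noun endings not handled)."""
    if label.gender is None:
        raise ValueError(f"no grammatical gender tagged on {label.value!r}")
    return f"{DEFINITE_ARTICLE[case][label.gender]} {label.value}"


def usable(labels: list[Label], language: str, exclude: set[str]) -> list[Label]:
    """Labels in `language` that carry none of the excluded tags."""
    return [lab for lab in labels if lab.language == language and not lab.tags & exclude]


drug = [
    Label("en", "acetaminophen", tags={"generic-name"}),
    Label("en", "Tylenol", tags={"brand-name"}),
]
print([lab.value for lab in usable(drug, "en", exclude={"brand-name"})])  # ['acetaminophen']
print(with_article(Label("de", "Hund", gender="m"), "acc"))  # den Hund
```

The sketch models only gender and case; verb tense and noun endings (e.g. genitive "des Hundes") are left out.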
--
Paul Houle
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
http://legalentityidentifier.info/lei/lookup
I've noticed that many of the items in Wikidata have as their root item
(via combinations of subclass-of, instance-of, and part-of
relationships) the item named "entity" (Q35120). For example, the
musical note named "E♭" is rooted at "entity" via various paths,
including the following:
E♭ (Q633464) instance-of
note (Q263478) part-of
musical notation (Q233861) part-of
music (Q638) subclass-of
art (Q735) subclass-of
process (Q3249551) subclass-of
event (Q1190554) subclass-of
entity (Q35120)
I have the following questions related to the example above:
1) Is the intent that all items in Wikidata have as their root item (via
combinations of subclass-of, instance-of, and part-of relationships) the
item named "entity" (Q35120)?
2) Is it the intent that any item (e.g. music) that is a subclass-of
another item (e.g. art), have as its root item (via only subclass-of
relationships) the item named "entity" (Q35120)?
To summarize these questions as they apply to a project on which I'm
working: Given a tree that has as its root the item named "entity"
(Q35120), would/should it be possible to navigate from the root to every
item (QXXXXXXX) in the Wikidata database?
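One way to test question 2 against question 1 is a breadth-first search from Q35120 over reversed edges, restricted to a chosen set of relations. The sketch below uses only the E♭ chain quoted above as data. On Wikidata itself these relations are the properties P31 (instance of), P279 (subclass of) and P361 (part of), and the real graph would have to be loaded from a dump. The function names are hypothetical. The `seen` set also guards against cycles, which a crowd-edited class graph can contain.

```python
from collections import defaultdict, deque

# The E♭ chain from above as (child, relation, parent) triples.
EDGES = [
    ("Q633464", "instance-of", "Q263478"),    # E♭ -> note
    ("Q263478", "part-of", "Q233861"),        # note -> musical notation
    ("Q233861", "part-of", "Q638"),           # musical notation -> music
    ("Q638", "subclass-of", "Q735"),          # music -> art
    ("Q735", "subclass-of", "Q3249551"),      # art -> process
    ("Q3249551", "subclass-of", "Q1190554"),  # process -> event
    ("Q1190554", "subclass-of", "Q35120"),    # event -> entity
]


def reachable_from(root, edges, relations):
    """Items reachable from root by walking the given relations from parent to child."""
    children = defaultdict(list)
    for child, relation, parent in edges:
        if relation in relations:
            children[parent].append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # also keeps cycles from looping forever
                seen.add(child)
                queue.append(child)
    return seen


def unreachable(root, edges, relations):
    """Items mentioned in edges that cannot be reached from root."""
    items = {c for c, _, _ in edges} | {p for _, _, p in edges}
    return items - reachable_from(root, edges, relations)


ALL = {"instance-of", "part-of", "subclass-of"}
print(unreachable("Q35120", EDGES, ALL))  # set()
print(sorted(unreachable("Q35120", EDGES, {"subclass-of"})))
# ['Q233861', 'Q263478', 'Q633464']
```

With subclass-of alone the search still reaches music, art, process and event, which is what question 2 asks about. E♭, note and musical notation drop out because they hang off the tree by instance-of and part-of.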
Regards,
James Weaver
Hi, I have an open question about Wikidata concepts, partly related to the
idea of selecting a template for placeholder articles based on a query.
One question about this idea is: what should happen when several templates
are possible for an item? For example, an item with no article may be in
the result sets of several queries, each associated with an article-stub
template, say:
* the query "anything", which could be associated with a totally generic
template that renders a Wikibase-page-like article showing all the claims
about the item
* a more specific query "living organism"
* another even more specific query like "animal"
* ...
In this example, each more specific query's result set is obviously a
subset of each more generic one's. In such cases it could be useful to
choose the template of the most specific query.
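Choosing "the most specific query" can be read as a plain set comparison: among the queries whose results contain the item, keep those with no strictly smaller matching result set. This is a minimal sketch assuming each query's result set is already materialized. The template names and items are made up, and the comparison uses result sets only, not any property's meaning.

```python
def most_specific(item, query_results):
    """Templates whose query matches item and has no strictly narrower matching query.

    query_results maps a template name to the set of items its query returns.
    """
    matching = [res for res in query_results.values() if item in res]
    return sorted(
        name
        for name, res in query_results.items()
        # `other < res` is a proper-subset test on Python sets
        if item in res and not any(other < res for other in matching)
    )


# Hypothetical queries and results, mirroring "anything" > "living organism" > "animal".
QUERY_RESULTS = {
    "generic": {"rock", "oak", "cat"},
    "living organism": {"oak", "cat"},
    "animal": {"cat"},
}
print(most_specific("cat", QUERY_RESULTS))  # ['animal']
print(most_specific("oak", QUERY_RESULTS))  # ['living organism']
```

If two matching queries are incomparable (neither result set contains the other), both are returned. That is exactly the ambiguous case where some extra ordering would be needed.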
In the same spirit as the "subclass of" property, we could create a
property (or reuse that one) to relate the queries. But as no property has
a meaning in Wikibase itself, the choice of template would not be possible
using raw Wikibase concepts, which partly defeats the purpose of the idea.
Any thoughts about this problem?
Cheers, TomT0m
Hey folks :)
Language fallbacks have been live for a while now, and I'd like to get some
feedback. Are they working for you? Anything where they're horribly
wrong? Anything where they could be better?
I'd especially like feedback from people who use Wikidata in a
language other than English.
Everything I am already aware of is linked as "blocked by" tickets of
https://phabricator.wikimedia.org/T76216
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
Hi,
I see the page for Google lists Mario Artero as the founder and CEO of
Google. This is definitely not correct. What is the procedure for fixing
this? I can't tell from the history page which edit added that.
http://www.wikidata.org/wiki/Q95
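One way to find the edit is to walk the item's revisions from oldest to newest and stop at the first one whose JSON contains the statement. The sketch below assumes the revisions have already been fetched (for example through the MediaWiki API's `action=query&prop=revisions`) and parsed into dicts. It checks P112 (founded by) and assumes entity values carry an `id` field, as in current Wikidata JSON. The revision IDs, item IDs and history are hypothetical placeholders.

```python
def first_revision_with_value(revisions, prop, target):
    """Return the ID of the first revision whose `prop` statements include item `target`.

    `revisions` is an oldest-first list of (revision_id, entity_json) pairs.
    """
    for rev_id, entity in revisions:
        for statement in entity.get("claims", {}).get(prop, []):
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, dict) and value.get("id") == target:
                return rev_id
    return None


def claim(prop, item_id):
    """Build a minimal statement in (abbreviated) Wikidata JSON shape."""
    return {"mainsnak": {"property": prop, "datavalue": {"value": {"id": item_id}}}}


# Hypothetical history: the wrong founder value first appears in revision 3.
history = [
    (1, {"claims": {}}),
    (2, {"claims": {"P112": [claim("P112", "Q_FOUNDER_A")]}}),
    (3, {"claims": {"P112": [claim("P112", "Q_FOUNDER_A"), claim("P112", "Q_WRONG")]}}),
]
print(first_revision_with_value(history, "P112", "Q_WRONG"))  # 3
```

A linear scan also copes with values that were added, removed and re-added. If the value was added once and never removed, a binary search over the revisions would need fewer fetches.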
Thanks,
Ben
--
about.me/benmccann
Hey folks :)
Happy new year everyone. It is surely going to be an exciting one for
Wikidata. Over the last few weeks I've been thinking a lot about the year
ahead of us. One thing is clear to me: It will be about successfully
scaling Wikidata and keeping all the amazing things we have achieved
in the process.
I've written down my thoughts on the subject in a blog post to kick
off some thinking and discussions:
http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-t…
Cheers
Lydia
Having followed Freebase and the announcement about migrating to
Wikidata, I'm trying to get up to speed on the structure of Wikidata. I
read on the site that relationships such as subclass-of and instance-of
are managed by everyone. Looking at automobile (Q1420), I see that it is
both subclass-of and instance-of motor road vehicle, which I imagine is
not correct. Are there processes in place to manage the integrity of
these structural components of Wikidata?
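One simple integrity check for this case is to flag any item that is both an instance of (P31) and a subclass of (P279) the same class. This is a minimal sketch over (item, property, value) triples that are assumed to be extracted from a dump already. It does not describe any existing Wikidata process, and apart from Q1420 the IDs are hypothetical placeholders. The pattern may occasionally be deliberate, so its output is a list to review rather than something to fix automatically.

```python
from collections import defaultdict

INSTANCE_OF, SUBCLASS_OF = "P31", "P279"


def instance_and_subclass_of_same(statements):
    """Return sorted (item, class) pairs where item is both P31 and P279 of class."""
    instance_of = defaultdict(set)
    subclass_of = defaultdict(set)
    for item, prop, value in statements:
        if prop == INSTANCE_OF:
            instance_of[item].add(value)
        elif prop == SUBCLASS_OF:
            subclass_of[item].add(value)
    return sorted(
        (item, cls)
        for item, classes in instance_of.items()
        for cls in classes & subclass_of.get(item, set())
    )


# Q1420 is automobile; the other IDs are placeholders, not real items.
statements = [
    ("Q1420", "P31", "Q_MOTOR_ROAD_VEHICLE"),
    ("Q1420", "P279", "Q_MOTOR_ROAD_VEHICLE"),
    ("Q_SEDAN", "P279", "Q1420"),
]
print(instance_and_subclass_of_same(statements))
# [('Q1420', 'Q_MOTOR_ROAD_VEHICLE')]
```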
Thanks,
James Weaver