On Fri, Jul 4, 2014 at 9:01 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:

On 04.07.2014 at 07:10, Rohan Badlani wrote:

> I had downloaded the wikidata dump from
> http://dumps.wikimedia.org/wikidatawiki/latest/
> There is a file wikidatawiki-20140420-pages-articles-multistream-index which
> consists of triplets like:
>
> 537:114:Q17

I couldn't find documentation for the multistream-index format at
<https://meta.wikimedia.org/wiki/Data_dumps>, and I can't make sense of it
myself offhand. Perhaps ask on the wikitech-l list. I suppose the authority on
the question would be Ariel Glenn; perhaps you can get hold of him on IRC.

Note that this format is used for all wikis, so it will not contain anything
that is specific to Wikidata. It would be the same for Wikipedia.

If you figure it out, please add the info to
<https://meta.wikimedia.org/wiki/Data_dumps>!

> which I interpreted as following:
> 537 - category of the topic (which I am unable to find. I want the details of
> this item)

It's not a category. Wikidata doesn't use MediaWiki's Category feature for data
items at all. Wikipedia does, but pages there generally have multiple
categories, which are identified by name, not by a numeric ID.

If you want to build a classification graph of the concepts in Wikidata (I'm
intentionally avoiding the terms "ontology" and "taxonomy" here), you will have
to go by the properties P31 (instance of) and P279 (subclass of), which are
used on roughly half of the data items.
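
As a rough sketch of how such a graph could be collected: the function below
pulls the P31 and P279 targets out of one entity's JSON. It assumes the claim
layout served by Special:EntityData (claims -> property -> mainsnak ->
datavalue); the function name class_links and the sample dictionary are
hand-written for illustration, not taken from a dump.

```python
from typing import List, Tuple

def class_links(entity: dict) -> List[Tuple[str, str, str]]:
    """Return (item, property, target) edges for P31 and P279 statements."""
    edges = []
    for prop in ("P31", "P279"):
        for statement in entity.get("claims", {}).get(prop, []):
            snak = statement.get("mainsnak", {})
            # Skip "novalue" / "somevalue" snaks, which carry no target item.
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            # Assumption: the target is given as "id" or as "numeric-id".
            target = value.get("id") or "Q{}".format(value["numeric-id"])
            edges.append((entity["id"], prop, target))
    return edges

# Hand-written sample in the assumed layout (not real dump output).
sample = {
    "id": "Q17",
    "claims": {
        "P31": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P31",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": 6256},
                },
            }
        }]
    },
}
print(class_links(sample))
```

Items without either property simply yield an empty list, so they can be
skipped when assembling the graph.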

> 114 - page_id of the item Q17.

That seems to be correct.

> Q17 - which is the item. (JSON:
> https://www.wikidata.org/wiki/Special:EntityData/Q17.json)

It's the page title, which, on wikidata.org, is the same as the item ID.
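
For what it's worth, splitting such a line into its three fields could look
like the sketch below. Only the second and third fields are interpreted
(page_id and page title, as discussed above); the first field is kept as an
opaque integer because its meaning is not settled in this thread. The names
IndexEntry and parse_index_line are made up for this example.

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    first_field: int  # meaning not confirmed in this thread
    page_id: int
    title: str

def parse_index_line(line: str) -> IndexEntry:
    """Parse one 'N:page_id:title' line of the multistream index."""
    # Split on the first two colons only: titles on other wikis may
    # themselves contain ':' (e.g. namespace prefixes).
    first, page_id, title = line.rstrip("\n").split(":", 2)
    return IndexEntry(int(first), int(page_id), title)

entry = parse_index_line("537:114:Q17")
print(entry.page_id, entry.title)
```

Since the index format is the same for all wikis, the same parsing should
apply to a Wikipedia index file as well.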

HTH
Daniel

PS: We are close to providing JSON dumps on a regular basis, and to making the
JSON contained in the XML dumps more readable. This will hopefully make
analyzing Wikidata less painful.

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l