Hey everyone :)
Many people writing tools around Wikidata are using the standard XML dumps. They're not fun to work with. We're now also publishing JSON dumps. They're generated weekly at the moment. You can find them at http://dumps.wikimedia.org/other/wikidata/. The current one still has one little issue: we try to have one entity per line in the dump so it can be processed line-by-line, but in the current dump there are a few exceptions. It's still valid JSON, just not super nice. This will be fixed with the next one. I hope we'll see a few more cool tools developed thanks to this.
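For example, processing it line by line could look roughly like this (a minimal Python 3 sketch; the filename is the current dump, and the try/except is only needed because of the exceptions just mentioned):

    import gzip
    import json

    # Stream the dump without loading the whole file into memory.
    with gzip.open("20140721.json.gz", "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            # The file as a whole is a JSON array; skip its brackets.
            if line in ("[", "]"):
                continue
            try:
                # Each remaining line holds one entity, plus a trailing comma.
                entity = json.loads(line.rstrip(","))
            except ValueError:
                # A few lines in the current dump don't hold exactly one
                # entity; skip them until the next dump fixes this.
                continue
            print(entity["id"])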
Cheers Lydia
On Wed, Jul 23, 2014 at 12:40 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Thanks. In which component are bug reports being handled?
You can file them under "Datasets". Make sure wikidata-bugs@lists.wikimedia.org is in the CC. Then it'll show up on my list and get attention.
Cheers Lydia
Congrats!!
Very helpful to play around with indeed!!
Can we also expect a JSON-LD export?
Thanks
Mohamed
Hi, thank you! I want to implement support for this in pywikibot if I can, and I have some questions: 1. Can you give me an example of an item? I.e., do you store ns, id, or title properties in them? 2. What is the marker that shows one entry has finished and the next one is starting (similar to "start" or "end" in the XML dumps)?
but since the file is huge, it's a lot of trouble for me to check for myself and answer these questions :(
Best
On Wed, Jul 23, 2014 at 7:03 AM, Amir Ladsgroup ladsgroup@gmail.com wrote:
but since the file is huge, it's a lot of trouble for me to check for myself and answer these questions :(
curl http://dumps.wikimedia.org/other/wikidata/20140721.json.gz | zcat | head
will get you a sample in a few seconds.
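If you'd rather stay in Python, something like this should be roughly equivalent (a sketch, assuming Python 3; note the first line it prints is just the opening bracket of the JSON array):

    import gzip
    import itertools
    import urllib.request

    url = "http://dumps.wikimedia.org/other/wikidata/20140721.json.gz"

    # Decompress the stream on the fly and look at the first few lines only.
    with urllib.request.urlopen(url) as response:
        with gzip.GzipFile(fileobj=response) as dump:
            for raw_line in itertools.islice(dump, 5):
                print(raw_line.decode("utf-8")[:200])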
Tom
Thank you, let me check
On Wed, Jul 23, 2014 at 12:46 PM, Innovimax SARL innovimax@gmail.com wrote:
Congrats!!
Very helpful to play around with indeed!!
Cool :)
Can we also expect a JSON-LD export?
It's not currently in the plan. Is there a lot of demand for it? If so, please file a ticket on Bugzilla.
Cheers Lydia
Hi, thank you for providing the JSON dump.
I found that the language-tag-based property for Indonesian ("id") conflicts with the identifier property "id". Although this might not be a problem for ordinary JSON, it would cause trouble when trying to add an @context for JSON-LD.
Is it possible to use "@id" for the identifier property to avoid the conflict? It would also make it easier to use the data as JSON-LD.
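To illustrate, both uses of "id" can show up in a single entity (an abridged, made-up example written as a Python literal):

    entity = {
        "id": "Q42",  # "id" as the entity's identifier property
        "labels": {
            "en": {"language": "en", "value": "Douglas Adams"},
            # "id" again, this time as the Indonesian language code
            "id": {"language": "id", "value": "Douglas Adams"},
        },
    }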
cheers,
On 25.07.2014 09:15, KANZAKI Masahide wrote:
If we add support for JSON-LD, it will be based on our RDF mapping, not directly on the native JSON. I think that approach will avoid any problems like the one you mentioned.
-- daniel
Thanks, this looks very promising!
Any chance you could do daily diffs as well? And sync them to Labs quickly?
Cheers, Magnus
On 23.07.2014 14:25, Magnus Manske wrote:
Please file a feature request for daily diffs :)
-- daniel
On Wed, Jul 23, 2014 at 2:25 PM, Magnus Manske magnusmanske@googlemail.com wrote:
We'll have to investigate. Can you file a ticket on bugzilla please? Thanks!
Cheers Lydia
Great. Which JSON version is used in the dumps? The "internal" dump-file JSON or the "external" Web-API JSON?
Thumbs up, Fredo
On 23.07.2014 16:30, Fredo Erxleben wrote:
Great. Which JSON version is used in the dumps? The "internal" dump-file JSON or the "external" Web-API JSON?
The JSON dumps use the external (canonical) serialization format. The XML dumps will be switching to that too, soon.
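For orientation, the top level of an entity in the canonical serialization looks roughly like this (an abridged sketch written as a Python literal; all values are placeholders):

    entity = {
        "id": "Q42",
        "type": "item",
        "labels": {"en": {"language": "en", "value": "..."}},
        "descriptions": {"en": {"language": "en", "value": "..."}},
        "aliases": {"en": [{"language": "en", "value": "..."}]},
        "claims": {"P31": ["..."]},  # statements, keyed by property id
        "sitelinks": {"enwiki": {"site": "enwiki", "title": "..."}},
    }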
-- daniel
Can you give some estimate of how long we'll have to wait for this 'soon'?
Lukas
On 23.07.2014 17:00, LB wrote:
Can you give some estimate of how long we'll have to wait for this 'soon'?
Could happen during Wikimania, or shortly after. Not promising anything, though.
In any case, you should use the JSON dumps, not the XML dumps, if at all possible!
-- daniel
On 23.07.2014 16:59, Daniel Kinzler wrote:
Double - no triple - thumbs up :)
BTW, did the discussion about how to handle snak ordering yield any results?