The standard MW XML format has existed for a long time and is supported by existing software.
IMHO both the XML and JSON dumps should be treated with the same priority.

Best,
Dimitris

On Fri, Feb 27, 2015 at 10:19 AM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
On 26.02.2015 21:40, Martynas Jusevičius wrote:
Looks like someone hasn't learned the lesson:
https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg02588.html

No, this post is unrelated. The cause of the problem was correctly analysed by Stas.

Markus



On Thu, Feb 26, 2015 at 9:27 PM, Lukas Benedix <lukas.benedix@fu-berlin.de> wrote:
I second this!


btw: what is the status of the problem with the missing dumps with
history? (The latest available is from November 2014.)

Lukas

On Thu, 26.02.2015 at 14:52, Markus Kroetzsch wrote:
Hi,

It's that time of the year again when I am sending a reminder that we
still have broken JSON in the dump files ;-). As usual, the problem is
that empty maps {} are serialized wrongly as empty lists []. I am not
sure if there is any open bug that tracks this, so I am sending an
email. There was one, but it was closed [1].

As you know (I had sent an email a while ago), there are some remaining
problems of this kind in the JSON dump, and also in the live exported
JSON, e.g.,

https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
(uses [] as a value for snaks: this item has a reference with an empty
list of snaks, which is an error by itself)
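
A minimal sketch of how a consumer might detect references like this one. It assumes the usual entity layout (claims keyed by property id, each statement carrying a list of "references", each reference carrying "snaks"); the function name is hypothetical and not part of any Wikidata tool.

```python
def empty_snak_references(entity):
    """Yield (property_id, statement_id) for references whose snaks are empty.

    `entity` is one decoded entity object. An empty "snaks" value is caught
    whether it is serialized as [] or as {}.
    """
    claims = entity.get("claims") or {}
    if isinstance(claims, list):
        # "claims": [] is the broken empty-map form: nothing to inspect.
        return
    for property_id, statements in claims.items():
        for statement in statements:
            for reference in statement.get("references", []):
                if not reference.get("snaks"):
                    yield property_id, statement.get("id")
```

Calling list(empty_snak_references(entity)) on a decoded entity gives the affected statements, if any.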

However, the situation is considerably worse in the XML dumps, which
have seen less usage since we have JSON, but as it turns out are still
preferred by some users. Surprisingly (to me), the JSON content in the
XML dumps is still not the same as in the JSON dumps. A large part of
the records in the XML dump is broken because of the map-vs-list issue.

For example, the latest dump of current revisions [2] has countless
instances of the problem. The first is in the item Q3261 (empty list for
claims), but you can easily find more by grepping for things like

&quot;claims&quot;:[]
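
For those who prefer a script to grep, here is a rough sketch that streams a bz2-compressed dump and counts the lines containing that escaped pattern. The function name and the line-by-line approach are assumptions for illustration; it counts matching lines, not individual occurrences.

```python
import bz2

# The JSON is XML-escaped inside the dump, so quotes appear as &quot;
PATTERN = "&quot;claims&quot;:[]"

def count_matching_lines(path, pattern=PATTERN):
    """Count lines in a bz2-compressed text file that contain `pattern`."""
    count = 0
    # bz2.open in text mode decompresses on the fly, so the whole
    # dump never has to fit in memory.
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if pattern in line:
                count += 1
    return count
```

Passing a different pattern, such as "&quot;aliases&quot;:[]", checks the other keys in the same way.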

It seems that all empty maps are serialized wrongly in this dump
(aliases, descriptions, claims, ...). In contrast, the site's export
simply omits the key of empty maps entirely, see

https://www.wikidata.org/wiki/Special:EntityData/Q3261.json

The JSON in the JSON dumps is the same as in this export.
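
Until the serialization is fixed, a consumer can paper over both variants (empty list and omitted key) after decoding. The following is only a sketch of such a workaround: the function name is hypothetical, and the key list is an assumption (aliases, descriptions and claims are named above; labels and sitelinks are added on the guess that they behave the same way).

```python
# Keys expected to hold maps. "labels" and "sitelinks" are assumptions.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")

def normalize_entity(entity):
    """Return a shallow copy in which every map-valued key is present
    and an empty list [] has been replaced by an empty map {}."""
    fixed = dict(entity)
    for key in MAP_KEYS:
        # Covers both the omitted key (site export) and the [] form (XML dump).
        if key not in fixed or fixed[key] == []:
            fixed[key] = {}
    return fixed
```

Non-empty values are left untouched, so the function is safe to apply to records from either dump.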

Cheers,

Markus


[1] https://github.com/wmde/WikibaseDataModelSerialization/issues/77
[2] http://dumps.wikimedia.org/wikidatawiki/20150207/wikidatawiki-20150207-pages-meta-current.xml.bz2





_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l








--
Kontokostas Dimitris