Please see https://gerrit.wikimedia.org/r/#/c/229099/ and https://gerrit.wikimedia.org/r/#/c/229100/ for the change to master and the currently deployed branch.
This will be merged and backported today, and a new dump will be created.

I'm also going to follow this up by writing some more integration tests for our JSON dumps to catch this kind of thing!
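As a rough sketch of the kind of check such a test could make (not the actual test code), the Python below flags entities whose map-valued keys were serialized as a JSON array (`[]`) instead of an object (`{}`). The key list `MAP_KEYS` and the helper names `find_broken_maps`, `iter_dump_entities` and `check_dump` are hypothetical, and the dump layout (one entity per line inside a top-level array, lines separated by commas) is an assumption.

```python
import gzip
import json
from typing import Iterator, List, Tuple

# Assumption: these top-level entity keys must always hold JSON objects (maps).
# The list is illustrative, not an authoritative Wikibase schema.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def find_broken_maps(entity: dict) -> List[str]:
    """Return the map-valued keys of one entity that were serialized as a JSON array."""
    return [key for key in MAP_KEYS if isinstance(entity.get(key), list)]


def iter_dump_entities(path: str) -> Iterator[dict]:
    """Yield entities from a gzipped dump.

    Assumes one entity per line inside a top-level array: "[" on the first
    line, "]" on the last, and a trailing comma after each entity line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue  # skip the array brackets and blank lines
            yield json.loads(line)


def check_dump(path: str) -> List[Tuple[str, List[str]]]:
    """Collect (entity id, broken keys) pairs for every entity with a [] map."""
    problems = []
    for entity in iter_dump_entities(path):
        broken = find_broken_maps(entity)
        if broken:
            problems.append((entity.get("id", "?"), broken))
    return problems
```

For example, `find_broken_maps({"id": "Q1", "labels": {}, "aliases": [], "claims": []})` returns `["aliases", "claims"]`, while an entity that omits empty maps entirely passes the check, matching the behaviour of the earlier dumps.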

On 4 August 2015 at 11:26, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
<markus@semantic-mediawiki.org> wrote:
> Hi,
>
> The recent Wikidata JSON dumps again contain huge amounts of broken JSON
> where empty maps are serialized as [] instead of using {}. Just grep for
>
> "claims":[]
> or
> "aliases":[]
> or
> any other key that requires a map
>
> to find many examples. The scope of the problem is massive. Basically all
> entity documents that include some empty map are broken, which is almost
> every entity document in
> http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
> there are around 15.7 million entities with [] for aliases.
>
> This is critically breaking the consumption of Wikidata content for all
> model-based JSON parsers, including Wikidata Toolkit.
>
> The bug used to occur only in XML dumps, but now also affects the JSON dumps
> in the same way. In previous JSON dumps, the problem was avoided by omitting
> empty maps altogether (no keys, no values), which is better because it
> allows implementations to fall back to the obvious default. This is still
> done in the Web API, e.g.,
> https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json
>
> It would be nice to test the export code before deploying it.

Sorry for that. Adam and Marius are working on a fix right now.
They'll report back in a bit.


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
Addshore