On Tue, Aug 4, 2015 at 12:20 PM, Markus Krötzsch
<markus@semantic-mediawiki.org> wrote:
> Hi,
>
> The recent Wikidata JSON dumps again contain huge amounts of broken JSON
> where empty maps are serialized as [] instead of using {}. Just grep for
>
> "claims":[]
> or
> "aliases":[]
> or
> any other key that requires a map
>
> to find many examples. The scope of the problem is massive. Basically all
> entity documents that include some empty map are broken, which is almost
> every entity document in
> http://dumps.wikimedia.org/other/wikidata/20150803.json.gz. Concretely,
> there are around 15.7 million entities with [] for aliases.
>
> This is critically breaking the consumption of Wikidata content for all
> model-based JSON parsers, including Wikidata Toolkit.
>
> The bug used to occur only in XML dumps, but now also affects the JSON dumps
> in the same way. In previous JSON dumps, the problem was avoided by omitting
> empty maps altogether (no keys, no values), which is better because it
> allows implementations to fall back to the obvious default. This is still
> done in the Web API, e.g.,
> https://www.wikidata.org/wiki/Special:EntityData/Q12062430.json
>
> It would be nice to test the export code before deploying it.
Sorry for that. Adam and Marius are working on a fix right now.
They'll report back in a bit.
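
Until a fixed dump is available, consumers could work around the problem
on their side. The following is only a rough sketch, not the fix being
worked on: it assumes the dump holds one entity per line inside a JSON
array (with trailing commas), and the key list MAP_KEYS and both function
names are hypothetical and may need adjusting.

```python
import json

# Assumption: top-level keys of an entity document whose values must be maps.
MAP_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def normalize_entity(entity):
    """Replace an empty list with an empty dict for map-valued keys."""
    for key in MAP_KEYS:
        # Missing keys are left missing so parsers can use their default.
        if entity.get(key) == []:
            entity[key] = {}
    return entity


def parse_dump_line(line):
    """Parse one dump line; return None for the array brackets or blank lines."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return normalize_entity(json.loads(line))
```

Feeding each line of the decompressed dump through parse_dump_line would
give entity dicts that a model-based parser can accept.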
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata