Forwarding as this will also be relevant for people who consume Wikidata XML dumps (but not entity dumps), and especially for people who are interested in working with Structured Data on Commons from dumps.
---------- Forwarded message ---------
From: Ariel Glenn WMF <ariel@wikimedia.org>
Date: Wed, 27 Nov 2019 at 14:39
Subject: [Wikitech-l] BREAKING CHANGE: schema update, xml dumps
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l@lists.wikimedia.org>, Wikimedia developers <wikitech-l@lists.wikimedia.org>
We plan to move to the new schema for xml dumps for the February 1, 2020 run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means that, for example, the commonswiki dump will contain MediaInfo information as well as the usual wikitext. See https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/docs/... for the schema and https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multip... for further explanation and example outputs.
Phabricator task for the update: https://phabricator.wikimedia.org/T238972
PLEASE FORWARD to other lists as you deem appropriate. Thanks!
Ariel Glenn
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Though I should also note that using the Wikidata XML dumps is not recommended in general (see Wikidata:Database download#XML dumps https://www.wikidata.org/wiki/Wikidata:Database_download#XML_dumps), and this change mainly affects non-main slots, which we don’t yet have on Wikidata (Quarry https://quarry.wmflabs.org/query/40356). If you use the dumps to analyze non-entity content (e.g. discussion pages), you may notice a new <origin> element within a <revision> element, or a new sha1 attribute on the <revision>’s main <text>; otherwise, this should be backwards compatible.
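For anyone who wants to check their scripts ahead of the switch, here is a minimal sketch of a reader that treats the new <origin> element and the sha1 attribute as optional, so it should cope with both the old and the new output. Everything in it is an assumption for illustration: the sample XML is hand-written rather than taken from a real dump, the namespace URI and the placement of <origin> inside <revision> are my reading of the schema, and `local` and `read_revisions` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real dump output. The namespace URI, element
# layout and values are assumptions made for this sketch.
SAMPLE = """\
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page>
    <title>Talk:Example</title>
    <revision>
      <id>123</id>
      <origin>123</origin>
      <model>wikitext</model>
      <text bytes="5" sha1="abc">Hello</text>
    </revision>
  </page>
</mediawiki>
"""


def local(tag):
    """Strip the '{namespace}' prefix so matching does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def read_revisions(xml_text):
    """Return one dict per <revision>; <origin> and the sha1 attribute may be absent."""
    revisions = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local(elem.tag) != "revision":
            continue
        # Index the direct children by local name; unknown elements are simply ignored.
        fields = {local(child.tag): child for child in elem}
        text = fields.get("text")
        origin = fields.get("origin")
        revisions.append({
            "id": fields["id"].text,
            "origin": origin.text if origin is not None else None,
            "sha1": text.get("sha1") if text is not None else None,
            "text": text.text if text is not None else None,
        })
    return revisions


if __name__ == "__main__":
    for revision in read_revisions(SAMPLE):
        print(revision)
```

Matching on local names means the code does not care which schema version the namespace announces. Real dumps are far too large for `ET.fromstring`; a streaming parser such as `ET.iterparse` would be the realistic choice, with the same optional-field handling.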
On Wed, 27 Nov 2019 at 17:14, Lucas Werkmeister <lucas.werkmeister@wikimedia.de> wrote:
-- Lucas Werkmeister (he/er) Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30 219 158 26-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg (district court) under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin (tax office for corporations), tax number 27/029/42207.
wikidata-tech@lists.wikimedia.org