Hi all!
Context: We plan to change the XML dumps (and Special:Export) to use the same JSON serialization that is used by the API, instead of the terse but brittle "internal" format. This is about the mechanism we plan to use for the conversion.
So, I just went and checked my assertion that WikiExporter will use the Content object's serialize method to generate output. I WAS WRONG. It doesn't. It'll use the text from the database, as-is (for reference, find the call to Revision::getRevisionText in Export.php).
In order to force a conversion to the new format, we'll need to patch core to a) inject a hook here to override the default behavior or b) make it always use a Content object (unless, perhaps, told explicitly not to).
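To make the two options easier to compare, here is a rough, language-neutral sketch in Python. It is not MediaWiki code: `Content`, `load_content`, `export_with_hook` and `export_via_content` are hypothetical names invented for illustration, and the "stored text" is assumed to be JSON purely to keep the example small.

```python
import json
from typing import Callable, Optional


class Content:
    """Toy stand-in for a content object that can serialize itself (hypothetical)."""

    def __init__(self, data: dict):
        self.data = data

    def serialize(self) -> str:
        # Stand-in for "the new format": canonical JSON with sorted keys.
        return json.dumps(self.data, sort_keys=True)


def load_content(stored_text: str) -> Content:
    """Parse the text as stored in the database into a Content object."""
    return Content(json.loads(stored_text))


# Option (a): a hook that may override the text written to the dump.
ExportHook = Callable[[str], Optional[str]]


def export_with_hook(stored_text: str, hook: Optional[ExportHook] = None) -> str:
    if hook is not None:
        replacement = hook(stored_text)
        if replacement is not None:
            return replacement
    # Default behavior: emit the database text as-is.
    return stored_text


# Option (b): always go through the Content object unless told explicitly not to.
def export_via_content(stored_text: str, raw: bool = False) -> str:
    if raw:
        return stored_text
    return load_content(stored_text).serialize()
```

With (a) the default output stays unchanged and the conversion lives in whatever registers the hook; with (b) the conversion becomes the default and the as-is text needs an explicit opt-out (the `raw` flag here is only an assumption about how that might look).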
This is not hard to code, but doing it Right (tm) may need some discussion, and getting it merged may need some time.
Sorry for not checking this earlier.

Daniel
Do the Wikimedia XML dump scripts even use PHP / MediaWiki at all? I am aware of some Python scripts.
Please check with Ariel.
Katie

On Apr 14, 2014 12:47 PM, "Daniel Kinzler" daniel.kinzler@wikimedia.de wrote:
-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech