On Wed, Jul 24, 2013 at 11:59 AM, dan entous <dan.entous.wikimedia@gmail.com> wrote:
context
i’m working on a mediawiki extension, http://www.mediawiki.org/wiki/Extension:GWToolset, one of whose goals is the ability to upload media files to a wiki. the extension, among other tasks, will process an XML file that contains a list of urls to media files and upload those media files to the wiki along with the metadata contained within the XML file. our ideal goal is to have this extension run on http://commons.wikimedia.org/.
Check out the 'DataPages' subdirectory in the mediawiki/extensions/examples repository (<https://git.wikimedia.org/summary/mediawiki%2Fextensions%2Fexamples.git>). It was designed to showcase how to work with ContentHandler, and it does so by implementing an XML content type and namespace.
we initially developed the extension to store the files in the File: namespace, but we were told by the Foundation that we should use ContentHandler instead. unfortunately there is an issue with storing content larger than 1 MB in the db, so we need to find another solution.
That's a lot of XML! You can gzip page content, FWIW.
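
To get a rough feel for how much gzip-style compression could buy you, here is a minimal Python sketch. It only measures how well a repetitive XML mapping file compresses; it does not model how MediaWiki itself stores or compresses revisions. The 1 MB limit is taken from your message, and the record layout, LIMIT_BYTES, compressed_size and fits_after_compression are all hypothetical names made up for illustration.

```python
import zlib

# Assumed ceiling, taken from the "content > 1mb" issue described in the thread.
LIMIT_BYTES = 1024 * 1024


def compressed_size(xml_text: str) -> int:
    """Return the size in bytes of the zlib-compressed UTF-8 encoding of xml_text."""
    raw = xml_text.encode("utf-8")
    return len(zlib.compress(raw, 9))


def fits_after_compression(xml_text: str, limit: int = LIMIT_BYTES) -> bool:
    """Check whether the compressed text would fit under the assumed limit."""
    return compressed_size(xml_text) <= limit


# Hypothetical metadata file: many near-identical records, as a list of
# media urls plus metadata would typically be.
record = (
    "<record><url>http://example.org/media/%d.jpg</url>"
    "<title>Item %d</title></record>\n"
)
sample = "<records>\n" + "".join(record % (i, i) for i in range(20000)) + "</records>\n"

raw_size = len(sample.encode("utf-8"))
print("raw bytes:       ", raw_size)
print("compressed bytes:", compressed_size(sample))
print("fits under limit:", fits_after_compression(sample))
```

XML with this much repeated markup tends to compress well, so a file that is over the limit raw may fit comfortably once compressed. Real metadata with long free-text fields will compress less, so it is worth running something like this against an actual input file before relying on it.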