On Wed, Jul 24, 2013 at 9:12 PM, Ori Livneh <ori(a)wikimedia.org> wrote:
thanks for the reply Ori.
On Wed, Jul 24, 2013 at 11:59 AM, dan entous
<dan.entous.wikimedia(a)gmail.com>
wrote:
context
-------
i’m working on a mediawiki extension,
http://www.mediawiki.org/wiki/Extension:GWToolset, which has as one of its
goals, the ability to upload media files to a wiki. the extension, among
other tasks, will process an XML file that has a list of urls to media
files and upload those media files to the wiki along with metadata
contained within the XML file. our ideal goal is to have this extension run
on http://commons.wikimedia.org/.
Check out the 'DataPages' subdirectory in the mediawiki/extensions/examples
repository
(<https://git.wikimedia.org/summary/mediawiki%2Fextensions%2Fexamples.git>).
It was designed to showcase how to work with ContentHandler, and it does so
by implementing an XML content type and namespace.
in our last meeting with the foundation, in july, we were asked to
move away from ContentHandler since there is a potential for XML files
to exceed a 1 MB limit. at the moment, the extension is using
ContentHandler and DOMDocument to read the XML Content because in june
we were asked to use ContentHandler; we originally planned to read the
XML as a file and use XMLReader, which would be more efficient.
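
To illustrate why a streaming reader is cheaper than loading the whole document into a DOM, here is a minimal sketch. It is written in Python as a stand-in (the extension itself is PHP and would use XMLReader), and the element names `records`, `record`, `title` and `url` are hypothetical, since the actual XML schema is not shown in this thread. It handles one record at a time and clears it afterwards, so memory use should stay roughly flat as the file grows.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real GWToolset XML schema is not shown in this thread.
SAMPLE_XML = b"""<records>
  <record><title>First</title><url>http://example.org/a.jpg</url></record>
  <record><title>Second</title><url>http://example.org/b.jpg</url></record>
</records>"""


def iter_media_records(source, record_tag="record"):
    """Yield (title, url) for each record as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield elem.findtext("title"), elem.findtext("url")
            # Drop the record's children and text once it has been handled.
            elem.clear()


records = list(iter_media_records(io.BytesIO(SAMPLE_XML)))
for title, url in records:
    print(title, url)
```

The same pattern works on an open file handle instead of `io.BytesIO`, which is the case that matters for large uploads.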
in a subsequent reply to this thread, Brian offers a potential way of
dealing with this issue. i’ll be able to take a look at his approach
later this month, but if anyone can prove the concept beforehand or
refer me to some code that has already done so, that would be great.
we initially developed the extension to store the files in the File:
namespace, but we were told by the Foundation that we should use
ContentHandler instead. unfortunately there is an issue with storing
content > 1 MB in the db so we need to find another solution.
That's a lot of XML! You can gzip page content, FWIW.
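
As a rough check of the gzip suggestion, the sketch below compresses a synthetic XML payload of about 1.5 MB and compares sizes. It is Python rather than PHP, the record layout is hypothetical, and the payload is deliberately repetitive, so it compresses far better than real metadata would. Treat it only as a way to measure your own files, not as evidence that any given file will fit under the limit.

```python
import gzip

# Hypothetical record layout; real GWToolset metadata will be less repetitive.
RECORD = b"<record><title>Example</title><url>http://example.org/a.jpg</url></record>\n"

# Build a payload larger than 1 MB (20000 records of 75 bytes each, plus wrapper).
xml_bytes = b"<records>\n" + RECORD * 20000 + b"</records>\n"

compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

print("original bytes:  ", len(xml_bytes))
print("compressed bytes:", len(compressed))
print("round-trips intact:", restored == xml_bytes)
```

Swapping `xml_bytes` for the contents of an actual metadata file gives a more honest ratio.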
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l