* MZMcBride z@mzmcbride.com [2012-09-10 02:45]:
K. Peachey wrote:
On Mon, Sep 10, 2012 at 8:01 AM, MZMcBride wrote:
page) is just absurd. There's enormous value in the HTML dumps. This subject came up in December 2011, and from the comments in that thread, it seemed as though the only reason the HTML dumps have not been updated is that nobody has run the relevant script.
AFAIK, E:DumpHTML needs some loving first.
Can you elaborate on this? Is there anything actually stopping the extension (or rather the script) from being run? Of course every piece of software has bugs or feature requests, but if there are blockers to actually running this script, can you point me to a list of them (or, preferably, add them as blockers to bug 15017)?
For context, "E:DumpHTML" refers to https://www.mediawiki.org/wiki/Extension:DumpHTML, a pseudo-extension (quasi-extension?) used to generate HTML dumps.
I use this extension on my wiki (http://spiele.j-crew.de/, http://misc.j-crew.de/wiki-dump/), but I find it quite brittle in the face of MediaWiki software changes. Every few months, a change in trunk breaks the extension in one way or another.
I recently submitted a bunch of fixes for the extension (see https://gerrit.wikimedia.org/r/#/c/17697/). These changes worked for me a few months ago, but on current trunk, image handling in DumpHTML is broken again: filename mangling for images seems broken, and thumbnails, which used to be included, are no longer in the dump.
I think HTML dumps of Wikipedia would be very useful, but that would require someone from the WMF to actively maintain this extension.
Best regards,
Thomas