* MZMcBride <z(a)mzmcbride.com> [2012-09-10 02:45]:
K. Peachey wrote:
On Mon, Sep 10, 2012 at 8:01 AM, MZMcBride wrote:
page) is just absurd. There's enormous value
in the HTML dumps. This subject
came up in December 2011, and from the comments in that thread, it seemed as
though the only reason the HTML dumps haven't been updated is that nobody has
run the relevant script.
AFAIK, E:DumpHTML needs some loving first.
Can you elaborate on this? Is there anything actually stopping the extension
(or rather the script) from being run? Of course, every piece of software has
bugs or feature requests, but if there are blockers to actually running this
script, can you point me to a list of them (or, preferably, add them as
blockers to bug 15017)?
For context, "E:DumpHTML" refers to
<https://www.mediawiki.org/wiki/Extension:DumpHTML>, a pseudo-extension
(quasi-extension?) used to generate HTML dumps.
I use this extension on my wiki (http://spiele.j-crew.de/,
http://misc.j-crew.de/wiki-dump/), but I find it quite brittle in the
face of MediaWiki software changes. Every few months, a change in trunk
breaks the extension in one way or another.
I recently submitted a bunch of fixes for the extension (see
https://gerrit.wikimedia.org/r/#/c/17697/). These changes worked for me a
few months ago, but on current trunk, image handling in DumpHTML is broken
again: filename mangling of images seems broken, and thumbnails, which used
to be included in the dump, are no longer there.
I think HTML dumps of Wikipedia would be very useful, but that would need
someone from the WMF to actively maintain this extension.
Best regards
Thomas