I may be mistaken, but isn't this done by Kiwix now? There was some discussion of that at http://www.kiwix.org/wiki/Mediawiki_DumpHTML_extension_improvement, and more recently here: https://blog.wikimedia.org/2014/09/12/emmanuel-engelhart-inventor-of-kiwix/
On Wed, Oct 1, 2014 at 10:45 PM, Quim Gil <qgil@wikimedia.org> wrote:
Thank you Daniel for this email.
On Mon, Sep 29, 2014 at 10:16 AM, Daniel Friesen <daniel@nadir-seen-fire.com> wrote:
The DumpHTML extension looks like it's in a pretty bad state; it doesn't work at all in the current version of MediaWiki.
This seems to be an unfortunate symptom of how it's used and how it's treated by core developers.
What you are describing is clearly a MediaWiki problem, not just a DumpHTML problem. The Wiki Release Team and the MediaWiki Cooperation group should be helpful in situations like this, and this is why I'm proposing this task:
Take DumpHTML as a use case of 3rd party extension that MediaWiki maintainers cannot ignore https://phabricator.wikimedia.org/T536
DumpHTML is most useful when someone is shutting down and archiving their wiki, so it doesn't get tested regularly.
Creating a static-file version of wiki pages for offline use inherently requires different behaviour from things deep within core.
Because DumpHTML has been segregated into an extension, and core doesn't support an offline/dump mode internally, DumpHTML has to use a bunch of hacks to make core behave properly during the dump.
Then, because they are completely unaware of DumpHTML's needs, core developers make improvements to core that break DumpHTML without giving it an alternative interface to get what it needs out of core.
For one, DumpHTML needs to proxy and mess with file repo behaviours. To do that it modified properties like thumbScriptUrl, but those properties were then made protected, leaving DumpHTML unsupported. This was reported as a bug a month ago and has gone relatively unnoticed: https://bugzilla.wikimedia.org/show_bug.cgi?id=69824
It also subclassed RepoGroup, and since it proxied existing repo instances instead of working with repo info, it had to bypass the __construct() constructor. But then a $repoGroup->cache was added to __construct() in core, breaking DumpHTML in another way.
There are probably more issues that I haven't found yet while trying to work around the other issues.
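The RepoGroup breakage is a classic fragile-base-class problem. A minimal sketch in Python, using hypothetical classes that only mimic the shape of the problem (they are not MediaWiki's PHP RepoGroup or DumpHTML's actual code): a subclass that skips the parent constructor keeps working until core starts initialising new state there, and then fails at runtime rather than at load time.

```python
class RepoGroup:
    """Hypothetical stand-in for a core class that a later release changes."""

    def __init__(self, local_info, foreign_infos):
        self.local_info = local_info
        self.foreign_infos = foreign_infos
        # Added in a later core version: a per-instance lookup cache that
        # other methods now assume exists.
        self.cache = {}

    def find_file(self, title):
        if title in self.cache:
            return self.cache[title]
        result = f"file:{title}"  # placeholder for a real repo lookup
        self.cache[title] = result
        return result


class DumpRepoGroup(RepoGroup):
    """Hypothetical extension subclass that proxies an existing instance."""

    def __init__(self, existing):
        # Deliberately does not call RepoGroup.__init__: it wraps live repo
        # objects rather than repo configuration, so any state core adds to
        # the parent constructor is never set on this instance.
        self.local_info = existing.local_info
        self.foreign_infos = existing.foreign_infos


if __name__ == "__main__":
    proxy = DumpRepoGroup(RepoGroup("local", ["shared"]))
    try:
        proxy.find_file("Example.jpg")
    except AttributeError as err:
        # self.cache was never created because the parent constructor was skipped.
        print("subclass broke after core change:", err)
```

In PHP the same thing happens, for example, when a subclass defines its own __construct() without calling parent::__construct(); nothing fails until a method touches the new property.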
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- Quim Gil Engineering Community Manager @ Wikimedia Foundation http://www.mediawiki.org/wiki/User:Qgil