Hello,
I tried dumping local Wikipedia to HTML using dumpHTML.php and I found two bugs:
- The most important one is in the getHashedDirectory function in includes/Title.php. Characters such as "." in $dbkey are copied into the directory name unchanged, so for example when $dbkey is "1. ", the generated directory is "1/./_/". Since "." is just the current directory, that path collapses to "1/_", and all links in the file stop working (assuming a hash depth of 3).
The fix is easy (borrowed from getHashedFilename). Just adding
$chars[$i] = strtr( $chars[$i], '/\*?"<>|~.', '__________' );
to the else branch of the if inside the for loop fixes the problem.
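To illustrate, here is a minimal standalone sketch of the hashed-directory logic with that one-line fix applied. It is not the real patch: the actual getHashedDirectory in includes/Title.php has more surrounding code, so this function only demonstrates the behavior. (MediaWiki stores titles with spaces converted to underscores, so the "1. " title corresponds to the dbkey "1._".)

```php
<?php
// Sketch of the hashed-directory computation, assuming a simplified
// per-character loop like the one in includes/Title.php.
function getHashedDirectory( $dbkey, $depth ) {
	$dir = '';
	for ( $i = 0; $i < $depth; $i++ ) {
		if ( $i ) {
			$dir .= '/';
		}
		if ( $i >= strlen( $dbkey ) ) {
			// Title shorter than the hash depth: pad with "_"
			$dir .= '_';
		} else {
			// The added line: map filesystem-unsafe characters,
			// including ".", to "_", so a dbkey like "1._" yields
			// "1/_/_" instead of the self-collapsing "1/./_"
			$dir .= strtr( $dbkey[$i], '/\*?"<>|~.', '__________' );
		}
	}
	return $dir;
}

// With the fix, "." no longer leaks into the path:
var_dump( getHashedDirectory( '1._', 3 ) );       // "1/_/_"
var_dump( getHashedDirectory( 'Main_Page', 3 ) ); // "M/a/i"
```

Without the strtr line, the else branch emits the character verbatim and the broken "1/./_" directory comes back.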
- When generating special pages (I suggest generating Special:Allpages as well, not only the categories), we should either make it possible to navigate through the result set using static HTML or, for smaller wikis, get rid of the limits and navigation altogether. I worked around this by increasing the limits and deleting the forms from the pages in question.
Also, the generation is painfully slow; can it be sped up somehow? I had to rerun it on a two-processor Sun Fire V20 to get an even halfway reasonable generation time (two hours for the sk and cs Wikipedias).
One last question: how do I get rid of the interwiki links to the other language versions? They don't work in a local wikipedia, of course...
Sincerely,
Juraj Bednar.
wikitech-l@lists.wikimedia.org