Hi!
I'm stuck with dumpHTML.php. I want to create an offline version of our company wiki, but it's not working on any of our installations :S
At first it gives "PHP Warning: Cannot modify header information - headers already sent by (output started at E:\PubHtml\wiki\maintenance\dumpHTML.php:119) in E:\PubHtml\wiki\includes\WebResponse.php on line 9" warnings, and then all the generated HTML pages are printed to the console; no files are created. I tried both the default path and an explicit path, with and without switches, and nothing changed.
I searched the net but didn't find anything useful. For the PHP warning, I checked that none of the PHP files has a BOM, and that there is no whitespace before the first "<" or after the last ">".
Our MediaWiki server config: Windows XP SP2, MediaWiki 1.9.3, PHP 5.2.1 (apache2handler), MySQL 5.0.37-community-nt.
Thanks in advance, Akos Szabo
On 04/04/07, j2k jacko2000@gmail.com wrote:
> At first it gives "PHP Warning: Cannot modify header information - headers already sent by (output started at E:\PubHtml\wiki\maintenance\dumpHTML.php:119) in E:\PubHtml\wiki\includes\WebResponse.php on line 9" warnings, and then all the generated HTML pages are printed to the console; no files are created. I tried both the default path and an explicit path, with and without switches, and nothing changed.
I was unable to reproduce this at first with a fresh checkout of MediaWiki 1.9.3 (from tags); however, I *was* able to get the error to occur when restricting read access in LocalSettings.php:
$wgGroupPermissions['*']['read'] = false;
This did the trick; the script spat out the HTML for the standard "please log in to view pages" page, and then PHP went a little bit nuts over the content printed to the screen.
This seems to be http://bugzilla.wikimedia.org/show_bug.cgi?id=4132. I suppose the obvious workaround is to override the setting and enable page views for all users for the duration of the dump session. I'll make that change in trunk if no-one has objections...?
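Until such a change is available, a similar effect can be had locally. This is a minimal sketch, assuming that dumpHTML.php is started with the command-line PHP binary and that LocalSettings.php contains the restriction quoted above. The PHP_SAPI check is an assumption about how to detect a maintenance run, not something MediaWiki prescribes, and it opens read access to every command-line script, not only the dump.

```php
# LocalSettings.php
# Keep the wiki closed to anonymous readers on the web...
$wgGroupPermissions['*']['read'] = false;

# ...but lift the restriction when PHP runs from the command line,
# which is how maintenance scripts such as dumpHTML.php are started.
if ( PHP_SAPI === 'cli' ) {
	$wgGroupPermissions['*']['read'] = true;
}
```

Web requests served through apache2handler report a different SAPI name, so they keep the restriction.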
Rob Church
> MediaWiki 1.9.3 (from tags); however, I *was* able to get the error to occur when restricting read access in LocalSettings.php:
> $wgGroupPermissions['*']['read'] = false;
Thanks, it worked. I commented this line out for the duration of the backup. But the dump is still not quite right on Windows...
Now the dump ends up fine in the \static directory, but with Unicode filenames... So none of the links work, because the filenames are corrupt. Is there a way to convert page names to a Windows-compatible ( :) format?
Regards, Akos Szabo
j2k wrote:
> Now the dump ends up fine in the \static directory, but with Unicode filenames... So none of the links work, because the filenames are corrupt. Is there a way to convert page names to a Windows-compatible ( :) format?
Windows XP has no problem with Unicode filenames; it's a UTF-16 native platform. The problem is that PHP 5 doesn't support the UTF-16 interface. It just calls the compatibility functions, which interpret UTF-8 input as CP1252, resulting in garbage. There's a proposal to fix the problem in PHP 6.
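The effect of that misinterpretation can be reproduced on any platform. This is a rough illustration only: the function name and the sample title are made up, and it assumes the mbstring extension is loaded. It re-reads the UTF-8 bytes of a title as CP1252, which is roughly what the resulting file name looks like on disk.

```php
<?php
// Hypothetical helper: show what a UTF-8 file name looks like when its bytes
// are read as Windows-1252 (CP1252). Requires the mbstring extension.
function asSeenThroughCp1252( $utf8Name ) {
	// Treat the input bytes as Windows-1252 and re-encode the result as
	// UTF-8 so that it can be printed.
	return mb_convert_encoding( $utf8Name, 'UTF-8', 'Windows-1252' );
}

$title = "Caf\xC3\xA9";                     // "Café" encoded as UTF-8
echo asSeenThroughCp1252( $title ), "\n";   // prints "CafÃ©"
```

Each non-ASCII character turns into two or more unrelated characters, so the links in the generated pages no longer match the names of the files that were written.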
It's no problem for Wikipedia since we generate our dumps on Linux, and transfer them to Windows with a tool that uses the UTF-16 interface (namely 7-Zip). You could do the same, or you could hack dumpHTML.php to generate encoded filenames. The problem with encoded filenames is that they are unreadable, so you can't find the article you want by browsing the files on disk.
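For the second option, one possible encoding is percent-encoding. The following is only a sketch of the idea: both function names are hypothetical, and nothing here is taken from dumpHTML.php, which would need its file-name and link generation changed to match.

```php
<?php
// Hypothetical helpers, not part of dumpHTML.php.

// Build an ASCII-only file name from a UTF-8 page title. rawurlencode()
// percent-encodes every byte except ASCII letters, digits and a few
// punctuation marks such as "-", "_" and ".".
function encodedFileName( $title ) {
	return rawurlencode( $title ) . '.html';
}

// Inside a link the "%" signs of the file name must be escaped again,
// otherwise a browser decodes them and looks for the wrong file.
function encodedHref( $title ) {
	return rawurlencode( encodedFileName( $title ) );
}

$title = "Caf\xC3\xA9";                 // "Café" encoded as UTF-8
echo encodedFileName( $title ), "\n";   // prints "Caf%C3%A9.html"
echo encodedHref( $title ), "\n";       // prints "Caf%25C3%25A9.html"
```

The output also shows the drawback mentioned above: "Caf%C3%A9.html" is safe for PHP 5 to create on Windows, but it is hard to recognise when browsing the directory.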
-- Tim Starling
mediawiki-l@lists.wikimedia.org