I have a few inactive MediaWiki sites, currently running 1.26.4, that I want to convert to static HTML. As a first step, I enabled short URLs using the instructions in Manual:Short_URL/Apache (https://www.mediawiki.org/wiki/Manual:Short_URL/Apache). The site I am working on was originally at https://example.ca/dm. I moved the 'dm' folder to 'w' and added the following line to the .htaccess file in the document root, which contains both the 'w' and 'dm' folders:
RewriteRule ^/?dm(/.*)?$ %{DOCUMENT_ROOT}/w/index.php [L]
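To check which request paths this rule catches, here is a small sketch that applies the same pattern with Python's `re` module. It assumes Python's regex syntax behaves like Apache's PCRE for a pattern this simple. In a per-directory (.htaccess) context Apache strips the leading slash before matching, which is why the pattern makes it optional. Note that `w/load.php` is not matched, so load.php requests still go straight to the real file.

```python
import re

# Same pattern as the RewriteRule above; "/?" covers both the stripped
# (.htaccess) and unstripped forms of the path.
ARTICLE_RULE = re.compile(r"^/?dm(/.*)?$")

def is_rewritten(path: str) -> bool:
    """Return True if the RewriteRule pattern matches this request path."""
    return ARTICLE_RULE.match(path) is not None

# "dm", "dm/Main_Page" and "/dm/Main_Page" match; "w/load.php" and "dmx" do not.
for p in ["dm", "dm/Main_Page", "/dm/Main_Page", "w/load.php", "dmx"]:
    print(p, is_rewritten(p))
```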
The site loaded fine when accessed as https://example.ca/w. I then updated LocalSettings.php, substituting 'dm' for 'wiki' in the manual's example:
$wgScriptPath = "/w";
$wgArticlePath = "/dm/$1";
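A simplified model of what these two settings do: page links are built by substituting the title for `$1` in `$wgArticlePath`, while entry points such as load.php stay under `$wgScriptPath` (by default load.php URLs come from `$wgLoadScript`, which is derived from `$wgScriptPath`). That is why the /w references remain. The helper names below are hypothetical.

```python
# Values from LocalSettings.php above.
SCRIPT_PATH = "/w"        # $wgScriptPath
ARTICLE_PATH = "/dm/$1"   # $wgArticlePath

def article_url(title: str) -> str:
    """Simplified: substitute the title for $1. Only the space-to-underscore
    step is modelled; MediaWiki's own encoding (wfUrlencode) does more."""
    return ARTICLE_PATH.replace("$1", title.replace(" ", "_"))

def load_url(query: str) -> str:
    """load.php is an entry point under the script path, not the article path."""
    return f"{SCRIPT_PATH}/load.php?{query}"

print(article_url("Main Page"))                   # /dm/Main_Page
print(load_url("modules=startup&only=scripts"))   # /w/load.php?modules=startup&only=scripts
```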
I purged the cache by truncating the objectcache table in MySQL. When I access https://example.ca/dm, the site appears to load properly. However, the pages contain many references to /w, for example:
<link rel="stylesheet" href="/w/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles&only=styles&skin=vector" />
<script async src="/w/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector"></script>
There are other references to /w and //example.ca/w in links such as 'edit', 'search', and 'talk', but those will be deleted anyway.
I downloaded the site using wget, following the instructions at http://camwebb.info/blog/2012-12-20/. When I uploaded the static HTML files to a web server and displayed the first page, it had no stylesheets. They had been downloaded to a 'w' folder with names like:
load.php?debug=false&lang=en&modules=mediawiki.action.view.filepage%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.content.externallinks%7Cmediawiki.skinning.interface%7Cskins.monobook.styles&only
In the static index.html, the browser got a 404 when requesting the following stylesheet:
<link rel="stylesheet" href="../w/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%252Cshared%257Cmediawiki.sectionAnchor%257Cmediawiki.skinning.content.externallinks%257Cmediawiki.skinning.interface%257Cskins.monobook.styles&only=styles&skin=monobook" />
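A likely explanation for the 404 (an assumption, not confirmed in this thread): in an href, `?` always starts the query string, so a static server is asked for a file named just `load.php` in ../w/, not for the full saved name. Separately, each `%25` is a percent-encoded `%`, so one round of decoding yields `%2C`, not `,`. Python's `urllib.parse` shows both effects on the href above:

```python
from urllib.parse import urlsplit, unquote

href = ("../w/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint"
        "%252Cshared%257Cmediawiki.sectionAnchor%257Cmediawiki.skinning.content."
        "externallinks%257Cmediawiki.skinning.interface%257Cskins.monobook.styles"
        "&only=styles&skin=monobook")

parts = urlsplit(href)
# Everything after "?" is the query; the server only sees the path part.
print(parts.path)           # ../w/load.php
# "%25" decodes to "%", so "%252C" becomes "%2C" after one decoding pass.
print(unquote("%252C"))     # %2C
```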
I tried renaming the various "load.php?..." files to "load.html" and "load2.html" and updating index.html to point to them. The renamed files load successfully, but there is another load.php reference that I have not found yet, and many more in the other wiki pages.
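The manual renaming can be automated. Below is a minimal sketch, assuming the saved pages end in .html and that `<link href>` references are CSS and `<script src>` references are JS. It rewrites every load.php reference to a hypothetical `load-<hash>.css`/`.js` name and records the original URL for each new name, so the downloaded files can be renamed to match. It modifies pages in place, so work on a copy.

```python
import hashlib
import html
import re
from pathlib import Path

# href="..." or src="..." attributes whose value points at load.php with a query.
LOAD_REF = re.compile(r'(href|src)="([^"]*load\.php\?[^"]*)"')

def local_name(url: str, ext: str) -> str:
    """Hypothetical naming scheme: keep the directory part (e.g. "../w/")
    and replace "load.php?..." with a short hash of the query string."""
    prefix, _, query = url.partition("load.php")
    digest = hashlib.sha1(html.unescape(query).encode("utf-8")).hexdigest()[:10]
    return f"{prefix}load-{digest}{ext}"

def rewrite_load_refs(page_html: str, mapping: dict[str, str]) -> str:
    """Point every load.php reference at a static file name and record
    original URL -> new name in `mapping`, so the saved files can be renamed."""
    def repl(m: re.Match) -> str:
        attr, url = m.group(1), m.group(2)
        # Assumption: <link href> references are CSS, <script src> are JS.
        ext = ".js" if attr == "src" else ".css"
        new = mapping.setdefault(html.unescape(url), local_name(url, ext))
        return f'{attr}="{new}"'
    return LOAD_REF.sub(repl, page_html)

def process_tree(root: Path) -> dict[str, str]:
    """Rewrite all saved .html pages under `root` in place (keep a backup)."""
    mapping: dict[str, str] = {}
    for page in root.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        page.write_text(rewrite_load_refs(text, mapping), encoding="utf-8")
    return mapping
```

This only handles references present in the saved HTML. Modules that the startup script requests at runtime are not rewritten.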
Am I missing something obvious? I found additional .htaccess rewrite rules at https://www.mediawiki.org/wiki/Manual:Short_URL/wiki/Page_Title_--_.htaccess, but these appear to apply only when the wiki runs from https://example.ca.
Short URLs are only for viewing normal pages. Other actions such as edit, and other entry points such as api.php and load.php (which serves static CSS and JS), continue to use long URLs. load.php is critical for the site's CSS, so if you want to save a site using wget you will have to save the load.php requests too.
Some JavaScript/CSS may be loaded dynamically when the user performs certain actions in the interface, and thus not be caught by wget (though this does not sound like your main issue). Things should degrade gracefully without JavaScript, but of course the CSS is important.
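One way to make wget save the load.php responses under names a static server can serve is sketched below. The option list is an assumption and not necessarily what the camwebb.info instructions use, but each flag is a documented wget option. `--restrict-file-names=windows` makes wget write `@` instead of `?` in saved file names, `--adjust-extension` adds a .css suffix to text/css responses, and `--convert-links` updates the pages to point at the renamed copies.

```python
import subprocess
import sys

def wget_command(start_url: str) -> list[str]:
    """Build a wget invocation for a static mirror (a sketch; adjust as needed)."""
    return [
        "wget",
        "--mirror",                       # recursive download with timestamping
        "--page-requisites",              # also fetch CSS/JS/images, incl. load.php
        "--convert-links",                # rewrite links to point at the local copies
        "--adjust-extension",             # add .html / .css suffixes by content type
        "--restrict-file-names=windows",  # save "?" as "@" in local file names
        start_url,
    ]

if __name__ == "__main__":
    # Hypothetical usage: python mirror.py https://example.ca/dm/Main_Page
    subprocess.run(wget_command(sys.argv[1]), check=True)
```

A recursive crawl of a wiki also follows edit, history, and special-page links. wget's `--reject-regex` option (wget 1.14 and later) can be used to skip them.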
-- Brian

On Tuesday, August 29, 2023, nh905ml--- via MediaWiki-l <mediawiki-l@lists.wikimedia.org> wrote:
[...]
On August 29, 2023 8:10:57 PM EDT, Brian Wolff <bawolff@gmail.com> wrote:
[...]
Thanks for the fast response. It looks like I will need to do a fair bit of editing of the files downloaded by wget.
Although wget downloaded three "load.php" files, the Network panel in Chrome Developer Tools showed additional load.php calls that appear to be made by the output of the two load.php calls in Main_Page. Because these calls are made by a script, wget did not capture them. I will try to download them individually and add script references for them to Main_Page and the other article pages. I started a thread on converting MediaWiki sites to static HTML and will post my experiences there.
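Chrome DevTools can save the Network panel as a HAR file, which is JSON with each request's URL under `log.entries[].request.url`. A small sketch, assuming such an export (the file name is hypothetical), that lists the load.php URLs the page requested at runtime so they can be fetched individually:

```python
import json
from pathlib import Path

def load_php_urls(har: dict) -> list[str]:
    """Collect unique load.php request URLs from a HAR log, in request order."""
    seen: list[str] = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        if "/load.php?" in url and url not in seen:
            seen.append(url)
    return seen

if __name__ == "__main__":
    # Hypothetical file name for a HAR saved from the DevTools Network panel.
    har = json.loads(Path("Main_Page.har").read_text(encoding="utf-8"))
    for url in load_php_urls(har):
        print(url)
```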