) does a
pretty good job of converting a bunch of wiki pages to HTML, although
it starts from a live wiki instance (and a properly-configured Parsoid
pointed at it) rather than an XML dump. Zim-format dumps (for
example, from
)
can also be unpacked into a directory tree of HTML files.
There are also the "HTML dumps" that the service team is involved
with. The following links have more information:
Perhaps your use case could inform the ongoing design of that service.
--scott
On Mon, Sep 21, 2015 at 3:12 PM, Brion Vibber <bvibber(a)wikimedia.org> wrote:
On Mon, Sep 21, 2015 at 12:09 PM, v0id null
<v0idnull(a)gmail.com> wrote:
#1: mwdumper has not been updated in a very long
time. I did try to use it,
but it did not seem to work properly. I don't entirely remember what the
problem was but I believe it was related to schema incompatibility. xml2sql
comes with a warning about having to rebuild links. Considering that I'm
just in a command line and passing in page IDs manually, do I really need
to worry about it? I'd be thrilled not to have to reinvent the wheel here.
You would need to rebuild link tables if you need them for either mwdumper
or xml2sql. For your case it doesn't sound like you'd need them.
#2: Is there some way to figure it out? As I
showed in a previous reply,
the template that it can't find is there in the page table.
As noted in my previous reply, your import process is buggy and the page
record's page_title field is incorrect, so it cannot be found. You need to
correctly parse the incoming title into namespace and base title portions
and store them correctly into page_namespace numeric ID and page_title text
portion.
#3: Those lua modules, are they stock modules
included with the mediawiki
software, or something much more custom? If the latter, are they available
to download somewhere?
They are on the wiki, in the 'Module' namespace. Should be included with a
complete dump. I have no idea about the 'articles' dump, but I would assume
it *should* include them.
#4: I'm not any expert on mediawiki, but it seems that the titles in
the xml dump need to be formatted, mainly replacing spaces with
underscores.
That's another thing your import process needs to do. I recommend using
existing code that already has all this logic. :)
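
For illustration only, here is a rough Python sketch of the title handling
described in the replies above: split the dump title into a numeric
page_namespace and a page_title with underscores. The split_title helper and
the NAMESPACES table are hypothetical names, and the table is a small
hard-coded subset; a real import would presumably read the namespace list from
the dump's siteinfo header, and MediaWiki's own normalization covers more
cases than this (first-letter capitalization, namespace aliases, and so on).

```python
# Hypothetical subset of namespace names mapped to their numeric IDs.
# A real import should build this from the dump's <siteinfo> header.
NAMESPACES = {
    "Talk": 1,
    "User": 2,
    "Template": 10,
    "Module": 828,
}


def split_title(full_title, namespaces=NAMESPACES):
    """Return (page_namespace, page_title) for a title as found in a dump."""
    prefix, sep, rest = full_title.partition(":")
    ns_id = namespaces.get(prefix.strip().replace("_", " "))
    if sep and ns_id is not None and rest.strip():
        base = rest
    else:
        # No recognized namespace prefix: the whole title belongs to the
        # main namespace (ID 0), colon included.
        ns_id, base = 0, full_title
    # page_title is stored with underscores in place of spaces.
    return ns_id, base.strip().replace(" ", "_")


if __name__ == "__main__":
    for title in ("Template:Infobox person", "2001: A Space Odyssey"):
        print(title, "->", split_title(title))
```

With this, a page whose dump title is "Template:Infobox person" would be
stored with page_namespace 10 and page_title "Infobox_person", rather than
with the full prefixed string in page_title.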
-- brion
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l