Nicolas Weeger wrote:
A long, long time ago, I started writing a PHP script to convert
MediaWiki markup into XML. I believe it is now feature-complete and
relatively reliable. It can process not only a single wiki text but also
a list of articles, taking the text from any MediaWiki-based site online.
It uses the same method to replace templates.
Great thing!
Question: will it generate internal links, i.e. if I select [[wikipedia]]
and [[Jimmy Wales]] as articles, will the links between them be
"internal", or will they point to the site?
It'd be great to have the resulting XML as "self-contained" as possible.
Short answer: It does use self-contained links, and ignores all others.
Long answer:
The XML that is generated in the first step doesn't care about link
targets at all. The actual output function in the second step (e.g. to
DocBook) handles that, so it can be adapted to the format.
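To illustrate the idea, here is a minimal sketch in Python (not the actual PHP script) of how an output step could decide what to do with each link. The function names, the ID scheme, and the exact-match title comparison are assumptions made for this example; only the DocBook `<link linkend="...">` element is taken from the target format.

```python
from xml.sax.saxutils import escape, quoteattr


def article_id(title):
    """Build an ID for an exported article (hypothetical naming scheme)."""
    return "article_" + "".join(c if c.isalnum() else "_" for c in title.strip())


def render_link(target, text, selected):
    """Render one wiki link for a DocBook-like output format.

    Links to articles that are part of the export become internal
    cross-references; all other links are reduced to their plain text.
    Titles are compared exactly here; real MediaWiki title normalization
    (e.g. first-letter case) is not handled in this sketch.
    """
    if target in selected:
        return "<link linkend=%s>%s</link>" % (
            quoteattr(article_id(target)),
            escape(text),
        )
    return escape(text)


selected = {"Wikipedia", "Jimmy Wales"}
internal = render_link("Jimmy Wales", "Jimmy Wales", selected)
ignored = render_link("Biology", "biology", selected)
print(internal)
print(ignored)
```

Because the decision lives in the output function, a different target format would only need a different `render_link`, while the intermediate XML stays the same.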
Maybe I should turn all non-internal links into external ones (back to
the wiki). However, that would also create links to non-existing articles
(or I'll have to check each of them across the web, resulting in
bandwidth hell ;-) ).
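A rough Python sketch of that alternative, assuming the source wiki serves articles under a base URL like the one below (the base URL and function name are illustrative, and DocBook 4's `<ulink>` is used as the external link element). It does no existence check, so it would happily link to articles that do not exist:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

# Assumed article base URL of the source wiki; adjust for other sites.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def external_link(target, text, base=WIKI_BASE):
    """Render a non-internal wiki link as an external link back to the wiki.

    MediaWiki URLs use underscores in place of spaces in titles; the
    rest of the title is percent-encoded. No request is made to check
    whether the target article exists.
    """
    url = base + quote(target.strip().replace(" ", "_"))
    return "<ulink url=%s>%s</ulink>" % (quoteattr(url), escape(text))


example = external_link("Jimmy Wales", "Jimmy Wales")
print(example)
```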
*dreams of having, on MediaWiki, a way to select articles efficiently
and export them directly*
Soon. Maybe not on Wikipedia (performance reasons), but otherwise - soon.
BTW, I have the parsing times way down by now. [[Biology]] is converted
in 0.5 seconds, not counting the time it takes to load the sources of
articles/templates from the web. This might become a MediaWiki parser
replacement after all...
Magnus