Nicolas Weeger wrote:
A long, long time ago, I started writing a PHP script to convert MediaWiki markup into XML. I believe it is now feature-complete and relatively reliable. It can process not only a single wiki text but also a list of articles, fetching the text from any online MediaWiki-based site. It uses the same method to fetch and replace templates.
Great thing!
Question: will it generate internal links? That is, if I select [[wikipedia]] and [[Jimmy Wales]] as articles, will the links between them be "internal", or will they point to the site? It'd be great to have the resulting XML be as "self-contained" as possible.
Short answer: it generates self-contained (internal) links and ignores all others.
Long answer: The XML that is generated in the first step doesn't care about link targets at all. The actual output function in the second step (e.g. to DocBook) handles that, so it can be adapted to the format.
Maybe I should turn all non-internal links into external ones (back to the wiki). However, that would also create links to non-existent articles, unless I check each of them across the web, resulting in bandwidth hell ;-)
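To make the two options concrete, here is a minimal sketch of the link decision in the output step. It is written in Python for illustration and is not the actual PHP script; `resolve_links`, `normalize`, `anchor_id` and the `wiki_base` parameter are hypothetical names, and it assumes that "ignoring" a link means emitting its label as plain text. The DocBook `<link linkend>` and `<ulink url>` elements are used as one possible target format.

```python
import re
from urllib.parse import quote
from xml.sax.saxutils import escape

# Matches [[Target]] and [[Target|label]]. Nested links, namespaces and
# section anchors are deliberately not handled in this sketch.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def normalize(title):
    """Collapse underscores/whitespace and upper-case the first letter.

    Assumption: the wiki treats the first letter as case-insensitive,
    which is the default on Wikipedia but not on every MediaWiki site.
    """
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def anchor_id(title):
    """Hypothetical ID scheme: replace anything that is not ID-safe."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", title)


def resolve_links(text, selected, wiki_base=None):
    """Rewrite wiki links in `text`.

    Links to titles in `selected` become internal DocBook links.
    Other links become external links if `wiki_base` is given,
    otherwise only their label is kept (the link is "ignored").
    No request is made to check whether the external target exists.
    """
    selected_norm = {normalize(t) for t in selected}

    def repl(match):
        target = normalize(match.group(1))
        label = escape(match.group(2) or match.group(1).strip())
        if target in selected_norm:
            return '<link linkend="%s">%s</link>' % (anchor_id(target), label)
        if wiki_base is not None:
            url = wiki_base + quote(target.replace(" ", "_"))
            return '<ulink url="%s">%s</ulink>' % (url, label)
        return label

    return WIKILINK.sub(repl, text)


sample = "See [[wikipedia]], [[Jimmy Wales|Jimbo]] and [[Biology]]."
internal_only = resolve_links(sample, ["Wikipedia", "Jimmy Wales"])
with_external = resolve_links(
    sample, ["Wikipedia", "Jimmy Wales"],
    wiki_base="https://en.wikipedia.org/wiki/",
)
```

With `wiki_base` left unset, [[Biology]] is reduced to its label; with it set, the link points back to the wiki without any existence check, which is exactly the trade-off described above.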
*dreams of having, on MediaWiki, a way to select articles efficiently and exporting directly*
Soon. Maybe not on Wikipedia (performance reasons), but otherwise - soon.
BTW, I have the parsing times way down by now. [[Biology]] is converted in 0.5 seconds, not counting the time it takes to load the sources of articles/templates from the web. This might become a MediaWiki parser replacement after all...
Magnus