Jay R. Ashworth wrote:
On Tue, May 09, 2006 at 05:26:12PM +0100, Ben Francis wrote:
Jay R. Ashworth wrote:
I've badly wanted to see something like this for some time, and would be glad (with 20 years system analyst experience :-) to kibitz on the design, if you like.
That's cool, thanks.
My personal target was mostly being able to extract a partial tree from a running MW install, and dump it into a DocBook source file,
When you say a "partial" tree, how would you define the boundaries of said tree?
Well, this is where you diverge from the translation part to the extraction part, the part I don't think Magnus' tool deals with yet (though I haven't looked deeply into it).
BTW, the tool has now moved to http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php
The current version can also work as an extension within a MediaWiki installation, where it appears as a Special Page. Just put some link/button into your MediaWiki skin, and you have your export :-)
There are 2 issues that I can see, off hand:
- Limiting what you extract (and handling it properly, vis-à-vis, say,
"actual pages" vs "glossary items", and suchlike)
and
- Properly handling cross references
When my tool converts all given articles into XML, it keeps a record of the converted article names. IIRC, the DocBook export then creates internal links to articles that have been converted. This works nicely in DocBook-based HTML and PDFs, for example.
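In rough terms the record-keeping amounts to something like this (a Python sketch only -- the real wiki2xml code is PHP, and make_id()/render_link() are made-up names):

# Keep the set of article titles that were actually converted, then emit a
# DocBook <link linkend="..."> only when the target is part of the export;
# otherwise fall back to plain text. DocBook resolves the link to a
# hyperlink or a page reference depending on the output format.

def make_id(title):
    # DocBook ids must be valid XML names; crude normalisation
    return "art_" + "".join(c if c.isalnum() else "_" for c in title)

def render_link(target, label, converted_titles):
    if target in converted_titles:
        return '<link linkend="%s">%s</link>' % (make_id(target), label)
    return label  # target was not exported: plain text, no dangling xref

converted = {"Main Page", "Glossary"}
print(render_link("Glossary", "see the glossary", converted))    # internal link
print(render_link("Some Other Page", "elsewhere", converted))    # plain text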
The first can probably be handled by category tagging and some configuration files; the latter will likely require some in-depth knowledge of how DocBook handles such things, since you can't do hyperlinks on many of DocBook's target formats (like, um, paper :-), and you can't bind traditional cross-references until you have real page numbers.
Just put an arrow in front of a reference ;-)
On the specific issue of trimming the tree, part of it is going to have to be discipline on the part of the maintainers of the wiki not to introduce loops -- it will likely be necessary to have a pre-pass switch on the driver engine that extracts and displays the "table of contents" in a raw unnumbered mode (in addition, of course, to one that generates a formattable ToC) so you can see if it ever ends.
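Such a pre-pass could be as simple as a depth-first walk that prints the tree raw and flags back-edges; a Python sketch, with get_links() standing in for however the driver would actually ask MediaWiki for a page's links:

# Walk the link tree from a starting page, print it raw and unnumbered,
# and flag any page that links back to one of its ancestors (a loop).

def dump_raw_toc(page, get_links, ancestors=(), seen=None):
    if seen is None:
        seen = set()
    print("  " * len(ancestors) + page)
    if page in ancestors:
        print("  " * (len(ancestors) + 1) + "^-- loop back to an ancestor")
        return
    if page in seen:
        return              # already expanded elsewhere in the tree
    seen.add(page)
    for child in get_links(page):
        dump_raw_toc(child, get_links, ancestors + (page,), seen)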
It might be best to have a starting page (Main Page), use every link on there as level one, every link on those as level two, etc. Or use CatScan (on the toolserver as well, somewhere ;-) to get a category tree.
A simple way to pass this to my script would then be putting spaces in front of the article names - one space per depth. Leading spaces should be filtered out right now, but I could make them part of the XML output.
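A sketch of building such a list from a category tree (Python; get_subcategories() and get_articles() are placeholders for whatever CatScan or the API would provide, and the one-space-per-level convention is just the one proposed above):

# Walk a category tree to a limited depth and emit one article name per
# line, indented with one space per level, ready to paste into wiki2xml.

def article_list(category, get_subcategories, get_articles,
                 max_depth=2, depth=0, seen=None):
    if seen is None:
        seen = set()
    lines = []
    for title in get_articles(category):
        if title not in seen:
            seen.add(title)
            lines.append(" " * depth + title)
    if depth < max_depth:
        for sub in get_subcategories(category):
            lines.extend(article_list(sub, get_subcategories, get_articles,
                                      max_depth, depth + 1, seen))
    return lines

# print("\n".join(article_list("Category:Manual", get_subcats, get_arts)))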
In case you didn't know, my script can do a full conversion, e.g. XML->DocBook->PDF, if configured properly. I have a local test setup on a Window$ machine with some out-of-the-box tools for the last step; I just didn't bother to set that up on my test site, since I don't really have experience with DocBook...
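For what it's worth, the last leg usually amounts to something like this (a sketch using xsltproc and Apache FOP as stand-ins for whatever out-of-the-box tools are installed; paths and file names are illustrative):

# DocBook XML -> XSL-FO -> PDF, using the DocBook XSL stylesheets with
# xsltproc and Apache FOP.

import subprocess

def docbook_to_pdf(docbook_xml, fo_stylesheet, out_pdf):
    # DocBook -> XSL-FO
    subprocess.run(["xsltproc", "-o", "book.fo", fo_stylesheet, docbook_xml],
                   check=True)
    # XSL-FO -> PDF
    subprocess.run(["fop", "-fo", "book.fo", "-pdf", out_pdf], check=True)

docbook_to_pdf("export.xml", "docbook-xsl/fo/docbook.xsl", "export.pdf")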
Magnus