On Tue, May 09, 2006 at 05:26:12PM +0100, Ben Francis wrote:
> Jay R. Ashworth wrote:
>> I've badly wanted to see something like this for some time, and would be glad (with 20 years system analyst experience :-) to kibitz on the design, if you like.
> That's cool, thanks.
>> My personal target was mostly being able to extract a partial tree from a running MW install, and dump it into a DocBook source file.
> When you say a "partial" tree, how would you define the boundaries of said tree?
Well, this is where you diverge from the translation part to the extraction part, the part I don't think Magnus' tool deals with yet (though I haven't looked deeply into it).
There are 2 issues that I can see, off hand:
1) Limiting what you extract (and handling it properly, vis-à-vis, say, "actual pages" vs "glossary items", and suchlike)
and
2) Properly handling cross references
The first can probably be handled by category tagging and some configuration files; the latter will likely require some in-depth knowledge of how DocBook handles such things, since you can't do hyperlinks in many of DocBook's target formats (like, um, paper :-), and you can't bind traditional cross-references until you have real page numbers.
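As a rough sketch of what the cross-reference half might look like: DocBook's <xref linkend="..."/> element defers resolution to formatting time, so the same source can render as a hyperlink in HTML and as "see page N" on paper. The id-mangling convention below (lowercase, hyphens) is purely illustrative, not anything MediaWiki or DocBook mandates.

```python
import re

def wikilinks_to_xrefs(wikitext):
    """Replace [[Page Title]] wiki links with DocBook <xref> elements.

    The linkend values produced here assume each extracted page gets a
    matching id attribute generated by the same mangling rule.
    """
    def repl(m):
        target = m.group(1).split('|')[0]  # drop any display text after '|'
        linkend = re.sub(r'[^a-z0-9]+', '-', target.lower()).strip('-')
        return '<xref linkend="%s"/>' % linkend
    return re.sub(r'\[\[([^\]]+)\]\]', repl, wikitext)
```

e.g. wikilinks_to_xrefs('see [[Main Page]]') gives 'see <xref linkend="main-page"/>', and the FO/print backend fills in the page number when it has one.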
On the specific issue of trimming the tree, part of it is going to have to be discipline on the part of the maintainers of the wiki not to introduce loops. It will likely be necessary to have a pre-pass switch on the driver engine that extracts and displays the "table of contents" in a raw, unnumbered mode (in addition, of course, to one that generates a formattable ToC), so you can see whether it ever ends.
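That pre-pass is essentially a depth-first walk of the page graph, and the same walk can report a loop outright instead of making a human eyeball the raw ToC. A minimal sketch, assuming a hypothetical `pages` mapping from a page title to the titles it links down to (a real tool would build that by parsing the wikitext):

```python
def find_loop(pages, start):
    """Detect a cycle in the page-inclusion graph by depth-first search.

    Returns the first loop found as a list of titles (first title
    repeated at the end), or None if the "tree" really is a tree.
    """
    path, on_path = [], set()

    def visit(page):
        if page in on_path:                 # back-edge: we found a loop
            return path[path.index(page):] + [page]
        path.append(page)
        on_path.add(page)
        for child in pages.get(page, []):
            loop = visit(child)
            if loop:
                return loop
        path.pop()
        on_path.remove(page)
        return None

    return visit(start)
```

Printing each title (indented by len(path)) as visit() descends gives you the raw unnumbered ToC for free.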
But ignoring everything except the content of the actual page (what you get from &action=raw, which was going to be my approach) is probably the starting point, of course.
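The fetch itself is trivial; &action=raw on MediaWiki's index.php returns the unrendered wikitext of a page. A sketch of building that URL (the base URL here is a made-up example; adjust for the actual install):

```python
from urllib.parse import quote

def raw_url(base, title):
    """Build the URL that returns a page's raw wikitext.

    `base` is the wiki's index.php endpoint.  Passing action=raw makes
    MediaWiki serve the page source instead of rendered HTML.
    """
    return '%s?title=%s&action=raw' % (base, quote(title, safe=''))
```

e.g. raw_url('https://wiki.example.org/index.php', 'Main Page') yields 'https://wiki.example.org/index.php?title=Main%20Page&action=raw', which any HTTP fetcher can then pull down for the extraction pass.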
Cheers, -- jra