On Tue, May 09, 2006 at 05:26:12PM +0100, Ben Francis wrote:
> Jay R. Ashworth wrote:
>> I've badly wanted to see something like this for some time, and would be glad (with 20 years system analyst experience :-) to kibitz on the design, if you like.
> That's cool, thanks.
>> My personal target was mostly being able to extract a partial tree from a running MW install, and dump it into a DocBook source file.
> When you say a "partial" tree, how would you define the boundaries of said tree?
Well, this is where you diverge from the translation part to the extraction part, the part I don't think Magnus' tool deals with yet (though I haven't looked deeply into it).
There are 2 issues that I can see, off hand:
1) Limiting what you extract (and handling it properly, vis-à-vis, say, "actual pages" vs "glossary items", and suchlike)
and
2) Properly handling cross references
The first can probably be handled by category tagging and some configuration files; the latter will likely require some in-depth knowledge of how DocBook handles such things, since you can't do hyperlinks in many of DocBook's target formats (like, um, paper :-), and you can't bind traditional cross-references until you have real page numbers.
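As a rough sketch of what the cross-reference half might look like: DocBook's <xref linkend="..."/> element defers resolution to formatting time, so the same source can render as a hyperlink in HTML and as "see page N" on paper. The id-mangling convention below (lowercase, hyphens) is purely illustrative, not anything MediaWiki or DocBook mandates.

```python
import re

def wikilinks_to_xrefs(wikitext):
    """Replace [[Page Title]] wiki links with DocBook <xref> elements.

    The linkend values produced here assume each extracted page gets a
    matching id attribute generated by the same mangling rule.
    """
    def repl(m):
        target = m.group(1).split('|')[0]  # drop any display text after '|'
        linkend = re.sub(r'[^a-z0-9]+', '-', target.lower()).strip('-')
        return '<xref linkend="%s"/>' % linkend
    return re.sub(r'\[\[([^\]]+)\]\]', repl, wikitext)
```

e.g. wikilinks_to_xrefs('see [[Main Page]]') gives 'see <xref linkend="main-page"/>', and the FO/print backend fills in the page number when it has one.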
On the specific issue of trimming the tree, part of it is going to have to be discipline on the part of the maintainers of the wiki not to introduce loops. It will likely be necessary to have a pre-pass switch on the driver engine that extracts and displays the "table of contents" in a raw, unnumbered mode (in addition, of course, to one that generates a formattable ToC), so you can see whether it ever ends.
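That pre-pass is essentially a depth-first walk of the page graph, and the same walk can report a loop outright instead of making a human eyeball the raw ToC. A minimal sketch, assuming a hypothetical `pages` mapping from a page title to the titles it links down to (a real tool would build that by parsing the wikitext):

```python
def find_loop(pages, start):
    """Detect a cycle in the page-inclusion graph by depth-first search.

    Returns the first loop found as a list of titles (first title
    repeated at the end), or None if the "tree" really is a tree.
    """
    path, on_path = [], set()

    def visit(page):
        if page in on_path:                 # back-edge: we found a loop
            return path[path.index(page):] + [page]
        path.append(page)
        on_path.add(page)
        for child in pages.get(page, []):
            loop = visit(child)
            if loop:
                return loop
        path.pop()
        on_path.remove(page)
        return None

    return visit(start)
```

Printing each title (indented by len(path)) as visit() descends gives you the raw unnumbered ToC for free.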
But ignoring everything except the content of the actual page (what you get from &action=raw, which was going to be my approach) is probably the starting point, of course.
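The fetch itself is trivial; &action=raw on MediaWiki's index.php returns the unrendered wikitext of a page. A sketch of building that URL (the base URL here is a made-up example; adjust for the actual install):

```python
from urllib.parse import quote

def raw_url(base, title):
    """Build the URL that returns a page's raw wikitext.

    `base` is the wiki's index.php endpoint.  Passing action=raw makes
    MediaWiki serve the page source instead of rendered HTML.
    """
    return '%s?title=%s&action=raw' % (base, quote(title, safe=''))
```

e.g. raw_url('https://wiki.example.org/index.php', 'Main Page') yields 'https://wiki.example.org/index.php?title=Main%20Page&action=raw', which any HTTP fetcher can then pull down for the extraction pass.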
Cheers, -- jra