Kyle Anderson wrote:
On Thu, 2006-06-15 at 11:47 +0200, Andy Rabagliati wrote:
I am also interested in post-processing - stripping redlinks, including ''main article'' references on core articles, like ''History of South Africa'' etc. I want to be able to tweak parameters, then press a button and get a new CD (from my downloaded XML dump of en and a picture collection, and possibly via a live MediaWiki snapshot of that content).
Live? How did you want to do this? Perhaps using the toolserver?
To shamelessly plug one of my toys again: Included in my Wiki-to-XML-to-(insert your favourite format) converter is a script that reads an XML dump and converts it into lots of single files, one per article. This "file heap" can then be browsed using the Wiki-to-XML-to-XHTML converter. It thus only depends on PHP and a web server, which should be easy to bundle with a CD/DVD. I also had the whole "file heap" indexed with Lucene, but didn't bother to hack a search frontend into the generated XHTML. Otherwise, I'd have had fulltext search as well.
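In case anyone wants to roll their own, the splitting step boils down to something like this (a minimal sketch, not the actual wiki2xml code; file names and paths are made up):

<?php
// Stream a MediaWiki XML dump and write each article's wikitext
// to its own file. Sketch only - not the real wiki2xml splitter.
$reader = new XMLReader();
$reader->open( 'enwiki-pages-articles.xml' );

$title = '';
while ( $reader->read() ) {
    if ( $reader->nodeType != XMLReader::ELEMENT ) continue;
    if ( $reader->name == 'title' ) {
        $title = $reader->readString();
    } elseif ( $reader->name == 'text' ) {
        // md5 of the title sidesteps characters that are
        // illegal in file names
        file_put_contents( 'heap/' . md5( $title ), $reader->readString() );
    }
}
$reader->close();
?>

Since XMLReader streams the dump, memory use stays flat no matter how big the dump is.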
A next step would be to output the "file heap" as compressed files to save disk space. Further, these compressed files could be merged into one single huge file, since the file system has a little trouble coping with that many files. That would require a separate index file mapping each title to its position in the big file, though. Note that one can't just use the compressed dump file directly, as seeking in a compressed file is slooooow (everything before the seek position has to be decompressed first). So I'd go for a single file consisting of compressed individual files stacked on top of each other.
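To illustrate the layout I mean (again just a sketch, function names invented; this is not in wiki2xml yet): compress each article individually, append it to one big data file, and keep an index of title => (offset, length). Lookup then seeks straight to the record and decompresses only that one article:

<?php
// Build phase: compress one article, append it to the data file,
// and remember where it starts and how long it is.
function append_article( $fh, &$index, $title, $wikitext ) {
    $blob = gzcompress( $wikitext );
    $index[$title] = array( ftell( $fh ), strlen( $blob ) );
    fwrite( $fh, $blob );
}

// Lookup phase: seek directly to the record; only this one
// article gets decompressed, not everything before it.
function read_article( $fh, $index, $title ) {
    list( $offset, $length ) = $index[$title];
    fseek( $fh, $offset );
    return gzuncompress( fread( $fh, $length ) );
}
?>

The index itself could simply be serialize()d into the separate title/position file mentioned above.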
Templates work fine for that solution, categories don't - yet.
Magnus
P.S.: SVN, module "wiki2xml", directory "php", for those interested.