Hey, it seems that a lot of people are doing the same thing here. Well, thanks for all the input about the code and the Plucker possibility. For now I think I'll go on as before (I like my spaghetti code :-). When I have a version that works sufficiently well, I'll make things public.
Ciao, Alfio
On Wed, 21 May 2003, Erik Zachte wrote:
If anyone is interested, I have a rudimentary Perl script that can read the downloadable SQL dump and write out all the articles as separate files in a number of alphabetical directories. It's not very fast, but it works. What's missing from the script: wikimarkup -> HTML conversion,
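For anyone who wants to roll their own, the general approach looks something like the sketch below. This is just a minimal illustration, not the script itself; the column layout of the cur table (cur_id, cur_namespace, 'cur_title', 'cur_text' in the leading fields) is an assumption based on the 2003-era MediaWiki schema, and real dumps may need more careful quoting rules.

#!/usr/bin/perl
# Minimal sketch: split a MediaWiki SQL dump into one file per article,
# grouped into directories by first letter.
# ASSUMPTION: articles sit in "INSERT INTO cur" statements whose tuples
# start with (cur_id, cur_namespace, 'cur_title', 'cur_text', ...);
# real dumps may differ, and the quoting here is simplified.
use strict;
use warnings;
use File::Path qw(mkpath);

while (my $line = <>) {
    next unless $line =~ /^INSERT INTO cur /;
    # One INSERT statement can carry many (...) value tuples.
    while ($line =~ /\((\d+),(\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'/g) {
        my ($ns, $title, $text) = ($2, $3, $4);
        next unless $ns == 0;                # main namespace only
        for ($title, $text) {
            s/\\n/\n/g;                      # undo SQL escaping
            s/\\(.)/$1/g;
        }
        my $dir = uc substr($title, 0, 1);
        $dir = '_' unless $dir =~ /^[A-Z0-9]$/;
        mkpath($dir) unless -d $dir;
        (my $file = $title) =~ s/[^\w.-]/_/g;   # filesystem-safe name
        open my $fh, '>', "$dir/$file.txt" or die "$dir/$file.txt: $!";
        print $fh $text;
        close $fh;
    }
}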
Mr. David A. Wheeler, have you seen my Perl script for converting the SQL dump to a TomeRaider database? You might find useful code there.
It renders all pages in HTML, checks all hyperlinks, and unlinks half a million orphaned ones. It edits wiki code to remove redundant tags, fixes some badly coded HTML tables, and adds stats and a language-specific introduction. It replaces HTML tags with extended ASCII (which saves a lot of space). It resolves redirects, making hyperlinks point directly to the proper article. It removes tables that contained only an image (plus possibly a single footer text).
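The redirect resolution and orphan unlinking work roughly like the sketch below. This is mine, not code from WikiToTome.pl; the %text_of structure and the #REDIRECT pattern are illustrative assumptions.

#!/usr/bin/perl
# Sketch of redirect resolution plus orphan unlinking, as described above.
# NOT taken from WikiToTome.pl; %text_of and the #REDIRECT pattern are
# illustrative assumptions.
use strict;
use warnings;

my %text_of;    # article title => wiki text, filled from the dump elsewhere
my %redirect;   # redirect title => target title

for my $title (keys %text_of) {
    if ($text_of{$title} =~ /^#REDIRECT\s*\[\[([^\]|#]+)/i) {
        $redirect{$title} = $1;
    }
}

# Follow chains of redirects, guarding against loops.
sub resolve {
    my ($title) = @_;
    my %seen;
    while (exists $redirect{$title} && !$seen{$title}++) {
        $title = $redirect{$title};
    }
    return $title;
}

for my $title (keys %text_of) {
    next if exists $redirect{$title};    # redirect pages themselves are dropped
    $text_of{$title} =~ s{\[\[([^\]|]+)(?:\|([^\]]*))?\]\]}{
        my $target = resolve($1);
        my $label  = defined $2 ? $2 : $1;
        exists $text_of{$target}
            ? "[[$target|$label]]"       # hyperlink points at the real article
            : $label;                    # orphaned link: keep text, drop link
    }ge;
}

Rewriting links once at build time like this is what lets a static target (TomeRaider, Plucker, plain HTML) skip chasing redirects at view time.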
In fact, I think the script could be extended to generate separate HTML pages in a few hours, Plucker specifics not taken into account.
Script: http://members.chello.nl/epzachte/Wikipedia/WikiToTome.pl
More info: http://members.chello.nl/epzachte/Wikipedia
Erik Zachte