On Tuesday 20 May 2003 03:53, Alfio Puglisi wrote:
I just subscribed (I'm the Wikipedia user At18) to ask about the automatic HTML dump function. ...
If anyone is interested, I have a rudimentary Perl script that can read the downloadable SQL dump and output all the articles as separate files in a number of alphabetical directories. It's not very fast, but it works.
That's great!
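For anyone curious what the file-splitting half of such a script involves, here is a minimal Python sketch. It is not the Perl script described above. It assumes the articles have already been extracted from the SQL dump as (title, wikitext) pairs, so parsing the dump itself is not shown. The bucketing rule, the "other" directory, and the file-naming scheme are all hypothetical choices.

```python
import os
import tempfile
from typing import Iterable, Tuple


def bucket_for(title: str) -> str:
    """Choose an alphabetical directory for a title (hypothetical rule).

    Titles starting with an ASCII letter (either case) go into a directory
    named after that upper-case letter; anything else goes into "other".
    """
    first = title[:1]
    return first.upper() if first.isascii() and first.isalpha() else "other"


def safe_filename(title: str) -> str:
    """Map a title to a file name (hypothetical scheme): spaces become
    underscores and "/" is escaped so "AC/DC" can't create a subdirectory."""
    return title.replace(" ", "_").replace("/", "%2F") + ".txt"


def write_articles(articles: Iterable[Tuple[str, str]], root: str) -> int:
    """Write each (title, wikitext) pair to root/<bucket>/<file name>.

    Assumes the pairs were already pulled out of the SQL dump; dump parsing
    is not shown. Returns the number of files written.
    """
    count = 0
    for title, text in articles:
        directory = os.path.join(root, bucket_for(title))
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, safe_filename(title))
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        count += 1
    return count


if __name__ == "__main__":
    sample = [("Apple", "'''Apple''' is a fruit."), ("Zebra", "..."), ("1984", "...")]
    with tempfile.TemporaryDirectory() as root:
        print(write_articles(sample, root))  # 3
        print(sorted(os.listdir(root)))      # ['A', 'Z', 'other']
```

Bucketing by first letter keeps each directory smaller than one flat directory holding every article; the same idea could be extended to two-letter subdirectories if single letters still produce very large directories.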
What's missing from the script: wikimarkup -> HTML conversion,
You should be able to call the existing PHP code that generates HTML to do this.
A tool that generated the entire Wikipedia in static HTML format would make it trivial to generate the "Plucker" format for Palm PDAs. Plucker is an offline web browser for Palm PDAs; it's open source software/Free Software (OSS/FS) released under the GPL. It can handle HTML as well as PNG, GIF, JPEG, plain text, and a few other formats; HTML is usually rendered as you'd expect (hypertext, italics, bold, font size changes, lists, and indenting all work). It'd be very nice if Wikipedia were available in Plucker format, because then an OSS/FS reader could be used to view the text on a Palm PDA.
Plucker is available at http://www.plkr.org. I have a Palm, and Plucker is by far the MOST important program I use.
One minor problem is that Plucker doesn't have an index facility. That could be solved by creating HTML pages that link to the sorted articles: for example, a "Master Index" page could list "A, B, C..."; clicking on "A" would lead to "Index A", which would list "AA, AB, AC...". The static version of the main page could then be modified so you could quickly jump to the master index.
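The two-level index could be generated along these lines. This is a minimal Python sketch, assuming the list of article titles is already available (for instance from the dump) and that each article's static page is named by the hypothetical `default_article_href` rule. The file names `index.html`, `index_A.html`, `index_AB.html` and the `0` bucket for titles that don't start with an ASCII letter are likewise illustrative choices, not an existing Wikipedia or Plucker convention.

```python
import html
from collections import defaultdict
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote


def letter_of(ch: str) -> str:
    """Upper-case an ASCII letter; map anything else (digits, punctuation,
    accented letters, empty string) to "0", a hypothetical catch-all bucket."""
    return ch.upper() if ch.isascii() and ch.isalpha() else "0"


def two_letters(title: str) -> str:
    """Two-character sub-bucket, e.g. "Abacus" -> "AB", "A" -> "A0"."""
    return letter_of(title[:1]) + letter_of(title[1:2])


def default_article_href(title: str) -> str:
    """Hypothetical file naming for the static article pages."""
    return quote(title.replace(" ", "_"), safe="") + ".html"


def render(title: str, links: List[Tuple[str, str]]) -> str:
    """Minimal HTML page containing a bulleted list of (href, text) links."""
    items = "\n".join(
        f'<li><a href="{html.escape(href)}">{html.escape(text)}</a></li>'
        for href, text in links
    )
    t = html.escape(title)
    return (f"<html><head><title>{t}</title></head><body>\n"
            f"<h1>{t}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n")


def build_index(titles: List[str],
                article_href: Callable[[str], str] = default_article_href
                ) -> Dict[str, str]:
    """Return {file name: HTML} for a master index, one page per first letter,
    and one page per two-letter prefix that links to the articles themselves.
    Letter pages have one-character names and prefix pages two, so they can't collide."""
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for t in sorted(titles, key=str.upper):
        tree[letter_of(t[:1])][two_letters(t)].append(t)

    pages = {"index.html": render(
        "Master Index", [(f"index_{c}.html", c) for c in sorted(tree)])}
    for c, prefixes in tree.items():
        pages[f"index_{c}.html"] = render(
            f"Index {c}", [(f"index_{p}.html", p) for p in sorted(prefixes)])
        for p, members in prefixes.items():
            pages[f"index_{p}.html"] = render(
                f"Index {p}", [(article_href(t), t) for t in members])
    return pages


if __name__ == "__main__":
    pages = build_index(["Aardvark", "Abacus", "Apple", "Banana", "1984"])
    for name in sorted(pages):
        print(name)
```

Each level keeps individual pages short, which suits a small PDA screen; if some two-letter pages still end up very long, a third level (three-letter prefixes) could be added the same way.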
Internally, Plucker breaks long pages (>32K) into multiple pages with forward and back links, but that's automatic and won't affect anything.
I don't know of an automatic way to download the Wikipedia images (which, in my mind, is a serious problem). Hopefully there will soon be a way to download the images other than trawling the site. However, for a Palm you'd generally have to drop the images anyway, so for that particular use it wouldn't matter.