On Wed, 15 Aug 2001, Tim Chambers wrote:
I'd like to have access to a tarball of the Wikipedia. Larry mentioned Jason as being the one to set up the cron jobs, so I'm copying him FWIW.
A couple details haven't been discussed:
- When the tarball becomes available, I assume it'll be a link from the
appropriate Wikipedia page? I'd propose at least HomePage, and maybe Wikipedia_FAQ. And how about a page devoted to the tarball, say, [[Wikipedia Snapshot]]? See
ditto
- It wasn't discussed whether all versions would be archived or not. That,
of course, is ideal. But it will grow ''ad infinitum.'' (''ad nauseam''? :-) I think it's essential for the historical record to archive all revisions, but I think there should also be a "snapshot" tarball of only the latest version of each article. It would be considerably smaller and wouldn't grow as fast as the full archive. If that were done, then there could be a page called [[Wikipedia Archives]] that had links to both tarballs.
Well, the way the wiki is implemented, each page's file contains its own prior revisions. So we'll only be able to browse the archive with the wiki software itself (AFAIK). It would actually be *harder* (I think) to provide only the latest HTML snapshot. (Actually not *that* hard: the wiki can produce cached HTML versions of pages, and one could set it up to build that cache and snag it. But that's more work for Bomis, and any one of us can produce the snapshots once we have the original, if we really need them.)
Maybe we'll find there is an easy way to generate a snapshot of just the HTML pages. I scanned through the wiki.pl file and didn't see such a routine; however, the "raw material" exists to create such a thing.
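As a rough illustration of how that "raw material" could become a latest-only snapshot, here is a minimal Python sketch. It does not read wiki.pl's real page files, whose format isn't described here; it assumes the pages have already been parsed into a hypothetical mapping from page name to a list of revision texts (oldest first) and writes only the newest revision of each page into a gzipped tarball with the standard `tarfile` module. The `snapshot/` prefix and `.txt` naming are likewise just illustrative choices.

```python
import io
import tarfile


def build_snapshot(pages, out_path):
    """Write the latest revision of each page into a .tar.gz archive.

    `pages` is a hypothetical dict: page name -> list of revision texts,
    oldest first. Parsing wiki.pl's actual page files into this shape is
    assumed, not shown. Returns the number of pages written.
    """
    count = 0
    with tarfile.open(out_path, "w:gz") as tar:
        for name, revisions in sorted(pages.items()):
            if not revisions:
                continue  # nothing stored for this page; skip it
            data = revisions[-1].encode("utf-8")  # newest revision only
            info = tarfile.TarInfo(name=f"snapshot/{name}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            count += 1
    return count
```

Something like `build_snapshot({"HomePage": ["old text", "new text"]}, "wikipedia-snapshot.tar.gz")` would produce a one-entry archive holding only "new text". Since only the newest revision is kept, the archive grows with the number of pages rather than with the number of edits.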
One obvious exercise I can think of is running a DocBook or LaTeX converter over the source to produce a printable version of Wikipedia. The trick would be figuring out how to filter out stuff inappropriate for printing (e.g., user pages or discussion). That would kick ass. :-) (The filter would also be handy for creating sanitized distros of the pedia for inclusion in Linux distros, if someone had the urge to do that some day.)
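The filtering step could start out as simple as a name-based rule. The Python sketch below is only an illustration under loudly stated assumptions: it supposes (hypothetically) that a user's page is named after the user, and that discussion pages carry a `Talk` or `Discussion` suffix. The wiki had no such fixed conventions, so a real filter would need whatever markers the pages actually use.

```python
import re

# Hypothetical convention: discussion pages end in "Talk" or "Discussion".
DISCUSSION_PATTERN = re.compile(r"(Talk|Discussion)$")


def printable_pages(page_names, user_names):
    """Return the page names suitable for a printed edition, sorted.

    Drops pages named after a known user (assumed convention) and pages
    matching DISCUSSION_PATTERN (also an assumed convention).
    """
    users = set(user_names)
    keep = []
    for name in sorted(page_names):
        if name in users:
            continue  # assumed: a user's home page shares their name
        if DISCUSSION_PATTERN.search(name):
            continue  # assumed: discussion pages carry a suffix
        keep.append(name)
    return keep
```

The surviving names would then be the input to whatever DocBook or LaTeX conversion gets written. Keeping the filter separate from the converter means the same list could also drive a sanitized distro package.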
Bryce