On Wed, 15 Aug 2001, Tim Chambers wrote:
I'd like to have access to a tarball of the Wikipedia. Larry mentioned Jason
as being the one to set up the cron jobs, so I'm copying him FWIW.
A couple details haven't been discussed:
1. When the tarball becomes available, I assume it'll be a link from the
appropriate Wikipedia page? I'd propose at least HomePage. Maybe
Wikipedia_FAQ. And how about a page devoted to the tarball. Say, [[Wikipedia
Snapshot]]? See
ditto
2. It wasn't discussed whether all versions would be archived or not. That,
of course, is ideal. But it will grow ''ad infinitum.'' (''ad nauseam''?
:-) I think it's essential for the historical record to archive all
revisions, but I think there should also be a "snapshot" tarball of only the
latest version of each article. It would be considerably smaller and
wouldn't grow as fast as the full archive. If that was done, then there
could be a page called [[Wikipedia Archives]] that had links to both
tarballs.
Well, the way wiki is implemented, each page's file actually contains
its own prior revisions. So we'll only be able
to browse it with wiki (afaik). It would actually be *harder* (I think)
to provide only the latest HTML snapshot (actually not *that* hard: wiki
has a capability to produce cached HTML versions of pages, and one could
set it up to build that cache and snag it; but that's more work for
Bomis, and any one of us can produce the snapshots once we have the
original, if we really needed it.)
Maybe we'll find there is an easy way to generate a snapshot of just the
HTML pages. I scanned through the wiki.pl file and didn't see such a
routine; however, the "raw material" exists to create such a thing.
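
For what it's worth, once a directory of cached HTML pages exists, bundling
it into a snapshot tarball is the easy part. Here is a rough sketch in
Python. It assumes the cache is simply a directory tree of .html files; the
function name and the paths are hypothetical, and nothing here is taken from
wiki.pl.

```python
import tarfile
from pathlib import Path


def make_snapshot(html_dir, out_path):
    """Bundle every .html file under html_dir into a gzipped tarball.

    Assumption: html_dir is a plain directory tree of cached HTML pages.
    Returns the number of files written to the archive.
    """
    html_dir = Path(html_dir)
    count = 0
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(html_dir.rglob("*.html")):
            # Store paths relative to html_dir so the archive unpacks cleanly.
            tar.add(str(path), arcname=str(path.relative_to(html_dir)))
            count += 1
    return count
```

A nightly cron job could call something like
make_snapshot("html-cache", "wikipedia-snapshot.tar.gz"), where both names
are placeholders.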
One obvious exercise I can think of is running a DocBook or LaTeX
converter over the source to produce a printable version of Wikipedia.
The trick would be figuring out how to filter out stuff inappropriate
for printing (e.g., user pages or discussion). That would kick ass.
:-) (The filter would be handy for creating sanitized distros of the
pedia for inclusion in Linux distros too, if someone had the druthers to
do that some day.)
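
To make the filter idea concrete, here is a minimal sketch in Python that
works on page titles alone. The patterns are assumptions for illustration
only: it guesses that discussion pages end in "/Talk" and that user pages
start with "user:", neither of which is confirmed anywhere in this thread,
and real rules would have to come from how the pages are actually named.

```python
import re

# Hypothetical patterns; adjust to however the wiki really names these pages.
EXCLUDE_PATTERNS = [
    re.compile(r"/Talk$", re.IGNORECASE),  # assumed discussion subpages
    re.compile(r"^user:", re.IGNORECASE),  # assumed user-page prefix
]


def is_printable(title):
    """Return True if the title matches none of the exclusion patterns."""
    return not any(p.search(title) for p in EXCLUDE_PATTERNS)


def filter_titles(titles):
    """Keep only the titles considered suitable for a printed edition."""
    return [t for t in titles if is_printable(t)]
```

The same predicate could sit in front of a DocBook or LaTeX converter, or in
front of whatever packages the pages for a distro.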
Bryce