I'd like to have access to a tarball of the Wikipedia. Larry mentioned Jason as being the one to set up the cron jobs, so I'm copying him FWIW.
A couple details haven't been discussed:
1. When the tarball becomes available, I assume it'll be a link from the appropriate Wikipedia page? I'd propose at least HomePage. Maybe Wikipedia_FAQ. And how about a page devoted to the tarball. Say, [[Wikipedia Snapshot]]? See
2. It wasn't discussed whether all versions would be archived or not. That, of course, is ideal. But it will grow ''ad infinitum.'' (''ad nauseam''? :-) I think it's essential for the historical record to archive all revisions, but I think there should also be a "snapshot" tarball of only the latest version of each article. It would be considerably smaller and wouldn't grow as fast as the full archive. If that were done, then there could be a page called [[Wikipedia Archives]] that had links to both tarballs.
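A minimal sketch of the two-tarball idea, assuming the full page database and a latest-only export each live in their own directory. The directory and file names in the usage note are hypothetical, and nothing here reflects how the actual cron job is set up:

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir, tarball_path):
    """Pack source_dir into a gzip-compressed tarball and return its path.

    Files are stored under the directory's own name so that unpacking
    creates a single top-level folder.
    """
    source = Path(source_dir)
    with tarfile.open(str(tarball_path), "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    return Path(tarball_path)
```

Usage would be one call per archive, for example `make_tarball("wiki_db", "wikipedia-full.tar.gz")` for all revisions and `make_tarball("wiki_latest", "wikipedia-snapshot.tar.gz")` for the latest versions only; both pairs of names are placeholders.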
<>< [[Tbc]]
On Wed, 15 Aug 2001, Tim Chambers wrote:
I'd like to have access to a tarball of the Wikipedia. Larry mentioned Jason as being the one to set up the cron jobs, so I'm copying him FWIW.
A couple details haven't been discussed:
- When the tarball becomes available, I assume it'll be a link from the appropriate Wikipedia page? I'd propose at least HomePage. Maybe Wikipedia_FAQ. And how about a page devoted to the tarball. Say, [[Wikipedia Snapshot]]? See
ditto
- It wasn't discussed whether all versions would be archived or not. That, of course, is ideal. But it will grow ''ad infinitum.'' (''ad nauseam''? :-) I think it's essential for the historical record to archive all revisions, but I think there should also be a "snapshot" tarball of only the latest version of each article. It would be considerably smaller and wouldn't grow as fast as the full archive. If that were done, then there could be a page called [[Wikipedia Archives]] that had links to both tarballs.
Well, the way wiki is implemented, each page's file actually contains the prior revisions of itself within it. So we'll only be able to browse it with wiki (afaik). It would actually be *harder* (I think) to provide only the latest HTML snapshot. (Actually not *that* hard: wiki has a capability to produce cached HTML versions of pages, and one could set it up to build that cache and snag it; but that's more work for Bomis, and any one of us can produce the snapshots once we have the original, if we really need it.)
Maybe we'll find there is an easy way to generate a snapshot of just the HTML pages. I scanned through the wiki.pl file and didn't see such a routine; however, the "raw material" exists to create such a thing.
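To illustrate how a latest-only snapshot could be derived from page files that embed their own history, here is a rough sketch. It assumes a made-up layout in which revisions are stored oldest-first and separated by a marker line; `REVISION_SEPARATOR` and both function names are hypothetical, and the real wiki.pl on-disk format is different and would need its own parser:

```python
# Hypothetical marker between revisions; NOT the real wiki.pl format.
REVISION_SEPARATOR = "\n-----REVISION-----\n"


def latest_revision(page_file_text):
    """Return the newest revision, assuming revisions are stored oldest-first."""
    revisions = page_file_text.split(REVISION_SEPARATOR)
    return revisions[-1]


def build_snapshot(pages):
    """Map each page title to the text of its latest revision only.

    `pages` is a dict of title -> full page file contents.
    """
    return {title: latest_revision(text) for title, text in pages.items()}
```

The point of the sketch is only that the snapshot is a pure function of the full archive, so anyone holding the original could regenerate it.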
One obvious exercise I can think of is running a DocBook or LaTeX converter over the source to produce a printable version of Wikipedia. The trick would be figuring out how to filter out stuff inappropriate for printing (e.g., user pages or discussion). That would kick ass. :-) (The filter would be handy for creating sanitized distros of the pedia for inclusion in Linux distros too, if someone had the druthers to do that some day.)
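A sketch of what such a filter might look like, assuming discussion pages can be recognised by a title suffix and user pages are supplied as an explicit list. The `/Talk` suffix, the `user_pages` argument and the function name are all assumptions for illustration, not something wiki.pl provides:

```python
from typing import Iterable, List, Set

# Assumed naming convention for discussion pages.
TALK_SUFFIXES = ("/Talk",)


def printable_titles(titles: Iterable[str], user_pages: Set[str]) -> List[str]:
    """Return the titles that should go into a printable edition.

    Skips anything listed in user_pages and anything whose title ends
    with one of TALK_SUFFIXES; the input order is preserved.
    """
    keep = []
    for title in titles:
        if title in user_pages:
            continue
        if title.endswith(TALK_SUFFIXES):
            continue
        keep.append(title)
    return keep
```

The surviving titles could then be handed to whichever DocBook or LaTeX converter is chosen.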
Bryce
On 15-08-2001, Bryce Harrington wrote thusly:
On Wed, 15 Aug 2001, Tim Chambers wrote:
I'd like to have access to a tarball of the Wikipedia. Larry mentioned Jason as being the one to set up the cron jobs, so I'm copying him FWIW. A couple details haven't been discussed:
- When the tarball becomes available, I assume it'll be a link from the appropriate Wikipedia page? I'd propose at least HomePage. Maybe Wikipedia_FAQ. And how about a page devoted to the tarball. Say, [[Wikipedia Snapshot]]? See
ditto
Agreed. I wonder how it is possible that such a tarball and page have not been created so far. It is good that the original poster pointed that out.
- It wasn't discussed whether all versions would be archived or not. That, of course, is ideal. But it will grow ''ad infinitum.'' (''ad nauseam''? :-) I think it's essential for the historical record to archive all revisions, but I think there should also be a "snapshot" tarball of only the latest version of each article. It would be considerably smaller and wouldn't grow as fast as the full archive. If that were done, then there could be a page called [[Wikipedia Archives]] that had links to both tarballs.
Well, the way wiki is implemented, each page's file actually contains the prior revisions of itself within it. So we'll only be able to browse it with wiki (afaik). It would actually be *harder* (I think) to provide only the latest HTML snapshot. (Actually not *that* hard: wiki has a capability to produce cached HTML versions of pages, and one could set it up to build that cache and snag it; but that's more work for Bomis, and any one of us can produce the snapshots once we have the original, if we really need it.)
IMO there should be 2 kinds of tarballs: one with the usemod source, so that I could grab it and install it right away, and the other in human-readable format. I think it is too soon for some articles to be rendered in printable form; today the [[Siouxie and the Banshees]] article would give a blank page.
Maybe we'll find there is an easy way to generate a snapshot of just the HTML pages. I scanned through the wiki.pl file and didn't see such a routine; however, the "raw material" exists to create such a thing. One obvious exercise I can think of is running a DocBook or LaTeX converter over the source to produce a printable version of Wikipedia. The trick would be figuring out how to filter out stuff inappropriate for printing (e.g., user pages or discussion). That would kick ass. :-) (The filter would be handy for creating sanitized distros of the pedia for inclusion in Linux distros too, if someone had the druthers to do that some day.)
Thanks for this thread. kpj.
wikipedia-l@lists.wikimedia.org