Are there good unofficial sites with mirrors and dumps? Is anyone using a live feed to generate same?
Here is one of those core project-support tasks that only the Foundation can do at the moment and that never seems to become a priority... but it is fundamental to supporting a broad network of people who are carrying out their own Wikipedia and related initiatives.
Among the core ways that the projects' work gets out into the world is through full dumps provided by the Foundation in all languages. There aren't many people with access to the databases needed to generate those dumps, and running them regularly and effectively often requires scheduling processor and disk time on machines inside the cluster.
Image dumps haven't worked reliably since sometime in 2005. I blogged about this in mid-2006, at which point I believe there was a BitTorrent option but no other; the BitTorrent option hasn't worked for over a year. http://downloads.wikimedia.org/images/ used to offer a few 2006-era dump links; those too are now gone.
Static versions of the site have also been available from time to time -- at the moment, the links from download.wikimedia.org are broken: http://static.wikipedia.org/
And HTML dumps of the projects have been generated from time to time; I don't know why these are presented separately from the full dump lists (which offer XML dumps in many gratifying varieties), but the process involved needs some upkeep. At the moment, one can only get Wikipedia HTML dumps from February 2008, and only for languages aa to eml: http://static.wikipedia.org/downloads/2008-02/
Cheers, SJ
On Tue, Apr 29, 2008 at 3:53 PM, Samuel Klein meta.sj@gmail.com wrote: [snip]
Image dumps haven't worked reliably since sometime in 2005. I blogged about this in mid-2006, at which point I believe there was a BitTorrent option but no other; the BitTorrent option hasn't worked for over a year. http://downloads.wikimedia.org/images/ used to offer a few 2006-era dump links; those too are now gone.
[snip]
For several months one of the Wikimedia systems was configured to push images out to a system of mine with several terabytes set aside, using rsync version 3, which was then still in development (prior versions of rsync were unable to cope with the large number of files).
This worked fairly well at the time and I handed out snapshots to a number of other people who requested them.
The feed seems to have stopped in early January. I never bothered looking into why, since I haven't been doing much with them recently and no one has asked me for a new snapshot lately.
BitTorrent is pretty much non-viable for maintaining a mirror of images (and nearly so even for making the initial multi-terabyte transfer).
On Tue, Apr 29, 2008 at 12:53 PM, Samuel Klein meta.sj@gmail.com wrote:
Are there good unofficial sites with mirrors and dumps? Is anyone using a live feed to generate same?
Here is one of those core project-support tasks that only the Foundation can do at the moment and that never seems to become a priority... but it is fundamental to supporting a broad network of people who are carrying out their own Wikipedia and related initiatives.
Among the core ways that the projects' work gets out into the world is through full dumps provided by the Foundation in all languages. There aren't many people with access to the databases needed to generate those dumps, and running them regularly and effectively often requires scheduling processor and disk time on machines inside the cluster.
On the wiki-research list, Sue Gardner recently made a post about Foundation research priorities: http://lists.wikimedia.org/pipermail/wiki-research-l/2008-April/000546.html
There's an associated document on Meta: http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Research_Goals
which lists a lot of the things many of us have been interested in researching for a long time.
Arguably, however, providing solid dumps is the backbone for getting most of this research done, since many possible studies require project data to work with. So not only are regular dumps critical for fulfilling our free content responsibilities and mission, they are also critical for future research. Which is to say: we all really want to see them happen! And agreed, the Foundation is the only one that can make it so (even though it's not an easy task); this is the sort of infrastructure task that should be absolutely core.
-- phoebe
wikimedia-l@lists.wikimedia.org