[Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)

Huib Laurens sterkebak at gmail.com
Wed Sep 21 09:55:12 UTC 2011


I would be happy to mirror. I was looking and poking around for that a year
ago, and the biggest problem for me is that it's not clear how Wikimedia would
like to be mirrored.

We currently host CentOS and Ubuntu mirrors on the machine. We have the
space; that's not the problem.


Best,

Huib Laurens
WickedWay.nl

2011/9/21 Brian J Mingus <brian.mingus at colorado.edu>

> On Wed, Sep 21, 2011 at 3:45 AM, Strainu <strainu10 at gmail.com> wrote:
>
> > 2011/9/21 emijrp <emijrp at gmail.com>:
> > > Hi all;
> > >
> > > Just like the scripts to preserve wikis[1], I'm working on a new script to
> > > download all Wikimedia Commons images packed by day. But I have limited
> > > spare time. Sad that volunteers have to do this without any help from
> > > the Wikimedia Foundation.
> > >
> > > I also started an effort on Meta (with low activity) to mirror XML dumps.[2]
> > > If you know of universities or research groups that work with
> > > Wiki[pm]edia XML dumps, they would be good candidates to mirror them.
> > >
> > > If you want to download the texts to your PC, you only need 100 GB free
> > > and to run this Python script.[3]
> > >
> > > I heard that the Internet Archive saves XML dumps quarterly or so, but there
> > > has been no official announcement. I also heard that the Library of Congress
> > > wanted to mirror the dumps, but there has been no news for a long time.
> > >
> > > L'Encyclopédie has an "uptime"[4] of 260 years[5] and growing. Will
> > > Wiki[pm]edia projects reach that?
> > >
> > > Regards,
> > > emijrp
> > >
> > > [1] http://code.google.com/p/wikiteam/
> > > [2] http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> > > [3] http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
> > > [4] http://en.wikipedia.org/wiki/Uptime
> > > [5] http://en.wikipedia.org/wiki/Encyclop%C3%A9die
> > >
> > >
> >
> > Hi emijrp,
> >
> > I can understand why you would prefer to have "full mirrors" of the
> > dumps, but let's face it, 10TB is not (yet) something that most
> > companies/universities can easily spare. Also, most people only work
> > on 1-5 versions of Wikipedia, the rest is just overhead to them.
> >
> > My suggestion would be to accept mirrors of a single language and have
> > a smart interface at dumps.wikimedia.org that redirects requests to
> > the location that is the best match for the user. This system is used
> > by some Linux distributions (see download.opensuse.org for instance)
> > with great success.
> >
> > Regards,
> >   Strainu
> >
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l at lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> >
>
>
> Perhaps a torrent setup would be successful in this case.
>
>
> --
> Brian Mingus
> Graduate student
> Computational Cognitive Neuroscience Lab
> University of Colorado at Boulder



-- 
Kind regards,

Huib Laurens
WickedWay.nl

Webhosting the wicked way.

