[Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)

Brian J Mingus brian.mingus at Colorado.EDU
Wed Sep 21 09:51:13 UTC 2011


On Wed, Sep 21, 2011 at 3:45 AM, Strainu <strainu10 at gmail.com> wrote:

> 2011/9/21 emijrp <emijrp at gmail.com>:
> > Hi all;
> >
> > Just like the scripts to preserve wikis[1], I'm working on a new script to
> > download all Wikimedia Commons images packed by day. But I have limited
> > spare time. It is sad that volunteers have to do this without any help from
> > the Wikimedia Foundation.
> >
> > I also started an effort on Meta (with low activity) to mirror the XML
> > dumps.[2] If you know of universities or research groups that work with
> > Wiki[pm]edia XML dumps, they would be good candidates to mirror them.
> >
> > If you want to download the texts to your PC, you only need 100GB free
> > and to run this Python script.[3]
> >
> > I heard that the Internet Archive saves XML dumps quarterly or so, but
> > there has been no official announcement. I also heard that the Library of
> > Congress wanted to mirror the dumps, but there has been no news for a long
> > time.
> >
> > L'Encyclopédie has an "uptime"[4] of 260 years[5] and growing. Will
> > Wiki[pm]edia projects reach that?
> >
> > Regards,
> > emijrp
> >
> > [1] http://code.google.com/p/wikiteam/
> > [2] http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> > [3] http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
> > [4] http://en.wikipedia.org/wiki/Uptime
> > [5] http://en.wikipedia.org/wiki/Encyclop%C3%A9die
> >
> >
>
> Hi emijrp,
>
> I can understand why you would prefer to have "full mirrors" of the
> dumps, but let's face it, 10TB is not (yet) something that most
> companies/universities can easily spare. Also, most people only work
> on 1-5 versions of Wikipedia, the rest is just overhead to them.
>
> My suggestion would be to accept mirrors of a single language and have
> a smart interface at dumps.wikimedia.org that redirects requests to
> the location that is the best match for the user. This system is used
> by some Linux distributions (see download.opensuse.org for instance)
> with great success.
>
> Regards,
>   Strainu
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>


Perhaps a torrent setup would be successful in this case.
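
As a rough illustration of why that could suit partial mirrors: BitTorrent
splits a file into fixed-size pieces and publishes a SHA-1 hash for each one,
so a downloader can check every piece regardless of which peer served it. The
sketch below is only that idea in Python, not a torrent client; the piece
size and the function names are my own assumptions.

```python
import hashlib
import io

# Assumed piece size for illustration; real torrents choose their own.
PIECE_LENGTH = 256 * 1024


def piece_hashes(stream, piece_length=PIECE_LENGTH):
    """Return the SHA-1 digest of each fixed-size piece read from stream."""
    hashes = []
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        hashes.append(hashlib.sha1(piece).digest())
    return hashes


def verify_piece(index, data, hashes):
    """Check a piece received from any peer against the published hash."""
    return hashlib.sha1(data).digest() == hashes[index]


if __name__ == "__main__":
    # Stand-in for a dump file; a real one would be opened in binary mode.
    dump = io.BytesIO(b"x" * (PIECE_LENGTH * 2 + 100))
    hashes = piece_hashes(dump)
    print(len(hashes), "pieces")
```

A site that can only spare space for a few languages could then seed just
those files, and anyone fetching from it could still verify what they got.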


-- 
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder

