2011/11/18 Ariel T. Glenn <ariel@wikimedia.org>
As I said below, providing multi-terabyte dumps does not seem reasonable
to me.

What is the problem? Bandwidth? Disk space?
 
Monthly incrementals don't provide a workaround, unless you are
suggesting that we put dumps online for every month since the beginning
of the project.  

Yes, indeed.
 
I think that a much more workable way to jump-start a
mirror is to copy the data directly onto disks in our datacenter for an
organization that will provide public access to its copy.  This
requires three things: 1) an organization that wants to host such a
mirror, 2) that organization sending us disks, 3) me clearing it with Rob
and with our datacenter tech, though he has agreed to this in principle
in the past.


Ariel

On Thu, 17-11-2011 at 14:11 +0100, emijrp wrote:
> People can't mirror Commons if there is no public image dump. As there
> is no public image dump, people don't care about mirroring. And so on...
>
> You can offer monthly incremental image dumps.[1] Until mid-2008,
> monthly uploads were under 100 GB; lately they are in the 200-300 GB
> range. People are already mirroring Domas' visit logs at the Internet
> Archive; granted, a month of Commons is about 10x that, but it is not
> impossible. Archive Team has mirrored GeoCities (0.9 TB), Yahoo!
> Videos (20 TB), Jamendo (2.5 TB) and other huge sites. So, if you put
> those image dumps online, they are going to rage-download it all.
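>
> As a rough back-of-the-envelope check (my own sketch, not an existing
> tool): per-month totals like those can be recomputed from the public
> MediaWiki API, using list=allimages with aiprop=size. Walking a busy
> month this way is slow, so it is only an illustration:
>
>     import requests
>
>     API = "https://commons.wikimedia.org/w/api.php"
>
>     def month_upload_bytes(start, end):
>         # Sum the byte sizes of all files uploaded in [start, end);
>         # timestamps are ISO 8601, paging via the API continue token.
>         total = 0
>         params = {
>             "action": "query", "list": "allimages",
>             "aisort": "timestamp", "aistart": start, "aiend": end,
>             "aiprop": "size", "ailimit": "500", "format": "json",
>         }
>         while True:
>             data = requests.get(API, params=params).json()
>             total += sum(f["size"] for f in data["query"]["allimages"])
>             if "continue" not in data:
>                 return total
>             params.update(data["continue"])
>
>     # e.g. May 2008, which per the stats above should be < ~100 GB:
>     print(month_upload_bytes("2008-05-01T00:00:00Z",
>                              "2008-06-01T00:00:00Z"))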
>
> You can start by offering full-resolution monthly dumps up to 2007 or
> so. But, man, we have to restart this sooner or later.
>
> [1]
> http://archiveteam.org/index.php?title=Wikimedia_Commons#Size_stats
>
> 2011/11/17 Ariel T. Glenn <ariel@wikimedia.org>
>         I had a quick look and it turns out that the English language
>         Wikipedia uses over 2.8 million images today.  So, as you
>         point out, an offline reader that just used thumbnails would
>         still have to be selective about its image use.
>
>         In any case, putting together collections of thumbs doesn't
>         resolve the need for a mirror of the originals, which I would
>         really like to see happen.
>
>         Ariel
>
>         On Thu, 17-11-2011 at 01:46 +0100, Erik Zachte wrote:
>
>         > Ariel:
>         > > Providing multiple terabyte sized files for download
>         > > doesn't make any kind of sense to me. However, if we get
>         > > concrete proposals for categories of Commons images people
>         > > really want and would use, we can put those together. I
>         > > think this has been said before on wikitech-l if not here.
>         >
>         > There is another way to cut down on download size, one which
>         > would serve a whole class of content re-users, e.g. offline
>         > readers. For offline readers it is not so important to have
>         > pictures of 20 MB each, but rather to have pictures at all,
>         > preferably tens of KB in size. A download of all images,
>         > scaled down to, say, 600x600 max, would be quite appropriate
>         > for many uses. Maps and diagrams would not survive this
>         > scale-down (the text becomes illegible), but they are very
>         > compact already. In fact, the compression ratio of each
>         > image is a very reliable predictor of the type of content.
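>         >
>         > (A throwaway sketch of that heuristic, assuming PIL/Pillow;
>         > the 0.05 bytes-per-pixel cutoff is a guess, not a measured
>         > value:)
>         >
>         >     import os
>         >     from PIL import Image
>         >
>         >     def shrink(path, out_path, box=(600, 600)):
>         >         img = Image.open(path)
>         >         w, h = img.size
>         >         # Stored bytes per pixel: line art and diagrams
>         >         # compress far better than photos, so a low ratio
>         >         # suggests a map or diagram.
>         >         bpp = os.path.getsize(path) / float(w * h)
>         >         kind = "map/diagram" if bpp < 0.05 else "photo"
>         >         if kind == "photo":
>         >             img.thumbnail(box)  # in place, keeps aspect ratio
>         >         img.save(out_path)      # diagrams stay full size
>         >         return kind, bpp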
>         >
>         > In 2005 I distributed a DVD [1] with all unabridged texts
>         > for the English Wikipedia and all 320,000 images, to be
>         > loaded onto a 4 GB CF card for a handheld. Now we have 10
>         > million images on Commons, so even scaled-down images would
>         > need some filtering, but any collection would still be
>         > 100-1000 times smaller in size.
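>         >
>         > (To put rough numbers on that: at a ballpark 25 KB per
>         > 600x600 thumbnail, 10 million images come to about 250 GB,
>         > against the tens of terabytes of originals, i.e. roughly
>         > the 100x end of the range; stronger filtering or lower JPEG
>         > quality pushes it toward 1000x. The per-file averages are
>         > guesses, not measured values.)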
>         >
>         > Erik Zachte
>         >
>         > [1] http://www.infodisiac.com/Wikipedia/