Re: [Xmldatadumps-l] [Wikitech-l] Fwd: Old English Wikipedia image dump from 2005

18 Nov 2011

As I said below, providing multiterabyte dumps does not seem reasonable
to me.  Monthly incrementals don't provide a workaround, unless you are
suggesting that we put dumps online for every month since the beginning
of the project.  I think that a much more workable way to jump-start a
mirror is to copy directly to disks in the datacenter, for an
organization which will provide public access to its copy.  This
requires three things: 1) an organization that wants to host such a
mirror, 2) them sending us disks, 3) me clearing it with Rob and with
our datacenter tech, but he's agreed to this in principle in the past.

Ariel

On Thu, 17-11-2011, at 14:11 +0100, emijrp wrote:
...
 People can't mirror Commons if there is no public image dump. As there
 is no public image dump, people don't care about mirroring. And so on...

 You can offer monthly incremental image dumps.[1] Until mid-2008,
 monthly uploads were below 100 GB. Recently, they are in the 200-300 GB
 range. People are mirroring Domas' visit logs at the Internet Archive; OK,
 Commons' monthly size is about 10x that, but it is not
 impossible. Archive Team has mirrored GeoCities (0.9 TB), Yahoo!
 Videos (20 TB), Jamendo (2.5 TB) and other huge sites. So, if you put
 those image dumps online, they are going to rage-download them all.

 You can start by offering full-resolution monthly dumps up to 2007 or
 similar. But, man, we have to restart this sooner or later.

 [1]
 http://archiveteam.org/index.php?title=Wikimedia_Commons#Size_stats

 2011/11/17 Ariel T. Glenn <ariel(a)wikimedia.org>
         I had a quick look and it turns out that the English language
         Wikipedia uses over 2.8 million images today.  So, as you point
         out, an offline reader that just used thumbnails would still
         have to be selective about its image use.

         In any case, putting together collections of thumbs doesn't
         resolve the need for a mirror of the originals, which I would
         really like to see happen.

         Ariel

         On Thu, 17-11-2011, at 01:46 +0100, Erik Zachte wrote:

 Ariel:
 > Providing multiple terabyte sized files for download doesn't make
 > any kind of sense to me. However, if we get concrete proposals for
 > categories of Commons images people really want and would use, we
 > can put those together. I think this has been said before on
 > wikitech-l if not here.

 There is another way to cut down on download size, which would serve a
 whole class of content re-users, e.g. offline readers.
 For offline readers it is not so important to have pictures of 20 MB
 each, rather to have pictures at all, preferably tens of KB in size.
 A download of all images, scaled down to say 600x600 max, would be
 quite appropriate for many uses.
 Maps and diagrams would not survive this scaling down (illegible
 text), but are very compact already.
 In fact the compression ratio of each image is a very reliable
 predictor of the type of content.

 In 2005 I distributed a DVD [1] with all unabridged texts for English
 Wikipedia and all 320,000 images on one DVD, to be loaded on a 4 GB CF
 card for a handheld.
 Now we have 10 million images on Commons, so even scaled-down images
 would need some filtering, but any collection would still be 100-1000
 times smaller in size.

 Erik Zachte

 [1] http://www.infodisiac.com/Wikipedia/

 _______________________________________________
 Xmldatadumps-l mailing list
 Xmldatadumps-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l          

