I don't plan to do dailies any time soon. We don't even have real
incrementals for the text revs, which people have been begging for
forever; cleaning up the current adds/changes dumps and making them more
useful (and making them stable) has to be first. For the images, we
need to get the main bulk of the images out of here and into other
folks' hands first. Dailies, if they were to happen, would be quite
some time down the road; it's why I haven't written them into the plan.
Note that we don't have a place to keep a second copy of everything from
2004 til now, which is another reason I can't go that route right now.
To get an account on wikitech, please give me a user name you want and
an email address you prefer and I'll set you up.
Ariel
On Mon, 30-01-2012, at 23:42 +0100, emijrp wrote:
I see you are working on this:
https://wikitech.wikimedia.org/view/Dumps/Image_dumps
I don't have an account there (how can I request one?). Why don't you
offer incremental image backups, in one-day chunks, from 2004-09-07 up
to (today - 1 year), to leave enough time to remove copyvios?
2011/12/2 Ariel T. Glenn <ariel(a)wikimedia.org>
On Fri, 18-11-2011, at 11:49 +0200, Ariel T. Glenn wrote:
There are scripts to download all media used on a project
( http://meta.wikimedia.org/wiki/Wikix ). As long as the end user runs
one command, it doesn't matter what's happening on the back end.
> _and_ it needs to be possible for any consumer to perform the task of
> obtaining the source. Does the WMF block people who attempt to mirror
> the project content one item at a time? IMO blocking them is very
> sane, but if that is the only way to obtain the source then it would
> again be breaking the licence.
AFAIK we do not block folks that are making serial requests, even if
they crawl the entire media space. Serial requests don't incur a big
cost on our servers.
I should clarify this.
Crawling the media server and requesting all images one at a time (as
long as a pile of people aren't doing it at once) is fine. Requesting
all images in one or several specific thumb sizes is not; in the first
case we serve files that already exist, while in the second case the
files may need to be generated and put someplace. And we simply don't
have space at the moment to keep generated thumbs of every image on
Commons in arbitrary sizes. So folks who *do* want to crawl the media
server and request thumbs for all images should check in with me so we
can figure out how to get you the data you need.
Ariel
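[Editor's sketch] The "one item at a time" crawl described above could look
something like the snippet below. This is a minimal illustration, not WMF
policy or an official tool: the helper names, the one-second delay, and the
URL checks are all assumptions; the only facts taken from the thread are
"serial requests for existing originals are fine" and "bulk thumb requests
are not, because thumbs may have to be generated on demand".

```python
import time
import urllib.request

# Assumed Commons URL layout: originals live under .../commons/<hash>/...,
# while generated thumbnails contain a /thumb/ path segment.
ORIGINAL_PREFIX = "https://upload.wikimedia.org/wikipedia/commons/"
THUMB_MARKER = "/thumb/"

def is_original(url):
    """True if the URL points at an already-existing original file,
    not a thumbnail the server might have to generate."""
    return url.startswith(ORIGINAL_PREFIX) and THUMB_MARKER not in url

def fetch_serially(urls, delay_seconds=1.0, fetch=None):
    """Fetch original files one at a time, pausing between requests.

    `fetch` is injectable for testing; by default it does a real HTTP GET.
    Thumbnail URLs are skipped entirely, per the list discussion.
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    results = {}
    for url in urls:
        if not is_original(url):
            continue  # skip thumbs: these may need server-side generation
        results[url] = fetch(url)
        time.sleep(delay_seconds)  # serial, politely spaced requests
    return results
```

The delay value is arbitrary; the point is only that requests are serial
rather than parallel, which is the behaviour the thread says is acceptable.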
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l