[Labs-l] Dumps project storage

Ryan Lane rlane32 at gmail.com
Wed May 22 05:36:34 UTC 2013


On Tue, May 21, 2013 at 10:28 PM, Federico Leva (Nemo)
<nemowiki at gmail.com>wrote:

> Ryan Lane, 21/05/2013 22:27:
>
>> It's not that I'm opposed to it, but it's a massive waste of resources
>> to download something from the network to a network fileserver, then
>> to upload it to archive.org.
>>
>> Why is it necessary to write hundreds of GB to the fileserver before
>> they are uploaded?
>>
>
> Sorry, I don't understand the question. Consider the request withdrawn,
> thanks for answering.
>
>
I'd like to make sure your need is handled, but I'd also like to
understand it. We've had quite a bit of discussion with Hydriz in the past about
this project. It's resource intensive for us, so we try to make sure it's
being done efficiently. We made the dumps available at /public/data so that
it wouldn't be necessary to download them from download.wm.o, then upload
them to archive.org (it's possible to upload them directly from the
read-only dumps filesystem).
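
For example, something like this sketch with the internetarchive Python
client could upload straight from the read-only mount; the item
identifier, file path, and metadata here are just placeholders:

    # Sketch: upload a dump in place from /public/data to archive.org.
    # Assumes credentials have been configured via `ia configure`.
    from internetarchive import upload

    dump = "/public/data/enwiki/20130501/enwiki-20130501-pages-articles.xml.bz2"

    # upload() reads the file where it sits; nothing is copied to
    # /data/project first.
    responses = upload(
        "enwiki-20130501-dump",  # placeholder item identifier
        files=[dump],
        metadata={"mediatype": "web", "title": "enwiki dump, 2013-05-01"},
    )
    print([r.status_code for r in responses])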

What I'm trying to understand is what is being written to /data/project and
why it's larger than 200GB. Based on what I've been told so far, the
project uploads dumps to archive.org. This is the first I'm hearing about
uploading Commons images. Are you downloading large numbers of images from
upload.wm.o, writing them to /data/project, uploading them to archive.org,
then deleting them from /data/project?
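
If so, a staging-free pipeline seems possible: stream each file from
upload.wm.o directly into archive.org's S3-compatible API
(s3.us.archive.org), so nothing large ever touches /data/project. A
rough sketch, with hypothetical URLs, item name, and credentials:

    # Sketch: stream one Commons file from upload.wm.o to archive.org
    # without writing it to local or project storage.
    import requests

    SRC = "https://upload.wikimedia.org/wikipedia/commons/x/xy/Example.jpg"  # hypothetical
    DST = "https://s3.us.archive.org/hypothetical-item/Example.jpg"  # hypothetical

    with requests.get(SRC, stream=True) as src:
        src.raise_for_status()
        resp = requests.put(
            DST,
            data=src.raw,  # file-like object; bytes are streamed through
            headers={
                "authorization": "LOW ACCESSKEY:SECRETKEY",  # placeholder ias3 keys
                "content-length": src.headers["Content-Length"],
            },
        )
        resp.raise_for_status()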

- Ryan

