[Labs-l] Dumps project storage

Hydriz Scholz admin at alphacorp.tk
Wed May 22 13:55:39 UTC 2013


Yes, we have discussed in the past that we will be reducing the amount of
resources used by the dumps project. I am currently writing a few scripts
and libraries to make the uploading/downloading process much less
resource-intensive and far more efficient.
So Ryan, you don't have to worry too much about this one :)

However, the Wikimedia Commons grab is something that was undertaken by a
team not directly related to Wikimedia. We do download from upload.wm.o,
but at a rather slow speed to avoid overloading the servers. It was stopped
quite a while ago while we work out how to optimize bandwidth and
resource usage.

I am not exactly sure what Nemo wished to do in the original request, but I
believe the team is still discussing better ways to handle this (like using
the mirrors).

So, don't worry about the resource usage: we are still only testing, so we
are not consuming much of these precious resources.
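To sketch the idea of the scripts mentioned above: the key point is to read
each dump file straight from the read-only dumps filesystem (/public/data)
and hand it to archive.org without staging a copy on project storage. The
helper below is hypothetical (the function name, identifier scheme, and
file-extension filter are my own illustration, not the actual scripts), and
the final upload step assumes the `internetarchive` Python library.

```python
import os

def dump_item(wiki, date, dump_dir):
    """Build an archive.org item identifier and file list for one dump run.

    Files are referenced in place under dump_dir (e.g. a directory below
    /public/data), so nothing is copied to /data/project first.
    The identifier scheme here is illustrative only.
    """
    identifier = "wikimedia-dump-%s-%s" % (wiki, date)
    files = sorted(
        os.path.join(dump_dir, name)
        for name in os.listdir(dump_dir)
        # Keep only the compressed dump files, skipping status/index files.
        if name.endswith((".gz", ".bz2", ".7z"))
    )
    return identifier, files

if __name__ == "__main__":
    identifier, files = dump_item("enwiki", "20130501",
                                  "/public/data/enwiki/20130501")
    # The actual upload could then stream these paths directly, e.g. with
    # the internetarchive library (credentials omitted):
    #   from internetarchive import upload
    #   upload(identifier, files=files)
```

Because the files are passed by path from the read-only mount, the only
project-storage footprint is the scripts themselves.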


On Wed, May 22, 2013 at 1:36 PM, Ryan Lane <rlane32 at gmail.com> wrote:

> On Tue, May 21, 2013 at 10:28 PM, Federico Leva (Nemo) <nemowiki at gmail.com
> > wrote:
>
>> Ryan Lane, 21/05/2013 22:27:
>>
>>> It's not that I'm opposed to it, but it's a massive waste of resources
>>> to download from something in the network to a network fileserver, then
>>> to upload it to archive.org.
>>>
>>>
>>> Why is it necessary to write hundreds of GB to the fileserver before
>>> they are uploaded?
>>>
>>
>> Sorry, I don't understand the question. Consider the request withdrawn,
>> thanks for answering.
>>
>>
> I'd like to make sure your need is handled, but I'd like to understand the
> need too. We've had quite a bit of discussion with Hydriz in the past about
> this project. It's resource intensive for us, so we try to make sure it's
> being done efficiently. We made the dumps available at /public/data so that
> it wouldn't be necessary to download them from download.wm.o, then upload
> them to archive.org (it's possible to upload them directly from the
> read-only dumps filesystem).
>
> What I'm trying to understand is what is being written to /data/project
> and why it's larger than 200GB. Based on what I've been told so far, the
> project uploads dumps to archive.org. This is the first I'm hearing about
> uploading commons images. Are you downloading large amounts of images from
> upload.wm.o, writing them to /data/project, uploading them to archive.org,
> then deleting them from /data/project?
>
> - Ryan
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>


-- 
Regards,
Hydriz

Be social, follow/add me:
Facebook: http://tinyurl.com/hydrizfb
Google+: http://tinyurl.com/hydrizgl
Twitter: @hydrizwiki

