[Labs-l] Failure to support database dumps on labs

Dan Andreescu dandreescu at wikimedia.org
Sat Feb 14 05:11:59 UTC 2015


Ah, John, sorry.  That's a known problem with the dumps process.  It's been
taking longer and longer and is harder and harder to manage because of the
increased size.  We weren't even able to update our reportcard lately
because the process is taking so long it doesn't leave Erik Z. the time to
run his analysis.  I have started talking to people privately about
revamping the dumps process.  We need it in Analytics for some very
important work that Aaron Halfaker is doing on diff analysis, and folks like
you need it for your work.  From the start it's clear we need:

* incremental dumps
* fast access to them
* reliable bandwidth or a cluster to explore on

This is a million times easier said than done, but I'll keep making the
case for it.
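
In the meantime, a tool can work around the missing -latest link by picking
the newest dated directory that is actually present. A minimal sketch in
Python, assuming dump directories are named YYYYMMDD as in the paths quoted
below; the function name latest_dump_dir is hypothetical, and it can only
find dumps that have already been copied to labs:

```python
import os
import re


def latest_dump_dir(wiki_root):
    """Return the newest YYYYMMDD subdirectory of wiki_root, or None.

    Assumption: dump directories are named by date (e.g. 20141208), so
    the lexicographically largest name is also the most recent one.
    """
    if not os.path.isdir(wiki_root):
        return None
    dated = [
        name
        for name in os.listdir(wiki_root)
        # Keep only directories whose whole name is eight digits.
        if re.fullmatch(r"\d{8}", name)
        and os.path.isdir(os.path.join(wiki_root, name))
    ]
    if not dated:
        return None
    return os.path.join(wiki_root, max(dated))
```

Calling latest_dump_dir("/public/dumps/public/enwiki") would return the most
recent dated directory that exists there, or None if the path is missing or
holds no dated directories.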

On Fri, Feb 13, 2015 at 11:51 PM, John <phoenixoverride at gmail.com> wrote:

> I thought I included the link. https://phabricator.wikimedia.org/T47646
> is the two-year-old ticket. (That should make the context a little clearer.)
>
> Dan, the basic dumps from dumps.wikimedia.org are all that I need. If you
> take a look at the path I provided, the dumps for
>
> 20150112
> 20150204
> 20150205
>
> are all missing.
>
> On Fri, Feb 13, 2015 at 11:39 PM, Dan Andreescu <dandreescu at wikimedia.org>
> wrote:
>
>> Sorry to hear, John.  While I'm not ops, is there anything I can help
>> with to get your immediate need filled?  What would you do with the dump?
>> Is labsdb a good alternative or do you already have scripts?  Do you use
>> http://dumps.wikimedia.org/ ?  Are the dumps you need not there?  I know
>> that site's experiencing some rate limiting but that's simply a budget
>> issue.
>>
>> I'm on the analytics team and one of my goals is to make datasets and raw
>> data publicly available, so I appreciate your perspective and I'm sorry in
>> advance if I can't help.
>>
>> On Fri, Feb 13, 2015 at 11:23 PM, John <phoenixoverride at gmail.com> wrote:
>>
>>> I am looking at a ticket filed almost two years ago for labs to support
>>> the -latest format that the toolserver had, and guess what? Zero progress
>>> has been made.
>>>
>>> This is getting sad. When labs was created it was supposed to be a
>>> replacement for and improvement on the toolserver, yet a basic feature,
>>> running tools on database dumps, has yet to be implemented.
>>>
>>> So knowing that, I got a request to run a database scan today. I took a
>>> look at /public/dumps/public/enwiki to figure out the path to the most
>>> current dump. Guess what? We don't have it on labs. The most current dump
>>> for enwiki is from last year: /public/dumps/public/enwiki/20141208/
>>>
>>> Something needs to happen. Key, basic functionality of the toolserver is
>>> still missing. It's not rocket science, yet ops has consistently failed to
>>> provide needed functionality in this area, and filing tickets gets me
>>> nowhere. So the real question here is: why is this still an issue, and who
>>> do I need to call in order to get things resolved?
>>>
>>> _______________________________________________
>>> Labs-l mailing list
>>> Labs-l at lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>
>>>
>>
>