[Labs-l] Failure to support database dumps on labs

Bryan White bgwhite at gmail.com
Sat Feb 14 06:07:31 UTC 2015


John,

I suggest downloading them from https://dumps.wikimedia.org/enwiki/

There is no February dump available at /public/dumps because labs doesn't
copy the data until the entire run is done, which takes 15 days.  So it
will be available around the 20th.  That means it is useless for me.  Other
times, the dump data I use is available from dumps.wikimedia.org, but not
from labs because the run didn't finish. This is currently the case for
the 20150125 run of dewiki.
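
For what it's worth, here is a rough sketch of how a tool could pick the
newest dated run on labs instead of hard-coding a path. It assumes the
layout mentioned in this thread (run directories named YYYYMMDD under
/public/dumps/public/enwiki); the function names latest_run and
latest_dump_path are made up for illustration, and it does not check
whether a run is complete.

```python
import os
import re

# Run directories are assumed to be named YYYYMMDD, e.g. 20141208.
DATE_DIR = re.compile(r"\d{8}")


def latest_run(names):
    """Return the newest YYYYMMDD name from an iterable of names, or None."""
    dated = [n for n in names if DATE_DIR.fullmatch(n)]
    # Fixed-width digit strings sort chronologically as plain strings.
    return max(dated) if dated else None


def latest_dump_path(root="/public/dumps/public/enwiki"):
    """Return the path of the newest dated run under root, or None."""
    try:
        names = os.listdir(root)
    except OSError:
        # Directory missing or unreadable: treat as "no local dump".
        return None
    run = latest_run(names)
    return os.path.join(root, run) if run else None


if __name__ == "__main__":
    path = latest_dump_path()
    print(path or "no local dump; try https://dumps.wikimedia.org/enwiki/")
```

If nothing usable turns up locally, the script only points at
dumps.wikimedia.org; it makes no attempt to download anything.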

Over the past 12 months, the dumps directory has not been up to date in at
least six of those months.  They were not updating last October and November.
Over the summer it was dead as the disk filled up.  Labs did buy more space,
but it took several months.

Labs isn't entirely to blame.  The dumps have become highly erratic. All
languages except enwiki are supposed to be dumped biweekly, but this is hit
or miss right now.  Some languages don't get dumped for months.  Currently,
arwiki hasn't been dumped in two months, and dawiki in six weeks.  Of the 51
languages I do for Checkwiki, 24 haven't been dumped in 2015.

Bryan

On Fri, Feb 13, 2015 at 10:11 PM, Dan Andreescu <dandreescu at wikimedia.org>
wrote:

> Ah, John, sorry.  That's a known problem with the dumps process.  It's
> been taking longer and longer and is harder and harder to manage because of
> the increased size.  We weren't even able to update our reportcard lately
> because the process is taking so long it doesn't leave Erik Z. the time to
> run his analysis.  I have started talking to people privately about
> revamping the dumps process.  We need it in Analytics for some very
> important work that Aaron Halfaker is doing on diff analysis and folks like
> you need it for your work.  From the start it's clear we need:
>
> * incremental dumps
> * fast access to them
> * reliable bandwidth or a cluster to explore on
>
> This is a million times easier said than done, but I'll keep making the
> case for it.
>
> On Fri, Feb 13, 2015 at 11:51 PM, John <phoenixoverride at gmail.com> wrote:
>
>> I thought I included the link....
>> https://phabricator.wikimedia.org/T47646 is for the two year old ticket.
>> (that should make context a little clearer)
>>
>> Dan, the basic dumps from dumps.wikimedia.org are all that I need. If you
>> take a look at the path I provided, the dumps for
>>
>> 20150112
>> 20150204
>> 20150205
>>
>> are all missing.
>>
>> On Fri, Feb 13, 2015 at 11:39 PM, Dan Andreescu <dandreescu at wikimedia.org>
>> wrote:
>>
>>> Sorry to hear, John.  While I'm not ops, is there anything I can help
>>> with to get your immediate need filled?  What would you do with the dump?
>>> Is labsdb a good alternative or do you already have scripts?  Do you use
>>> http://dumps.wikimedia.org/ ?  Are the dumps you need not there?  I
>>> know that site's experiencing some rate limiting but that's simply a budget
>>> issue.
>>>
>>> I'm on the analytics team and one of my goals is to make datasets and
>>> raw data publicly available, so I appreciate your perspective and I'm sorry
>>> in advance if I can't help.
>>>
>>> On Fri, Feb 13, 2015 at 11:23 PM, John <phoenixoverride at gmail.com>
>>> wrote:
>>>
>>>> I am looking at a ticket filed almost two years ago for labs to support
>>>> the -latest format that the toolserver had, and guess what? Zero progress
>>>> has been made.
>>>>
>>>> This is getting sad. When labs was created it was supposed to be a
>>>> replacement for and improvement on the toolserver, yet a basic feature of
>>>> running tools on database dumps has yet to be implemented.
>>>>
>>>> So knowing that, I got a request to run a database scan today. I took a
>>>> look at /public/dumps/public/enwiki to figure out the path to the most
>>>> current dump. Guess what? We don't have it on labs. The most current dump
>>>> for enwiki is from last year.... /public/dumps/public/enwiki/20141208/
>>>>
>>>> Something needs to happen. Key, basic functionality of the toolserver
>>>> is still missing. It's not rocket science, yet ops has consistently failed
>>>> to provide needed functionality in this area, and filing tickets gets me
>>>> nowhere. So the real question here is: why is this still an issue, and who
>>>> do I need to call in order to get things resolved?
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l