Yep, labs is currently experiencing a disk failure which will affect our
instance. The thread subject on the labs list:
[Labs-l] Partial outage in progress
On Tue, Feb 17, 2015 at 12:43 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
Amanda, did you base your query on the pseudo-sql listed on the metric
page? In case you haven't seen it:
https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/…
(Sorry I'm linking to github, labs seems to be suffering some serious DNS
problems right now, all my attempts to load wikimetrics are failing)
On Tue, Feb 17, 2015 at 12:40 PM, Amanda Bittaker <abittaker(a)wikimedia.org>
wrote:
> On that note, Jonathan, do you have SQL queries that return the same
> results as the Wikimetrics reports? I tried writing my own for bytes
> added, but it's pretty janky and takes forever to return anything in
> Quarry. Would you share your wisdom with a poor wayward amateur?
>
>
> On Tue, Feb 17, 2015 at 9:35 AM, Amanda Bittaker <abittaker(a)wikimedia.org>
> wrote:
>
>> Sweet, thanks Jonathan. I added a "Heartbreak" token, because at this
>> point I am really far too emotionally attached to Wikimetrics.
>>
>> On Tue, Feb 17, 2015 at 9:25 AM, Jonathan Morgan <jmorgan(a)wikimedia.org>
>> wrote:
>>
>>> Hi Amanda,
>>>
>>> Here's a ticket you can upvote:
>>>
>>> https://phabricator.wikimedia.org/T87596
>>>
>>> I added a link to this thread to the task. I also added an "Evil Spooky
>>> Haunted Tree" token to the task. Because... well it just felt like the
>>> right thing to do.
>>>
>>> - J
>>>
>>> On Tue, Feb 17, 2015 at 7:58 AM, Dan Andreescu <
>>> dandreescu(a)wikimedia.org> wrote:
>>>
>>>> I can't find a specific ticket, Nuria may know of one. In general,
>>>> this is the project that LabsDB tickets are tagged with:
>>>>
>>>> https://phabricator.wikimedia.org/tag/wikimedia-labs-infrastructure/
>>>>
>>>> On Tue, Feb 17, 2015 at 10:53 AM, Amanda Bittaker <
>>>> abittaker(a)wikimedia.org> wrote:
>>>>
>>>>> Good morning Dan,
>>>>>
>>>>> Thanks very much for the explanation. Is there a Phabricator task we
>>>>> can upvote (award a token?) to make this issue more visible?
>>>>>
>>>>> As always, we really appreciate your help with this.
>>>>>
>>>>> Best,
>>>>> Amanda
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 17, 2015 at 7:20 AM, Dan Andreescu <
>>>>> dandreescu(a)wikimedia.org> wrote:
>>>>>
>>>>>> Sorry for the trouble, Amanda. The problem is solely with the
>>>>>> underlying database, which we don't maintain. It's a sanitized replica of
>>>>>> all the changes being made to all the wikis so it's a fairly complicated
>>>>>> piece of infrastructure that sometimes has problems. The folks who
>>>>>> maintain it are aware of the issues, but we'll continue representing them
>>>>>> until they're solved.
>>>>>>
>>>>>> On Mon, Feb 16, 2015 at 3:49 PM, Amanda Bittaker <
>>>>>> abittaker(a)wikimedia.org> wrote:
>>>>>>
>>>>>>> Oop, thanks for the ping, Nuria. Wikimetrics seems to be working
>>>>>>> better now. I still get failures, especially when running three or four
>>>>>>> reports in one batch, but the reports work if you rerun them (sometimes a
>>>>>>> couple times.)
>>>>>>>
>>>>>>> I'm still getting "PENDING"s that turn into "FAILURE"s sometimes,
>>>>>>> which I just noticed for the first time last Thursday. Also, sometimes the
>>>>>>> "FAILURE"s change position in the Current Report Inbox list, moving up or
>>>>>>> down a spot. Not sure if that helps diagnose what might be happening...
>>>>>>>
>>>>>>> In any case, Wikimetrics is mostly functioning but seems to be
>>>>>>> having recurring troubles that sometimes blow up to freeze the whole tool.
>>>>>>> It would be great to resolve the troubles before the next explosion--is
>>>>>>> there anything I can do to help? Dan H and I still have plenty of reports
>>>>>>> to run, we can keep you updated on the reports ran and failure rate while
>>>>>>> you are fixing, if that would be useful.
>>>>>>>
>>>>>>> Many thanks,
>>>>>>> Amanda
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 16, 2015 at 10:15 AM, Nuria Ruiz <nuria(a)wikimedia.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ping ....
>>>>>>>>
>>>>>>>> On Fri, Feb 13, 2015 at 2:19 PM, Nuria Ruiz <nuria(a)wikimedia.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Amanda,
>>>>>>>>>
>>>>>>>>> Looks like wikimetrics was able to run automatic reports last
>>>>>>>>> night w/o big issues, are your reports still failing?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Nuria
>>>>>>>>>
>>>>>>>>> On Thu, Feb 12, 2015 at 1:42 PM, Amanda Bittaker <
>>>>>>>>> abittaker(a)wikimedia.org> wrote:
>>>>>>>>>
>>>>>>>>>> Alright, thanks so much for your help once again, Nuria.
>>>>>>>>>>
>>>>>>>>>> If there's anything I can do or any information I can
>>>>>>>>>> contribute, please don't hesitate to ping me.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Amanda
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 12, 2015 at 1:36 PM, Nuria Ruiz <nuria(a)wikimedia.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> DB connections in labs look to be failing, unfortunately I
>>>>>>>>>>> think besides asking for help on the labs list there is not much we can do
>>>>>>>>>>> there. I will start a thread on this regard.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Nuria
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 12, 2015 at 1:32 PM, Amanda Bittaker <
>>>>>>>>>>> abittaker(a)wikimedia.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks so much for the quick response, Nuria.
>>>>>>>>>>>>
>>>>>>>>>>>> I ran the exact same reports on the same cohort as one of the
>>>>>>>>>>>> last batches that were failing. Last time 2/4 of the reports failed, when
>>>>>>>>>>>> I reran them individually they succeeded. (But they don't always, I reran
>>>>>>>>>>>> one report 3 times this morning before it worked.) This time, my failure
>>>>>>>>>>>> rate got worse: 4/4 failed, although they said "PENDING" for a few seconds
>>>>>>>>>>>> first, which is new.
>>>>>>>>>>>>
>>>>>>>>>>>> Is that useful information? Please do let me know what else I
>>>>>>>>>>>> can do to help solve this.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>> Amanda
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Feb 12, 2015 at 1:09 PM, Jonathan Morgan <
>>>>>>>>>>>> jmorgan(a)wikimedia.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Nuria!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 12, 2015 at 12:57 PM, Nuria Ruiz <
>>>>>>>>>>>>> nuria(a)wikimedia.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If so a cohort + report to repro will be most useful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Translation:* try to run the exact same reports on the same
>>>>>>>>>>>>> cohort again, to see if the same metrics fail. Let us know what you find. ;)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Same goes for anyone else who experiences these issues: the
>>>>>>>>>>>>> more details we (users) can provide the engineers, the more effective they
>>>>>>>>>>>>> can be at diagnosing and addressing the problems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> - J
>>>>>>>>>>>>>
>>>>>>>>>>>>> *for anyone who is not 100% familiar with that hip, new
>>>>>>>>>>>>> software engineering lingo
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nuria
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Feb 12, 2015 at 12:35 PM, Dan Andreescu <
>>>>>>>>>>>>>> dandreescu(a)wikimedia.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Recently there was a restart of the labsdb cluster. I'm
>>>>>>>>>>>>>>> sorry but I don't have time to check on it, but I bet that's the problem.
>>>>>>>>>>>>>>> I'm off tomorrow unfortunately but I'll try to check tomorrow night :( I
>>>>>>>>>>>>>>> hope someone else beats me to it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Feb 12, 2015 at 3:20 PM, Jonathan Morgan <
>>>>>>>>>>>>>>> jmorgan(a)wikimedia.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (ping Kevin and Dan A.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Amanda, I've had some problems with report failures
>>>>>>>>>>>>>>>> recently when I ran a few test cohorts. On the same cohort, when I ran
>>>>>>>>>>>>>>>> multiple concurrent reports (say, bytes added, edits, and pages created),
>>>>>>>>>>>>>>>> some would fail and others succeed. It wasn't clear what the issue was.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - J
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Feb 12, 2015 at 12:16 PM, Amanda Bittaker <
>>>>>>>>>>>>>>>> abittaker(a)wikimedia.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am getting failures again, both when uploading cohorts
>>>>>>>>>>>>>>>>> and running reports. Strangely, it seems the more reports you try to run
>>>>>>>>>>>>>>>>> in one batch the less likely it is any report will succeed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is anyone else having these problems again? Wonderful
>>>>>>>>>>>>>>>>> Analytics people, could you please work your magic again?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>> Amanda
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Wikimetrics mailing list
>>>>>>>>>>>>>>>>> Wikimetrics(a)lists.wikimedia.org
>>>>>>>>>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikimetrics
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Jonathan T. Morgan
>>>>>>>>>>>>>>>> Community Research Lead
>>>>>>>>>>>>>>>> Wikimedia Foundation
>>>>>>>>>>>>>>>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>>>>>>>>>>>>>>> jmorgan(a)wikimedia.org
>>>>>>>>>>>>>>>>