Thank you so much!!! We really appreciate it!
-Edward
On Fri, Jan 23, 2015 at 9:31 AM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
Wikimetrics has been having serious connectivity problems for a few days.
It turned out to be solvable by using some new hostnames
(labsdb1002.eqiad.wmnet). I fixed it just now, please retry your reports
and let me know if anything is still wrong.
On Fri, Jan 23, 2015 at 10:46 AM, Dan Andreescu
<dandreescu(a)wikimedia.org> wrote:
Hi everyone. I will work on this as soon as I get into the office, in
about an hour from now. Yuvi suggested one thing that I wasn't aware of
that might make this a simple fix.
On Friday, January 23, 2015, Dan Higgins <dhiggins(a)wikimedia.org> wrote:
>
> Hi Kevin,
>
> Sorry to be a pest but do you have any update on sorting out the
> Wikimetrics issues? It seems to have gotten worse since we last spoke to
> you, with only around 1 in 10 reports going through.
>
> Thanks,
>
> Dan
>
> On Tue, Jan 20, 2015 at 7:17 PM, Kevin Leduc <kevin(a)wikimedia.org>
> wrote:
>>
>> All the developers are in transit to SF today. Dan said he'd be in
>> the office this afternoon. First dev I see I'll notify them of problems in
>> wikimetrics.
>>
>> On Tue, Jan 20, 2015 at 11:10 AM, Amanda Bittaker
>> <abittaker(a)wikimedia.org> wrote:
>>>
>>> Hello again gentlemen,
>>>
>>> I think Dan might have already pinged you, but just in case, I wanted
>>> to let you know that we are getting these failures again. It's kind
>>> of crunch time for getting this data, so we're just banging our heads
>>> against the wall and retrying the reports until they work (1 out of 4
>>> times for me.) Is there any way you all could work your magic again?
>>>
>>> Many thanks once again,
>>> Amanda
>>>
>>>
>>>
>>> On Wed, Dec 10, 2014 at 4:30 PM, Kevin Leduc <kevin(a)wikimedia.org>
>>> wrote:
>>> > It's good to hear it's working again. Don't hesitate to reach out to us
>>> > here or at wikimetrics(a)lists.wikimedia.org if you notice this kind of
>>> > trouble again.
>>> >
>>> > On Wed, Dec 10, 2014 at 3:37 PM, Amanda Bittaker
>>> > <abittaker(a)wikimedia.org>
>>> > wrote:
>>> >>
>>> >> It's working perfectly now--a thousand thank yous, Dan and Marcel.
>>> >>
>>> >> On Wed, Dec 10, 2014 at 3:24 PM, Edward Galvez
>>> >> <egalvez(a)wikimedia.org>
>>> >> wrote:
>>> >>>
>>> >>> Thanks so much Dan and Marcel!
>>> >>>
>>> >>> -E
>>> >>>
>>> >>>
>>> >>> On Wed, Dec 10, 2014 at 3:08 PM, Dan Andreescu
>>> >>> <dandreescu(a)wikimedia.org>
>>> >>> wrote:
>>> >>>>
>>> >>>> forgot Marcel - my fault. Jaime & folks, in general Marcel rules and
>>> >>>> he's probably going to help you out faster / better than I can.
>>> >>>>
>>> >>>> On Wed, Dec 10, 2014 at 5:57 PM, Dan Andreescu
>>> >>>> <dandreescu(a)wikimedia.org> wrote:
>>> >>>>>
>>> >>>>> Ok, Amanda and anyone else who had problems. Please try again. I
>>> >>>>> think I've cleared up some gunk and that might have helped things.
>>> >>>>> We'll be looking at performance more closely soon.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Steps taken, logging mostly for post-mortem purposes:
>>> >>>>>
>>> >>>>> * delete from report where recurrent_parent_id is null and
>>> >>>>> recurrent = 0 and created < date('2014-12-01');
>>> >>>>> ** This deleted records that are not visible in the system anymore.
>>> >>>>> They are recoverable from the wikimetrics database backups but we
>>> >>>>> don't need them in the database. These probably slowed some things
>>> >>>>> down, in total the statement deleted 1623628 rows.
>>> >>>>>
>>> >>>>> * alter table report add column old_recurrent tinyint(1);
>>> >>>>> update report set recurrent = 0, old_recurrent = 1 where
>>> >>>>> user_id = 461 and recurrent = 1;
>>> >>>>> ** This disables WikimetricsBot recurrent reports, but preserves the
>>> >>>>> data so we can deal with them later. When labs is done
>>> >>>>> re-synchronizing, we will be re-running these reports. They feed
>>> >>>>> data to Vital Signs, in case someone's curious about what they are.
>>> >>>>>
>>> >>>>> * Stopped and rebooted the system. The backup system seems to be
>>> >>>>> hanging or taking a really long time. I'd like to take a look at
>>> >>>>> this in more depth, but my guess is the amount it's transferring
>>> >>>>> has gone beyond what we expected.
>>> >>>>>
>>> >>>>> On Wed, Dec 10, 2014 at 5:23 PM, Dan Andreescu
>>> >>>>> <dandreescu(a)wikimedia.org> wrote:
>>> >>>>>>
>>> >>>>>> We're sorry - the problems we were facing last week have probably
>>> >>>>>> festered. I'm going to turn off some things and reset the system.
>>> >>>>>> I'll report back.
>>> >>>>>>
>>> >>>>>> On Wed, Dec 10, 2014 at 4:47 PM, Amanda Bittaker
>>> >>>>>> <abittaker(a)wikimedia.org> wrote:
>>> >>>>>>>
>>> >>>>>>> Oh yes, and Jaime did have me restart my browser and clear the
>>> >>>>>>> cache, but it did not help.
>>> >>>>>>>
>>> >>>>>>> Thanks again,
>>> >>>>>>> Amanda
>>> >>>>>>>
>>> >>>>>>> On Wed, Dec 10, 2014 at 1:45 PM, Amanda Bittaker
>>> >>>>>>> <abittaker(a)wikimedia.org> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hello Kevin,
>>> >>>>>>>>
>>> >>>>>>>> Jaime asked me to email you about some trouble I've been having
>>> >>>>>>>> with Wikimetrics. The whole team has been experiencing a pretty
>>> >>>>>>>> high rate of failures in both report creation and cohort uploads.
>>> >>>>>>>> Almost nothing has gotten through for me today: of the last 13
>>> >>>>>>>> reports I've run, 3 were successful. Of the failures, I would say
>>> >>>>>>>> maybe only two or three "pended" at all before becoming failures.
>>> >>>>>>>> I've been experiencing the same problem with cohort uploads.
>>> >>>>>>>>
>>> >>>>>>>> The reports have been: Newly Registered, Edits, and Rolling Active
>>> >>>>>>>> Editor using expanded cohorts. Please find attached an example of
>>> >>>>>>>> one of the reports. I tried uploading cohorts using text files of
>>> >>>>>>>> user names and pasting user names from Notepad into the "Paste
>>> >>>>>>>> Usernames" field. I do expand the cohorts every time.
>>> >>>>>>>>
>>> >>>>>>>> Do you know why the failure rate is so high, especially this
>>> >>>>>>>> morning, and is there a way to eliminate or mitigate this problem
>>> >>>>>>>> in the future?
>>> >>>>>>>>
>>> >>>>>>>> Many thanks for the assistance, and please do let me know if you
>>> >>>>>>>> need any more information from me on this.
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>> Amanda
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Edward Galvez
>>> >>> Program Evaluation Associate
>>> >>> Wikimedia Foundation
>>> >>
>>> >>
>>> >
>>
>>
>