After talking with Dario and Leila, we decided to sample the
page-impression event at 1:1000. We would, however, like to keep the
widget-impression event unsampled if possible. That event happens
approximately 50% as often as page-impression, so we're probably talking
about somewhere around 60 events per second in that case. Would that be
acceptable, or should we sample the widget-impression event as well?
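
As a sanity check, here is a rough sketch of the arithmetic using the
estimates quoted in this thread (~170 page impressions per sec, ~60 widget
impressions per sec, 270 events per sec of current load, and an assumed
capacity of 350 events per sec). The function names and the random sampler
are illustrative assumptions only; this is not how EventLogging or the
WikiGrok client actually implements sampling.

```python
import random

# Rough estimates quoted in this thread (events per second).
PAGE_IMPRESSION_RATE = 170.0    # unsampled page-impression events
WIDGET_IMPRESSION_RATE = 60.0   # unsampled widget-impression events
CURRENT_EL_LOAD = 270.0         # existing EventLogging throughput
ASSUMED_EL_CAPACITY = 350.0     # assumed limit, not a confirmed hard cap


def sampled_rate(raw_rate, one_in):
    """Expected events per second when logging 1 out of every `one_in` events."""
    return raw_rate / one_in


def should_log(one_in, rng=random):
    """Hypothetical sampler: True for roughly 1 in `one_in` calls."""
    return rng.random() < 1.0 / one_in


def projected_load(page_one_in, widget_one_in):
    """Current load plus both WikiGrok events at the given sampling ratios."""
    return (CURRENT_EL_LOAD
            + sampled_rate(PAGE_IMPRESSION_RATE, page_one_in)
            + sampled_rate(WIDGET_IMPRESSION_RATE, widget_one_in))


if __name__ == "__main__":
    scenarios = [
        ("both unsampled", 1, 1),
        ("page 1:1000, widget unsampled", 1000, 1),
        ("page 1:1000, widget 1:10", 1000, 10),
    ]
    for label, page_n, widget_n in scenarios:
        load = projected_load(page_n, widget_n)
        verdict = "under" if load < ASSUMED_EL_CAPACITY else "over"
        print(f"{label}: {load:.2f} events/sec ({verdict} assumed capacity)")
```

With these inputs, sampling page-impression at 1:1000 and leaving
widget-impression unsampled stays under the assumed 350 events per sec, but
without much headroom for traffic spikes.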
Kaldari

On Wed, Jan 7, 2015 at 5:33 PM, Leila Zia <leila(a)wikimedia.org> wrote:

Thanks, Nuria!

On Wed, Jan 7, 2015 at 5:30 PM, Ryan Kaldari <rkaldari(a)wikimedia.org>
wrote:
> Thanks everyone for the research on this! I'll go ahead and create a
> card for implementing sampling on the high-throughput WikiGrok events.
>
> Kaldari
>
> On Wed, Jan 7, 2015 at 5:20 PM, Nuria Ruiz <nuria(a)wikimedia.org>
> wrote:
>
>> Sorry, I sent it too soon, trying again:
>>
>> >We're talking about a total of ~170 events per sec for these pages.
>> This is too high to log at a 1:1 rate; we would need to sample at 1:10.
>> At this time most events in EventLogging log at a much lower rate. The
>> events logging more than 1 per sec are listed below; as you can see,
>> mobile and Media Viewer account for the majority of the throughput.
>>
>> My preference would be to stay below 400 events per sec until we have
>> done some perf testing to make sure we can handle it (we might be able
>> to, as we have made many improvements since we set these thresholds).
>>
>> MobileWebClickTracking 41.35% (114.15/sec)
>> MediaViewer 21.66% (59.78/sec)
>> MobileWikiAppToCInteraction 12.44% (34.35/sec)
>> PageContentSaveComplete 3.39% (9.35/sec)
>> EchoInteraction 2.69% (7.42/sec)
>> NavigationTiming 2.51% (6.93/sec)
>> MultimediaViewerNetworkPerformance 1.84% (5.07/sec)
>> SaveTiming 1.58% (4.37/sec)
>> Edit 1.39% (3.83/sec)
>> PersonalBar 1.24% (3.43/sec)
>> TimingData 0.83% (2.28/sec)
>> MobileWebUIClickTracking 0.73% (2.02/sec)
>> Popups 0.68% (1.87/sec)
>> MobileWikiAppOnboarding 0.62% (1.70/sec)
>> MultimediaViewerDimensions 0.61% (1.68/sec)
>> UniversalLanguageSelector 0.50% (1.37/sec)
>> PageCreation 0.50% (1.37/sec)
>> MultimediaViewerDuration 0.47% (1.30/sec)
>> MobileWebEditing 0.45% (1.25/sec)
>> MobileWikiAppSearch 0.41% (1.13/sec)
>> CentralAuth 0.40% (1.12/sec)
>>
>> On Wed, Jan 7, 2015 at 5:12 PM, Nuria Ruiz <nuria(a)wikimedia.org>
>> wrote:
>>
>>> >We're talking about a total of ~170 events per sec for these pages.
>>> This is too high to log at a 1:1 rate; we would need to sample at 1:10.
>>>
>>> On Wed, Jan 7, 2015 at 4:10 PM, Leila Zia <leila(a)wikimedia.org>
>>> wrote:
>>>
>>>> Thanks everyone for chiming in. Your comments were very helpful. :-)
>>>>
>>>> Nuria, I checked the per-second pageview count for the pages
>>>> WikiGrok will be live on, using 3 hours on 2015-01-07 as a sample.
>>>> We're talking about a total of ~170 events per sec for these pages. Of
>>>> course, major events can affect this number. Added to the current 270
>>>> events per sec you mentioned, this will send us over the 350 events
>>>> per sec limit (if it's a hard limit). What do you think?
>>>>
>>>> Leila
>>>>
>>>>
>>>>
>>>> On Wed, Jan 7, 2015 at 10:13 AM, Nuria Ruiz <nuria(a)wikimedia.org>
>>>> wrote:
>>>>
>>>>> >Given that information, do you have any idea if we are in danger
>>>>> of overloading EventLogging?
>>>>> Logging broad events (such as a page load) 1 to 1 might cause
>>>>> problems, as our traffic is high enough that even events sampled at
>>>>> 1/1000 still happen in very large amounts.
>>>>>
>>>>> Some numbers (oversimplifying and rounding):
>>>>>
>>>>> We have about 200 million visits per day for the enwiki mobile
>>>>> site. This means about 2300 pageviews per sec; if we send 1 load
>>>>> event per pageview, EL will (sadly) die, most likely.
>>>>>
>>>>> If we assume EL handles up to 350 events per second (and we are now
>>>>> at 270 events per sec), I would think that sending 10 events per sec
>>>>> in your case would be pretty safe. That would mean sampling about
>>>>> 1/200 for one load event per pageview. This seems like a good upper
>>>>> bound.
>>>>>
>>>>> Now, since there are no constraints on how long you keep your
>>>>> experiment running, you can try a lower sampling ratio, say 1/1000,
>>>>> and keep the experiment running for longer.
>>>>>
>>>>> On Tue, Jan 6, 2015 at 5:50 PM, Ryan Kaldari <
>>>>> rkaldari(a)wikimedia.org> wrote:
>>>>>
>>>>>> The highest volume events we are going to log will be:
>>>>>> 1. For each of the 166,000 articles, one event when the page loads
>>>>>> 2. For each of the 166,000 articles, one event when the WikiGrok
>>>>>> widget enters the viewport (about half as often as #1)
>>>>>>
>>>>>> These will be active for all mobile users, logged in and logged
>>>>>> out, including many high pageview articles.
>>>>>>
>>>>>> Given that information, do you have any idea if we are in danger
>>>>>> of overloading EventLogging? If so, do you have recommendations on
>>>>>> sampling? So far, everyone has said not to worry about it, but it
>>>>>> would be good to get a sanity check for this test specifically.
>>>>>>
>>>>>> Kaldari
>>>>>>
>>>>>> On Tue, Jan 6, 2015 at 4:57 PM, Nuria Ruiz <nuria(a)wikimedia.org>
>>>>>> wrote:
>>>>>>
>>>>>>> (cc-ing mobile-tech)
>>>>>>>
>>>>>>> Since we do not know the details of how WikiGrok is used or its
>>>>>>> request throughput, we cannot estimate sampling ourselves. I imagine
>>>>>>> WikiGrok is being deployed to a number of users; from that usage the
>>>>>>> mobile team could estimate the total expected throughput, and with
>>>>>>> that throughput we can recommend sampling ratios.
>>>>>>>
>>>>>>> Thanks for asking about this before deploying!
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 6, 2015 at 4:55 PM, Ryan Kaldari <
>>>>>>> rkaldari(a)wikimedia.org> wrote:
>>>>>>>
>>>>>>>> I can elaborate on this after I finish the SWAT
>>>>>>>> deployment.... Gimme 30 minutes or so.
>>>>>>>>
>>>>>>>> On Tue, Jan 6, 2015 at 4:51 PM, Leila Zia <leila(a)wikimedia.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The mobile team is planning to switch WikiGrok on for
>>>>>>>>> non-logged-in users next week (2015-01-12). The widget will be on
>>>>>>>>> 166,029 article pages in enwiki. There are two EventLogging
>>>>>>>>> schemas that may collect data heavily, and we want to make sure EL
>>>>>>>>> can handle the influx of data.
>>>>>>>>>
>>>>>>>>> The two schemas collecting data are:
>>>>>>>>> https://meta.wikimedia.org/wiki/Schema:MobileWebWikiGrok
>>>>>>>>> https://meta.wikimedia.org/wiki/Schema:MobileWebWikiGrokError
>>>>>>>>> and the list of pages affected is in:
>>>>>>>>> wgq_page in enwiki.wikigrok_questions.
>>>>>>>>>
>>>>>>>>> It would be great if someone from the dev side could let us know
>>>>>>>>> whether we will need sampling.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Leila
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Analytics mailing list
>>>>>>>> Analytics(a)lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>>
>>>>>>>>