RFC: Building a frontend performance analysis platform

List overview All Threads
Download

newer

older

Re: [Analytics] [Engineering]...

Roadmap Progress Updated

Asher Feldman

29 Nov 2012 29 Nov '12

11:40 p.m.

We've recently begun trialing a few frontend performance monitoring services - Keynote, Gomez, trying to get the most out of Watchmouse. They have their individual pros and cons and when they report sporadic issues, it can be difficult to correlate to actual user experiences (how many users were effected, where, and to what extent?) The dearth of data around end-user page load times (and things like domComplete) is a major blind spot.

Now that /event messages are flowing from bits to both kraken and vanadium, I think an initial in house system to analyze page load times as measured by actual users could be rapidly prototyped, and trump the above trials. This may already be an eventual deliverable for kraken, but given the drive behind the current trials, why wait?

The client side would be simple js - for n% of page views from a supported browser (ie >= 9, chrome >= 6, ff >=6, android >= 4.0) fire off an event request containing everything relevant from the window.performance.timing object (https://developer.mozilla.org/en-US/docs/Navigation_timing).

On the backend, perhaps some frequently periodic processing around geoip lookups and ISP (or other network path) determination before going into a data store from which we pull structured data for pretty numbers and pictures. The end result should be able to help identify everything from js/dom performance issues after a release, to who we should peer with and where we should provision our next edge cache center.

My main questions right now:

- Would vanadium or kraken be better suited for building this sooner than later (within a few weeks)

- Would anyone like to help? (David, your guidance around coding the frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

Asher

Attachments:

attachment.htm (text/html — 1.9 KB)

Show replies by date

Diederik van Liere

30 Nov 30 Nov

12:05 a.m.

New subject: [Ops] RFC: Building a frontend performance analysis platform

...

My main questions right now:

Would vanadium or kraken be better suited for building this sooner than

later (within a few weeks)

I definitely think that this is something for Kraken but would like David and Ottomata to chime in as well.

...

Would anyone like to help? (David, your guidance around coding the

frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

I would say, let's use Etsy's statsd & graphite visualization solution,

see http://github.com/etsy/statsd statsd is network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services like Graphite and posssibly also Ganglia. Very little coding involved :)

just my 2 cents.

...

Ops@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/ops

Asher Feldman

12:20 a.m.

New subject: [Ops] RFC: Building a frontend performance analysis platform

On Thu, Nov 29, 2012 at 4:05 PM, Diederik van Liere <dvanliere@wikimedia.org

...

wrote:

...

My main questions right now:

...

Would vanadium or kraken be better suited for building this sooner than

later (within a few weeks)

I definitely think that this is something for Kraken but would like David and Ottomata to chime in as well.

...

...

Would anyone like to help? (David, your guidance around coding the

frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

I would say, let's use Etsy's statsd & graphite visualization solution,

see http://github.com/etsy/statsd statsd is network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services like Graphite and posssibly also Ganglia. Very little coding involved :)

While some extracted data will certainly go into graphite.wikimedia.org for aggregation with the edge cache performance timing data already there, this doesn't at all map to what I have in mind.

This needs to go into a more traditional non-time series data store with advanced filtering / query / sort capabilities across all axis. That better fits with client side visualization, for both graph and textual targets.

Andrew Otto

2:56 p.m.

New subject: [Ops] RFC: Building a frontend performance analysis platform

Now that the stream is blasting in, I plan to start regularly (hourly?) importing it into Hadoop via Kafka. Once it is there, it can be poked and prodded in any way you like :)

On Nov 29, 2012, at 7:05 PM, Diederik van Liere dvanliere@wikimedia.org wrote:

...

My main questions right now:

Would vanadium or kraken be better suited for building this sooner than later (within a few weeks)

I definitely think that this is something for Kraken but would like David and Ottomata to chime in as well.

Would anyone like to help? (David, your guidance around coding the frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

I would say, let's use Etsy's statsd & graphite visualization solution, see http://github.com/etsy/statsd statsd is network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services like Graphite and posssibly also Ganglia. Very little coding involved :)

just my 2 cents.

D

Ops@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/ops

Ops mailing list Ops@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/ops

Ori Livneh

12:20 a.m.

Hey Asher,

We have something like this in place for mobile at the moment -- it's (somewhat uselessly) measuring time to DOMReady and DOMContentLoaded, but Jon (CC'd) and I were going to migrate it to use the Navigation Timing API sometime this week. Happy to help in whatever way. The Kraken / vanadium question isn't crucially important, since the data is going to both by default anyhow.

On Thursday, November 29, 2012 at 3:40 PM, Asher Feldman wrote:

...

We've recently begun trialing a few frontend performance monitoring services - Keynote, Gomez, trying to get the most out of Watchmouse. They have their individual pros and cons and when they report sporadic issues, it can be difficult to correlate to actual user experiences (how many users were effected, where, and to what extent?) The dearth of data around end-user page load times (and things like domComplete) is a major blind spot.

Now that /event messages are flowing from bits to both kraken and vanadium, I think an initial in house system to analyze page load times as measured by actual users could be rapidly prototyped, and trump the above trials. This may already be an eventual deliverable for kraken, but given the drive behind the current trials, why wait?

The client side would be simple js - for n% of page views from a supported browser (ie >= 9, chrome >= 6, ff >=6, android >= 4.0) fire off an event request containing everything relevant from the window.performance.timing object (https://developer.mozilla.org/en-US/docs/Navigation_timing).

On the backend, perhaps some frequently periodic processing around geoip lookups and ISP (or other network path) determination before going into a data store from which we pull structured data for pretty numbers and pictures. The end result should be able to help identify everything from js/dom performance issues after a release, to who we should peer with and where we should provision our next edge cache center.

My main questions right now:

Would vanadium or kraken be better suited for building this sooner than later (within a few weeks)

Would anyone like to help? (David, your guidance around coding the frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

Asher

Analytics mailing list Analytics@lists.wikimedia.org (mailto:Analytics@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/analytics

Asher Feldman

12:31 a.m.

New subject: RFC: Building a frontend performance analysis platform

That's great! Patrick also coded up something around the navigation timing api and took some approaches I really like (base64 encoding the entire time window.performance.timing object and including its sha1 in the /event request, to help ward off garbage data being dumped in) and is on the way to having a standalone mediawiki extension. A quick sync-up should ensure that the results meet everyone's needs.

-A

On Thu, Nov 29, 2012 at 4:20 PM, Ori Livneh ori.livneh@gmail.com wrote:

...

Hey Asher,

We have something like this in place for mobile at the moment -- it's (somewhat uselessly) measuring time to DOMReady and DOMContentLoaded, but Jon (CC'd) and I were going to migrate it to use the Navigation Timing API sometime this week. Happy to help in whatever way. The Kraken / vanadium question isn't crucially important, since the data is going to both by default anyhow.

O

On Thursday, November 29, 2012 at 3:40 PM, Asher Feldman wrote:

...
We've recently begun trialing a few frontend performance monitoring

services - Keynote, Gomez, trying to get the most out of Watchmouse. They have their individual pros and cons and when they report sporadic issues, it can be difficult to correlate to actual user experiences (how many users were effected, where, and to what extent?) The dearth of data around end-user page load times (and things like domComplete) is a major blind spot.

...
Now that /event messages are flowing from bits to both kraken and

vanadium, I think an initial in house system to analyze page load times as measured by actual users could be rapidly prototyped, and trump the above trials. This may already be an eventual deliverable for kraken, but given the drive behind the current trials, why wait?

...
The client side would be simple js - for n% of page views from a

supported browser (ie >= 9, chrome >= 6, ff >=6, android >= 4.0) fire off an event request containing everything relevant from the window.performance.timing object ( https://developer.mozilla.org/en-US/docs/Navigation_timing).

...
On the backend, perhaps some frequently periodic processing around geoip

lookups and ISP (or other network path) determination before going into a data store from which we pull structured data for pretty numbers and pictures. The end result should be able to help identify everything from js/dom performance issues after a release, to who we should peer with and where we should provision our next edge cache center.

...
My main questions right now:

Would vanadium or kraken be better suited for building this sooner

than later (within a few weeks)

...

Would anyone like to help? (David, your guidance around coding the

frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

...
Asher

Analytics mailing list Analytics@lists.wikimedia.org (mailto:Analytics@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/analytics

Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Patrick Reilly

12:54 a.m.

New subject: RFC: Building a frontend performance analysis platform

So, that this string that's 623 characters long:

{"loadEventEnd":1354236146041,"loadEventStart":1354236146041,"domComplete":1354236146041,"domContentLoadedEventEnd":1354236146041,"domContentLoadedEventStart":1354236146036,"domInteractive":1354236146036,"domLoading":1354236145976,"responseEnd":1354236145973,"responseStart":1354236145972,"requestStart":1354236145971,"secureConnectionStart":0,"connectEnd":1354236145971,"connectStart":1354236145971,"domainLookupEnd":1354236145969,"domainLookupStart":1354236145969,"fetchStart":1354236145969,"redirectEnd":0,"redirectStart":0,"unloadEventEnd":1354236145973,"unloadEventStart":1354236145973,"navigationStart":1354236145969}

becomes this representation that's 874 characters long:

a7ee61ab6ba6de242e80e430304ee923157d8524:eyJsb2FkRXZlbnRFbmQiOjEzNTQyMzYxNDYwNDEsImxvYWRFdmVudFN0YXJ0IjoxMzU0MjM2MTQ2MDQxLCJkb21Db21wbGV0ZSI6MTM1NDIzNjE0NjA0MSwiZG9tQ29udGVudExvYWRlZEV2ZW50RW5kIjoxMzU0MjM2MTQ2MDQxLCJkb21Db250ZW50TG9hZGVkRXZlbnRTdGFydCI6MTM1NDIzNjE0NjAzNiwiZG9tSW50ZXJhY3RpdmUiOjEzNTQyMzYxNDYwMzYsImRvbUxvYWRpbmciOjEzNTQyMzYxNDU5NzYsInJlc3BvbnNlRW5kIjoxMzU0MjM2MTQ1OTczLCJyZXNwb25zZVN0YXJ0IjoxMzU0MjM2MTQ1OTcyLCJyZXF1ZXN0U3RhcnQiOjEzNTQyMzYxNDU5NzEsInNlY3VyZUNvbm5lY3Rpb25TdGFydCI6MCwiY29ubmVjdEVuZCI6MTM1NDIzNjE0NTk3MSwiY29ubmVjdFN0YXJ0IjoxMzU0MjM2MTQ1OTcxLCJkb21haW5Mb29rdXBFbmQiOjEzNTQyMzYxNDU5NjksImRvbWFpbkxvb2t1cFN0YXJ0IjoxMzU0MjM2MTQ1OTY5LCJmZXRjaFN0YXJ0IjoxMzU0MjM2MTQ1OTY5LCJyZWRpcmVjdEVuZCI6MCwicmVkaXJlY3RTdGFydCI6MCwidW5sb2FkRXZlbnRFbmQiOjEzNTQyMzYxNDU5NzMsInVubG9hZEV2ZW50U3RhcnQiOjEzNTQyMzYxNDU5NzMsIm5hdmlnYXRpb25TdGFydCI6MTM1NDIzNjE0NTk2OX0=

Notice that it's got the SHA-1 hash of the original string prepended with a colon colon as a separator.

— Patrick

On Thu, Nov 29, 2012 at 4:31 PM, Asher Feldman afeldman@wikimedia.org wrote:

...

That's great! Patrick also coded up something around the navigation timing api and took some approaches I really like (base64 encoding the entire time window.performance.timing object and including its sha1 in the /event request, to help ward off garbage data being dumped in) and is on the way to having a standalone mediawiki extension. A quick sync-up should ensure that the results meet everyone's needs.

-A

On Thu, Nov 29, 2012 at 4:20 PM, Ori Livneh ori.livneh@gmail.com wrote:

...
Hey Asher,

We have something like this in place for mobile at the moment -- it's (somewhat uselessly) measuring time to DOMReady and DOMContentLoaded, but Jon (CC'd) and I were going to migrate it to use the Navigation Timing API sometime this week. Happy to help in whatever way. The Kraken / vanadium question isn't crucially important, since the data is going to both by default anyhow.

O

On Thursday, November 29, 2012 at 3:40 PM, Asher Feldman wrote:

...
We've recently begun trialing a few frontend performance monitoring services - Keynote, Gomez, trying to get the most out of Watchmouse. They have their individual pros and cons and when they report sporadic issues, it can be difficult to correlate to actual user experiences (how many users were effected, where, and to what extent?) The dearth of data around end-user page load times (and things like domComplete) is a major blind spot.

Now that /event messages are flowing from bits to both kraken and vanadium, I think an initial in house system to analyze page load times as measured by actual users could be rapidly prototyped, and trump the above trials. This may already be an eventual deliverable for kraken, but given the drive behind the current trials, why wait?

The client side would be simple js - for n% of page views from a supported browser (ie >= 9, chrome >= 6, ff >=6, android >= 4.0) fire off an event request containing everything relevant from the window.performance.timing object (https://developer.mozilla.org/en-US/docs/Navigation_timing).

On the backend, perhaps some frequently periodic processing around geoip lookups and ISP (or other network path) determination before going into a data store from which we pull structured data for pretty numbers and pictures. The end result should be able to help identify everything from js/dom performance issues after a release, to who we should peer with and where we should provision our next edge cache center.

My main questions right now:

Would vanadium or kraken be better suited for building this sooner

than later (within a few weeks)

Would anyone like to help? (David, your guidance around coding the

frontend visualization would be highly valued even if you don't have a day or two to personally throw at it)

Asher

Analytics mailing list Analytics@lists.wikimedia.org (mailto:Analytics@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/analytics

Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Ori Livneh

9:39 a.m.

On Thursday, November 29, 2012 at 4:54 PM, Patrick Reilly wrote:

...

<snipped>

...

Notice that it's got the SHA-1 hash of the original string prepended with a colon colon as a separator.

— Patrick

Back in July I tried to read through parts of Google Analytics's obfuscated ga.js, which had recently added page performance analytics. Rather than send the absolute timestamps, they calculate the following deltas and pack them into an array:

timing.loadEventStart - timing.navigationStart, timing.domainLookupEnd - timing.domainLookupStart, timing.connectEnd - timing.connectStart, timing.responseStart - timing.requestStart, timing.responseEnd - timing.responseStart, timing.fetchStart - timing.navigationStart

(A timer polls navigationStart and grabs these data points as soon as navigationStart is nonzero.)

I think we ought to read the specs carefully and experiment a little until we figure out precisely what each of these intervals measures and what sort of things it can usefully indicate. It would be good to come up with descriptive names for each of them.

If you want to give it a shot, you can use this gist to get the data in a way that won't break things on older browsers: https://gist.github.com/4174695

-O

Dan Andreescu

1:55 p.m.

New subject: RFC: Building a frontend performance analysis platform

...

    timing.loadEventStart - timing.navigationStart,
    timing.domainLookupEnd - timing.domainLookupStart,
    timing.connectEnd - timing.connectStart,
    timing.responseStart - timing.requestStart,
    timing.responseEnd - timing.responseStart,
    timing.fetchStart - timing.navigationStart
I think we ought to read the specs carefully and experiment a little until we figure out precisely what each of these intervals measures and what sort of things it can usefully indicate. It would be good to come up with descriptive names for each of them.

If you want to give it a shot, you can use this gist to get the data in a way that won't break things on older browsers: https://gist.github.com/4174695

I'll be doing so right now as I think this is very important and useful.

Dan

Dan Andreescu

2:28 p.m.

New subject: RFC: Building a frontend performance analysis platform

Ok, so these are actually the same metrics that Google Chrome shows you when you hover over one of the lines in the Network tab of their debugger. This is how they've named them (re-ordered chronologically and kept original order in numbers):

2. DNS Lookup: timing.domainLookupEnd - timing.domainLookupStart 3. Connecting: timing.connectEnd - timing.connectStart 6. Sending: timing.fetchStart - timing.navigationStart 4. Waiting: timing.responseStart - timing.requestStart 5. Receiving: timing.responseEnd - timing.responseStart 1. (not named): timing.loadEventStart - timing.navigationStart

4. Waiting is a bit vague. Maybe "Server processing Request" or something like that. 1. doesn't show up in the Network tab but it's basically 4. plus the time to load all the resources. In jquery speak, timing.loadEventStart is the number of milliseconds since 1/1/1970 that $(document).ready handlers would fire.

On Fri, Nov 30, 2012 at 8:55 AM, Dan Andreescu dandreescu@wikimedia.orgwrote:

...

...
    timing.loadEventStart - timing.navigationStart,
    timing.domainLookupEnd - timing.domainLookupStart,
    timing.connectEnd - timing.connectStart,
    timing.responseStart - timing.requestStart,
    timing.responseEnd - timing.responseStart,
    timing.fetchStart - timing.navigationStart
I think we ought to read the specs carefully and experiment a little until we figure out precisely what each of these intervals measures and what sort of things it can usefully indicate. It would be good to come up with descriptive names for each of them.

If you want to give it a shot, you can use this gist to get the data in a way that won't break things on older browsers: https://gist.github.com/4174695
I'll be doing so right now as I think this is very important and useful.

Dan

Dan Andreescu

2:30 p.m.

New subject: RFC: Building a frontend performance analysis platform

...

doesn't show up in the Network tab but it's basically 4. plus the time

to load all the resources. In jquery speak, timing.loadEventStart is the number of milliseconds since 1/1/1970 that $(document).ready handlers would fire.

Sorry, 1. = 2+3+6+4+5+(load all resources). I think :)

Dan Andreescu

2:32 p.m.

New subject: RFC: Building a frontend performance analysis platform

I used a fiddle in case anyone wants to play with that. I'd love to help integrate client side performance testing into an extension/experiment or whatever else you guys deem useful.

http://jsfiddle.net/J4uzS/2/

On Fri, Nov 30, 2012 at 9:30 AM, Dan Andreescu dandreescu@wikimedia.orgwrote:

...

...

doesn't show up in the Network tab but it's basically 4. plus the time

to load all the resources. In jquery speak, timing.loadEventStart is the number of milliseconds since 1/1/1970 that $(document).ready handlers would fire.

Sorry, 1. = 2+3+6+4+5+(load all resources). I think :)

4421

Age (days ago)

4422

Last active (days ago)

analytics@lists.wikimedia.org

11 comments

6 participants

tags (0)

participants (6)

Andrew Otto
Asher Feldman
Dan Andreescu
Diederik van Liere
Ori Livneh
Patrick Reilly