[gergo] send the event log as a synchronous request from an unload event handler.
[gilles] That works really well; I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.
Please avoid logging synchronously; this will make the UI slower for all users who are part of the logging sample. And not just a tad slower: potentially much slower. For some of our users a network roundtrip is over 500 ms at the 50th percentile, so you could block the UI for a long time.
We had a similar discussion with the growth team regarding synchronous logging. You can see (a lot of) details here: https://bugzilla.wikimedia.org/show_bug.cgi?id=52287
We decided to switch to a localStorage-based solution. In your case I think UserTimings and sessionStorage could get you the data you need. Support for storage is broad: http://caniuse.com/#feat=namevalue-storage. Support for user timings is less so, but you get Chrome and IE, and that is a big percentage of the user base: http://caniuse.com/#feat=user-timing
[gergo] store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable
I do not think so. To address some of the concerns:
Probably runs into all sorts of complications with multiple tabs.
This should not be a concern, as the page visibility API tells you whether the tab is actually visible. You can restrict both user-timing logging and event reporting according to visibility, so they only happen while the user is interacting with the page.
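A minimal sketch of that restriction, under some assumptions: `logEvent` is a hypothetical stand-in for whatever logging call is actually used, and `VisibilitySource` is an illustrative interface shaped like the browser's `document` (which exposes `visibilityState`), so the gate can also be exercised outside a browser.

```typescript
// Shape of the one Page Visibility API property this sketch relies on.
// In a browser, `document` satisfies this interface.
interface VisibilitySource {
  visibilityState: string; // "visible", "hidden", ...
}

type LogFn = (eventName: string, data: Record<string, unknown>) => void;

// Wrap a logging function so events are only reported while the tab is visible.
// `logEvent` is a hypothetical stand-in for the real logging call.
function visibleOnly(source: VisibilitySource, logEvent: LogFn): LogFn {
  return (eventName, data) => {
    if (source.visibilityState === "visible") {
      logEvent(eventName, data);
    }
    // Events fired while the tab is hidden are dropped in this sketch.
  };
}
```

In a page this would be used as `const log = visibleOnly(document, realLog)`. Dropping hidden-tab events is one possible policy; buffering them until the tab becomes visible again would be another.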
what if the user comes back after a month?
If you use session storage the events disappear when the user closes the browser.
store the event in cookies/localStorage, log it on the next page load
Actually you can store the 'transition' in sessionStorage and use regular polling to report it. You do not necessarily need to report the transition from the next page. That being said, you are right that the "last" step might be under-reported, as the user might leave the page. We can analyze the data keeping this in mind, and we can even 'estimate' how much we are underreporting the last step the user did.
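A rough sketch of the store-then-poll idea, not a definitive implementation. The key name, the `Transition` fields and the `send` callback are all assumptions made for illustration; `StorageLike` is the subset of the Web Storage API that `window.sessionStorage` already provides.

```typescript
// Minimal subset of the Web Storage API; `window.sessionStorage` satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical shape of a funnel transition.
interface Transition {
  step: number;
  timestamp: number;
}

const QUEUE_KEY = "uw-pending-transitions"; // hypothetical key name

// Queue a transition instead of sending it synchronously.
function enqueueTransition(store: StorageLike, t: Transition): void {
  const queue: Transition[] = JSON.parse(store.getItem(QUEUE_KEY) || "[]");
  queue.push(t);
  store.setItem(QUEUE_KEY, JSON.stringify(queue));
}

// Hand every queued transition to `send` and clear the queue.
// Returns the number of transitions that were handed over.
function flushTransitions(store: StorageLike, send: (t: Transition) => void): number {
  const queue: Transition[] = JSON.parse(store.getItem(QUEUE_KEY) || "[]");
  store.removeItem(QUEUE_KEY);
  queue.forEach(send);
  return queue.length;
}
```

In a page, the polling part could be as simple as `setInterval(() => flushTransitions(sessionStorage, send), 5000)`, where the 5-second interval is an arbitrary choice. Whatever is still queued when the user leaves is never sent, which is the under-reporting of the last step mentioned above.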
On Wed, May 14, 2014 at 9:02 AM, Gilles Dubuc gilles@wikimedia.org wrote:
- send the event log as a synchronous request from an unload event handler.
That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.
set a random identifier (which only lives until the page is unloaded), and add it to every event
That sounds perfectly fine. Ops can add indexes to the EventLogging tables for us, SQL queries grouping by that column should pose no challenge. That sounds like the simplest and most universal option.
On Wed, May 14, 2014 at 1:54 AM, Gergo Tisza gtisza@wikimedia.org wrote:
Hi all,
the Multimedia team is preparing to collect data to better understand usability problems with UploadWizard. UW has a "checkout" structure (step 1: put files in basket, step 2: choose license, step 3: add description, step 4: you are done), so a funnel analysis to identify which step causes the most users to abort the upload process and why seems like a good approach. I'm trying to understand how well the existing EventLogging infrastructure supports this.
The problem is how to get information about the actions of users who fell out of the funnel. I'll try to illustrate with an example: in one of the steps, the user can choose between "I am uploading my own work" and "I am uploading someone else's work" and the resulting interaction will be quite different. We would like to know whether that choice has a big effect on the likelihood of the user making it to the next step.
Using EventLogging, I can count the number of users who make it until that step. I can count the number of users making it to the next step. I can count the number of users choosing this or that author option. These numbers do not tell us much on their own, though; the interesting information would be how they are correlated.
Another thing I could do is to create a schema which includes both the choice of author option and the step number; when the user chooses "own work", we log an ownwork event, and when they click "next step", we log a step(step=3, work=own) event. We can then calculate the number of users who did choose "own work" but did not make it to the next step as the difference of the two. But this won't work: "own work" is a radio button, so the user can select and deselect it any number of times before proceeding to the next step (or leaving the page).
So what we are trying to log are not really events but application states that describe users who are successful vs. unsuccessful in the given step.
I thought of two ways of dealing with this; any feedback on the plausibility of these or possible alternatives would be highly appreciated.
One would be to have a "step X succeeded" and a "step X failed" event (the schema for which could include all sorts of state, such as which authorship option was selected). This would require the ability to log an event when the user leaves the page. I see two ways to do that:
- send the event log as a synchronous request from an unload event handler. This is not supported on ancient browsers; also, there is probably some mechanism in most browsers to kill an unload event handler if it takes long.
- store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable (what if the user never comes back?) and logs the event for a different page load from where it actually occurred (what if the user comes back after a month?), and probably runs into all sorts of complications with multiple tabs.
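To make the second option concrete, here is a hedged sketch. It keeps a single "abandoned step" record in storage, overwrites it on every state change, clears it when the step succeeds, and reports whatever is left on the next page load. The key name, the record fields and the maximum-age cutoff are assumptions for illustration; the cutoff is just one possible answer to the "comes back after a month" problem.

```typescript
// Minimal subset of the Web Storage API; `window.localStorage` satisfies it.
interface SimpleStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical state to report if the user leaves during a step.
interface AbandonedStep {
  step: number;
  authorship: string; // e.g. "own" or "other"
  savedAt: number;    // milliseconds since the epoch
}

const ABANDONED_KEY = "uw-abandoned-step"; // hypothetical key name

// Record what should be reported if the user leaves right now.
// Call on every state change; each call overwrites the previous record.
function saveProvisionalFailure(store: SimpleStore, step: number, authorship: string, now: number): void {
  const record: AbandonedStep = { step, authorship, savedAt: now };
  store.setItem(ABANDONED_KEY, JSON.stringify(record));
}

// The user made it to the next step, so the stored failure no longer applies.
function clearProvisionalFailure(store: SimpleStore): void {
  store.removeItem(ABANDONED_KEY);
}

// On the next page load, take the leftover record (if any) and clear it.
// Records older than maxAgeMs are dropped instead of being reported.
function takeLeftoverFailure(store: SimpleStore, now: number, maxAgeMs: number): AbandonedStep | null {
  const raw = store.getItem(ABANDONED_KEY);
  if (raw === null) {
    return null;
  }
  store.removeItem(ABANDONED_KEY);
  const record: AbandonedStep = JSON.parse(raw);
  return now - record.savedAt <= maxAgeMs ? record : null;
}
```

This sketch does nothing about the multiple-tabs concern: with localStorage, two tabs would overwrite each other's record under the same key.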
The other way could be to log event chains: set a random identifier (which only lives until the page is unloaded), and add it to every event. Event groups can then be merged into meta-events by SQL magic, although that looks like it will be extremely painful to do. On the other hand, this is much more generic than the previous method, and could be used to answer more complex questions.
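As an illustration of what the merge has to compute (written here in TypeScript rather than SQL), a minimal sketch follows. The event shapes, the `chainId` field name and the summary fields are hypothetical; the point is only that each chain collapses into the last authorship choice seen and the furthest step reached.

```typescript
// Hypothetical event shapes; every event carries the per-page-load identifier.
type ChainedEvent =
  | { chainId: string; kind: "authorship"; choice: string }
  | { chainId: string; kind: "step"; step: number };

// One "meta-event" per chain.
interface ChainSummary {
  chainId: string;
  lastChoice: string | null; // final radio-button selection seen in the chain
  maxStep: number;           // furthest step reached (0 if none was logged)
}

// Random identifier that is generated once per page load and never stored.
function newChainId(): string {
  return Math.random().toString(36).slice(2) + Date.now().toString(36);
}

// Merge events into one summary per chain, in order of first appearance.
function summarizeChains(events: ChainedEvent[]): ChainSummary[] {
  const byId: { [id: string]: ChainSummary } = {};
  const result: ChainSummary[] = [];
  for (const e of events) {
    let summary: ChainSummary | undefined = byId[e.chainId];
    if (summary === undefined) {
      summary = { chainId: e.chainId, lastChoice: null, maxStep: 0 };
      byId[e.chainId] = summary;
      result.push(summary);
    }
    if (e.kind === "authorship") {
      summary.lastChoice = e.choice; // later selections override earlier ones
    } else {
      summary.maxStep = Math.max(summary.maxStep, e.step);
    }
  }
  return result;
}
```

With summaries like these, the question "how many users whose final choice was own work never reached the next step" becomes a simple filter and count.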
What do you think? Which would be the method I am not shooting myself in the foot with? Currently I am leaning towards using unload handlers.
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics