Using EventLogging for funnel analysis

Gergo Tisza

13 May 2014

4:54 p.m.

Hi all,

the Multimedia team is preparing to collect data to better understand usability problems with UploadWizard. UW has a "checkout" structure (step 1: put files in basket, step 2: choose license, step 3: add description, step 4: you are done), so a funnel analysis to identify which step causes the most users to abort the upload process and why seems like a good approach. I'm trying to understand how well the existing EventLogging infrastructure supports this.

The problem is how to get information about the actions of users who fell out of the funnel. I'll try to illustrate with an example: in one of the steps, the user can choose between "I am uploading my own work" and "I am uploading someone else's work" and the resulting interaction will be quite different. We would like to know whether that choice has a big effect on the likeliness of the user making it to the next step.

Using EventLogging, I can count the number of users who make it to that step. I can count the number of users making it to the next step. I can count the number of users choosing this or that author option. These numbers do not tell us much on their own, though; the interesting information would be how they are correlated.
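To make the point about independent counts concrete, here is a small sketch. The two scenarios are invented numbers, not real UploadWizard data, and the function names are hypothetical. Both scenarios produce identical per-choice and per-outcome totals, yet the per-choice success rates differ completely, which is the information separate counters cannot recover.

```python
from collections import Counter

# Hypothetical per-user records: (authorship choice, reached the next step?)
scenario_a = (
    [("own", True)] * 40 + [("own", False)] * 10
    + [("other", True)] * 10 + [("other", False)] * 40
)
scenario_b = (
    [("own", True)] * 25 + [("own", False)] * 25
    + [("other", True)] * 25 + [("other", False)] * 25
)


def marginal_counts(records):
    """Totals that independent counters would give: per choice, and successes overall."""
    by_choice = Counter(choice for choice, _ in records)
    reached_next = sum(1 for _, ok in records if ok)
    return dict(by_choice), reached_next


def success_rate_by_choice(records):
    """Joint view: share of users per choice who reached the next step."""
    totals = Counter(choice for choice, _ in records)
    successes = Counter(choice for choice, ok in records if ok)
    return {choice: successes[choice] / totals[choice] for choice in totals}
```

Calling marginal_counts on either scenario gives the same answer, while success_rate_by_choice separates them, so the choice and the outcome have to be recorded together per user or per page view.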

Another thing I could do is create a schema which includes both the choice of author option and the step number; when the user chooses "own work", we log an ownwork event, and when they click "next step", we log a step(step=3, work=own) event. We can then calculate the number of users who did choose "own work" but did not make it to the next step as the difference of the two. But this won't work: "own work" is a radio button, and the user can select and deselect it any number of times before proceeding to the next step (or leaving the page).

So what we are trying to log are not really events but application states that describe users who are successful vs. unsuccessful in the given step.

I thought of two ways of dealing with this; any feedback on the plausibility of these or possible alternatives would be highly appreciated.

One would be to have a "step X succeeded" and a "step X failed" event (the schema for which could include all sorts of state, such as which authorship option was selected). This would require the ability to log an event when the user leaves the page. I see two ways to do that:

- send the event log as a synchronous request from an unload event handler. This is not supported on ancient browsers; also, there is probably some mechanism in most browsers to kill an unload event handler if it takes too long.
- store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable (what if the user never comes back?) and logs the event for a different page load from where it actually occurred (what if the user comes back after a month?), and probably runs into all sorts of complications with multiple tabs.

The other way could be to log event chains: set a random identifier (which only lives until the page is unloaded), and add it to every event. Event groups can then be merged into meta-events by SQL magic, although that looks like it will be extremely painful to do. On the other hand, this is much more generic than the previous method, and could be used to answer more complex questions.

What do you think? Which would be the method I am not shooting myself in the foot with? Currently I am leaning towards using unload handlers.
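The event-chain idea can be sketched offline, assuming each logged event carries a random per-page-view identifier. The field names (flow_id, ts, type, value) and both function names are hypothetical and are not part of any EventLogging schema; the merging is done in Python here rather than in SQL.

```python
import uuid
from collections import defaultdict


def new_flow_id():
    """Random identifier for one page view; not tied to the user."""
    return uuid.uuid4().hex


def merge_into_meta_events(events):
    """Collapse raw events sharing a flow_id into one summary per flow.

    Each event is a dict with 'flow_id', 'ts' (seconds), 'type' and
    'value'. These field names are illustrative only.
    """
    flows = defaultdict(list)
    for event in events:
        flows[event["flow_id"]].append(event)

    summaries = {}
    for flow_id, chain in flows.items():
        chain.sort(key=lambda e: e["ts"])
        authorship = None
        last_step = 0
        for event in chain:
            if event["type"] == "authorship_change":
                authorship = event["value"]  # later changes overwrite earlier ones
            elif event["type"] == "step":
                last_step = max(last_step, event["value"])
        summaries[flow_id] = {"authorship": authorship, "last_step": last_step}
    return summaries
```

Each summary plays the role of a "meta-event": the state the user ended up in (final authorship choice, furthest step reached), reconstructed after the fact without needing an unload handler.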

Gilles Dubuc

14 May 2014

12:02 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

send the event log as a synchronous request from an unload event handler.

That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.

set a random identifier (which only lives until the page is unloaded), and add it to every event

That sounds perfectly fine. Ops can add indexes to the EventLogging tables for us, and SQL queries grouping by that column should pose no challenge. That sounds like the simplest and most universal option.

Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia

Pau Giner

12:47 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Gergo,

The number of users that drop off at each stage will be really useful. Would it be possible to get the information in such a way that we could also check how long each step takes? That way we could get an idea of how much time on average a user spends on each step and in total, even if they succeeded in the process.

Pau

-- Pau Giner Interaction Designer Wikimedia Foundation

Gergo Tisza

1:37 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Wed, May 14, 2014 at 12:47 AM, Pau Giner pginer@wikimedia.org wrote:

The number of users that drop-off at each stage will be really useful. Would it be possible to get the information in such a way that we could also check how long each step takes? In that way we could get an idea of how much time on average a user spends on each step and in total, even if they succeeded in the process.

Logging the time from a successful step to the next successful step is easy. Logging the time from a successful step to a failed step (i.e. the user leaving) is possible, but what Gilles said applies (we have to discard some browsers).
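As a rough sketch of the "successful step to next successful step" timing, assuming step events are logged as (flow_id, step, ts) tuples with ts in seconds (a hypothetical layout, not an existing schema), the per-step durations fall out of consecutive timestamps:

```python
from collections import defaultdict


def step_durations(step_events):
    """Return {flow_id: {step: seconds until the next logged step}}.

    step_events: iterable of (flow_id, step, ts) tuples, ts in seconds.
    The last step reached in a flow gets no duration, because nothing
    was logged after it (the user finished or left).
    """
    by_flow = defaultdict(list)
    for flow_id, step, ts in step_events:
        by_flow[flow_id].append((ts, step))

    durations = {}
    for flow_id, entries in by_flow.items():
        entries.sort()
        durations[flow_id] = {
            step: next_ts - ts
            for (ts, step), (next_ts, _) in zip(entries, entries[1:])
        }
    return durations
```

The missing duration for the last step is exactly the gap discussed above: measuring time up to a failed step needs some event at the moment the user leaves.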

Gergo Tisza

1:46 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Wed, May 14, 2014 at 12:02 AM, Gilles Dubuc gilles@wikimedia.org wrote:

Ops can add indexes to the EventLogging tables for us, SQL queries grouping by that column should pose no challenge.

As far as I can see, this can't be done with a simple GROUP BY: you would need logic like "from all records with the same sequence id which have an authorship_change field set, select the one with the latest timestamp". In SQL dialects supporting windowed/analytical expressions this is not bad, but in MySQL it would require some sort of self-join, I think.
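A minimal sketch of the self-join variant, run against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. The table name and column layout are assumptions made for illustration, not the real EventLogging tables. The join keeps a row only when no later authorship change exists for the same sequence id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sequence_id TEXT, ts INTEGER, authorship_change TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("s1", 1, "own"),
        ("s1", 2, "other"),
        ("s1", 3, None),  # some other event in the same sequence
        ("s2", 1, "own"),
    ],
)

# Anti-join: keep a row only if no later authorship change exists
# for the same sequence_id.
LATEST_AUTHORSHIP = """
    SELECT e.sequence_id, e.authorship_change
    FROM events AS e
    LEFT JOIN events AS later
      ON later.sequence_id = e.sequence_id
     AND later.authorship_change IS NOT NULL
     AND later.ts > e.ts
    WHERE e.authorship_change IS NOT NULL
      AND later.sequence_id IS NULL
    ORDER BY e.sequence_id
"""

latest = conn.execute(LATEST_AUTHORSHIP).fetchall()
```

Two authorship changes with the same timestamp in one sequence would both survive this query, so a tie-breaker column (an auto-increment id, for example) would be needed in practice.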

Nuria Ruiz

15 May 2014

3:06 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

[gergo] send the event log as a synchronous request from an unload event handler.

[gilles] That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers

Please do not log synchronously: this would make the UI slower for all users in the logging sample. Not just a tad slower but potentially a *lot* slower. A network round trip for some of our users is >500 ms at the 50th percentile. We had a similar discussion with the growth team and we agreed it was best to keep application state via localStorage:

See: https://bugzilla.wikimedia.org/show_bug.cgi?id=52287

There are several performance APIs to log application events. UserTimings is only present in newer browsers, but with that and localStorage a lot of possibilities open up: http://www.html5rocks.com/en/tutorials/webperformance/usertiming/

Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Nuria Ruiz

3:56 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

[gergo] send the event log as a synchronous request from an unload event handler.

[gilles] That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers

Please avoid logging synchronously; this will make the UI slower for all users that are part of the logging sample. And not just a tad slower: potentially it could be much slower. For some of our users a network round trip is over 500 ms at the 50th percentile, so you can potentially block the UI for a long time.

We had a similar discussion with the growth team regarding synchronous logging. You can see (a lot of) details here: https://bugzilla.wikimedia.org/show_bug.cgi?id=52287

We decided to switch to a localStorage-based solution. In your case I think UserTimings and sessionStorage could get you the data you need. Support for storage is broad: http://caniuse.com/#feat=namevalue-storage; support for user timings is less so, but you get Chrome and IE, and that is a big percentage of the user base: http://caniuse.com/#feat=user-timing

[gergo] store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable

I do not think so. To clear up some concerns:

[gergo] probably runs into all sorts of complications with multiple tabs.

This should not be a concern, as the Page Visibility API tells you whether the tab is actually visible. You can restrict user timings logging and event logging reporting according to visibility so they only happen when the user is interacting with the page.

[gergo] what if the user comes back after a month?

If you use session storage the events disappear when the user closes the browser.

[gergo] store the event in cookies/localStorage, log it on the next page load

Actually you can store the 'transition' in sessionStorage and use regular polling to report it. You do not necessarily need to report the transition from the next page. That being said, you are right that the "last" step might be under-reported, as the user might leave the page. Now, we can analyze the data keeping this in mind. We can even 'estimate' how much we are under-reporting the last step the user did.

Gergo Tisza

4:46 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Thu, May 15, 2014 at 3:56 AM, Nuria Ruiz nuria@wikimedia.org wrote:

We decided to switch to a localStorage-based solution. In your case I think UserTimings and sessionStorage could get you the data you need. Support for storage is broad: http://caniuse.com/#feat=namevalue-storage; support for user timings is less so, but you get Chrome and IE, and that is a big percentage of the user base: http://caniuse.com/#feat=user-timing

I don't see how UserTiming is related. That API is about obtaining sub-millisecond precision - that is useful when you are building a 3D rendering engine or similar extremely time-sensitive feature, but generally using regular millisecond numbers is OK.

what if the user comes back after a month? If you use session storage the events disappear when the user closes the browser.

That means we are biasing the stats towards successful users, as they are guaranteed to make another page load in the same session, while unsuccessful ones might just leave for a while. Trying to compensate for that by making guesstimates, as you suggest, is a path I would rather not take.

Lars Aronsson

14 May 2014

1:16 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On 05/14/2014 01:54 AM, Gergo Tisza wrote:

the Multimedia team is preparing to collect data to better understand usability problems with UploadWizard.

Good.

UW has a "checkout" structure

Yes. But is this a given fact, or something that might change? It feels very monolithic, very un-wiki, very un-collaborative. Sometimes I want to unload my camera, so I can go out and capture more images, but I'm not ready to do all the detailed categorization. And perhaps others can help with that work anyway.

The UploadWizard is still better than the old upload form. But it's not entirely "wiki" (quick, collaborative, publish first, edit later). So how much energy and resources are we spending on making it slightly better, rather than designing something very different?

Some of the UploadWizard's convenient operations, like renaming or categorizing or editing the description of a whole group of images would be very nice to have after the upload. It would be similar to running a bot for recategorization, but built into the wiki interface. If you separate these operations from the upload, the upload would shrink to "unloading the camera", and you would have fewer interrupted/abandoned uploads.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Gergo Tisza

1:53 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Wed, May 14, 2014 at 1:16 AM, Lars Aronsson lars@aronsson.se wrote:

Yes. But is this a given fact, or something that might change?

We do intend to change it. You can see our plans (in a somewhat undigested form) at http://etherpad.wikimedia.org/p/design-multimedia-uploader But usage metrics from the current interface can still be helpful for designing a different one.

So how much energy and resources are we spending on making it slightly better, rather than designing something very different?

That's the million dollar question... given that small improvements will have instant effect (but are wasted time in the long run), while a big redesign will take several months (I am being optimistic here...), we will have to do some mix of the two, but exactly what mix that will be is an open question.

Aaron Halfaker

5:58 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hey guys,

Here's how I'd do it.

*Assumption:* Only logged-in users can start the UW funnel

*Schemas:*

UploadWizardStep

Stored when the user loads a new step of the Upload Wizard.

- user_id : int -- The user's identifier
- flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
- step : int -- 1 - 4 of the UW process

UploadWizardRightsSelection

Stored when the user selects a "rights" option.

- user_id : int -- The user's identifier
- flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
- rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place for a single flow)

I'd make a pass over the DB, to identify the last RightsSelection for each flow_initialization (if any) to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness or maybe just exploration of the UI.

Make sense?

-Aaron
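The pass over the data described above could look roughly like the following sketch. It assumes the UploadWizardRightsSelection events have been exported as dicts and that each carries an event timestamp ts added by the logging system; that field and the function name are assumptions, not part of the proposed schema.

```python
from collections import defaultdict


def summarize_rights_selections(selections):
    """Summarize UploadWizardRightsSelection events per flow.

    selections: iterable of dicts with 'user_id', 'flow_initialized',
    'rights_selected' and 'ts' (an event timestamp, assumed to be added
    by the logging system).
    Returns {(user_id, flow_initialized): (final_choice, selection_count)}.
    """
    per_flow = defaultdict(list)
    for event in selections:
        key = (event["user_id"], event["flow_initialized"])
        per_flow[key].append((event["ts"], event["rights_selected"]))

    summary = {}
    for key, entries in per_flow.items():
        entries.sort()  # chronological order within the flow
        final_choice = entries[-1][1]
        summary[key] = (final_choice, len(entries))
    return summary
```

The final choice is what the user settled on, and a high selection count per flow is the possible sign of confusion or exploration mentioned above.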

Fabrice Florin

8:20 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Aaron,

Thanks so much for your good advice!

The approach you propose below makes good sense to me.

For now, I have added it in the notes section of our Mingle ticket for the data collection:

https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/305/edit...

Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph as we do for other descriptive metrics studies?

Thanks again for your helpful insights :)

Fabrice


Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia

_______________________________

Fabrice Florin Product Manager Wikimedia Foundation

http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)

Dan Andreescu

8:56 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

...

Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph as we do for other descriptive metrics studies?

Limn doesn't have many graph types but it might have a decent one for this purpose: http://debugging.wmflabs.org/graphs/ordinal_example

So you could have Step 1, Step 2, Step 3, ... on your X axis, then the number of people who made it through each step on your Y axis.
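
To get the per-step numbers for such a chart, one option is to count distinct flows per step. The sketch below is Python and assumes UploadWizardStep events shaped like the schema proposed earlier in this thread (`flow_initialized`, `step`); the function names and sample data are made up for illustration.

```python
def funnel_counts(step_events, steps=(1, 2, 3, 4)):
    """Count distinct flows that reached each step of the funnel."""
    flows_by_step = {step: set() for step in steps}
    for event in step_events:
        if event["step"] in flows_by_step:
            # A set ignores repeated loads of the same step within one flow.
            flows_by_step[event["step"]].add(event["flow_initialized"])
    return [(step, len(flows_by_step[step])) for step in steps]


def retention(counts):
    """Share of flows carried over from each step to the next one."""
    rates = []
    for (_, prev), (step, cur) in zip(counts, counts[1:]):
        rates.append((step, cur / prev if prev else 0.0))
    return rates


# Hypothetical events: flow "a" finishes, "b" stops at step 2, "c" at step 1.
step_events = [
    {"flow_initialized": "a", "step": 1},
    {"flow_initialized": "a", "step": 2},
    {"flow_initialized": "a", "step": 2},  # repeated event, counted once
    {"flow_initialized": "a", "step": 3},
    {"flow_initialized": "a", "step": 4},
    {"flow_initialized": "b", "step": 1},
    {"flow_initialized": "b", "step": 2},
    {"flow_initialized": "c", "step": 1},
]

counts = funnel_counts(step_events)
rates = retention(counts)
```

`counts` gives the (X, Y) pairs for the ordinal graph; `rates` shows where the largest drop-off happens.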

For a better funnel visualization, we should really support Sankey diagrams: http://www.practicalecommerce.com/files/images/0005/0938/2_goalFlow_Large.pn...

Dario Taraborelli

9:45 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On May 14, 2014, at 8:56 AM, Dan Andreescu dandreescu@wikimedia.org wrote:

...

Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph as we do for other descriptive metrics studies?

Limn doesn't have many graph types but it might have a decent one for this purpose: http://debugging.wmflabs.org/graphs/ordinal_example

So you could have Step 1, Step 2, Step 3, ... on your X axis, then the number of people who made it through each step on your Y axis.

For a better funnel visualization, we should really support Sankey diagrams: http://www.practicalecommerce.com/files/images/0005/0938/2_goalFlow_Large.pn...

+1, this might be overkill for a linear funnel with a small number of nodes (where a directed graph visualization, a line chart or an actual “funnel” [1] could do the job). I agree Sankey diagrams would be nice to have down the line for more complex scenarios.

Dario

[1] https://github.com/smilli/d3-funnel-charts
[2] The d3 implementation: http://bost.ocks.org/mike/sankey/

Fabrice Florin

11:17 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Muchas gracias, Dario :)

Can you clarify what you mean by ‘directed graph visualization’, to make sure we’re on the same page?

It seems like the bar graph recommended by Dan would be a good start, but we might also consider another simple line graph to see how these patterns change over time.

I added your suggestions and links into a Notes section for the funnel dashboard ticket #541, though it will most likely be split into a couple of smaller tickets, like #305.

We’re hoping to have some of this data next week, in time for our Upload Wizard planning meeting on Thursday — wish us luck :)

Onward!

Fabrice


Dario Taraborelli

11:25 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On May 14, 2014, at 11:17 AM, Fabrice Florin fflorin@wikimedia.org wrote:

...

Can you clarify what you mean by ‘directed graph visualization’, to make sure we’re on the same page?

https://twitter.com/WikiResearch/status/449252646063329280

you can imagine using a similar visualization with nodes as individual steps in the funnel and node size scaled to represent the absolute number of users at each step.

Dario

Fabrice Florin

11:58 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Thanks, Dario, much appreciated!

Are there any LIMN graphs that could be adapted to create directed graphs like these — or is this something we would do manually?

Also, we now plan to count the number of clicks on the main buttons for each step, not the number of unique users. We got the impression that unique users may be harder to count with our limited resources.

Does this plan seem reasonable for now? Or would you recommend a different approach?

Fabrice


Dan Andreescu

12:12 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

...

Are there any LIMN graphs that could be adapted to create directed graphs like these — or is this something we would do manually?

Manually for now

Fabrice Florin

10:54 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Thanks, Dan, this is really helpful!

I added your recommendations to this ticket for the visualization of our funnel metrics:

https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/541

And I agree that a Sankey diagram would be wonderful for our purposes — even if we have to create it manually for now. Any recommended tools that would be easy to use and would let us copy and paste our metrics data to quickly create such a diagram?

Much appreciated,

Fabrice


Christian Aistleitner

15 May

1:42 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi,

tracking users through the site is evil. I'd prefer we do not do it. Especially since users currently have no way to choose between getting tracked and not getting tracked.

Some less general remarks are inline below.

On Wed, May 14, 2014 at 07:58:58AM -0500, Aaron Halfaker wrote:

...

UploadWizardStep [...]

user_id : int -- The user's identifier

[...] UploadWizardRightsSelection [...]

user_id : int -- The user's identifier

Are those "user_id"s meant to be arbitrary but (for a given funnel) constant numbers, or are they meant as values of the user_id column of the user table?

If the latter is the case, is that information really needed? We could instead use a hash, which we could seed by the field

...

flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)

(which got suggested for both schemas and is constant).

Thereby, we could still analyze funnels (which is a separate issue, and needs a separate discussion), but we need not store unneeded data.
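
A minimal Python sketch of that idea, assuming SHA-256 as the hash and a simple "seed:id" concatenation; the function name, the hash choice, and the input format are illustrative assumptions, not something EventLogging provides.

```python
import hashlib


def pseudonymous_flow_id(user_id: int, flow_initialized: str) -> str:
    """Derive a per-flow identifier from user_id, seeded by flow_initialized."""
    # The flow timestamp acts as the seed, so the same user gets a different
    # identifier in every flow, while all events of one flow share one value.
    material = f"{flow_initialized}:{user_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


same_flow_a = pseudonymous_flow_id(12345, "20140514055800")
same_flow_b = pseudonymous_flow_id(12345, "20140514055800")
other_flow = pseudonymous_flow_id(12345, "20140515091500")
```

One caveat: if flow_initialized is logged next to the hash, someone with access to the data could recompute the hash for every possible user_id and find the match, so a seed that is not stored with the events (or a keyed hash) would give stronger protection.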

Best regards, Christian

--
quelltextlich e.U. / Christian Aistleitner
Companies' registry: 360296y in Linz
Christian Aistleitner, Gruendbergstrasze 65a, 4040 Linz, Austria
Email: christian@quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/

Christian Aistleitner

2:38 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi,

On Thu, May 15, 2014 at 10:42:22AM +0200, Christian Aistleitner wrote:

...

tracking users through the site is evil. I'd prefer we do not do it. Especially since users currently have no way to choose between getting tracked and not getting tracked.

it seems that the above passage caused confusion internally in the Analytics team.

So just to avoid doubt, let me clarify publicly that by the above “I”, I really only mean me. Not the whole of the Analytics team. :-)

Have fun, Christian

P.S.: This clarification does not, of course, lessen my call for hashing user_ids.

Aaron Halfaker

2:49 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

If we hashed user IDs, we'd not be able to compute statistics about the images that UW users uploaded or their other work. Being able to correlate UW success with other work and the experience level of the editor seems like a clearly important thing. Also, storing a user ID is probably the least invasive way to track identity. It's a standard practice.

-Aaron


Max Semenik

2:54 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Does EL honor Do Not Track?


-- Best regards, Max Semenik ([[User:MaxSem]])

Aaron Halfaker

2:56 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

If I'm understanding correctly, Do Not Track is about tracking cookies that track activities between websites. Is that right?

FWIW, the user_id that I propose storing is not a cookie.

-Aaron


Christian Aistleitner

16 May

3:59 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Aaron,

On Thu, May 15, 2014 at 04:56:38PM -0500, Aaron Halfaker wrote:

...

If I'm understanding correctly, Do Not Track is about tracking cookies that track activities between websites. Is that right?

That's a really hard question.

Yes and no. There is no standard yet. And people generally have different understandings of the header, its definition and its intention, and purpose.

(If you care to read only one of the below items, read the last one titled “Consumer's confused view”)

* W3C's Tracking Protection Working Group's point of view

W3C's current work-in-progress “Tracking Preference Expression (DNT)” document [1] does not limit itself to “cookies” or “tracking cookies”. The document explicitly states that the intention is “general, regardless of protocols” [2].

Also, the boundary is not between websites, but on an organisational level. If an entity is first party on SiteA, and SiteB, different rules^Wrecommendations apply than if the same party were first party on SiteA, and third party on SiteB.

In general, the Do Not Track header is geared more towards targeted advertising than analytics. Some have ridiculed it as “Do Not Target” instead of “Do Not Track”.

And it comes with so many exceptions [3] and vague definitions (or definitions getting twisted by their use) [4].

* Company's PR point of view:

If you have DNT enabled in your browser settings, we will not collect the information that enables this feature, so you won’t see any tailored suggestions. We hope that our support of DNT highlights its importance as a privacy tool for consumers and creates even more interest and wider adoption across the web. https://blog.twitter.com/2012/new-tailored-suggestions-for-you-to-follow-on-...

* Company's FUD point of view:

http://whatisdnt.com/

* Company's ignoring point of view

http://arstechnica.com/information-technology/2014/05/yahoo-is-the-latest-co...

* Consumer's confused view

Consumers think that “Do not track” actually means “Do not track”.

As a consumer, you’d think that the meaning of “Do Not Track” is pretty clear. You’re making a polite request of the web sites and advertisers: “Don’t collect and store any information about me without my explicit permission.” http://www.zdnet.com/why-do-not-track-is-worse-than-a-miserable-failure-7000...

Have fun, Christian

[1] http://www.w3.org/TR/tracking-dnt/

[2] http://www.w3.org/TR/tracking-dnt/#other-protocols

[3] Like

Regardless of the tracking preference expressed, data MAY be collected and used for billing and auditing related to the current network interaction and concurrent transactions. This may include counting ad impressions to unique visitors, verifying positioning and quality of ad impressions and auditing compliance with this and other standards.

http://www.w3.org/2011/tracking-protection/drafts/tracking-compliance.html#f...

[4] Like

Tracking is the collection of data regarding a particular user's activity across multiple distinct contexts and the retention, use, or sharing of data derived from that activity outside the context in which it occurred. A context is a set of resources that are controlled by the same party or jointly controlled by a set of parties.

http://www.w3.org/TR/tracking-dnt/#terminology

Christian Aistleitner

2:18 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi,

On Thu, May 15, 2014 at 02:54:24PM -0700, Max Semenik wrote:

...

Does EL honor Do Not Track?

Due to controversies around the “Do Not Track” header, “honors” here is a difficult term for me. But currently EventLogging logs events that came with “DNT: 1” :-(

Best regards, Christian

P.S.: I am actively seeking community expectations around “Do Not Track” handling and general privacy expectations around WMF Analytics. So please do voice opinions. Be it here, private email, IRC or through some other means.

Luis Villa

1:44 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Fri, May 16, 2014 at 2:18 AM, Christian Aistleitner < christian@quelltextlich.at> wrote:

...

Hi,

On Thu, May 15, 2014 at 02:54:24PM -0700, Max Semenik wrote:

...
Does EL honor Do Not Track?

Due to controversies around the “Do Not Track” header, “honors” here is a difficult term for me. But currently EventLogging logs events that came with “DNT: 1” :-(

Best regards, Christian

P.S.: I am actively seeking community expectations around “Do Not Track” handling and general privacy expectations around WMF Analytics. So please do voice opinions. Be it here, private email, IRC or through some other means.

When we evaluated the last spec draft (Jan/Feb?) "do not track" in the specification quite clearly and explicitly meant "do not allow tracking by *third parties*". So the tracking we do internally is permissible, whether or not DNT: 1 is set.

FAQ about it is here: https://meta.wikimedia.org/wiki/Privacy_policy/FAQ#What_are_Do_Not_Track_.28...

That said, a few weeks ago W3C published a last call draft, and I have not evaluated it yet, so things may have changed. (As late as December the draft defined neither "track" nor "third party", which was... frustrating.)

Luis

-- Luis Villa Deputy General Counsel Wikimedia Foundation 415.839.6885 ext. 6810 *This message may be confidential or legally privileged. If you have received it by accident, please delete it and let us know about the mistake. As an attorney for the Wikimedia Foundation, for legal/ethical reasons I cannot give legal advice to, or serve as a lawyer for, community members, volunteers, or staff members in their personal capacity. For more on what this means, please see our legal disclaimer https://meta.wikimedia.org/wiki/Wikimedia_Legal_Disclaimer.*

Christian Aistleitner

21 May

3:43 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Luis,

On Fri, May 16, 2014 at 01:44:12PM -0700, Luis Villa wrote:

...

When we evaluated the last spec draft (Jan/Feb?) "do not track" in the specification quite clearly and explicitly meant "do not allow tracking by *third parties*". So the tracking we do internally is permissible, whether or not DNT: 1 is set.

According to the W3C draft document, I guess we should be fine.

But the W3C draft as it currently stands misses the people.

And I'd much rather see us matching people's expectations than W3C's (to which not many buy in around DNT).

I gave some citations in a parallel thread [1], but since there was a call in a privacy discussion today for more official statements from higher bodies, let me add a quote from Neelie Kroes, European Commissioner for Digital Agenda [2]:

DNT has a lot of potential because it can apply: First, to all networked devices and applications Second, to all types of tracking and Third, to all purposes of tracking.

That's a much broader DNT vision than W3C's. People buy into this broader interpretation more than into W3C's.

And there are also other more technical and concrete interpretations of the DNT header. For example EFF's pretty new one used in their Privacy Badger:

https://www.eff.org/dnt-policy

...

As late as December the draft defined neither "track" nor "third party", which was... frustrating.

That sentiment about the W3C's DNT drafts is shared by many :-D Although those definitions have since been added, they do not help in meeting people's expectations.

Have fun, Christian

[1] http://lists.wikimedia.org/pipermail/analytics/2014-May/002052.html
[2] http://europa.eu/rapid/press-release_SPEECH-11-461_en.htm

Christian Aistleitner

16 May

2:21 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Aaron,

On Thu, May 15, 2014 at 04:49:21PM -0500, Aaron Halfaker wrote:

...

If we hashed user IDs, we'd not be able to compute statistics about the images that UW users uploaded or their other work.

Right.

I agree that such statistics would be interesting.

But not being able to compute such statistics is a good thing too, as—from my point of view—the OP's question did not call for such statistics.

...

Being able to correlate UW success with other work and the experience level of the editor seems like a clearly important thing.

Yes, it might seem like a clearly important thing for you. But for me, it is beyond the /current/ question from OP.

Of course, the question can be refined and details added upon need. But adding details that are /currently/ not needed and asked for is premature optimization.

Have fun, Christian

Gergo Tisza

15 May

3:19 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Thu, May 15, 2014 at 1:42 AM, Christian Aistleitner < christian@quelltextlich.at> wrote:

...

tracking users through the site is evil. I'd prefer we do not do it.

I do not plan to add user id tracking until there is a specific need. (That specific need would probably be some sort of user cohorts, and we can see at that time whether there is a more privacy-sensitive way of doing that.) For now, just a random token will do fine. Since UploadWizard is a single-page application (we do not need to track users across page loads, at least for a funnel analysis), we don't even need to put it into local storage.

Christian Aistleitner

16 May

2:20 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

Hi Gergo,

On Thu, May 15, 2014 at 03:19:26PM -0700, Gergo Tisza wrote:

...

I do not plan to add user id tracking until there is a specific need. [...] For now, just a random token will do fine.

I am glad to read that it is off of the plate at least for now. Thanks! That's really appreciated :-)

Have fun, Christian

Nuria Ruiz

15 May

4:09 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

...

The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)

I would strongly advise against using cookies for this purpose. Cookies will easily get bloated if we set a precedent of using them to 'support' event logging metrics. Bloated cookies are a concern from both a performance and an architectural standpoint.


Gergo Tisza

4:51 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Wed, May 14, 2014 at 5:58 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:

...

Hey guys,

Here's how I'd do it.

*Assumption:* Only logged-in users can start the UW funnel

*Schemas:*

UploadWizardStep

Stored when the user loads a new step of the Upload Wizard

user_id : int -- The user's identifier

flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)

step : int -- 1 - 4 of the UW process

UploadWizardRightsSelection

Stored when the user selects a "rights" option.

user_id : int -- The user's identifier

flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)

rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place for a single flow)

I'd make a pass over the DB, to identify the last RightsSelection for each flow_initialization (if any) to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness or maybe just exploration of the UI.

Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.

(will need to be stored in a cookie and reset at loads of step 1)

We don't even need this part since UploadWizard is a single-page application with no page load between the steps, so we can just store the token in memory. I don't want to log userids unless we really need them, so I'll just go with initial timestamp + random number. I don't think connecting separate upload attempts by the same user is particularly useful at this point.
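
To make the "initial timestamp + random number" token concrete, here is a minimal sketch. It is written in Python purely for illustration (the real code would live in UploadWizard's JavaScript), and the names `new_flow_id`, `UploadFlow` and the `log` callable are hypothetical, as are the exact event field names.

```python
import secrets
import time


def new_flow_id(now_ms=None):
    """Build a flow token from the flow's start time plus a random suffix."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # 8 random bytes (16 hex characters) keep flows that start in the
    # same millisecond apart without identifying the user.
    return f"{now_ms}-{secrets.token_hex(8)}"


class UploadFlow:
    """Keeps the token in memory for one pass through the wizard."""

    def __init__(self, log):
        self.flow_id = new_flow_id()
        self._log = log  # hypothetical callable that ships one event

    def step(self, number):
        self._log({"schema": "UploadWizardStep",
                   "flow_id": self.flow_id,
                   "step": number})

    def rights_selected(self, choice):
        self._log({"schema": "UploadWizardRightsSelection",
                   "flow_id": self.flow_id,
                   "rights_selected": choice})
```

Because the token only lives in the object, a fresh `UploadFlow` (for example after a reload) starts a new flow, which is the behaviour the cookie reset at step 1 was meant to provide.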

Aaron Halfaker

6:51 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

One reason you may choose to record a user_id in the future is to compare the flow for _new_ vs. _experienced_ editors/uploaders. Experienced users are likely to have substantially different behavior as they'll have had time to learn their way around UI quirks.

Either way, I'm glad to hear that your needs are met without including user_ids for now and I support your decision to not store them until they are needed.

-Aaron


Gergo Tisza

18 May

10:17 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Thu, May 15, 2014 at 6:51 PM, Aaron Halfaker ahalfaker@wikimedia.org wrote:

...

One reason you may choose to record a user_id in the future is to compare the flow for _new_ vs. _experienced_ editors/uploaders. Experienced users are likely to have substantially different behavior as they'll have had time to learn their way around UI quirks.

I was planning to use user_touched and/or user_editcount on the server side to determine a cohort and then pass that via the makeGlobalVariablesScript hook to JS. That could be inconvenient since we cannot analyze past data, but have to wait for new data to be collected every time we define a new cohort. In my experience with MediaViewer, though, almost all our data-driven decisions were based on data which we collected with a specific purpose in mind. We collected lots of data with an "it will probably be good for something" mindset, and it turned out to be not so useful - whenever we wanted to use it to answer some specific question, it turned out that there were some small mistakes or inconsistencies which made it questionable, and which we would surely have caught had we set up the data collection with that specific question in mind. So I am not too worried about that.
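
As a rough illustration of the cohort idea above, the server-side bucketing could be as small as the sketch below. It is Python for illustration only (the real code would be PHP feeding the hook), and the cohort names and the threshold of 100 edits are assumptions, not values anyone in this thread proposed.

```python
def cohort_for(edit_count, experienced_threshold=100):
    """Bucket a user by edit count; threshold and labels are hypothetical."""
    if edit_count < 0:
        raise ValueError("edit_count cannot be negative")
    if edit_count == 0:
        return "no-edits"
    return "experienced" if edit_count >= experienced_threshold else "new"


def with_cohort(event, edit_count):
    """Return a copy of the event carrying only the cohort label,
    so no user id has to be logged."""
    tagged = dict(event)
    tagged["cohort"] = cohort_for(edit_count)
    return tagged
```

The trade-off described above shows up directly: events already logged carry the old label, so changing `experienced_threshold` only affects data collected afterwards.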

Nuria Ruiz

16 May

1:10 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

...

Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.

So we agree you do not need to use cookies, right? Being a single-page app, you should not need them. As you said, you actually do not even need local storage.

...

I don't see how UserTiming is related. That API is about obtaining sub-millisecond precision - that is useful when you are building a 3D rendering engine or similar extremely time-sensitive feature, but generally using regular millisecond numbers is OK.

While the API is part of the performance spec, it can be used for anything that needs to track 'workflow' and time across those steps. Its usefulness comes not only from precision when it comes to time but also from being able to track steps with a clear API.

...

...
what if the user comes back after a month?

If you use session storage the events disappear when the user closes the browser. That means we are biasing the stats towards successful users, as they are guaranteed to make another page load in the same session, while unsuccessful ones might just leave for a while. Trying to compensate for that by making guesstimates, as you suggest, is a path I would rather not take.

Being a single page app you do not need session storage, of course. Also you should be able to report every step w/o issues.


Gergo Tisza

18 May

10:29 p.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

On Fri, May 16, 2014 at 1:10 AM, Nuria Ruiz nuria@wikimedia.org wrote:

...

So we agree you do not need to use cookies right? Being a single page app you should not need them. As you said you actually do not even need local storage.

Well, the one use case that is not covered by simply reporting everything as soon as it happens is Pau's request to track the time spent in failed conversion steps, from entering the step to closing the window/navigating away. To do that, we would need a leave event which is either saved when the user leaves (but an asynchronous request would be lost much of the time, and a synchronous one would decrease site performance), or stored and replayed as soon as the user is on a wiki page again. I guess we can just use localStorage for that; with the flow_id, multiple tabs are not a problem, and since we only lose the last event when the logging fails, not the whole event chain, we don't have to worry about the results becoming biased by non-localStorage-supporting browsers or infrequent users.
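
The store-and-replay idea can be sketched as follows. This is a language-neutral model written in Python, not browser code: a plain dict stands in for localStorage, and the key prefix, the field names and the `send` callable are all hypothetical.

```python
import json

PENDING_PREFIX = "uw-pending-leave:"  # hypothetical storage key prefix


def remember_leave_candidate(storage, flow_id, step, entered_ms, now_ms):
    """Overwrite the stored leave event for this flow with the time
    spent in the current step so far."""
    storage[PENDING_PREFIX + flow_id] = json.dumps({
        "flow_id": flow_id,
        "step": step,
        "duration_ms": now_ms - entered_ms,
    })


def clear_leave_candidate(storage, flow_id):
    """Drop the stored event once the user moves on to the next step."""
    storage.pop(PENDING_PREFIX + flow_id, None)


def replay_pending(storage, send):
    """On the next page load, send and delete every stored leave event.
    Returns the number of events replayed."""
    keys = [k for k in storage if k.startswith(PENDING_PREFIX)]
    for key in keys:
        send(json.loads(storage[key]))
        del storage[key]
    return len(keys)
```

Keying the entry by flow_id is what keeps several open tabs from overwriting each other, and if storage is unavailable only this one leave event is lost while the step events sent earlier are unaffected.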

Nuria Ruiz

19 May

5:15 a.m.

New subject: [Multimedia] Using EventLogging for funnel analysis

...

To do that, we would need a leave event which is either saved when the user leaves (but an asynchronous request would be lost much of the time,

Correct

...

and a synchronous one would decrease site performance),

Correct

...

or store the event and replay it as soon as the user is on a wiki page again. I guess we can just use localstorage for that

Very well, thank you. The fewer cookies connected to EL, the better.


analytics@lists.wikimedia.org

37 comments

12 participants

participants (12)

Aaron Halfaker
Christian Aistleitner
Dan Andreescu
Dario Taraborelli
Fabrice Florin
Gergo Tisza
Gilles Dubuc
Lars Aronsson
Luis Villa
Max Semenik
Nuria Ruiz
Pau Giner