Hi all,
the Multimedia team is preparing to collect data to better understand usability problems with UploadWizard. UW has a "checkout" structure (step 1: put files in basket, step 2: choose license, step 3: add description, step 4: you are done), so a funnel analysis to identify which step causes the most users to abort the upload process and why seems like a good approach. I'm trying to understand how well the existing EventLogging infrastructure supports this.
The problem is how to get information about the actions of users who fell out of the funnel. I'll try to illustrate with an example: in one of the steps, the user can choose between "I am uploading my own work" and "I am uploading someone else's work" and the resulting interaction will be quite different. We would like to know whether that choice has a big effect on the likelihood of the user making it to the next step.
Using EventLogging, I can count the number of users who make it to that step. I can count the number of users making it to the next step. I can count the number of users choosing this or that author option. These numbers do not tell us much on their own, though; the interesting information would be how they are correlated.
Another thing I could do is create a schema which includes both the choice of author option and the step number; when the user chooses "own work", we log an ownwork event, when they click "next step", we log a step(step=3, work=own) event. We can then calculate the number of users who did choose "own work" but did not make it to the next step as the difference of the two. But this won't work: "own work" is a radio button, so the user can select and deselect it any number of times before proceeding to the next step (or leaving the page).
So what we are trying to log are not really events but application states that describe users who are successful vs. unsuccessful in the given step.
I thought of two ways of dealing with this; any feedback on the plausibility of these or possible alternatives would be highly appreciated.
One would be to have a "step X succeeded" and a "step X failed" event (the schema for which could include all sorts of state, such as which authorship option was selected). This would require the ability to log an event when the user leaves the page. I see two ways to do that:
- send the event log as a synchronous request from an unload event handler. This is not supported on ancient browsers; also, there is probably some mechanism in most browsers to kill an unload event handler if it takes too long.
- store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable (what if the user never comes back?) and logs the event for a different page load from where it actually occurred (what if the user comes back after a month?), and probably runs into all sorts of complications with multiple tabs.
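To make the "step X succeeded" / "step X failed" idea concrete, here is a minimal sketch of the analysis side, assuming each user produces exactly one outcome event per step and that the event carries the authorship option that was selected at that moment. The field names (`step`, `outcome`, `authorship`) and the sample data are hypothetical, not an existing EventLogging schema.

```python
from collections import Counter

# Hypothetical "step outcome" events: one per user per step, logged either when
# the user advances ("succeeded") or when they leave the page ("failed").
events = [
    {"step": 3, "outcome": "succeeded", "authorship": "own"},
    {"step": 3, "outcome": "succeeded", "authorship": "own"},
    {"step": 3, "outcome": "failed", "authorship": "own"},
    {"step": 3, "outcome": "failed", "authorship": "other"},
    {"step": 3, "outcome": "succeeded", "authorship": "other"},
    {"step": 3, "outcome": "failed", "authorship": "other"},
]


def success_rate_by_authorship(events, step):
    """Return {authorship option: share of users who completed the given step}."""
    totals = Counter()
    successes = Counter()
    for event in events:
        if event["step"] != step:
            continue
        totals[event["authorship"]] += 1
        if event["outcome"] == "succeeded":
            successes[event["authorship"]] += 1
    return {option: successes[option] / totals[option] for option in totals}


rates = success_rate_by_authorship(events, step=3)
print(rates)
```

Because the state travels with the outcome event, the correlation between authorship choice and success falls out of a single pass, with no need to match up separate event types.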
The other way could be to log event chains: set a random identifier (which only lives until the page is unloaded), and add it to every event. Event groups can then be merged into meta-events by SQL magic, although that looks like it will be extremely painful to do. On the other hand, this is much more generic than the previous method, and could be used to answer more complex questions.
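The event-chain idea can be sketched as follows: every event carries the same random identifier, and a later pass folds each chain into one summary record. This is only an illustration in Python; the field names (`flow_id`, `type`, `value`) and the shape of the summary are assumptions, and in practice the merge might be done in SQL instead.

```python
import uuid
from collections import defaultdict


def new_flow_id():
    """Random identifier that would live only as long as the page does."""
    return uuid.uuid4().hex


def merge_into_meta_events(events):
    """Fold each chain of events (same flow_id) into one summary record."""
    chains = defaultdict(list)
    for event in events:
        chains[event["flow_id"]].append(event)

    meta_events = {}
    for flow_id, chain in chains.items():
        chain.sort(key=lambda e: e["timestamp"])
        summary = {"last_step": 0, "authorship": None, "event_count": len(chain)}
        for event in chain:
            if event["type"] == "step":
                summary["last_step"] = max(summary["last_step"], event["step"])
            elif event["type"] == "authorship_change":
                # Later selections overwrite earlier ones, so the final value wins.
                summary["authorship"] = event["value"]
        meta_events[flow_id] = summary
    return meta_events


# Fixed identifiers keep the sample readable; real ones would come from new_flow_id().
events = [
    {"flow_id": "flow-a", "timestamp": 1, "type": "step", "step": 1},
    {"flow_id": "flow-b", "timestamp": 2, "type": "step", "step": 1},
    {"flow_id": "flow-a", "timestamp": 5, "type": "step", "step": 2},
    {"flow_id": "flow-a", "timestamp": 6, "type": "authorship_change", "value": "own"},
    {"flow_id": "flow-a", "timestamp": 7, "type": "authorship_change", "value": "other"},
    {"flow_id": "flow-b", "timestamp": 8, "type": "step", "step": 2},
    {"flow_id": "flow-a", "timestamp": 9, "type": "authorship_change", "value": "own"},
    {"flow_id": "flow-b", "timestamp": 10, "type": "authorship_change", "value": "other"},
    {"flow_id": "flow-a", "timestamp": 11, "type": "step", "step": 3},
]
meta = merge_into_meta_events(events)
print(meta)
```

A flow whose summary stops at step 2 is a drop-off at that step, and its `authorship` value is whatever the user had selected last, which sidesteps the radio-button problem described above.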
What do you think? Which would be the method I am not shooting myself in the foot with? Currently I am leaning towards using unload handlers.
- send the event log as a synchronous request from an unload event handler.
That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.
set a random identifier (which only lives until the page is unloaded), and add it to every event
That sounds perfectly fine. Ops can add indexes to the EventLogging tables for us, SQL queries grouping by that column should pose no challenge. That sounds like the simplest and most universal option.
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia
Hi Gergo,
The number of users that drop off at each stage will be really useful. Would it be possible to get the information in such a way that we could also check how long each step takes? That way we could get an idea of how much time on average a user spends on each step and in total, even if they succeeded in the process.
Pau
On Wed, May 14, 2014 at 12:47 AM, Pau Giner pginer@wikimedia.org wrote:
The number of users that drop-off at each stage will be really useful. Would it be possible to get the information in such a way that we could also check how long each step takes? In that way we could get an idea of how much time on average a user spends on each step and in total, even if they succeeded in the process.
Logging the time from a successful step to the next successful step is easy. Logging the time from a successful step to a failed step (i.e. the user leaving) is possible, but what Gilles said applies (we have to discard some browsers).
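As a rough illustration of the "successful step to next successful step" timing, the sketch below assumes each flow yields a list of (step, timestamp in seconds) pairs; that representation and the sample numbers are hypothetical. The last step reached gets no duration here, which matches the caveat that the time until a user leaves is only available when an exit event could be logged.

```python
def step_durations(step_events):
    """Given (step, timestamp) pairs for one flow, return seconds spent per step.

    The time spent on a step is the gap between reaching it and reaching the
    next one, so the last step reached has no entry.
    """
    ordered = sorted(step_events, key=lambda pair: pair[1])
    durations = {}
    for (step, started), (_next_step, ended) in zip(ordered, ordered[1:]):
        durations[step] = ended - started
    return durations


flow = [(1, 0.0), (2, 42.5), (3, 130.0)]
durations = step_durations(flow)
total = sum(durations.values())
print(durations, total)
```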
On Wed, May 14, 2014 at 12:02 AM, Gilles Dubuc gilles@wikimedia.org wrote:
Ops can add indexes to the EventLogging tables for us, SQL queries grouping by that column should pose no challenge.
As far as I can see, this can't be done with a simple GROUP BY: you would need logic like "from all records with the same sequence id which have an authorship_change field set, select the one with the latest timestamp". In SQL dialects supporting windowed/analytical expressions this is not bad, but in MySQL it would require some sort of self-join, I think.
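The self-join described above could look roughly like this. The sketch uses Python's built-in `sqlite3` module as a stand-in for the real database, and the table and column names (`events`, `sequence_id`, `ts`, `authorship_change`) are assumptions made for illustration. It also assumes timestamps are unique within a sequence; ties would return more than one row per sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sequence_id TEXT, ts INTEGER, authorship_change TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", 1, None),
        ("a", 2, "own"),
        ("a", 3, "other"),
        ("a", 4, "own"),
        ("b", 1, "other"),
        ("b", 2, None),
    ],
)

# Self-join: keep a row only if the same sequence has no later row with
# authorship_change set. This needs no window functions.
query = """
    SELECT e.sequence_id, e.authorship_change
    FROM events AS e
    LEFT JOIN events AS later
      ON later.sequence_id = e.sequence_id
     AND later.authorship_change IS NOT NULL
     AND later.ts > e.ts
    WHERE e.authorship_change IS NOT NULL
      AND later.sequence_id IS NULL
    ORDER BY e.sequence_id
"""
final_choice = dict(conn.execute(query).fetchall())
print(final_choice)
```

The `LEFT JOIN ... IS NULL` form is plain SQL, so the same query shape should carry over to MySQL, although its performance on large tables would depend on an index covering the sequence id and timestamp.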
[gergo] send the event log as a synchronous request from an unload event handler.
[gilles] That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.
Please do not log synchronously, this would make the UI slower for all users in the logging sample. Not just a tad slower but potentially a *lot* slower. A network round trip for some of our users is >500 ms on the 50th percentile. We had a similar discussion with growth team and we agreed it was best to keep application state via localStorage:
See: https://bugzilla.wikimedia.org/show_bug.cgi?id=52287
There are several performance APIs to log application events. UserTimings is only present in newer browsers but with that and localStorage there are a lot of possibilities opening: http://www.html5rocks.com/en/tutorials/webperformance/usertiming/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
[gergo] send the event log as a synchronous request from an unload event handler.
[gilles] That works really well, I've done it before for autosaving features. Obviously this only works if sampling users is enough (as opposed to measuring every single one), since it doesn't work on all browsers.
Please avoid logging synchronously, this will make the UI slower for all users that are part of the logging sample. And not just a tad slower, potentially it could be much slower. For some of our users a network roundtrip is over 500 ms at the 50th percentile. So you can potentially block the UI for a long time.
We had a similar discussion with the Growth team regarding synchronous logging. You can see (a lot of) details here: https://bugzilla.wikimedia.org/show_bug.cgi?id=52287
We decided to switch to a localStorage-based solution. In your case I think that with UserTimings and sessionStorage you could get the data you need. Support for storage is broad: http://caniuse.com/#feat=namevalue-storage, support for user timings less so, but you get Chrome and IE, and that is a big percentage of the user base: http://caniuse.com/#feat=user-timing
[gergo] store the event in cookies/localStorage, log it on the next page load. This works in all browsers but it is less reliable
I do not think so. To clear up some concerns:
Probably runs into all sorts of complications with multiple tabs.
This should not be a concern, as the page visibility API tells you whether the tab is actually visible. You can restrict user timings logging and event logging reporting according to visibility so they only happen when the user is interacting with the page.
what if the user comes back after a month?
If you use session storage the events disappear when the user closes the browser.
store the event in cookies/localStorage, log it on the next page load
Actually you can store the 'transition' in sessionStorage and use regular polling to report it. You do not necessarily need to report the transition from the next page. That being said, you are right that the "last" step might be under-reported, as the user might leave the page. We can analyze the data keeping this in mind. We can even 'estimate' how much we are under-reporting the last step the user did.
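The store-then-poll idea can be modelled as below. This is a sketch in Python rather than browser code: a plain dict stands in for sessionStorage, `flush` stands in for a timer callback, and the `page_visible` flag stands in for a check against the page visibility API. The class name and storage key are hypothetical.

```python
import json


class PendingTransitions:
    """Keeps unreported transitions in a key-value store until a poll sends them."""

    KEY = "uw-pending-transitions"  # hypothetical storage key

    def __init__(self, storage):
        # Stand-in for sessionStorage: string keys mapped to string values.
        self.storage = storage

    def record(self, from_step, to_step):
        """Append a transition to the pending list without sending anything."""
        pending = json.loads(self.storage.get(self.KEY, "[]"))
        pending.append({"from": from_step, "to": to_step})
        self.storage[self.KEY] = json.dumps(pending)

    def flush(self, send, page_visible=True):
        """Called on a timer; reports and clears whatever is pending."""
        if not page_visible:
            return 0  # hidden tabs report nothing and keep their queue
        pending = json.loads(self.storage.get(self.KEY, "[]"))
        for transition in pending:
            send(transition)
        self.storage[self.KEY] = "[]"
        return len(pending)


storage = {}
sent = []
queue = PendingTransitions(storage)
queue.record(1, 2)
queue.record(2, 3)
skipped = queue.flush(sent.append, page_visible=False)
reported = queue.flush(sent.append)
print(skipped, reported, sent)
```

Anything recorded after the final poll and before the user leaves is never sent in this model, which is the under-reporting of the last step mentioned above.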
On Thu, May 15, 2014 at 3:56 AM, Nuria Ruiz nuria@wikimedia.org wrote:
We decided to switch to a localStorage based solution. In your case I think with UserTimings and sessionStorage you could get you the data you need. Support for storage is broad: http://caniuse.com/#feat=namevalue-storage, support for user timings less so but you get chrome and IE and that is a big percentage of user base: http://caniuse.com/#feat=user-timing
I don't see how UserTiming is related. That API is about obtaining sub-millisecond precision - that is useful when you are building a 3D rendering engine or similar extremely time-sensitive feature, but generally using regular millisecond numbers is OK.
what if the user comes back after a month?
If you use session storage the events disappear when the user closes the browser.
That means we are biasing the stats towards successful users, as they are guaranteed to make another page load in the same session, while unsuccessful ones might just leave for a while. Trying to compensate for that by making guesstimates, as you suggest, is a path I would rather not take.
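A small worked example of that bias, using made-up numbers: suppose 40 of 100 users really succeed, and every success is logged, but only a fraction of the failures ever gets reported because the session ends first. The function name and all figures are hypothetical and only illustrate the direction of the skew.

```python
def observed_success_rate(successes, failures, failure_report_rate):
    """Success rate seen in the logs when only some failures get reported."""
    reported_failures = failures * failure_report_rate
    return successes / (successes + reported_failures)


# All failures reported: the logs match reality (40 out of 100).
true_rate = observed_success_rate(40, 60, failure_report_rate=1.0)
# Only half of the failures reported: 40 successes against 30 logged failures.
skewed_rate = observed_success_rate(40, 60, failure_report_rate=0.5)
print(true_rate, skewed_rate)
```

The fewer failures make it into the logs, the higher the apparent success rate, and correcting for it requires knowing the report rate, which is exactly the quantity that cannot be observed directly.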
On 05/14/2014 01:54 AM, Gergo Tisza wrote:
the Multimedia team is preparing to collect data to better understand usability problems with UploadWizard.
Good.
UW has a "checkout" structure
Yes. But is this a given fact, or something that might change? It feels very monolithic, very un-wiki, very un-collaborative. Sometimes I want to unload my camera, so I can go out and capture more images, but I'm not ready to do all the detailed categorization. And perhaps others can help with that work anyway.
The UploadWizard is still better than the old upload form. But it's not entirely "wiki" (quick, collaborative, publish first, edit later). So how much energy and resources are we spending on making it slightly better, rather than designing something very different?
Some of the UploadWizard's convenient operations, like renaming or categorizing or editing the description of a whole group of images would be very nice to have after the upload. It would be similar to running a bot for recategorization, but built into the wiki interface. If you separate these operations from the upload, the upload would shrink to "unloading the camera", and you would have fewer interrupted/abandoned uploads.
On Wed, May 14, 2014 at 1:16 AM, Lars Aronsson lars@aronsson.se wrote:
Yes. But is this a given fact, or something that might change?
We do intend to change it. You can see our plans (in a somewhat undigested form) at http://etherpad.wikimedia.org/p/design-multimedia-uploader
But usage metrics from the current interface can still be helpful for designing a different one.
So how much energy and resources are we spending on making it slightly better, rather than designing something very different?
That's the million dollar question... given that small improvements will have instant effect (but are wasted time in the long run), while a big redesign will take several months (I am being optimistic here...), we will have to do some mix of the two, but exactly what mix that will be is an open question.
Hey guys,
Here's how I'd do it.
*Assumption:* Only logged-in users can start the UW funnel
*Schemas:*
UploadWizardStep
Stored when the user loads a new step of the Upload Wizard
- user_id : int -- The user's identifier
- flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
- step : int -- 1 - 4 of the UW process
UploadWizardRightsSelection
Stored when the user selects a "rights" option.
- user_id : int -- The user's identifier
- flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
- rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place for a single flow)
I'd make a pass over the DB, to identify the last RightsSelection for each flow_initialization (if any) to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness or maybe just exploration of the UI.
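That pass could be sketched like this, assuming the UploadWizardRightsSelection rows are available as tuples and that each row also carries the time at which the event was logged (an assumption; the schema above does not list such a field). Timestamps are written as sortable strings, and all sample values are invented.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, flow_initialized, logged_at, rights_selected)
rights_selections = [
    (7, "20140514010000", "20140514010210", "own"),
    (7, "20140514010000", "20140514010215", "other"),
    (7, "20140514010000", "20140514010230", "own"),
    (9, "20140514020000", "20140514020105", "other"),
]


def summarize_rights(rows):
    """For each flow, return the final rights choice and the number of selections."""
    flows = defaultdict(list)
    for user_id, flow_initialized, logged_at, rights in rows:
        flows[(user_id, flow_initialized)].append((logged_at, rights))

    summary = {}
    for flow, selections in flows.items():
        selections.sort()  # chronological, since the timestamps sort as strings
        summary[flow] = {
            "final": selections[-1][1],
            "selection_count": len(selections),
        }
    return summary


summary = summarize_rights(rights_selections)
print(summary)
```

A high `selection_count` for a flow is the kind of signal that might point to confusion or exploration, while `final` is what the user settled on.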
Make sense?
-Aaron
Hi Aaron,
Thanks so much for your good advice!
The approach you propose below makes good sense to me.
For now, I have added it in the notes section of our Mingle ticket for the data collection:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/305/edit...
Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph as we do for other descriptive metrics studies?
Thanks again for your helpful insights :)
Fabrice
On May 14, 2014, at 5:58 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Hey guys,
Here's how I'd do it.
Assumption: Only logged-in users can start the UW funnel
Schemas:
UploadWizardStep
Stored when the user loads a new step of the Upload Wizard user_id : int -- The user's identifier flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1) step : int -- 1 - 4 of the UW process UploadWizardRightsSelection
Stored when the user selects a "rights" option. user_id : int -- The user's identifier flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1) rights_selected : enum("own", "other) -- The rights that a user selected (note that multiple selections actions can take place for a single flow) I'd make a pass over the DB, to identify the last RightsSelection for each flow_initialization (if any) to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness or maybe just exploration of the UI.
Make sense?
-Aaron
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia
_______________________________
Fabrice Florin Product Manager Wikimedia Foundation
Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph as we do for other descriptive metrics studies?
Limn doesn't have many graph types but it might have a decent one for this purpose: http://debugging.wmflabs.org/graphs/ordinal_example
So you could have Step 1, Step 2, Step 3, ... on your X axis, then the number of people who made it through each step on your Y axis.
For a better funnel visualization, we should really support Sankey diagrams: http://www.practicalecommerce.com/files/images/0005/0938/2_goalFlow_Large.pn...
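As a rough Python sketch of the numbers such a chart would need, assuming we already know the furthest step each upload attempt reached: the function names and the sample input are hypothetical.

```python
def funnel_counts(max_step_per_flow, n_steps=4):
    """Count how many flows reached each step (step 1 .. n_steps)."""
    counts = [0] * n_steps
    for max_step in max_step_per_flow:
        # A flow that reached step k also passed through steps 1 .. k-1.
        for step in range(1, min(max_step, n_steps) + 1):
            counts[step - 1] += 1
    return counts


def step_conversion(counts):
    """Fraction of flows that continue from each step to the next one."""
    return [
        (later / earlier if earlier else 0.0)
        for earlier, later in zip(counts, counts[1:])
    ]


# Hypothetical input: the furthest step reached by five upload attempts.
counts = funnel_counts([4, 2, 1, 3, 2])
rates = step_conversion(counts)

# Crude text rendering: steps on one axis, number of flows on the other.
for step, count in enumerate(counts, start=1):
    print(f"Step {step}: {'#' * count} {count}")
```

The `counts` list maps directly onto the Y values of an ordinal chart, and `rates` shows where the largest drop occurs.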
On May 14, 2014, at 8:56 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
For a better funnel visualization, we should really support Sankey diagrams: http://www.practicalecommerce.com/files/images/0005/0938/2_goalFlow_Large.pn...
+1, this might be overkill for a linear funnel with a small number of nodes (where a directed graph visualization, a line chart or an actual “funnel” [1] could do the job). I agree Sankey diagrams would be nice to have down the line for more complex scenarios.
Dario
[1] https://github.com/smilli/d3-funnel-charts
[2] The d3 implementation: http://bost.ocks.org/mike/sankey/
Muchas gracias, Dario :)
Can you clarify what you mean by ‘directed graph visualization’, to make sure we’re on the same page?
It seems like the bar graph recommended by Dan would be a good start, but we might also consider another simple line graph to see how these patterns change over time.
I added your suggestions and links to a Notes section for the funnel dashboard ticket #541, though it will most likely be split into a couple of smaller tickets, like #305.
We’re hoping to have some of this data next week, in time for our Upload Wizard planning meeting on Thursday — wish us luck :)
Onward!
Fabrice
On May 14, 2014, at 11:17 AM, Fabrice Florin fflorin@wikimedia.org wrote:
Can you clarify what you mean by ‘directed graph visualization’, to make sure we’re on the same page?
https://twitter.com/WikiResearch/status/449252646063329280
you can imagine using a similar visualization with nodes as individual steps in the funnel and node size scaled to represent the absolute number of users at each step.
Dario
Thanks, Dario, much appreciated!
Are there any LIMN graphs that could be adapted to create directed graphs like these — or is this something we would do manually?
Also, we now plan to count the number of clicks on the main buttons for each step, not the number of unique users. We got the impression that unique users may be harder to count with our limited resources.
Does this plan seem reasonable for now? Or would you recommend a different approach?
Fabrice
Thanks, Dan, this is really helpful!
I added your recommendations to this ticket for the visualization of our funnel metrics:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/541
And I agree that a Sankey diagram would be wonderful for our purposes — even if we have to create it manually for now. Any recommended tools that would be easy to use and would let us copy and paste our metrics data to quickly create such a diagram?
Much appreciated,
Fabrice
Hi,
tracking users through the site is evil. I'd prefer we do not do it. Especially since users currently have no way to choose between getting tracked and not getting tracked.
Some less general remarks are inline below.
On Wed, May 14, 2014 at 07:58:58AM -0500, Aaron Halfaker wrote:
UploadWizardStep [...]
- user_id : int -- The user's identifier
[...] UploadWizardRightsSelection [...]
- user_id : int -- The user's identifier
Are those "user_id"s meant to be arbitrary but (for a given funnel) constant numbers, or are they meant as values of the user_id column of the user table?
If the latter is the case, is that information really needed? We could instead use a hash, which we could seed by the field
- flow_initialized : str -- The timestamp at which the current flow
through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
(which got suggested for both schemas and is constant).
Thereby, we could still analyze funnels (which is a separate issue, and needs a separate discussion), but we need not store unneeded data.
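A minimal Python sketch of the hashing idea, assuming SHA-256 and a truncated hex digest; the function name, the separator and the truncation length are hypothetical choices, not part of any existing schema.

```python
import hashlib


def flow_scoped_id(user_id: int, flow_initialized: str) -> str:
    """Derive an identifier that is constant within one flow only."""
    # The flow timestamp acts as the seed, so the same user gets a
    # different identifier in every flow.
    material = f"{flow_initialized}:{user_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


# Hypothetical values for illustration.
first_flow = flow_scoped_id(42, "20140515104222")
second_flow = flow_scoped_id(42, "20140515110000")
print(first_flow, second_flow)
```

One caveat: user IDs are small integers and the seed is logged next to the hash, so anyone with the data could recompute the hash for every possible ID. Real protection would need a secret key (for example via `hmac`) that is not stored with the events.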
Best regards, Christian
Hi,
On Thu, May 15, 2014 at 10:42:22AM +0200, Christian Aistleitner wrote:
tracking users through the site is evil. I'd prefer we do not do it. Especially since users currently have no way to choose between getting tracked and not getting tracked.
it seems that the above passage caused confusion internally in the Analytics team.
So just to avoid doubt, let me clarify publicly that by the above “I”, I really only mean me. Not the whole of the Analytics team. :-)
Have fun, Christian
P.S.: This clarification does of course not lessen my call about hashing user_ids.
If we hashed user IDs, we'd not be able to compute statistics about the images that UW users uploaded or their other work. Being able to correlate UW success with other work and the experience level of the editor seems like a clearly important thing. Also, storing a user ID is probably the least invasive way to track identity. It's a standard practice.
-Aaron
Does EL honor Do Not Track?
If I'm understanding correctly, Do Not Track is about tracking cookies that track activities between websites. Is that right?
FWIW, the user_id that I propose storing is not a cookie.
-Aaron
Hi Aaron,
On Thu, May 15, 2014 at 04:56:38PM -0500, Aaron Halfaker wrote:
If I'm understanding correctly, Do Not Track is about tracking cookies that track activities between websites. Is that right?
That's a really hard question.
Yes and no. There is no standard yet. And people generally have different understandings of the header, its definition and its intention, and purpose.
(If you care to read only one of the below items, read the last one titled “Consumer's confused view”)
* W3C's Tracking Protection Working Group's point of view
W3C's current work in progress “Tracking Preference Expression (DNT)” document [1] does not limit to “cookies”, or “tracking cookies”. The document explicitly states that the intention is “general, regardless of protocols” [2].
Also, the boundary is not between websites, but on an organisational level. If an entity is first party on SiteA, and SiteB, different rules^Wrecommendations apply than if the same party were first party on SiteA, and third party on SiteB.
In general, the Do Not Track header is geared more towards targeted advertising than analytics. Some have ridiculed it as “Do Not Target” instead of “Do Not Track”.
And it comes with so many exceptions [3] and vague definitions (or definitions getting twisted by their use) [4].
* Company's PR point of view:
If you have DNT enabled in your browser settings, we will not collect the information that enables this feature, so you won’t see any tailored suggestions. We hope that our support of DNT highlights its importance as a privacy tool for consumers and creates even more interest and wider adoption across the web. https://blog.twitter.com/2012/new-tailored-suggestions-for-you-to-follow-on-...
* Company's FUD point of view:
* Company's ignoring point of view
http://arstechnica.com/information-technology/2014/05/yahoo-is-the-latest-co...
* Consumer's confused view
Consumers think that “Do not track” actually means “Do not track”.
As a consumer, you’d think that the meaning of “Do Not Track” is pretty clear. You’re making a polite request of the web sites and advertisers: “Don’t collect and store any information about me without my explicit permission.” http://www.zdnet.com/why-do-not-track-is-worse-than-a-miserable-failure-7000...
Have fun, Christian
[1] http://www.w3.org/TR/tracking-dnt/
[2] http://www.w3.org/TR/tracking-dnt/#other-protocols
[3] Like
Regardless of the tracking preference expressed, data MAY be collected and used for billing and auditing related to the current network interaction and concurrent transactions. This may include counting ad impressions to unique visitors, verifying positioning and quality of ad impressions and auditing compliance with this and other standards.
http://www.w3.org/2011/tracking-protection/drafts/tracking-compliance.html#f...
[4] Like
Tracking is the collection of data regarding a particular user's activity across multiple distinct contexts and the retention, use, or sharing of data derived from that activity outside the context in which it occurred. A context is a set of resources that are controlled by the same party or jointly controlled by a set of parties.
http://www.w3.org/TR/tracking-dnt/#terminology
Hi,
On Thu, May 15, 2014 at 02:54:24PM -0700, Max Semenik wrote:
Does EL honor Do Not Track?
Due to controversies around the “Do Not Track” header, “honors” here is a difficult term for me. But currently EventLogging logs events that came with “DNT: 1” :-(
Best regards, Christian
P.S.: I am actively seeking community expectations around “Do Not Track” handling and general privacy expectations around WMF Analytics. So please do voice opinions. Be it here, private email, IRC or through some other means.
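For illustration only, here is a Python sketch of what dropping such events could look like if that were the chosen policy. It does not describe how EventLogging works; the event structure with a `headers` dictionary and both function names are assumptions.

```python
def has_dnt_enabled(headers):
    """Return True if the request headers carry "DNT: 1"."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def drop_dnt_events(events):
    """Keep only events whose request did not ask not to be tracked."""
    return [
        event for event in events
        if not has_dnt_enabled(event.get("headers", {}))
    ]


# Hypothetical events; the "headers" field is an assumption.
EVENTS = [
    {"schema": "UploadWizardStep", "headers": {"DNT": "1"}},
    {"schema": "UploadWizardStep", "headers": {"dnt": "0"}},
    {"schema": "UploadWizardStep"},
]

kept = drop_dnt_events(EVENTS)
print(len(kept), "of", len(EVENTS), "events kept")
```

Whether such a filter is desirable is exactly the open question; the sketch only shows that the mechanics are simple once the expectation is agreed on.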
When we evaluated the last spec draft (Jan/Feb?) "do not track" in the specification quite clearly and explicitly meant "do not allow tracking by *third parties*". So the tracking we do internally is permissible, whether or not DNT: 1 is set.
FAQ about it is here: https://meta.wikimedia.org/wiki/Privacy_policy/FAQ#What_are_Do_Not_Track_.28...
That said, a few weeks ago W3C published a last call draft, and I have not evaluated it yet, so things may have changed. (As late as December the draft defined neither "track" nor "third party", which was... frustrating.)
Luis
Hi Luis,
On Fri, May 16, 2014 at 01:44:12PM -0700, Luis Villa wrote:
When we evaluated the last spec draft (Jan/Feb?) "do not track" in the specification quite clearly and explicitly meant "do not allow tracking by *third parties*". So the tracking we do internally is permissible, whether or not DNT: 1 is set.
According to the W3C draft document, I guess we should be fine.
But the W3C draft as it currently stands misses the people.
And I'd much rather see us matching people's expectations than W3C's (to which not many buy in around DNT).
I gave some citations in a parallel thread [1], but since there was a call in a privacy discussion today for more official statements from higher bodies, let me add a quote from Neelie Kroes, European Commissioner for Digital Agenda [2]:
DNT has a lot of potential because it can apply: First, to all networked devices and applications Second, to all types of tracking and Third, to all purposes of tracking.
That's a much broader DNT vision than W3C's. People buy into this broader interpretation more than into W3C's.
And there are also other more technical and concrete interpretations of the DNT header. For example EFF's pretty new one used in their Privacy Badger:
https://www.eff.org/dnt-policy
As late as December the draft defined neither "track" nor "third party", which was... frustrating.
That sentiment about the W3C's DNT drafts is shared by many :-D Although those definitions have been added in the meantime, they do not help in meeting people's expectations.
Have fun, Christian
[1] http://lists.wikimedia.org/pipermail/analytics/2014-May/002052.html
[2] http://europa.eu/rapid/press-release_SPEECH-11-461_en.htm
Hi Aaron,
On Thu, May 15, 2014 at 04:49:21PM -0500, Aaron Halfaker wrote:
If we hashed user IDs, we'd not be able to compute statistics about the images that UW users uploaded or their other work.
Right.
I agree that such statistics would be interesting.
But not being able to compute such statistics is a good thing too, as—from my point of view—the OP's question did not call for such statistics.
Being able to correlate UW success with other work and the experience level of the editor seems like a clearly important thing.
Yes, it might seem like a clearly important thing for you. But for me, it is beyond the /current/ question from OP.
Of course, the question can be refined and details added upon need. But adding details that are /currently/ not needed and asked for is premature optimization.
Have fun, Christian
On Thu, May 15, 2014 at 1:42 AM, Christian Aistleitner < christian@quelltextlich.at> wrote:
tracking users through the site is evil. I'd prefer we do not do it.
I do not plan to add user id tracking until there is a specific need. (That specific need would be probably some sort of user cohorts and we can see at that time whether there is a more privacy-sensitive way of doing that.) For now, just a random token will do fine. Since UploadWizard is a single-page application (we do not need to track users across page loads, at least for a funnel analysis), we don't even need to put it into local storage.
Hi Gergo,
On Thu, May 15, 2014 at 03:19:26PM -0700, Gergo Tisza wrote:
I do not plan to add user id tracking until there is a specific need. [...] For now, just a random token will do fine.
I am glad to read that it is off of the plate at least for now. Thanks! That's really appreciated :-)
Have fun, Christian
The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
I would strongly advise against using cookies for this purpose. Cookies will easily get bloated if we set a precedent of using them to 'support' event logging metrics. Bloated cookies are a concern from both a performance and an architectural standpoint.
On Wed, May 14, 2014 at 5:58 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Hey guys,
Here's how I'd do it.
[...]
Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.
(will need to be stored in a cookie and reset at loads of step 1)
We don't even need this part since UploadWizard is a single-page application with no page load between the steps, so we can just store the token in memory. I don't want to log userids unless we really need them, so I'll just go with initial timestamp + random number. I don't think connecting separate upload attempts by the same user is particularly useful at this point.
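A small sketch of the "initial timestamp + random number" token kept in memory. It is written in Python purely for illustration (the real client code would be JavaScript inside UploadWizard), and `new_flow_token`, `UploadFlow` and the event fields are hypothetical names.

```python
import secrets
import time


def new_flow_token(now=None):
    """Build a token from the flow start time plus a random component."""
    started = int(time.time() if now is None else now)
    return f"{started}-{secrets.token_hex(8)}"


class UploadFlow:
    """Holds the token in memory for the lifetime of one wizard run."""

    def __init__(self, now=None):
        self.token = new_flow_token(now)
        self.events = []

    def log_step(self, step):
        # Every event of the run carries the same token, so events can be
        # grouped per upload attempt without any user identifier.
        self.events.append({"flow_token": self.token, "step": step})


flow = UploadFlow(now=1400194260)
flow.log_step(1)
flow.log_step(2)
print(flow.events)
```

Because the token lives only in the object, it disappears when the page is closed, and a new attempt simply gets a new token.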
One reason you may choose to record a user_id in the future is to compare the flow for _new_ vs. _experienced_ editors/uploaders. Experienced users are likely to have substantially different behavior as they'll have had time to learn their way around UI quirks.
Either way, I'm glad to hear that your needs are met without including user_ids for now and I support your decision to not store them until they are needed.
-Aaron
On Thu, May 15, 2014 at 6:51 PM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
One reason you may choose to record the a user_id in the future is to compare the flow for _new_ vs. _experienced_ editors/uploaders. Experienced users are likely to have substantially different behavior as they'll have had time to learn their way around UI quirks.
I was planning to use user_touched and/or user_editcount on the server side to determine a cohort and then pass that to JS via the makeGlobalVariablesScript hook. That could be inconvenient, since we cannot analyze past data but have to wait for new data to be collected every time we define a new cohort. In my experience with MediaViewer, though, almost all our data-driven decisions were based on data which we collected with a specific purpose in mind. We collected lots of data with an "it will probably be good for something" mindset, and it turned out to be not so useful: whenever we wanted to use it to answer some specific question, it turned out that there were some small mistakes or inconsistencies which made it questionable, and which we would surely have caught had we set up the data collection with that specific question in mind. So I am not too worried about that.
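The cohort idea could look roughly like the sketch below. Python is used only for illustration (the real hook would be PHP), and the thresholds, the cohort names and both function names are invented for the example.

```python
def cohort_for(edit_count: int) -> str:
    """Map an edit count to a coarse cohort label."""
    # Thresholds are arbitrary placeholders, not agreed definitions.
    if edit_count == 0:
        return "new"
    if edit_count < 100:
        return "casual"
    return "experienced"


def build_step_event(flow_token: str, step: int, edit_count: int) -> dict:
    """Attach only the cohort label, never the user id or the raw count."""
    return {
        "flow_token": flow_token,
        "step": step,
        "cohort": cohort_for(edit_count),
    }


event = build_step_event("1400194260-ab12cd34ef56ab78", step=2, edit_count=57)
print(event)
```

Logging only the label keeps the events free of user identifiers, at the price mentioned above: a newly defined cohort cannot be applied to data that was collected earlier.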
Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.
So we agree you do not need to use cookies right? Being a single page app you should not need them. As you said you actually do not even need local storage.
I don't see how UserTiming is related. That API is about obtaining sub-millisecond precision - that is useful when you >are building a 3D rendering engine or similar extremely time-sensitive feature, but generally using regular millisecond >numbers is OK.
While the API is part of the performance spec, it can be used for anything that needs to track 'workflow' and time across those steps. Its usefulness comes not only from precision when it comes to time but also from being able to track steps with a clear API.
what if the user comes back after a month?
If you use session storage the events disappear when the user closes the browser. That means we are biasing the stats towards successful users, as they are guaranteed to make another page load in the same session, while unsuccessful ones might just leave for a while. Trying to compensate for that by making guesstimates, as you suggest, is a path I would rather not take.
Being a single page app you do not need session storage, of course. Also you should be able to report every step w/o issues.
On Fri, May 16, 2014 at 1:51 AM, Gergo Tisza gtisza@wikimedia.org wrote:
On Wed, May 14, 2014 at 5:58 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Hey guys,
Here's how I'd do it.
Assumption: Only logged-in users can start the UW funnel
Schemas:
UploadWizardStep
Stored when the user loads a new step of the Upload Wizard
user_id : int -- The user's identifier
flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
step : int -- 1 - 4 of the UW process
UploadWizardRightsSelection
Stored when the user selects a "rights" option.
user_id : int -- The user's identifier
flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place for a single flow)
I'd make a pass over the DB, to identify the last RightsSelection for each flow_initialization (if any) to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness or maybe just exploration of the UI.
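As a rough illustration of the pass described above, here is a minimal Python sketch. It assumes the two event streams have already been exported as lists of dicts; the field names `flow_id` and `timestamp`, the function names, and the `rights_step` parameter are hypothetical and not part of any schema agreed in this thread. It keeps the last rights selection per flow, counts the selections, and then compares how often each final choice made it past the rights step.

```python
from collections import defaultdict


def summarize_flows(step_events, rights_events):
    """Per flow: highest step reached, last rights choice, number of selections.

    step_events:   dicts with hypothetical keys "flow_id" and "step"
    rights_events: dicts with hypothetical keys "flow_id", "timestamp",
                   and "rights_selected" ("own" or "other")
    """
    max_step = defaultdict(int)
    for event in step_events:
        flow = event["flow_id"]
        max_step[flow] = max(max_step[flow], event["step"])

    final_choice = {}
    selections = defaultdict(int)
    # Process in time order so that the last selection of each flow wins.
    for event in sorted(rights_events, key=lambda e: e["timestamp"]):
        flow = event["flow_id"]
        final_choice[flow] = event["rights_selected"]
        selections[flow] += 1

    return {
        flow: {
            "max_step": step,
            "final_choice": final_choice.get(flow),
            "selections": selections.get(flow, 0),
        }
        for flow, step in max_step.items()
    }


def conversion_by_choice(summary, rights_step):
    """Share of flows, per final choice, that got past `rights_step`."""
    made_choice = defaultdict(int)
    passed = defaultdict(int)
    for info in summary.values():
        choice = info["final_choice"]
        if choice is None:
            continue  # the flow ended before any rights option was selected
        made_choice[choice] += 1
        if info["max_step"] > rights_step:
            passed[choice] += 1
    return {choice: passed[choice] / made_choice[choice] for choice in made_choice}
```

A flow with many selections but a low `max_step` would be the "confusion or indecisiveness" signal mentioned above; the same summary could just as well be produced with a GROUP BY on the flow identifier in SQL.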
Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.
(will need to be stored in a cookie and reset at loads of step 1)
We don't even need this part since UploadWizard is a single-page application with no page load between the steps, so we can just store the token in memory. I don't want to log userids unless we really need them, so I'll just go with initial timestamp + random number. I don't think connecting separate upload attempts by the same user is particularly useful at this point.
On Fri, May 16, 2014 at 1:10 AM, Nuria Ruiz nuria@wikimedia.org wrote:
So we agree you do not need to use cookies right? Being a single page app you should not need them. As you said you actually do not even need local storage.
Well, the one use case that is not covered by simply reporting everything as soon as it happens is Pau's request to track the time spent in failed conversion steps, from entering the step to closing the window/navigating away. To do that, we would need a leave event which is either saved when the user leaves (but an asynchronous request would be lost much of the time, and a synchronous one would decrease site performance), or stored and replayed as soon as the user is on a wiki page again. I guess we can just use localStorage for that; with the flow_id, multiple tabs are not a problem, and since we only lose the last event when the logging fails, not the whole event chain, we don't have to worry about the results becoming biased by non-localStorage-supporting browsers or infrequent users.
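To make the "time spent in failed conversion steps" measurement concrete, here is a minimal Python sketch of the backend side. It assumes each logged event is a dict with the hypothetical keys `flow_id`, `type` ("step" or "leave"), `timestamp` (seconds, recorded on the client when the event happened, not when it was replayed) and, for step events, `step`; none of these names are an agreed schema. Flows whose leave event never arrived are simply skipped here, so they still count in the funnel but contribute no duration.

```python
def time_in_abandoned_step(events, final_step=4):
    """Map flow_id -> (abandoned step, seconds spent there before leaving).

    Only flows that did not reach `final_step` and that have a leave
    event are included; all field names are illustrative assumptions.
    """
    by_flow = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_flow.setdefault(event["flow_id"], []).append(event)

    durations = {}
    for flow, flow_events in by_flow.items():
        steps = [e for e in flow_events if e["type"] == "step"]
        leaves = [e for e in flow_events if e["type"] == "leave"]
        if not steps or not leaves:
            continue  # no leave event was stored or replayed for this flow
        last_step = steps[-1]
        if last_step["step"] >= final_step:
            continue  # the flow completed, so this is not a failed conversion
        leave = leaves[-1]
        if leave["timestamp"] < last_step["timestamp"]:
            continue  # inconsistent ordering, ignore rather than guess
        durations[flow] = (
            last_step["step"],
            leave["timestamp"] - last_step["timestamp"],
        )
    return durations
```

Because the duration is derived from the client-side timestamps, it does not matter whether the leave event was replayed a minute or a month later, as long as it carries the time at which the user actually left.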
To do that, we would need a leave event which is either saved when the user leaves (but an asynchronous request would be lost much of the time,
Correct
and a synchronous one would decrease site performance),
Correct
or store the event and replay it as soon as the user is on a wiki page again. I guess we can just use localstorage for that
Very well, thank you. The fewer cookies connected to EL, the better.
On Mon, May 19, 2014 at 7:29 AM, Gergo Tisza gtisza@wikimedia.org wrote:
On Fri, May 16, 2014 at 1:10 AM, Nuria Ruiz nuria@wikimedia.org wrote:
So we agree you do not need to use cookies right? Being a single page app you should not need them. As you said you actually do not even need local storage.
Well, the one use case that is not covered by simply reporting everything as soon as it happens is Pau's request to track the time spent in failed conversion steps, from entering the step to closing the window/navigating away. To do that, we would need a leave event which is either saved when the user leaves (but an asynchronous request would be lost much of the time, and a synchronous one would decrease site performance), or store the event and replay it as soon as the user is on a wiki page again. I guess we can just use localstorage for that; with the flow_id multiple tabs is not a problem, and since we only lose the last event when the logging fails, not the whole event chain, we don't have to worry about the results becoming biased by non-localStorage-supporting browsers or infrequent users.