One reason you may want to record a user_id in the future is to compare the flow for _new_ vs. _experienced_ editors/uploaders.  Experienced users are likely to behave substantially differently, since they'll have had time to learn their way around UI quirks. 

Either way, I'm glad to hear that your needs are met without including user_ids for now and I support your decision to not store them until they are needed. 

-Aaron 


On Thu, May 15, 2014 at 6:51 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Wed, May 14, 2014 at 5:58 AM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Hey guys,

Here's how I'd do it. 

Assumption: Only logged-in users can start the UW funnel

Schemas:

UploadWizardStep

Stored when the user loads a new step of the Upload Wizard
  • user_id : int -- The user's identifier
  • flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
  • step : int -- 1 - 4 of the UW process

UploadWizardRightsSelection

Stored when the user selects a "rights" option.
  • user_id : int -- The user's identifier
  • flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
  • rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place within a single flow)

I'd make a pass over the DB to identify the last RightsSelection for each flow_initialized value (if any) to figure out what an uploading user settled on during a particular flow.  I'd also look at how many selections a user makes per flow to see evidence of confusion & indecisiveness, or maybe just exploration of the UI.  
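
A rough sketch of that pass, in Python, might look like the following. It assumes the UploadWizardRightsSelection rows have already been pulled out of the DB as dicts, and that each row carries its own "timestamp" field recording when the selection was logged. That field is not part of the schema above, and the function name and sample rows are made up for illustration.

```python
from collections import defaultdict


def summarize_rights_selections(events):
    """Return {flow: (final_selection, selection_count)} for each flow.

    A flow is identified by (user_id, flow_initialized), as in the schema above.
    """
    by_flow = defaultdict(list)
    for event in events:
        flow = (event["user_id"], event["flow_initialized"])
        by_flow[flow].append(event)

    summary = {}
    for flow, flow_events in by_flow.items():
        # "timestamp" is an assumed per-event field; sorting on it puts the
        # selection the user settled on at the end of the list.
        flow_events.sort(key=lambda e: e["timestamp"])
        summary[flow] = (flow_events[-1]["rights_selected"], len(flow_events))
    return summary


# Made-up sample rows, deliberately out of order.
events = [
    {"user_id": 1, "flow_initialized": "20140514055800",
     "timestamp": "20140514060130", "rights_selected": "own"},
    {"user_id": 1, "flow_initialized": "20140514055800",
     "timestamp": "20140514055910", "rights_selected": "own"},
    {"user_id": 2, "flow_initialized": "20140514061500",
     "timestamp": "20140514061620", "rights_selected": "other"},
    {"user_id": 1, "flow_initialized": "20140514055800",
     "timestamp": "20140514060005", "rights_selected": "other"},
]

summary = summarize_rights_selections(events)
for flow, (final, count) in sorted(summary.items()):
    print(flow, final, count)
```

Flows with a high selection count are the ones worth a closer look for confusion or indecisiveness. If user_id is dropped in favor of a per-flow token, the same grouping works with the token as the key.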

Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.

(will need to be stored in a cookie and reset at loads of step 1)

We don't even need this part since UploadWizard is a single-page application with no page load between the steps, so we can just store the token in memory.
I don't want to log userids unless we really need them, so I'll just go with initial timestamp + random number. I don't think connecting separate upload attempts by the same user is particularly useful at this point. 
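
To make the "initial timestamp + random number" idea concrete, here is a minimal sketch of such a token. It is written in Python only for illustration; in practice the token would be generated client-side in UploadWizard's JavaScript and kept in memory. The function name and the exact token format are assumptions, not anything UploadWizard defines.

```python
import secrets
import time


def new_flow_token():
    """Return '<ms since epoch>-<random hex>' identifying one pass through the funnel."""
    started_ms = int(time.time() * 1000)  # when this flow began
    random_part = secrets.token_hex(8)    # 16 hex characters, no user identity involved
    return f"{started_ms}-{random_part}"


token = new_flow_token()
print(token)
```

Every event logged during one upload attempt would carry the same token, so events can be grouped per flow without being linked to a user or to that user's other attempts.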

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics