On Wed, May 14, 2014 at 5:58 AM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Hey guys,

Here's how I'd do it. 

Assumption: Only logged-in users can start the UW funnel

Schemas:

UploadWizardStep

Stored when the user loads a new step of the Upload Wizard
  • user_id : int -- The user's identifier
  • flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
  • step : int -- 1 - 4 of the UW process

UploadWizardRightsSelection

Stored when the user selects a "rights" option.
  • user_id : int -- The user's identifier
  • flow_initialized : str -- The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1)
  • rights_selected : enum("own", "other") -- The rights that a user selected (note that multiple selection actions can take place for a single flow)

I'd make a pass over the DB to identify the last RightsSelection for each flow_initialized value (if any), to figure out what an uploading user settled on during a particular flow. I'd also look at how many selections a user makes per flow, to see evidence of confusion & indecisiveness, or maybe just exploration of the UI.
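
The pass over the DB described above could look roughly like the following. This is only a minimal Python sketch under stated assumptions: the `logged_at` ordering key, the `summarize_flows` helper, and the sample rows are all hypothetical and are not part of the schemas above.

```python
from collections import defaultdict

# Hypothetical UploadWizardRightsSelection rows; all values are made up.
# "logged_at" is an assumed per-event ordering key, not a schema field.
events = [
    {"user_id": 1, "flow_initialized": "20140514055800", "rights_selected": "own", "logged_at": 1},
    {"user_id": 1, "flow_initialized": "20140514055800", "rights_selected": "other", "logged_at": 2},
    {"user_id": 1, "flow_initialized": "20140514055800", "rights_selected": "own", "logged_at": 3},
    {"user_id": 2, "flow_initialized": "20140514060100", "rights_selected": "other", "logged_at": 1},
]


def summarize_flows(rows):
    """Map (user_id, flow_initialized) to (final selection, selection count)."""
    by_flow = defaultdict(list)
    for row in rows:
        by_flow[(row["user_id"], row["flow_initialized"])].append(row)

    summary = {}
    for flow, flow_rows in by_flow.items():
        # The last event in a flow is what the user settled on.
        ordered = sorted(flow_rows, key=lambda r: r["logged_at"])
        summary[flow] = (ordered[-1]["rights_selected"], len(ordered))
    return summary


for flow, (final, count) in summarize_flows(events).items():
    print(flow, final, count)
```

A selection count above 1 for a flow is the signal for possible confusion or exploration; flows with no RightsSelection rows simply do not appear in the summary.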

Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend.

(will need to be stored in a cookie and reset at loads of step 1)

We don't even need this part since UploadWizard is a single-page application with no page load between the steps, so we can just store the token in memory.
I don't want to log user IDs unless we really need them, so I'll just go with initial timestamp + random number. I don't think connecting separate upload attempts by the same user is particularly useful at this point.
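
To make the "initial timestamp + random number" idea concrete, here is a rough sketch. It is written in Python only to show the shape of the token; the real thing would live in UploadWizard's client-side code. The `new_flow_token` name, the millisecond resolution, and the 32-bit random range are all assumptions.

```python
import secrets
import time


def new_flow_token():
    """Build a per-flow token from the flow start time plus a random number."""
    started_ms = int(time.time() * 1000)  # when this flow began
    nonce = secrets.randbelow(2**32)      # distinguishes flows started at the same instant
    return f"{started_ms}-{nonce}"


# Created once when step 1 starts, kept in memory, and attached to every
# event logged during that flow.
flow_token = new_flow_token()
print(flow_token)
```

Since the token carries no user identifier, events from the same flow can be grouped, but separate upload attempts by one user cannot be linked.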