Hi all,

a little more detail from the funnel analysis of UploadWizard (if you haven't been following the other funnel thread, [[mw:UploadWizard/Funnel_analysis]] has a quick summary).

Users repeat the upload process many times

The main thing I am trying to understand at this point is why people use the "upload another file" button so much. UploadWizard allows uploading up to 50 files at the same time, which should be more than enough for the average user. Yet our click-tracking data shows that most people click through the tutorial-file-deed-details-thanks screens, then click on the upload more button (which effectively resets the process and starts again from the file screen), then click through the screens again, and repeat this cycle over and over. (Doing this fifty times in a row is not uncommon.) This suggests some fundamental failing in UW; Sage suggested it is the instability of uploading more than a few files at the same time. I wonder if others have relevant experience?

Errors do not seem to be the main problem

I have tried to identify the reason for failed UploadWizard sessions (a series of UploadWizard events logged on the same page which are not terminated by reaching the thanks page) by checking what the last event was, and assuming that for failed sessions caused by errors, that error would be the last event. Assuming this is sound, errors do not seem to be the main problem - they only appear at the end of ~25% of the failed sessions (which is ~8% of the total sessions).
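
The last-event heuristic above can be sketched roughly as follows. This is only an illustration, not the script actually used for the analysis: the event tuples, the event names "thanks" and "error", and the helper classify_sessions are all assumptions made for the example. It also assumes a session counts as successful if it reaches the thanks page at any point.

```python
from collections import Counter

# Hypothetical event format: (session_id, timestamp, event_name).
# The event names "thanks" and "error" are assumptions for illustration.


def classify_sessions(events):
    """Group events by session and count sessions per outcome.

    success           - the session reached the thanks page
    failed_with_error - never reached thanks, last event was an error
    failed_other      - never reached thanks, last event was not an error
    """
    sessions = {}
    for session_id, timestamp, name in events:
        sessions.setdefault(session_id, []).append((timestamp, name))

    outcomes = Counter()
    for session_events in sessions.values():
        session_events.sort()  # order events by timestamp
        names = [name for _, name in session_events]
        if "thanks" in names:
            outcomes["success"] += 1
        elif names[-1] == "error":
            outcomes["failed_with_error"] += 1
        else:
            outcomes["failed_other"] += 1
    return outcomes
```

The share of failed sessions that end in an error is then failed_with_error divided by the sum of the two failed counts.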

Top errors

That said, here is a list of error codes (these are mostly API error codes, but a few are internal to UploadWizard) sorted by frequency, collected over ~1000 sessions:

| filename             |    20 |
| badtoken             |    19 |
| missingresult        |    14 |
| title                |    13 |
| publishfailed        |    11 |
| stasherror           |     7 |
| server-error         |     3 |
| fileexists-forbidden |     2 |
| filetype-banned-type |     1 |
| unknown              |     1 |
| verification-error   |     1 |
| unknownerror         |     1 |

A little explanation about the more frequent ones:
  • filename: these seem to be user errors - most often invalid filetype (doc, bmp etc), sometimes no extension at all or trying to add the same file twice.
  • badtoken: some sort of CSRF token expiration; bug 69691
  • missingresult: returned by the upload API in the details step when the uploaded file has gone missing; bug 43967
  • title: an error about duplicate files (i.e. the same file already exists on Commons) that somehow happens in the details step instead of the file step.
  • publishfailed: this seems to be some sort of race condition: the first API call to publish a file from stash puts it into the job queue and sets its status to pending; a second call will then throw this error.
  • stasherror: could be lots of things. bug 56302, bug 54028 and more.

Some suggestions based on the findings so far

Quick wins:
  • review UX for "fatal user errors" (i.e. when UploadWizard says "you can't upload this file type") - is the error message helpful?
  • review and improve API error messages (api-error-*), possibly override them with UW-specific ones. Do they identify next steps? Do they even exist? (e.g. api-error-publishfailed does not.)
  • renew token on badtoken error (bug 69691)
  • make sure that the specific error message thrown by ApiUpload::dieUsage gets logged somewhere. Currently we only log a generic message derived from the API error code, so e.g. all the dozen different UploadStashException subclasses are reported with the same message.
  • poll for success on publishfailed error (contrary to what its name suggests, it actually seems to mean something like "publish in progress")
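
The two recovery ideas in the list above (renewing the token on badtoken, polling on publishfailed) could look roughly like the sketch below. It is written in Python purely for illustration; UploadWizard itself is not Python. ApiError, publish_with_recovery and the three callables passed in are hypothetical stand-ins for the real API requests, and the "pending" status value is an assumption based on the description of publishfailed above.

```python
import time


class ApiError(Exception):
    """Hypothetical exception carrying an API error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def publish_with_recovery(publish, renew_token, check_status, token,
                          max_polls=5, poll_interval=1.0, sleep=time.sleep):
    """Publish once, recovering from badtoken and publishfailed.

    publish(token), renew_token() and check_status() are caller-supplied
    callables standing in for the real API requests (assumptions).
    """
    try:
        return publish(token)
    except ApiError as err:
        if err.code == "badtoken":
            # Token expired: fetch a fresh one and retry exactly once.
            return publish(renew_token())
        if err.code == "publishfailed":
            # Assumed to mean "publish still in progress": poll for the result.
            for _ in range(max_polls):
                status = check_status()
                if status != "pending":
                    return status
                sleep(poll_interval)
        # Unknown error, or still pending after all polls: re-raise.
        raise
```

The retry on badtoken is deliberately limited to one attempt, so a token that keeps being rejected surfaces as an error instead of looping forever.
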
Medium wins:
  • understand better why people repeat the upload process so often. This might reveal serious UX deficiencies or functional errors (e.g. in an older thread about funnel analysis, Sage claims uploading more than three files at the same time is too unreliable for him).
  • investigate if there is a low-effort way to recover entered details when the upload process has to be restarted. (There are drop-in solutions like garlic.js or sisyphus.js but the very dynamic nature of UW forms might be a problem.)
  • figure out why some title errors are only reported in the details step
  • log information about uploaded files to better identify size- or filetype-specific issues
Bigger / longer-term effort:
  • figure out a way to retry when the user already entered all the details but publishing the file failed. (This points towards the per-file-workflow-instead-of-global-workflow direction.)
  • make stashed / async uploads rely on the database instead of the session (bug 43967)