Hi all,
a little more detail from the funnel analysis of UploadWizard (if you
haven't been following the other funnel thread,
[[mw:UploadWizard/Funnel_analysis]]
<https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis> has a quick
summary).
*Users repeat the upload process many times*
The main thing I am trying to understand at this point is why people use
the "upload another file" button so much. UploadWizard allows uploading up
to 50 files at the same time, which should be more than enough for the
average user, but our click-tracking data shows that most people click
through the tutorial-file-deed-details-thanks screens, then click on the
upload more button (which effectively resets the process and starts again
from the file screen), then click through the screens again, then click on
the upload more button again, then do the same again, and again, and again.
(Doing this fifty times in a row is not uncommon.) This suggests some
fundamental failing in UW - Sage suggested it is the instability of
uploading more than a few files at the same time. I wonder if others have
relevant experience?
*Errors do not seem to be the main problem*
I have tried to identify the reason for failed UploadWizard sessions (a
series of UploadWizard events logged on the same page which are not
terminated by reaching the thanks page) by checking what the last event
was, and assuming that for failed sessions caused by errors, that error
would be the last event. Assuming this is sound, errors do not seem to be
the main problem - they only appear at the end of ~25% of the failed
sessions (which is ~8% of the total sessions).
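For anyone who wants to check the reasoning, the last-event classification can be sketched roughly like this in Python. This is only an illustration, not the script that produced the numbers above: the (session_id, event_name) layout and the event names are assumptions rather than the real logging schema, and a session counts as successful if it reached the thanks page at any point.

```python
from collections import Counter

# Hypothetical input: (session_id, event_name) pairs in chronological order.
# The event names ("thanks", "error", ...) are assumptions for illustration.
def classify_sessions(events):
    """Label each session by whether it reached 'thanks' and how it ended."""
    last_event = {}
    reached_thanks = set()
    for session_id, name in events:
        last_event[session_id] = name  # later events overwrite earlier ones
        if name == "thanks":
            reached_thanks.add(session_id)

    outcome = Counter()
    for session_id, name in last_event.items():
        if session_id in reached_thanks:
            outcome["success"] += 1
        elif name == "error":
            outcome["failed_after_error"] += 1  # error was the last event
        else:
            outcome["failed_other"] += 1  # abandoned without a trailing error
    return outcome

events = [
    ("a", "tutorial"), ("a", "file"), ("a", "deeds"), ("a", "details"), ("a", "thanks"),
    ("b", "tutorial"), ("b", "file"), ("b", "error"),
    ("c", "tutorial"), ("c", "file"), ("c", "deeds"),
    ("d", "file"), ("d", "error"), ("d", "file"),
]
outcome = classify_sessions(events)
```

Note that session "d" hits an error but carries on afterwards, so it is not counted as failed because of an error; this is exactly the assumption whose soundness is in question.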
*Top errors*
That said, here is a list of error codes (these are mostly API error codes,
but a few are internal to UploadWizard) sorted by frequency, collected over
~1000 sessions:
| filename | 20 |
| badtoken | 19 |
| missingresult | 14 |
| title | 13 |
| publishfailed | 11 |
| stasherror | 7 |
| server-error | 3 |
| fileexists-forbidden | 2 |
| filetype-banned-type | 1 |
| unknown | 1 |
| verification-error | 1 |
| unknownerror | 1 |
A little explanation about the more frequent ones:
- filename: these seem to be user errors - most often invalid filetype
(doc, bmp etc), sometimes no extension at all or trying to add the same
file twice.
- badtoken: some sort of CSRF token expiration; bug 69691
<https://bugzilla.wikimedia.org/show_bug.cgi?id=69691>
- missingresult: returned by the upload API in the details step when the
uploaded file has gone missing; bug 43967
<https://bugzilla.wikimedia.org/show_bug.cgi?id=43967>
- title: an error about duplicate files (i.e. the same file already
exists on Commons) that somehow happens in the details step instead of the
file step.
- publishfailed: this seems to be some sort of race condition: the first API
call to publish a file from stash puts it into the job queue and sets its
status to pending; a second call will then throw this error.
- stasherror: could be lots of things. bug 56302
<https://bugzilla.wikimedia.org/show_bug.cgi?id=56302>, bug 54028
<https://bugzilla.wikimedia.org/show_bug.cgi?id=54028> and more.
*Some suggestions based on the findings so far*
Quick wins:
- review UX for "fatal user errors" (i.e. when UploadWizard says "you
can't upload this file type") - is the error message helpful?
- review and improve API error messages (api-error-*), possibly override
them with UW-specific ones. Do they identify next steps? Do they even
exist? (E.g. api-error-publishfailed does not.)
- renew token on badtoken error (bug 69691
<https://bugzilla.wikimedia.org/show_bug.cgi?id=69691>)
- make sure that the specific error message thrown by
ApiUpload::dieUsage gets logged somewhere. Currently we only log a generic
message derived from the API error code, so e.g. all the dozen different
UploadStashException subclasses are reported with the same message.
- poll for success on publishfailed error (despite what its name suggests,
it actually seems to mean something like "publish in progress")
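To make the badtoken and publishfailed items a bit more concrete, here is a rough sketch of the two retry strategies in Python. It is not UploadWizard code: ApiError, the callables passed in (action, fetch_token, publish, check_status) and the "pending" status string are all hypothetical stand-ins, and it assumes publishfailed really does mean that the publish job is still in progress.

```python
import time

class ApiError(Exception):
    """Hypothetical exception carrying an API error code such as 'badtoken'."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def call_with_token_renewal(action, fetch_token, max_renewals=1):
    """Run action(token); on 'badtoken', fetch a fresh token and retry."""
    token = fetch_token()
    renewals = 0
    while True:
        try:
            return action(token)
        except ApiError as err:
            # Any other error, or too many renewals, is passed on to the caller.
            if err.code != "badtoken" or renewals >= max_renewals:
                raise
            token = fetch_token()
            renewals += 1

def publish_with_polling(publish, check_status, attempts=5, delay=1.0,
                         sleep=time.sleep):
    """Treat 'publishfailed' as 'publish in progress' and poll for the result."""
    try:
        return publish()
    except ApiError as err:
        if err.code != "publishfailed":
            raise
    for _ in range(attempts):
        status = check_status()
        if status != "pending":
            return status
        sleep(delay)
    raise TimeoutError("publish still pending after polling")

# Stub demonstration; no real API is contacted.
tokens = iter(["stale", "fresh"])
def fake_action(token):
    if token == "stale":
        raise ApiError("badtoken")
    return "uploaded with " + token
result = call_with_token_renewal(fake_action, lambda: next(tokens))

statuses = iter(["pending", "pending", "success"])
def fake_publish():
    raise ApiError("publishfailed")
publish_result = publish_with_polling(
    fake_publish, lambda: next(statuses), sleep=lambda seconds: None)
```

Capping the number of renewals and polling attempts keeps a persistent failure from turning into an endless loop; the limits used here are arbitrary.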
Medium wins:
- understand better why people repeat the upload process so often. This
might reveal serious UX deficiencies or functional errors (e.g. in an older
thread about funnel analysis, Sage claims uploading more than three files
at the same time is too unreliable for him).
- investigate if there is a low-effort way to recover entered details
when the upload process has to be restarted. (There are drop-in solutions
like garlic.js <http://garlicjs.org/> or sisyphus.js
<https://github.com/simsalabim/sisyphus> but the very dynamic nature of
UW forms might be a problem.)
- figure out why some title errors are only reported in the details step
- log information
<https://meta.wikimedia.org/wiki/Schema:UploadWizardFlowEvent> about
uploaded files to better identify size- or filetype-specific issues
Bigger / longer-term effort:
- figure out a way to retry when the user already entered all the
details but publishing the file failed. (This points towards the
per-file-workflow-instead-of-global-workflow direction.)
- make stashed / async uploads rely on the database instead of the
session (bug 43967 <https://bugzilla.wikimedia.org/show_bug.cgi?id=43967>)