Hi all,
we have recently added some funnel [1] logging to UploadWizard. A nice dashboard is in the works, but here are some preliminary results, showing the number of virtual pageviews for each step of UploadWizard.
mysql:research@s1-analytics-slave.eqiad.wmnet [log]> select event_step, count(*), count(*)/3623 as survival_rate from UploadWizardStep_8612364 group by event_step order by survival_rate desc;
+------------+----------+---------------+
| event_step | count(*) | survival_rate |
+------------+----------+---------------+
| tutorial   |     3623 |        1.0000 |
| file       |     3496 |        0.9649 |
| deeds      |     2433 |        0.6715 |
| details    |     2373 |        0.6550 |
| thanks     |     2109 |        0.5821 |
+------------+----------+---------------+
This is based on about a day's worth of logs (25.5 hours) - the logging code was deployed to Commons yesterday.
The big drop is apparently in the file upload step (almost 30%, well over 1000 uploads a day). Some of that might be intentional (uploads caught by the badtitle filter, etc.), but even so the drop is huge. Given that this step is rather simple from a UX point of view, it seems that upload bugs are a bigger problem right now than design issues. (The license selection step, deeds -> details, is on the other hand unexpectedly unproblematic; I would have expected it to be the main source of confusion, but adding descriptions etc. actually seems worse.)
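To make the per-step numbers explicit, here is a small Python sketch that recomputes them from the counts in the table above. Only the counts come from the query output; the rest is plain arithmetic. It shows that the file -> deeds drop is about 29.3% of all sessions (1063 of 3623), or about 30.4% of the sessions that reached the file step, while details -> thanks loses about 11.1% of the sessions that reach details (the drop column reads 3.5%, 30.4%, 2.5% and 11.1% for file, deeds, details and thanks).

```python
# Step counts from the UploadWizardStep query above (about 25.5 hours of logs).
FUNNEL = [
    ("tutorial", 3623),
    ("file", 3496),
    ("deeds", 2433),
    ("details", 2373),
    ("thanks", 2109),
]


def funnel_report(steps):
    """Return (step, count, survival vs. first step, drop vs. previous step) rows."""
    first = steps[0][1]
    rows = []
    prev = None
    for name, count in steps:
        survival = count / first
        # Share of the previous step's sessions that did not reach this step.
        drop = None if prev is None else (prev - count) / prev
        rows.append((name, count, survival, drop))
        prev = count
    return rows


for name, count, survival, drop in funnel_report(FUNNEL):
    drop_text = "-" if drop is None else f"{drop:.1%}"
    print(f"{name:<9} {count:>5} {survival:.4f} {drop_text:>6}")
```

The "survival" column is what the SQL query computes; the step-to-step "drop" column is often the more useful one for spotting where a specific step fails.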
The next step would be to log JS/upload errors, I suppose. It would also be nice to know which drop-offs are final and which are reloads/restarts. The Navigation Timing API can distinguish reloads from normal navigation; alternatively, we could group by IP + user agent + time bucket to find retries.
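A rough Python sketch of the IP + user agent + time bucket idea. The StepEvent fields, the one-hour bucket and treating a "tutorial" event as the start of a new attempt are all assumptions for illustration; the real schema may not store raw IPs at all. Within each group, an unfinished attempt that is followed by another attempt counts as a retry, and the last unfinished one as a final drop-off.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepEvent:
    # Hypothetical event shape; the real EventLogging schema may differ.
    ip: str
    user_agent: str
    timestamp: int  # seconds since the epoch
    step: str       # "tutorial", "file", "deeds", "details" or "thanks"


def group_by_visitor(events, bucket_seconds=3600):
    """Group events by (IP, user agent, time bucket) as a rough proxy for one person."""
    groups = defaultdict(list)
    for event in events:
        key = (event.ip, event.user_agent, event.timestamp // bucket_seconds)
        groups[key].append(event)
    return groups


def classify_attempts(events, bucket_seconds=3600):
    """Count completed attempts, abandoned-then-retried attempts and final drop-offs."""
    counts = {"completed": 0, "retried": 0, "final": 0}
    for group in group_by_visitor(events, bucket_seconds).values():
        group.sort(key=lambda e: e.timestamp)
        attempts = []
        for event in group:
            # Assumption: each attempt starts with a "tutorial" event.
            if event.step == "tutorial" or not attempts:
                attempts.append([])
            attempts[-1].append(event.step)
        for i, steps in enumerate(attempts):
            if "thanks" in steps:
                counts["completed"] += 1
            elif i < len(attempts) - 1:
                counts["retried"] += 1  # the same visitor started again later
            else:
                counts["final"] += 1
    return counts


# Made-up events: one visitor abandons, retries and finishes; another gives up.
events = [
    StepEvent("192.0.2.1", "Firefox", 100, "tutorial"),
    StepEvent("192.0.2.1", "Firefox", 130, "file"),
    StepEvent("192.0.2.1", "Firefox", 400, "tutorial"),
    StepEvent("192.0.2.1", "Firefox", 420, "file"),
    StepEvent("192.0.2.1", "Firefox", 500, "deeds"),
    StepEvent("192.0.2.1", "Firefox", 600, "details"),
    StepEvent("192.0.2.1", "Firefox", 700, "thanks"),
    StepEvent("192.0.2.2", "Chrome", 100, "tutorial"),
    StepEvent("192.0.2.2", "Chrome", 150, "file"),
]
print(classify_attempts(events))
```

Fixed buckets split a retry that straddles a bucket boundary, and shared IPs (NAT, proxies) merge different people, so the counts are only approximate; a sliding window per (IP, user agent) would handle the first problem better.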
Thanks, Gergo!
This is really helpful data, which will inform our development plans for Upload Wizard.
Identifying the file upload step as the main pain point is invaluable. It would be great if we could track the upload error messages as you suggest, so we can better understand what’s holding users back on this step. Also, do we have a way of finding out how often an upload hangs because of a server-side issue, rather than a user issue? Should we create tickets for these new tasks?
It would also be great if we could review this funnel data again in a few days, so we can see if this pattern is steady or if it varies over time. Can we track whether the drop-off is caused by people uploading multiple files? Or large files? We might also want to look at whether certain browsers or platforms are experiencing more issues than others, or whether casual users are dropping off more than experienced users.
In any case, this is absolutely wonderful. Thank you so much for shedding more light on this important process, in the middle of all your other responsibilities :)
To be continued,
Fabrice
On Jun 4, 2014, at 12:55 PM, Gergo Tisza gtisza@wikimedia.org wrote:
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia
_______________________________
Fabrice Florin Product Manager Wikimedia Foundation
On Wed, Jun 4, 2014 at 1:19 PM, Fabrice Florin fflorin@wikimedia.org wrote:
Also, do we have a way of finding out how often an upload hangs because of a server-side issue, rather than a user issue? Should we create tickets for these new tasks?
Logging server-side errors should be relatively easy, since the API returns an error status and the client-side code can be prepared for that. I might include it in #541 https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/541. Client-side errors are harder to catch. #127 https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/127 will help identify very frequent errors; I'm not sure whether we want to deal with JS errors specifically in the funnel logging logic.
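A sketch of how the outcome of each upload API call could be bucketed before it is logged, written in Python for brevity (UploadWizard itself is JavaScript). It assumes the usual MediaWiki API response shapes: failures carry an `error` object with a `code`, and `action=upload` reports `upload.result` (for example `Success` or `Warning`, with a `warnings` object). The category names are made up for this example.

```python
import json


def classify_upload_response(status, body):
    """Return a (category, detail) pair for one upload API response."""
    if status != 200:
        return ("http", str(status))          # proxy/server failure before the API answered
    try:
        data = json.loads(body)
    except ValueError:                        # json.JSONDecodeError subclasses ValueError
        return ("invalid-json", None)         # e.g. an HTML error page
    if not isinstance(data, dict):
        return ("unexpected", None)
    if "error" in data:
        return ("api-error", data["error"].get("code", "unknown"))
    upload = data.get("upload", {})
    result = upload.get("result")
    if result == "Warning":
        # Join warning names in a stable order so they aggregate well in logs.
        return ("warning", ",".join(sorted(upload.get("warnings", {}))))
    if result is None:
        return ("unexpected", None)
    return ("ok", result)
```

Separating HTTP-level failures from API errors is the point of the design: the former suggest infrastructure problems, the latter are reported by MediaWiki itself and carry a code that can be counted directly.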
It would also be great if we could review this funnel data again in a few days, so we can see if this pattern is steady or if it varies over time. Can we track whether the drop-off is caused by people uploading multiple files?
Yes, logging the number of files is part of #541.
or large files?
More problematic, since we only learn the file size if the upload succeeds. I will see if there is an easy way to determine whether an upload failure was caused by a large file (we can probably dig out the number of chunks from a lower layer of UW).
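If the number of chunks can be recovered for both successful and failed uploads, a failure rate per chunk count would show whether failures grow with file size. A minimal Python sketch, assuming a hypothetical list of (number of chunks, succeeded) pairs; translating chunk counts into megabytes would need UploadWizard's actual chunk size, which is not assumed here.

```python
from collections import Counter


def failure_rate_by_chunks(uploads):
    """uploads: iterable of (num_chunks, succeeded) pairs -> {num_chunks: failure rate}."""
    totals, failures = Counter(), Counter()
    for num_chunks, succeeded in uploads:
        totals[num_chunks] += 1
        if not succeeded:
            failures[num_chunks] += 1
    # Sorted by chunk count, so a rising trend with file size is easy to read off.
    return {n: failures[n] / totals[n] for n in sorted(totals)}
```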
We might also want to look at whether certain browsers or platforms are experiencing more issues than others, or whether casual users are dropping off more than experienced users.
We log user agents, but nothing about the users themselves (for privacy reasons). If you want to differentiate between user cohorts, you should come up with an exact definition, and we can add that to the logs.
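Since user agents are already logged, a per-browser completion rate is the cheapest of these breakdowns. A Python sketch with a deliberately crude user-agent bucketing (a real analysis should use a proper UA parser); the (user_agent, reached_thanks) input shape is an assumption.

```python
from collections import defaultdict


def browser_family(user_agent):
    """Very rough user-agent bucketing, for illustration only."""
    ua = user_agent.lower()
    if "msie" in ua or "trident" in ua:
        return "IE"
    if "firefox" in ua:
        return "Firefox"
    if "chrome" in ua:  # checked before Safari: Chrome UAs also contain "Safari"
        return "Chrome"
    if "safari" in ua:
        return "Safari"
    return "Other"


def completion_by_browser(sessions):
    """sessions: iterable of (user_agent, reached_thanks) -> {family: (completed, total)}."""
    stats = defaultdict(lambda: [0, 0])
    for user_agent, reached_thanks in sessions:
        entry = stats[browser_family(user_agent)]
        entry[1] += 1
        if reached_thanks:
            entry[0] += 1
    return {family: tuple(counts) for family, counts in stats.items()}
```

A user cohort breakdown would work the same way once a cohort definition exists; only the grouping function changes.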
I remember from observations that one of the biggest problems on the file upload page (especially with multiple files) is that people are not able to find the 'Next' button, because it is below the fold and sometimes even beyond the right edge of the page...
This might be something to check with measurements. The other case we had there is that people do find the Next button, but after they press it, the script crashes or becomes extremely slow. So a two-step measurement might be worthwhile there.
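The two-step measurement could log one event when Next is clicked and another when the next step is actually shown. A Python sketch of the resulting classification; the `next_clicked` / `next_step_shown` field names and the 10-second "slow" threshold are hypothetical.

```python
def next_button_outcomes(sessions, slow_threshold_s=10.0):
    """Classify each session's 'Next' interaction on the file step.

    sessions: iterable of dicts with optional 'next_clicked' and 'next_step_shown'
    timestamps in seconds (hypothetical field names).
    """
    outcomes = {"never_clicked": 0, "stuck_after_click": 0, "slow": 0, "ok": 0}
    for session in sessions:
        clicked = session.get("next_clicked")
        shown = session.get("next_step_shown")
        if clicked is None:
            outcomes["never_clicked"] += 1      # button not found, or user left
        elif shown is None:
            outcomes["stuck_after_click"] += 1  # crash or hang after clicking
        elif shown - clicked > slow_threshold_s:
            outcomes["slow"] += 1
        else:
            outcomes["ok"] += 1
    return outcomes
```

"never_clicked" alone cannot tell a hidden button from a deliberate exit, but a large "stuck_after_click" count would point squarely at the script rather than the layout.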
On Wed, Jun 4, 2014 at 9:55 PM, Gergo Tisza gtisza@wikimedia.org wrote:
Gergo Tisza, 04/06/2014 21:55:
The big drop is apparently in the file upload step (almost 30% - well over 1000 uploads a day). Some of that might be intentional (upload caught by badtitle filter etc), but even so the drop is huge. Given that that step is rather simple from a UX point of view, it seems that upload bugs are a bigger problem right now than design issues.
It's indeed interesting that we don't lose many people at the tutorial and release steps, but in my opinion the worst drop is from "details" to "thanks": if 9% of those who got that far in the process are lost, we're wasting a lot of people's time and something is really wrong. (Possibly with the backend, or the usual problems with Internet Explorer.) The drop from "file" to "deeds" may have the same cause, or may even be normal, i.e. people looking at the interface out of curiosity without actually wanting to upload anything. How can we filter those out? Maybe by excluding from the count the users who left the page on purpose or a few seconds after loading it?
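One way to test the "curiosity" hypothesis is to drop sessions that left the file step within a few seconds without selecting a file, and see how much the completion rate moves. A Python sketch; the session fields and the 5-second cutoff are assumptions, not fields the current logging is known to record.

```python
def filter_curious_visitors(sessions, min_seconds=5.0):
    """Drop sessions that left the file step quickly without ever selecting a file."""
    kept = []
    for session in sessions:
        quick_exit = (session["seconds_on_file_step"] < min_seconds
                      and not session["selected_file"])
        if not quick_exit:
            kept.append(session)
    return kept


def completion_rate(sessions):
    """Fraction of sessions that reached the "thanks" step."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["reached_thanks"]) / len(sessions)


# Made-up sessions: two quick look-and-leave visits, two real attempts.
sessions = [
    {"seconds_on_file_step": 2.0, "selected_file": False, "reached_thanks": False},
    {"seconds_on_file_step": 3.0, "selected_file": False, "reached_thanks": False},
    {"seconds_on_file_step": 40.0, "selected_file": True, "reached_thanks": True},
    {"seconds_on_file_step": 90.0, "selected_file": True, "reached_thanks": False},
]
print(completion_rate(sessions), completion_rate(filter_curious_visitors(sessions)))
```

Requiring both conditions (short dwell time and no file selected) avoids discarding people who picked a file and then hit an immediate error.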
Nemo
On Thu, Jun 5, 2014 at 10:43 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
It's indeed interesting that we don't lose many people with the tutorial and release step, but the worst drop in my opinion is from "details" to "thanks": if 9% of those who went so far in the process get lost, we're wasting a lot of people's time and something is really wrong. (Possibly with the backend, or the usual problems with Internet Explorer.)
This is what I find most frustrating with UW. My uploads frequently fail to complete *after* I press submit with all the details. This generally happens with batch uploads (meaning that, in many cases, I have wasted a *lot* of time entering all those details). There is some issue with publishing the details where a file stalls after it has been published but before its publication is confirmed; since it only works on three files at a time, if this happens more than three times in a large batch, none of the subsequent files get published. I've taken to limiting my uploads to ~5 files at a time because of this, or in some cases using the far more reliable Commonist for batch uploads. (It's not an IE issue for me, although it may be a Linux issue.)
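The failure mode described here is typical of a fixed-size work queue without timeouts: once every slot holds a stalled request, nothing queued behind them ever runs. The Python asyncio model below is not UploadWizard's code (which is JavaScript); it assumes three publish slots and shows how a per-file timeout frees a stalled slot so later files still get published. The delays and the 0.2 s timeout are arbitrary.

```python
import asyncio


async def publish(name, delay):
    """Stand-in for one publish request; a very long delay models a stalled request."""
    await asyncio.sleep(delay)
    return "published"


async def publish_all(files, limit=3, timeout=None):
    """Publish (name, delay) pairs with at most `limit` requests in flight.

    With timeout=None, `limit` stalled requests hold every slot and nothing queued
    behind them ever runs. With a timeout, a stalled request gives up its slot.
    """
    slots = asyncio.Semaphore(limit)
    results = {}

    async def worker(name, delay):
        async with slots:
            try:
                results[name] = await asyncio.wait_for(publish(name, delay), timeout)
            except asyncio.TimeoutError:
                results[name] = "timed out"  # report to the user instead of hanging

    await asyncio.gather(*(worker(name, delay) for name, delay in files))
    return results


# Three stalled files followed by two quick ones, with a 0.2 s per-file timeout.
files = [("a.jpg", 3600), ("b.jpg", 3600), ("c.jpg", 3600), ("d.jpg", 0.01), ("e.jpg", 0.01)]
results = asyncio.run(publish_all(files, limit=3, timeout=0.2))
print(results)
```

Running the same list with `timeout=None` would never finish, which matches the symptom of a batch that silently stops publishing.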
-Sage
Gergo Tisza, 04/06/2014 21:55:
There are also users claiming that uploads of any file above 12 MB regularly fail for them: https://commons.wikimedia.org/?oldid=126801967#uploading_files_with_sizes_.3... Presumably these are failures related to the upload stash? It would be nice to at least have a semi-specific bug report acknowledging the failures, which users could follow for updates and use to report additional data points. We would then have something to link to in response to such reports.
Nemo
Hi all,
On Wed, Jun 4, 2014 at 9:55 PM, Gergo Tisza gtisza@wikimedia.org wrote:
we have recently added some funnel [1] logging to UploadWizard. A nice dashboard is in the works, but here are some preliminary results, showing the number of virtual pageviews for each step of UploadWizard.
This took a while (I wonder what happened ;), but a dashboard is now available: http://multimedia-metrics.wmflabs.org/dashboards/uw
We also collected information about API errors; more details about that and the funnel analysis in general are available at https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis
The situation hasn't changed since June: upload attempts with Upload Wizard fail with an embarrassingly high frequency, with about one in three attempts being abandoned by the user.
The data is too voluminous and too complex to just look at and see what the problem is (also, maybe we are not even collecting the right kind of data), so we need to come up with hypotheses about the problem(s) and test them. Since you are the people with the most experience with UploadWizard problems, your suggestions would be invaluable.
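Once a hypothesis is phrased as "group A completes less often than group B" (e.g. multi-file vs. single-file sessions, or one browser vs. the rest), a two-proportion z-test is a simple first check. A Python sketch using the pooled normal approximation, which is reasonable when each group has at least a few dozen completions and abandonments; the group counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two completion rates.

    Returns (z, p_value) using the pooled-proportion normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups all-success or all-failure: nothing to compare
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: completions / sessions for single-file vs. multi-file uploads.
z, p = two_proportion_z_test(600, 1000, 450, 1000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With many hypotheses tested on the same data, some will look significant by chance, so it is worth re-checking any hit on a fresh week of logs.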
On Thu, Sep 25, 2014 at 10:35 AM, Gergo Tisza gtisza@wikimedia.org wrote:
a dashboard is available now: http://multimedia-metrics.wmflabs.org/dashboards/uw
Very nice.
The situation hasn't changed since June: upload attempts fail with an embarrassingly high frequency
I would guess a lot of this is due to "DOES NOT ACCEPT FILENAMES THAT END IN <foo>". This blocks many of my uploads (spreadsheets, biomolecules, map routes, 3D images). Others I first try to upload in an unaccepted format (e.g., odf, epub) before going back and transcoding them.
IMO we should accept all free-format files on upload, even if after acceptance we notify the uploader that they are held in a quarantine area for sanitizing/processing. If that is too hard, we could create a stub page on upload that points to an Internet Archive page where their file has been uploaded. Either would feel like partial success in using the upload form.
Sam
https://commons.wikimedia.org/wiki/Commons:File_types#Unsupported_file_types
On that note, it would be good to collect which unsupported extensions people are trying to upload. Statistics about that would tell us what new format support would have the biggest impact.
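Collecting these statistics mostly amounts to tallying the extension of every rejected filename. A Python sketch; the `ALLOWED` set is an illustrative subset only, not the actual Commons configuration (on a MediaWiki wiki the real list lives in `$wgFileExtensions`).

```python
import os
from collections import Counter

# Illustrative subset only; the authoritative list is the wiki's $wgFileExtensions.
ALLOWED = {"jpg", "jpeg", "png", "gif", "svg", "tif", "tiff", "pdf", "djvu",
           "ogg", "oga", "ogv", "webm", "flac"}


def extension_of(filename):
    """Lower-cased extension without the dot ('' if there is none)."""
    return os.path.splitext(filename)[1][1:].lower()


def unsupported_extension_counts(attempted_filenames, allowed=ALLOWED):
    """Count how often each non-allowed extension appears in attempted uploads."""
    counts = Counter()
    for name in attempted_filenames:
        ext = extension_of(name)
        if ext and ext not in allowed:
            counts[ext] += 1
    return counts


print(unsupported_extension_counts(["a.ODT", "b.jpg", "c.epub", "d.odt", "README"]).most_common())
```

`most_common()` gives the ranking directly, which is the number that would drive prioritisation of new format support.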
On Thu, Sep 25, 2014 at 11:25 PM, Samuel Klein meta.sj@gmail.com wrote:
Good idea, Gilles!
I just created a ticket for that data collection:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/920
This will help us prioritize where to focus our efforts in this space.
Another thing we could do is to provide more helpful error messages for unsupported file formats, so we can help users figure out how to transcode their files into supported formats.
But let’s first identify the file formats people try to upload the most, then we can prioritize the error message improvements.
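A more helpful error message could be driven by a small lookup table from rejected extension to a suggested target format. A Python sketch; the `SUGGESTED_TARGETS` table and the message wording are hypothetical and would need to be decided by the team (and checked against what Commons actually accepts).

```python
import os

# Hypothetical extension -> suggested target format table, for illustration only.
SUGGESTED_TARGETS = {
    "odt": "PDF",
    "docx": "PDF",
    "epub": "PDF",
    "mp4": "WebM",
    "mov": "WebM",
}


def unsupported_format_message(filename):
    """Build an error message that suggests a conversion when one is known."""
    ext = os.path.splitext(filename)[1][1:].lower()
    if not ext:
        return "This file has no extension, so its format cannot be determined."
    message = f'Files ending in ".{ext}" cannot be uploaded here.'
    target = SUGGESTED_TARGETS.get(ext)
    if target is not None:
        message += f" Try converting the file to {target} and uploading it again."
    return message
```

Keeping the suggestions in data rather than code means the table can grow in the order the extension statistics suggest.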
Sam, thanks for clarifying the use cases that cause problems for you, that is really helpful!
-f
On Sep 26, 2014, at 7:13 AM, Gilles Dubuc gilles@wikimedia.org wrote:
_______________________________
Fabrice Florin Product Manager, Multimedia Wikimedia Foundation
Gilles Dubuc, 26/09/2014 16:13:
On that note, it would be good to collect which unsupported extensions people are trying to upload. Statistics about that would tell us what new format support would have the biggest impact.
Or which file extensions correlate most with user confusion. :p https://xkcd.com/1301/ I'm not sure it can predict anything, but it would certainly be interesting; there may be some surprises.
Nemo