> On the topic of measuring detailed things from an end-user perspective
> (time it takes to display the blurred thumb, to hit next, etc.) I think
> that they're too complex to be worth doing at the moment, and we have
> nothing to compare them against.
I agree that "perceived performance", as the name suggests, is subjective,
but having a measure that helps us get an idea of whether efforts on that
front are working is useful. The time the user waits until seeing some
kind of progress (e.g. the blurry image appearing) is useful in that
context.
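Something like this minimal sketch could capture it (mw.track() is
MediaWiki's real analytics hook, but the topic name and hook points here
are made up):

    // Record the delay between the thumbnail click and the first visible
    // progress (the blurred placeholder appearing), then report it.
    declare const mw: { track(topic: string, data?: unknown): void };

    let clickedAt = 0;

    function onThumbnailClick(): void {
      clickedAt = performance.now();
    }

    function onBlurredThumbShown(): void {
      mw.track('mmv.perceived.firstProgress', {
        ms: performance.now() - clickedAt,
      });
    }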
> For example, the amount of time it takes to display the blurred image
> has no equivalent on the file page, so that figure can't really be used
> to determine success.
For the file page, it is true that the perceived performance will
approximate the real one.
Do we also include access to the original file (which some users open to
view the image in more detail) as part of what we compare with Media
Viewer? If that is the case, big images that are progressively loaded by
browsers would be more comparable, and it would be interesting to check
both times (showing something vs. showing the complete image) in Media
Viewer and outside of it.
Another interesting time to take into account is the time saved through
navigation controls. Having an idea of how much that feature is used could
give us an estimate of the time saved by the user (who currently has to go
back and forth, dealing with additional page loads or tab switching in the
browser).
Having said all that, I totally understand that measuring the real
performance is considered a higher priority than the perceived
performance, but measures to estimate the latter should not be overlooked.
Pau
On Tue, Mar 18, 2014 at 12:15 PM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
> Goal: The Media Viewer would be considered accepted if it can display
> 1-2 MB images in less than 3 seconds at least 80% of the time, during
> the course of a week.
The issue with that goal is that its performance is almost entirely out
of our hands. As seen in my preliminary analysis of the data (
https://www.mediawiki.org/wiki/Multimedia/Performance_Analysis ), some
places like Russia seem to have terrible performance relative to the
average internet speed in those countries, and there's nothing we (the
multimedia team) can do to change that. Chances are, if Media Viewer can't
display a 1-2 MB image in less than 3 seconds 80% of the time, neither can
the file page or any wiki page with an image of the same size on it,
because the issue is most likely the connectivity between users in those
countries and our servers, not the technique used to deliver the image (as
part of the page load or loaded by JS).
I think that the main issue with the goal options I've seen so far is
that they focus on the general performance of Media Viewer as an isolated
entity. The network performance tracking we've set up is good for
identifying issues on our end: for example, an API call that might be too
slow and that we could perhaps optimize, or the fact that we could patch
MediaWiki core to add thumb dimensions and display the thumb sooner. It
also helps us keep track, on an ongoing basis, of any ops issues that
might be affecting the product we're responsible for (Media Viewer). These
stats help make sure that we're doing the most we can to make things fast.
What these network performance stats aren't good for, though, is
determining whether Media Viewer is successful as a product, because the
performance of our servers, our CDNs and our networking infrastructure is
all bundled up in the same figure, indistinguishable from one another.
They don't tell us whether Media Viewer is good in the context of an
infrastructure that won't change overnight.
I think the only measure of success we can make in our realm is how
opening an image in Media Viewer compares to opening a file page. We're
not tracking that yet. The only way I can think of to do that on the
user's end is to load a file page in an invisible iframe and measure how
long it takes to load (and better yet, how long it takes for the image on
that file page to load too), then compare that to an image load in Media
Viewer. However, it's really challenging to measure, because we can't stop
the user from navigating images in Media Viewer while we attempt to
measure a file page in an iframe, and the navigating they do would trigger
requests that use up bandwidth, etc. Thus, I don't think we can get
pertinent figures collected directly from users that will tell us whether
Media Viewer is doing a good job in terms of performance, because there
would be too much noise in the data collection.
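To make the idea concrete, a minimal sketch of the hidden-iframe
measurement (illustrative only; a real version would also need same-origin
access to time the image element inside the frame separately, and would
still suffer from the noise described above):

    // Load a file page in an invisible iframe and time it. The 'load'
    // event fires once the page and its subresources, including the
    // full-size thumb, have arrived.
    function measureFilePageLoad(filePageUrl: string): Promise<number> {
      return new Promise((resolve) => {
        const start = performance.now();
        const iframe = document.createElement('iframe');
        iframe.style.display = 'none';
        iframe.src = filePageUrl;
        iframe.addEventListener('load', () => {
          resolve(performance.now() - start);
          iframe.remove();
        });
        document.body.appendChild(iframe);
      });
    }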
I think that automated testing is the way to go: we should package this
performance measurement (Media Viewer vs. file page) as a series of
browser tests and check the figures that way. Even better if they can run
on something like CloudBees, where there would be some latency between
where the tests run and our servers. Now, there are variables at play when
making a Media Viewer/file page comparison:
- Is the JS already cached? As Gergo mentioned, the JS being uncached
will happen the first time and then every 30 days or so, or whenever we
update Media Viewer (at most once a week, usually). I think we should
measure both variants (with the JS cached and with it uncached), to assess
how bad the effect of a cold cache is. There are a number of ways we could
address this issue, some more aggressive than others in terms of bandwidth
(e.g. preload the JS when the mouse cursor gets near a thumbnail, preload
the JS after the page load is done, etc.). This is worth measuring because
it's actionable. The reason we haven't taken those measures yet is that
they're a balancing act (wasting people's bandwidth vs. providing a faster
experience).
- What screen resolution are we testing against? The bigger the
resolution, the bigger the image, and the slower the image load. I
couldn't find any figures about the average desktop screen resolution of
people visiting our wikis. Maybe someone knows where to get that figure if
we have it? On that front we could either test the performance of the most
common resolutions, or test the performance of the average resolution.
- Varnish cache hit vs. varnish cache miss. We know that's a big slowdown
when it happens, and we know that it won't get solved for another few
months. That variable, however, also applies to file pages: the image on
the file page is a thumb too, and it can be a varnish miss as well. We
don't see it often because it stops as soon as one person (usually the
author) visits the file page. Media Viewer just increases the probability
of hitting a varnish cache miss because we have a few buckets instead of
the single size/bucket of the file page. I think this is an isolated
problem, and actually one that needs more serious math to measure the
effect of (see the sketch after this list). Why more serious math?
Because, for one, it depends on the distribution of desktop resolutions
among our visitors compared to the buckets we've picked. If, for example,
a given bucket size covers 80% of our visitors, then in 80% of the cases
the effect of varnish misses is exactly the same as on the file page. We
also have to consider whether it's worth spending time studying this issue
at all, knowing that a few months from now ops will have the disk capacity
that will allow us to pregenerate the bucket sizes we need, and knowing
that there's literally nothing we can do about it at this point, besides
reducing the number of buckets to reduce the likelihood of being the first
person to hit one. My recommendation for that issue is that we use the
technical performance data we're already collecting to determine what
percentage of image views are affected by it over time on wikis that have
signed up for the launch. Then we'll get an idea of how bad it really is
on a wiki where everyone has Media Viewer (because, by network effect, the
more people there are, the less likely you are to be the first person to
use Media Viewer on a given file). But it's not worth obsessing over right
now, because the low traffic of the test sites makes it happen to us a
whole lot more than it would in a context where every visitor has Media
Viewer.
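To make the bucket argument concrete, here is a back-of-the-envelope
sketch, under one reading of the point above. The bucket widths and
visitor shares are made-up numbers, not our real distribution: the share
of image views exposed to extra varnish misses is the share of visitors
whose bucket is not the one that regular traffic already keeps warm.

    // Hypothetical visitor share per thumbnail bucket width.
    const bucketShare: Record<number, number> = {
      1024: 0.8,
      1280: 0.15,
      1920: 0.05,
    };
    // Assume the dominant bucket stays warm, like the file page thumb.
    const warmBucket = 1024;

    const extraMissExposure = Object.entries(bucketShare)
      .filter(([width]) => Number(width) !== warmBucket)
      .reduce((sum, [, share]) => sum + share, 0);

    console.log(extraMissExposure); // ~0.2: only 20% of views see added risk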
So, once we've settled on how to handle the above variables, we can come
up with acceptance criteria for Media Viewer's performance, which could
look like:
- with a cold JS cache, at an average desktop resolution, with a varnish
hit, Media Viewer shows the image in at most 100% of the time it takes for
the file page to do the same
- with a warm JS cache, at an average desktop resolution, with a varnish
hit, Media Viewer shows the image in at most 75% of the time it takes for
the file page to do the same
- with a warm JS cache, at a large desktop resolution, with a varnish
hit, Media Viewer shows the image in at most 120% of the time it takes for
the file page to do the same
An added advantage of making this measurement automated is that it can
be baked in as a test success/failure criterion. So if we suddenly make a
code change that mistakenly makes the experience slower than our criteria
allow, the team would be notified automatically.
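A sketch of how those criteria could be expressed as an automated check
(the scenario table mirrors the list above; the two measurement functions
are placeholders for whatever the browser tests end up using):

    interface Scenario {
      jsCache: 'cold' | 'warm';
      resolution: 'average' | 'large';
      maxRatio: number; // allowed Media Viewer time / file page time
    }

    const scenarios: Scenario[] = [
      { jsCache: 'cold', resolution: 'average', maxRatio: 1.0 },
      { jsCache: 'warm', resolution: 'average', maxRatio: 0.75 },
      { jsCache: 'warm', resolution: 'large', maxRatio: 1.2 },
    ];

    async function checkAcceptance(
      measureMediaViewer: (s: Scenario) => Promise<number>,
      measureFilePage: (s: Scenario) => Promise<number>,
    ): Promise<void> {
      for (const s of scenarios) {
        const ratio =
          (await measureMediaViewer(s)) / (await measureFilePage(s));
        if (ratio > s.maxRatio) {
          // A CI runner would surface this as a test failure notification.
          throw new Error(
            `Too slow: ${ratio.toFixed(2)}x vs allowed ${s.maxRatio}x ` +
            `(${s.jsCache} JS cache, ${s.resolution} resolution)`,
          );
        }
      }
    }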
On the topic of measuring detailed things from an end-user perspective
(time it takes to display the blurred thumb, to hit next, etc.), I think
that they're too complex to be worth doing at the moment, and we have
nothing to compare them against. For example, the amount of time it takes
to display the blurred image has no equivalent on the file page, so that
figure can't really be used to determine success. Graphs of those figures
expressed in user-centric terms would be easier for outsiders to
understand, but in terms of troubleshooting technical issues they're no
better than the data we're already collecting. They're worse, in fact,
because any number of things could happen on the user's computer between
action A and action B (browsers freezing tabs comes to mind) that would
quickly render a lot of those virtual user-centric figures meaningless. I
think we should focus on what makes the core experience better, not spend
time building entertaining graphs.
On Tue, Mar 18, 2014 at 1:00 AM, Fabrice Florin <fflorin(a)wikimedia.org> wrote:
> Hi Multimedia team (keeping it to a short list so we can reach closure
> soon on this important topic):
>
> Did you have any comments on my email of Friday on the Image Load
> Study? (see below) That proposal was based on last week's conversations
> with you guys.
>
> If this general direction works for you, I propose the following main
> acceptance criteria from a performance standpoint:
>
> Goal: The Media Viewer would be considered accepted if it can display
> 1-2 MB images in less than 3 seconds at least 80% of the time, during
> the course of a week.
> Verification: This goal could be verified with a histogram showing
> total load events in a week for 1-2 MB images, with these deciles: number
> of image load events in under 1 second? in 1-2 seconds? in 2-3 seconds? ...
> and so on, up to 10 seconds or more. If 80% of these events take place in
> the first three deciles, we would have reached our goal.
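A minimal sketch of that check, assuming a week's load times for 1-2 MB
images are available in seconds (the function name is illustrative):

    // Bin load times into one-second buckets (0-1s ... 9-10s, 10s and up),
    // then test whether at least 80% of events fall in the first three.
    function passesGoal(loadTimesSec: number[]): boolean {
      const buckets = new Array(11).fill(0);
      for (const t of loadTimesSec) {
        buckets[Math.min(Math.floor(t), 10)] += 1;
      }
      const underThreeSec = buckets[0] + buckets[1] + buckets[2];
      return underThreeSec / loadTimesSec.length >= 0.8;
    }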
>
> Would this seem like a reasonable basic measure of success for us in
> coming weeks? Or would you recommend another goal?
>
> If we had more time, we could track a variety of other goals, but I am
> looking for a single metric we can focus on and actually measure in time
> for launch. If we want more granular criteria, I proposed other possible
> performance targets by image size in card #149.
>
> On the assumption that this is a good direction to pursue, I propose we
> focus on the following 4 high priority cards for our next steps:
>
> #149 Define acceptance performance criteria for the media viewer (see
> above, let's edit as needed to reflect our team goal)
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/149
>
> #364 Instrumentation for timing of image load, lightbox UI load
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/364
>
> #292 Histograms and decile charts for performance
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/292
>
> #198 Analyze Image Load Data with Dashboards
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/198/
>
> I also created this Metrics Tasks Wall, based on Gergo's Epic
> Story:#359, to make it easier to track all these tickets:
>
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards?favorit…
>
> Given our primary goal proposed above, I would recommend that we
> prioritize #364 and #292 over #198 -- and postpone the bandwidth-related
> tickets, as recommended in the P.S. below.
>
> Please let me know what you think and what you recommend for our next
> steps.
>
> Thanks,
>
>
> Fabrice
>
>
> P.S.: For now, I recommend that we de-emphasize these bandwidth-related
> metrics, since they are unlikely to happen in our time-frame:
>
> #361 Collect bandwidth stats
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/361
>
> #340 More Image Load Dashboards by Bandwidth
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/340
>
>
> On Mar 14, 2014, at 4:13 PM, Fabrice Florin <fflorin(a)wikimedia.org>
> wrote:
>
> Hi everyone,
>
> We would appreciate your advice on our upcoming research study of image
> load times on Media Viewer.
>
> Here are proposed goals, questions and outcomes for this study. They
> are presented for discussion purposes, not as a prescriptive requirement -
> and will be adjusted based on your feedback.
>
>
> *I. Goals*
> The goal of this study is to determine whether or not Media Viewer is
> loading images fast enough for the majority of our users in most common
> situations.
>
> As a typical user of the Media Viewer, I want images to load quickly,
> in just a few seconds, so I don't have to wait a long time to see them.
>
> Here are our recommended performance targets for image load times by
> connection speed, to match user expectations on the Web:
> * 1-2 seconds for a medium-size image on a fast connection
> * 2-3 seconds for the same image on a medium connection
> * 5-8 seconds for the same image on a slow connection
>
> If tracking connection speeds is too hard in our time-frame, we could
> base our performance targets on image size instead. For example:
> * 1-2 seconds for a small-size image on a medium connection
> * 2-3 seconds for a medium-size image on the same connection
> * 5-8 seconds for a large-size image on the same connection
>
> Definitions:
> * Image load time = the number of seconds from when you click on a
> thumbnail to when you see the full image
> * Image size: large = over 2 MB, medium = 1 to 2 MB, small = under 1 MB
> * Connection speed: fast = over 256 Kbps, medium = 64 to 256 Kbps, slow =
> under 64 Kbps
>
> The above numbers are for discussion purposes, and can be adjusted
> based on your feedback.
>
>
> *II. Questions*
> Here are the main research questions we propose to answer about image
> load performance.
>
> *1. How long does it take for an image to load for the conditions
> below?*
> (image load = total time from thumbnail click to full image display)
>
> a. by image size:
> load times for large images? medium images? small images?
>
> b. by web site:
> load times for mediawiki.org? commons? enwiki? frwiki? huwiki?
> other sites?
>
> c. by connection speed: (optional)
> load times for fast connections? medium connections?
> slow connections? (this may not be feasible in our time frame)
>
> d. by daypart: (optional)
> load times for morning? afternoon? evening? night time? (to show if
> performance slows during peak hours)
>
> This question could be answered by storing the timestamp for thumbnail
> clicks, as well as the timestamp for the full image display, then logging
> the difference.
>
> We would then prepare different bar graphs for each condition set
> above, with categories on the vertical axis, and number of seconds on the
> horizontal axis. The graphs could be based on data from the last 7 days.
>
>
> *2. How often does the image load time exceed our performance targets
> above?*
>
> a. by load time in a day:
> number of images that load in under 1 second? in 1-2 seconds? in
> 2-3 seconds? ... and so on, up to 10 seconds or more
>
> b. by load time in a week:
> number of images that load in under 1 second? in 1-2 seconds? in
> 2-3 seconds? ... and so on, up to 10 seconds or more
>
> This question could be answered by preparing different histograms,
> with number of images on the vertical axis, and number of seconds on the
> horizontal axis (deciles).
>
>
> *III. Outcomes*
> To answer these questions, we plan to collect data during our upcoming
> pilots on different sites in April.
>
> Based on these pilot results, we will need to make decisions about the
> wider deployments planned for May.
>
> Here are possible outcomes from this study:
>
> Outcome 1: Favorable - e.g.: 80% of images load quickly
> Action: Go ahead with current release plan to deploy Media Viewer
> everywhere by default.
>
> Outcome 2: Neutral - e.g.: 50% of images load quickly
> Action: Go ahead with current release plan, but deploy Media Viewer as
> an opt-in feature on wikis that don't want it by default.
>
> Outcome 3: Unfavorable - e.g.: 20% of images load quickly
> Action: Revisit release plan: consider making this opt-in everywhere --
> or work on faster image load solutions.
>
>
> We would be grateful for your comments on this, so we can refine our
> plans before we start this study next week. Please let us know which
> metrics above seem most important, given that we only have a few developer
> days to collect and analyze a few key metrics in coming weeks, to determine
> if we are meeting our objectives. Some related links are included below,
> for your convenience.
>
> To end on a positive note, we deployed a new version of Media Viewer
> yesterday that is much faster, thanks to all the fine work from our
> development team. This morning, I looked at a variety of 'non-popular'
> images on enwiki, and the Media Viewer experience was quite good
> overall. Most images loaded within the 2-second maximum which we
> recommend for a 'fast' connection -- and this was on a home wifi
> connection. I realize this is completely anecdotal, and not supported by
> hard data, so we can't make any decisions based on it. But it makes me
> hopeful that we are getting close to our objectives. Even compared to
> large commercial sites like Flickr, we hold up pretty well on this
> computer. :)
>
> Thanks for your interest in this project.
>
> All the best,
>
>
> Fabrice
>
>
> _______________________________
>
>
> *USEFUL LINKS*
>
> * Media Viewer Release Plan:
>
https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Release_Plan
>
> * First Media Viewer Metrics:
>
http://multimedia-metrics.wmflabs.org/dashboards/mmv
>
> * Media Viewer Test Page:
>
https://commons.wikimedia.org/wiki/Commons:Lightbox_demo
>
> * Metrics Tasks under consideration (Mingle):
>
>
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards?favorit…
>
> * Next Development Cycle (Mingle):
>
http://ur1.ca/gtvvr
>
> * About Media Viewer:
>
https://www.mediawiki.org/wiki/Multimedia/About_Media_Viewer
>
>
> _______________________________
>
> Fabrice Florin
> Product Manager, Multimedia
> Wikimedia Foundation
>
>
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
>
>
>
>
>
> _______________________________
>
> Fabrice Florin
> Product Manager
> Wikimedia Foundation
>
>
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
>
>
>
>