I'm merging the discussion back with the list. If you're on the list and have been following this discussion, the last 6 emails quoted below are probably interesting.
 
Have you considered comparing the Media Viewer image load to the current image load on Commons?

That's what I meant when I talked about comparing Media Viewer to the file page (basically the experience without Media Viewer). Yes, I think we can and should compare against Commons specifically, since files hosted on Commons make up the majority of images on our large wikis, as opposed to files uploaded directly to the local wiki.

Either way, I think it would be useful to have some form of image load metric for each of our key pilot sites

Aren't all the pilot sites hosted in the same place? I think tracking them at the detailed network performance level, as we already do, is sufficient to spot issues in the ops realm where one site gets unusually slow compared to the others. We can check, to be certain, whether Media Viewer vs. the file page gives us different results between the sites, but if they're all hosted in the same place, they should give us the same results. If I'm wrong about the hosting location, then yes, we should definitely track them.

If you could make a practical proposal towards that simple goal, that would be wonderful.

My practical proposal is that right now we:
- improve the current limn graphs to cover useful information that I dug up manually in https://www.mediawiki.org/wiki/Multimedia/Performance_Analysis
- write the Media Viewer vs File page/Commons test
- look at the results of this test and define simple acceptance criteria for now and the future, similar to the ones I've suggested earlier
- re-do a full analysis of the results (limn graphs + vs. test) and check if there's anything preventing us from launching

And if time allows:
- add the thumb dimensions to the markup in core, so that we can display the blurred thumbnail 200-300ms sooner on average (helps with perception)
- investigate the API calls that are considerably slower than others to see if there's anything we can improve on that front

All the other ideas regarding measuring speed perception are worth keeping in mind, but I don't think they're worth doing right now, given the short timeframe we're looking at. It's something we should keep up our sleeve and use if we launch and the feedback is really negative, with a lot of people complaining that Media Viewer feels slower than the old way. By that point we'll have done everything we can in terms of improving real performance, so studying how users respond to our speed-perception strategies will be pretty much the only actionable thing left. I see it as a last resort, because if people complain, the cause is more likely the real performance than the tricks we've used to make it appear faster. Tricks can be counterproductive, though, which is why I'm not ruling this part out, just postponing it.

On Tue, Mar 18, 2014 at 5:27 PM, Fabrice Florin <fflorin@wikimedia.org> wrote:
Thanks, Gilles, this is really, really thoughtful!

Have you considered comparing the Media Viewer image load to the current image load on Commons? That would at least give us a sense of how much longer it takes with Media Viewer than before.

Either way, I think it would be useful to have some form of image load metric for each of our key pilot sites, so we can get a sense of how long it takes for people to view the images.

If you could make a practical proposal towards that simple goal, that would be wonderful.

Cheers,


Fabrice


On Mar 18, 2014, at 8:08 AM, Gilles Dubuc <gilles@wikimedia.org> wrote:

having a measure that helps us get an idea of whether efforts on that front are working is useful

"Perceived speed" by definition, is subjective, it can't be measured automatically. Efforts to measure figures like the amount of time the blurred image takes to show up are in my opinion, a waste of time, because once we have this figure the only effect it'll have is the team saying "oh, cool" at the result, and then there won't be anything actionable. Because Media Viewer's task isn't to display blurred images, the measure of success for the project is time taken for the actual image showing up. Even if you measure very precisely how long a spinner or a placeholder is on the screen for, it won't answer the real question which is: does Media Viewer feel faster with those visual tricks/preloading? And I think that's an entirely separate debate than performance, one that would need to be answered by questionnaires and user testing.

Do we also include the access to the file (that some users do to view it in more detail) as part of the things we compare with Media Viewer?

Do you mean the full resolution image? I think a more pertinent question than comparing timings would be to measure how many times people open the full resolution image with and without Media Viewer. This is more of a general visit statistic to follow. I imagine that the question you want to answer here is whether people feel less of a need to open the full resolution image, thanks to the extra detail visible on screen with Media Viewer. I'm not sure that we can measure that, though, because Media Viewer and the file page can both access the same full resolution image. That is, a CDN hit on the full-resolution file doesn't mean the person opened the full resolution deliberately; it could be that Media Viewer displayed it because the person's screen resolution required it, or that someone gave them a link to the full resolution image, etc. It's an interesting question, but at a glance measuring it seems quite difficult.
 
Having an idea of how much such a feature is used could provide us with an estimate of the time saved by the user

We can't pretend to measure "time saved by the user" with JavaScript measurements. It just isn't measurable, because it depends on the user's workflow. Someone could rightfully argue that Media Viewer "takes more time" because in their old workflow they're used to opening tons of tabs in the background; they spend enough time on each tab that interests them that they can very quickly close and dismiss the ones they're not interested in, and every time they switch to another tab it has already finished loading. The time advantage of opening tabs there comes simply from the fact that the user's manual image-preloading technique is a lot more aggressive than Media Viewer's. So, no, we can't measure the "time saved by the user" in JavaScript; publishing a figure that calls itself that would be misleading and would probably backfire on us. To measure "time saved by the user" we'd need a user testing session where we compare how long people take to go through two very similar image galleries with Media Viewer vs. without it. Then, with a big enough sample, you can argue that one is a time saver compared to the other.


On Tue, Mar 18, 2014 at 3:26 PM, Pau Giner <pginer@wikimedia.org> wrote:
On the topic of measuring detailed things from an end-user perspective (time it takes to display the blurred thumb, to hit next, etc.), I think that they're too complex to be worth doing at the moment, and we have nothing to compare them against.

I agree that "perceived performance" as the name suggest is subjective, but having a measure that helps us to get an idea of whether efforts in that front are working is useful. The time the user waits until seeing some kind of progress (e.g, getting the blurry image) is useful in that context.

For example, the amount of time it takes to display the blurred image has no equivalent on the file page, so that figure can't really be used to determine success.

For the file page, it is true that the perceived performance will approximate the real one.
Do we also include the access to the file (that some users do to view it in more detail) as part of the things we compare with Media Viewer? If that is the case, big files that are progressively loaded by browsers would be more comparable, and it would be interesting to check both times (showing something vs. showing the complete image) in Media Viewer and outside of it.

Another interesting time to take into account is the time saved through navigation controls. Having an idea of how much such a feature is used could provide us with an estimate of the time saved by the user (who currently has to go back and forth, dealing with additional page loads or tab switching in the browser).


Having said all that, I totally understand that measuring the real performance is considered a higher priority than the perceived performance, but measures to estimate the latter should not be overlooked.

Pau



On Tue, Mar 18, 2014 at 12:15 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:
Goal: The Media Viewer would be considered accepted if it can display 1-2 MB images in less than 3 seconds at least 80% of the time, during the course of a week.

The issue with that goal is that the performance it measures is almost entirely out of our hands. As seen in my preliminary analysis of the data ( https://www.mediawiki.org/wiki/Multimedia/Performance_Analysis ), some places like Russia seem to get terrible performance relative to the average internet speed in those countries, and there's nothing we (the multimedia team) can do to change that. Chances are, if Media Viewer can't display a 1-2 MB image in less than 3 seconds 80% of the time, neither can the file page or any wiki page with an image of the same size on it, because the issue is most likely the connectivity between users in those countries and our servers, not the technique used to deliver the image (as part of the page load or loaded by JS).

I think that the main issue with the goal options I've seen so far is that they focus on the general performance of Media Viewer as an isolated entity. The network performance tracking we've set up is good for identifying issues on our end: for example, an API call that might be too slow and that we could perhaps optimize, or the fact that we could patch MediaWiki core to add thumb dimensions so the thumb displays sooner. It also helps us keep track, on an ongoing basis, of any ops issues that might be affecting the product we're responsible for (Media Viewer). These stats help make sure that we're doing the most we can to make things fast.

What these network performance stats aren't good for, though, is determining whether Media Viewer is successful as a product, because the performance of our servers, our CDNs and our networking infrastructure is all bundled up in the same figure, indistinguishable from one another. The figure doesn't tell us whether Media Viewer is good in the context of an infrastructure that won't change overnight.

I think the only measure of success available in our realm is how opening an image in Media Viewer compares to opening the file page (the experience without Media Viewer). We're not tracking that yet. The only way I can think of to do that on the user's end is to load a file page in an invisible iframe and measure how long it takes to load, and better yet how long the image on that file page takes to load too, and then compare that to an image load in Media Viewer. However, it's really challenging to measure this, because we can't stop the user from navigating images in Media Viewer while we attempt to measure a file page in an iframe, and that navigation would trigger requests that use up bandwidth, etc. Thus, I don't think we can get pertinent figures collected directly from users that will tell us whether Media Viewer is doing a good job in terms of performance, because there would be too much noise in the data collection.
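
For illustration only, here is a minimal sketch of that iframe idea, assuming a same-origin file page and making an assumption about the file page markup (the '#file img' selector):

    // Rough sketch: time how long a file page, and the lead image on it, take to load
    // in a hidden iframe, for comparison with Media Viewer's own image timing.
    function timeFilePageLoad(filePageUrl: string): Promise<{ pageMs: number; imageMs?: number }> {
      return new Promise((resolve, reject) => {
        const start = performance.now();
        const iframe = document.createElement('iframe');
        iframe.style.display = 'none';
        iframe.src = filePageUrl;
        iframe.onload = () => {
          const pageMs = performance.now() - start;
          let imageMs: number | undefined;
          try {
            // Same-origin only: read the lead image's resource timing from the iframe.
            // '#file img' is an assumption about the file page markup.
            const img = iframe.contentDocument!.querySelector<HTMLImageElement>('#file img');
            const entry = img && (iframe.contentWindow!.performance
              .getEntriesByName(img.currentSrc)[0] as PerformanceResourceTiming | undefined);
            if (entry) {
              imageMs = entry.responseEnd; // ms from the iframe's navigation start
            }
          } catch (e) {
            // Cross-origin file page: we only get the total page time.
          }
          iframe.remove();
          resolve({ pageMs, imageMs });
        };
        iframe.onerror = () => { iframe.remove(); reject(new Error('file page failed to load')); };
        document.body.appendChild(iframe);
      });
    }

Even with something like this, the noise problem described above remains: whatever the user does in parallel competes for bandwidth with the hidden measurement.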

I think that automated testing is the way to go: we should package this performance measurement (Media Viewer vs. file page) as a series of browser tests and check the figures that way. Even better if they can run on something like CloudBees, where there would be some latency between where the tests run and our servers. Now, there are variables at play when making a Media Viewer/file page comparison:

- Is the JS already cached? As Gergo mentioned, the JS being uncached will happen the first time and then every 30 days or so, or whenever we update Media Viewer (once a week at most, usually). I think we should measure both variants (with the JS cached and with the JS not cached) to assess how bad the effect of a cold cache is. There are a number of ways we could address this issue, some more aggressive than others in terms of bandwidth (e.g. preload the JS when the mouse cursor gets near a thumbnail, preload the JS after the page load is done, etc.; a rough sketch of both follows this list). This is worth measuring because it's actionable. The reason we haven't taken those measures yet is that they're a balancing act (wasting people's bandwidth vs. providing a faster experience).

- What screen resolution are we testing against? The bigger the resolution, the bigger the image, the slower the image load. I couldn't find any figures about the average desktop screen resolution of people visiting our wikis. Maybe someone knows where to get that figure if we have it? On that front we could either test the performance of the most common resolutions, or test the performance of the average resolution.

- Varnish cache hit vs. Varnish cache miss. We know that's a big slowdown when it happens, and we know that it won't get solved for another few months. That variable, however, also applies to file pages: the image on the file page is a thumb too, and it can be a Varnish miss as well. We don't see it often because it stops as soon as one person (usually the author) visits the file page. Media Viewer just increases the probability of hitting a Varnish cache miss because we have a few buckets instead of the single size/bucket used by the file page. I think this is an isolated problem, and one that needs more serious math to measure the effect of. Why more serious math? For one, it depends on the distribution of desktop resolutions among our visitors compared to the buckets we've picked. If, for example, a given bucket size covers 80% of our visitors, then in 80% of cases the effect of Varnish misses is exactly the same as for the file page. We also have to consider whether it's worth spending time studying this issue at all, knowing that a few months from now ops will have the disk capacity that will allow us to pregenerate the bucket sizes we need, and that there's literally nothing we can do about it at this point besides reducing the number of buckets to lower the likelihood of being the first person to hit one. My recommendation for that issue is that we use the technical performance data we're already collecting to determine what percentage of image views are affected by it over time on wikis that have signed up for the launch. Then we'll get an idea of how bad it really is on a wiki where everyone has Media Viewer (because, by network effect, the more people there are, the less likely you are to be the first person to use Media Viewer on a given file). But it's not worth obsessing over right now, because the low traffic of the test sites makes it happen to us a whole lot more than it would in a context where every visitor has Media Viewer.
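
Coming back to the first variable above, here is a minimal sketch of the two JS-preloading options mentioned there; the ResourceLoader module name and the thumbnail selector are assumptions, not the real names:

    // Minimal sketch of preloading Media Viewer's JS before it is needed.
    declare const mw: { loader: { load: (modules: string | string[]) => void } };

    const VIEWER_MODULE = 'multimediaviewer'; // hypothetical ResourceLoader module name
    let preloaded = false;

    function preloadViewerJs(): void {
      if (!preloaded) {
        preloaded = true;
        mw.loader.load(VIEWER_MODULE); // fetch and cache the JS ahead of the first click
      }
    }

    // Less aggressive: preload when the cursor reaches a thumbnail
    // ('a.image img' is an assumption about the thumbnail markup).
    document.querySelectorAll<HTMLImageElement>('#mw-content-text a.image img').forEach((thumb) => {
      thumb.addEventListener('mouseenter', preloadViewerJs, { once: true });
    });

    // More aggressive: preload once the page load is done, costing bandwidth for
    // visitors who never open an image.
    window.addEventListener('load', () => setTimeout(preloadViewerJs, 2000));

Which of the two is acceptable is exactly the bandwidth vs. speed balancing act mentioned above.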

So, once we've settled on what to do about the above variables, we can come up with acceptance criteria for Media Viewer's performance, which could look like:
- with a cold JS cache, on an average desktop resolution, with a Varnish hit, Media Viewer shows the image in at most 100% of the time it takes for the file page to do the same
- with a warm JS cache, on an average desktop resolution, with a Varnish hit, Media Viewer shows the image in at most 75% of the time it takes for the file page to do the same
- with a warm JS cache, on a large desktop resolution, with a Varnish hit, Media Viewer shows the image in at most 120% of the time it takes for the file page to do the same

An added advantage to making this measurement automated is that it can be baked in as a test failure/success criterion. If we ever make a code change that mistakenly makes the experience slower than our criteria, the team would be notified automatically.
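
To make that concrete, here is a hedged sketch of what such an automated check could look like; the timings below are made-up placeholders for what the browser tests would actually measure:

    // Sketch: turn the acceptance ratios above into a pass/fail check a CI job can run.
    interface Scenario {
      name: string;
      maxRatio: number;       // allowed Media Viewer time as a fraction of the file page time
      mediaViewerMs: number;  // measured Media Viewer image load time (placeholder)
      filePageMs: number;     // measured file page image load time (placeholder)
    }

    function meetsCriterion(s: Scenario): boolean {
      const ratio = s.mediaViewerMs / s.filePageMs;
      const pass = ratio <= s.maxRatio;
      console.log(`${s.name}: ratio ${ratio.toFixed(2)} (max ${s.maxRatio}) -> ${pass ? 'PASS' : 'FAIL'}`);
      return pass;
    }

    const scenarios: Scenario[] = [
      { name: 'cold JS cache, average resolution, Varnish hit', maxRatio: 1.0, mediaViewerMs: 2100, filePageMs: 2300 },
      { name: 'warm JS cache, average resolution, Varnish hit', maxRatio: 0.75, mediaViewerMs: 1400, filePageMs: 2300 },
      { name: 'warm JS cache, large resolution, Varnish hit', maxRatio: 1.2, mediaViewerMs: 3100, filePageMs: 2800 },
    ];

    if (!scenarios.map(meetsCriterion).every((pass) => pass)) {
      // In CI, this failure is what would notify the team automatically.
      throw new Error('Media Viewer performance criteria not met');
    }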

On the topic of measuring detailed things from an end-user perspective (time it takes to display the blurred thumb, to hit next, etc.), I think that they're too complex to be worth doing at the moment, and we have nothing to compare them against. For example, the amount of time it takes to display the blurred image has no equivalent on the file page, so that figure can't really be used to determine success. Graphs of those figures expressed in user-centric terms would be easier for outsiders to understand, but in terms of troubleshooting technical issues they're no better than the data we're already collecting. They're worse, in fact, because any number of things could happen on the users' computers between action A and action B (browsers freezing tabs comes to mind) that would quickly render a lot of those virtual user-centric figures meaningless. I think we should focus on what makes the core experience better, not spend time building entertaining graphs.


On Tue, Mar 18, 2014 at 1:00 AM, Fabrice Florin <fflorin@wikimedia.org> wrote:
Hi Multimedia team (keeping it to a short list so we can reach closure soon on this important topic):

Did you have any comments on my email of Friday on the Image Load Study? (see below) That proposal was based on last week’s conversations with you guys.

If this general direction works for you, I propose the following main acceptance criteria from a performance standpoint:

Goal: The Media Viewer would be considered accepted if it can display 1-2 MB images in less than 3 seconds at least 80% of the time, during the course of a week.

Verification: This goal could be verified with a histogram showing total load events in a week for 1-2 MB images, with these one-second buckets: number of image load events in under 1 second? in 1-2 seconds? in 2-3 seconds? … and so on, up to 10 seconds or more. If 80% of these events take place in the first three buckets, we would have reached our goal.
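
For illustration, a small sketch of that verification, with made-up load times (the real numbers would come from the image load instrumentation):

    // Sketch: bucket a week of load times (ms) for 1-2 MB images into one-second bins
    // (0-1s, 1-2s, ..., 9-10s, 10s or more) and check the 80%-under-3-seconds goal.
    // The sample numbers are made up.
    function histogram(loadTimesMs: number[]): number[] {
      const buckets = new Array(11).fill(0);
      for (const ms of loadTimesMs) {
        buckets[Math.min(Math.floor(ms / 1000), 10)] += 1; // last bucket = 10 seconds or more
      }
      return buckets;
    }

    function meetsGoal(loadTimesMs: number[], thresholdMs = 3000, share = 0.8): boolean {
      const fast = loadTimesMs.filter((ms) => ms < thresholdMs).length;
      return fast / loadTimesMs.length >= share;
    }

    const weekOfLoadTimes = [850, 1200, 1900, 2400, 2600, 3100, 4200, 900, 1500, 2200];
    console.log(histogram(weekOfLoadTimes)); // events per one-second bucket
    console.log(meetsGoal(weekOfLoadTimes)); // true: 8 of 10 loads finished in under 3 seconds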

Would this seem like a reasonable basic measure of success for us in coming weeks? Or would you recommend another goal? 

If we had more time, we could track a variety of other goals, but I am looking for a single metric we can focus on and actually measure in time for launch. If we want more granular criteria, I proposed other possible performance targets by image size in card #149.

On the assumption that this is a good direction to pursue, I propose we focus on the following 4 high priority cards for our next steps:

#149 Define acceptance performance criteria for the media viewer (see above, let’s edit as needed to reflect our team goal)

#364 Instrumentation for timing of image load, lightbox UI load 

#292 Histograms and decile charts for performance

#198 Analyze Image Load Data with Dashboards

I also created this Metrics Tasks Wall, based on Gergo’s Epic Story:#359, to make it easier to track all these tickets:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards?favorite_id=11060&view=Metrics+Wall

Given our primary goal proposed above, I would recommend that we prioritize #364 and #292 over #198, and postpone the bandwidth-related tickets, as recommended in the P.S. below.

Please let me know what you think and what you recommend for our next steps.

Thanks,


Fabrice


P.S.: For now, I recommend that we de-emphasize these bandwidth-related metrics, since they are unlikely to happen in our time frame:

#361 Collect bandwidth stats



On Mar 14, 2014, at 4:13 PM, Fabrice Florin <fflorin@wikimedia.org> wrote:

Hi everyone,

We would appreciate your advice on our upcoming research study of image load times on Media Viewer. 

Here are proposed goals, questions and outcomes for this study. They are presented for discussion purposes, not as a prescriptive requirement - and will be adjusted based on your feedback.


I. Goals
The goal of this study is to determine whether or not Media Viewer is loading images fast enough for the majority of our users in most common situations. 

As a typical user of the Media Viewer, I want images to load quickly, in just a few seconds, so I don't have to wait a long time to see them.

Here are our recommended performance targets for image load times by connection speed, to match user expectations on the Web:
* 1-2 seconds for a medium-size image on a fast connection
* 2-3 seconds for the same image on a medium connection
* 5-8 seconds for the same image on a slow connection

If tracking connection speeds is too hard in our time-frame, we could base our performance targets on image size instead. For example:
* 1-2 seconds for a small-size image on a medium connection
* 2-3 seconds for a medium-size image on the same connection
* 5-8 seconds for a large-size image on the same connection

Definitions:
* Image load time = the number of seconds from when you click on a thumbnail to when you see the full image
* Image size: large = over 2 MB, medium = 1 to 2 MB, small = under 1 MB
* Connection speed: fast = over 256 kbps, medium = 64 to 256 kbps, slow = under 64 kbps

The above numbers are for discussion purposes, and can be adjusted based on your feedback.


II. Questions
Here are the main research questions we propose to answer about image load performance.

1. How long does it take for an image to load for the conditions below?
(image load = total time from thumbnail click to full image display)

a. by image size: 
    load times for large images? medium images? small images?

b. by web site: 
    load times for mediawiki.org? commons? enwiki? frwiki? huwiki? other sites?

c. by connection speed: (optional)
    load times for fast connections? medium connections? slow connections? (this may not be feasible in our time frame)

d. by daypart: (optional)
    load times for morning? afternoon? evening? night time? (to show if performance slows during peak hours)

This question could be answered by storing the timestamp for thumbnail clicks, as well as the timestamp for the full image display, then logging the difference.
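
As a rough sketch of that logging (the thumbnail selector and the logEvent call are placeholders for whatever instrumentation we actually use):

    // Sketch: record the thumbnail-click timestamp, then log the difference once the
    // full image has been displayed. logEvent() stands in for the real logging call.
    let clickStart: number | null = null;

    function logEvent(data: Record<string, unknown>): void {
      console.log('image-load-timing', data); // placeholder for the real logging pipeline
    }

    function onThumbnailClick(): void {
      clickStart = performance.now();
    }

    // The viewer would call this from the full-size image's load handler.
    function onFullImageDisplayed(img: HTMLImageElement): void {
      if (clickStart === null) {
        return;
      }
      logEvent({
        durationMs: Math.round(performance.now() - clickStart),
        width: img.naturalWidth,          // the file size in bytes would come from the API response
        height: img.naturalHeight,
        site: location.hostname,          // enables the per-site breakdown
        hourOfDay: new Date().getHours(), // enables the optional daypart breakdown
      });
      clickStart = null;
    }

    // Wiring; 'a.image' is an assumption about the thumbnail markup.
    document.querySelectorAll<HTMLAnchorElement>('a.image').forEach((a) => {
      a.addEventListener('click', onThumbnailClick);
    });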

We would then prepare different bar graphs for each condition set above, with categories on the vertical axis, and number of seconds on the horizontal axis. The graphs could be based on data from the last 7 days.


2. How often does the image load time exceed our performance targets above?

a. by load time in a day: 
    number of images that load in under 1 second? in 1-2 seconds? in 2-3 seconds? … and so on, up to 10 seconds or more

b. by load time in a week: 
    number of images that load in under 1 second? in 1-2 seconds? in 2-3 seconds? … and so on, up to 10 seconds or more

This question could be answered by preparing different histograms, with the number of images on the vertical axis and the number of seconds on the horizontal axis (one-second buckets).


III. Outcomes
To answer these questions, we plan to collect data during our upcoming pilots on different sites in April. 

Based on these pilot results, we will need to make decisions about the wider deployments planned for May. 

Here are possible outcomes from this study:

Outcome 1: Favorable - e.g.: 80% of images load quickly 
Action: Go ahead with current release plan to deploy Media Viewer everywhere by default.

Outcome 2: Neutral - e.g.: 50% of images load quickly 
Action: Go ahead with current release plan, but deploy Media Viewer as an opt-in feature on wikis that don't want it by default.

Outcome 3: Unfavorable - e.g.: 20% of images load quickly 
Action: Revisit the release plan: consider making this opt-in everywhere, or work on faster image load solutions.


We would be grateful for your comments on this, so we can refine our plans before we start this study next week. Please let us know which metrics above seem most important, given that we only have a few developer days to collect and analyze a few key metrics in coming weeks, to determine if we are meeting our objectives. Some related links are included below, for your convenience.

To end on a positive note, yesterday we deployed a new version of Media Viewer that is much faster, thanks to all the fine work from our development team. This morning, I looked at a variety of 'non-popular' images on enwiki, and the Media Viewer experience was quite good overall. Most images load within the 2-second maximum we recommend for a 'fast' connection, and this was on a home wifi connection. I realize this is completely anecdotal and not supported by hard data, so we can't make any decisions based on it, but it makes me hopeful that we are getting close to our objectives. Even compared to large commercial sites like Flickr, we hold up pretty well on this computer. :)

Thanks for your interest in this project.

All the best,


Fabrice


_______________________________


USEFUL LINKS

* Media Viewer Release Plan:

* First Media Viewer Metrics:

* Media Viewer Test Page:

* Metrics Tasks under consideration (Mingle):

* Next Development Cycle (Mingle):



_______________________________

Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation






_______________________________

Fabrice Florin
Product Manager






--
Pau Giner
Interaction Designer
Wikimedia Foundation


_______________________________

Fabrice Florin
Product Manager
Wikimedia Foundation