On Tue, Mar 18, 2014 at 3:24 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:
I think page load time stats are already collected

Taking into account the time it takes to load images on the page as well?

Not sure. IIRC there is an extension collecting page load stats via the ResourceTiming API, but I don't remember where I saw it or what exactly it did.
I guess if there is such a thing, Ori would know about it?
 
One thing we could do in addition is to replicate the thumbnail URL generation logic on the JS side.

Are you saying that we could actually do without the thumbnailinfo API call entirely when it comes to knowing the URL of the thumb sizes we need? This would be huge: thumbnailinfo takes 429ms on average. Displaying the actual image almost half a second sooner on average would be a massive performance gain.

Well, we could try :) The thumbnail URL is very simple in the huge majority of cases (http://upload.wikimedia.org/<site>/thumb/<sha1 stuff>/<filename>/<width>px-<filename>; the site and sha1 part can be guessed from the existing thumbnail, if we have one, but that covers our existing use cases) and very complicated in the rest (any number of additional parameters next to the width which might or might not be required, e.g. TIFF might have a page number; also, there is a completely different scheme for long filenames where the normal scheme would exceed some length limit for old IE compatibility). Generating a thumbnail URL locally which works in 99.9% of the cases would be very easy if we know the full size of the image and have some existing thumbnail URL to start from. Deciding whether we are in the 99.9% would be less easy but still not that complicated (alternatively, we can just try to use it and listen for a 404). Covering the rest of the cases might or might not be possible; each file format has its own handler and I am unfamiliar with many of them. At best, it would be very painful to maintain the logic in two places.
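
To make the "easy 99.9%" part concrete, here is a rough sketch of what I have in mind. The function name guessThumbUrl is made up for illustration, and the pattern it matches is only my understanding of the simple URL scheme described above, not a copy of the server-side logic. It swaps the width in an existing thumbnail URL and gives up (returns null) on anything that does not look like the plain scheme.

```javascript
// Hypothetical helper, not existing MediaWiki code.
// Given an existing thumbnail URL and a desired width, guess the URL of the
// thumbnail at that width. Returns null when the URL does not follow the
// simple <prefix>/thumb/<hash dirs>/<filename>/<width>px-<filename> scheme,
// so the caller knows it has to ask the API instead.
function guessThumbUrl(existingThumbUrl, newWidth) {
  const match = /^(.+\/thumb\/.+\/)([^\/]+)\/(\d+)px-([^\/]+)$/.exec(existingThumbUrl);
  if (match === null) {
    // Not a thumbnail URL, or the last segment carries extra parameters
    // before the width (for example a page number).
    return null;
  }
  const [, prefix, fileName, , thumbName] = match;

  // Plain case: the thumbnail name repeats the file name exactly.
  const isPlain = thumbName === fileName;
  // Assumption: SVG thumbnails are rasterized, so ".png" is appended.
  const isSvg = /\.svg$/i.test(fileName) && thumbName === fileName + '.png';
  if (!isPlain && !isSvg) {
    // Covers the alternative scheme for long file names and any handler
    // this sketch does not know about.
    return null;
  }
  return prefix + fileName + '/' + newWidth + 'px-' + thumbName;
}

console.log(guessThumbUrl(
  'http://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Foo.jpg/120px-Foo.jpg',
  640
));
```

This deliberately leaves out the check against the full size of the image (for non-SVG files there is no thumbnail wider than the original), which the caller would still need to do.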

(By the way, do those averages include beta sites? If they do, I would be cautious with the numbers: the beta cluster swamps our logging because there is no sampling on it, and it is significantly slower.)

Or do you mean that it would only work on a default wiki configuration, in which case the thumb URL guessed by JS would only be used for a request+abort ahead of time to trigger the thumb generation if it needs to be triggered?

Wiki configuration might make the URL unguessable in extremely rare cases [1] but usually it is not problematic as far as I can see; the uncertain part is the file type handler. (Of course what type of handlers are installed could be considered part of the wiki configuration.) PNG/JPG/GIF are easy to guess though, and formats where you always use a thumbnail (as opposed to the original file) are also easy (so SVG is no problem either).

As said above, I think we could use thumb URL guesses to try to load the file and call the imageinfo API from an onerror handler if needed. Even if that is not an option, I think we could make safe guesses for PNG/JPG/GIF/SVG (and recognize without making any request when we are unable to guess) which is almost all of our images.
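
Roughly, the "try the guess, fall back from onerror" flow could look like the sketch below. All names here (loadWithFallback, lookUpRealUrl, createImage) are invented for illustration; lookUpRealUrl stands in for whatever wraps the imageinfo API call, and createImage is passed in only so the sketch can be run outside a browser (in a browser it would be something like () => new Image()).

```javascript
// Hypothetical sketch, not existing MediaWiki or MultimediaViewer code.
// Tries the guessed thumbnail URL first; if the image fails to load, asks
// lookUpRealUrl (a stand-in for the imageinfo API call) for the real URL
// and retries once. done(err, img) is called exactly once.
function loadWithFallback(guessedUrl, lookUpRealUrl, createImage, done) {
  const img = createImage();
  let triedFallback = false;

  img.onload = () => done(null, img);

  img.onerror = () => {
    if (triedFallback) {
      // The API-provided URL failed as well; give up.
      done(new Error('Image failed to load: ' + img.src));
      return;
    }
    triedFallback = true;
    lookUpRealUrl((realUrl) => {
      if (!realUrl) {
        done(new Error('Could not determine the thumbnail URL'));
        return;
      }
      // Setting src again re-triggers onload or onerror.
      img.src = realUrl;
    });
  };

  // Optimistic path: no API round trip when the guess is right.
  img.src = guessedUrl;
}
```

Note that onerror does not tell us whether the failure was a 404 or something else, so a wrong guess and a network problem both end up on the fallback path.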

That's how I see it anyway, more experienced multimedia people might still be able to poke any number of holes in it :)

Do our analytics tools support virtual pageviews?

As far as I know, our "analytics tools" are pretty much a grep on the varnish logs [2]. At least that's what is publicly available; there might be more sophisticated internal tools that I haven't heard of.


[1] Specifically, if the image we start from is full-sized and we need a smaller thumbnail, we need to know the thumbnail URL prefix, which could be anything. But 1) it is extremely rare that we need to display a smaller size than the thumbnail we start from, and 2) we get the thumb URL via the repoinfo API; we just don't currently wait for repoinfo before loading the image. (Also, we could just hardcode the WMF thumbnail URL prefix.)

[2] http://dumps.wikimedia.org/other/pagecounts-raw/