Re: [Multimedia] Image Load Study - Goals, Questions and Outcomes - Multimedia

Re: [Multimedia] Image Load Study - Goals, Questions and Outcomes

Aaron Arcos

19 Mar 2014

3:45 a.m.

+ multimedia

On Tue, Mar 18, 2014 at 11:44 AM, Aaron Arcos <aarcos.wiki(a)gmail.com> wrote:

...

On Tue, Mar 18, 2014 at 8:47 AM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

Gilles, thanks for the great analysis!

+1, good stuff! I like where the discussion is heading, in particular the idea of comparing MV against the "current" implementation via a test; that's the way to go.

Derk-Jan Hartman

19 Mar 2014

4:47 a.m.

New subject: Image Load Study - Goals, Questions and Outcomes

I just realized something... Independent of this study/measurement, this project, when ultimately launched, is gonna have a significant effect on raw page view counts. Therefore it is something that needs to be taken into account in the overall view/edit statistics... Might want to brainstorm about that with Erik Zachte.

DJ

On 18 Mar 2014, at 19:45, Aaron Arcos <aarcos.wiki(a)gmail.com> wrote:

...

+ multimedia

On Tue, Mar 18, 2014 at 11:44 AM, Aaron Arcos <aarcos.wiki(a)gmail.com> wrote:

On Tue, Mar 18, 2014 at 8:47 AM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

Gilles, thanks for the great analysis!

+1, good stuff !, I like where the discussion is heading, in particular the idea of comparing MV against the "current" implementation via a test, that's the way to go.

_______________________________________________
Multimedia mailing list
Multimedia(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/multimedia

Gilles Dubuc

7:24 a.m.

> I think page load time stats are already collected

Taking into account the time it takes to load images on the page as well?

> We could fire off requests for all the thumbnails we need, then abort those requests almost immediately.

I think that's a great idea! It's what I wanted to do in UploadWizard for the various bucket sizes, but I never thought of doing it right in Media Viewer. It doesn't solve Varnish misses for the first image you open, but it certainly guarantees smoother browsing through the other images. I'm all for doing it already, with a reasonable preloading distance (3 images forwards and backwards?) and only for the current bucket size.

> Another possibility would be to measure if file page usage drops, although this is complicated by the fact that MediaViewer uses the file page in share links.

I think we're going to link to the article with the hash link as soon as we make that work for everyone, regardless of having Media Viewer turned on in their preferences or not.

But I think the real difficulty isn't counting image views on the Media Viewer size, it's counting when people click to open the full resolution on the file page. From the stats alone we can't take away the bots and the hotlinking in a way that makes us certain the remainder is solely deliberate human interaction. I think the only sane way is having some JS on the file page tracking clicks. All of this seems like a lot of work to me for such a minor stat. Even looking for these complicated solutions here, I feel that we're wasting our time over this one...

> One thing we could do in addition is to replicate the thumbnail URL generation logic on the JS side.

Are you saying that we could actually do without the thumbnailinfo API call entirely when it comes to knowing the URL of the thumb sizes we need? This would be huge: thumbnailinfo takes 429ms on average. Displaying the actual image almost half a second sooner on average would be a massive performance gain. Or do you mean that it would only work on a default wiki configuration, in which case the thumb URL guessed by JS would only be used for a request+abort ahead of time to trigger the thumb generation if it needs to be triggered?

> is gonna have a significant effect on raw page view counts

Do our analytics tools support virtual pageviews?
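A minimal sketch of the preloading distance suggested in this message, written in TypeScript as an illustration only. The function name `preloadIndices` and its parameters are assumptions, not Media Viewer code. It only computes which neighbouring images in a gallery would be preloaded, nearest first, and leaves the actual thumbnail requests (for the current bucket size only) to the caller.

```typescript
// Hypothetical helper, not part of Media Viewer.
// Returns the gallery positions to preload around the image currently shown,
// ordered nearest first, alternating forwards and backwards.
function preloadIndices(current: number, total: number, distance: number = 3): number[] {
  const indices: number[] = [];
  for (let offset = 1; offset <= distance; offset++) {
    const next = current + offset;
    const prev = current - offset;
    // Skip positions that fall outside the gallery.
    if (next < total) {
      indices.push(next);
    }
    if (prev >= 0) {
      indices.push(prev);
    }
  }
  return indices;
}

// Example: image at position 5 in a gallery of 10, distance 3.
console.log(preloadIndices(5, 10, 3));
```

With a distance of 3, at most six thumbnails are requested per image opened, which keeps the extra load bounded while covering the most likely next and previous clicks.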

Gergo Tisza

8:52 a.m.

On Tue, Mar 18, 2014 at 3:24 PM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:

>> I think page load time stats are already collected
>
> Taking into account the time it takes to load images on the page as well?

Not sure. IIRC there is an extension collecting page load stats via the ResourceTiming API, but I don't remember where I saw it or what exactly it did. I guess if there is such a thing, Ori would know about it?

>> One thing we could do in addition is to replicate the thumbnail URL generation logic on the JS side.

Well, we could try :) The thumbnail URL is very simple in the huge majority of cases (http://upload.wikimedia.org/<site>/thumb/<sha1 stuff>/<filename>/<width>px-<filename>; the site and sha1 part can be guessed from the existing thumbnail, if we have one, but that covers our existing use cases) and very complicated in the rest (any number of additional parameters next to the width which might or might not be required, e.g. TIFF might have a page number; also there is a completely different scheme for long filenames where the normal scheme would exceed some length limit for old IE compatibility).

Generating a thumbnail URL locally which works in 99.9% of the cases would be very easy, if we know the full size of the image and have some existing thumbnail URL to start from. Deciding whether we are in the 99.9% would be less easy but still not that complicated (alternatively, we can just try to use it and listen for a 404). Covering the rest of the cases might or might not be possible; each file format has its own handler and I am unfamiliar with many of them. At best, it would be very painful to maintain the logic in two places.

(By the way, do those averages include beta sites? If they do, I would be cautious with the numbers: the beta cluster swamps our logging because there is no sampling on it, and it is significantly slower.)

> Or do you mean that it would only work on a default wiki configuration, in which case the thumb URL guessed by JS would only be used for a request+abort ahead of time to trigger the thumb generation if it needs to be triggered?

Wiki configuration might make the URL unguessable in extremely rare cases [1] but usually it is not problematic as far as I can see; the uncertain part is the file type handler. (Of course what type of handlers are installed could be considered part of the wiki configuration.) PNG/JPG/GIF are easy to guess though, and formats where you always use a thumbnail (as opposed to the original file) are also easy (so SVG is no problem either).

As said above, I think we could use thumb URL guesses to try to load the file and call the imageinfo API from an onerror handler if needed. Even if that is not an option, I think we could make safe guesses for PNG/JPG/GIF/SVG (and recognize without making any request when we are unable to guess) which is almost all of our images.

That's how I see it anyway, more experienced multimedia people might still be able to poke any number of holes in it :)

> Do our analytics tools support virtual pageviews?

As far as I know, our "analytics tools" is pretty much a grep on the varnish logs... [2] at least that's what is publicly available, there might be more sophisticated internal tools that I haven't heard of.

[1] specifically, if the image we start from is full-sized and we need a smaller thumbnail, we need to know the thumbnail URL prefix, which could be anything. But 1) it is extremely rare that we need to display a smaller size than the thumbnail we start from, and 2) we get the thumb URL via the repoinfo API, we just don't wait for repoinfo with the image loading currently. (Also, we could just hardcode the WMF thumbnail URL prefix.)

[2] http://dumps.wikimedia.org/other/pagecounts-raw/
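The simple URL scheme described in this message can be sketched as below, in TypeScript. This is an illustration under assumptions, not the MediaWiki thumbnail logic: `guessThumbUrl` is a hypothetical helper, and it only covers the simple `<width>px-<filename>` case. It returns `null` whenever the existing thumbnail URL does not look like that case (extra parameters such as a page number, or the long-filename scheme), so the caller can recognize without making any request that it is unable to guess and fall back to the API.

```typescript
// Hypothetical helper, not MediaWiki or Media Viewer code.
// Derives a thumbnail URL for a new width from an existing thumbnail URL of the
// same file, assuming the simple scheme
//   .../thumb/<hash dirs>/<filename>/<width>px-<filename>
function guessThumbUrl(existingThumbUrl: string, targetWidth: number): string | null {
  // Only positive integer widths make sense in a thumbnail name.
  if (targetWidth <= 0 || Math.floor(targetWidth) !== targetWidth) {
    return null;
  }
  // Groups: 1 = everything up to the file's directory, 2 = file name,
  // 3 = current width, 4 = thumbnail name after the "<width>px-" prefix.
  const match = /^(.*\/thumb\/.*\/)([^\/]+)\/(\d+)px-([^\/]+)$/.exec(existingThumbUrl);
  if (match === null) {
    // Extra parameters before the width (e.g. a page number) end up here.
    return null;
  }
  const prefix = match[1];
  const fileName = match[2];
  const thumbName = match[4];
  // In the simple scheme the thumbnail name repeats the file name, possibly
  // with an added extension (e.g. a rasterized SVG). Anything else is treated
  // as a case we cannot guess.
  if (thumbName.indexOf(fileName) !== 0) {
    return null;
  }
  return prefix + fileName + "/" + targetWidth + "px-" + thumbName;
}

// Illustrative URL following the pattern quoted in the message.
console.log(guessThumbUrl(
  "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Example.jpg/120px-Example.jpg",
  800
));
```

The sketch deliberately errs on the side of returning `null`. It does not check the requested width against the full size of the image, which the message notes would also be needed, and a guessed URL could still fail to load, in which case an error handler would call the imageinfo API as suggested above.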

Gilles Dubuc

10:41 a.m.

> (By the way, do those averages include beta sites?

Nope, beta sites aren't part of these averages. It's mediawiki.org, commons and the production wikipedias.
