+ multimedia
On Tue, Mar 18, 2014 at 11:44 AM, Aaron Arcos aarcos.wiki@gmail.com wrote:
On Tue, Mar 18, 2014 at 8:47 AM, Gergo Tisza gtisza@wikimedia.org wrote:
Gilles, thanks for the great analysis!
+1, good stuff! I like where the discussion is heading, in particular the idea of comparing MV against the "current" implementation via a test; that's the way to go.
I just realized something...
Independent of this study/measurement, this project, when ultimately launched, is gonna have a significant effect on raw page view counts. Therefore it is something that needs to be taken into account in the overall view/edit statistics... Might want to brainstorm about that with Erik Zachte.
DJ
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia
I think page load time stats are already collected
Taking into account the time it takes to load images on the page as well?
We could fire off requests for all the thumbnails we need, then abort those
requests almost immediately.
I think that's a great idea! It's what I wanted to do in UploadWizard for the various bucket sizes, but I never thought of doing it right in Media Viewer. It doesn't solve Varnish misses for the first image you open, but it certainly guarantees smoother browsing through the other images. I'm all for doing it already, with a reasonable preloading distance (3 images forwards and backwards?) and only for the current bucket size.
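A rough sketch of the preloading distance idea, assuming the images on the page are addressed by a zero-based index. The function name `preloadIndices` and its parameters are hypothetical, not existing Media Viewer code; it only decides which neighbours to request and leaves the actual request (and abort) to the caller.

```typescript
// Hypothetical helper: which neighbouring images to preload around the
// one currently open. "distance" is the preloading distance discussed
// above (3 images forwards and backwards by default).
function preloadIndices(current: number, total: number, distance: number = 3): number[] {
  const indices: number[] = [];
  for (let offset = 1; offset <= distance; offset++) {
    // Nearest neighbours first, so the images the user is most likely
    // to open next are requested earliest.
    const next = current + offset;
    const prev = current - offset;
    if (next < total) {
      indices.push(next);
    }
    if (prev >= 0) {
      indices.push(prev);
    }
  }
  return indices;
}
```

Ordering by distance rather than doing all forward images first is a design guess; it keeps the closest thumbnails at the front of the queue in both directions.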
Another possibility would be to measure if file page usage drops, although
this is complicated by the fact that MediaViewer uses the file page in share links.
I think we're going to link to the article with the hash link as soon as we make that work for everyone, regardless of whether they have Media Viewer turned on in their preferences. But I think the real difficulty isn't counting image views on the Media Viewer size, it's counting when people click to open the full resolution on the file page. From the stats alone we can't filter out the bots and the hotlinking in a way that makes us certain the remainder is solely deliberate human interaction. I think the only sane way is having some JS on the file page tracking clicks. All of this seems like a lot of work to me for such a minor stat. Even looking for these complicated solutions here, I feel that we're wasting our time over this one...
One thing we could do in addition to that is to replicate the thumbnail URL
generation logic on the JS side.
Are you saying that we could actually do without the thumbnailinfo API call entirely when it comes to knowing the URL of the thumb sizes we need? This would be huge: thumbnailinfo takes 429ms on average. Displaying the actual image almost half a second sooner on average would be a massive performance gain.
Or do you mean that it would only work on a default wiki configuration, in which case the thumb URL guessed by JS would only be used for a request+abort ahead of time to trigger the thumb generation if it needs to be triggered?
is gonna have a significant effect on raw page view counts
Do our analytics tools support virtual pageviews?
On Tue, Mar 18, 2014 at 8:47 PM, Derk-Jan Hartman hartman.wiki@gmail.com wrote:
I just realized something...
Independent of this study/measurement, this project, when ultimately launched, is gonna have a significant effect on raw page view counts. Therefore it is something that needs to be taken into account in the overall view/edit statistics... Might want to brainstorm about that with Erik Zachte.
DJ
On Tue, Mar 18, 2014 at 3:24 PM, Gilles Dubuc gilles@wikimedia.org wrote:
I think page load time stats are already collected
Taking into account the time it takes to load images on the page as well?
Not sure. IIRC there is an extension collecting page load stats via the ResourceTiming API, but I don't remember where I saw it or what exactly it did. I guess if there is such a thing, Ori would know about it?
One thing we could do in addition to that is to replicate the thumbnail URL
generation logic on the JS side.
Are you saying that we could actually do without the thumbnailinfo API call entirely when it comes to knowing the URL of the thumb sizes we need? This would be huge: thumbnailinfo takes 429ms on average. Displaying the actual image almost half a second sooner on average would be a massive performance gain.
Well, we could try :) The thumbnail URL is very simple in the huge majority of cases ( http://upload.wikimedia.org/<site>/thumb/<sha1 stuff>/<filename>/<width>px-<filename>, the site and sha1 part can be guessed from the existing thumbnail - if we have one, but that covers our existing use cases ) and very complicated in the rest (any number of additional parameters next to the width which might or might not be required, e.g. TIFF might have a page number; also there is a completely different scheme for long filenames where the normal scheme would exceed some length limit for old IE compatibility). Generating a thumbnail URL locally which works in 99.9% of cases would be very easy, if we know the full size of the image and have some existing thumbnail URL to start from. Deciding whether we are in the 99.9% would be less easy but still not that complicated (alternatively, we can just try to use it and listen for a 404). Covering the rest of the cases might or might not be possible; each file format has its own handler and I am unfamiliar with many of them. At best, it would be very painful to maintain the logic in two places.
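To make the simple case concrete, here is a minimal sketch of guessing a thumbnail URL from an existing one. It assumes the existing URL follows the `.../thumb/.../<filename>/<width>px-<filename>` scheme described above; the function name `guessThumbUrl` is hypothetical and the example URLs in the usage note are illustrative. Anything that does not look like the simple scheme (a page-number prefix, the long-filename scheme) yields `null` so the caller can fall back to the API.

```typescript
// Hypothetical sketch: derive the URL of a thumbnail of a different
// width from an existing thumbnail URL. Returns null whenever the
// existing URL does not match the simple "<width>px-<filename>" scheme.
function guessThumbUrl(existingThumbUrl: string, width: number): string | null {
  // Only positive integer widths make sense here.
  if (!(width > 0) || Math.floor(width) !== width) {
    return null;
  }
  const lastSlash = existingThumbUrl.lastIndexOf("/");
  if (lastSlash < 0 || existingThumbUrl.indexOf("/thumb/") < 0) {
    return null; // not a thumbnail URL at all
  }
  // "dir" ends with the original filename; it already contains the
  // site and hash parts, so they never need to be computed in JS.
  const dir = existingThumbUrl.slice(0, lastSlash);
  const thumbName = existingThumbUrl.slice(lastSlash + 1);
  const fileName = dir.slice(dir.lastIndexOf("/") + 1);

  // Strip the leading "<width>px-"; if nothing was stripped, extra
  // parameters (e.g. a page number) precede the width.
  const suffix = thumbName.replace(/^\d+px-/, "");
  if (suffix === thumbName) {
    return null;
  }
  // The remainder should start with the original filename; formats that
  // are always rendered to another type append an extension to it.
  if (fileName === "" || suffix.indexOf(fileName) !== 0) {
    return null;
  }
  return dir + "/" + width + "px-" + suffix;
}
```

For example, given `.../thumb/a/a9/Example.jpg/120px-Example.jpg` and a width of 640, the sketch returns `.../thumb/a/a9/Example.jpg/640px-Example.jpg`. It does not check the requested width against the full size of the image, which the real logic would need to do.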
(By the way, do those averages include beta sites? If they do, I would be cautious with the numbers: the beta cluster swamps our logging because there is no sampling on it, and it is significantly slower.)
Or do you mean that it would only work on a default wiki configuration, in
which case the thumb URL guessed by JS would only be used for a request+abort ahead of time to trigger the thumb generation if it needs to be triggered?
Wiki configuration might make the URL unguessable in extremely rare cases [1] but usually it is not problematic as far as I can see; the uncertain part is the file type handler. (Of course what type of handlers are installed could be considered part of the wiki configuration.) PNG/JPG/GIF are easy to guess though, and formats where you always use a thumbnail (as opposed to the original file) are also easy (so SVG is no problem either).
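A small sketch of recognizing, without making any request, whether a file is one of the easy types. The function name and the extension list are assumptions for illustration; `jpeg` is added here as an alias of JPG, and a real check would have to follow whichever handlers are actually installed on the wiki.

```typescript
// Assumed list of "easy" formats from the discussion: PNG/JPG/GIF, plus
// SVG, which is always served as a thumbnail.
const GUESSABLE_EXTENSIONS: string[] = ["png", "jpg", "jpeg", "gif", "svg"];

// Hypothetical helper: true if the thumbnail URL for this file can be
// guessed safely, based on the file extension alone.
function canGuessThumbUrl(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension, so we cannot tell which handler applies
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  return GUESSABLE_EXTENSIONS.indexOf(extension) !== -1;
}
```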
As said above, I think we could use thumb URL guesses to try to load the file and call the imageinfo API from an onerror handler if needed. Even if that is not an option, I think we could make safe guesses for PNG/JPG/GIF/SVG (and recognize without making any request when we are unable to guess) which is almost all of our images.
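The try-then-fall-back flow could look roughly like the sketch below. Image loading and the API lookup are injected as callbacks so the sketch stays independent of the DOM; all names (`ImageLoader`, `UrlLookup`, `loadThumbnail`) are hypothetical. In a browser, the loader would presumably set `src` on an image element and wire up its `onload` and `onerror` handlers.

```typescript
// Hypothetical callback types: "loadImage" tries to load a URL and
// reports success or failure; "lookupUrlViaApi" asks the server for
// the authoritative thumbnail URL.
type ImageLoader = (url: string, onLoad: () => void, onError: () => void) => void;
type UrlLookup = (onResult: (url: string) => void) => void;

// Try the guessed URL first; only call the API if there is no guess or
// the guessed URL fails to load (e.g. with a 404).
function loadThumbnail(
  guessedUrl: string | null,
  loadImage: ImageLoader,
  lookupUrlViaApi: UrlLookup,
  onDone: (url: string) => void,
  onFail: () => void
): void {
  const loadFromApi = (): void => {
    lookupUrlViaApi((apiUrl: string) => {
      // If even the API-provided URL fails, give up.
      loadImage(apiUrl, () => onDone(apiUrl), onFail);
    });
  };

  if (guessedUrl === null) {
    loadFromApi(); // we recognized up front that we cannot guess
    return;
  }
  const url: string = guessedUrl;
  loadImage(url, () => onDone(url), loadFromApi);
}
```

In the common case this avoids the API round trip entirely; in the rare failing case it costs one extra failed image request before the API call.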
That's how I see it anyway, more experienced multimedia people might still be able to poke any number of holes in it :)
Do our analytics tools support virtual pageviews?
As far as I know, our "analytics tools" is pretty much a grep on the varnish logs... [2] at least that's what is publicly available, there might be more sophisticated internal tools that I haven't heard of.
[1] specifically, if the image we start from is full-sized and we need a smaller thumbnail, we need to know the thumbnail URL prefix, which could be anything. But 1) it is extremely rare that we need to display a smaller size than the thumbnail we start from, and 2) we get the thumb URL via the repoinfo API, we just don't wait for repoinfo with the image loading currently. (Also, we could just hardcode the WMF thumbnail URL prefix.)
(By the way, do those averages include beta sites?
Nope, beta sites aren't part of these averages. It's mediawiki.org, commons and the production wikipedias.
[2] http://dumps.wikimedia.org/other/pagecounts-raw/