So, in sequence:

Gergo: Either the false anchors are sent to the server or some conniving elf has been inserting thousands of fake requests into our logs ;). I'm seeing a lot of requests with #mediaviewer/ URLs, some internal and some with referers from outside the WMF (implying someone following a link). The proposed ways forward are useful, but as Erik M says, reorganising active products for the sake of avoiding a pageviews filter is probably not worth it unless it's a truly trivial change, so let's just stick with the status quo for now and I'll build in a filter.

Gilles: see above, re Erik's comments.

Thanks to everyone for their commentary and help; I'll build a filter into the definition this morning :)
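
For anyone following along, here is a minimal sketch of what such a filter could look like. It is illustrative only and is not the actual pageview definition code. It assumes each logged request is available as a full URL string that still includes the fragment, and the names `is_mediaviewer_request` and `filter_pageviews` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: the thread only says these URLs contain "#mediaviewer/",
# so we match on a fragment that starts with that marker.
MEDIAVIEWER_PREFIX = "mediaviewer/"


def is_mediaviewer_request(url: str) -> bool:
    """Return True if the logged URL carries a #mediaviewer/ fragment."""
    return urlsplit(url).fragment.startswith(MEDIAVIEWER_PREFIX)


def filter_pageviews(urls):
    """Drop MediaViewer hash loads and keep every other request."""
    return [u for u in urls if not is_mediaviewer_request(u)]
```

Ordinary section anchors pass through unchanged, because only fragments beginning with the MediaViewer marker are excluded.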

On 26 November 2014 at 05:07, Gilles Dubuc <gilles@wikimedia.org> wrote:
The more client-side and mobile apps we develop, the less value server logs of page hits provide in terms of knowing what people are doing (was it ever possible to truly tell bots apart from humans, or to compensate for caching proxies run by organizations?). I think it's inevitable that any meaningful tracking will have to be done client-side. Looking for ways to adapt our URL schemes for the sake of server logs seems like rearranging the deck chairs on the Titanic to me. We should be trying to put as little work into it as possible. Our stats efforts should rather be focused on more fine-grained client-side and mobile tracking, which is what we need to truly answer questions, even on our old "static" pages like the articles themselves. In the same way that I've been working, at the Amsterdam hackathon, on tracking how long images are viewed in preparation for Erik Zachte's RFC on image views, we should be doing the same sort of measurements on articles.

On Wed, Nov 26, 2014 at 12:51 AM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Tue, Nov 25, 2014 at 1:59 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Actually, I'd argue it's not equivalent at all, for two reasons:

  1. It doesn't present all of the same data. In fact, it presents very little data compared to a pageview of the "File" page.
  2. The argument behind MMV is, as I understand it, that people are focusing on the images. It is designed so that people do so, on the basis that people clicking on images probably want those images. As such, it'd be inaccurate to weight it as equivalent to, say, https://az.wikipedia.org/wiki/Mar%C3%A7ello_Malpigi in textual value. We believe (correct me if I'm wrong) that someone clicking for an image wants a media file, not a wall of text.

MediaViewer hash loads and File page requests have little to do with each other. A File page request happens when 1) someone clicks on a thumbnail, or 2) someone shares the URL of a file page and someone else follows that URL. In the case of MediaViewer, only the second case results in a text/html request to the server. The first case (which is about 30x more frequent) only results in a bunch of AJAX calls and an image request (actually more than one, due to preloading). Those AJAX calls could easily be made unique, if that is of any interest.

So basically, when you click on an image, MediaViewer uses AJAX requests to load some of the information from the file page, then creates an <img> tag so the browser loads a large image thumbnail. When you visit a URL ending in #mediaviewer/..., that just tells the MV code to simulate an image click as soon as the page has loaded.
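
As a rough model of that last step, the sketch below reads the fragment and returns whatever follows the #mediaviewer/ marker, which is the thing the simulated click would open. The real MV code runs client-side in the browser; this Python version is only an illustration. Treating the remainder of the fragment as a percent-encoded target name is an assumption, and `simulated_click_target` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

# Assumption: the text after "#mediaviewer/" names the image to open.
PREFIX = "mediaviewer/"


def simulated_click_target(url: str) -> Optional[str]:
    """Return the target named after #mediaviewer/, or None if absent."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith(PREFIX):
        return None  # ordinary page load, no simulated click
    # Decode percent-escapes; an empty remainder counts as no target.
    return unquote(fragment[len(PREFIX):]) or None
```

A URL without the marker yields None, which corresponds to a plain page load with no viewer opened.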

_______________________________________________
Multimedia mailing list
Multimedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/multimedia



--
Oliver Keyes
Research Analyst
Wikimedia Foundation