On Thu, Feb 5, 2015 at 1:09 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
> My concern is not pageview-related, my concern is simply me being stuffy and grumbling that we're making the client execute an extra request that takes non-null bandwidth (and making the server handle said requests) solely for GLAM purposes - noting that I accept that it is necessary /for/ those purposes, and that I am totally fine with us doing that if we can't see a smarter and less disruptive way of achieving the same thing.
On a very generic level, accurately understanding user behavior (which is key for writing high-quality software and producing high-quality content) is not possible unless client-side code has a way to send information to the server, and that takes extra requests and extra bandwidth. In the long term, there might be smarter ways that need fewer extra requests and less bandwidth (request batching, websockets, SPDY etc.). File view stats tracking needs to be done soon-ish, though, and I don't think there is a quick and easy way that is less disruptive than the naive approach of generating fake request logs.
To put that disruptiveness in perspective, though: a random English Wikipedia page seems to generate around 50K of total traffic with a warm cache. A random image request by MediaViewer is maybe 100-200K. An empty request to log a virtual file view is a few hundred bytes, so it will increase the traffic by ~0.1%. By the most inclusive definition, there are about 25M file views per day in MediaViewer; total server requests are in the range of 2B per month according to this slightly outdated stat http://stats.wikimedia.org/#requests so again the increase is in the range of ~0.1%.
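For anyone who wants to redo the bandwidth estimate with their own numbers, here is a rough back-of-the-envelope sketch. The page and image sizes are the approximate figures above; the 300-byte size for the logging request is my assumption standing in for "a few hundred bytes", and the function names are just illustrative.

```python
# Back-of-the-envelope check of the bandwidth overhead of one logging request.
# All sizes are rough estimates, not measurements.
PAGE_BYTES = 50_000                    # ~50K page load with a warm cache
IMAGE_BYTES_RANGE = (100_000, 200_000) # ~100-200K per MediaViewer image request
BEACON_BYTES = 300                     # assumed value for "a few hundred bytes"


def overhead_percent(beacon_bytes, baseline_bytes):
    """Extra traffic from the logging request, as a percentage of the baseline."""
    return 100.0 * beacon_bytes / baseline_bytes


def overhead_range(beacon_bytes=BEACON_BYTES):
    """Return (best, worst) overhead relative to one page load plus one image view."""
    small_image, large_image = IMAGE_BYTES_RANGE
    worst = overhead_percent(beacon_bytes, PAGE_BYTES + small_image)
    best = overhead_percent(beacon_bytes, PAGE_BYTES + large_image)
    return best, worst


if __name__ == "__main__":
    best, worst = overhead_range()
    print(f"overhead: {best:.2f}% to {worst:.2f}%")
```

With these assumed inputs the overhead comes out between roughly 0.1% and 0.2% of the traffic for a page load plus one image view, which is the same order of magnitude as the ~0.1% quoted above.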