On Thu, Feb 5, 2015 at 1:09 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
My concern is not pageview-related, my concern is simply me being
stuffy and grumbling that we're making the client execute an extra
request that takes non-null bandwidth (and making the server handle
said requests) solely for GLAM purposes - noting that I accept that it
is necessary /for/ those purposes, and that I am totally fine with us
doing that if we can't see a smarter and less disruptive way of
achieving the same thing.

On a very generic level, accurately understanding user behavior (which is key for writing high-quality software and producing high-quality content) is not possible unless client-side code has a way to send information to the server, and that will take extra requests and extra bandwidth. In the long term, there might be smarter ways that need fewer extra requests and less bandwidth (request batching, websockets, SPDY etc.). File view stats tracking needs to be done soon-ish, though, and I don't think there is an easy and quick way that is less disruptive than the naive approach of generating fake request logs.

To put that disruptiveness in perspective, though: a random English Wikipedia page seems to be around 50K of total traffic with a warm cache, and a random image request by MediaViewer is maybe 100-200K. An empty request to log a virtual file view is a few hundred bytes, so it will increase the traffic by ~0.1%. By the most inclusive definition, there are about 25M file views per day in MediaViewer; total server requests are in the range of 2B per month according to this slightly outdated stat, so again the increase is in the range of ~0.1%.
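
As a rough sanity check of the bandwidth figure, the estimate can be reproduced in a few lines of Python. This is only a sketch: the 200-byte size of the logging request is an assumed stand-in for "a few hundred bytes", the image sizes are the 100-200K range quoted above, and the function and constant names are made up for illustration.

```python
def overhead_percent(extra_bytes: float, baseline_bytes: float) -> float:
    """Return the extra traffic as a percentage of the baseline traffic."""
    return 100.0 * extra_bytes / baseline_bytes


# Assumed size of one empty logging request ("a few hundred bytes").
LOG_REQUEST_BYTES = 200

# Range quoted for a single MediaViewer image request (100-200K).
IMAGE_BYTES_SMALL = 100 * 1000
IMAGE_BYTES_LARGE = 200 * 1000

# The overhead is largest relative to the smallest image and vice versa.
high = overhead_percent(LOG_REQUEST_BYTES, IMAGE_BYTES_SMALL)
low = overhead_percent(LOG_REQUEST_BYTES, IMAGE_BYTES_LARGE)

print(f"bandwidth overhead per file view: {low:.2f}% to {high:.2f}%")
```

With these assumed numbers the overhead lands between 0.1% and 0.2% of the image request alone, and lower still if the surrounding page traffic is counted in the baseline.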