Thanks Timo for taking the time to write this.
>There are also non-MediaWiki environments (ab)using bits.wikimedia.org and bypassing the startup module. As such these are loading javascript modules directly, regardless of browser. There are at least two of these that I know of:

I think our raw Hive data probably does not include traffic from tools or wikipedia.org (need to confirm). But even if it did, tool traffic on bits is not significant compared to the traffic from Wikipedia, so it does not affect the overall results, since we are throwing away the long tail. Note that a couple of days' worth of traffic might be more than 1 billion requests for JavaScript on bits.
>Actually, there are probably about a dozen more exceptions I can think of. I don't believe it is feasibly possible to filter everything out.

Statistically, I do not think you need to. Given the volume of Wikipedia traffic versus the other sources, you simply cannot report results with a precision of, say, 0.001%. Even very small wikis, whose traffic is insignificant compared to English Wikipedia, are also being thrown away.
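
To make the argument concrete, here is a rough sketch with made-up numbers. None of these are real Hive counts: the browser names, the request counts, and the 0.1% cutoff are all assumptions chosen for illustration. It mixes a small amount of hypothetical tool traffic into a roughly 1 billion request sample and checks how far any reported share moves.

```python
def shares(counts):
    """Convert raw request counts into fractions of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def merge(a, b):
    """Add two count dictionaries key by key."""
    merged = dict(a)
    for name, n in b.items():
        merged[name] = merged.get(name, 0) + n
    return merged


def max_shift(before, after):
    """Largest absolute change in share for any key."""
    names = set(before) | set(after)
    return max(abs(before.get(n, 0.0) - after.get(n, 0.0)) for n in names)


def drop_long_tail(share_by_name, min_share):
    """Keep only entries at or above the reporting cutoff."""
    return {name: s for name, s in share_by_name.items() if s >= min_share}


# Hypothetical counts for a couple of days of JavaScript requests on bits.
wiki_requests = {
    "Chrome": 500_000_000,
    "Firefox": 300_000_000,
    "IE": 199_900_000,
    "OldBrowser": 100_000,
}
# Hypothetical tool traffic that bypasses the startup module.
tool_requests = {"Chrome": 50_000, "OldBrowser": 50_000}

clean = shares(wiki_requests)
mixed = shares(merge(wiki_requests, tool_requests))

shift = max_shift(clean, mixed)        # how much any share moved
report = drop_long_tail(mixed, 0.001)  # assumed 0.1% reporting cutoff

print(f"largest change in any share: {shift:.6%}")
print("reported:", sorted(report))
```

Under these assumed numbers the largest change in any share is far below the cutoff, and the one entry the tool traffic inflates most is dropped with the long tail anyway.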