Hi,
Here are a few thoughts about what may influence the data you're gathering.
The decision of whether a browser has sufficient support for our Grade A runtime happens
client-side based on a combination of feature tests and (unfortunately) user-agent
sniffing.
For this reason, our bootstrap script is written using only the most basic syntax and
prototype methods (as any other methods would cause a run-time exception). For those
familiar, this is somewhat similar to PHP version detection in MediaWiki. The file has to
parse and run to a certain point in very old environments.
The following requests are not part of our primary javascript payload and should be
excluded when interpreting bits.wikimedia.org requests for purposes of javascript
"support":
* stylesheets (e.g. ".css" requests as well as load.php?...&only=styles requests)
* images (e.g. ".png", ".svg" etc. as well as load.php?...&image=.. requests)
* favicons and apple-touch icons (e.g. bits.wikimedia.org/favicon/..,
  bits.wikimedia.org/apple-touch/..)
* fonts (e.g. bits.wikimedia.org/static-../../fonts/..)
* events (e.g. bits.wikimedia.org/event.gif, bits.wikimedia.org/statsv)
* startup module (bits.wikimedia.org/../load.php?..modules=startup)
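As a rough illustration of the exclusions listed above, here is a minimal Python sketch
of a filter over request URLs. The function name, the constants and the assumption that
each log record carries an absolute URL are mine and not part of any existing tooling.
The path patterns are only the ones named in the list, so treat it as a starting point
rather than a complete filter.

```python
from urllib.parse import urlsplit, parse_qs

# Patterns taken from the exclusion list; everything else here is illustrative.
STATIC_SUFFIXES = (".css", ".png", ".svg")
EXCLUDED_PATH_PREFIXES = ("/favicon/", "/apple-touch/")
EXCLUDED_PATHS = ("/event.gif", "/statsv")


def is_primary_js_candidate(url: str) -> bool:
    """Return False for requests that should not count as javascript "support"."""
    parts = urlsplit(url)
    path = parts.path
    query = parse_qs(parts.query, keep_blank_values=True)

    # Plain stylesheet and image files.
    if path.endswith(STATIC_SUFFIXES):
        return False
    # Favicons, apple-touch icons and event beacons.
    if path.startswith(EXCLUDED_PATH_PREFIXES) or path in EXCLUDED_PATHS:
        return False
    # Fonts served from a static-* directory.
    if path.startswith("/static-") and "/fonts/" in path:
        return False
    if path.endswith("/load.php"):
        # Stylesheet-only and image requests served through load.php.
        if query.get("only") == ["styles"] or "image" in query:
            return False
        # The startup module is served to all browsers regardless of support.
        if query.get("modules") == ["startup"]:
            return False
    return True
```

Anything this function keeps is only a candidate; it says nothing yet about whether the
request actually went through the startup module.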
There are also non-MediaWiki environments (ab)using bits.wikimedia.org and bypassing the
startup module. As such, these load javascript modules directly, regardless of browser.
There are at least two of these that I know of:
1) Tool Labs tools. Developers there may use bits.wikimedia.org to serve modules like
jQuery UI. They may circumvent the startup module and load those unconditionally (which
will cause errors in older browsers, but they either don't care or are unaware of how
this works).
2) Portals such as www.wikipedia.org and others.
For the data to be as reliable as possible, one would want to filter out these "forged"
requests not produced by MediaWiki. The best way to filter out requests that bypassed the
startup module is to drop requests with no version= query parameter, as well as requests
with an outdated version parameter (since they can copy an old url and hardcode it in
their app).
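To make the version= check concrete, here is a small Python sketch. It assumes you can
supply the set of version strings that were current during the measurement window; how
that set is obtained is left open, and the function name is made up for illustration.

```python
from typing import Set
from urllib.parse import urlsplit, parse_qs


def has_current_version(url: str, current_versions: Set[str]) -> bool:
    """Keep only requests whose version= parameter is present and current.

    `current_versions` is an assumption: the caller has to collect the
    version strings that the startup module handed out during the window.
    """
    query = parse_qs(urlsplit(url).query)
    versions = query.get("version", [])
    # A missing, empty or repeated version= suggests the url was not
    # produced by the startup module.
    return len(versions) == 1 and versions[0] in current_versions
```

A hardcoded url copied from an older deployment fails the membership test, and a url
with no version= at all fails the presence test.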
Actually, there are probably about a dozen more exceptions I can think of. I don't
believe it is feasible to filter everything out. Perhaps focus your next data-gathering
window on a specific payload url, instead of trying to catch all javascript payloads with
exclusions for the wrong ones.
For example, right now in MediaWiki 1.25wmf18 the jquery/mediawiki base payload has
version 20150225T221331Z and is requested by the startup module from url (grabbed from the
Network tab in Chrome Dev Tools):
https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=e…
Using only a specific url like that to gather user agents that support javascript will
produce considerably fewer false positives.
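A minimal Python sketch of that approach: tally user agents only for requests that match
one specific payload url. It assumes the log is available as (url, user agent) pairs, and
the helper names are hypothetical. Since the url above is truncated, the example below
matches only on the path and on the debug and version parameters that are visible; a real
query should compare the remaining parameters of the full url as well.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit, parse_qs


def matches_payload(url: str, path: str, required_params: Dict[str, str]) -> bool:
    """True if the url has exactly this path and each required parameter value."""
    parts = urlsplit(url)
    if parts.path != path:
        return False
    query = parse_qs(parts.query)
    return all(query.get(key) == [value] for key, value in required_params.items())


def count_user_agents(
    records: Iterable[Tuple[str, str]],
    path: str,
    required_params: Dict[str, str],
) -> Counter:
    """Count user agents over the records that requested the target payload."""
    return Counter(
        user_agent
        for url, user_agent in records
        if matches_payload(url, path, required_params)
    )
```

For the example above, the call would use path "/en.wikipedia.org/load.php" and
required_params {"debug": "false", "version": "20150225T221331Z"}, plus whatever other
parameters the full url carries.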
If you want to incorporate multiple wikis, it'll be a little more work to get all the
right urls, but handpicking a dozen wikis will probably be good enough.
This also has the advantage of not being biased by a device's cache size because, unlike
all other modules, the base module is not cached in LocalStorage. It will still benefit
from HTTP 304 caching, however. It would help to have your window start simultaneously
with the deployment of a new wmf branch to en.wikipedia.org (and other wikis you include
in the experiment) so there's a fresh start with caching.
</braindump>
— Timo
On 18 Feb 2015, at 18:07, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Do you think it's worth getting the UA distribution for CSS requests & correlate it with
the distribution for page / JS loading?
Yes, we can do that. I would need to gather a new dataset for it so I've made a new task
for it (https://phabricator.wikimedia.org/T89847), marking this one as complete:
https://phabricator.wikimedia.org/T88560
I would also like to do some research regarding IE6/IE7, as we should see those (according
to our code: https://github.com/wikimedia/mediawiki/blob/master/resources/src/startup.js)
in the no-JS list, but we only see some user agents there. There are definitely IE6/IE7
browsers to which we are serving javascript; I just have to look in detail at what exactly
we are serving there. Will report on this. Looks like this startup.js file is being served
to all browsers regardless, so I might need to do some more fine-grained queries.
Just consider the 3% as your approximate upper bound for overall traffic, big bots
removed. If you just count mobile traffic, numbers in percentage are, of course, a lot
higher.
Thanks,
Nuria
On 17 Feb 2015, at 03:38, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Gabriel:
I have run through the data and have a rough estimate of how many of our pageviews are
requested from browsers w/o strong javascript support. It is a preliminary rough estimate
but I think it is pretty useful.
TL;DR
According to our new pageview definition
(https://meta.wikimedia.org/wiki/Research:Page_view) about 10% of pageviews come from
clients w/o much javascript support. But, BIG CAVEAT, this includes bot requests. If you
remove the easy-to-spot big bots the percentage is <3%.
Details here (still some homework to do regarding IE6 and IE7)
https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript
Thanks,
Nuria