Thanks, Max, for digging into this :)

I'm no analytics guy, but I am a little concerned about the sample size and duration of the internal logging that we've done. Sampling 1 in 10,000 requests for only a few days, for something whose usage we already know to be low, seems to me like it might make it difficult to get accurate numbers. Can someone from the analytics team chime in and let us know whether the approach is sound and whether we should trust the data Max has come up with? This has big implications, as it will play a role in determining whether or not we continue supporting WAP devices and providing WAP access to the sites.

Thanks everyone!


On Tue, Sep 3, 2013 at 10:40 AM, Erik Zachte <ezachte@wikimedia.org> wrote:
Sadly you need to take squid log based reports with a grain of salt.
Several incomplete maintenance jobs have taken their toll.

Each report starts with a long list of unsolved bugs.
Among those https://bugzilla.wikimedia.org/show_bug.cgi?id=46273

So yeah better trust your own data.

Erik


-----Original Message-----
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Max Semenik
Sent: Tuesday, September 03, 2013 5:33 PM
To: analytics@lists.wikimedia.org; Wikimedia developers; mobile-l
Subject: [Analytics] Mobile stats

Hi, I have a few questions regarding mobile stats.

I need to determine the real percentage of WAP browsers. At first glance, [1] looks interesting: the ratio of text/vnd.wap.wml to text/html is 92M / 3987M = 2.3% on m.wikipedia.org. However, this contradicts the stats at [2], which have different numbers and a different ratio.

I did my own research: because browser detection in Varnish detects WAPness mostly by looking at the Accept header, and because our current analytics infrastructure doesn't log that header, I quickly whipped up some code that recorded the User-Agent and Accept headers of every 10,000th request for mobile page views hitting the Apaches.
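The logging code itself is not shown in this thread. As a rough illustration only, a 1-in-N sampler of the kind described above could look like the following Python sketch. The class name HeaderSampler, its methods, and the in-memory record list are hypothetical, and the per-process counter is an assumption (a real multi-server setup might count per process or sample randomly instead).

```python
class HeaderSampler:
    """Record the User-Agent and Accept headers of every Nth request.

    Hypothetical sketch: names and storage are illustrative only.
    """

    def __init__(self, every_nth=10_000):
        if every_nth < 1:
            raise ValueError("every_nth must be at least 1")
        self.every_nth = every_nth
        self._seen = 0       # requests observed so far
        self.records = []    # (user_agent, accept) pairs that were sampled

    def observe(self, headers):
        """Count one request; return True if it was sampled and recorded."""
        self._seen += 1
        if self._seen % self.every_nth != 0:
            return False
        self.records.append(
            (headers.get("User-Agent", ""), headers.get("Accept", ""))
        )
        return True


if __name__ == "__main__":
    sampler = HeaderSampler(every_nth=3)  # small N just for demonstration
    for i in range(7):
        sampler.observe({"User-Agent": f"ua-{i}", "Accept": "text/html"})
    print(sampler.records)
```

A counter-based sampler matches the "every 10,000th request" wording; random sampling with probability 1/10,000 would give a similar sample without shared state.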

According to several days' worth of data, out of 14917 logged requests, 1445 (about 9.7%) contained vnd.wap.wml in the Accept header in any form. That's more than what is logged for frontend responses; however, this is expected, as WAP should have a worse cache hit rate and thus should hit the Apaches more often.
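To get a feel for how much sampling noise a count like this carries, one option is a Wilson score interval on the observed share. The sketch below uses only the two counts quoted above; the function name wilson_interval is hypothetical, and the interval assumes the sampled requests behave like independent draws, so it says nothing about bias from the short logging window or from cache behaviour.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    logged = 14917    # sampled requests, from the message above
    with_wml = 1445   # of those, Accept mentioned vnd.wap.wml
    low, high = wilson_interval(with_wml, logged)
    print(f"share: {with_wml / logged:.2%}, interval: {low:.2%} to {high:.2%}")
```

The same function can be applied to any other count taken from the same sample.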

Next, our WAP detection code is very simple: the user-agent is checked against a few major browser IDs (all of them are HTML-capable, so this check is not actually needed anymore and will go away soon) and, if the browser is still not known, we consider every device whose Accept header contains "vnd.wap.wml" (but not "application/vnd.wap.xhtml+xml") to be WAP-only. If we apply these rules, we get only 68 entries that qualify as WAP, which is about 0.46% (68 / 14917) of the logged mobile requests.

The question is, what's wrong: my research or stats.wikimedia.org?

And if it's indeed just 0.46%, we should probably^W definitely kill WAP support on our mobile site as it's virtually unmaintained.
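For readers who want to replay the detection rule described above against their own sample, here is a minimal Python sketch. It is not the production code: the function name is_wap_only is hypothetical, and the entries in KNOWN_HTML_BROWSER_IDS are placeholder assumptions, since the actual browser ID list is not given in this thread.

```python
# Placeholder browser IDs (assumption); the real list is not given in the thread.
KNOWN_HTML_BROWSER_IDS = ("iphone", "android", "opera mini")


def is_wap_only(user_agent, accept):
    """Apply the rule as described: known browsers are HTML-capable;
    otherwise a device is WAP-only if Accept mentions vnd.wap.wml
    but not application/vnd.wap.xhtml+xml."""
    ua = user_agent.lower()
    if any(browser_id in ua for browser_id in KNOWN_HTML_BROWSER_IDS):
        return False
    accept = accept.lower()
    return (
        "vnd.wap.wml" in accept
        and "application/vnd.wap.xhtml+xml" not in accept
    )


def count_wap_only(records):
    """Count (user_agent, accept) pairs that the rule classifies as WAP-only."""
    return sum(1 for ua, accept in records if is_wap_only(ua, accept))


if __name__ == "__main__":
    sample = [
        ("SomeOldPhone/1.0", "text/vnd.wap.wml, */*"),
        ("SomeOldPhone/2.0", "application/vnd.wap.xhtml+xml, text/vnd.wap.wml"),
        ("Mozilla/5.0 (iPhone)", "text/html, text/vnd.wap.wml"),
    ]
    print(count_wap_only(sample), "of", len(sample), "classified as WAP-only")
```

The substring match on "vnd.wap.wml" is deliberately loose, mirroring the "in any form" wording used for the header count.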

-----
[1] http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm
[2] http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm



--
Best regards,
  Max Semenik ([[User:MaxSem]])


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687