Hi, I have a few questions regarding mobile stats.
I need to determine the real percentage of WAP browsers. At first glance, [1] looks interesting: the ratio of text/vnd.wap.wml to text/html is 92M / 3987M = 2.3% on m.wikipedia.org. However, this contradicts the stats at [2], which have different numbers and a different ratio.
I did my own research: because browser detection in Varnish determines WAPness mostly by looking at the Accept header, and because our current analytics infrastructure doesn't log that header, I quickly whipped up some code that recorded the User-Agent and Accept headers of every 10,000th request for mobile page views hitting the Apaches.
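The sampling described above can be sketched as follows. This is only an illustration, not the code that actually ran on the Apaches: the class name `HeaderSampler`, its fields, and the use of a plain dict for request headers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderSampler:
    """Record the User-Agent and Accept headers of every Nth request."""

    every_nth: int = 10_000
    seen: int = 0
    records: list = field(default_factory=list)

    def observe(self, headers: dict) -> bool:
        """Count one request; return True if this request was logged."""
        self.seen += 1
        if self.seen % self.every_nth != 0:
            return False
        # Missing headers are stored as empty strings rather than dropped.
        self.records.append(
            (headers.get("User-Agent", ""), headers.get("Accept", ""))
        )
        return True


# Small demonstration with a tiny interval so the effect is visible.
sampler = HeaderSampler(every_nth=3)
for i in range(7):
    sampler.observe({"User-Agent": f"ua-{i}", "Accept": "text/html"})
print(sampler.records)
```

A counter-based sampler like this is deterministic, so it keeps exactly one request out of every `every_nth`. That makes the logged count easy to scale back up, but it is not a random sample.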
According to several days' worth of data, out of 14917 logged requests 1445 contained vnd.wap.wml in Accept: headers in any form. That's more than what is logged for frontend responses; however, this is expected, as WAP should have a worse cache hit rate and thus should hit the Apaches more often.
Next, our WAP detection code is very simple: the user-agent is checked against a few major browser IDs (all of them are HTML-capable, so this check is not actually needed anymore and will go away soon) and, if the device is still not known, we consider every device whose Accept: header contains "vnd.wap.wml" (but not "application/vnd.wap.xhtml+xml") to be WAP-only. If we apply these rules, we get only 68 entries that qualify as WAP, which is about 0.46% of the 14917 sampled requests.
The question is, what's wrong: my research or stats.wikimedia.org?
And if it's indeed just 0.46%, we should probably^W definitely kill WAP support on our mobile site as it's virtually unmaintained.
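The detection rule described in this message can be sketched roughly as below. This is a sketch under stated assumptions, not the actual Varnish or MobileFrontend logic: the entries in `KNOWN_HTML_BROWSERS` are hypothetical stand-ins for the "few major browser IDs", and the function names are invented for the example. Dividing the counts quoted above gives roughly 9.7% (1445 of 14917) and 0.46% (68 of 14917) of the sampled requests.

```python
# Hypothetical stand-ins for the "few major browser IDs"; the real list is
# not given in the thread.
KNOWN_HTML_BROWSERS = ("iPhone", "Android", "Opera Mini")


def is_wap_only(user_agent: str, accept: str) -> bool:
    """Apply the rule from the message: WML accepted, WAP XHTML not."""
    if any(browser in user_agent for browser in KNOWN_HTML_BROWSERS):
        return False  # known HTML-capable browser
    accept = accept.lower()
    return (
        "vnd.wap.wml" in accept
        and "application/vnd.wap.xhtml+xml" not in accept
    )


def share_percent(matches: int, total: int) -> float:
    """Share of matching requests, in percent."""
    return 100.0 * matches / total


print(is_wap_only("SomeOldPhone/1.0", "text/vnd.wap.wml"))
print(f"{share_percent(1445, 14917):.2f}% mention vnd.wap.wml")
print(f"{share_percent(68, 14917):.2f}% qualify as WAP-only")
```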
-----
[1] http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm
[2] http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
Sadly, you need to take Squid-log-based reports with a grain of salt. Several incomplete maintenance jobs have taken their toll.
Each report starts with a long list of unsolved bugs, among them https://bugzilla.wikimedia.org/show_bug.cgi?id=46273
So yeah, better trust your own data.
Erik
-----Original Message----- From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Max Semenik Sent: Tuesday, September 03, 2013 5:33 PM To: analytics@lists.wikimedia.org; Wikimedia developers; mobile-l Subject: [Analytics] Mobile stats
[...]
-- Best regards, Max Semenik ([[User:MaxSem]])
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks Max for digging into this :)
I'm no analytics guy, but I am a little concerned about the sample size and duration of the internal logging that we've done. Sampling 1/10000 for only a few days, for something whose usage we already know to be low, seems to me like it might make it difficult to get accurate numbers. Can someone from the analytics team chime in and let us know if the approach is sound and if we should trust the data Max has come up with? This has big implications, as it will play a role in determining whether or not we continue supporting WAP devices and providing WAP access to the sites.
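One way to put a number on this concern is a confidence interval for the sampled proportion. The sketch below uses the Wilson score interval, a standard technique that is not mentioned in the thread, applied to the counts Max reported (68 WAP-only out of 14917 logged requests). It assumes the logged requests behave like an independent random sample, which every-Nth sampling over a few days only approximates, and it says nothing about weekly or per-country variation.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(68, 14917)
print(f"95% interval: {100 * low:.2f}% to {100 * high:.2f}%")
```

Under those assumptions the interval comes out at roughly 0.4% to 0.6%, so the overall share is pinned down fairly well. The same calculation on a small per-country subset would give a much wider interval.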
Thanks everyone!
On Tue, Sep 3, 2013 at 10:40 AM, Erik Zachte ezachte@wikimedia.org wrote:
[...]
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Heya, I would suggest running it for at least a 7-day period so you capture the weekly time trends; increasing the sample size would also be advisable. We can help set up a udp-filter for this purpose as long as the data can be extracted from the user-agent string.
D
On Wed, Sep 4, 2013 at 1:50 PM, Arthur Richards arichards@wikimedia.org wrote:
[...]
-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687
I have some concerns about looking at this from a perspective of 'WAP browsers make up only X% of worldwide traffic'. This approach may bias the results towards the US and Europe, whose large share of internet users will overshadow the devices in use in less developed regions such as Africa.
I think this is worthwhile to look into, but I'd suggest collecting the data in a way that lets us look at the results on a country-by-country basis, so that we know whether dropping WAP has a significantly larger impact in, say, Uganda, Bangladesh, etc.
- Dan
On Wed, Sep 4, 2013 at 5:04 PM, Diederik van Liere dvanliere@wikimedia.org wrote:
[...]
On 05.09.2013, 4:04 Diederik wrote:
Heya, I would suggest running it for at least a 7-day period so you capture the weekly time trends; increasing the sample size would also be advisable. We can help set up a udp-filter for this purpose as long as the data can be extracted from the user-agent string.
Unfortunately, the Accept header is no less important here. So, to enumerate our requirements as a result of this thread:
- Sampling rate the same as wikistats (1/1000).
- No less than a week's worth of data.
- User-Agent: header.
- Accept: header.
- Country from GeoIP to determine the share of developing countries.
- Wiki to determine if some wikis are more dependent on WAP than other ones.
Anything else?
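To show how the requested fields could be used once collected, here is a minimal sketch that groups sampled requests by country or by wiki and computes the WAP-only share for each group. The record type `SampledRequest`, the helper names, and the four sample rows are all made up for illustration. The classification is a simplified Accept-only version of the rule described earlier in the thread.

```python
from collections import Counter
from typing import NamedTuple


class SampledRequest(NamedTuple):
    """One logged request with the fields listed in the requirements."""

    user_agent: str
    accept: str
    country: str  # as resolved by a GeoIP lookup
    wiki: str


def looks_wap_only(accept: str) -> bool:
    """Simplified rule: WML is accepted but WAP XHTML is not."""
    accept = accept.lower()
    return (
        "vnd.wap.wml" in accept
        and "application/vnd.wap.xhtml+xml" not in accept
    )


def wap_share_by(records, key: str) -> dict:
    """Fraction of WAP-only requests per value of `key` (country or wiki)."""
    totals, wap = Counter(), Counter()
    for record in records:
        group = getattr(record, key)
        totals[group] += 1
        if looks_wap_only(record.accept):
            wap[group] += 1
    return {group: wap[group] / totals[group] for group in totals}


# Made-up rows, purely to exercise the functions.
records = [
    SampledRequest("ua-a", "text/html", "US", "en.wikipedia"),
    SampledRequest("ua-b", "text/vnd.wap.wml", "UG", "en.wikipedia"),
    SampledRequest(
        "ua-c",
        "text/html, application/vnd.wap.xhtml+xml, text/vnd.wap.wml",
        "UG",
        "sw.wikipedia",
    ),
    SampledRequest("ua-d", "text/html", "US", "en.wikipedia"),
]
print(wap_share_by(records, "country"))
print(wap_share_by(records, "wiki"))
```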
For a breakdown per country, the higher the sampling rate the better, as the data will then become reliable even for smaller countries where Wikipedia adoption is low.
-----Original Message----- From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Max Semenik Sent: Thursday, September 05, 2013 12:28 PM To: Diederik van Liere Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.; mobile-l; Wikimedia developers Subject: Re: [Analytics] [WikimediaMobile] Mobile stats
[...]
Would adding the Accept header to the X-Analytics header be worthwhile for this?
On Sep 5, 2013 4:16 AM, "Erik Zachte" ezachte@wikimedia.org wrote:
[...]
In one day of requests (zero.tsv.log-20130907), Wikipedia Zero traffic (IP address and MCC/MNC matching as expected) shows roughly 7-9% of page responses with a Content-Type of "text/vnd.wap.wml", presuming field #11 (or index 10 if you're indexing from 0) in zero.tsv.log-<date> is the Content-Type. Do I understand correctly that this field is the Content-Type?
Thanks. -Adam
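The count Adam describes can be sketched as below: read tab-separated lines, take field #11 (index 10), and tally the content types. This is a sketch under stated assumptions. It takes only the field position and the tab-separated format from the thread; the function names and the sample rows are invented, and stripping a possible `; charset=...` suffix is a precaution rather than something the log format is known to require.

```python
from collections import Counter

CONTENT_TYPE_INDEX = 10  # field #11 when counting from 1


def content_type_shares(lines):
    """Return {content_type: fraction of rows} for tab-separated log lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= CONTENT_TYPE_INDEX:
            continue  # skip short or malformed rows
        # Drop any parameters such as "; charset=UTF-8" before counting.
        content_type = fields[CONTENT_TYPE_INDEX].split(";")[0].strip().lower()
        counts[content_type] += 1
    total = sum(counts.values())
    return {ctype: n / total for ctype, n in counts.items()} if total else {}


def fake_row(content_type: str) -> str:
    """Build a made-up 11-field row; only the last field matters here."""
    return "\t".join(["-"] * CONTENT_TYPE_INDEX + [content_type])


# Made-up sample: nine HTML responses and one WML response.
sample = [fake_row("text/html; charset=UTF-8")] * 9 + [fake_row("text/vnd.wap.wml")]
shares = content_type_shares(sample)
print(f"WML share: {shares['text/vnd.wap.wml']:.1%}")
```

Against a real file the same function could be fed an open file handle, for example `content_type_shares(open(path, encoding="utf-8", errors="replace"))`, where `path` is whatever location the log actually lives at.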
On Thu, Sep 5, 2013 at 9:27 AM, Arthur Richards arichards@wikimedia.org wrote:
[...]
Hoi, is the Wikipedia Zero traffic information part of the mobile statistics, or is it a completely separate thing? Thanks, GerardM
On 10 September 2013 03:26, Adam Baso abaso@wikimedia.org wrote:
[...]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Somewhere in between, I think.
Wikipedia Zero's main extension, ZeroRatedMobileAccess, relies upon the mobile web's main extension, MobileFrontend. Wikipedia Zero access is served across [lang.].zero.wikipedia.org and [lang.].m.wikipedia.org.
As I understand it, the general Varnish logs capture both the Wikipedia Zero-based and the non-Wikipedia Zero-based mobile web access. These zero.tsv.log* files to which I refer seem to be, basically, Varnish log lines that correspond to Wikipedia Zero-targeted traffic.
Wikipedia Zero for the mobile web will in all likelihood have a higher rate of WAP device usage and WAP content served than the general mobile web stats show. It's likely that, to at least some extent, the higher WAP usage in participating Wikipedia Zero markets is washed out in the overall numbers by the relatively higher adoption of smartphones in wealthier markets.
Please do let me know in case of a need for further clarification!
-Adam
On Tue, Sep 10, 2013 at 4:04 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
[...]
"These zero.tsv.log* files to which I refer seem to be, basically, Varnish log lines that correspond to Wikipedia Zero-targeted traffic."
Yup! Correct. zero.tsv.log* files are captured unsampled and based on the presence of a "zero=" tag in the X-Analytics header:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df...
"Do I understand correctly that this field is the Content-Type?"
Yup again! The varnishncsa format string that is currently being beamed at udp2log is here:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df...
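The "zero=" filter mentioned above can be illustrated with a small parser. This is a hedged sketch, not the udp2log filter itself: it assumes the X-Analytics header value is a list of `key=value` pairs separated by semicolons, and the function names and sample values are hypothetical.

```python
def parse_x_analytics(value: str) -> dict:
    """Split an X-Analytics value into a dict of key/value pairs.

    Assumes a "key=value;key=value" layout; items without "=" are ignored.
    """
    pairs = {}
    for item in value.split(";"):
        key, sep, val = item.strip().partition("=")
        if sep and key:
            pairs[key] = val
    return pairs


def is_zero_request(value: str) -> bool:
    """True when the header carries a "zero=" tag."""
    return "zero" in parse_x_analytics(value)


# Hypothetical header values, for illustration only.
print(parse_x_analytics("zero=250-99;mf-m=b"))
print(is_zero_request("mf-m=b"))
```

If the Accept header were ever added to X-Analytics, as asked earlier in the thread, its value would need escaping first, because Accept values themselves contain ";" and "=" characters.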
On Sep 10, 2013, at 4:25 PM, Adam Baso abaso@wikimedia.org wrote:
Somewhere in between, I think.
Wikipedia Zero's main extension, ZeroRatedMobileAccess, relies upon the mobile web's main extension, MobileFrontend. Wikipedia Zero access is served across [lang.].zero.wikipedia.org and [lang.].m.wikipedia.org.
As I understand, the general Varnish logs capture both the Wikipedia Zero-based and the non-Wikipedia Zero-based mobile web access. These zero.tsv.log* files to which I refer seem to be, basically Varnish log lines that correspond to Wikipedia Zero-targeted traffic.
Wikipedia Zero for the mobile web will in all likelihood have a higher rate of WAP device usage and WAP content served when compared to the general Wikipedia for the mobile web stats. It's likely that, to at least some extent, that higher WAP usage in participating Wikipedia Zero markets, would be washed out by the relatively higher adoption of smartphones in wealthier markets.
Please do let me know in case of a need for further clarification!
-Adam
On Tue, Sep 10, 2013 at 4:04 AM, Gerard Meijssen gerard.meijssen@gmail.comwrote:
Hoi, Is the Wikipedia-Zero traffic information part of the mobile statistics or is it something completely separate thing? Thanks, GerardM
On 10 September 2013 03:26, Adam Baso abaso@wikimedia.org wrote:
Wikipedia Zero traffic (IP address and MCC/MNC matching as expected)
shows
in one day of requests (zero.tsv.log-20130907) roughly 7-9% of page responses having a Content-Type response of "text/vnd.wap.wml", presuming field #11 (or index 10 if you're indexing from 0) in zero.tsv.log-<date>
is
the Content-Type. Do I understand correctly that field as Content-Type?
Thanks. -Adam
On Thu, Sep 5, 2013 at 9:27 AM, Arthur Richards <arichards@wikimedia.org
wrote:
Would adding the accept header to the x-analytics header be worthwhile
for
this? On Sep 5, 2013 4:16 AM, "Erik Zachte" ezachte@wikimedia.org wrote:
For a breakdown per country, the higher the sampling rate the better,
as
the data will become reliable even for smaller countries with a not so great adoption rate of Wikipedia.
-----Original Message----- From: analytics-bounces@lists.wikimedia.org [mailto: analytics-bounces@lists.wikimedia.org] On Behalf Of Max Semenik Sent: Thursday, September 05, 2013 12:28 PM To: Diederik van Liere Cc: A mailing list for the Analytics Team at WMF and everybody who has
an
interest in Wikipedia and analytics.; mobile-l; Wikimedia developers Subject: Re: [Analytics] [WikimediaMobile] Mobile stats
On 05.09.2013, 4:04 Diederik wrote:
Heya, I would suggest to at least run it for a 7 day period so you capture at least the weekly time-trends, increasing the sample size should also be recommendable. We can help setup a udp-filter for this
purpose
as long as the data can be extracted from the user-agent string.
Unfortunately, accept is no less important here. So, to enumerate our requirements as a result of this thread:
- Sampling rate the same as wikistats (1/1000).
- No less than a week's worth of data.
- User-agent:
- Accept:
- Country from GeoIP to determine the share of developing countries.
- Wiki to determine if some wikis are more dependent on WAP than other ones.
Anything else?
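The requirements above could be sketched roughly as follows. This is only an illustration in Python, not the udp-filter or Varnish setup discussed in the thread: the `SampledRequest` type, the `sample_requests` function, and the dictionary keys are all hypothetical names, and the country is assumed to have been filled in by a GeoIP lookup elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

SAMPLE_EVERY = 1000  # 1/1000, the wikistats rate named in the list above


@dataclass
class SampledRequest:
    """The four fields the list above asks for (hypothetical record type)."""
    user_agent: str
    accept: str
    country: str  # assumed to come from a GeoIP lookup done elsewhere
    wiki: str


def sample_requests(requests: Iterable[dict],
                    every: int = SAMPLE_EVERY) -> Iterator[SampledRequest]:
    """Yield every `every`-th request, keeping only the fields of interest."""
    for index, request in enumerate(requests, start=1):
        if index % every == 0:
            yield SampledRequest(
                user_agent=request.get("user_agent", ""),
                accept=request.get("accept", ""),
                country=request.get("country", ""),
                wiki=request.get("wiki", ""),
            )
```

Taking every Nth request is the simplest form of sampling and mirrors the "every 10,000th request" approach used earlier in the thread; random sampling at the same rate would be an equally valid choice.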
-- Best regards, Max Semenik ([[User:MaxSem]])
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thanks. 7-9% of responses on Wikipedia Zero being WAP is pretty substantial.
On Tue, Sep 10, 2013 at 2:01 PM, Andrew Otto otto@wikimedia.org wrote:
These zero.tsv.log* files to which I refer seem to be, basically Varnish log lines that correspond to Wikipedia Zero-targeted traffic.
Yup! Correct. zero.tsv.log* files are captured unsampled and based on the presence of a "zero=" tag in the X-Analytics header:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df...
Do I understand correctly that field as Content-Type?
Yup again! The varnishncsa format string that is currently being beamed at udp2log is here:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df...
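For anyone wanting to reproduce the 7-9% figure, a rough Python sketch of the count might look like the following. It assumes the lines are tab-separated and that the Content-Type sits at index 10, as confirmed above; the function names are made up for illustration, and treating only text/html and text/vnd.wap.wml as "page responses" is my assumption, not something stated in the thread.

```python
from collections import Counter
from typing import Iterable

CONTENT_TYPE_INDEX = 10  # field #11, i.e. index 10 when counting from 0
HTML = "text/html"
WML = "text/vnd.wap.wml"


def content_type_counts(lines: Iterable[str],
                        index: int = CONTENT_TYPE_INDEX) -> Counter:
    """Count Content-Type values found at `index` in tab-separated lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= index:
            continue  # skip short or malformed lines
        # Drop parameters such as "; charset=UTF-8" and normalise case.
        content_type = fields[index].split(";", 1)[0].strip().lower()
        counts[content_type] += 1
    return counts


def wml_share(counts: Counter) -> float:
    """Share of WML among page responses (assumed here to be HTML + WML)."""
    pages = counts[HTML] + counts[WML]
    return counts[WML] / pages if pages else 0.0
```

With a log file at hand, something like `wml_share(content_type_counts(open("zero.tsv.log-20130907")))` would give the fraction for that day.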
On Sep 10, 2013, at 4:25 PM, Adam Baso abaso@wikimedia.org wrote:
Somewhere in between, I think.
Wikipedia Zero's main extension, ZeroRatedMobileAccess, relies upon the mobile web's main extension, MobileFrontend. Wikipedia Zero access is served across [lang.].zero.wikipedia.org and [lang.].m.wikipedia.org.
As I understand, the general Varnish logs capture both the Wikipedia Zero-based and the non-Wikipedia Zero-based mobile web access. These zero.tsv.log* files to which I refer seem to be, basically, Varnish log lines that correspond to Wikipedia Zero-targeted traffic.
Wikipedia Zero for the mobile web will in all likelihood have a higher rate of WAP device usage and WAP content served when compared to the general Wikipedia for the mobile web stats. It's likely that, to at least some extent, the higher WAP usage in participating Wikipedia Zero markets would be washed out in the general stats by the relatively higher adoption of smartphones in wealthier markets.
Please do let me know in case of a need for further clarification!
-Adam
After looking at the Varnish VCL with Adam, we discovered a bug in a regex that caused many phones to be detected as WAP when they shouldn't be. Since the older change[1] simplifying detection also fixes this bug, Brandon Black deployed it, and as of today the usage share of WAP should drop seriously. We will be monitoring the situation and will revisit the issue of WAP popularity once we have enough data.
[1] https://gerrit.wikimedia.org/r/83919
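For reference, the Accept-header rule described at the start of this thread (a device counts as WAP-only if it advertises "vnd.wap.wml" but not "application/vnd.wap.xhtml+xml") can be sketched in Python as below. This is not the deployed VCL and not the code from the change above; the function names are hypothetical, and the user-agent check against major browsers is left out.

```python
from typing import Iterable


def is_wap_only(accept_header: str) -> bool:
    """Apply the Accept-header rule from this thread to one header value.

    Plain substring checks are used, so "vnd.wap.wml" matches in any form
    (for example inside "text/vnd.wap.wml").
    """
    accept = accept_header.lower()
    return ("vnd.wap.wml" in accept
            and "application/vnd.wap.xhtml+xml" not in accept)


def wap_only_percent(accept_headers: Iterable[str]) -> float:
    """Percentage of the given Accept headers that qualify as WAP-only."""
    headers = list(accept_headers)
    if not headers:
        return 0.0
    return 100.0 * sum(is_wap_only(h) for h in headers) / len(headers)
```

Keeping the rule to two substring tests avoids the kind of regex mistake mentioned above, at the cost of also matching longer types that merely start with "vnd.wap.wml".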
That's awesome - thanks Max and Adam; it's great to see the last vestiges of X-Device finally disappear!
A run on yesterday's valid Wikipedia Zero hits showed that the share of user agents NOT supporting HTML (i.e., only supporting WAP) is only 0.098-0.108 *percent*.
Assuming a bunch of complaints don't come in (e.g., "I'm getting tag soup!", as Max might say), I think we could make a reasonable case to stop supporting WAP through the formal channels (blog, mailing list(s), etc.).
-Adam
+Analytics
Oh awesome! Glad y'all found it!