There's a ticket for removing mobile.wikipedia.org and wap.wikipedia.org domains/subdomains, which are legacy domain names superseded by m.wikipedia.org and its subdomains.
https://phabricator.wikimedia.org/T104942
The rationale for the removal of these legacy domain names is to help support HSTS preloading in browsers with the existing TLS SAN cert.
After review of the ticket, can anyone think of a compelling reason to keep those old domain names?
I'm going to open a separate thread on mobile-l about this given this is more mobile-targeted, yet some people only operate on one of wikitech-l or mobile-l.
-Adam
... Have we done any analysis on usage of those subdomains?
Looks like the user pageviews for wap.wikipedia.org and mobile.wikipedia.org subdomains are approximately 0.02% of the size of pageviews for m.wikipedia.org subdomains based on a recent one day check.
hive> select count(*) from wmf.webrequest where year = 2015 and month = 7 and day = 14 and access_method = 'mobile web' and (uri_host like '%.wap.wikipedia.org' OR uri_host like '%.mobile.wikipedia.org') and is_pageview = true and agent_type = 'user';
35,543
hive> select count(*) from wmf.webrequest where year = 2015 and month = 7 and day = 14 and access_method = 'mobile web' and uri_host like '%.m.wikipedia.org' and is_pageview = true and agent_type = 'user';
202,024,891
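As a quick check on the 0.02% figure, the two counts above can simply be divided. This is a minimal sketch in Python; the variable names are illustrative, and only the two counts come from the queries.

```python
# Counts reported by the two Hive queries for 2015-07-14.
legacy_pageviews = 35_543      # *.wap.wikipedia.org and *.mobile.wikipedia.org
m_pageviews = 202_024_891      # *.m.wikipedia.org

# Legacy pageviews expressed as a percentage of m-dot pageviews.
legacy_pct = 100 * legacy_pageviews / m_pageviews
print(f"{legacy_pct:.4f}%")    # prints 0.0176%
```

Rounded to two decimal places this gives the "approximately 0.02%" quoted above.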
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Can we look at a wider sample? Using a single day as the deciding factor is a bad idea. However, if the data supports your position I don't see any serious problems. You might want to look at either the UAs or the referring sources to see whether there is a primary source for the traffic, and mitigate that.
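One way to look for a primary source is to tally the legacy-domain requests by user agent or by referrer and see whether a handful of values dominate. The sketch below assumes the rows have already been exported as dictionaries; the function name and the field names `user_agent` and `referer` are assumptions for illustration, not something stated in this thread.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_sources(
    rows: Iterable[Mapping[str, str]], field: str, n: int = 10
) -> List[Tuple[str, int, float]]:
    """Return the n most common values of `field` as (value, count, share)."""
    # Missing or empty values are grouped under one placeholder label.
    counts = Counter(row.get(field, "") or "(empty)" for row in rows)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common(n)]


# Hypothetical usage, once rows for the legacy hosts are available:
#   top_sources(rows, "user_agent")
#   top_sources(rows, "referer")
```

If one user agent or referrer accounts for most of the traffic, that is the source to contact or redirect before the domains go away.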
Hi John --
What do you think would be a better sample? My feeling is that a 24-hour period captures global usage, and that only about .01% of page views currently going to these domains is a pretty good indicator. Keep in mind we're doing this for a legitimate technical reason and not arbitrarily. Looking at the UAs is a good idea and we will do that.
thanks,
-Toby
One day isn't much to base a decision on, especially at a global level. Normally I would use a sample of at least a week, up to a month, of values. Sorry if I seem like I'm being a pain; I have just seen a lot of bad choices made based on limited data sets. With a wider data set we might find that Tuesdays are the slowest day for traffic, or some other factor that skews the data. Validating the data is important when making these kinds of calls from the working dataset.
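To make the weekday concern concrete, a wider sample could be reduced to one percentage per day and then grouped by weekday. This is only a sketch: the helper names are made up for illustration, and the one data point filled in is the 14 July count already reported in this thread.

```python
from datetime import date
from statistics import mean

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def daily_shares(counts):
    """Map each day to legacy pageviews as a percentage of m-dot pageviews.

    `counts` maps a datetime.date to a (legacy, m_dot) pair of pageview counts.
    """
    return {day: 100 * legacy / m_dot for day, (legacy, m_dot) in counts.items()}


def summarize(shares):
    """Return (min, mean, max) of the daily percentages."""
    values = list(shares.values())
    return min(values), mean(values), max(values)


def by_weekday(shares):
    """Average the daily percentages per weekday to expose weekly patterns."""
    grouped = {}
    for day, pct in shares.items():
        grouped.setdefault(WEEKDAYS[day.weekday()], []).append(pct)
    return {name: mean(values) for name, values in grouped.items()}


# The only day reported so far; more days would be added the same way.
counts = {date(2015, 7, 14): (35_543, 202_024_891)}
shares = daily_shares(counts)
print(by_weekday(shares))
```

With a week or a month of counts, a large gap between the minimum and maximum, or one weekday standing out, would show that the single-day figure is not representative.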
No problem, I'll run some extra queries.
I posted the query results at https://phabricator.wikimedia.org/T104942#1458981. I _believe_ there wasn't significant skew on particular days that would taint the initially reported number, although there were small variations as expected.
Is this good to go?
Anyone against this?
On 24 July 2015 at 16:18, Adam Baso abaso@wikimedia.org wrote:
Anyone against this?
Do it.
J.
Hi,
If you look at https://phabricator.wikimedia.org/T104942#1436332 (linked from this thread, before Adam posted his own data) an analysis was done on a file called "per-domain-count" which we previously extracted from sampled 1:1000 logs for approximately 25 days for all kinds of domain-popularity purposes and cleanups that we've been doing as part of the HTTPS project (more background at https://phabricator.wikimedia.org/T102827#1429852 and also see T102826, T102814, T102815).
Those logs are sampled, and they aren't as accurate as the Hadoop data Adam used because of other infrastructure faults that happened in that 25-day period, but they are generally okay for drawing broad conclusions, especially if we look at the relative popularity of e.g. .wap. vs. .m. rather than the absolute numbers.
Finally, note that in any case there is a hard limitation of a look-behind window of 90 days due to our data retention policy, as well as practical considerations for extracting results from unsampled logs for larger periods of time. You're absolutely right, though, that a 1-day sample is usually not enough, especially considering the seasonality of the data, e.g. a very different mobile-to-desktop ratio on weekends.
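The point about relative versus absolute numbers can be illustrated with a small sketch. It assumes a uniform 1:1000 sampling rate, as described above; the function names are hypothetical.

```python
SAMPLING_FACTOR = 1000  # the sampled logs keep roughly 1 request in 1000


def estimate_total(sampled_hits: int, factor: int = SAMPLING_FACTOR) -> int:
    """Scale a sampled hit count up to an estimate of the unsampled count."""
    return sampled_hits * factor


def relative_popularity(sampled_a: int, sampled_b: int) -> float:
    """Ratio of two sampled counts; the sampling factor cancels out."""
    return sampled_a / sampled_b
```

Because both counts are scaled by the same factor, the ratio between two domains is the same whether it is computed on the sampled or the scaled-up counts. Very small sampled counts are still noisy, so the ratio is more trustworthy over a long period than over a single day.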
Faidon
wikitech-l@lists.wikimedia.org