+External
Hi, I realized I don't get any responses from internal--but Joseph sent me something helpful to me this morning so I saw all the responses..up to that point. I think.
Anyway, thanks for the help!! The strange thing for me seems to be that the numbers I get don't make that much sense to me. For beta, (using query below) I get:
Unique IPs num_pvs referrer
3638 5967 external
1972 5760 internal
I would have expected a much larger external-->internal referrer ratio. In other words, I would have expected that the vast majority of sessions or even ips only hit the site 1x in a given hour. Instead, I am seeing that 54% of IPs are clicking a link within that hour... I would probably expect to see #'s no more than 10%.
I am probably doing something wrong, right? I *know* that I am making convenient assumptions here that do not apply to edge cases, so let's not consider those unless you think they make a big difference. Perhaps by using the referer field I am inherently leaving out all of the external traffic for which we do not have data?
Thanks!
-J
SELECT COUNT(DISTINCT ip) AS Unique_IPs, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END AS session_depth FROM wmf.webrequest WHERE TRUE = TRUE AND webrequest_source = 'mobile' AND year = 2015 AND month = 5 AND day = 25 and hour = 1 AND agent_type = "user" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END, x_analytics_map['mf-m'] ORDER BY hits DESC LIMIT 50;
On Thu, May 28, 2015 at 2:30 PM, Jon Katz jkatz@wikimedia.org wrote:
Hi, Trying to run a hive query to rough-count number of 1-page-only, 'sessions' on mobile-web Here is the error I get
FAILED: ParseException line 15:22 missing KW_END at 'device_family' near 'device_family' line 15:35 missing EOF at ''] <> "Spider"\n AND is_pageview = TRUE\n AND x_analytics_map['' near 'device_family
Here is the query:
SELECT COUNT(DISTINCT ip) AS hits, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'Misc’ END AS session_depth FROM wmf.webrequest WHERE YEAR = 2015 AND MONTH = 5 AND DAY = 25 AND user_agent_map['device_family'] <> "Spider" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY session_depth, mobile_site ORDER BY hits DESC LIMIT 50;
Any advice?
Thanks!
Jon
Hi, Just bumping this question. Any thoughts on why my query might be returning such a high number of internal on mobile web beta? Doesn't this go against the wisdom of average pv/session = ~1?
Apologies if someones responded and I for some reason didn't see... Thanks, J
On Fri, May 29, 2015 at 11:10 AM, Jon Katz jkatz@wikimedia.org wrote:
+External
Hi, I realized I don't get any responses from internal--but Joseph sent me something helpful to me this morning so I saw all the responses..up to that point. I think.
Anyway, thanks for the help!! The strange thing for me seems to be that the numbers I get don't make that much sense to me. For beta, (using query below) I get:
Unique IPs num_pvs referrer
3638 5967 external
1972 5760 internal
I would have expected a much larger external-->internal referrer ratio. In other words, I would have expected that the vast majority of sessions or even ips only hit the site 1x in a given hour. Instead, I am seeing that 54% of IPs are clicking a link within that hour... I would probably expect to see #'s no more than 10%.
I am probably doing something wrong, right? I *know* that I am making convenient assumptions here that do not apply to edge cases, so let's not consider those unless you think they make a big difference. Perhaps by using the referer field I am inherently leaving out all of the external traffic for which we do not have data?
Thanks!
-J
SELECT COUNT(DISTINCT ip) AS Unique_IPs, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END AS session_depth FROM wmf.webrequest WHERE TRUE = TRUE AND webrequest_source = 'mobile' AND year = 2015 AND month = 5 AND day = 25 and hour = 1 AND agent_type = "user" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END, x_analytics_map['mf-m'] ORDER BY hits DESC LIMIT 50;
On Thu, May 28, 2015 at 2:30 PM, Jon Katz jkatz@wikimedia.org wrote:
Hi, Trying to run a hive query to rough-count number of 1-page-only, 'sessions' on mobile-web Here is the error I get
FAILED: ParseException line 15:22 missing KW_END at 'device_family' near 'device_family' line 15:35 missing EOF at ''] <> "Spider"\n AND is_pageview = TRUE\n AND x_analytics_map['' near 'device_family
Here is the query:
SELECT COUNT(DISTINCT ip) AS hits, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'Misc’ END AS session_depth FROM wmf.webrequest WHERE YEAR = 2015 AND MONTH = 5 AND DAY = 25 AND user_agent_map['device_family'] <> "Spider" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY session_depth, mobile_site ORDER BY hits DESC LIMIT 50;
Any advice?
Thanks!
Jon
Jon, you might have missed this version of the query from Joseph, which should be more direct and faster (pasted below). However, I'm not sure you can deduce much about the session length from this result. It seems to me you'd have to group by IPs and track referrers over time. However, that doesn't really tell you much as lots of people show up with the same IP, IPs change over time (pretty quickly), etc. From Oliver's work on sessions, I remember that it's not really possible to find sessions over any significant period of time because the data changed too much. Talking to Oliver about this might help. That said, the percent of unique IPs that might be clicking through the site once or twice, according to this data, does not seem too high. I'd believe it.
-- results for query below
Total MapReduce CPU Time Spent: *3 minutes 56 seconds 660 msec*
OK
hits mobile_site is_internal num_pvs
5246 b false 8818
2878 b true 9100
31 a true 118 31 a false 64
SELECT
COUNT(DISTINCT ip) AS hits,
x_analytics_map['mf-m'] AS mobile_site,
(referer_class = "internal") as is_internal,
count(*) AS num_pvs
FROM
wmf.webrequest
WHERE TRUE = TRUE
AND webrequest_source = 'mobile'
AND year = 2015
AND month = 5
AND day = 25
AND hour = 1
AND agent_type <> 'spider'
AND is_pageview = TRUESELECT COUNT(DISTINCT ip) AS hits, x_analytics_map['mf-m'] AS mobile_site, (referer_class = "internal") as is_internal, count(*) AS num_pvs FROM wmf.webrequest WHERE TRUE = TRUE AND webrequest_source = 'mobile' AND year = 2015 AND month = 5 AND day = 25 AND hour = 1 AND agent_type <> 'spider' AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND access_method IN ('mobile app', 'mobile web') GROUP BY (referer_class = "internal"), x_analytics_map['mf-m'] ORDER BY hits DESC LIMIT 50;
On Wed, Jun 3, 2015 at 11:27 PM, Jon Katz jkatz@wikimedia.org wrote:
Hi, Just bumping this question. Any thoughts on why my query might be returning such a high number of internal on mobile web beta? Doesn't this go against the wisdom of average pv/session = ~1?
Apologies if someones responded and I for some reason didn't see... Thanks, J
On Fri, May 29, 2015 at 11:10 AM, Jon Katz jkatz@wikimedia.org wrote:
+External
Hi, I realized I don't get any responses from internal--but Joseph sent me something helpful to me this morning so I saw all the responses..up to that point. I think.
Anyway, thanks for the help!! The strange thing for me seems to be that the numbers I get don't make that much sense to me. For beta, (using query below) I get:
Unique IPs num_pvs referrer
3638 5967 external
1972 5760 internal
I would have expected a much larger external-->internal referrer ratio. In other words, I would have expected that the vast majority of sessions or even ips only hit the site 1x in a given hour. Instead, I am seeing that 54% of IPs are clicking a link within that hour... I would probably expect to see #'s no more than 10%.
I am probably doing something wrong, right? I *know* that I am making convenient assumptions here that do not apply to edge cases, so let's not consider those unless you think they make a big difference. Perhaps by using the referer field I am inherently leaving out all of the external traffic for which we do not have data?
Thanks!
-J
SELECT COUNT(DISTINCT ip) AS Unique_IPs, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END AS session_depth FROM wmf.webrequest WHERE TRUE = TRUE AND webrequest_source = 'mobile' AND year = 2015 AND month = 5 AND day = 25 and hour = 1 AND agent_type = "user" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END, x_analytics_map['mf-m'] ORDER BY hits DESC LIMIT 50;
On Thu, May 28, 2015 at 2:30 PM, Jon Katz jkatz@wikimedia.org wrote:
Hi, Trying to run a hive query to rough-count number of 1-page-only, 'sessions' on mobile-web Here is the error I get
FAILED: ParseException line 15:22 missing KW_END at 'device_family' near 'device_family' line 15:35 missing EOF at ''] <> "Spider"\n AND is_pageview = TRUE\n AND x_analytics_map['' near 'device_family
Here is the query:
SELECT COUNT(DISTINCT ip) AS hits, x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs, CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'Misc’ END AS session_depth FROM wmf.webrequest WHERE YEAR = 2015 AND MONTH = 5 AND DAY = 25 AND user_agent_map['device_family'] <> "Spider" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL AND uri_host like "%en.m.wikipedia.org%" GROUP BY session_depth, mobile_site ORDER BY hits DESC LIMIT 50;
Any advice?
Thanks!
Jon
Analytics-internal mailing list Analytics-internal@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics-internal
On Thu, Jun 4, 2015 at 5:26 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
However, I'm not sure you can deduce much about the session length from this result.
Thanks, Dan.
I think I am missing some basic understanding of this parameter then. In my understanding, if I saw 1000 pageviews total, and 999 of them were internal referrals, then this would indicate 1 user who visited 1000 (999 by clicking on internal links). If I instead saw 1000 pageviews and none of them were internal referrals, it would indicate that nobody clicked on internal links on wikipedia during that time period. (Obviously visitations start and stop before and after the period in question, but ignoring that overlap). Is that correct?
I think I am missing some basic understanding of this parameter then. In my understanding, if I saw 1000 pageviews total, and 999 of them were internal referrals, then this would indicate 1 user who visited 1000 (999 by clicking on internal links). If I instead saw 1000 pageviews and none of them were internal referrals, it would indicate that nobody clicked on internal links on wikipedia during that time period. (Obviously visitations start and stop before and after the period in question, but ignoring that overlap). Is that correct?
By the pigeon hole principle, yes, if we had 1000 total pageviews and 999 internal referrer pageviews, then it'd be one user. But that's not what we're seeing, we're seeing 5k or so distinct IPs hit 9k or so pages from the outside. Then 3k or so distinct IPs hit 9k or so pages with internal referrers. I was trying to make two points:
* there could be many users for each IP and many IPs for each user, in fairly unpredictable combinations * Given the numbers we're seeing, it's more like some number wildly hovering around 1/2 of our users are clicking through and checking out a few other articles. This is a little different from normal, and that seems expected since these are beta and alpha users. When looking at mobile users in general, I get a much lower ratio of internal to external referers as I'd expect:
unique_ips is_internal num_pvs 3213398 false 6778157 731019 true 2310153
SELECT COUNT(DISTINCT ip) AS unique_ips, (referer_class = "internal") as is_internal, count(*) AS num_pvs FROM wmf.webrequest WHERE TRUE = TRUE AND webrequest_source = 'mobile' AND year = 2015 AND month = 5 AND day = 25 AND hour = 1 AND agent_type <> 'spider' AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NULL AND access_method IN ('mobile app', 'mobile web') GROUP BY (referer_class = "internal") ORDER BY unique_ips DESC LIMIT 50;