+External

Hi,
I realized I don't get any responses from internal, but Joseph sent me something helpful this morning, so I saw all the responses up to that point, I think.

Anyway, thanks for the help!! The strange thing is that the numbers I get don't make much sense to me.
For beta (using the query below), I get:

Unique IPs  num_pvs  referrer
3638        5967     external
1972        5760     internal

I would have expected a much larger external-to-internal referrer ratio. In other words, I would have expected that the vast majority of sessions, or even IPs, only hit the site once in a given hour. Instead, I am seeing that 54% of IPs (1972 of 3638) are clicking a link within that hour. I would probably expect to see numbers no higher than 10%.

I am probably doing something wrong, right? I know that I am making convenient assumptions here that do not apply to edge cases, so let's not consider those unless you think they make a big difference. Perhaps by using the referer field I am inherently leaving out all of the external traffic for which we do not have referer data?

Thanks!

-J


SELECT
COUNT(DISTINCT ip) AS Unique_IPs,
x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs,
CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END AS session_depth
FROM
  wmf.webrequest
WHERE TRUE = TRUE
  AND webrequest_source = 'mobile'
  AND year = 2015
  AND month = 5
  AND day = 25
  AND hour = 1
  AND agent_type = "user"
  AND is_pageview = TRUE
  AND x_analytics_map['mf-m'] IS NOT NULL
  AND uri_host like "%en.m.wikipedia.org%"
GROUP BY
  CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external' END,
  x_analytics_map['mf-m']
ORDER BY Unique_IPs DESC
LIMIT 50;
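
One way to check the missing-referer question would be a variant of the query above that puts requests with no referer into their own bucket instead of counting them as 'external'. This is only a rough sketch: the 'none' bucket, the referer_class alias, and the guess that a missing referer is stored as NULL, an empty string, or '-' are all assumptions, not something verified against the table.

```sql
-- Sketch only: same filters as the query above, but missing referers are
-- split out so they are not silently lumped in with 'external'.
-- ASSUMPTION: a missing referer appears as NULL, '' or '-' (not verified).
SELECT
  CASE
    WHEN referer IS NULL OR referer IN ('', '-') THEN 'none'
    WHEN referer LIKE '%en.m.wikipedia%' THEN 'internal'
    ELSE 'external'
  END AS referer_class,
  COUNT(DISTINCT ip) AS unique_ips,
  COUNT(*) AS num_pvs
FROM
  wmf.webrequest
WHERE webrequest_source = 'mobile'
  AND year = 2015
  AND month = 5
  AND day = 25
  AND hour = 1
  AND agent_type = 'user'
  AND is_pageview = TRUE
  AND x_analytics_map['mf-m'] IS NOT NULL
  AND uri_host LIKE '%en.m.wikipedia.org%'
GROUP BY
  -- Hive needs the full expression here rather than the column alias.
  CASE
    WHEN referer IS NULL OR referer IN ('', '-') THEN 'none'
    WHEN referer LIKE '%en.m.wikipedia%' THEN 'internal'
    ELSE 'external'
  END
ORDER BY num_pvs DESC;
```

If the 'none' bucket turns out to be large, the external count above is probably a mix of true external arrivals and requests that simply carry no referer.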


On Thu, May 28, 2015 at 2:30 PM, Jon Katz <jkatz@wikimedia.org> wrote:
Hi,
I'm trying to run a Hive query to get a rough count of one-page-only 'sessions' on mobile web. Here is the error I get:


FAILED: ParseException line 15:22 missing KW_END at 'device_family' near 'device_family'
line 15:35 missing EOF at ''] <> "Spider"\n  AND is_pageview = TRUE\n  AND x_analytics_map['' near 'device_family

Here is the query:

SELECT
COUNT(DISTINCT ip) AS hits,
x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs,
CASE
        WHEN referer LIKE "%en.m.wikipedia%"
        THEN 'internal'
        ELSE 'Misc’
        END AS session_depth
FROM
  wmf.webrequest
WHERE
  YEAR = 2015
  AND MONTH = 5
  AND DAY = 25
  AND user_agent_map['device_family'] <> "Spider"
  AND is_pageview = TRUE
  AND x_analytics_map['mf-m'] IS NOT NULL
  AND uri_host like "%en.m.wikipedia.org%"
GROUP BY session_depth, mobile_site
ORDER BY hits DESC
LIMIT 50;


Any advice?

Thanks!

Jon