I have a query help question. Trying to run the following to see how many
clicks are generated by ~3000 pages in english wikipedia. I get 0 results
when I run this query, but that seems unlikely as it means nobody clicked
on the mobile web's barack obama page (or any of the other pages I am
looking at) during the hour in question. I am almost sure I have a stupid
'where' clause in here that is blocking me. Any thoughts?
the table of pages I am looking at is in hive under
Here is a sample:
category title url
Category:20th-century American politicians Al Gore Al_Gore
Category:20th-century American politicians Barbara Boxer Barbara_Boxer
Category:20th-century American politicians Ben Cardin Ben_Cardin
Here is my query:
SELECT * from jkatz.browse_prototype_page_list4 L
INNER JOIN (
SELECT count(*) AS num_pvs, referer, uri_path
WHERE TRUE = TRUE
AND webrequest_source = 'mobile'
AND year = 2015
AND month = 5
AND day = 25
AND hour = 1
AND agent_type <> 'spider'
AND is_pageview = TRUE
AND x_analytics_map['mf-m'] IS NULL
AND access_method = 'mobile web'
AND uri_host like '%en.wikipedia.org%'
on R.referer = concat('http://en.m.wikipedia.org/wiki/'
ORDER BY category