Thanks, Nuria. I really appreciate the guidance. -J
On Wed, Apr 15, 2015 at 4:17 PM, Nuria Ruiz nuria@wikimedia.org wrote:
Some things to have in mind:
- Bots AND user_agent_map['device_family'] <> "Spider"
Doesn't remove all bots, only very prominent ones, so stats still include traffic from say, wmf robots, for example.
- Sampling:
Strangely, the event-logs for specific actions showed much higher
traffic for
beta but this is likely due to different user groups?
It is worth checking whether you are gathering data in equal sample rates for all 3 sites. For tyraffic you are comparing days for eventlogging data you are comparing an unknown interval of time.
- Daily versus weekly
It is also worth checking weekly figures as they might differ.
- MobileWebUIClickTracking
I thought this schema had been split in several and was no longer in use, again, there is no time interval in your query.
Thanks,
Nuria
On Wed, Apr 15, 2015 at 12:14 PM, Jon Katz jkatz@wikimedia.org wrote:
Hi, Just following up to thank you guys.
Kevin and Ottomoata guided me through getting set up on hive and I made the first queries I needed to! Here were the results (if you're interested) Best,
J
4/12 full day
*mobile_site* *Pageviews* *% of stable* a 8,578 0.004% b 46,705 0.019% stable 244,816,000 Strangely, the event-logs for specific actions showed much higher traffic for beta but this is likely due to different user groups? (multiple days) *mobile_site* * search events* *% of stable* alpha 18,768 0.01% beta 153,617 0.09% stable 178,118,332
queries:
for alpha beta SELECT x_analytics_map['mf-m'] AS mobile_site, count(*) FROM wmf.webrequest WHERE access_method = "mobile web" AND YEAR = 2015 AND MONTH = 4 AND DAY = 12 AND user_agent_map['device_family'] <> "Spider" AND is_pageview = TRUE AND x_analytics_map['mf-m'] IS NOT NULL GROUP BY x_analytics_map['mf-m'] LIMIT 50;
For stable: SELECT count(*) FROM wmf.webrequest tablesample(BUCKET 1 OUT OF 1000 ON rand()) WHERE access_method = "mobile web" AND YEAR = 2015 AND MONTH = 4 AND DAY = 12 AND is_pageview = TRUE AND user_agent_map['device_family'] <> "Spider" AND x_analytics_map['mf-m'] IS NULL LIMIT 50;
For events: SELECT count(uuid), event_name, event_mobileMode FROM MobileWebUIClickTracking_10742159 GROUP BY event_name, event_mobileMode;