I think I am missing some basic understanding of this parameter, then. In
my understanding, if I saw 1000 pageviews total and 999 of them were
internal referrals, this would indicate 1 user who visited 1000 pages (999
of them by clicking on internal links). If I instead saw 1000 pageviews and
none of them were internal referrals, it would indicate that nobody clicked
on internal links on Wikipedia during that time period. (Obviously visits
start before and end after the period in question, but I'm ignoring that
overlap.) Is that correct?

By the pigeonhole principle, yes: if we had 1000 total pageviews and 999
internal-referrer pageviews, it would be one user. But that's not what
we're seeing. We're seeing 5k or so distinct IPs hit 9k or so pages from
the outside, then 3k or so distinct IPs hit 9k or so pages with internal
referrers. I was trying to make two points:
* There could be many users for each IP and many IPs for each user, in
fairly unpredictable combinations.
* Given the numbers we're seeing, something very roughly around 1/2 of our
users are clicking through and checking out a few other articles. This is
a little different from normal, which seems expected since these are beta
and alpha users. When looking at mobile users in general, I get a much
lower ratio of internal to external referrers, as I'd expect:

unique_ips  is_internal  num_pvs
3213398     false        6778157
731019      true         2310153

-- Distinct IPs and pageviews, split by whether the referrer was internal
SELECT COUNT(DISTINCT ip) AS unique_ips,
       (referer_class = 'internal') AS is_internal,
       COUNT(*) AS num_pvs
  FROM wmf.webrequest
 WHERE TRUE = TRUE
   -- one hour of mobile traffic
   AND webrequest_source = 'mobile'
   AND year = 2015
   AND month = 5
   AND day = 25
   AND hour = 1
   -- pageviews only, excluding spiders
   AND agent_type <> 'spider'
   AND is_pageview = TRUE
   -- only requests with no mf-m value in X-Analytics
   AND x_analytics_map['mf-m'] IS NULL
   AND access_method IN ('mobile app', 'mobile web')
 GROUP BY
       (referer_class = 'internal')
 ORDER BY unique_ips DESC
 LIMIT 50;
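
As a rough sanity check on those ratios, here is a minimal Python sketch of the arithmetic. It assumes a simplified model in which every visit starts with exactly one non-internal pageview, so the count of non-internal pageviews stands in for the number of visits. The function name and dictionary keys are made up for illustration, and the beta/alpha figures are only the rounded "9k or so" values quoted above, not query output.

```python
def click_through_stats(external_pvs, internal_pvs):
    """Summarise pageviews under a simplified model: each visit begins with
    exactly one non-internal pageview, followed by zero or more internal ones.
    This is an illustrative assumption, not how visits are actually counted."""
    total = external_pvs + internal_pvs
    return {
        # one non-internal pageview per visit, by assumption
        "visits": external_pvs,
        # fraction of all pageviews that came from an internal link
        "internal_share": internal_pvs / total,
        # average internal pageviews per non-internal pageview
        "internal_per_external": internal_pvs / external_pvs,
    }


# The thought experiment: 1000 pageviews, 999 of them internal.
one_user = click_through_stats(external_pvs=1, internal_pvs=999)

# Rounded beta/alpha figures from the message (about 9k each way).
beta_alpha = click_through_stats(external_pvs=9_000, internal_pvs=9_000)

# General mobile traffic, taken from the result table above.
general = click_through_stats(external_pvs=6_778_157, internal_pvs=2_310_153)

for name, stats in [("one user", one_user),
                    ("beta/alpha", beta_alpha),
                    ("general mobile", general)]:
    print(f"{name}: visits={stats['visits']}, "
          f"internal share={stats['internal_share']:.2f}, "
          f"internal per external={stats['internal_per_external']:.2f}")
```

Under that model the beta/alpha sample comes out at roughly one internal pageview per non-internal one, against roughly a third for mobile traffic in general. Because IPs do not map cleanly onto users, none of this says anything exact about how many people are involved.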