may offer some clues
on missing Referer data in the common cases. Just search for the term
*Referer* on that page. I suspect, but would require David's expertise to
know, that app or otherwise out-of-band mobile context-to-user agent
transitions (e.g., invocation of protocol handlers from a native search
screen) may also have something to do with this, at least after filtering
out screen scrapers and other bots.
I see that Asher asks about filtering out that stuff on a separate reply,
which would make the long tail more uniformly distributed in
the hit plot.
On Wed, Apr 24, 2013 at 9:17 PM, Dario Taraborelli <
thanks for sharing this, the referral data is particularly fascinating. I
mentioned during the quarterly review that I'd love to get a better sense
of (1) the proportion of requests in the mobile request logs lacking a
referral, (2) the possible causes of this gap and (3) to what extent these
missing entries introduce a bias in the referral ranking.
The 3rd most popular query (according to your dumps) is ビッグダディ (Japanese
for "Big Daddy"), which presumably refers to this guy:
What's interesting is that there's no such entry on the Japanese Wikipedia
and I am baffled that people may have landed on the website via a search
engine query for a non-existing article.
Do you have an explanation for this or am I misinterpreting what you mean
by search query?
On Apr 24, 2013, at 8:40 PM, David Schoonover <dsc(a)wikimedia.org> wrote:
As promised earlier today in the Analytics weekly showcase, I've got a few
interesting bits of data to share from playing with the new Mobile Site
# Visits to Mobile Site, 4/21/2013
- Total Visits: 51,624,103
- Unique Visitors: 37,736,120
- Total Pageviews: 104,972,033
- Avg Pageviews per Session: 2.0334
- Max Pageviews in one Session: 141,882
## Standard Site
- Visits: 51,603,221
- Unique Visitors: 37,723,188
- Pageviews: 104,910,382
- Avg Pageviews per Session: 2.033
## Alpha Site
- Visits: 986
- Unique Visitors: 822
- Pageviews: 7,087
- Avg Pageviews per Session: 7.188
## Beta Site
- Visits: 19,896
- Unique Visitors: 16,235
- Pageviews: 54,564
- Avg Pageviews per Session: 2.742
- A session (or "visit") is defined as all activity with less than 30
minutes between each hit. Intuitively speaking, a session ends when the
user hasn't done anything in 30m.
- As we do not set visitor_id cookies for all users, the "unique visitors"
metric was calculated using hash(ip_address + users_agent) as visitor_id.
- This job looked at all requests to the mobile site on 4/21/2013, which
is 75.17 GB of request logs.
- The job took ~17 minutes to process the day into 15.3 GB of sessions.
- The summary above took maybe 10 minutes to set up/write in Hive, and the
job took maybe 7 minutes.
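
To make the session and visitor_id definitions above concrete, here is a minimal Python sketch. It is not the Hive job described in this thread. The input tuple layout, the choice of SHA-1 as the hash, and the function names are all assumptions for illustration, and it treats every hit as a pageview, which real request logs would need to filter.

```python
import hashlib
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # a session ends after 30 minutes without a hit


def visitor_id(ip_address, user_agent):
    """Fallback visitor id: hash of IP address plus user agent.

    SHA-1 is an assumption; the thread only says hash(...).
    """
    return hashlib.sha1((ip_address + user_agent).encode("utf-8")).hexdigest()


def sessionize(hits):
    """Group (timestamp_seconds, ip_address, user_agent) hits into sessions.

    Returns a list of (visitor_id, [timestamps]) tuples, one per session.
    """
    by_visitor = defaultdict(list)
    for ts, ip, ua in hits:
        by_visitor[visitor_id(ip, ua)].append(ts)

    sessions = []
    for vid, stamps in by_visitor.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            # Same session only if less than 30 minutes since the previous hit.
            if ts - current[-1] < SESSION_GAP_SECONDS:
                current.append(ts)
            else:
                sessions.append((vid, current))
                current = [ts]
        sessions.append((vid, current))
    return sessions


def summarize(sessions):
    """Compute the same kinds of totals as the summary above."""
    visits = len(sessions)
    pageviews = sum(len(stamps) for _, stamps in sessions)
    return {
        "visits": visits,
        "unique_visitors": len({vid for vid, _ in sessions}),
        "pageviews": pageviews,
        "avg_pageviews_per_session": pageviews / visits if visits else 0.0,
        "max_pageviews_in_session": max(
            (len(stamps) for _, stamps in sessions), default=0
        ),
    }
```

Because the visitor id is derived from IP address and user agent, users behind a shared proxy with identical browsers collapse into one visitor, which may be one reason a single "session" can reach a very large pageview count.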
In addition to that summary, I ran a few jobs on the entry_referer field
-- the URL that referred the user to us when the session started. Obvious
caveats: this is only one day of data, and it's only the mobile site. Draw
conclusions with care.
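
As a rough illustration of how referring domains could be tallied from the entry_referer field, here is a short Python sketch. It is not the job that produced the results in this thread. The function names are hypothetical, and treating an empty value or "-" as a missing referrer is an assumption borrowed from common access-log conventions.

```python
from collections import Counter
from urllib.parse import urlparse


def referring_domain(entry_referer):
    """Return the lowercased host of a referrer URL, or None if missing or unparseable."""
    # Assumption: an empty value or "-" means the request carried no referrer.
    if not entry_referer or entry_referer == "-":
        return None
    try:
        host = urlparse(entry_referer).hostname
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 literal.
        return None
    return host.lower() if host else None


def top_referring_domains(entry_referers, n=10):
    """Return (top n (domain, count) pairs, number of sessions without a usable referrer)."""
    counts = Counter()
    missing = 0
    for ref in entry_referers:
        domain = referring_domain(ref)
        if domain is None:
            missing += 1
        else:
            counts[domain] += 1
    return counts.most_common(n), missing
```

Returning the missing count alongside the ranking makes it easy to report what share of sessions had no referrer at all.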
First, I pulled out the top referring domains. It's mostly as you'd expect
-- search engines -- though you'll also note that several Wikipedia mobile
sites show up. My working hypothesis is that people don't tend to close
tabs on smartphones; when they later come back, it is often to an open
Wikipedia tab: clicking a link or performing a search means the referrer is
Since -- as expected -- so much of the data pertained to search engines, I
also calculated the top search queries and top keywords that sent people to
us. (For keywords, I've filtered out common "stop words": de, of, in, is,
la, and, el, es, to, en, di, los, le, da, se, las, les, il, du, a, i, o, y,
e.) In both, you see the predictable: lots of searches for porn, for
"facebook", for "wiki", etc. But you also see a few things that
- Tons of Japanese. Japan is the most mobile-enabled country in the world
so I guess we should have expected to see many searches in Japanese show up
in the top queries. I've left them URL-encoded in the results -- you'll see
them as weird lines with % in them.
- Apparently people search for movies and TV so they can spoil their fun
by reading about them on Wikipedia. Both "movies" and "film" show up in
the top keywords; Iron Man 1, 2, AND 3 all show up in the top search
queries. I didn't expect this was a major use-case, but -- wikigroaning
aside -- it's an interesting fact.
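
For anyone who wants to decode the URL-encoded queries or reproduce the stop-word filtering described above, here is a minimal Python sketch. It is not the original job. The function names are hypothetical, and reading the query from the "q" parameter is an assumption that holds for some search engines but not all.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# The stop words listed in this message.
STOP_WORDS = frozenset(
    "de of in is la and el es to en di los le da se las les il du a i o y e".split()
)


def search_query(entry_referer, param="q"):
    """Return the decoded search query from a referrer URL, or None if absent.

    parse_qs handles percent-decoding, so Japanese queries come back readable.
    """
    values = parse_qs(urlparse(entry_referer).query).get(param)
    return values[0].strip() if values else None


def keywords(query):
    """Split a query on whitespace, lowercase it, and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


def top_keywords(entry_referers, n=10):
    """Count keywords across all referrers that carry a search query."""
    counts = Counter()
    for ref in entry_referers:
        query = search_query(ref)
        if query:
            counts.update(keywords(query))
    return counts.most_common(n)
```

Whitespace splitting does not segment Japanese text, so a query such as ビッグダディ is counted as a single keyword.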
I'm sure we're only scratching the surface here. This is an exciting
dataset, and I'm sure there's lots more to learn!
The full results:
- Top Referring Entry Domains:
- Top Referring Entry Search Queries:
- Top Referring Entry Search Keywords:
Questions are welcome!
Analytics mailing list