Hiya all,

As promised earlier today in the Analytics weekly showcase, I've got a few interesting bits of data to share from playing with the new Mobile Site Sessions dataset.


# Visits to Mobile Site, 4/21/2013

- Total Visits:                             51,624,103
- Unique Visitors:                          37,736,120
- Total Pageviews:                         104,972,033
- Avg Pageviews per Session:                    2.0334
- Max Pageviews in one Session:                141,882

## Standard Site
- Visits:                                   51,603,221
- Unique Visitors:                          37,723,188
- Pageviews:                               104,910,382
- Avg Pageviews per Session:                     2.033

## Alpha Site
- Visits:                                          986
- Unique Visitors:                                 822
- Pageviews:                                     7,087
- Avg Pageviews per Session:                     7.188

## Beta Site
- Visits:                                       19,896
- Unique Visitors:                              16,235
- Pageviews:                                    54,564
- Avg Pageviews per Session:                     2.742



## Notes
- A session (or "visit") is defined as a run of activity from one visitor with less than 30 minutes between consecutive hits. Intuitively speaking, a session ends when the user hasn't done anything for 30 minutes.
- As we do not set visitor_id cookies for all users, the "unique visitors" metric was calculated using hash(ip_address + user_agent) as the visitor_id.
- This job looked at all requests to the mobile site on 4/21/2013, which is 75.17 GB of request logs.
- The job took ~17 minutes to process the day into 15.3 GB of sessions.
- The summary above took maybe 10 minutes to set up/write in Hive, and the resulting Hive job took maybe 7 minutes to run.
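
For anyone who wants to play with the session definition above, here is a minimal Python sketch of the idea. It is not the actual job: the function names, the (timestamp, ip_address, user_agent) input shape, and the choice of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # the 30-minute inactivity cutoff from the notes


def visitor_id(ip_address, user_agent):
    """Stand-in visitor id: a hash of IP address plus user agent.

    SHA-1 is an arbitrary choice; the notes do not say which hash was used.
    """
    return hashlib.sha1((ip_address + user_agent).encode("utf-8")).hexdigest()


def sessionize(hits):
    """Group (timestamp_seconds, ip_address, user_agent) hits into sessions.

    Returns a dict mapping visitor id to a list of sessions, where each
    session is a list of timestamps in ascending order.
    """
    by_visitor = defaultdict(list)
    for ts, ip, ua in hits:
        by_visitor[visitor_id(ip, ua)].append(ts)

    sessions = {}
    for vid, timestamps in by_visitor.items():
        timestamps.sort()
        visitor_sessions = [[timestamps[0]]]
        for ts in timestamps[1:]:
            if ts - visitor_sessions[-1][-1] < SESSION_GAP_SECONDS:
                visitor_sessions[-1].append(ts)  # still the same session
            else:
                visitor_sessions.append([ts])    # 30 minutes or more idle: new session
        sessions[vid] = visitor_sessions
    return sessions
```

With this shape, "unique visitors" would be `len(sessions)` and "total visits" would be the number of inner lists summed over all visitors.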



In addition to that summary, I ran a few jobs on the entry_referer field -- the URL that referred the user to us when the session started. Obvious caveats: this is only one day of data, and it's only the mobile site. Draw conclusions with care.

First, I pulled out the top referring domains. It's mostly as you'd expect -- search engines -- though you'll also note that several Wikipedia mobile sites show up. My working hypothesis is that people don't tend to close tabs on smartphones; when they later come back, it is often to an open Wikipedia tab, so clicking a link or performing a search means the referrer is still us.
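
As a rough illustration of what "top referring domains" means here, this Python sketch counts the host part of each entry referrer. The function name and the plain-list input are assumptions; the real numbers came out of the sessions dataset, not this code.

```python
from collections import Counter
from urllib.parse import urlparse


def top_entry_domains(entry_referers, n=10):
    """Count the host part of each entry referrer URL and return the top n."""
    counts = Counter()
    for referer in entry_referers:
        host = urlparse(referer).hostname  # None for empty or malformed values
        if host:
            counts[host] += 1
    return counts.most_common(n)
```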

Since -- as expected -- so much of the data pertained to search engines, I also calculated the top search queries and top keywords that sent people to us. (For keywords, I've filtered out common "stop words": de, of, in, is, la, and, el, es, to, en, di, los, le, da, se, las, les, il, du, a, i, o, y, e.) In both, you see the predictable: lots of searches for porn, for "facebook", for "wiki", etc. But you also see a few things that surprised me:

- Tons of Japanese. Japan is the most mobile-enabled country in the world, so I guess we should have expected many Japanese searches to show up in the top queries. I've left them URL-encoded in the results -- you'll see them as odd-looking lines with % in them.

- Apparently people search for movies and TV so they can spoil their fun by reading about them on Wikipedia. Both "movies" and "film" show up in the top keywords; Iron Man 1, 2, AND 3 all show up in the top search queries. I didn't expect this to be a major use case, but -- wikigroaning aside -- it's an interesting fact.
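
If you want to decode the URL-encoded queries or redo the keyword split yourself, something like the following Python sketch should work. The stop-word list is the one given above; everything else is an assumption for illustration, in particular the function names and the idea that the query lives in a `q` parameter (true of several search engines, but not all of them).

```python
from urllib.parse import urlparse, parse_qs

# The stop words listed above.
STOP_WORDS = set(
    "de of in is la and el es to en di los le da se las les il du a i o y e".split()
)


def search_query(referer, param="q"):
    """Return the decoded, lowercased search query from a referrer URL, or None.

    `param` is an assumption: the query parameter name varies by search engine.
    """
    values = parse_qs(urlparse(referer).query).get(param)
    return values[0].strip().lower() if values else None


def keywords(query):
    """Split a query on whitespace and drop stop words."""
    return [word for word in query.split() if word not in STOP_WORDS]
```

`parse_qs` handles the percent-decoding, so the lines with % in them come back as readable text.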

I'm sure we're only scratching the surface here. This is an exciting dataset, and I'm sure there's lots more to learn!

The full results:
- Top Referring Entry Domains: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_domains.tsv
- Top Referring Entry Search Queries: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_search_queries.tsv
- Top Referring Entry Search Keywords: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_keywords.tsv

Questions are welcome!


--
David Schoonover
dsc@wikimedia.org