Hiya all,
As promised earlier today in the Analytics weekly showcase, I've got a few interesting bits of data to share from playing with the new Mobile Site Sessions dataset.
# Visits to Mobile Site, 4/21/2013
- Total Visits: 51,624,103
- Unique Visitors: 37,736,120
- Total Pageviews: 104,972,033
- Avg Pageviews per Session: 2.0334
- Max Pageviews in one Session: 141,882
## Standard Site
- Visits: 51,603,221
- Unique Visitors: 37,723,188
- Pageviews: 104,910,382
- Avg Pageviews per Session: 2.033
## Alpha Site
- Visits: 986
- Unique Visitors: 822
- Pageviews: 7,087
- Avg Pageviews per Session: 7.188
## Beta Site
- Visits: 19,896
- Unique Visitors: 16,235
- Pageviews: 54,564
- Avg Pageviews per Session: 2.742
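The "Avg Pageviews per Session" figures appear to be simply pageviews divided by visits. Visits and pageviews for the three sites add up exactly to the totals. The per-site unique visitor counts sum to slightly more than the overall figure, which is what you'd expect if some visitors used more than one site. The quick sanity check below uses only the numbers reported above. The dictionary layout and function names are illustrative and not part of the original job.

```python
# Reported counts for 4/21/2013, copied from the summary above.
SITES = {
    "standard": {"visits": 51_603_221, "uniques": 37_723_188, "pageviews": 104_910_382},
    "alpha": {"visits": 986, "uniques": 822, "pageviews": 7_087},
    "beta": {"visits": 19_896, "uniques": 16_235, "pageviews": 54_564},
}
TOTAL = {"visits": 51_624_103, "uniques": 37_736_120, "pageviews": 104_972_033}


def avg_pageviews_per_session(pageviews: int, visits: int) -> float:
    """Average pageviews per session = pageviews / visits."""
    return pageviews / visits


def summarize() -> dict:
    """Rounded per-site averages plus simple consistency checks."""
    report = {
        name: round(avg_pageviews_per_session(s["pageviews"], s["visits"]), 3)
        for name, s in SITES.items()
    }
    report["total"] = round(
        avg_pageviews_per_session(TOTAL["pageviews"], TOTAL["visits"]), 4
    )
    # Visits and pageviews are additive across the three sites...
    report["visits_add_up"] = sum(s["visits"] for s in SITES.values()) == TOTAL["visits"]
    report["pageviews_add_up"] = (
        sum(s["pageviews"] for s in SITES.values()) == TOTAL["pageviews"]
    )
    # ...but unique visitors need not be: one visitor can show up on several sites.
    report["uniques_overlap"] = sum(s["uniques"] for s in SITES.values()) - TOTAL["uniques"]
    return report


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```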
## Notes
- A session (or "visit") is defined as all activity with less than 30 minutes between consecutive hits. Intuitively speaking, a session ends when the user hasn't done anything for 30 minutes.
- As we do not set visitor_id cookies for all users, the "unique visitors" metric was calculated using hash(ip_address + users_agent) as visitor_id.
- This job looked at all requests to the mobile site on 4/21/2013, which is 75.17 GB of request logs.
- The job took ~17 minutes to process the day into 15.3 GB of sessions.
- The summary above took maybe 10 minutes to set up/write in Hive, and the job took maybe 7 minutes.
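The two definitions in these notes can be sketched as follows: a visitor ID hashed from IP plus user agent, and a new session after 30 minutes of inactivity. This is a minimal Python sketch, not the Hive job itself. It assumes hits arrive as `(ip, user_agent, timestamp)` tuples. The helper names `visitor_id` and `sessionize` are hypothetical, and SHA-1 is an arbitrary choice because the thread doesn't say which hash the job used.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

# Per the notes: hits less than 30 minutes apart belong to the same session.
SESSION_GAP = timedelta(minutes=30)


def visitor_id(ip_address: str, user_agent: str) -> str:
    """Hypothetical stand-in for hash(ip_address + user_agent).

    SHA-1 is an illustrative choice only; the thread does not say which
    hash function the real job used.
    """
    return hashlib.sha1((ip_address + user_agent).encode("utf-8")).hexdigest()


def sessionize(hits):
    """Group (ip, user_agent, timestamp) hits into sessions per visitor.

    Returns {visitor_id: [[timestamps of session 1], [session 2], ...]}.
    """
    by_visitor = defaultdict(list)
    for ip, ua, ts in hits:
        by_visitor[visitor_id(ip, ua)].append(ts)

    sessions = {}
    for vid, stamps in by_visitor.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev >= SESSION_GAP:
                groups.append([])  # 30+ minutes of inactivity: start a new session
            groups[-1].append(cur)
        sessions[vid] = groups
    return sessions


if __name__ == "__main__":
    t0 = datetime(2013, 4, 21, 12, 0)
    hits = [
        ("198.51.100.7", "Mobile Safari", t0),
        ("198.51.100.7", "Mobile Safari", t0 + timedelta(minutes=10)),
        ("198.51.100.7", "Mobile Safari", t0 + timedelta(minutes=50)),
        ("203.0.113.9", "Android Browser", t0),
    ]
    result = sessionize(hits)
    print(sorted(len(s) for s in result.values()))  # [1, 2]
```

Plain concatenation mirrors the formula in the notes. Putting a separator between IP and user agent would avoid two different pairs hashing to the same joined string.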
In addition to that summary, I ran a few jobs on the entry_referer field -- the URL that referred the user to us when the session started. Obvious caveats: this is only one day of data, and it's only the mobile site. Draw conclusions with care.
First, I pulled out the top referring domains. It's mostly as you'd expect -- search engines -- though you'll also note that several Wikipedia mobile sites show up. My working hypothesis is that people don't tend to close tabs on smartphones; when they come back later, it is often to an already-open Wikipedia tab, and clicking a link or performing a search from there means the referrer is still us.
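Counting top referring domains comes down to taking the hostname of each session's entry referrer and tallying. The Python sketch below assumes the `entry_referer` values are plain URL strings, with `""` or `"-"` when there was no referrer. The helper names are hypothetical. `is_self_referral` is a rough illustrative test for the "referrer is still us" case and only matches mobile Wikipedia hosts.

```python
from collections import Counter
from urllib.parse import urlsplit


def referer_domain(referer: str) -> str:
    """Lower-cased hostname of a referrer URL, or '-' if there is none."""
    if not referer or referer == "-":
        return "-"
    host = urlsplit(referer).hostname  # urlsplit lower-cases the hostname
    return host if host else "-"


def top_referring_domains(entry_referers, n=10):
    """Count entry referrers by domain and return the n most common."""
    return Counter(referer_domain(r) for r in entry_referers).most_common(n)


def is_self_referral(domain: str) -> bool:
    """Hypothetical check: did the session start from a mobile Wikipedia page?"""
    return domain.endswith(".m.wikipedia.org")
```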
Since -- as expected -- so much of the data pertained to search engines, I also calculated the top search queries and top keywords that sent people to us. (For keywords, I've filtered out common "stop words": de, of, in, is, la, and, el, es, to, en, di, los, le, da, se, las, les, il, du, a, i, o, y, e.) In both, you see the predictable: lots of searches for porn, for "facebook", for "wiki", etc. But you also see a few things that surprised me:
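The query and keyword counts can be sketched the same way: pull the search terms out of the referrer's query string, count whole queries, then count individual words minus the stop words listed above. This sketch assumes the terms live in a `q` parameter, as they do for Google and Bing. Other engines may use a different parameter. It also case-folds queries, which is another assumption, since the thread doesn't say how the real job normalized them.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Stop words as listed in the message above.
STOP_WORDS = set(
    "de of in is la and el es to en di los le da se las les il du a i o y e".split()
)


def search_query(referer: str, param: str = "q"):
    """Decoded, lower-cased search query from a referrer URL, or None.

    Assumption: the search engine puts the terms in the `q` parameter.
    """
    values = parse_qs(urlsplit(referer).query).get(param)  # also decodes %XX and '+'
    if not values:
        return None
    query = values[0].strip().lower()
    return query or None


def top_queries_and_keywords(entry_referers, n=10):
    """Count whole queries, and individual keywords with stop words removed."""
    queries, keywords = Counter(), Counter()
    for ref in entry_referers:
        q = search_query(ref)
        if q is None:
            continue
        queries[q] += 1
        keywords.update(w for w in q.split() if w not in STOP_WORDS)
    return queries.most_common(n), keywords.most_common(n)
```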
- Tons of Japanese. Japan is the most mobile-enabled country in the world, so I guess we should have expected many Japanese searches to show up in the top queries. I've left them URL-encoded in the results -- you'll see them as weird lines with % in them.
- Apparently people search for movies and TV so they can spoil their fun by reading about them on Wikipedia. Both "movies" and "film" show up in the top keywords, and Iron Man 1, 2, AND 3 all show up in the top search queries. I didn't expect this to be a major use case, but -- wikigroaning aside -- it's an interesting fact.
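The Japanese queries in the results are left URL-encoded (UTF-8 percent-encoding). Python's standard `urllib.parse.unquote` turns them back into readable text. In the example, the encoded string is ビッグダディ, the query discussed later in this thread. If a results file encodes spaces as `+`, use `unquote_plus` instead.

```python
from urllib.parse import quote, unquote


def decode_query(encoded: str) -> str:
    """Turn a percent-encoded (UTF-8) query from the results back into text."""
    return unquote(encoded, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    encoded = "%E3%83%93%E3%83%83%E3%82%B0%E3%83%80%E3%83%87%E3%82%A3"
    print(decode_query(encoded))  # ビッグダディ
    # Re-encoding gives back the original string.
    print(quote(decode_query(encoded)) == encoded)  # True
```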
I'm sure we're only scratching the surface here. This is an exciting dataset, and there's surely lots more to learn!
The full results:
- Top Referring Entry Domains: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
- Top Referring Entry Search Queries: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
- Top Referring Entry Search Keywords: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
Questions are welcome!
-- David Schoonover dsc@wikimedia.org
Dave,
Thanks for sharing this; the referral data is particularly fascinating. I mentioned during the quarterly review that I'd love to get a better sense of (1) the proportion of requests in the mobile request logs lacking a referral, (2) the possible causes of this gap and (3) to what extent these missing entries introduce a bias in the referral ranking.
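Question (1) is a simple ratio: records with no referrer divided by all records. The Python sketch below assumes the log stores `"-"` or an empty string when the browser sent no Referer header. That is an assumption about the log format, not something stated here. The function name is hypothetical.

```python
from typing import Iterable


def missing_referer_share(referers: Iterable[str]) -> float:
    """Fraction of records whose referrer is empty or '-' (no Referer header)."""
    total = 0
    missing = 0
    for ref in referers:
        total += 1
        if not ref or ref.strip() in ("", "-"):
            missing += 1
    return missing / total if total else 0.0
```

Running the same ratio separately per site, or per user-agent family, would be one way to start on questions (2) and (3).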
The 3rd most popular query (according to your dumps) is ビッグダディ (Japanese for "Big Daddy"), which presumably refers to this guy: http://metro.co.uk/2013/03/20/giant-japanese-spider-crab-big-daddy-arrives-a... What's interesting is that there's no such entry on the Japanese Wikipedia, and I am baffled that people may have landed on the site via a search engine query for a nonexistent article. Do you have an explanation for this, or am I misinterpreting what you mean by search query?
Dario
On Apr 24, 2013, at 8:40 PM, David Schoonover dsc@wikimedia.org wrote:
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Dario, https://code.google.com/p/browsersec/wiki/Part2 may offer some clues about missing Referer data in the common cases; just search for the term *Referer* on that page. I suspect, though it would take David's expertise to confirm, that app-based or otherwise out-of-band mobile context-to-user-agent transitions (e.g., invocation of protocol handlers from a native search screen) may also contribute, at least after filtering out screen scrapers and other bots.
I see that Asher asked in a separate reply about filtering out that stuff, which would make the long tail in the hit plot more uniformly distributed.
On Wed, Apr 24, 2013 at 9:17 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
The 3rd most popular query (according to your dumps) is ビッグダディ (Japanese for "Big Daddy"), which presumably refers to this guy: http://metro.co.uk/2013/03/20/giant-japanese-spider-crab-big-daddy-arrives-a... What's interesting is that there's no such entry on the Japanese Wikipedia, and I am baffled that people may have landed on the site via a search engine query for a nonexistent article. Do you have an explanation for this, or am I misinterpreting what you mean by search query?
There *is* an article on this on ja.wiki :) It may have been renamed since then, but it's still the 2nd Google hit for ビッグダディ: http://ja.wikipedia.org/wiki/%E7%97%9B%E5%BF%AB!%E3%83%93%E3%83%83%E3%82%B0%...
Ah, good catch, M; that totally makes sense (it's a TV show, not a giant crab).
On Apr 25, 2013, at 10:19 AM, Maryana Pinchuk mpinchuk@wikimedia.org wrote:
-- Maryana Pinchuk Associate Product Manager, Wikimedia Foundation wikimediafoundation.org
Maryana, that Wikipedia article is about a TV series that has been broadcast since 2006, but I don't think it's very popular.
On the other hand, nobody seems to mention the crab Big Daddy in Japanese internet culture.
-- Haitham Shammaa, Contribution Research Manager, Wikimedia Foundation
Imagine a world in which every single human being can freely share in the sum of all knowledge. Click the "edit" button now, and help us make it a reality!
I re-ran the sessions job including IP in the output. Several things:
- I'm happy to report that we are correctly filtering out the WMF public IPs, though there are about 100k hits per day from 10.x.x.x IPs (about 0.5%, LVS health checks) that we missed. We'll update the filter to include those.
- So, who is behind the top sessions? I ran their IPs through whois and tried to extract the org name. The results (IPs omitted for privacy reasons) are here:
https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ai_u2wTiMldddHN...
A pretty interesting list.
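The filter update mentioned above amounts to also excluding the private 10.0.0.0/8 range, which covers every 10.x.x.x address. Here is a minimal sketch using Python's standard `ipaddress` module. It assumes each hit carries its client IP under a hypothetical `ip` key. The WMF public ranges the job already filters aren't listed in this thread, so they are left out here.

```python
import ipaddress

# 10.0.0.0/8 covers every 10.x.x.x address (e.g. the LVS health checks).
# The WMF public ranges would be added here too; they aren't listed in this thread.
EXCLUDED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(ip: str) -> bool:
    """True if the client IP falls in one of the excluded networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Unparseable addresses are kept rather than silently dropped.
        return False
    # An IPv6 address is never "in" an IPv4 network; this evaluates to False.
    return any(addr in net for net in EXCLUDED_NETWORKS)


def filter_hits(hits):
    """Drop hits (dicts with a hypothetical 'ip' key) from excluded networks."""
    return [h for h in hits if not is_internal(h["ip"])]
```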
-- David Schoonover dsc@wikimedia.org
On Thu, Apr 25, 2013 at 10:38 AM, Haitham Shammaa hshammaa@wikimedia.org wrote:
Maryana, that Wikipedia article is about a TV series which is being broadcasted since 2006, but I don't think it's very popular.
On the other hand, nobody seems to mention the crab Big Daddy in the Japanese internet culture.
*--* *Haitham Shammaa* *Contribution Research Manager* *Wikimedia Foundation*
*Imagine a world in which every single human being can freely share in the sum of all knowledge. * *Click the "edit" button now, and help us make it a reality!*
On Thu, Apr 25, 2013 at 10:19 AM, Maryana Pinchuk mpinchuk@wikimedia.orgwrote:
On Wed, Apr 24, 2013 at 9:17 PM, Dario Taraborelli < dtaraborelli@wikimedia.org> wrote:
Dave,
thanks for sharing this, the referral data is particularly fascinating. I mentioned during the quarterly review that I'd love to get a better sense of (1) the proportion of requests in the mobile request logs lacking a referral, (2) the possible causes of this gap and (3) to what extent these missing entries introduce a bias in the referral ranking.
The 3rd most popular query (according to your dumps) is ビッグダディ (japanese for "Big Daddy"), which presumably refers to this guy: http://metro.co.uk/2013/03/20/giant-japanese-spider-crab-big-daddy-arrives-a... What's interesting is that there's no such entry on the japanese Wikipedia and I am baffled that people may have landed on the website via a search engine query for a non-existing article. Do you have an explanation for this or am I misinterpreting what you mean by search query?
There *is* an article on this on ja.wiki :) It may have been renamed since then, but it's still the 2nd Google hit for ビッグダディ: http://ja.wikipedia.org/wiki/%E7%97%9B%E5%BF%AB!%E3%83%93%E3%83%83%E3%82%B0%...
Dario
On Apr 24, 2013, at 8:40 PM, David Schoonover dsc@wikimedia.org wrote:
Hiya all,
As promised earlier today in the Analytics weekly showcase, I've got a few interesting bits of data to share from playing with the new Mobile Site Sessions dataset.
# Visits to Mobile Site, 4/21/2013
- Total Visits: 51,624,103
- Unique Visitors: 37,736,120
- Total Pageviews: 104,972,033
- Avg Pageviews per Session: 2.0334
- Max Pageviews in one Session: 141,882
## Standard Site
- Visits: 51,603,221
- Unique Visitors: 37,723,188
- Pageviews: 104,910,382
- Avg Pageviews per Session: 2.033
## Alpha Site
- Visits: 986
- Unique Visitors: 822
- Pageviews: 7,087
- Avg Pageviews per Session: 7.188
## Beta Site
- Visits: 19,896
- Unique Visitors: 16,235
- Pageviews: 54,564
- Avg Pageviews per Session: 2.742
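Each "Avg Pageviews per Session" figure appears to be Pageviews divided by Visits. A quick Python check of that reading (the division is an assumption about how the averages were computed, not something stated in the job) also confirms that the Standard, Alpha, and Beta rows add up to the totals:

```python
# Figures copied from the summary above as (pageviews, visits).
SITES = {
    "Total": (104_972_033, 51_624_103),
    "Standard": (104_910_382, 51_603_221),
    "Alpha": (7_087, 986),
    "Beta": (54_564, 19_896),
}


def avg_pageviews(pageviews: int, visits: int) -> float:
    """Mean pageviews per session (assumed definition: pageviews / visits)."""
    return pageviews / visits


for name, (pageviews, visits) in SITES.items():
    print(f"{name}: {avg_pageviews(pageviews, visits):.4f}")

# The three site variants should sum to the totals for visits and pageviews.
parts = [SITES[k] for k in ("Standard", "Alpha", "Beta")]
assert sum(p for p, _ in parts) == SITES["Total"][0]
assert sum(v for _, v in parts) == SITES["Total"][1]
```

Unique Visitors is left out of the check on purpose: the per-site counts sum to 37,740,245, slightly more than the 37,736,120 total, presumably because some visitors used more than one site variant.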
## Notes
- A session (or "visit") is defined as all activity with less than 30 minutes between each hit. Intuitively speaking, a session ends when the user hasn't done anything in 30m.
- As we do not set visitor_id cookies for all users, the "unique visitors" metric was calculated using hash(ip_address + users_agent) as visitor_id.
- This job looked at all requests to the mobile site on 4/21/2013, which is 75.17 GB of request logs.
- The job took ~17 minutes to process the day into 15.3 GB of sessions.
- The summary above took maybe 10 minutes to set up and write in Hive, and the summary job itself took maybe 7 minutes to run.
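For anyone who wants to reproduce the session and visitor definitions outside Hive, here is a minimal Python sketch. It is not the actual job: it assumes hits arrive as (ip, user_agent, timestamp) tuples, counts every hit as a pageview, and uses SHA-1 as a stand-in for the unspecified hash(). A gap of 30 minutes or more starts a new session, matching the "less than 30 minutes between each hit" rule.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def visitor_id(ip_address: str, user_agent: str) -> str:
    """Approximate a visitor by hashing IP address + user agent.

    Assumption: SHA-1 stands in for whatever hash() the real job used.
    """
    return hashlib.sha1((ip_address + user_agent).encode("utf-8")).hexdigest()


def sessionize(hits):
    """Group (ip, user_agent, timestamp) hits into sessions.

    A visitor's next hit starts a new session when it comes 30 minutes or
    more after their previous hit.
    Returns {visitor_id: [[timestamps of session 1], [session 2], ...]}.
    """
    by_visitor = defaultdict(list)
    for ip, ua, ts in hits:
        by_visitor[visitor_id(ip, ua)].append(ts)

    sessions = {}
    for vid, times in by_visitor.items():
        times.sort()
        current = [times[0]]
        visitor_sessions = [current]
        # Compare each hit with the one before it.
        for prev, ts in zip(times, times[1:]):
            if ts - prev >= SESSION_GAP:
                current = [ts]
                visitor_sessions.append(current)
            else:
                current.append(ts)
        sessions[vid] = visitor_sessions
    return sessions


t0 = datetime(2013, 4, 21, 12, 0)
hits = [
    ("192.0.2.1", "Mobile Safari", t0),
    ("192.0.2.1", "Mobile Safari", t0 + timedelta(minutes=10)),
    ("192.0.2.1", "Mobile Safari", t0 + timedelta(minutes=50)),  # 40-minute gap
    ("192.0.2.2", "Android Browser", t0),
]
sessions = sessionize(hits)
visits = sum(len(s) for s in sessions.values())
uniques = len(sessions)
pageviews = sum(len(s) for v in sessions.values() for s in v)
print(visits, uniques, pageviews)  # 3 2 4
```

Because the visitor key is only IP plus user agent, two people behind the same proxy with the same browser collapse into one visitor, which is one reason the unique-visitor count is approximate.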
In addition to that summary, I ran a few jobs on the entry_referer field -- the URL that referred the user to us when the session started. Obvious caveats: this is only one day of data, and it's only the mobile site. Draw conclusions with care.
First, I pulled out the top referring domains. It's mostly as you'd expect -- search engines -- though you'll also note that several Wikipedia mobile sites show up. My working hypothesis is that people don't tend to close tabs on smartphones; when they later come back, it is often to an open Wikipedia tab, so clicking a link or performing a search there means the referrer is still us.
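Pulling the top referring domains out of entry_referer amounts to parsing each URL's host and counting. A Python sketch, assuming a missing referrer is logged as an empty string or "-" (the function names are hypothetical, not part of the dataset):

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def referring_domain(entry_referer: str) -> Optional[str]:
    """Return the lowercased host of a referrer URL, or None if absent."""
    # Assumption: a missing referrer is logged as "" or "-".
    if not entry_referer or entry_referer == "-":
        return None
    return urlsplit(entry_referer).hostname


def top_domains(referers: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count referring hosts and return the n most common."""
    domains = (referring_domain(r) for r in referers)
    return Counter(d for d in domains if d).most_common(n)


sample = [
    "http://www.google.com/search?q=iron+man+3",
    "https://www.google.com/",
    "http://en.m.wikipedia.org/wiki/Iron_Man_3",
    "-",
    "http://www.bing.com/search?q=facebook",
]
print(top_domains(sample, 3))
# [('www.google.com', 2), ('en.m.wikipedia.org', 1), ('www.bing.com', 1)]
```

A wikipedia.org host in this kind of output is what the open-tab hypothesis predicts: a session that starts from a stale tab still carries a Wikipedia referrer.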
Since -- as expected -- so much of the data pertained to search engines, I also calculated the top search queries and top keywords that sent people to us. (For keywords, I've filtered out common "stop words": de, of, in, is, la, and, el, es, to, en, di, los, le, da, se, las, les, il, du, a, i, o, y, e.) In both, you see the predictable: lots of searches for porn, for "facebook", for "wiki", etc. But you also see a few things that surprised me:
- Tons of Japanese. Japan is the most mobile-enabled country in the world, so I guess we should have expected to see many Japanese searches in the top queries. I've left them URL-encoded in the results -- you'll see them as weird lines with % in them.
- Apparently people search for movies and TV so they can spoil their fun by reading about them on Wikipedia. Both "movies" and "film" show up in the top keywords; Iron Man 1, 2, AND 3 all show up in the top search queries. I didn't expect this to be a major use case, but -- wikigroaning aside -- it's an interesting finding.
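The query and keyword tables can be approximated by reading the search parameter out of each referrer, which also decodes the percent-encoded Japanese. This Python sketch assumes the query lives in a "q" parameter (as on Google and Bing result pages; other engines may use other names) and reuses the stop-word list above; the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlsplit

# Stop words as listed in the message above.
STOP_WORDS = set(
    "de of in is la and el es to en di los le da se las les il du a i o y e".split()
)


def search_query(referer: str) -> Optional[str]:
    """Return the decoded "q" parameter of a referrer URL, if any."""
    # parse_qs percent-decodes (as UTF-8) and turns "+" into spaces.
    values = parse_qs(urlsplit(referer).query).get("q")
    return values[0] if values else None


def query_counts(referers: Iterable[str]) -> Counter:
    """Count whole search queries ("top search queries")."""
    return Counter(q for q in map(search_query, referers) if q)


def keyword_counts(referers: Iterable[str]) -> Counter:
    """Count individual words, minus stop words ("top keywords")."""
    counts = Counter()
    for ref in referers:
        query = search_query(ref)
        if query:
            counts.update(w for w in query.lower().split() if w not in STOP_WORDS)
    return counts


sample = [
    "http://www.google.co.jp/search?q=%E3%83%93%E3%83%83%E3%82%B0%E3%83%80%E3%83%87%E3%82%A3",
    "http://www.google.com/search?q=iron+man+3",
    "http://www.bing.com/search?q=history+of+film",
    "http://en.m.wikipedia.org/wiki/Iron_Man_3",  # no query string: skipped
]
print(query_counts(sample).most_common(3))
print(keyword_counts(sample).most_common(5))
```

Decoding before counting means ビッグダディ shows up as itself rather than as a run of %E3 escapes.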
I'm sure we're only scratching the surface here. This is an exciting dataset, and there's surely lots more to learn!
The full results:
- Top Referring Entry Domains: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
- Top Referring Entry Search Queries: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
- Top Referring Entry Search Keywords: http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mo...
Questions are welcome!
-- David Schoonover dsc@wikimedia.org _______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Maryana Pinchuk Associate Product Manager, Wikimedia Foundation wikimediafoundation.org
This data is great to see!
On Wed, Apr 24, 2013 at 8:40 PM, David Schoonover dsc@wikimedia.org wrote:
- Max Pageviews in one Session: 141,882
Re: this max pageviews/session number: does the job attempt to filter out bots? If not, I'm curious to see how doing so would affect the numbers.
I'm also curious what impact Google's recently deployed trial, which sends Android/iOS users from search results to Google's own version of our mobile site, is having on traffic.