Hi Vipul,
First, I find it quite interesting to see your extensive work on aggregating stats.grok.se data!
I can see how that fills a gap in our tool set.
A few comments on your extensive analysis (full page version):
=Media coverage=
You say "The decline received news coverage in January 2014, beginning with Wikipediocracy and followed by the Daily Dot, The Register, and The Examiner …"
The sites you mention presented charts that at the time showed only non-mobile traffic. After the apparent decline was raised as an issue, those charts were updated to also show mobile traffic. But the obsolete versions of the charts keep coming back to haunt us ;-)
For recent charts, see e.g. http://stats.wikimedia.org/EN/ReportCardTopWikis.htm (last chart per language)
In some languages mobile traffic almost equals non-mobile traffic, e.g.
Japanese http://stats.wikimedia.org/EN/SummaryJA.htm and
Arabic http://stats.wikimedia.org/EN/SummaryAR.htm
Your tracking of historic page views for a set of pages about colors unfortunately suffers from the fact that WMF page view dumps did not contain mobile views either until relatively recently.
In September 2014 a separate set of page view dumps was released: https://dumps.wikimedia.org/other/pagecounts-all-sites/2014/ which does include mobile and Wikipedia Zero traffic.
However, judging from http://stats.grok.se/about, those extended dumps are still not used at stats.grok.se.
=New content=
You say "The reason that overall pageview counts for Wikipedia haven’t declined much between 2013 and 2014 is the creation of new pages. In other words, the growth in views based on the creation of new pages is the main factor compensating for what would otherwise be an even steeper decline."
Frankly, purely intuitively, I find that hard to believe. Much of the new content will reach few users initially, as we already accumulated such a wide range of topics, and depth of coverage, in earlier years. (Of course there are new pages that go trending immediately.)
This could be tested by taking a recent page views dump, removing all titles which also occurred above some threshold, say, a year ago, and tallying what remains.
Using monthly aggregated files would make this more reliable: http://dumps.wikimedia.org/other/pagecounts-ez/merged/
This would be a rough, order-of-magnitude assessment of views to new articles only.
Page view dumps also contain nonexistent page titles (titles which people type manually in the address bar, plus a small proportion of articles which got deleted), hence the threshold.
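A minimal sketch of that test in Python, assuming the two dumps have already been parsed into title-to-views mappings. The function names, the sample titles, and the default threshold of 5 are illustrative assumptions, not part of any WMF tooling, and parsing the dump files themselves is left out:

```python
from typing import Dict, Mapping


def views_to_new_titles(
    recent: Mapping[str, int],
    year_ago: Mapping[str, int],
    threshold: int = 5,  # assumed cutoff; tune against real data
) -> Dict[str, int]:
    """Keep only titles from `recent` that were not established a year ago.

    A title counts as established if it had at least `threshold` views in the
    year-ago dump. Titles below the threshold are treated as noise
    (mistyped or deleted pages) and do not count as pre-existing.
    """
    return {
        title: views
        for title, views in recent.items()
        if year_ago.get(title, 0) < threshold
    }


def new_content_share(
    recent: Mapping[str, int],
    year_ago: Mapping[str, int],
    threshold: int = 5,
) -> float:
    """Fraction of all recent views that went to titles not established a year ago."""
    total = sum(recent.values())
    if total == 0:
        return 0.0
    new_views = sum(views_to_new_titles(recent, year_ago, threshold).values())
    return new_views / total


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    recent = {"Red": 1000, "Blue": 900, "Some_new_article": 300, "Mistyped_titel": 2}
    year_ago = {"Red": 1100, "Blue": 950, "Mistyped_titel": 1}
    print(views_to_new_titles(recent, year_ago))
    print(new_content_share(recent, year_ago))
```

Note that mistyped titles in the recent dump still end up in the remainder, so the result is an upper bound; applying a similar cutoff to the recent counts would tighten it.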
=Trend last 24 months=
Looking at our monthly stats http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm , in particular the 2nd row with bar charts (the trend for the last 24 months), the overall trend seems to be stable. Of course trends differ per language, but the overall trend is not sloping down (see the first bar chart in that row, column Sigma).
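For anyone who prefers a number over eyeballing the bar chart, one way to check "not sloping down" is a least-squares line through the 24 monthly totals. This is only a sketch: the function name is hypothetical and the monthly totals would have to be copied from the stats page by hand.

```python
from typing import Sequence


def monthly_trend_slope(monthly_views: Sequence[float]) -> float:
    """Least-squares slope of views against month index (views per month)."""
    n = len(monthly_views)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_views) / n
    # Covariance of (month index, views) and variance of month index,
    # both left unnormalised because the factor cancels in the ratio.
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(monthly_views))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

A slope close to zero relative to the monthly average would support the reading that the overall trend is stable.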
Best regards,
Erik Zachte
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Vipul Naik
Sent: Thursday, March 26, 2015 15:49
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: [Analytics] Draft blog post on decline in Wikipedia pageviews: looking for analytics explanations
Hello all,
I've written a blog post about the decline in pageviews to some Wikipedia pages over the last few years. The blog post can be found here:
I'm planning to post a condensed version of this to a more public and widely-read forum, and would like to get any feedback on it from people who are more familiar with the analytics.
In particular, I'm interested in potential explanations of the pageview decline that I might have missed, such as changes to the way the analytics are logged, or changes to people's browser or operating system use patterns, or changes in bot frequency. I tried looking for data on trends in these, and although I could find some overall data on the use of various browsers and bot access, I couldn't get trend data that I could directly compare with the individual pageview data. So if you have easily usable sources of such trend data (stretching at least as far back as 2013), or if you have some theories based on looking at such data in the past, I'd really like to know.
So that you don't have to read the whole post to get an idea of what I'm talking about, I'll paste below two sample tables from the post that illustrate the sort of trends I am talking about:
Trends in colors (notice the sharp decline from 2013 to 2014; a monthly breakdown found in the post shows that the decline was steady from March 2013 to June 2014):
Per-page rows (page names and per-year values are missing from this copy; only each page's Total and Percentage survive):

Total   Percentage
6.9M    16.1
7.6M    17.8
2M      4.6
5.3M    12.3
1.7M    4
5.5M    12.8
6.2M    14.6
509K    1.2
4.2M    9.8
2.9M    6.8

Summary rows (pageviews by year):

Page name    2014   2013   2012   2011   2010   2009   2008   Total   Percentage   Tags
Total        3.6M   7.1M   6.6M   6M     6.9M   6.5M   6M     43M     100          –
Percentage   8.5    16.7   15.4   14     16     15.3   14     100     –            –
Trends in the most populated countries (note the peaking in 2010 and decline from then to 2014):
Per-page rows (page names and per-year values are missing from this copy; only each page's Total and Percentage survive):

Total   Percentage
45M     9
73M     14.5
129M    25.7
28M     5.5
37M     7.4
28M     5.7
18M     3.5
44M     8.8
19M     3.8
50M     10
31M     6.1

Summary rows (pageviews by year):

Page name    2014   2013   2012   2011   2010   2009   2008   Total   Percentage   Tags
Total        59M    69M    74M    74M    103M   65M    59M    502M    100          –
Percentage   11.7   13.8   14.7   14.7   20.4   12.9   11.8   100     –            –
I'm also pasting my hypothesis list from near the end of the post below:
1. Google’s Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn’t need to click through to the Wikipedia page.
2. Other means of accessing Wikipedia’s knowledge that don’t involve viewing it directly: For instance, Apple’s Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia.
3. Changes to pageview-counting methods or to the number of false pageviews generated by bots and crawlers: While this probably isn’t sufficient to explain the entire decline, it could well be the case that some of the decline arises from bots and crawlers becoming more efficient in terms of the frequency with which they crawl Wikipedia and generate false pageviews. The official estimate is that about 15% of pageviews are bot-generated,
4. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, as well as better search methods, have displaced it from its undisputed leadership position.
5. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what’s going on with the early decline of pageviews.
Thanks,
Vipul
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics