Hello all,

I've written a blog post about the decline in pageviews for some Wikipedia pages over the last few years. It can be found here:

http://vipulnaik.com/blog/the-great-decline-in-wikipedia-pageviews-full-version/

I'm planning to post a condensed version of this to a more public and widely read forum, and would first like feedback from people who are more familiar with the analytics.

In particular, I'm interested in potential explanations of the pageview decline that I might have missed, such as changes in how pageviews are logged, changes in people's browser or operating system usage patterns, or changes in bot traffic. I looked for trend data on these. I could find some aggregate data on browser usage and bot access, but no trend data that I could directly compare with the per-page pageview data. So if you know of easily usable sources of such trend data (stretching back at least to 2013), or have theories based on looking at such data in the past, I'd really like to know.

So that you don't have to read the whole post to get an idea of what I'm talking about, I'll paste below two sample tables from the post that illustrate the sort of trends I am talking about:

Trends in pageviews for color pages (notice the sharp decline from 2013 to 2014; a monthly breakdown in the post shows that the decline was steady from March 2013 to June 2014):

Pageviews per year (K = thousand, M = million):

Page name   2014   2013   2012   2011   2010   2009   2008   Total  Percentage  Tags
Black       431K   1.5M   1.3M   778K   900K   1M     958K   6.9M   16.1        Colors
Blue        710K   1.3M   1M     987K   1.2M   1.2M   1.1M   7.6M   17.8        Colors
Brown       192K   284K   318K   292K   308K   300K   277K   2M     4.6         Colors
Green       422K   844K   779K   707K   882K   885K   733K   5.3M   12.3        Colors
Orange      133K   181K   251K   259K   275K   313K   318K   1.7M   4           Colors
Purple      524K   906K   847K   895K   865K   841K   592K   5.5M   12.8        Colors
Red         568K   797K   912K   1M     1.1M   873K   938K   6.2M   14.6        Colors
Violet      56K    96K    75K    77K    69K    71K    65K    509K   1.2         Colors
White       301K   795K   615K   545K   788K   575K   581K   4.2M   9.8         Colors
Yellow      304K   424K   453K   433K   452K   427K   398K   2.9M   6.8         Colors
Total       3.6M   7.1M   6.6M   6M     6.9M   6.5M   6M     43M    100
Percentage  8.5    16.7   15.4   14     16     15.3   14     100
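
To put a number on the 2013 to 2014 drop, here is a minimal Python sketch that computes year-over-year percentage changes from the Total row of the colors table. The variable and function names are my own illustrative choices, and the inputs are the rounded figures from the table, so the results are only approximate.

```python
# Yearly totals for the ten color pages, in millions, copied from the
# "Total" row of the colors table. The inputs are rounded, so the
# percentages below are approximate.
color_totals_millions = {
    2008: 6.0, 2009: 6.5, 2010: 6.9, 2011: 6.0,
    2012: 6.6, 2013: 7.1, 2014: 3.6,
}


def year_over_year_change(totals):
    """Return {year: percent change relative to the previous year}."""
    years = sorted(totals)
    return {
        year: 100.0 * (totals[year] - totals[prev]) / totals[prev]
        for prev, year in zip(years, years[1:])
    }


changes = year_over_year_change(color_totals_millions)
for year, pct in changes.items():
    print(f"{year}: {pct:+.1f}%")
```

With these rounded inputs, the 2014 figure comes out at roughly a 49% drop relative to 2013, while the earlier years move by much smaller amounts in either direction.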

Trends in pageviews for the most populous countries (note the peak in 2010 and the decline from then to 2014):

Pageviews per year (K = thousand, M = million):

Page name      2014   2013   2012   2011   2010   2009   2008   Total  Percentage  Tags
China          5.7M   6.8M   7.8M   6.1M   6.9M   5.7M   6.1M   45M    9           Countries
India          8.8M   12M    12M    11M    14M    8.8M   7.6M   73M    14.5        Countries
United States  13M    15M    18M    18M    34M    16M    15M    129M   25.7        Countries
Indonesia      5.3M   5.2M   3.7M   3.6M   4.2M   3.1M   2.5M   28M    5.5         Countries
Brazil         4.8M   4.9M   5.3M   5.5M   7.5M   4.9M   4.3M   37M    7.4         Countries
Pakistan       2.9M   4.5M   4.4M   4.3M   5.2M   4M     3.2M   28M    5.7         Countries
Bangladesh     2.2M   2.9M   3M     2.8M   2.9M   2.2M   1.7M   18M    3.5         Countries
Russia         5.6M   5.6M   6.5M   6.8M   8.6M   5.4M   5.8M   44M    8.8         Countries
Nigeria        2.6M   2.6M   2.9M   3M     3.5M   2.6M   2M     19M    3.8         Countries
Japan          4.8M   6.4M   6.5M   8.3M   10M    7.3M   6.6M   50M    10          Countries
Mexico         3.1M   3.9M   4.3M   4.3M   5.9M   4.7M   4.5M   31M    6.1         Countries
Total          59M    69M    74M    74M    103M   65M    59M    502M   100
Percentage     11.7   13.8   14.7   14.7   20.4   12.9   11.8   100
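
As a rough check on the "peak in 2010, then decline" reading, here is a minimal Python sketch that finds the peak year in the Total row of the countries table and the percentage change from that peak to 2014. The names are hypothetical, and the inputs are the rounded totals from the table, so the output is approximate.

```python
# Yearly totals for the eleven country pages, in millions, copied from the
# "Total" row of the countries table. The inputs are rounded, so the
# result is approximate.
country_totals_millions = {
    2008: 59, 2009: 65, 2010: 103, 2011: 74,
    2012: 74, 2013: 69, 2014: 59,
}


def peak_and_decline(totals, end_year):
    """Return (peak year, percent change from the peak year to end_year)."""
    peak_year = max(totals, key=totals.get)
    change = 100.0 * (totals[end_year] - totals[peak_year]) / totals[peak_year]
    return peak_year, change


peak_year, change = peak_and_decline(country_totals_millions, 2014)
print(f"Peak year: {peak_year}; change from peak to 2014: {change:+.1f}%")
```

On these figures the peak year is 2010, and the 2014 total is about 43% below it. The same function can be applied to any single row of the table to see how much an individual page contributes to the aggregate peak.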

I'm also pasting my hypothesis list from near the end of the post below:

  1. Google’s Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn’t need to click through to the Wikipedia page.
  2. Other means of accessing Wikipedia’s knowledge that don’t involve viewing it directly: For instance, Apple’s Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia.
  3. Changes to pageview-counting methods or to the number of false pageviews generated by bots and crawlers: While this probably isn’t sufficient to explain the entire decline, some of the decline could well arise from bots and crawlers becoming more efficient, crawling Wikipedia less frequently and so generating fewer false pageviews. The official estimate is that about 15% of pageviews are bot-generated.
  4. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, together with better search methods, has displaced it from its undisputed leadership position.
  5. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what’s going on with the early decline of pageviews.

Thanks,

Vipul