Hi Vipul,

 

First, I find it quite interesting to see your extensive work on aggregating stats.grok.se data!

I can see how that fills a gap in our tool set.

 

A few comments on your extensive analysis (full page version):

 

=Media coverage=

 

You say "The decline received news coverage in January 2014, beginning with Wikipediocracy and followed by the Daily Dot, The Register, and The Examiner …"

The sites you mention present charts that at the time showed only non-mobile traffic. After the apparent decline was raised as an issue, those charts were updated to also show mobile traffic. But the obsolete versions of the charts keep coming back to haunt us ;-)


For recent charts, see e.g. http://stats.wikimedia.org/EN/ReportCardTopWikis.htm (last chart per language)

 

In some languages mobile traffic almost equals non-mobile traffic, e.g.

Japanese http://stats.wikimedia.org/EN/SummaryJA.htm and

Arabic http://stats.wikimedia.org/EN/SummaryAR.htm

 

=Stats.grok.se=

 

Your tracking of historic page views for a set of pages about colors unfortunately suffers from the fact that WMF page view dumps did not contain mobile views either until relatively recently.

In September 2014 a separate set of page view dumps was released: https://dumps.wikimedia.org/other/pagecounts-all-sites/2014/ which does include mobile and zero traffic.

However it seems from http://stats.grok.se/about those extended dumps are not used at stats.grok.se even now.
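
To give an idea of what including those extra sites means in practice, here is a minimal Python sketch that sums the views for one title over the desktop, mobile and zero sites. It assumes each dump line reads "project title count bytes" and that mobile and zero traffic appear under project codes such as en.m and en.zero; both the line layout and the suffixes are assumptions to check against the actual files. views_by_site is a hypothetical helper name and the sample lines are made up.

```python
from collections import defaultdict


def views_by_site(lines, title, language="en", suffixes=("", ".m", ".zero")):
    """Sum view counts for one title, per site variant of one language.

    Assumes whitespace-separated lines of the form
    'project title count bytes' (an assumption, not a verified spec).
    """
    wanted = {language + suffix for suffix in suffixes}
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        project, page, count = parts[0], parts[1], parts[2]
        if project in wanted and page == title and count.isdigit():
            totals[project] += int(count)
    return dict(totals)


# Made-up sample lines, for illustration only.
sample = [
    "en Blue 100 1",
    "en.m Blue 60 1",
    "en.zero Blue 5 1",
    "de Blue 7 1",
]
per_site = views_by_site(sample, "Blue")
print(per_site, sum(per_site.values()))
```

A count based only on the first of these project codes would miss whatever share of traffic moved to the other two.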

 

=New content=

 

You say "The reason that overall pageview counts for Wikipedia haven’t declined much between 2013 and 2014 is the creation of new pages. In other words, the growth in views based on the creation of new pages is the main factor compensating for what would otherwise be an even steeper decline."

 

Frankly, purely intuitively I find that hard to believe. Much of the new content will reach few users initially, as we already accumulated such a wide range of topics, and depth of coverage, in earlier years. (Of course there are new pages that go trending immediately.)

 

This could be tested by taking a recent page views dump, removing all titles that also occurred above some threshold a year earlier, and tallying the views for what remains.

Using monthly aggregated files would make this more reliable. http://dumps.wikimedia.org/other/pagecounts-ez/merged/

This would be a rough assessment, order of magnitude, for views to new articles only.

Page view dumps also contain titles of non-existing pages (titles which people type manually in the address bar, plus a small proportion of articles which got deleted), hence the threshold.
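
The test described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the input is assumed to be already reduced to lines of the form "project title count ..." (not the native layout of the monthly merged files), the function names are hypothetical, and the optional min_recent filter on the recent dump is my own addition to keep mistyped titles out of the remainder.

```python
from collections import Counter


def load_counts(lines, project="en"):
    """Sum view counts per title for one project.

    Assumes whitespace-separated 'project title count ...' lines.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] != project:
            continue
        try:
            counts[parts[1]] += int(parts[2])
        except ValueError:
            continue  # skip lines with a non-numeric count
    return counts


def views_to_new_titles(old_counts, recent_counts, threshold=10, min_recent=0):
    """Return (views to titles not established a year ago, all recent views).

    A title counts as established if it had at least `threshold`
    views in the older dump.
    """
    established = {t for t, c in old_counts.items() if c >= threshold}
    new_views = sum(
        c for t, c in recent_counts.items()
        if t not in established and c >= min_recent
    )
    return new_views, sum(recent_counts.values())


# Made-up sample lines, for illustration only.
old = load_counts(["en Red 500 1", "en Typo_titel 2 1"])
recent = load_counts(["en Red 400 1", "en New_event 300 1", "en Typo_titel 3 1"])
print(views_to_new_titles(old, recent, threshold=10, min_recent=10))
```

The ratio of the two returned numbers gives the order-of-magnitude share of views going to titles that were not established a year earlier.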

 

=Trend last 24 months=

 

Looking at our monthly stats http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm , in particular the 2nd row with bar charts (the trend for the last 24 months), the overall trend seems to be stable. Of course trends differ per language, but the overall trend is not sloping down (see the first bar chart in that row, column Sigma).
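
One rough way to put a number on "not sloping down" is a least-squares slope over the 24 monthly totals. The sketch below is only an illustration: the function names are hypothetical, the input is whatever list of monthly totals one extracts from the report, and the sample series is synthetic, not real traffic data.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def relative_monthly_trend(values):
    """Slope expressed as a fraction of the mean monthly value."""
    return trend_slope(values) / (sum(values) / len(values))


# Synthetic flat series, for illustration only.
flat = [100.0] * 24
print(relative_monthly_trend(flat))
```

A relative trend close to zero would correspond to the stable picture in the bar chart; a clearly negative value would indicate a decline.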

 

Best regards,

 

Erik Zachte

 

 

 

 

From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Vipul Naik
Sent: Thursday, March 26, 2015 15:49
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: [Analytics] Draft blog post on decline in Wikipedia pageviews: looking for analytics explanations

 

Hello all,

 

I've written a blog post about the decline in pageviews to some Wikipedia pages over the last few years. The blog post can be found here:

 

http://vipulnaik.com/blog/the-great-decline-in-wikipedia-pageviews-full-version/

 

I'm planning to post a condensed version of this to a more public and widely-read forum, and would like to get any feedback on it from people who are more familiar with the analytics.

 

In particular, I'm interested in potential explanations of the pageview decline that I might have missed, such as changes to the way the analytics are logged, or changes to people's browser or operating system use patterns, or changes in bot frequency. I tried looking for data on trends in these, and although I could find some overall data on the use of various browsers and bot access, I couldn't get trend data that I could directly compare with the individual pageview data. So if you have easily usable sources of such trend data (stretching at least as far back as 2013), or if you have some theories based on looking at such data in the past, I'd really like to know.

 

So that you don't have to read the whole post to get an idea of what I'm talking about, I'll paste below two sample tables from the post that illustrate the sort of trends I am talking about:

 

Trends in colors (notice the sharp decline from 2013 to 2014; a monthly breakdown found in the post shows that the decline was steady from March 2013 to June 2014):

(Year columns give pageviews in that year.)

Page name  | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | Total | Percentage | Tags
-----------+------+------+------+------+------+------+------+-------+------------+-------
Black      | 431K | 1.5M | 1.3M | 778K | 900K | 1M   | 958K | 6.9M  | 16.1       | Colors
Blue       | 710K | 1.3M | 1M   | 987K | 1.2M | 1.2M | 1.1M | 7.6M  | 17.8       | Colors
Brown      | 192K | 284K | 318K | 292K | 308K | 300K | 277K | 2M    | 4.6        | Colors
Green      | 422K | 844K | 779K | 707K | 882K | 885K | 733K | 5.3M  | 12.3       | Colors
Orange     | 133K | 181K | 251K | 259K | 275K | 313K | 318K | 1.7M  | 4          | Colors
Purple     | 524K | 906K | 847K | 895K | 865K | 841K | 592K | 5.5M  | 12.8       | Colors
Red        | 568K | 797K | 912K | 1M   | 1.1M | 873K | 938K | 6.2M  | 14.6       | Colors
Violet     | 56K  | 96K  | 75K  | 77K  | 69K  | 71K  | 65K  | 509K  | 1.2        | Colors
White      | 301K | 795K | 615K | 545K | 788K | 575K | 581K | 4.2M  | 9.8        | Colors
Yellow     | 304K | 424K | 453K | 433K | 452K | 427K | 398K | 2.9M  | 6.8        | Colors
Total      | 3.6M | 7.1M | 6.6M | 6M   | 6.9M | 6.5M | 6M   | 43M   | 100        |
Percentage | 8.5  | 16.7 | 15.4 | 14   | 16   | 15.3 | 14   | 100   |            |
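
For reference, the year-over-year change implied by the colors table can be computed with a few lines of Python. This is a minimal sketch: parse_views and pct_change are helper names made up for illustration, and because the table values are rounded the results are only approximate (about a 49% drop in the Total row from 2013 to 2014).

```python
def parse_views(text):
    """Convert an abbreviated count such as '431K' or '1.5M' to a number."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    text = text.strip()
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def pct_change(old, new):
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100.0


# 2013 and 2014 values copied from the colors table.
rows = {
    "Black": ("1.5M", "431K"),
    "Total": ("7.1M", "3.6M"),
}
for name, (views_2013, views_2014) in rows.items():
    change = pct_change(parse_views(views_2013), parse_views(views_2014))
    print(f"{name}: {change:.1f}% from 2013 to 2014")
```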

 

Trends in the most populated countries (note the peaking in 2010 and decline from then to 2014):

(Year columns give pageviews in that year.)

Page name     | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | Total | Percentage | Tags
--------------+------+------+------+------+------+------+------+-------+------------+----------
China         | 5.7M | 6.8M | 7.8M | 6.1M | 6.9M | 5.7M | 6.1M | 45M   | 9          | Countries
India         | 8.8M | 12M  | 12M  | 11M  | 14M  | 8.8M | 7.6M | 73M   | 14.5       | Countries
United States | 13M  | 15M  | 18M  | 18M  | 34M  | 16M  | 15M  | 129M  | 25.7       | Countries
Indonesia     | 5.3M | 5.2M | 3.7M | 3.6M | 4.2M | 3.1M | 2.5M | 28M   | 5.5        | Countries
Brazil        | 4.8M | 4.9M | 5.3M | 5.5M | 7.5M | 4.9M | 4.3M | 37M   | 7.4        | Countries
Pakistan      | 2.9M | 4.5M | 4.4M | 4.3M | 5.2M | 4M   | 3.2M | 28M   | 5.7        | Countries
Bangladesh    | 2.2M | 2.9M | 3M   | 2.8M | 2.9M | 2.2M | 1.7M | 18M   | 3.5        | Countries
Russia        | 5.6M | 5.6M | 6.5M | 6.8M | 8.6M | 5.4M | 5.8M | 44M   | 8.8        | Countries
Nigeria       | 2.6M | 2.6M | 2.9M | 3M   | 3.5M | 2.6M | 2M   | 19M   | 3.8        | Countries
Japan         | 4.8M | 6.4M | 6.5M | 8.3M | 10M  | 7.3M | 6.6M | 50M   | 10         | Countries
Mexico        | 3.1M | 3.9M | 4.3M | 4.3M | 5.9M | 4.7M | 4.5M | 31M   | 6.1        | Countries
Total         | 59M  | 69M  | 74M  | 74M  | 103M | 65M  | 59M  | 502M  | 100        |
Percentage    | 11.7 | 13.8 | 14.7 | 14.7 | 20.4 | 12.9 | 11.8 | 100   |            |

I'm also pasting my hypothesis list from near the end of the post below:

1. Google’s Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn’t need to click through to the Wikipedia page.

2. Other means of accessing Wikipedia’s knowledge that don’t involve viewing it directly: For instance, Apple’s Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia.

3. Changes to pageview-counting methods or to the number of false pageviews generated by bots and crawlers: While this probably isn’t sufficient to explain the entire decline, it could well be the case that some of the decline arises from bots and crawlers becoming more efficient in terms of the frequency with which they crawl Wikipedia and generate false pageviews. The official estimate is that about 15% of pageviews are bot-generated,

4. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, as well as better search methods, have displaced it from its undisputed leadership position.

5. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what’s going on with the early decline of pageviews.

Thanks,

Vipul