Hi Matthew,

I don't know many of the back-end details, but I've used the traffic files a fair bit. They count raw requests, including requests for pages that don't exist or that are just redirects to pages with normalized titles. "Main_page" is a redirect to "Main_Page", so when someone requests "Main_page" that request gets counted, and then "Main_Page" gets counted as well after the redirect occurs. Your grep is case-sensitive, so it only matched the redirect title; the bulk of the views are recorded under "Main_Page". Looking through one of the traffic files I see:

{en,MAIN_PAGE,2,36908}
{en,Main+Page,2,36907}
{en,Main-Page,5,92053}
{en,Main-page,2,36896}
{en,MainPage,4,72800}
{en,Main_Page,316431,8344162628}
{en,Main_Page/android-app:/org.wikipedia/http/en.m.wikipedia.org/wiki/Main_Page,21,544446}
{en,Main_page,181,4150875}
{en,Mainpage,13,183594}
{en,main_page,2,36934}
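
If you want to compare the variants yourself, here is a rough Python sketch that parses the lines above and totals the request counts. The function name parse_record is just something I made up for illustration, and I'm assuming the fields are project, title, request count, and bytes, in that order. Lumping all of these titles together as "main page" requests is also my assumption; several of them are probably not real redirects.

```python
def parse_record(line):
    """Parse one '{project,title,count,bytes}' line into a tuple.

    Assumed field order: project, title, request count, bytes.
    """
    body = line.strip().strip("{}")
    # A title could contain commas, so peel fields off both ends
    # instead of doing a plain split(",").
    project, rest = body.split(",", 1)
    title, count, size = rest.rsplit(",", 2)
    return project, title, int(count), int(size)


# The sample lines quoted in this message.
sample = """\
{en,MAIN_PAGE,2,36908}
{en,Main+Page,2,36907}
{en,Main-Page,5,92053}
{en,Main-page,2,36896}
{en,MainPage,4,72800}
{en,Main_Page,316431,8344162628}
{en,Main_Page/android-app:/org.wikipedia/http/en.m.wikipedia.org/wiki/Main_Page,21,544446}
{en,Main_page,181,4150875}
{en,Mainpage,13,183594}
{en,main_page,2,36934}
"""

records = [parse_record(line) for line in sample.splitlines() if line.strip()]

# Requests recorded under the canonical title versus all listed variants.
canonical = next(count for _, title, count, _ in records if title == "Main_Page")
total = sum(count for _, _, count, _ in records)

print("Main_Page requests:", canonical)
print("All listed variants:", total)
```

The point is just that nearly all of the requests land on the exact title "Main_Page", and the other spellings are a small remainder.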


Date: Sat, 28 Feb 2015 15:14:47 -0500
From: ruttleym@googlemail.com
To: analytics@lists.wikimedia.org
Subject: [Analytics] Odd data in dumps

Hi Analytics,

I've been digging through some of the wiki page count files and found some strange results.
In several files, the Main_page visit count is vastly lower than expected:

mruttley$ cat pagecounts-20141101-170000 | grep "^en Main_page"
en Main_page 260 6202982
mruttley$ cat pagecounts-20150201-170000 | grep "^en Main_page"
en Main_page 200 4802139


Only 260 and 200 page views!

What do you reckon? Am I doing it wrong?

Best regards,

Matthew

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics