Hi Analytics,
I've been digging through some of the wiki page count files and found some strange results. In several files, the Main_page visit count is vastly lower than expected:
mruttley$ cat pagecounts-20141101-170000 | grep "^en Main_page"
en Main_page 260 6202982
mruttley$ cat pagecounts-20150201-170000 | grep "^en Main_page"
en Main_page 200 4802139
Only 260 and 200 page views!
What do you reckon? Am I doing it wrong?
Best regards,
Matthew
Hi Analytics,
Just as a follow-up to my previous email, I downloaded all the hourly files for 2015-02-01 and found that the Main_page total across them is 4245. However, the stats.grok.se page (http://stats.grok.se/json/en/201502/Main_page) reports 13160432.
Would appreciate any pointers on this!
Best regards,
Matthew
Hi Matthew,
I don't know too many of the back-end details, but I've used the traffic files a fair bit. They count raw requests, even for pages that don't exist or that merely redirect to pages with normalized titles. "Main_page" is a redirect to "Main_Page", so a request for "Main_page" is counted under that title, and "Main_Page" is counted again once the redirect occurs. Looking through one of the traffic files I see:
{en,MAIN_PAGE,2,36908}
{en,Main+Page,2,36907}
{en,Main-Page,5,92053}
{en,Main-page,2,36896}
{en,MainPage,4,72800}
{en,Main_Page,316431,8344162628}
{en,Main_Page/android-app:/org.wikipedia/http/en.m.wikipedia.org/wiki/Main_Page,21,544446}
{en,Main_page,181,4150875}
{en,Mainpage,13,183594}
{en,main_page,2,36934}
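If the goal is one number for the main page regardless of how the title was spelled, the variants listed above could be folded together before summing. The following is only a sketch: the function name sum_main_page is made up for illustration, the folding rule (ignore case, treat "+" and "-" like "_") is an assumption rather than MediaWiki's actual title normalization, and the input is assumed to use the space-separated "project title count bytes" layout that grep prints from the pagecounts files.

```shell
# Sum the view counts (3rd field) of every "en" row whose title folds to
# "main_page". Folding rule (an assumption, not MediaWiki's own logic):
# lower-case the title and treat "+" and "-" as "_".
sum_main_page() {
  awk '$1 == "en" {
         title = tolower($2)
         gsub(/[+-]/, "_", title)
         if (title == "main_page") total += $3
       }
       END { print total + 0 }' "$@"
}

# Demo on the rows quoted above, rewritten in the space-separated layout.
sum_main_page <<'EOF'
en MAIN_PAGE 2 36908
en Main+Page 2 36907
en Main-Page 5 92053
en Main-page 2 36896
en MainPage 4 72800
en Main_Page 316431 8344162628
en Main_Page/android-app:/org.wikipedia/http/en.m.wikipedia.org/wiki/Main_Page 21 544446
en Main_page 181 4150875
en Mainpage 13 183594
en main_page 2 36934
EOF
# prints 316625
```

Called with a file name instead of standard input (for example sum_main_page pagecounts-20150201-170000), it reads that file directly. Titles without a separator, such as MainPage, are deliberately left out here; whether they belong in the total depends on what is being measured.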
Date: Sat, 28 Feb 2015 15:14:47 -0500 From: ruttleym@googlemail.com To: analytics@lists.wikimedia.org Subject: [Analytics] Odd data in dumps
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Getting aggregation right is hard. :) The only wiki I know whose counts are surely wrong is wmfwiki: https://phabricator.wikimedia.org/T51266
Nemo
Hi Matthew,
Depending on what you are trying to achieve, you should use something like this:
grep -i "^en Main.page\s" pagecounts-20150201-170000
en Main+Page 1 17365
en Main-Page 2 34736
en Main_Page 726807 14563554828
en Main_page 200 4802139
en main_page 2 34781
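To compare a whole day against a figure like the one on stats.grok.se, the same exact-title match can be summed over all of that day's hourly files. This is a rough sketch under a few assumptions: daily_total is a hypothetical helper name, the files are uncompressed as in the commands in this thread, and only the "en" project is considered.

```shell
# Print the total views for one exact title, summed over every file given
# (or over standard input when no file is given). Only "en" rows count.
# Usage with a hypothetical glob: daily_total Main_Page pagecounts-20150201-*
daily_total() {
  title=$1
  shift
  awk -v t="$title" '$1 == "en" && $2 == t { total += $3 }
                     END { print total + 0 }' "$@"
}

# Demo on the rows from the grep output above.
daily_total Main_Page <<'EOF'
en Main+Page 1 17365
en Main-Page 2 34736
en Main_Page 726807 14563554828
en Main_page 200 4802139
en main_page 2 34781
EOF
# prints 726807
```

Running it once with Main_Page and once with Main_page makes the gap visible: the capital-P title carries nearly all of the traffic, while the lowercase-p redirect only collects a few hundred requests per hour.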