On Thu, Jan 8, 2015 at 3:02 AM, Gergo Tisza gtisza@wikimedia.org wrote:
On Wed, Jan 7, 2015 at 6:25 PM, Oliver Keyes okeyes@wikimedia.org wrote:
We get 120,000 requests a second. We're not storing them all for six months. But we do have sampled logs going back that far.
That would be great! Are those in Hadoop?
They're on stat1002 in /a/squid/archive/sampled/
And the webrequest format is documented at: https://wikitech.wikimedia.org/wiki/Cache_log_format
Note that the namespaces only show up in the page title within the raw URL, so it's still going to be a bit painful to parse them out. But folks around here have done stuff like that; maybe someone can chime in with some handy scripts?
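
Something along these lines might work as a starting point for pulling namespaces out of the titles. This is an untested sketch: the URL field index, the namespace list, and the URL shapes it handles are assumptions, so check them against the Cache_log_format page and the wiki you're looking at.

#!/usr/bin/env python3
"""Sketch: extract namespaces from page titles in sampled request log URLs.

Assumptions (verify against the Cache_log_format page and your wiki):
  * log lines are whitespace-separated and the requested URL sits at URL_FIELD
  * only /wiki/Title and index.php?title=Title style URLs are handled
  * NAMESPACES is the English Wikipedia default set; other wikis differ
"""
import sys
import urllib.parse

URL_FIELD = 8  # 0-based index of the URL field -- check against the format doc

# A few canonical (English) namespaces; extend per wiki as needed.
NAMESPACES = {
    "Talk", "User", "User_talk", "Wikipedia", "Wikipedia_talk",
    "File", "File_talk", "MediaWiki", "Template", "Template_talk",
    "Help", "Category", "Category_talk", "Special", "Portal",
}

def title_from_url(url):
    """Return the raw page title from a MediaWiki URL, or None."""
    parsed = urllib.parse.urlparse(url)
    if parsed.path.startswith("/wiki/"):
        return urllib.parse.unquote(parsed.path[len("/wiki/"):])
    if parsed.path.endswith("index.php"):
        qs = urllib.parse.parse_qs(parsed.query)
        if "title" in qs:
            return urllib.parse.unquote(qs["title"][0])
    return None

def namespace_of(title):
    """Split off the namespace prefix; default to the main (article) namespace."""
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in NAMESPACES:
        return prefix
    return "Main"

if __name__ == "__main__":
    for line in sys.stdin:
        fields = line.split()
        if len(fields) <= URL_FIELD:
            continue
        title = title_from_url(fields[URL_FIELD])
        if title:
            print(namespace_of(title), title, sep="\t")

Then something like

  zcat /a/squid/archive/sampled/<some sampled file> | python3 parse_namespaces.py | cut -f1 | sort | uniq -c

would give a rough per-namespace request count (file names in that directory left as-is since I don't have them in front of me).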