On 8 January 2015 at 03:02, Gergo Tisza gtisza@wikimedia.org wrote:
On Wed, Jan 7, 2015 at 6:25 PM, Oliver Keyes okeyes@wikimedia.org wrote:
We get 120,000 requests a second. We're not storing them all for six months. But we do have sampled logs going back that far.
That would be great! Are those in Hadoop?
On Wed, Jan 7, 2015 at 11:36 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Not particularly, I don't think - except to remember that namespace names are localised, so you're going to have a whale of a time matching them (unless you just look for file endings, I guess).
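A minimal sketch of that file-endings idea in Python, for illustration; the extension list is an assumption, not an exhaustive inventory of upload types:

    import re

    # Match request titles by media extension instead of by the
    # localised namespace name. Extension list is illustrative only.
    MEDIA_EXT_RE = re.compile(
        r'\.(jpe?g|png|gif|svg|tiff?|webm|og[gva]|pdf|djvu)$',
        re.IGNORECASE,
    )

    def looks_like_file_page(title):
        """True if the title ends in a known media extension."""
        return MEDIA_EXT_RE.search(title) is not None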
In the case of NavigationTiming the nsid is recorded, so that wasn't a problem; but it was only added around May, so for the period before that there is no namespace information at all.
The localized file namespace doesn't sound so bad - I can look up all the translations on Translatewiki and construct a regexp or a similar condition. There could be fun exceptions, like namespace translations that have changed recently, but I'd be fine with assuming the error caused by that is not significant.
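A minimal sketch of what that condition could look like, with a hand-picked handful of translations standing in for the full Translatewiki list:

    import re

    # Hypothetical, abbreviated stand-in for the full list of localised
    # names and aliases of the File: namespace from Translatewiki.
    FILE_NAMESPACE_NAMES = ['File', 'Image', 'Datei', 'Fichier', 'Archivo', 'Файл']

    FILE_NS_RE = re.compile(
        r'^(?:%s):' % '|'.join(re.escape(n) for n in FILE_NAMESPACE_NAMES),
        re.IGNORECASE,
    )

    def in_file_namespace(title):
        return FILE_NS_RE.match(title) is not None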
Well, yes; a 750-option regex run over 6 million rows for a day of data. A whale of a time ;p. You can also just use the API's namespaceNames and namespaceAliases code.
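A sketch of that API route, assuming the siteinfo meta query (siprop=namespaces|namespacealiases) is what's meant; requests and the de.wikipedia endpoint are illustrative choices, and the query has to be repeated per wiki:

    import re
    import requests

    API = 'https://de.wikipedia.org/w/api.php'
    FILE_NS = 6  # namespace id of File: pages

    resp = requests.get(API, params={
        'action': 'query',
        'meta': 'siteinfo',
        'siprop': 'namespaces|namespacealiases',
        'format': 'json',
    }).json()

    # Localised name ('Datei' on dewiki), canonical name ('File'),
    # plus any per-wiki aliases ('Bild', ...).
    ns_info = resp['query']['namespaces'][str(FILE_NS)]
    names = [ns_info['*'], ns_info['canonical']]
    names += [a['*'] for a in resp['query']['namespacealiases']
              if a['id'] == FILE_NS]

    FILE_NS_RE = re.compile(
        r'^(?:%s):' % '|'.join(re.escape(n) for n in names),
        re.IGNORECASE,
    )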