Platonides wrote:
Tim Starling wrote:
But it's not going to happen unless someone gets around to writing a program which:
- Accepts URLs on stdin, separated by line breaks
Seems simple.
- Identifies plain page views
I assume you mean any /wiki/XXXX URL with no '?'. Quite easy, too.
- Breaks them down into per-page counts as described
And do it really fast... If wgArticleId were also sent, sorting and using a hashtable would be easier.
- Provides a TCP query interface
I'd share the in-memory hashtable between processes, and simply add a 'reader' process. We can live with race conditions, too.
There's only one log stream, so I would think there would be only one process (a rough sketch of that shape is further down). The task is log analysis, not log collection.
- Does all that for 30k req/s using less than 10% CPU and 2GB memory
You mean 10% of the *cluster* CPU, don't you? ;)
10% of one processor. Maybe we could relax it if that proves to be impossible, but there will only be one log host for now, and there might be lots of log analysis tools, so we don't want any CPU hogs. Think C++, not perl.
Impossible?
We could start by profiling: read data for 5 minutes, compute it for 25. That would lower the rate to 1k req/s.
I could make a log snippet available for optimisation purposes, but ultimately, it will have to work on the full stream. Sampling would give you an unacceptable noise floor for the majority of those 25 million articles.
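To make the shape of the thing concrete, here is a minimal sketch of a single-process counter along those lines: it reads URLs from stdin, counts plain /wiki/Title views in an in-memory hash table, and answers TCP queries in between log lines, so no shared memory or locking is needed. The port number, the query protocol (title in, count out), and the use of std::unordered_map are illustrative assumptions only; hitting 30k req/s in under 10% of a CPU would almost certainly need a more careful hash table and more aggressive I/O buffering than this.

// Rough sketch only. Assumptions: one URL per line on stdin, queries arrive
// over TCP as a bare page title and get the current count back, and
// std::unordered_map is fast enough -- none of that is settled.
#include <cstdio>
#include <string>
#include <unordered_map>
#include <fcntl.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

static int make_listener(int port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(fd, (sockaddr *)&addr, sizeof addr);
    listen(fd, 8);
    fcntl(fd, F_SETFL, O_NONBLOCK);      // accept() must never block the log loop
    return fd;
}

int main() {
    std::unordered_map<std::string, unsigned long> counts;
    int listener = make_listener(8420);  // port is arbitrary
    char line[4096];

    while (std::fgets(line, sizeof line, stdin)) {
        // Plain page view: a /wiki/Title request with no query string.
        std::string url(line);
        while (!url.empty() && (url.back() == '\n' || url.back() == '\r'))
            url.pop_back();
        auto pos = url.find("/wiki/");
        if (pos != std::string::npos && url.find('?') == std::string::npos)
            counts[url.substr(pos + 6)]++;   // key on the title part only

        // Between log lines, answer at most one pending query.
        int client = accept(listener, nullptr, nullptr);
        if (client >= 0) {
            timeval tv = {0, 50000};         // 50 ms cap so a slow client can't stall us
            setsockopt(client, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
            char title[1024];
            ssize_t n = read(client, title, sizeof title - 1);
            while (n > 0 && (title[n - 1] == '\n' || title[n - 1] == '\r'))
                --n;
            if (n > 0) {
                auto it = counts.find(std::string(title, n));
                std::string reply =
                    std::to_string(it == counts.end() ? 0 : it->second) + "\n";
                write(client, reply.data(), reply.size());
            }
            close(client);
        }
    }
    close(listener);
    return 0;
}

At 30k lines/s the loop passes the listener every few dozen microseconds, so polling accept() between reads is more than responsive enough, and keeping everything in one process sidesteps the shared-memory and race-condition questions entirely.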
(to nobody in particular) Thinking about the pipe buffer issues we had the other day, it might make sense to recompile the kernel on henbane to have a larger pipe buffer, to cut down on context switches. At 30k req/s, it would fill every 1.2ms.
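To spell out the arithmetic behind that figure, assuming a 4 KB pipe buffer and log lines of roughly 110 bytes (both figures are assumptions that happen to reproduce the 1.2 ms number): 30,000 lines/s x ~110 bytes is about 3.3 MB/s, and 4096 bytes / 3.3 MB/s is about 1.2 ms per fill. Enlarging the buffer stretches the interval between reader wakeups proportionally, which is the point of the recompile.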
-- Tim Starling