Tim Starling wrote:
> But it's not going to happen unless someone gets around to writing a program which:
> * Accepts URLs on stdin, separated by line breaks
Seems simple.
> * Identifies plain page views
I assume you mean any /wiki/XXXX URL with no '?'. Quite easy, too.
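As a sketch of that filter (Python for brevity; the real tool would presumably be C for speed, and the "/wiki/ and no '?'" heuristic is just my reading of the above):

```python
import sys

def is_plain_page_view(url: str) -> bool:
    # Assumed heuristic from above: a /wiki/XXXX URL with no query string.
    return "/wiki/" in url and "?" not in url

if __name__ == "__main__":
    # URLs arrive on stdin, one per line.
    for line in sys.stdin:
        url = line.strip()
        if is_plain_page_view(url):
            print(url)
```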
> * Breaks them down into per-page counts as described
And do it really fast... If wgArticleId were also sent, sorting and the hashtable lookups would be easier.
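A minimal per-page tally along those lines (illustrative Python sketch; since wgArticleId isn't in the stream yet, it keys on the title text after /wiki/):

```python
from collections import Counter

def count_pages(urls):
    # Tally views per page; the title is whatever follows the last /wiki/.
    counts = Counter()
    for url in urls:
        title = url.rsplit("/wiki/", 1)[-1]
        counts[title] += 1
    return counts
```

With numeric wgArticleId as the key instead of a string title, the table would be smaller and hashing cheaper, which is the point made above.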
> * Provides a TCP query interface
I'd share the in-memory hashtable between processes and simply add a 'reader' one. We can live with race conditions, too.
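One way to sketch that reader (Python, purely illustrative: here the "shared hashtable" is just an in-process dict standing in for real shared memory, and reads are deliberately unsynchronized, accepting the races mentioned above):

```python
import socket
import threading

# Stand-in for the shared hashtable the collector process keeps updating.
counts = {"Main_Page": 42}

def handle(conn):
    with conn:
        title = conn.recv(1024).decode().strip()
        # Unlocked read: a slightly stale or torn count is acceptable here.
        conn.sendall(str(counts.get(title, 0)).encode())

def accept_loop(srv):
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def start_server():
    # Bind to an ephemeral port and return it, so callers know where to query.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    threading.Thread(target=accept_loop, args=(srv,), daemon=True).start()
    return srv.getsockname()[1]
```

The protocol here (send a title, get back a number) is my invention for the sketch; the real query interface would be whatever the stats consumers need.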
> * Does all that for 30k req/s using less than 10% CPU and 2GB memory
You mean 10% of the *cluster* CPU, don't you? ;)
Impossible? We could start by profiling: read data for 5 minutes, then compute on it for 25. That would cut the average rate to 5k req/s.
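If time-slicing the reads is awkward, an equivalent trick (my suggestion, not part of the spec) is to keep every Nth request and scale the counts back up, which gives the same average load reduction:

```python
from collections import Counter

def sampled_counts(urls, n=6):
    # Keep 1 request in n (roughly reading 5 minutes out of every 30)
    # and weight each kept hit by n to approximate the true totals.
    counts = Counter()
    for i, url in enumerate(urls):
        if i % n == 0:
            counts[url] += n
    return counts
```

Deterministic every-Nth sampling can bias bursty pages; a real collector would probably sample randomly, but the budget arithmetic is the same.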