On 10/08/10 15:16, Rob Lanphier wrote:
We have a single collection point for all of our logging, which is actually just a sampling of the overall traffic (designed to be roughly one out of every 1000 hits). The process is described here: http://wikitech.wikimedia.org/view/Squid_logging
My understanding is that this code is also involved somewhere: http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/ ...but I'm a little unclear on what the relationship is between that code and the code in trunk/udplog.
Maybe you should find out who wrote the relevant code and set up the relevant infrastructure, and ask them directly. It's not difficult to find out who it was.
At any rate, there are a couple of problems with the way that it works:
- Once we saturate the NIC on the logging machine, the quality of our sampling degrades pretty rapidly. We've generally had a problem with that over the past few months.
We haven't saturated any NICs.
-- Tim Starling