Re: [Wikitech-l] Wikimedia logging infrastructure

10 Aug 2010


On 10-08-10 07:16, Rob Lanphier wrote:
...
At any rate, there are a couple of problems with the way that it works:

Once we saturate the NIC on the logging machine, the quality of our
sampling degrades pretty rapidly.  We've generally had a problem with
that over the past few months.

As already stated elsewhere, we didn't really saturate any NICs, just
some socket buffers. Because of the large number of configured log
pipes, the software (udp2log) could not empty the socket buffers fast
enough.
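
To illustrate the socket buffer point: a minimal Python sketch (not
udp2log's actual code) of asking the kernel for a larger UDP receive
buffer and checking what was actually granted. The function name, the
loopback bind address and the 4 MiB request are assumptions made for
the example. A larger buffer only absorbs bursts; if the reader is
persistently slower than the sender, datagrams are still dropped.

```python
import socket


def open_log_socket(port=0, requested_rcvbuf=4 * 1024 * 1024):
    """Open a UDP socket and request a larger kernel receive buffer.

    Returns the socket and the buffer size the kernel reports back.
    The kernel may cap the request (on Linux, at net.core.rmem_max),
    so the effective value can differ from the requested one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for the larger buffer before binding, so it is in place
    # before any datagrams can arrive.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf)
    # Loopback and port 0 (any free port) are placeholders for the sketch.
    sock.bind(("127.0.0.1", port))
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective
```

Calling open_log_socket() and comparing the returned size with the
requested one shows whether the system limit needs raising as well.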
...
If this were your typical commercial operation, the answer would be
"why aren't you just logging into Streambase?" (or some other data
warehousing storage solution).  I'm not suggesting that we do that (or
even look at any of the solutions that bill themselves as open source
alternatives), since, while our needs are increasing, we still aren't
planning to be anywhere near as sophisticated as a lot of data
tracking orgs.  Still, it's worth asking questions about our existing
setup.  Should we be looking to optimize our existing single-box setup,
extending our software to have multi-node collection, or looking at a
whole new collection strategy?

Besides the ideas that are currently being kicked around of improving or
rewriting the udp log collection software, there's also always the
short-term, easy option of sending a multicast UDP stream, and having
multiple collectors with distinct log pipes set up. E.g. one machine for
the sampled logging, and another, independent machine to do all the
special purpose log streams. I do like more efficient software solutions
rather than throwing more iron at the problem, though. :)
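
A rough sketch of what one such collector could look like, in Python
rather than as a change to udp2log. The group address, port and
function names are hypothetical, chosen only for the example. Each
collector machine joins the same multicast group and receives the full
stream, then applies only its own log pipes (sampling on one box,
special purpose filters on another).

```python
import socket
import struct

GROUP = "239.192.0.1"  # hypothetical multicast group address
PORT = 5140            # hypothetical port


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure used to join a multicast group."""
    # 0.0.0.0 lets the kernel pick the default interface.
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def open_collector(group=GROUP, port=PORT):
    """Open a UDP socket that receives datagrams sent to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def run_collector(handle_line, group=GROUP, port=PORT):
    """Receive datagrams forever and pass each log line to handle_line."""
    sock = open_collector(group, port)
    while True:
        datagram, _sender = sock.recvfrom(65535)
        for line in datagram.splitlines():
            handle_line(line)
```

One machine would call run_collector() with a handler that writes every
Nth line, another with handlers for the special purpose streams. The
senders do not need to know how many collectors are listening.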
-- 
Mark Bergsma mark@wikimedia.org
Operations Engineer, Wikimedia Foundation
