These ideas sound creative :) Also, if the problem is *storage* of logs, rather than the server hit of *creating* the logs, then would it not be possible to write some log analysis routines that massively summarise the logs, then delete them? Such a thing could be run once a day.
It would only need to store the number of hits per day to each page, which, even with 1 million articles, would only be 4 megabytes, right? :)
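A minimal sketch of such a daily summarising pass, in Python. It assumes the request URL is the seventh whitespace-separated field of each log line, which is my reading of Squid's default native access.log layout; the function name `count_hits` and the `url_field` parameter are made up for illustration, so adjust them to whatever format the servers actually write.

```python
from collections import Counter
from typing import Iterable


def count_hits(lines: Iterable[str], url_field: int = 6) -> Counter:
    """Reduce raw access-log lines to a hit count per URL.

    url_field is the zero-based index of the URL column. The default of 6
    is an assumption about the log format, not something from the thread.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or truncated lines rather than failing the whole run.
        if len(fields) > url_field:
            counts[fields[url_field]] += 1
    return counts
```

Once the counts are written out, the raw log can be deleted. The 4 megabyte figure presumably corresponds to one 4-byte counter per article: 1,000,000 x 4 bytes = 4,000,000 bytes, not counting whatever is needed to store the page names themselves.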
Steve
On 3/31/06, Neil Harris usenet@tonal.clara.co.uk wrote:
Brion Vibber wrote:
Andrew Gray wrote:
Is that "keep recording but ignore them", or disable in the sense of turn off logging totally? Just curious...
After a few months of having logs that you're not reading fill up the servers' hard disks every few days, you turn them off. :)
-- brion vibber (brion @ pobox.com)
How about a cron job that turns logging on, then off, intermittently?
For example, on each server, have a cron job that does this:
Every 5 mins:
    Is logging on?
    Then: turn it off
    Else: generate a random number
        If it's == 0 mod 1000:
        Then: turn logging on
        Else: do nothing
This way, you get representative short blocks of 5 minutes of traffic, kicking in once every three days or so on each of the 100 or so servers at random times of the day or night. This would also suffice for gross statistical analysis, and wouldn't require any modification of the squid code, just a short external shell script.
Log-rotation should handle the rest and prevent the disks filling up, since the average sampling rate would then be low enough to cope with.
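The cron-job decision described above can be sketched in Python as follows. This is only the on/off logic plus a small simulation to check the sampling rate. How logging is actually switched on a squid server is deliberately left out, and the names `next_logging_state`, `simulate` and `odds` are hypothetical.

```python
import random


def next_logging_state(logging_on: bool, rng: random.Random, odds: int = 1000) -> bool:
    """Decide whether logging should be on for the next 5-minute slot."""
    if logging_on:
        # A sampling block lasts exactly one slot, so always switch off.
        return False
    # Otherwise switch on with probability 1/odds.
    return rng.randrange(odds) == 0


def simulate(slots: int, seed: int = 0, odds: int = 1000) -> int:
    """Return how many of the simulated slots had logging switched on."""
    rng = random.Random(seed)
    logging_on = False
    active = 0
    for _ in range(slots):
        logging_on = next_logging_state(logging_on, rng, odds)
        active += logging_on
    return active
```

With odds of 1 in 1000 per 5-minute slot, the mean wait before logging switches on is 1000 slots, or 5000 minutes, which is roughly three and a half days per server. That matches the "once every three days or so" estimate, and means each server logs about one slot in a thousand on average.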
-- Neil
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l