On 05/07/06, Steve Bennett stevage@gmail.com wrote:
- why those are not kept
Last I heard, disk space. Logging every one of roughly 11-12 thousand hits per second => full disk, and quickly.
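To put a rough number on that, here is a back-of-the-envelope sketch. The 11,500 hits/s is just the midpoint of the figure above; the 150 bytes per log line is purely my assumption for illustration, not a measured value, and `daily_log_bytes` is a made-up helper name.

```python
def daily_log_bytes(hits_per_second, bytes_per_line):
    """Estimate bytes of log written per day at a constant hit rate."""
    seconds_per_day = 24 * 60 * 60
    return hits_per_second * seconds_per_day * bytes_per_line

# 11,500 hits/s is the midpoint of the quoted 11-12 thousand;
# 150 bytes per line is an ASSUMPTION, not a measured value.
full = daily_log_bytes(11_500, 150)
sampled = full // 100  # keeping only every 100th hit

print(f"full: {full / 1e9:.0f} GB/day, 1-in-100: {sampled / 1e9:.2f} GB/day")
```

Under those assumptions, full logging comes to something like 149 GB per day, while a 1-in-100 sample is closer to 1.5 GB per day. The real figures depend entirely on the actual line length and traffic.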
- whether they could be turned on for brief periods (like 24 hours) to
allow periodic data collection
Possible, see below...
- what alternative solutions might exist
It seems like there are at least three different places where log data could be collected:
- on the mysql database - probably very "expensive"
Forget it.
- on mediawiki (ie, in php code) - probably much more attractive,
could add tuning to only record every 10th or 100th hit or whatever
You can't "store" stuff in PHP; it would have to log to the file system or elsewhere anyway.
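For what it's worth, the "every 10th or 100th hit" idea could look something like the sketch below. It is written in Python rather than PHP, purely as an illustration; `SampledLogger` and its parameters are hypothetical names, not anything in MediaWiki. Note that it still has to write to a sink somewhere, which is the point above.

```python
import io

class SampledLogger:
    """Write every Nth hit to a file-like sink and skip the rest."""

    def __init__(self, sink, every_nth=100):
        if every_nth < 1:
            raise ValueError("every_nth must be at least 1")
        self.sink = sink
        self.every_nth = every_nth
        self.seen = 0

    def hit(self, line):
        """Count one request; return True if it was written to the sink."""
        self.seen += 1
        if self.seen % self.every_nth != 0:
            return False
        self.sink.write(line + "\n")
        return True

# Stand-in for a real log file; in practice this would be a file on disk.
sink = io.StringIO()
logger = SampledLogger(sink, every_nth=10)
for i in range(1, 101):
    logger.hit(f"GET /wiki/Page_{i}")

kept = sink.getvalue().splitlines()
print(len(kept), kept[0])
```

The counter here assumes one long-running process sees every hit. Where requests don't share state, as with ordinary PHP requests, a random 1-in-N draw per request would give a similar sample without needing a shared counter.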
- on the "squids" (presumably, proxy servers) - no idea
Squid in this case refers to the Squid web caching software. "The Squids" is our semi-affectionate name for the bundles of caching proxies that stop millions of queries from killing the rest of our cluster.
Is there absolutely no way that data could be collected at any of these points, even for short periods, and even filtered?
It's been thrown about a lot before, and a lot of "perhaps" gets said, but not a lot of work gets done. Periodic statistics collection could mean the sample isn't entirely consistent, but...meh.
There are lots of people stating it can be done, but not a lot of them doing it.
Rob Church