> We don't have a machine dedicated to stats generation yet; core
> operations still take priority, and old machines aren't suitable for a
> script that apparently needs massive amounts of memory to process its
> data.
Brion, I assume you can't wait to stop doing whatever you are doing and
verify the 'apparently' yourself ;)
The scripts have been online for years, albeit not (yet) in SVN.
Perl is known for its tendency to spend memory in order to save time (both
in execution and in development).
Hashes are the main culprit, or blessing, whichever way you look at it.
Disabling edit counts for anons would make a substantial difference in memory
use, with limited effect on the output.
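For illustration, here is a minimal sketch (assuming a hypothetical
one-record-per-edit input format; this is not the actual wikistats code) of
the kind of per-user tally that eats memory, and of what skipping anons would
save:

  #!/usr/bin/perl
  # Count edits per user in a hash. Every distinct key, including each
  # anonymous IP, carries dozens of bytes of hash overhead on top of the
  # data itself, which is where the memory goes.
  use strict;
  use warnings;

  my %edits_per_user;
  my $count_anons = 0;   # illustrative switch: skip anon IPs to save memory

  while (my $line = <STDIN>) {
      chomp $line;
      # hypothetical record format: "user<TAB>timestamp", one line per edit
      my ($user, $timestamp) = split /\t/, $line;
      next unless defined $user;

      # anons are logged by IP address (crude IPv4 check); every unique
      # IP adds one more hash key
      my $is_anon = ($user =~ /^\d+\.\d+\.\d+\.\d+$/);
      next if $is_anon and not $count_anons;

      $edits_per_user{$user}++;
  }

  printf "%d distinct editors counted\n", scalar keys %edits_per_user;

Since unique anon IPs can vastly outnumber registered usernames, dropping
them shrinks the hash's key space while leaving registered-user counts
intact.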
But that would be a stopgap solution, and a shortsighted one: there are other
very interesting stats that would fill the space.
To name one example: I would love to generate statistics on how the content
of Wikipedia is becoming less geeky, by analysing and visualising trends in
edits/articles/views per category (cluster) per month.
There is a saying that instead of spending time optimizing Perl, it is more
efficient to take a job washing cars and save up for adequate hardware.
Had I taken that advice, I would have saved a lot of time for more useful
work than over-optimizing a job that has outgrown its current platform.
We are not talking tens of thousands of dollars, just a run-of-the-mill
reasonably fast machine with above-average memory and an above-average
hard disk, both still in the commodity range.
Many 15-year-olds in the first world spend more on their 3D gaming machines.
Erik Zachte