[Foundation-l] Wikimedia Projects Growth Animated

Wed Mar 26 18:57:40 UTC 2008

Hi!

This may sound like heresy, but for jobs that are short-lived yet
need lots of CPU capacity, we could try Amazon's EC2 or any other
grid computing service (maybe some university wants to donate
cluster time?).
That would be much cheaper than allocating high-performance,
high-bucks hardware to projects like this.

Really, we have a capable cluster with spare CPU capacity for
distributed tasks, but anything that needs lots of memory in a single
location simply doesn't scale.
Most of our tasks are scaled out, with lots of smaller machines
sharing the big work; this wikistats job is the only one that
cannot be distributed this way.

Eventually we may run Hadoop, Gearman or a similar framework for
statistics job distribution, but first of all the actual tasks
have to be broken down into smaller segments suitable for map/reduce
operation, if needed.
I don't see many problems (apart from setting the whole grid up)
with allocating job execution resources during off-peak hours on 10,
20 or 100 nodes, as long as a job doesn't have exceptional resource
needs on a single node.  It would be very good practice for many
other future jobs too.
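
To illustrate what "smaller segments for map/reduce" could look like,
here is a minimal single-process sketch in Python. It is not the
actual wikistats code: the record layout (month, user) and the
function names are assumptions made up for this example. Each chunk
is counted independently, so on a real grid each chunk could go to a
different node, and only the small partial counts need to be merged.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Cut the input into independent segments of at most chunk_size."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count edits per month within one chunk only."""
    return Counter(month for month, _user in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial per-month counts."""
    return left + right


def edits_per_month(records, chunk_size=2):
    """Run the map step per chunk, then fold the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    partials = map(map_chunk, chunks)  # a framework would fan this out
    return reduce(reduce_counts, partials, Counter())


# Hypothetical sample data: (month, user) pairs, one per edit.
sample = [
    ("2008-01", "a"), ("2008-01", "b"), ("2008-02", "a"),
    ("2008-03", "c"), ("2008-01", "c"),
]
totals = edits_per_month(sample)
```

No single step here needs the whole data set in memory at once, which
is the property that lets a job run on many small nodes.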

BR,
-- 
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]