Hello,
If we get accurate statistics on what kinds of
processes these are and
what quantity of resources they consume, we might be able to integrate
these considerations into the simulation environment we'll be using to
evaluate various architectures for distributed hosting.
We're already doing something like this for links processing, where delays
can be tolerated, but that's mostly about serializing pagelinks load.
The nature of wikis, though, does not allow lag for most operations - if
edits were async (I'd love it if they could be..), multiple people would
be editing the same pages and, of course, getting conflicts. Async
communication for conflict resolution would be somewhat difficult.
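To make the conflict problem concrete, here's a minimal sketch (all names are invented for illustration, not actual MediaWiki code) of optimistic concurrency: every save carries the revision id it was based on, and the second of two edits started from the same revision is rejected:

```python
# Illustrative only: each edit records which revision it started from;
# a save succeeds only if the page has not moved on in the meantime.

class Page:
    def __init__(self, text=""):
        self.rev = 0
        self.text = text

    def save(self, base_rev, new_text):
        # Conflict: someone else saved since this editor loaded the page.
        if base_rev != self.rev:
            return False
        self.rev += 1
        self.text = new_text
        return True

page = Page("original")
base = page.rev                 # two editors load the same revision
page.save(base, "edit by A")    # first save wins
page.save(base, "edit by B")    # second save conflicts, returns False
```

With synchronous edits the loser finds out immediately and can merge; with async edits that "save failed" answer would itself arrive late, which is exactly the resolution problem mentioned above.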
And other actions, such as diffs, views, whatever, all need to be
synchronized. Of course, distributed caching helps a lot, and our aim
should be to make it more efficient, but still, we've got to stay in sync.
As for distribution, I'd like to see some of the application
synchronization primitives (like message cache state, db server lag,
etc.) being pushed to app servers rather than pulled from db/memcache.
But that's already fighting milliseconds :)
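A rough sketch of the push idea, with invented names throughout - this is not real MediaWiki or memcached API, just the shape of the pattern: a monitor pushes replication lag updates to each app server, so the hot request path reads only local memory instead of doing a db/memcache round trip per request:

```python
# Illustrative only: state is pushed into the app server and held
# locally, rather than pulled from memcache on every request.

class AppServer:
    def __init__(self):
        self.db_lag = {}   # locally held state, updated by pushes

    def receive_lag_update(self, server, lag):
        # Push path: a monitor calls this whenever lag changes.
        self.db_lag[server] = lag

    def pick_replica(self, max_lag=5):
        # Hot path: purely local reads, no memcache round trip.
        ok = [s for s, lag in self.db_lag.items() if lag <= max_lag]
        return min(ok, key=lambda s: self.db_lag[s]) if ok else None

app = AppServer()
app.receive_lag_update("db1", 2)
app.receive_lag_update("db2", 9)
app.pick_replica()   # "db1" - db2 is lagging too far behind
```

The saving is one cache fetch per request, which is why it's "fighting milliseconds" - worthwhile mostly on the very hottest primitives.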
As for architecture, there isn't much low-hanging fruit at the
moment that would be easy to address by distribution. If anyone sees it
otherwise, please tell :)
Domas