On 26-10-2010, Tuesday, at 16:25 +0200, Platonides wrote:
Robert Rohde wrote:
Many of the things done for the statistical
analysis of database dumps
should be suitable for parallelization (e.g. break the dump into
chunks, process the chunks in parallel and sum the results). You
could talk to Erik Zachte. I don't know if his code has already been
designed for parallel processing though.
I don't think it's a good candidate, since you are presumably working
from compressed files, and decompression serializes the work (and is
most likely the bottleneck, too).
If one were clever (and I have some code that would enable one to be
clever), one could seek to some point in the (bzip2-compressed) file and
uncompress from there before processing. Running a bunch of jobs, each
decompressing only its own small piece, then becomes feasible. I don't
have code that does this for gz or 7z; afaik these do not do compression
in discrete blocks.
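A minimal sketch of the idea, under two assumptions not confirmed above: the
dump is a concatenation of independent bzip2 streams (as in the multistream
dump files), and the byte offset and length of each stream are already known,
e.g. from an index. The function names `decompress_chunk` and `map_chunks`
are hypothetical, not from any existing code:

```python
# Sketch: parallel processing of a file made of independent bzip2 streams.
# Assumes each (offset, length) pair marks one self-contained stream.
import bz2
from concurrent.futures import ThreadPoolExecutor

def decompress_chunk(path, offset, length):
    """Read one self-contained bzip2 stream and decompress just it."""
    with open(path, "rb") as f:
        f.seek(offset)
        compressed = f.read(length)
    # A fresh decompressor per stream: each stream has its own header.
    return bz2.BZ2Decompressor().decompress(compressed)

def map_chunks(path, offsets, worker, max_workers=4):
    """Decompress each chunk in its own thread and apply `worker` to the
    uncompressed bytes. CPython's bz2 releases the GIL while decompressing,
    so the streams decompress in parallel. The caller sums the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(decompress_chunk, path, off, ln)
                   for off, ln in offsets]
        return [worker(f.result()) for f in futures]
```

For example, `sum(map_chunks(path, offsets, len))` would give the total
uncompressed size; a statistics pass would use a worker that parses its
chunk and returns partial counts to be summed.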
Ariel