Much of the statistical analysis done on database dumps should be
suitable for parallelization (e.g. break the dump into chunks, process
the chunks in parallel, and sum the results). You could talk to Erik
Zachte; I don't know whether his code has already been designed for
parallel processing, though.
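The chunk/process/sum pattern above can be sketched as follows. This is just a toy illustration (the word-count statistic and the in-memory "dump" are stand-ins; a real XML dump would first need to be split on page boundaries):

```python
# Sketch of the map/reduce pattern: break the dump into chunks,
# process the chunks in parallel, and sum the results.
from multiprocessing import Pool

def count_words(chunk):
    """Per-chunk work: count words across all pages in the chunk."""
    return sum(len(page.split()) for page in chunk)

def parallel_word_count(pages, workers=4, chunk_size=2):
    # Break the "dump" into chunks of pages.
    chunks = [pages[i:i + chunk_size]
              for i in range(0, len(pages), chunk_size)]
    with Pool(workers) as pool:
        # Process the chunks in parallel...
        partials = pool.map(count_words, chunks)
    # ...and sum the results.
    return sum(partials)

if __name__ == "__main__":
    pages = ["alpha beta", "gamma", "delta epsilon zeta", "eta theta"]
    print(parallel_word_count(pages))  # 8
```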
Another option might be to look at the methods for compressing old
revisions (is [1] still current?).
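As I recall, the trick behind compressing old revisions is that successive revisions of a page share most of their text, so concatenating adjacent revisions before compressing beats compressing each one separately. A toy demonstration of why (the example revisions are made up):

```python
import zlib

def compressed_sizes(rev1: str, rev2: str):
    """Compare compressing two revisions separately vs. concatenated."""
    separate = (len(zlib.compress(rev1.encode()))
                + len(zlib.compress(rev2.encode())))
    together = len(zlib.compress((rev1 + rev2).encode()))
    return separate, together

# Two "revisions" that differ by one small edit, as successive
# wiki revisions usually do.
rev1 = "The quick brown fox jumps over the lazy dog. " * 50
rev2 = rev1.replace("lazy", "sleepy", 1)

separate, together = compressed_sizes(rev1, rev2)
# Concatenating first lets the compressor encode the second
# revision mostly as back-references into the first.
print(separate, together)
```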
I make heavy use of parallel processing in my professional work (not
related to wikis), but I can't really think of any projects I have at
hand that would be accessible and completable in a month.
-Robert Rohde
[1]
http://www.mediawiki.org/wiki/Manual:CompressOld.php
On Sun, Oct 24, 2010 at 5:42 PM, Aryeh Gregor
<Simetrical+wikilist(a)gmail.com> wrote:
This term I'm taking a course in high-performance
computing
<http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
to pick a topic for a final project. According to the assignment
<http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
"The only real requirement is that it be something in parallel." In
the class, we covered
* Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
* Multithreaded programming using OpenMP
* GPU programming using OpenCL
and will probably briefly cover distributed computing over multiple
machines with MPI. I will have access to a high-performance cluster
at NYU, including lots of CPU nodes and some high-end GPUs. Unlike
most of the other people in the class, I don't have any interesting
science projects I'm working on, so something useful to
MediaWiki/Wikimedia/Wikipedia is my first thought. If anyone has any
suggestions, please share. (If you have non-Wikimedia-related ones,
I'd also be interested in hearing about them offlist.) They shouldn't
be too ambitious, since I have to finish them in about a month, while
doing work for three other courses and a bunch of other stuff.
My first thought was to write a GPU program to crack MediaWiki
password hashes as quickly as possible, then use what we've studied in
class about GPU architecture to design a hash function that would be
as slow as possible to crack on a GPU relative to its PHP execution
speed, as Tim suggested a while back. However, maybe there's
something more interesting I could do.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l