Re: [Wikitech-l] Parallel computing project

26 Oct 2010

Many of the things done for the statistical analysis of database dumps
should be suitable for parallelization (e.g. break the dump into
chunks, process the chunks in parallel and sum the results).  You
could talk to Erik Zachte.  I don't know if his code has already been
designed for parallel processing though.

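As a rough illustration of the chunk-and-sum pattern described above, here is a minimal Python sketch. It is not Erik Zachte's code and does not read a real dump: the record layout (a dict with a "namespace" key) and the function names are made up for the example. Threads are used only to keep it short and portable.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_chunk(chunk):
    """Per-chunk statistic: count records per (hypothetical) namespace field."""
    return Counter(rec["namespace"] for rec in chunk)


def parallel_counts(records, chunk_size=2, workers=4):
    """Run count_chunk over the chunks concurrently, then sum the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_chunk, chunked(records, chunk_size))
        # Counter supports "+", so the partial counts can simply be summed.
        return sum(partials, Counter())
```

For CPU-bound analysis, concurrent.futures.ProcessPoolExecutor offers the same map() interface and could likely be swapped in, and on a cluster the same split/process/reduce shape would map onto MPI.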

Another option might be to look at the methods for compressing old
revisions (is [1] still current?).

I make heavy use of parallel processing in my professional work (not
related to wikis), but I can't really think of any projects I have at
hand that would be accessible and completable in a month.

-Robert Rohde

[1] http://www.mediawiki.org/wiki/Manual:CompressOld.php

On Sun, Oct 24, 2010 at 5:42 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
...
This term I'm taking a course in high-performance computing
<http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
to pick a topic for a final project.  According to the assignment
<http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
"The only real requirement is that it be something in parallel."  In
the class, we covered

* Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
* Multithreaded programming using OpenMP
* GPU programming using OpenCL

and will probably briefly cover distributed computing over multiple
machines with MPI.  I will have access to a high-performance cluster
at NYU, including lots of CPU nodes and some high-end GPUs.  Unlike
most of the other people in the class, I don't have any interesting
science projects I'm working on, so something useful to
MediaWiki/Wikimedia/Wikipedia is my first thought.  If anyone has any
suggestions, please share.  (If you have non-Wikimedia-related ones,
I'd also be interested in hearing about them offlist.)  They shouldn't
be too ambitious, since I have to finish them in about a month, while
doing work for three other courses and a bunch of other stuff.

My first thought was to write a GPU program to crack MediaWiki
password hashes as quickly as possible, then use what we've studied in
class about GPU architecture to design a hash function that would be
as slow as possible to crack on a GPU relative to its PHP execution
speed, as Tim suggested a while back.  However, maybe there's
something more interesting I could do.

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
