On 24/10/10 17:42, Aryeh Gregor wrote:
This term I'm taking a course in high-performance
computing
<http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
to pick a topic for a final project. According to the assignment
<http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
"The only real requirement is that it be something in parallel." In
the class, we covered
* Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
* Multithreaded programming using OpenMP
* GPU programming using OpenCL
I've occasionally wondered how hard it would be to parallelize a
parser. It's generally not done, even though parsers are slow and
widely used.
Some file formats can certainly be parsed in a parallel way, if you
partition them in the right way. For example, if you were parsing a
CSV file, you could partition on the line breaks. You can't find those
by scanning the whole file, since that O(N) scan would defeat the
purpose, but you can seek ahead to an approximate byte position, and
then scan forwards for the next line break to partition at.
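To make the seek-and-scan idea concrete, here's a minimal sketch in
Python (the function name is mine, and it assumes no quoted fields with
embedded newlines): jump to evenly spaced byte offsets, then advance
each one to the next line break so every chunk holds whole lines.

```python
import os

def chunk_boundaries(path, n_chunks):
    """Split a file into byte ranges aligned on line breaks.

    Returns a sorted list of offsets; worker i would parse the bytes
    in [boundaries[i], boundaries[i+1]).
    """
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)  # jump to an approximate offset
            f.readline()                  # scan forward past the next '\n'
            boundaries.append(f.tell())
    boundaries.append(size)
    # Deduplicate in case several seeks landed in the same short line
    return sorted(set(boundaries))
```

Each worker can then open the file independently and parse its own
range, so the only serial work is the handful of short forward scans.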
For more complex file formats, there are various approaches. Googling
tells me that this is a well-studied problem for XML.
Obviously for an assessable project, you don't want to dig yourself
into a hole too big to get out of. If you chose XML you could just
follow the previous work. JavaScript might be tractable. Attempting to
parse wikitext would be insane.
-- Tim Starling