Awesome!! I forwarded it to the DBpedia developers. I think the Parsoid project might interest some of our people. How is it possible to join? Or is it Wikimedia-internal development? Is there a Parsoid mailing list?
You are very welcome to join. http://www.mediawiki.org/wiki/Parsoid has most of the information to get you started. We are using this mailing list for discussions. You can also catch me in the #mediawiki IRC channel as gwicke.
Can JS handle this? I read somewhere that it was several orders of magnitude slower than other languages... Maybe this is not true for Node.js.
Competition between JS runtimes has improved performance a lot in recent years. See for example the fun Computer Language Benchmarks Game: http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastes...
Of course, it is still hard to beat C or C++ performance for memory-dominated tasks.
All the data in our mappings wiki was created to "mark up" Wikipedia template parameters, so please try to reuse it. I think there are almost 200 active users on http://mappings.dbpedia.org/ who have added extra parsing information to thousands of templates in Wikipedia across 20 languages. You can download and reuse it, or we can add your requirements to it.
Our primary requirement is marking up all top-level template arguments (and generated content like image thumbnails) to enable editing in the visual editor. The editor could, however, also benefit from type information, so refining vocabulary information (and perhaps mapping it into an ontology) is also interesting to us. We should definitely collaborate on this.
What do you think about embedding schema information (maybe RDFa profiles?) into the noinclude section of a template page?
Gabriel