2010/3/23 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
I've never used PHP for real programming, but how difficult would it be to write a really simple, stupid first pass at a DFA parser? I suspect I'd need much more than three months to make it useful, but would it be possible to implement some coherent subset of the features? E.g., building the LR(0) automaton, at least?
I don't think you'd need a "real" parser here. Mostly we just use preg_split() for this sort of thing. I'm not familiar with formal grammars and such, so I can't say what the concrete disadvantages of that approach are.
DFAs recognize regular languages, which means those languages can also be expressed as regexes. In fact, the regexes accepted by the preg_*() functions extend the language-theory definition of regular expressions (with backreferences, for example), so they can describe certain non-regular languages as well. In short: preg_split() can do everything a DFA can do, and more. The only reason to use a DFA parser would be performance, but since the preg_*() functions are so heavily optimized, I don't think that'll be an issue.
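To make that concrete, here is a rough sketch of split-based tokenizing and of a regex that goes beyond regular languages. It is in Python rather than PHP, with re.split() standing in for preg_split() called with PREG_SPLIT_DELIM_CAPTURE. The token pattern and the names tokenize() and is_mirror() are made up for illustration; none of this is taken from texvc or MediaWiki.

```python
import re

# Hypothetical token pattern (an assumption, not texvc's real grammar):
# TeX-style commands, single braces, and runs of whitespace.
TOKEN = re.compile(r"(\\[A-Za-z]+|[{}]|\s+)")


def tokenize(source):
    """Split on TOKEN and keep the delimiters.

    Because the pattern is wrapped in a capturing group, re.split()
    returns the matched delimiters along with the text between them.
    Empty strings and whitespace-only pieces are dropped.
    """
    return [piece for piece in TOKEN.split(source)
            if piece and not piece.isspace()]


def is_mirror(text):
    """Accept strings of the form a^n b a^n with n >= 1.

    That language is not regular, so no DFA can recognize it, but the
    backreference \\1 lets the regex engine handle it.
    """
    return re.fullmatch(r"(a+)b\1", text) is not None


if __name__ == "__main__":
    print(tokenize(r"\frac{a}{b} + \sqrt{x}"))
    print(is_mirror("aabaa"), is_mirror("aabaaa"))
```

The same idea should carry over to preg_split() more or less directly, though the flags and the handling of empty pieces differ between the two APIs.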
I suggested a Python port because http://www.mediawiki.org/wiki/Summer_of_Code_2010#MediaWiki_core lists it as a potential project idea. I was under the impression that people around here did not want to leave texvc in OCaml. Is this wrong?
No, it's right. Conrad is crazy. :P
Keeping it in a language no one understands is a bad thing, because it means maintenance doesn't happen, so yeah, we definitely want it rewritten in PHP. If the PHP implementation turns out to be too slow to run on WMF, for instance, we could do a C++ port à la wikidiff2 (a C++ port of our ludicrously slow PHP diff implementation).
Roan Kattouw (Catrope)