2010/3/23 Aryeh Gregor <Simetrical+wikilist(a)gmail.com>:
I've never used PHP for real programming, but how difficult would it be
to write a really simple, stupid first pass at a DFA parser? I suspect
I'd need much more than three months to make it useful, but would it be
possible to implement some coherent subset of the features? E.g.,
building the LR(0) automaton, at least?
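For a sense of what "building the LR(0) automaton" involves, here is a minimal sketch of the classic closure/goto construction over a tiny toy grammar. The grammar (S' -> S, S -> ( S ) | x) and every name in the code are hypothetical, chosen only for illustration; a real texvc grammar would be far larger, and this covers only the automaton, not the shift/reduce parsing table or driver.

```python
from collections import deque

# Hypothetical toy grammar, for illustration only:
#   S' -> S
#   S  -> ( S ) | x
GRAMMAR = {
    "S'": [("S",)],
    "S": [("(", "S", ")"), ("x",)],
}
NONTERMINALS = set(GRAMMAR)

# An LR(0) item is a tuple (head, body, dot); `dot` counts how many
# symbols of `body` have already been seen.


def closure(items):
    """For each item with a nonterminal right after the dot, add N -> .body items."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` in every item that expects it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved) if moved else None


def build_lr0_automaton(start="S'"):
    """Return (states, transitions): the LR(0) automaton, a DFA over item sets."""
    start_state = closure({(start, GRAMMAR[start][0], 0)})
    states = [start_state]
    index = {start_state: 0}
    transitions = {}  # (state index, grammar symbol) -> state index
    queue = deque([start_state])
    while queue:
        state = queue.popleft()
        # Every symbol that appears right after a dot in this state.
        symbols = {b[d] for (_, b, d) in state if d < len(b)}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            transitions[(index[state], sym)] = index[target]
    return states, transitions


if __name__ == "__main__":
    states, transitions = build_lr0_automaton()
    print(len(states), "states")
    for (src, sym), dst in sorted(transitions.items()):
        print(f"I{src} --{sym}--> I{dst}")
```

For this grammar the construction yields six item sets and seven transitions. The automaton itself is only the first step; an LR parser still needs a stack and a parse table derived from these states.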
I don't think you'd need a "real" parser here. Mostly we just use
preg_split() for this sort of thing. I'm not familiar with formal
grammars and such, so I can't say what the concrete disadvantages of
that approach are.
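As a rough illustration of the preg_split() approach, the sketch below uses Python's re.split() with a capturing group, which keeps the matched delimiters in the output much like PHP's preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag. The token pattern is a hypothetical one for TeX-like input, not texvc's actual token set.

```python
import re

# Hypothetical token pattern for TeX-like math input: control words such as
# \frac, one-character control symbols such as \{, braces, ^ and _, and
# whitespace. Text between these delimiters (letters, digits) stays as a run.
# The capturing group makes re.split() keep the delimiters it matched.
TOKEN_RE = re.compile(r"(\\[A-Za-z]+|\\.|[{}^_]|\s+)")


def tokenize(source: str) -> list[str]:
    """Split `source` into tokens, dropping empty pieces and whitespace."""
    pieces = TOKEN_RE.split(source)
    return [p for p in pieces if p and not p.isspace()]


print(tokenize(r"\frac{a}{b}^2"))
```

A flat token list like this is often all that a regex-based approach needs; nesting (matching braces, for example) is then handled by a small loop over the tokens rather than by a generated parser.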
DFAs recognize exactly the regular languages, which means any language a
DFA accepts can also be expressed as a regex. In fact, the regexes
accepted by the preg_*() functions support extensions beyond the
language-theory definition of regular expressions (backreferences, for
example), which let them describe certain non-regular languages as well.
In short: preg_split() can do everything a DFA can do, and more. The only
reason to use a DFA parser would be performance, but since the preg_*()
functions are so heavily optimized I don't think that'll be an issue.
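To make the DFA/regex equivalence concrete, here is a small sketch: a hand-written DFA for binary strings with an even number of 1s, checked against an equivalent regex, plus a backreference pattern that matches a non-regular language. Python's re module stands in for PHP's PCRE-based preg_*() functions here; both support backreferences. The example languages are illustrative choices, not anything from texvc.

```python
import re

# A DFA for binary strings containing an even number of 1s.
# States: "even" (start state, accepting) and "odd".
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def dfa_accepts(s: str) -> bool:
    """Run the DFA over `s`; reject any symbol outside the alphabet {0, 1}."""
    state = "even"
    for ch in s:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state == "even"


# The same regular language as a regex: 1s come in pairs, padded with 0s.
EVEN_ONES = re.compile(r"0*(10*10*)*")

# A backreference goes beyond regular languages: with fullmatch() this
# accepts exactly a^n b a^n, which no DFA can recognize.
A_N_B_A_N = re.compile(r"(a*)b\1")

# The DFA and the regex agree on these sample inputs.
for s in ["", "0", "11", "101", "0110", "111"]:
    assert dfa_accepts(s) == bool(EVEN_ONES.fullmatch(s))

print(bool(A_N_B_A_N.fullmatch("aabaa")), bool(A_N_B_A_N.fullmatch("aaba")))
```

The last line prints `True False`: the backreference forces both runs of a's to be the same length, something a finite automaton cannot count.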
I suggested a Python port because
http://www.mediawiki.org/wiki/Summer_of_Code_2010#MediaWiki_core
lists it as a potential project idea. I was under the impression that
people around here did not want to leave texvc in OCaml. Is this wrong?
No, it's right. Conrad is crazy. :P
Having it in a language no one here understands is a bad thing and means
maintenance doesn't happen, so yeah, we definitely want it rewritten in
PHP. If the PHP implementation turns out to be too slow to run on WMF's
servers, for instance, we could do a C++ port à la wikidiff2 (a C++ port
of our ludicrously slow PHP diff implementation).
Roan Kattouw (Catrope)