Phil Boswell wrote:
I was wondering whether we could define a [[en:Parsing
expression grammar]]
which could then be made into a [[en:Packrat parser]].
As far as I can tell from those articles, and the references therein, this
would help a great deal.
I looked into this a while ago when I was dabbling at writing a parser
for wikitext, and came to the conclusion it was probably not practical
for MediaWiki use. The advantage of packrat parsers, of course, is that
they allow (the equivalent of) full GLR grammar, not some subset like
LR(1), so writing a grammar is not very hard even for a complex one like
wikitext.
The disadvantage is that the RAM needed for the parser state grows
linearly in the *total input size*---not in the depth of the parse tree
as with most---and the constant on that is fairly large as well. For
some of our larger pages, back-of-the-envelope calculations using the
data in the master's thesis about packrat parsing yield estimates of
3-6MB of RAM needed for the parser state. I imagine Wikipedia sometimes
parses enough pages simultaneously to be eating up gigabytes of RAM at
these levels.
-Mark