Virgil Ierubino wrote:
There is no clear purpose to expressing Wikitext in EBNF, simply
because the possibilities of the use of such an expression are
undefined and large. The EBNF expression could be used to create
a validator, and a validator could be used to warn users when
they've accidentally typed bad syntax - or simply to determine
when a page is and is not written in Wikitext.

This is a valid purpose or problem, but an EBNF parser is not a
good solution to it. The kind of parsers you build with YACC or
Bison, or even hand-written recursive descent parsers, are good at
parsing correct language, but not very good at reporting syntax
errors in a way that is useful for corrections. One difference
between GCC (the GNU C compiler) and many (early) commercial C/C++
compilers is that GCC gives very useful error messages, because it
knows what mistakes developers typically make. This wisdom is
normally not encoded in a BNF grammar.

Suppose you wanted to solve the problem described above. You'd
start by downloading a Wikipedia database dump containing the
complete edit history. For practical reasons, you'd start with
one of the smaller languages, such as Latin or Faroese. You'd
then go through every edit to find the patterns behind the most
common minor corrections, perhaps mismatched ''' and '' or === and
==, which render as <i>'and</i> or <h2>=and</h2>. Then you'd go
through a database dump of the current versions to find such error
patterns. Obviously, regexp matching is superior to any BNF parser
for this. You can either post a list of errors found, or make a
toolserver application where a user can click for alternative
corrections that are semi-automatically applied. This is hard
work. It is useful work. You could be busy for a year doing
this. And it would make Wikipedia better. The actual
implementation would use whatever languages and tools are best
suited to the problem. You'd be the hero at the next
Wikimania conference when you present your paper about this work.
But the project doesn't start with creating an EBNF parser.
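
To make the regexp idea concrete, here is a rough sketch in Python of
what a first pass over page text might look like. The function name,
the two patterns and the "odd number of apostrophes" rule are my own
assumptions for illustration, not anything derived from real edit
history; a real tool would take its patterns from the corrections
found in the dump.

```python
import re

# Heading line: a run of '=', a title, then a closing run of '='.
HEADING = re.compile(r"^(=+)\s*(.+?)\s*(=+)\s*$")
# Runs of two or more apostrophes (wiki italic/bold markers).
QUOTES = re.compile(r"'{2,}")


def find_suspect_lines(wikitext):
    """Return (line_number, kind, line) for lines that look like typos.

    Hypothetical heuristics, assumed for this sketch:
    - "heading": opening and closing '=' runs differ in length.
    - "quotes": the apostrophes in markup runs add up to an odd
      number, which balanced '' and ''' pairs on one line never do.
    """
    findings = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADING.match(line)
        if match and len(match.group(1)) != len(match.group(3)):
            findings.append((number, "heading", line))
            continue
        total = sum(len(run) for run in QUOTES.findall(line))
        if total % 2 == 1:
            findings.append((number, "quotes", line))
    return findings


sample = "\n".join([
    "== Good ==",
    "=== Bad ==",
    "'''bold'' text",
    "''fine'' and '''fine'''",
])
for finding in find_suspect_lines(sample):
    print(finding)
# (2, 'heading', '=== Bad ==')
# (3, 'quotes', "'''bold'' text")
```

This only catches the two patterns mentioned above, and only when the
markup opens and closes on the same line. The point is that each
pattern is a line or two of regexp, with no grammar involved.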
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se