There's a discussion on wikipedia-l about the exact syntax of a URL, and in particular whether a punctuation mark at the end of a URL should be considered part of it or not.
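To make the question concrete, here is a minimal sketch of one possible rule. It is an assumption for illustration, not the rule wikipedia-l settled on. Trailing punctuation is dropped from a matched URL, except for a closing parenthesis that balances an opening one inside the URL. The regex, the `TRAILING` set, and the name `extract_urls` are all hypothetical.

```python
import re

# Hypothetical rule: match http(s) URLs up to whitespace or bracket/angle characters,
# then strip trailing punctuation that is more likely sentence punctuation.
URL_RE = re.compile(r"https?://[^\s<>\[\]]+")
TRAILING = ".,;:!?'\")"


def extract_urls(text):
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        # Strip trailing punctuation one character at a time.
        while url and url[-1] in TRAILING:
            # Keep a ")" if the URL contains enough "(" to balance it,
            # e.g. http://en.wikipedia.org/wiki/Foo_(bar)
            if url[-1] == ")" and url.count("(") >= url.count(")"):
                break
            url = url[:-1]
        urls.append(url)
    return urls
```

With this rule, "See http://example.com/page." yields `http://example.com/page`, while a parenthesised article title keeps its closing parenthesis. A rule like this is easy to state in prose but awkward to express in a context-free grammar, which is part of what the question below is about.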
This made me think: would it make sense to write a formal BNF grammar for the Wikipedia text format, so that an LALR(1) parser could be generated from it? Would that be practical in PHP at all, or would it be too hard to implement and too inflexible?
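As a rough illustration of what a formal grammar could look like, here is a hypothetical BNF fragment covering only internal links (`[[Target]]` and `[[Target|label]]`), with a hand-written parser for it in Python. Note that this is recursive descent, an LL technique, not LALR(1); a generator such as YACC or Bison would instead build an LALR(1) table from an equivalent grammar. The grammar, the node tuples, and the names `Parser` and `try_link` are illustrative assumptions, not actual MediaWiki rules.

```python
# Hypothetical grammar for a tiny subset of wiki markup (internal links only):
#   text   ::= { link | char }
#   link   ::= "[[" target [ "|" label ] "]]"
#   target ::= 1*( char except "|" and "]" )
#   label  ::= 1*( char except "]" )


class Parser:
    def __init__(self, src):
        self.src = src
        self.pos = 0

    def parse_text(self):
        """text ::= { link | char }; returns a list of ("text", s) and ("link", target, label) nodes."""
        nodes, buf = [], []
        while self.pos < len(self.src):
            link = self.try_link()
            if link is not None:
                if buf:
                    nodes.append(("text", "".join(buf)))
                    buf = []
                nodes.append(link)
            else:
                buf.append(self.src[self.pos])
                self.pos += 1
        if buf:
            nodes.append(("text", "".join(buf)))
        return nodes

    def try_link(self):
        """link ::= "[[" target [ "|" label ] "]]"; backtracks and returns None on failure."""
        start = self.pos
        if not self.src.startswith("[[", self.pos):
            return None
        self.pos += 2
        target = self.take_while(lambda c: c not in "|]")
        label = None
        if target and self.src.startswith("|", self.pos):
            self.pos += 1
            label = self.take_while(lambda c: c != "]")
        # Both target and (if present) label must be non-empty, followed by "]]".
        if target and (label is None or label) and self.src.startswith("]]", self.pos):
            self.pos += 2
            return ("link", target, label or target)
        # Not a well-formed link: rewind so "[[" is treated as plain text.
        self.pos = start
        return None

    def take_while(self, pred):
        start = self.pos
        while self.pos < len(self.src) and pred(self.src[self.pos]):
            self.pos += 1
        return self.src[start:self.pos]
```

For example, `Parser("See [[Main Page|home]] now").parse_text()` returns a text node, a link node, and another text node, while an unterminated `[[b` falls back to plain text. That fallback is the kind of error recovery wiki markup needs and a strict LALR(1) grammar does not provide for free.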
Only ten years ago, people would have used C and YACC to solve problems like this, and regexp-based parsing was considered too inefficient.
wikitech-l@lists.wikimedia.org