On Thu, Jul 25, 2002 at 12:17:26PM +0200, Lars Aronsson wrote:
On Thu, 25 Jul 2002 lcrocker@nupedia.com wrote:
This made me think: Would it make sense to make a formal BNF grammar for the Wikipedia text format, so a LALR(1) parser could be made for it? Would that make any sense at all with PHP, or just be too hard to code and inflexible?
I'd love to have a formal grammar of some kind (I think regexps would be fine), and I agree with Jan that a totally wiki-specific syntax would be far better than our current mish-mash of HTML and wiki markup. But I'm not sure whether it's already too late to revisit those decisions.
But if it isn't, I'll be happy to discuss what a syntax might look like.
Wiki is still a new concept. Think how HTML was based on SGML, then evolved into HTML 2, 3, 4, 5, and then XML came along, because people understood from the HTML experience that SGML was overly complex.
There is a big world of PhpWikis out there with [single bracket] link syntax. There are other wiki implementations with different ideas about syntax. But no wiki is as big as Wikipedia, so this is the most concentrated amount of experience. This is where a format standard should or at least could start to form.
I tried to make a formal grammar of Wikipedia markup (LALR, regexps, or whatever), and I can tell you that it's next to impossible if almost arbitrary HTML markup is allowed.
HTML table syntax is especially difficult to parse, so maybe we should make our own?
Without HTML tables I think that we could limit what kind of HTML is allowed and make some sane formal syntax.
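To make "limit what kind of HTML is allowed" a bit more concrete, here is a minimal regexp-based tokenizer sketch in Python. The tag whitelist, the token names and the `tokenize` function are all assumptions made up for illustration, not anything Wikipedia actually runs; the only wiki constructs it knows are [[links]], '''bold''' and ''italic''.

```python
import re

# Hypothetical whitelist: only these HTML tags are treated as markup.
ALLOWED_TAGS = {"b", "i", "u", "br", "sup", "sub"}

TOKEN = re.compile(r"""
    (?P<link>\[\[(?P<target>[^\[\]|]+)(?:\|(?P<label>[^\[\]]+))?\]\])
  | (?P<bold>''')
  | (?P<italic>'')
  | (?P<tag></?(?P<name>[A-Za-z][A-Za-z0-9]*)[^<>]*>)
  | (?P<text>[^\['<]+|.)
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Split wiki text into tuples; non-whitelisted tags become plain text."""
    tokens = []
    for m in TOKEN.finditer(source):
        if m.group("link"):
            target = m.group("target")
            tokens.append(("link", target, m.group("label") or target))
        elif m.group("bold"):
            tokens.append(("bold",))
        elif m.group("italic"):
            tokens.append(("italic",))
        elif m.group("tag"):
            if m.group("name").lower() in ALLOWED_TAGS:
                tokens.append(("tag", m.group("tag")))
            else:
                # Anything outside the whitelist is kept as literal text.
                tokens.append(("text", m.group("tag")))
        else:
            tokens.append(("text", m.group("text")))
    return tokens


if __name__ == "__main__":
    for token in tokenize("''a'' [[X]] <b>y</b><script>"):
        print(token)
```

With a fixed whitelist like this, every piece of input falls into one of a handful of token types, which is roughly what a formal grammar would need as its terminals. Tables are deliberately left out.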
It's not easy to design simple table markup that:
* allows multicolumn and multirow cells
* allows cell attributes
* can nest tables
* allows all constructs that HTML allows inside cells, i.e. multiple paragraphs, lists etc.
* is readable
* is easy to write
So I suggest that you check out freetable (http://sf.net/projects/freetable). I made it a while ago to allow simpler HTML tables. It seems to be working and is used by WebMake and WebsiteMetaLanguage.
Syntax looks something like this:
<wwwtable border=1>
(1,1)
column 1, row 1
(+,)
the same column, next row
(*,2)
column 2 in any row
(*,3) align=center
columns 3 should be centered
(1,3)
Some centered text
(3,3)
Other centered text
</wwwtable>
This is converted to:

<table border=1>
<tr>
<td>column 1, row 1</td>
<td>column 2 in any row</td>
<td align=center>columns 3 should be centered Some centered text</td>
</tr>
<tr>
<td>the same column, next row</td>
<td>column 2 in any row</td>
<td align=center>columns 3 should be centered</td>
</tr>
<tr>
<td> </td>
<td>column 2 in any row</td>
<td align=center>columns 3 should be centered Other centered text</td>
</tr>
</table>
I'd say it's much better than what Wikipedia currently uses.
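For anyone who wants to play with the idea, the conversion in the example above can be sketched in a few lines of Python. This is not the freetable code itself, just an illustration under assumptions read off the example: coordinates are (row,column), `*` matches any row or column, `+` means "previous value plus one", an empty coordinate means "same as before", and wildcard text and attributes are merged into each matching cell in source order. The names `parse_cells` and `render` are made up, and the sketch takes only the lines between the <wwwtable> tags, with no escaping, nesting or multirow support.

```python
import re

# A cell spec line: "(row,col) optional attributes".
SPEC = re.compile(r"^\((\d+|\+|\*|),(\d+|\+|\*|)\)\s*(.*)$")


def resolve(token, previous):
    """Turn one coordinate token into '*' or a concrete 1-based number."""
    if token == "*":
        return "*"
    if token == "":
        return previous          # empty: same as the previous spec
    if token == "+":
        return previous + 1      # plus: one past the previous spec
    return int(token)


def parse_cells(source):
    """Return a list of [row, col, attrs, content_lines] entries."""
    cells = []
    row = col = 1
    for line in source.strip().splitlines():
        line = line.strip()
        m = SPEC.match(line)
        if m:
            r = resolve(m.group(1), row)
            c = resolve(m.group(2), col)
            if r != "*":
                row = r
            if c != "*":
                col = c
            cells.append([r, c, m.group(3), []])
        elif cells and line:
            cells[-1][3].append(line)
    return cells


def render(source, table_attrs=""):
    """Render the cell specs as an HTML table, one tag per output line."""
    cells = parse_cells(source)
    n_rows = max([r for r, _, _, _ in cells if r != "*"], default=0)
    n_cols = max([c for _, c, _, _ in cells if c != "*"], default=0)
    out = ["<table %s>" % table_attrs if table_attrs else "<table>"]
    for r in range(1, n_rows + 1):
        out.append("<tr>")
        for c in range(1, n_cols + 1):
            attrs, text = [], []
            for cr, cc, a, lines in cells:
                if cr in ("*", r) and cc in ("*", c):
                    if a:
                        attrs.append(a)
                    text.extend(lines)
            attr_str = " " + " ".join(attrs) if attrs else ""
            out.append("<td%s>%s</td>" % (attr_str, " ".join(text) or " "))
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)


EXAMPLE = """
(1,1)
column 1, row 1
(+,)
the same column, next row
(*,2)
column 2 in any row
(*,3) align=center
columns 3 should be centered
(1,3)
Some centered text
(3,3)
Other centered text
"""

if __name__ == "__main__":
    print(render(EXAMPLE, "border=1"))
```

The point of the sketch is that the input is line-oriented: a line is either a cell spec or cell content, which is much easier to describe formally than nested <tr>/<td> markup.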