Tables, again ... - Wikitext-l

30 Jun 2011


Now that there are more people on this list, I thought I might bring up
tables for discussion again.  There are two things that I would like to
have specified: the treatment of "table garbage", and the mixing of table
flavours.
There are two flavours of tables: html tables and wikitext tables.  A
wikitext table has the structure (^ marks a token that must appear at the
start of a line):
^'{|'
table garbage
^'|' block element contents
^'|-'
table garbage
^'|}'
An html table has the structure:
'<table>'
table garbage
'<tr>'
table garbage
'<td>' block element contents '</td>'
table garbage
'</tr>'
table garbage
'</table>'
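To make the two classes of table tokens concrete, here is a minimal Python sketch of a tokenizer for both flavours.  It is a simplification under stated assumptions: on the wikitext side only '{|', '|-', '|}' and '|' are handled (no '||', '!' or '|+'), on the html side only table, tr and td tags, and the names tokenize, WIKI_TOKENS and HTML_TAG are hypothetical, not taken from MediaWiki.  What it shows is the asymmetry between the flavours: wikitext tokens only count at the start of a line, while html tags are recognized anywhere.

```python
import re
from typing import List, Tuple

# Hypothetical token table. Wikitext table tokens are only recognized at the
# start of a line (the ^ above); multi-character tokens are tried before '|'.
WIKI_TOKENS = [("{|", "table_open"), ("|-", "row"), ("|}", "table_close"), ("|", "cell")]

# html table tags may appear anywhere in a line and may carry attributes.
HTML_TAG = re.compile(r"<(/?)(table|tr|td)\b[^>]*>", re.IGNORECASE)


def tokenize(text: str) -> List[Tuple[str, str, str]]:
    """Split text into (flavour, kind, lexeme) triples.

    flavour is "wiki", "html" or "text"; anything that is not a table token
    (cell contents or table garbage) is returned as a "text" triple.
    """
    tokens: List[Tuple[str, str, str]] = []
    for line in text.splitlines():
        rest = line
        # At most one wikitext token, and only at the very start of the line.
        for lexeme, kind in WIKI_TOKENS:
            if line.startswith(lexeme):
                tokens.append(("wiki", kind, lexeme))
                rest = line[len(lexeme):]
                break
        # html tags anywhere in the remainder of the line.
        pos = 0
        for m in HTML_TAG.finditer(rest):
            if m.start() > pos:
                tokens.append(("text", "text", rest[pos:m.start()]))
            kind = m.group(1) + m.group(2).lower()  # e.g. "td" or "/td"
            tokens.append(("html", kind, m.group(0)))
            pos = m.end()
        if pos < len(rest):
            tokens.append(("text", "text", rest[pos:]))
    return tokens
```

For instance, the line "| cell <td> cell" yields a wikitext cell token, an html td token and two text triples, while "x | y" is plain text because the '|' is not at the start of the line.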
MediaWiki processes tables by extracting every recognizable part of the
table from the text and writing the rendered html at a position
immediately _after_ the position where the table appears.  What I call
"table garbage" is left in place and will thus, surprisingly, appear
before the table in the rendered output.  (Table garbage is parsed the
same way as block element contents.)
1. How should the treatment of table garbage be specified?  My
   recommendation is to change the semantics compared to MediaWiki's
   current behavior and simply specify that table garbage is ignored.
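A toy model of the two semantics, assuming a single, non-nested wikitext table given as a list of lines.  With keep_garbage=True the garbage lines are emitted before the rendered table, as MediaWiki does according to the description above; with keep_garbage=False they are dropped, as proposed.  The function name and the flag are hypothetical, and cell contents are copied verbatim rather than parsed as block elements.

```python
from typing import List


def render_wikitext_table(lines: List[str], keep_garbage: bool = True) -> str:
    """Toy renderer for one non-nested wikitext table (hypothetical helper).

    Lines between '{|' (or '|-') and the next cell are table garbage.
    keep_garbage=True: garbage stays where it was, so it ends up *before*
    the rendered table. keep_garbage=False: garbage is ignored.
    """
    garbage: List[str] = []
    rows: List[List[str]] = [[]]
    in_cell = False
    for line in lines:
        if line.startswith("{|"):
            in_cell = False
        elif line.startswith("|}"):
            break
        elif line.startswith("|-"):
            rows.append([])          # start a new row
            in_cell = False
        elif line.startswith("|"):
            rows[-1].append(line[1:].strip())  # start a new cell
            in_cell = True
        elif in_cell:
            rows[-1][-1] += "\n" + line        # continuation of the cell
        else:
            garbage.append(line)               # not inside any cell
    table = "<table>" + "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows if row
    ) + "</table>"
    prefix = "".join(g + "\n" for g in garbage) if keep_garbage else ""
    return prefix + table
```

With the lines "{|", "junk", "| a", "|-", "more junk", "| b", "|}", the first mode produces "junk" and "more junk" followed by the table, while the second produces only the table.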
In MediaWiki, the internal table tokens ('<td>', '<tr>', etc. for html
tables and ^'|', ^'|-', etc. for wikitext tables) are activated when a
table of the corresponding type is opened.  But when tables of different
types are nested, the internal table tokens can be used more or less
interchangeably.  For example, the wikitext
<table>
<td>
{|
| cell <td> cell <tr><td> cell
|-
| cell
|}
</table>
renders as this html:
<table>
<td>
<table>
<tr>
<td> cell </td><td> cell <tr></td><td> cell
</td></tr>
<tr>
<td> cell
</td></tr></table>
</table>
I have previously suggested specifying that only the internal table
tokens of the innermost table's type can be used.  Thus, opening a
wikitext table inside an html table would activate parsing of the
wikitext table tokens and deactivate parsing of the html table tokens.
This is a behavior that I find appealing.  But since PEGs (parsing
expression grammars) are currently in fashion, this behavior might be
problematic to implement.  So there is also a third alternative:
implicitly terminate the inner table when encountering table tokens
from the outer table, which should be straightforward to implement
with a PEG.
So, to summarize the alternatives:
1. Once both types of tables have been opened, use the internal tokens
   interchangeably.
2. Let inner tables take precedence and disable tokens of the outer
   table's type.
3. Let outer tables take precedence and implicitly terminate the inner
   table if table tokens of the outer table's type are encountered.
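A minimal sketch of how the three alternatives differ, assuming the parser keeps a stack of the flavours of the currently open tables.  The function resolve_internal_token and the Policy names are hypothetical, and alternative 1 is modelled in an idealized form; it does not reproduce every detail of the MediaWiki output shown earlier.

```python
from enum import Enum
from typing import List, Tuple


class Policy(Enum):
    INTERCHANGEABLE = 1   # alternative 1
    INNER_PRECEDENCE = 2  # alternative 2
    OUTER_PRECEDENCE = 3  # alternative 3


def resolve_internal_token(stack: List[str], flavour: str,
                           policy: Policy) -> Tuple[bool, int]:
    """Decide how an internal table token (a row or cell marker) is handled.

    stack holds the flavours ("wiki" or "html") of the currently open
    tables, outermost first. Returns (recognized, implicitly_closed):
    whether the token acts as a table token at all, and how many inner
    tables are implicitly terminated before it is applied. An unrecognized
    token is treated as plain text.
    """
    if flavour not in stack:
        # No table of this flavour is open, so its tokens are inactive.
        return (False, 0)
    if policy is Policy.INTERCHANGEABLE:
        # Any active token applies to the innermost table, whatever its type.
        return (True, 0)
    if policy is Policy.INNER_PRECEDENCE:
        # Only tokens matching the innermost table's type are recognized.
        return (stack[-1] == flavour, 0)
    # OUTER_PRECEDENCE: terminate inner tables until the nearest table of
    # this flavour is innermost; the number to close is its distance from
    # the top of the stack.
    return (True, stack[::-1].index(flavour))
```

With the stack ["html", "wiki"] from the example above, an html cell token is a cell of the inner wikitext table under alternative 1, plain text under alternative 2, and under alternative 3 it first terminates the wikitext table.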
Which should be specified?  I recommend 2 or 3.
Best regards,
Andreas Jonsson