On HTML elements in wikitext - Wikitext-l

17 Aug 2010

Mixing HTML elements with wikitext is a grey area.  How HTML tags
in the wikitext interact with wikitext elements does not seem to be
well defined.  Therefore, I will make up some rules that try to
preserve every legitimate use of html elements, with a few
restrictions to avoid some problems:

1. Do not allow html block elements inside wikitext lists.  For example,
    this is no longer allowed:

    * item1 <li> item2

    The problem is that it becomes very hard to determine where the
    wikitext list element ends.  Should it run until the inner block
    elements are properly closed?  Should the inner block elements be
    implicitly closed at the end of the list element?  What if the
    inner html is malformed?  What if a new wikitext list element is
    opened before the inner block elements are closed?  The current
    behavior does not seem very well defined; it seems to generate
    garbage html in most cases.  Wikitext lists inside html list elements
    should be ok though:

    <ul><li>item1</li>
    <li>
    * item 2.1 ( </li> tag will not be active here )
    </li>
    </ul>
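
Rule 1 could be sketched per line roughly as follows.  This is only an
illustration in Python, not the current parser: the function name
`split_line`, the set of block tags in `BLOCK_TAG`, and the assumption
that a list marker is recognised only in the first column are all
hypothetical choices made for the example.

```python
import re

# Hypothetical subset of html block tags; the real set would be larger.
BLOCK_TAG = re.compile(r"</?(?:ul|ol|li|dl|dt|dd|div|table|p)\b[^>]*>", re.I)

# Assumption: a wikitext list item starts with one of these in column 0.
LIST_MARKERS = ("*", "#", ":", ";")


def split_line(line):
    """Split one line into (kind, text) pieces.

    Inside a wikitext list item the html block tags are inert, so the
    whole line is returned as a single 'list_item' piece.  Elsewhere the
    block tags are recognised as separate 'block_tag' pieces.
    """
    if line.startswith(LIST_MARKERS):
        return [("list_item", line)]

    pieces = []
    pos = 0
    for match in BLOCK_TAG.finditer(line):
        if match.start() > pos:
            pieces.append(("text", line[pos:match.start()]))
        pieces.append(("block_tag", match.group()))
        pos = match.end()
    if pos < len(line):
        pieces.append(("text", line[pos:]))
    return pieces
```

With this, the list item always ends at the end of its line, which
sidesteps the question of where it would end if the inner <li> were
honoured.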

2. Do not allow table html tags inside wikitext tables, unless opened
    up by a nested html table, which disables wikitext table tokens
    until the html table is properly closed:

    {|
     | col1 <td> tag disabled, so still col1
     | <table><td> implicitly open up <tbody> and <tr>
    |} wikitext table tokens disabled, thus still in html table.
    {|
     | However allow wikitext tables to nest inside html tables.  Here
       the html tokens, <td>, <tr> etc., are once again disabled.
     |}
    </table> (inner close tags implied).
     | col2
    |}

    So, we'll get two different kinds of table contexts, which may be
    arbitrarily nested, but not mixed.
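
The two table contexts can be modelled as a stack where only the
innermost entry decides which tokens are live.  The following Python
sketch assumes the input is already split into tokens; the token sets
and the name `classify` are made up for the illustration, and only the
two opening tokens ({| and <table>) are treated as always active.

```python
# Hypothetical token sets; a real tokenizer would also handle attributes.
WIKI_TOKENS = {"|", "|-", "!", "|}"}
HTML_TOKENS = {"<tr>", "<td>", "<th>", "</tr>", "</td>", "</th>", "</table>"}


def classify(tokens):
    """Label each table-related token as 'markup' or 'text'.

    The stack holds 'wiki' or 'html' for each open table.  A token is
    markup only if it opens a table or belongs to the innermost context.
    """
    stack = []
    result = []
    for token in tokens:
        top = stack[-1] if stack else None
        if token == "{|":
            stack.append("wiki")
            kind = "markup"
        elif token == "<table>":
            stack.append("html")
            kind = "markup"
        elif token in WIKI_TOKENS and top == "wiki":
            if token == "|}":
                stack.pop()  # closes the innermost wikitext table
            kind = "markup"
        elif token in HTML_TOKENS and top == "html":
            if token == "</table>":
                stack.pop()  # closes the innermost html table
            kind = "markup"
        else:
            kind = "text"  # token of the other family, or no table open
        result.append((token, kind))
    return result
```

Fed the tokens of the example above, in order, this labels the first
<td> and the first |} as text, and everything else as markup except the
<td> inside the innermost wikitext table.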

So, the question is: would these restrictions allow sufficient
backward compatibility with the current parser?

Some other thoughts on parsing of HTML-like tags:

* <br [attributes]> Also allowed in the form <br [attributes]/>.  No
   </br> tag exists.

* <hr [attributes]> Same as <br> except that it also terminates inline
   text.

* <img [attributes]> Same as <br> except that it is enabled/disabled
   via a configuration option.

* <p [attributes]> Opening tag enables closing tag </p> and disables
   itself until the end of the current inlined text. <p> opens up a
   new paragraph, </p> closes the current inlined text.

* Inlined html elements.  These can be used for long term formatting.
   The context will make sure they are correctly nested, closed on end
   of inlined text and reopened at beginning of inlined text.  They are
   permanently closed at the corresponding end tag, or at end of
   article.  Variants:

   * non-nesting (that disables the start-tag when entered)/
     nesting (that adds to a nesting level when entered).

   * may be empty/may not be empty (empty instances will be ignored)

* Block html elements.  Start and end tags terminate inline text.
   (They may _not_ be nested inside paragraphs.)  Inline text inside
   <ol> and <ul> implies <li>, inline text inside <dl> implies <dd>,
   inline text inside <div> implies <p>, and inline text inside <table>
   implies <tbody><tr><td>.  <h1>-<h6> disable wikitext block element
   tokens, in addition to all html block element tokens except the
   corresponding closing </hX> token.

* <pre> disables all html elements and all block elements (both wikitext
    and html block elements).

* <ins> and <del> will be inline if occurring inside inlined text.
   Otherwise block.

* <a> disables wikitext link tokens.

* Tag extensions are treated like <nowiki>; the contents are passed
   verbatim to the corresponding callback function.  The parser may be
   called recursively if the extension needs to parse wikitext.
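
The handling of inlined html elements described above (closed at the
end of inlined text, reopened at the beginning of the next, correctly
nested) could look roughly like the Python sketch below.  The event
format, the name `render`, and the use of <p> to wrap each run of
inlined text are assumptions for the example; attributes, the
non-nesting variant and the removal of empty instances are left out.

```python
def render(events):
    """Render (kind, value) events to html.

    kind is 'open' or 'close' (value is a tag name), 'text' (value is
    the text), or 'break' (end of the current inlined text).
    """
    active = []      # open inline tags, outermost first
    out = []
    in_text = False  # True while a run of inlined text is open

    def begin():
        nonlocal in_text
        if not in_text:
            out.append("<p>")
            out.extend(f"<{tag}>" for tag in active)  # reopen formatting
            in_text = True

    def end():
        nonlocal in_text
        if in_text:
            out.extend(f"</{tag}>" for tag in reversed(active))
            out.append("</p>")
            in_text = False

    for kind, value in events:
        if kind == "text":
            begin()
            out.append(value)
        elif kind == "open":
            active.append(value)
            if in_text:
                out.append(f"<{value}>")
        elif kind == "close" and value in active:
            # Find the innermost open instance of this tag.
            idx = len(active) - 1 - active[::-1].index(value)
            inner = active[idx + 1:]
            if in_text:
                # Close everything nested inside it, then the tag itself.
                out.extend(f"</{tag}>" for tag in reversed(active[idx:]))
            del active[idx]
            if in_text:
                # Reopen the tags that were nested inside it.
                out.extend(f"<{tag}>" for tag in inner)
        elif kind == "break":
            end()
        # A close tag with no matching open tag is ignored.
    end()  # end of article closes everything permanently
    return "".join(out)
```

For example, a <b> opened in one paragraph and closed in the next is
emitted as two properly closed <b> elements, one per paragraph.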

Best Regards,

Andreas Jonsson