Re: [Wikitext-l] On HTML elements in wikitext

18 Aug 2010

      Hi Daniel,
2010-08-18 09:24, Daniel Kinzler wrote:
...
Andreas Jonsson wrote:
...
Mixing HTML elements with wikitext is a grey area.  How the HTML tags
in the wikitext interact with the wikitext elements does not seem very
well defined.  Therefore, I will make up some rules
where I try to preserve any legitimate use of html elements, but with
some restrictions to avoid some problems:

Do not allow html block elements inside wikitext lists.  For example
  this is no longer allowed:

item1<li>  item2

What does "not allowed" mean, exactly? What happens if the user enters this? As
per the old mantra, any text is valid wikitext.
I mean that the character sequence "<li>" will not be a token in this 
context.  (It will become three tokens: SPECIAL[<], WORD[li] and 
SPECIAL[>], which should eventually be rendered as &lt;li&gt; by a html 
rendering client.)  Of course, the lexical scanner will accept any 
sequence of characters.
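
A minimal sketch of this rule, assuming a regex-based scanner. Only the SPECIAL and WORD token names come from the text above; BEGIN_TAG, END_TAG, SPACE, the BLOCK_TAGS set and the in_wikitext_list flag are hypothetical names used for illustration, not the actual lexer:

```python
import re

# Hypothetical set of html block element names (an assumption, not from the thread).
BLOCK_TAGS = {"li", "ul", "ol", "dl", "dd", "dt", "div", "table"}

# Alternatives, in order: a simple tag, a word, whitespace, any other single character.
TOKEN_RE = re.compile(
    r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>|([A-Za-z0-9]+)|(\s+)|(.)", re.S
)


def tokenize(text, in_wikitext_list):
    """Scan text into (kind, value) tokens; every input is accepted."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m.group(2) is not None:
            name = m.group(2).lower()
            if in_wikitext_list and name in BLOCK_TAGS:
                # The tag token is disabled in this context: emit only "<"
                # and let the rest be scanned as ordinary WORD/SPECIAL tokens.
                tokens.append(("SPECIAL", "<"))
                pos += 1
                continue
            tokens.append(("END_TAG" if m.group(1) else "BEGIN_TAG", name))
        elif m.group(3) is not None:
            tokens.append(("WORD", m.group(3)))
        elif m.group(4) is not None:
            tokens.append(("SPACE", m.group(4)))
        else:
            tokens.append(("SPECIAL", m.group(5)))
        pos = m.end()
    return tokens
```

With in_wikitext_list=True, "item1<li>  item2" yields SPECIAL[<], WORD[li] and SPECIAL[>] between the two words; with False, the same input yields a single BEGIN_TAG token for li.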
...
So, I think it would make more sense to say that html block elements *terminate*
wikitext lists.
That would be a reasonable alternative.  But I think that it is better 
to disable html block element tokens, because I don't think that it is a 
useful feature to make it possible to terminate a list item with 
anything but a newline or end of file.  I think that it would just be more
confusing for the users.
...
...

Do not allow table html tags inside wikitext tables, unless opened
  up by a nested html table, which disables wikitext table tokens
  until the html table is properly closed:

....
...
 So, we'll get two different kinds of table contexts, which may be
 arbitrarily nested, but not mixed.

As long as arbitrary nesting is supported, I'm all for it! Mixing html and wiki
syntax for table elements leads to a mess with the current parser anyway.
...
* <img [attributes]>  Same as <br> except that it is enabled/disabled
    via a configuration option.
Additional restrictions may be imposed on any attribute that contains a URL.
At the moment I'm working on the lexer, which will attach a list of 
whatever looks like attributes to the corresponding token.  Filtering
the attribute list will be performed at a higher level.  As I understand
it, the attribute list will never affect whether an opening tag should be
treated as a token or not.  This is still a br tag:
<br complete *)()()(UF*(*garbage/>
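
A rough sketch of that split, assuming a regex scanner: the tag is recognized from its name alone, and whatever looks like an attribute is collected without validation. TAG_RE, ATTR_RE and scan_tag are hypothetical names, and the attribute pattern is only one guess at what "looks like attributes" could mean:

```python
import re

# The tag is accepted based on its name; the attribute area may hold anything
# except "<" and ">".
TAG_RE = re.compile(r"<(br|img)\b([^<>]*?)(/?)>", re.I)

# Anything shaped like name or name=value counts as an attribute candidate.
ATTR_RE = re.compile(
    r"""([A-Za-z_][\w:.-]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def scan_tag(text):
    """Return (tag_name, attribute_candidates), or None if text is not a tag."""
    m = TAG_RE.match(text)
    if m is None:
        return None
    attrs = [
        (name, value.strip("\"'") if value else None)
        for name, value in ATTR_RE.findall(m.group(2))
    ]
    # Filtering of the candidates is left to a higher level.
    return m.group(1).lower(), attrs
```

Under these assumptions, scan_tag on the line above still returns a br tag, with complete, UF and garbage as value-less attribute candidates.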
...
...
* <p [attributes]>  Opening tag enables closing tag </p> and disables
    itself until the end of the current inlined text.  <p> opens up a
    new paragraph, </p> closes the current inlined text.
Not sure <p> should be disabled after <p>. Most browsers treat <p>...<p> as
<p>...</p><p>. That makes more sense, I think.
That's MediaWiki's current behaviour.
<p> foo <p> foo </p>
is rendered as
<p> foo &lt;p&gt; foo </p>
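
The enable/disable behaviour in this example can be sketched as below. This is an illustration only: it ignores attributes, and escaping a stray </p> that has no open <p> is my assumption, not something stated in the thread. render_paragraph_tags is a hypothetical name:

```python
import html
import re

# Split on bare <p> and </p>, keeping the tags in the result.
P_RE = re.compile(r"(</?p>)", re.I)


def render_paragraph_tags(text):
    """Render text where <p> disables itself until the matching </p>."""
    out = []
    p_open = False
    for part in P_RE.split(text):
        low = part.lower()
        if low == "<p>" and not p_open:
            p_open = True            # opening tag enables </p>, disables <p>
            out.append("<p>")
        elif low == "</p>" and p_open:
            p_open = False           # closing tag re-enables <p>
            out.append("</p>")
        else:
            # Plain text, or a tag that is disabled in the current state.
            out.append(html.escape(part))
    return "".join(out)
```

Called on "<p> foo <p> foo </p>", this returns "<p> foo &lt;p&gt; foo </p>", matching the rendering shown above.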
...
...

Inlined html elements.  These can be used for long term formatting.
  The context will make sure they are correctly nested, closed on end
  of inlined text and reopened at beginning of inlined text.  They are
  permanently closed at the corresponding end tag, or at end of
  article.  Variants:

Do we really want inline formatting to span across blocks? I find that very
quirky. I think the format should simply end at the end of the block, that's
it. Interleaved markup is evil.
That's MediaWiki's current behaviour.
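
The close-and-reopen rule quoted above can be sketched as follows. Assumptions on my part: each list item is one run of inlined text, only a few bare tag names are handled, an end tag also closes any tags opened after its start tag, and stray end tags are dropped. balance_inline is a hypothetical name:

```python
import re

# Hypothetical subset of inline tags, without attributes.
INLINE_TAG_RE = re.compile(r"<(/?)(b|i|u|span)>", re.I)


def balance_inline(blocks):
    """Close open inline tags at the end of each block, reopen them in the next."""
    open_tags = []
    result = []
    for block in blocks:
        # Reopen whatever was still open when the previous block ended.
        out = ["<%s>" % t for t in open_tags]
        pos = 0
        for m in INLINE_TAG_RE.finditer(block):
            out.append(block[pos:m.start()])
            pos = m.end()
            name = m.group(2).lower()
            if not m.group(1):
                open_tags.append(name)
                out.append("<%s>" % name)
            elif name in open_tags:
                # Permanently close down to the matching start tag.
                while open_tags:
                    top = open_tags.pop()
                    out.append("</%s>" % top)
                    if top == name:
                        break
            # A stray end tag is dropped (assumption).
        out.append(block[pos:])
        # Temporarily close everything at the end of the inlined text.
        out.extend("</%s>" % t for t in reversed(open_tags))
        result.append("".join(out))
    return result
```

For example, balance_inline(["a <b>bold", "still bold</b> plain"]) gives ["a <b>bold</b>", "<b>still bold</b> plain"].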
...
...

Block html elements.  Start and end tags terminate inline text.
  (They may _not_ be nested inside paragraphs.).

That is: they *terminate* paragraphs.
Inline text is not necessarily contained in a paragraph.
...
...
Inline text inside
<ol> and <ul> implies <li>, inlined text inside <dl> implies <dd>,
fine
...
inline text inside <div> implies <p>,

err. whot? no! <p> usually implies margins/padding. if i use <div>foo</div>, i
generally do not want any margins/padding!
Sorry, sometimes I confuse html with DocBook, where all inline text must 
stand inside <para> tags.
...
...
inline text inside <table>
    implies <tbody><tr><td>, <h1>-<h6> disables wikitext block element
    tokens, in addition to all html block element tokens except the
    corresponding closing </hX> token.
What exactly does "disable" mean here? Do they get stripped? Or displayed verbatim?
The corresponding tokens are disabled in the lexical scanner.
...
...
* <pre> disables all html elements and all block elements (both wikitext
     and html block elements).
<pre> should disable *all* markup except </pre>. It's actually a lot like <nowiki>.
Lines starting with blanks (please include tabs here!), in contrast, become
pre-formatted, but still allow inline formatting, auto-linking URLs, etc.
Thanks, I had missed that.  I just assumed that they were equivalent.  
It seems that block html elements in an indented line take precedence:
Preformatted text? <li> No!
Rendered as:
Preformatted text?
<li> No! </li>
That'll require an extra lookahead on all indented lines. *sigh*
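
A sketch of what that lookahead might look like, assuming a line-at-a-time check. The list of block tag names is my guess and not taken from MediaWiki, and is_preformatted_line is a hypothetical name:

```python
import re

# Hypothetical list of html block element names.
BLOCK_TAG_RE = re.compile(
    r"</?(?:li|ul|ol|dl|dd|dt|div|table|tr|td|th|p|h[1-6]|pre|blockquote)\b[^<>]*>",
    re.I,
)


def is_preformatted_line(line):
    """True if an indented line should open or continue preformatted text."""
    if not line.startswith((" ", "\t")):
        return False
    # Lookahead over the whole line: a block html element takes precedence
    # over the leading blank.
    return BLOCK_TAG_RE.search(line) is None
```

Here an indented "Preformatted text? <li> No!" is not treated as preformatted, while an indented line without block tags is.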
...
...
* <ins> and <del> will be inline if occurring inside inlined text.
    Otherwise block.
* <a> disables wikitext link tokens.
<a> is not allowed at the moment. I once tried to add support for it, but got
reverted for technical reasons. We might add it to support RDFa (semantic
relations).
You are right, it isn't.  Yippihe! :-)
...
...

Tag extensions are treated like <nowiki>; the contents are passed
  verbatim to the corresponding callback function.  The parser may be
  called recursively if the extension needs to parse wikitext.

Please note that the HTML returned from tag extensions is, at the moment, *not*
passed verbatim, though it very likely should. See bug 1319, compare bug 12974.
I haven't analysed the tag extensions completely yet.  But I assume that 
the content isn't touched by the parser, and that if the extension wants 
anything inserted into the output stream, it must call the parser 
recursively.
...
Thanks for your great work!
-- daniel

Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
