The way the <nowiki> tag is currently implemented, any text inside the tag is basically stripped out, held to one side, and reinserted at the last minute. So this:
1: [[pipe.jpg|thumb|A <nowiki>|</nowiki> character]]
works because that stage of the parser never even sees the | character, and it reappears magically after the text has been turned into <div...><img...></div> (I think).
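The strip-and-reinsert behaviour described above can be modelled in a few lines. This is a minimal Python sketch, not MediaWiki's code. The marker format and the names NOWIKI_RE, strip_nowiki, unstrip_nowiki and toy_image_stage are hypothetical, and the toy image stage only splits on pipes and takes the last option as the caption. It shows why example 1 works: the image stage never sees the nowiki'd pipe.

```python
import re
import uuid

# Hypothetical marker format: it only has to be text that no later
# parsing stage will treat as markup.
NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)

def strip_nowiki(text):
    """Swap each <nowiki> block for an opaque marker and remember its content."""
    stash = {}

    def hide(match):
        marker = f"\x7fNOWIKI-{uuid.uuid4().hex}\x7f"
        stash[marker] = match.group(1)
        return marker

    return NOWIKI_RE.sub(hide, text), stash

def unstrip_nowiki(html, stash):
    """Put the hidden content back after every other stage has run."""
    for marker, content in stash.items():
        html = html.replace(marker, content)
    return html

def toy_image_stage(text):
    """Toy stand-in for the image stage: split [[...]] on '|' and use the
    last option as the caption."""
    def render(match):
        target, *options = match.group(1).split("|")
        caption = options[-1] if options else ""
        return f'<div><img src="{target}"/>{caption}</div>'
    return re.sub(r"\[\[(.*?)\]\]", render, text)

stripped, stash = strip_nowiki("[[pipe.jpg|thumb|A <nowiki>|</nowiki> character]]")
result = unstrip_nowiki(toy_image_stage(stripped), stash)
print(result)  # <div><img src="pipe.jpg"/>A | character</div>
```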
However, what it does in a given context is hard to pin down. This doesn't work, for example:
2: [[Image:<nowiki>foo</nowiki>.jpg]] - the whole thing is rendered literally.
I was thinking perhaps it could be redefined thus:
"Text surrounded by a <nowiki> block is treated as a literal sequence of characters with no special meaning ascribed to any character other than its literal representation. A nowiki block is a token separator, not whitespace."
That would mean example 2 above would render as if the nowiki tags weren't there.
This would also work:
3: [[Image:foo.jpg|thumb|<nowiki>left</nowiki>]] (caption: "left")
This would be a tricky case:
4: <nowiki> <script badeviljavacode here> </nowiki>
This would render literally, because of the "token separator" aspect:
5: [<nowiki></nowiki>[not a link]].
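The "literal characters plus token separator" reading can be sketched as a regex tokenizer. This is a hypothetical Python illustration: the token names LINK_OPEN, LINK_CLOSE, PIPE, TEXT and LITERAL are invented here. Nowiki content always becomes a LITERAL token, and because the regex cannot match "[[" across a nowiki boundary, example 5 yields no LINK_OPEN, while example 6 yields a link whose target contains a literal pipe.

```python
import re

# LITERAL marks text that came from inside <nowiki>. The nowiki boundary
# itself stops neighbouring characters from forming tokens such as "[[".
TOKEN_RE = re.compile(
    r"<nowiki>(.*?)</nowiki>"   # group 1: nowiki content
    r"|(\[\[|\]\]|\|)"          # group 2: link markup
    r"|([^\[\]|<]+|.)",         # group 3: ordinary text, or one stray char
    re.DOTALL,
)
MARKUP_KINDS = {"[[": "LINK_OPEN", "]]": "LINK_CLOSE", "|": "PIPE"}

def tokenize(text):
    """Return a list of (kind, text) tokens for a wikitext fragment."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group(1) is not None:
            tokens.append(("LITERAL", m.group(1)))
        elif m.group(2) is not None:
            tokens.append((MARKUP_KINDS[m.group(2)], m.group(2)))
        else:
            tokens.append(("TEXT", m.group(3)))
    return tokens

print(tokenize("[<nowiki></nowiki>[not a link]]"))
print(tokenize("[[E<nowiki>|</nowiki>eet]]"))
```

A later link stage would treat LITERAL tokens as plain characters, so example 6 links to "E|eet" and example 3 gets the caption "left" rather than left alignment.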
It would be technically possible to link to pages with bad characters in their names:
6: [[E<nowiki>|</nowiki>eet]]
Would any existing wikitext be broken by this redefinition? I'm not really trying to change the meaning of nowiki, I'm trying to set it down in words, given that the existing definition ("stuff gets stripped out, then replaced at various times") is not really implementable.
Steve
On 11/22/07, Thomas Dalton thomas.dalton@gmail.com wrote:
All we need to do to fix that is to escape the contents of nowiki tags. I can't see that breaking anything.
Heh. Define "escape". It's not that simple.
In a C program, there is exactly one place where you can "escape" anything: in a string literal. This doesn't work: void m\ain (void)...
In wikitext, it could be *anywhere*.
Steve
----- Original Message -----
From: "Steve Bennett" stevagewp@gmail.com
To: "Wikitext-l" wikitext-l@lists.wikimedia.org
Sent: 22 November 2007 01:44
Subject: Re: [Wikitext-l] How to define <nowiki>?
On 11/22/07, Thomas Dalton thomas.dalton@gmail.com wrote:
All we need to do to fix that is to escape the contents of nowiki tags. I can't see that breaking anything.
Heh. Define "escape". It's not that simple.
In a C program, there is exactly one place where you can "escape" anything: in a string literal. This doesn't work: void m\ain (void)...
In wikitext, it could be *anywhere*.
I think he means pass the contents of <nowiki> through htmlspecialchars() before outputting.
PS - Can we get this list onto Gmane like the others?
- Mark Clements (HappyDog)
On 11/22/07, Mark Clements gmane@kennel17.co.uk wrote:
I think he means pass the contents of <nowiki> through htmlspecialchars() before outputting.
Yes, but all that is assuming the <nowiki> is not embedded inside anything else. How should all the following behave:
[[<nowiki>image</nowiki>:foo.jpg]]
[[image:<nowiki>foo</nowiki>.jpg]]
[[image:foo.jpg|<nowiki>right</nowiki>|thumb]]
[http://foo.com<nowiki></nowiki>bloot] (this one actually works currently, so my "token separator, not whitespace" comment may be wrong)
[<nowiki>http</nowiki>://foo.com]
<nowiki><b></nowiki>bold</b>
<b <nowiki>style="bloot;"</nowiki>
etc etc.
Steve
On 22/11/2007, Steve Bennett stevagewp@gmail.com wrote:
On 11/22/07, Mark Clements gmane@kennel17.co.uk wrote:
I think he means pass the contents of <nowiki> through htmlspecialchars() before outputting.
Yes, but all that is assuming the <nowiki> is not embedded inside anything else.
No, it doesn't. You just replace "<" and ">" with "&lt;" and "&gt;". You don't need to escape the code inside; just escape the <script> tags and you're sorted.
On 11/22/07, Thomas Dalton thomas.dalton@gmail.com wrote:
No, it doesn't. You just replace "<" and ">" with "&lt;" and "&gt;". You don't need to escape the code inside; just escape the <script> tags and you're sorted.
Nope.
[[Image:foo.jpg|my caption has a <nowiki>|</nowiki> pipe inside it.]]
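The counter-example above can be checked directly: escaping the nowiki contents does nothing to a pipe, so an option splitter still cuts the caption in two unless it knows which pipes came from inside <nowiki>. A minimal Python sketch follows. The function names are hypothetical, and Python's html.escape with quote=False stands in for PHP's htmlspecialchars() (it escapes only &, < and >).

```python
import html
import re

NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)
# group 1: nowiki content, group 2: a bare pipe, group 3: other text
PIECE_RE = re.compile(r"<nowiki>(.*?)</nowiki>|(\|)|([^|<]+|<)", re.DOTALL)

def escape_then_split(inner):
    """Escape nowiki contents, drop the tags, then split on '|'."""
    escaped = NOWIKI_RE.sub(lambda m: html.escape(m.group(1), quote=False), inner)
    return escaped.split("|")

def split_nowiki_aware(inner):
    """Split on '|' but keep pipes that came from inside <nowiki>."""
    parts, current = [], ""
    for m in PIECE_RE.finditer(inner):
        if m.group(2):                  # a bare pipe ends the current option
            parts.append(current)
            current = ""
        elif m.group(1) is not None:    # nowiki text is kept verbatim
            current += m.group(1)       # (it still needs escaping on output)
        else:
            current += m.group(3)
    parts.append(current)
    return parts

inner = "Image:foo.jpg|my caption has a <nowiki>|</nowiki> pipe inside it."
print(escape_then_split(inner))   # three pieces: the caption is cut in two
print(split_nowiki_aware(inner))  # two pieces: target and intact caption
```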
It's ok, I think I have a basic feel for what it has to do. It'll be a while before the edge cases start to matter.
I've just about implemented ====headings==== . They were a lot harder than I expected. But some of the code is so pretty...
header6: '=' '=' '=' '=' '=' '=' '='* simple_text '='* '=' '=' '=' '=' '=' '=';
header5: '=' '=' '=' '=' '=' '='* simple_text '='* '=' '=' '=' '=' '=';
header4: '=' '=' '=' '=' '='* simple_text '='* '=' '=' '=' '=';
header3: '=' '=' '=' '='* simple_text '='* '=' '=' '=';
header2: '=' '=' '='* simple_text '='* '=' '=';
...
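Read together, the heading rules say that level n needs at least n equals signs on each side, the longest matching rule wins, and surplus signs are swallowed by the '='* terms. A hypothetical Python sketch of that reading follows. The name parse_heading is invented, and treating "=x=" as level 1 is an assumption about the rules elided after "...".

```python
def parse_heading(line):
    """Return (level, text) for a heading line, or None.

    Level n needs at least n '=' on each side; the longest matching rule
    wins, capped at 6; surplus '=' are consumed by the '='* terms and are
    not part of the text. Allowing level 1 is an assumption.
    """
    left = len(line) - len(line.lstrip("="))
    right = len(line) - len(line.rstrip("="))
    text = line.strip("=")
    if left == 0 or right == 0 or not text:
        return None  # not a heading, or no simple_text between the signs
    return min(left, right, 6), text

print(parse_heading("===foo=="))  # (2, 'foo')
```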
Steve
On 22/11/2007, Steve Bennett stevagewp@gmail.com wrote:
On 11/22/07, Thomas Dalton thomas.dalton@gmail.com wrote:
No, it doesn't. You just replace "<" and ">" with "&lt;" and "&gt;". You don't need to escape the code inside; just escape the <script> tags and you're sorted.
Nope.
[[Image:foo.jpg|my caption has a <nowiki>|</nowiki> pipe inside it.]]
I was only referring to the one problem in your list that I replied to - the one regarding script injections.
On 11/22/07, Thomas Dalton thomas.dalton@gmail.com wrote:
I was only referring to the one problem in your list that I replied to - the one regarding script injections.
Oh, sorry, not paying enough attention.
Come to think of it though:
4: <nowiki> <script badeviljavacode here> </nowiki>
All we need to do to fix that is to escape the contents of nowiki tags
Yes, I guess you're right. The definition of nowiki becomes:
"Text surrounded by a <nowiki> block is treated as a literal sequence of characters with no special meaning ascribed to any character other than its literal representation. The characters <, > and & are escaped as &lt;, &gt; and &amp; respectively. A nowiki block is a form of whitespace."
Translating < to &lt; does hold up with the "treated as a literal sequence of characters" statement.
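The escaping clause of the revised definition is small enough to state as code. In this minimal Python sketch (function names hypothetical), html.escape with quote=False replaces exactly &, < and > with &amp;, &lt; and &gt;, which matches the three characters the definition lists, and it leaves |, [, ] and = untouched. It covers only the escaping step at output time; keeping those other characters from acquiring meaning is the tokenizer's job.

```python
import html
import re

NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)

def render_nowiki_content(content):
    """Escape exactly <, > and & (as &lt;, &gt;, &amp;); leave the rest as-is."""
    return html.escape(content, quote=False)

def render_nowiki_blocks(text):
    """Replace each <nowiki>...</nowiki> with its escaped content."""
    return NOWIKI_RE.sub(lambda m: render_nowiki_content(m.group(1)), text)

print(repr(render_nowiki_blocks("<nowiki> <script badeviljavacode here> </nowiki>")))
# ' &lt;script badeviljavacode here&gt; '
```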
Thanks :)
Steve
Steve Bennett wrote:
On 11/22/07, Mark Clements gmane@kennel17.co.uk wrote:
I think he means pass the contents of <nowiki> through htmlspecialchars() before outputting.
Yes, but all that is assuming the <nowiki> is not embedded inside anything else. How should all the following behave:
IMHO <nowiki> means that its content loses its "magicness": it must appear literal (for the user, not in the page source).
So the following would produce links:
[[<nowiki>image</nowiki>:foo.jpg]]
[[image:<nowiki>foo</nowiki>.jpg]]
[http://foo.com<nowiki></nowiki>bloot] (this one actually works currently, so my "token separator, not whitespace" comment may be wrong)
This is because you're stripping the "magic powers" from words that are not part of the spell. Note that nowiki-ing the namespace can be controversial, as it's not clear whether the power resides in the namespace or not. I think the magic is in the square brackets, so it should still be rendered as usual. The other option would be to make it a normal link, but it's better not to add such esoteric constructs without need (though they could be added if they prove easier to code!).
However,
[[image:foo.jpg|<nowiki>right</nowiki>|thumb]]
renders the caption text "right", as the nowiki'd "right" loses its ability to align the image.
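Under this reading, an image option only keeps its magic if it did not come from inside <nowiki>. A hypothetical Python sketch: it takes pre-split (text, from_nowiki) pairs, MAGIC_OPTIONS is a small subset of real MediaWiki image options chosen for illustration, and taking the last non-option segment as the caption is a simplification.

```python
# Illustrative subset of image options; MediaWiki accepts more than these.
MAGIC_OPTIONS = {"thumb", "frame", "left", "right", "center", "none"}

def classify_image_options(segments):
    """Split image options from the caption.

    segments holds one (text, from_nowiki) pair per '|'-separated option
    after the file name. Text from inside <nowiki> has lost its magic, so
    it can only be caption text. The last non-option segment is taken as
    the caption (a simplification).
    """
    options, caption = [], None
    for text, from_nowiki in segments:
        if not from_nowiki and text.strip() in MAGIC_OPTIONS:
            options.append(text.strip())
        else:
            caption = text
    return options, caption

# [[image:foo.jpg|<nowiki>right</nowiki>|thumb]]
print(classify_image_options([("right", True), ("thumb", False)]))
# (['thumb'], 'right')
```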
[<nowiki>http</nowiki>://foo.com]
This doesn't match the URL format, so it would render literally. If they were double square brackets, it would make an internal link, as opposed to the normal [<a href="http://foo.com">[1]</a>] produced without <nowiki>.
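The rule here is that the protocol prefix must come from plain text. A hypothetical Python sketch: segments are the (text, from_nowiki) pairs found between "[" and "]", and URL_START_RE covers only an illustrative subset of protocols rather than a wiki's configured list. It rejects [<nowiki>http</nowiki>://foo.com] and accepts [http://foo.com<nowiki></nowiki>bloot].

```python
import re

# Illustrative protocol subset; a real wiki would use its configured list.
URL_START_RE = re.compile(r"(?:https?|ftp)://\S")

def starts_external_link(segments):
    """segments: (text, from_nowiki) pairs found between '[' and ']'.

    Only plain (non-nowiki) text may supply the protocol prefix.
    """
    if not segments:
        return False
    text, from_nowiki = segments[0]
    return not from_nowiki and URL_START_RE.match(text) is not None

print(starts_external_link([("http", True), ("://foo.com", False)]))  # False
```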
<nowiki><b></nowiki>bold</b>
We want a literal <b>, so the output is &lt;b&gt;bold plus an unmatched closing </b> tag, which should be ignored or produce a warning.
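Once the nowiki contents have been escaped, the leftover </b> is simply a closing tag with no opener. A hypothetical Python sketch of "ignore it and warn": it recognises only bare tags without attributes, and it counts open tags per name instead of checking proper nesting. Both are simplifications.

```python
import re

# Only bare tags such as <b> or </b> are recognised (a simplification).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")

def drop_unmatched_closers(html_text):
    """Remove closing tags with no earlier open tag; return (html, warnings).

    Assumes nowiki contents were already escaped, so a nowiki'd <b> arrives
    as &lt;b&gt; and is not counted as an open tag.
    """
    depth, warnings, out, pos = {}, [], [], 0
    for m in TAG_RE.finditer(html_text):
        is_close, name = m.group(1) == "/", m.group(2).lower()
        out.append(html_text[pos:m.start()])
        pos = m.end()
        if not is_close:
            depth[name] = depth.get(name, 0) + 1
            out.append(m.group(0))
        elif depth.get(name, 0) > 0:
            depth[name] -= 1
            out.append(m.group(0))
        else:
            warnings.append(f"unmatched </{name}> ignored")
    out.append(html_text[pos:])
    return "".join(out), warnings

print(drop_unmatched_closers("&lt;b&gt;bold</b>"))
# ('&lt;b&gt;bold', ['unmatched </b> ignored'])
```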
<b <nowiki>style="bloot;"</nowiki>
This calls the tag <b> with the parameter 'style="bloot;"'. As tags don't get parameters separated by spaces, it's ignored (with a warning).
etc etc.
Steve
wikitext-l@lists.wikimedia.org