I would like to be able to put arbitrary text into a
hover box (i.e. the HTML title attribute), to give the user a
short precis of where a link will go. Looking at
http://meta.wikimedia.org/wiki/Help:Link, I don't see
any way to do this.
If I'm missing something, please let me know. If not,
perhaps we should discuss the possible syntax for this.
-r
--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841
Technical editing and writing, programming, and web development
An automated run of parserTests.php showed the following failures:
Running test Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test Magic Word: {{NUMBEROFFILES}}... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Passed 300 of 303 tests (99.01%) FAILED!
Moin,
it occurred to me recently that there is a file leak in any extension that
creates external files, like graphviz, graph, or possibly even math
(creating PNGs).
It works like this:
Article A contains one <graph> object, called AA
Article B contains two <graph> objects, called BA and BB
When you edit article A, the following happens:
* the new graph code is sent to the extension
* it is hashed
* the hash is the filename that will be used to generate the file,
let's call it "ABCD" for now
* file AB/CD/ABCD is generated, and included in the output
(The hashes are used for two reasons: to save a file if two articles
contain the same text, and to conveniently generate short, unique file
names)
Likewise for article B, except that BCDE and BCDF are the hashes, so we
get BC/DE/BCDE and BC/DF/BCDF as files.
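A minimal sketch of the naming scheme described above, in Python for illustration only. The choice of MD5, the function name cache_path and the two-level directory split are assumptions; the actual extensions may hash and lay out files differently.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(graph_source: str, root: str = "images/graph") -> PurePosixPath:
    """Map the text of a <graph> block to a cache file path.

    Only the content is hashed, not the article name, so two articles
    with identical graph text share one file.
    """
    # Assumption: MD5 hex digest; any stable hash would do for the sketch.
    digest = hashlib.md5(graph_source.encode("utf-8")).hexdigest()
    # Mirrors the AB/CD/ABCD layout: two short directory levels, then the hash.
    return PurePosixPath(root) / digest[:2] / digest[2:4] / digest


if __name__ == "__main__":
    print(cache_path("graph { a -> b }"))
```

Editing the graph text changes the digest and therefore the path, while nothing in this function knows about the previous path.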
No problem so far, but now what happens if you edit article A again and
change something: A new hash results, like ABXY. This results in the file
AB/XY/ABXY being generated.
Note that the file ABCD was never cleaned up. In fact, it is impossible
for the current scheme to clean it up, for the following reasons:
* ABCD could just as well be used by page B, since only the content goes
into the hash, not the article name. Deleting the file should only be done
if it is not used by any other article. (If the file ever vanishes, a
null-edit is necessary to re-generate it!)
* the extension doesn't even get to know the old text, or the filenames
used on the page, so it simply can't know which file to potentially
delete
The end effect is that the file cache gets bigger and bigger, and there is
no easy way to clean unused files out of it.
Here are a few ideas on how to deal with that:
* periodically clean off all files until you are left with X files (there
is at least one extension already doing this). This does not work, since
the deletion cannot guarantee that the files left over are really used,
and that the files deleted are no longer used. It's an ugly hack and
creates more problems than it solves.
* Somehow we could keep track of all filenames used on all articles. Just
think of article B:
first edit creates two entries in the table under "B"
second edit:
* first time the extension runs, it cleans table "B", and adds a
new hash
* second run cleans the table again, and adds a new hash
The problem here is that the extension cannot decide which text to
convert is the first on the page (and thus when to clean the table)
* Various other schemes that generate the hash based on the article name
plus a per-article unique ID (potentially given by the user creating the
text, a la <graph id="1">). These also somehow require a really big table
listing which files are in use.
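To make the tracking-table idea concrete, here is a rough Python sketch. It assumes something the extension does not have today, namely a point after the whole page has been rendered where all of that page's hashes are known at once; the class name FileUsageTable and its method are made up for illustration.

```python
class FileUsageTable:
    """Tracks which cache files (by hash) each article currently uses."""

    def __init__(self) -> None:
        # article title -> set of hashes used on that article
        self._hashes_by_page: dict[str, set[str]] = {}

    def replace_page(self, page: str, hashes: list[str]) -> set[str]:
        """Store the complete hash set for one page after a full render.

        Returns the hashes this page used before that no page uses now,
        i.e. the files that would be safe to delete.
        """
        old = self._hashes_by_page.get(page, set())
        self._hashes_by_page[page] = set(hashes)
        still_used = set().union(*self._hashes_by_page.values())
        return old - still_used


if __name__ == "__main__":
    table = FileUsageTable()
    table.replace_page("A", ["ABCD"])
    table.replace_page("B", ["ABCD", "BCDE"])
    print(table.replace_page("A", ["ABXY"]))  # ABCD is still used by B
```

Replacing a page's whole set in one step sidesteps the "which tag is first on the page" problem, but only if such a per-page hook exists.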
The last idea I had is data URLs. These allow embedding the content
inline, instead of linking to a file: http://en.wikipedia.org/wiki/Data:_URL
This would work beautifully, except for a few bits:
* we would lose the savings: if articles A and B contain the same text,
it would now be embedded twice.
* the data is in MySQL, not on the file system
* it is not supported by IE at all (bummer :-(
* Opera apparently only supports these up to 4K, which is way too little to
be practically useful :-(
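For reference, building such a URL is simple. The following Python sketch is only an illustration; the function name to_data_url and the default MIME type are assumptions, not part of any existing extension.

```python
import base64


def to_data_url(payload: bytes, mime_type: str = "image/png") -> str:
    """Encode raw file content as a base64 data: URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    url = to_data_url(b"hi", "text/plain")
    print(url)
    # Base64 grows the payload by roughly a third, which matters
    # for browsers that cap the URL length.
    print(len(url))
```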
Anyway, the problem needs to be solved; even my test wiki, which contains
only 3 SVG graphs, has already accumulated a thousand little files in
images/graph due to the many edits done on these three articles.
Best wishes,
Tels
--
Signed on Fri Apr 7 10:53:56 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.
"Call me Justin, Justin Case."
On 06/04/06, Rob Church wrote:
> > I created {{TALKSPACE}} and its counterpart {{ARTICLESPACE}} after
> > requesting the equivalent variables several times on this list.
> >
> > I would be more than happy to see them replaced with the equivalent
> > variables now that they seem to have proven their utility.
>
> I'm quite willing to add these magic words, but there is one question
> which needs answering before this happens; what should be done with a
> {{TALKSPACE}} or similar tag if the current namespace doesn't *have* a
> talk namespace?
How about returning the string:
-----------------------
<p><strong class="error">Error: Talkspace of namespace:123 does not
exist.</strong></p>
-----------------------
I.e. Fail fast, and fail loud.
There is some precedent for this - consider the wiki text:
-----------------------
<math>\\&2/x[]</math>
-----------------------
Which currently returns this HTML:
-----------------------
<p><strong class="error">Failed to parse (syntax error):
\\&2/x[]</strong></p>
-----------------------
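A small sketch of what "fail fast, fail loud" could look like, written in Python rather than MediaWiki's PHP. The function name talkspace_error is hypothetical; the markup simply copies the error style shown above.

```python
from html import escape


def talkspace_error(namespace_id: int) -> str:
    """Build a visible error block for a namespace with no talk namespace."""
    message = f"Error: Talkspace of namespace:{namespace_id} does not exist."
    # Escape the message so nothing in it can be read as markup.
    return f'<p><strong class="error">{escape(message)}</strong></p>'


if __name__ == "__main__":
    print(talkspace_error(123))
```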
All the best,
Nick.
An automated run of parserTests.php showed the following failures:
Running test Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test Magic Word: {{NUMBEROFFILES}}... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Passed 298 of 301 tests (99%) FAILED!
Aim
===
To see if it's possible to execute arbitrary client-side JavaScript,
using MediaWiki as the delivery system.
Background
==========
To execute arbitrary JavaScript, we mostly need to find a way to get
MediaWiki to allow us to open a tag ("<script"), and to close a tag
(">"). In particular, we need some form of wiki input text that will
produce this as the rendered HTML output.
The MediaWiki parser seems to try to prevent both of these things, by
escaping "<" and ">" characters. For example, if you give it wiki
input like this:
---------------------------------------------------------
<"hello world">
---------------------------------------------------------
Then you get back this HTML:
---------------------------------------------------------
&lt;"hello world"&gt;
---------------------------------------------------------
However, the parser is not perfect. There are some inputs that will
give unescaped ">" or "<" back.
The trick is probably to combine these omissions together, in such a
way so as to produce a working exploit. I don't have such a thing yet,
but I suspect it might be possible.
Unescaped Closing tags
======================
Getting the parser to give unescaped closing tags is much easier than
finding unescaped opening tags.
For example, a wiki input of just this:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
Will give this HTML output:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
(i.e. no escaping).
However, if we try to open one or more tags beforehand, then that
changes. So this wiki input:
---------------------------------------------------------
<
>>>>>>>
---------------------------------------------------------
Gives this HTML output:
---------------------------------------------------------
&lt;
&gt;&gt;&gt;&gt;&gt;&gt;&gt;
---------------------------------------------------------
So in other words, we can do this (close a tag and provide some
JavaScript to be executed), provided we don't use the "<" character:
---------------------------------------------------------
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... and we will get that literal text back in the HTML.
So to sum up: Any time after we use a "<" character, we lose this
privilege of having unescaped ">" characters.
To me, this feels like it might perhaps be a mistake, because it
allows an attacker an opening that they probably don't need to be
given (i.e. it's a free kick).
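As a point of comparison, here is a tiny Python sketch of escaping that treats ">" the same way whether or not a "<" came before it. It uses the standard library's html.escape and is not how the MediaWiki parser works; the wrapper name escape_text is just for illustration.

```python
from html import escape


def escape_text(wiki_text: str) -> str:
    """Escape &, <, > and both quote characters unconditionally."""
    return escape(wiki_text, quote=True)


if __name__ == "__main__":
    print(escape_text(">>>>>>>"))
    print(escape_text('onmouseover="alert(document.cookie)">test'))
```

With this approach the onmouseover example comes back with its quotes and its ">" escaped, so it cannot close a tag.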
Unescaped Opening tags
======================
Almost all uses of "<" in the wiki input will result in "&lt;" in the
HTML output.
However there are some uses that do not. In particular, I've found
that table properties are very weakly restricted, and we can get the
Parser to produce unescaped "<" characters with each of the following
3 inputs:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
{| ALIGN='''~~~</math>
{| BGCOLOR=<span style="font-weight: bold;">
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
<table align="<b><!--LINK">
<table bgcolor="<span">
---------------------------------------------------------
Some observations / problems here:
1) The unescaped "<" characters are in attribute strings. We need
somehow to avoid that, or break out of that, if the browser is to obey
them.
2) The type of the tags is limited ("<a>", "<!--" and "<span>" tags in
the above examples).
3) The final two examples that use "<" will mean that we cannot close
the tag (because, as described above, by using the "<" character we
lose the privilege of having close tags).
However we can avoid problem 3) with this, which never uses a "<" character:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... however this still has problems 1) and 2) described above.
Problem 2) may not necessarily be a showstopper (e.g. HTML like: '<a
href="#" onmouseover="alert(document.cookie);">Free Porn!</a>' is not
as powerful as something that successfully uses "onLoad", but it is
predictable that it will work a reasonable percentage of the time).
On the other hand, problem 1) is currently a huge restriction.
Conclusions
===========
* If anyone knows of a way of overcoming problems 1) and 2), or of an
alternate method, then please let me know. By combining that
information with the information above, it may well be possible to
create a working Proof-Of-Concept.
* Why does MediaWiki ever allow unescaped ">" characters? This
behaviour seems to increase the chances of a JavaScript security
problem.
All the best,
Nick.
> There are two conditional functions and a mathematical expression function.
Cool! Dibs on [[:en:Template:Factorial]] :
------------------------
{{if: {{{1}}}=1 | 1 | {{expr: {{{1}}} * {{{{PAGENAME}} | {{expr:
{{{1}}} - 1}} }} }} }}
------------------------
... or something like that.
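For comparison, the same recursion written out in Python. This is only a sketch of the logic the template is aiming for, not of how the parser functions evaluate; the base case is widened to n <= 1 (an addition not in the template) so that 0 also terminates.

```python
def factorial(n: int) -> int:
    """Recursive factorial, mirroring the if/expr template structure."""
    if n <= 1:
        # Corresponds to the "{{{1}}}=1 | 1" branch of the template.
        return 1
    # Corresponds to multiplying {{{1}}} by the template called with {{{1}}} - 1.
    return n * factorial(n - 1)


if __name__ == "__main__":
    print(factorial(5))
```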
But why stop there? There are probably many mathematical constructs
that can be implemented now (think how many functions there are in
Excel or OpenOffice's Calc just waiting to be implemented) ... A whole
world of recursive mathematical functions awaits! :-)
Then the message we leave for bored schoolkids (
http://en.wikipedia.org/wiki/Template:Test ) can be updated to include
this: "Did you know you can do some of your maths homework using the
Wikipedia? Click to find out how".
But why stop there? Coupled with a big enough library of mathematical
constructs, you can stop thinking of MediaWiki as being primarily
about collaborative text editing, and start to think of it also as a
remote sandboxed interpreter / code-executor. You could potentially,
for example, write a client that does protein-folding calculations or
crunches SETI data using the Wikipedia as the CPU. Of course, it'd be
wildly impractical (massively slow, prone to failure due to network
problems or site outages, etc), but it might hold a certain "just to
see if it can be done" type of attraction.
> The supported operators (roughly in order of precedence) are:
In all seriousness, if you're going to have "=", you might as well
have the "<" and ">" and "<=" and ">=" operators too (someone is bound
to ask for them).
All the best,
Nick.
> And using semantic markup in a wrong way is even worse. There is no way
> to distinguish semantic and visual markup use of ''...''
When I started here, they were distinguished by putting semantic
emphasis in ''double quotes'' and visual italics in <i>italic
tags</i>.
> Someone would probably change them to ''' or '' respectively, assuming
> ignorance on the part of the original editor. Hell, I would :)
Sure, but lots of people misuse other markup, too. We fix their
markup instead of changing the functionality of the software to cater
to them.
> I'm glad you did, otherwise you and I could have been caught in an
> infinite loop :)
:-) Well, before the change in rendering, your changes would have
been bad and mine good. Now they're the other way around.
> I don't know about "misuse". We have a strong need for italics, and ''
> is a very convenient way of getting them.
Yeah.
> :Indentation like this? Sure, that's what the colon in Wiki markup is
> designed for, as far as I know. I don't think there's even a better
> way of doing that?
Nope. That's the markup for a dictionary definition:
; Word : definition
; Word 2 : definition 2
; Word 3 : long definition 3
: more of the long definition
: even more
; Word 4
: definition 4
It is commonly used for indentation, and has a similar visual effect,
but indentation would actually be something like this:
<p>Normal paragraph</p>
<p style="margin-left:2em;">Indented paragraph</p>
<p>Normal paragraph</p>
On 06/04/06, Steve Bennett <stevage(a)gmail.com> wrote:
> Ok, so what are some real world examples where this template would be
> used? What's an example of using that example in a template which
> doesn't have an accompanying talkspace? Who is likely to see the
> end result first?
The first example which comes to mind is use within the MediaWiki
namespace, which, when rendered could do all sorts of interesting
things. This is theoretical, of course, since I'm not sure users would
be stupid enough to use these magic words in messages which would
cause problems.
Nevertheless; just because one lot of users won't, doesn't mean
another lot won't. It could happen, and we should decide what to do in
those cases.
Rob Church