Aryeh Gregor wrote:
On Thu, Sep 25, 2008 at 4:39 AM, Tei
<oscar.vives(a)gmail.com> wrote:
Reading Wikipedia's HTML output, I have found that EditPage.php
produces "+\" as the value of wpEditToken. This token is supposedly
random, to stop spammers from filling Wikipedia with viagra
links. But it doesn't seem very random to me: on all the computers I
have tested, it is the constant "+\".
Is that a code bug, or maybe a misconfiguration on Wikipedia's side?
My recollection is that it was a way to detect edits that were passing
through certain broken proxies, which would silently corrupt the edit
form data. By adding some content to the edit token that these
proxies would corrupt as well, the edits would be rejected, while
others would be unaffected. Apparently "+\" will trigger this
particular bug in these particular proxies, so it will prevent
randomly screwing up pages in some cases. The source code/revision
log should have more info.
Yes, it's a kluge I added sometime last year, I think. The problem is
that there's a huge install base of broken "PHP proxy" scripts that
essentially pass all content through the PHP addslashes() function (or,
rather, they have magic_quotes_gpc enabled and don't use stripslashes()).
Trying to edit through such a proxy would, in particular, turn all
apostrophes in the page text into "\'" or even "\\\'". We used to get
such edits with some regularity. Having a backslash in the edit token
prevents editing via such proxies, since they will mangle it in the same
way. As a nice bonus, it also happens to prevent some widespread spam-
and vandalbots from using those proxies to hide their trails.
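
To illustrate the mechanism, here is a minimal Python sketch. It is not MediaWiki code: the addslashes() below is a hand-written stand-in for the PHP function of the same name, and token_survives() is a hypothetical helper that just stands for "the submitted token must match exactly".

```python
# Sketch only: simulates what a slash-adding proxy does to form data.
# addslashes() here is a Python stand-in for PHP's addslashes();
# token_survives() is a hypothetical name, not a MediaWiki function.

def addslashes(text: str) -> str:
    """Prefix quotes and backslashes with a backslash; NUL becomes \\0."""
    out = []
    for ch in text:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)

TOKEN = "+\\"  # the two characters: plus sign, backslash

def token_survives(received: str, expected: str = TOKEN) -> bool:
    """An edit is accepted only if the token arrives unmodified."""
    return received == expected

# What the broken proxy would do to page text and to the token:
mangled_text = addslashes("it's")    # apostrophe gains a backslash
mangled_token = addslashes(TOKEN)    # backslash is doubled

print(repr(mangled_text), repr(mangled_token))
print(token_survives(TOKEN), token_survives(mangled_token))
```

The point is that the same transformation that corrupts apostrophes in the page text also changes the token, so the corrupted edit is rejected before it is saved.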
(The reason I didn't include an actual apostrophe in the edit token was
that, at least at the time, in some parts of the code the edit token was
being embedded in hardcoded HTML without proper escaping, at least in
some cases with single quotes around it. I didn't feel like tracking
down and fixing all cases of that at the time (though it certainly
should be done if someone hasn't already), so I just used a backslash,
which breaks in the same way but has no special meaning in HTML.)
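
A small Python sketch of that escaping concern, under stated assumptions: the two helper functions and the exact markup are hypothetical and only mimic the kind of hardcoded, single-quoted HTML described above. They are not taken from the MediaWiki source.

```python
import html

# Hypothetical helpers, for illustration only.

def unsafe_hidden_field(token: str) -> str:
    """Embed the token in single-quoted HTML with no escaping."""
    return "<input type='hidden' name='wpEditToken' value='" + token + "'>"

def safe_hidden_field(token: str) -> str:
    """Same markup, but with the token HTML-escaped first."""
    escaped = html.escape(token, quote=True)
    return "<input type='hidden' name='wpEditToken' value='" + escaped + "'>"

# An apostrophe ends the attribute value early in the unsafe version ...
print(unsafe_hidden_field("a'b"))
# ... and is turned into a character reference in the safe one.
print(safe_hidden_field("a'b"))
# A backslash has no special meaning in HTML, so both versions agree.
print(unsafe_hidden_field("+\\") == safe_hidden_field("+\\"))
```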
I'm not sure why the "+" is there; I think it protects against another
type of broken proxy that turns plus signs into spaces, presumably due
to improper URL-decoding of form values.
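
One way such a proxy could lose the plus sign is by form-decoding a value that has already been decoded. That is an assumption made for the sake of the example, not a confirmed description of any particular proxy. A short Python sketch using the standard urllib.parse functions:

```python
from urllib.parse import quote_plus, unquote_plus

TOKEN = "+\\"  # plus sign followed by a backslash

# A browser sends the token form-encoded, so "+" travels as "%2B".
encoded = quote_plus(TOKEN)

# Decoding once, as a correct server or proxy does, restores the token.
decoded_once = unquote_plus(encoded)

# Decoding a second time (the assumed bug) treats the literal "+" as
# an encoded space, so the token no longer matches.
decoded_twice = unquote_plus(decoded_once)

print(repr(encoded), repr(decoded_once), repr(decoded_twice))
print(decoded_once == TOKEN, decoded_twice == TOKEN)
```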
It's indeed a hack, but it does work remarkably well for all its
simplicity. It'd probably deserve to be documented better, though, so
that others won't have to ask the same question as you did.
--
Ilmari Karonen