Aryeh Gregor wrote:
On Thu, Sep 25, 2008 at 4:39 AM, Tei <oscar.vives@gmail.com> wrote:
Reading the Wikipedia HTML output, I have found that EditPage.php produces "+\" as the value for wpEditToken. This token is supposedly random, to stop spammers from filling Wikipedia with viagra links. But it doesn't seem very random to me: on all the computers I have tested, it is always "+\".
Is that a bug in the code, or maybe a misconfiguration on the part of the Wikipedia guys?
My recollection is that it was a way to detect edits that were passing through certain broken proxies, which would silently corrupt the edit form data. By adding some content to the edit token that these proxies would corrupt as well, the edits would be rejected, while others would be unaffected. Apparently "+\" will trigger this particular bug in these particular proxies, so it will prevent randomly screwing up pages in some cases. The source code/revision log should have more info.
Yes, it's a kluge I added sometime last year, I think. The problem is that there's a huge install base of broken "PHP proxy" scripts that essentially pass all content through the PHP addslashes() function (or, rather, they have magic_quotes_gpc enabled and don't use stripslashes()).
Trying to edit through such a proxy would, in particular, turn every apostrophe in the page text into "\'" or even "\\\'". We used to get such edits with some regularity. Having a backslash in the edit token prevents editing via such proxies, since they mangle it in the same way. As a nice bonus, it also happens to prevent some widespread spam- and vandalbots from using those proxies to hide their trails.
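A minimal Python sketch of the failure mode described above, for illustration only. It assumes a hand-rolled `addslashes()` equivalent (Python has no such builtin), a hypothetical `broken_proxy()` standing in for a magic_quotes_gpc proxy, and a placeholder token value; the real server-side check lives in MediaWiki's PHP code, not here.

```python
EDIT_TOKEN_SUFFIX = "+\\"  # the two characters "+" and "\" discussed in this thread


def addslashes(s: str) -> str:
    """Rough equivalent of PHP's addslashes(): escapes ', ", backslash and NUL."""
    out = []
    for ch in s:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)


def broken_proxy(form: dict) -> dict:
    """Hypothetical proxy: magic_quotes_gpc enabled, stripslashes() never called."""
    return {name: addslashes(value) for name, value in form.items()}


def accept_edit(form: dict, expected_token: str) -> bool:
    """Server side: accept the edit only if the token came back unchanged."""
    return form["wpEditToken"] == expected_token


expected = "0123abcd" + EDIT_TOKEN_SUFFIX  # placeholder session token
form = {"wpTextbox1": "It's a test", "wpEditToken": expected}

print(accept_edit(form, expected))     # True: direct submission
mangled = broken_proxy(form)
print(mangled["wpTextbox1"])           # It\'s a test
print(accept_edit(mangled, expected))  # False: the backslash got doubled too
```

The point of the design is that the token and the page text pass through the same mangling, so any proxy that would corrupt the apostrophes in the text also corrupts the token and the whole edit is refused.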
(The reason I didn't include an actual apostrophe in the edit token was that, at least at the time, some parts of the code embedded the edit token in hardcoded HTML without proper escaping, in some cases with single quotes around it. I didn't feel like tracking down and fixing all cases of that at the time (though it certainly should be done if someone hasn't already), so I just used a backslash, which breaks in the same way but has no special meaning in HTML.)
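To illustrate the HTML concern, here is a small Python sketch using the standard `html.parser` module. The `naive_field()` and `escaped_field()` helpers are hypothetical stand-ins for unescaped versus escaped template code, and the token values are placeholders. An unescaped apostrophe ends a single-quoted attribute early, while a backslash passes through untouched.

```python
from html import escape
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Records the value attribute of <input name='wpEditToken'>."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name") == "wpEditToken":
            self.value = a.get("value")


def parsed_token(html_text: str):
    p = TokenFinder()
    p.feed(html_text)
    p.close()
    return p.value


def naive_field(token: str) -> str:
    # Hypothetical hardcoded HTML: token pasted into single quotes, no escaping
    return "<input type='hidden' name='wpEditToken' value='" + token + "'>"


def escaped_field(token: str) -> str:
    # Same markup, but with the token properly HTML-escaped
    return ("<input type='hidden' name='wpEditToken' value='"
            + escape(token, quote=True) + "'>")


backslash_token = "abc123+\\"
apostrophe_token = "abc123+'"

print(parsed_token(naive_field(backslash_token)) == backslash_token)    # True
print(parsed_token(escaped_field(apostrophe_token)) == apostrophe_token)  # True
print(parsed_token(naive_field(apostrophe_token)) == apostrophe_token)  # False
```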
I'm not sure why the "+" is there; I think it protects against another type of broken proxy that turns plus signs into spaces, presumably due to improper URL-decoding of form values.
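The plus-sign guess can be sketched with Python's `urllib.parse`. This assumes a hypothetical proxy that URL-decodes form values and then forwards them without re-encoding, so the server decodes a second time and reads the literal "+" as a space. The token value is a placeholder.

```python
from urllib.parse import quote_plus, unquote_plus

token = "abc123+\\"  # placeholder token ending in the "+\" suffix

wire = quote_plus(token)            # browser encodes: 'abc123%2B%5C'
forwarded = unquote_plus(wire)      # broken proxy decodes and forwards as-is
received = unquote_plus(forwarded)  # server decodes again: '+' becomes ' '

print(wire)               # abc123%2B%5C
print(received == token)  # False: the edit would be rejected
```

A correct path decodes exactly once, in which case `unquote_plus(wire)` gives back the original token unchanged.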
It's indeed a hack, but it does work remarkably well for all its simplicity. It probably deserves better documentation, though, so that others won't have to ask the same question you did.