I just thought I'd point out that the random page bug is back. I thought I would wait until someone else reported it this time, but nobody has, so I'll report it again.
PLEASE could the URL rewriting feature be backed out? The URL rewriting bug is far more severe than the '&' issue that the rewriting hack fixes: the whole of Wikipedia is currently unusable for readers, as well as for most writers.
If a web spider such as Googlebot comes by when this bug is in effect, the whole index for Wikipedia on that search engine will be corrupted for as long as the search engine uses that set of indices: usually about a month.
This is also another reason why the different international Wikis should have the option of separate front-end hosts: diversity gives resilience against bugs introduced by software upgrades. (Databases can still be shared over the network, which will surely be the next step in scaling.)
Regards,
Neil
I restarted Apache.
Also, I made the following change in /apache/conf/httpd.conf:
RewriteRule ^/wiki/(.*)$ /w/wiki.phtml?title=$1 [NE]
# RewriteRule ^/wiki/(.*)$ /w/wiki.phtml?title=${urlencode:$1} [L]
That is, I reverted to the older, simpler form, on the theory that until the bug in urlencode is worked out, it's better to have a *few* broken links to things like /wiki/AT&T, rather than have the whole site freaked out.
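Why /wiki/AT&T breaks under the simpler rule: $1 is pasted into the query string without escaping, so the '&' ends the title parameter and whatever follows becomes a separate (ignored) parameter. A minimal Python sketch of that substitution follows. The names rewrite_raw and parsed_title are hypothetical, and urllib.parse.parse_qs is only a stand-in for whatever query parsing wiki.phtml really does.

```python
from urllib.parse import parse_qs

def rewrite_raw(title_path: str) -> str:
    # Hypothetical stand-in for: RewriteRule ^/wiki/(.*)$ /w/wiki.phtml?title=$1 [NE]
    # The captured text is pasted into the query string with no escaping.
    return "/w/wiki.phtml?title=" + title_path

def parsed_title(url: str) -> str:
    # Parse the query string form-style: '&' separates parameters,
    # so anything after it is no longer part of "title".
    query = url.split("?", 1)[1]
    return parse_qs(query)["title"][0]

for title in ["Physics", "AT&T"]:
    url = rewrite_raw(title)
    print(title, "->", url, "->", parsed_title(url))
```

Ordinary titles pass through untouched, which is why only the few '&' titles break with this rule.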
There are at least two possible permanent solutions...
1. Fix it so the rewrite works 100% correctly. Since '&' is legal in a URL path but acts as a parameter separator in a query string, the _idea_ of encoding it during the rewrite is not a bad one. Plus, it's nicer in a way for end users.
2. Ban '&' from article titles to simplify our lives. While it is true that '&' is _legal_ in a url, there's no reason we have to use it, other than usability for articles like 'AT&T'.
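Option 1 amounts to escaping the captured title exactly once before it is placed in the query string. Here is a minimal Python sketch, assuming the captured text is the already-decoded title. The names rewrite_encoded and parsed_title are hypothetical, and this is an illustration of the idea, not the actual patch discussed in this thread.

```python
from urllib.parse import parse_qs, quote

def rewrite_encoded(title: str) -> str:
    # Hypothetical sketch of option 1: percent-encode the captured title once.
    # safe="" means '&', '=', '+' and '/' are all escaped, not just '&'.
    return "/w/wiki.phtml?title=" + quote(title, safe="")

def parsed_title(url: str) -> str:
    # Form-style parsing decodes %XX once, restoring the original title.
    query = url.split("?", 1)[1]
    return parse_qs(query)["title"][0]

for title in ["AT&T", "C++", "Rock_&_Roll"]:
    url = rewrite_encoded(title)
    print(title, "->", url, "->", parsed_title(url))
```

The safe="" argument matters: quote() leaves '/' unescaped by default, and a form-style parser such as parse_qs turns a literal '+' into a space, so a title like "C++" would also be mangled without the escaping.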
But in the meantime, let's not let the site keep breaking completely.
Jimmy Wales wrote:
> it's better to have a *few* broken links to things like /wiki/AT&T, rather than have the whole site freaked out.
Right now, http://www.wikipedia.org/wiki/AT%26T leads to "AT", but http://www.wikipedia.org/wiki/AT%2526T leads to "AT&T". This is weird. It seems to be decoded twice: %25 is the encoding of "%", so AT%2526T becomes AT%26T after one decode and AT&T after the second.
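The double decoding can be reproduced with a small Python model. It assumes, as the observation suggests, that one decode happens on the request path before the rewrite rule sees it and a second happens when the resulting query string is parsed. title_seen_by_wiki is a hypothetical name, and parse_qs stands in for the wiki's real query parsing; this is a model, not a trace of the server.

```python
from urllib.parse import parse_qs, unquote

PREFIX = "/wiki/"

def title_seen_by_wiki(request_path: str) -> str:
    """Model of the suspected path through the server (an assumption)."""
    if not request_path.startswith(PREFIX):
        raise ValueError("not a /wiki/ path")
    # Decode 1 (assumed): %XX in the path is decoded before the rule matches.
    captured = unquote(request_path[len(PREFIX):])
    # The captured text is pasted, unescaped, into the query string.
    query = "title=" + captured
    # Decode 2: the query string is parsed; '&' splits parameters
    # and any remaining %XX is decoded again.
    return parse_qs(query)["title"][0]

for path in ["/wiki/AT%26T", "/wiki/AT%2526T"]:
    print(path, "->", title_seen_by_wiki(path))
```

Under this model, %26 is already a bare '&' by the time the query string is parsed, so it splits the title; %2526 survives the first decode as %26 and only becomes '&' in the second.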
Jimmy Wales wrote:
> There are at least two possible permanent solutions...
>
> 1. Fix it so the rewrite works 100% correctly. Since '&' is a legal
> character in a url, but not in a query string, the _idea_ here is not a bad one. Plus, it's nicer in a way for end users.
I sent such a fix to wikitech-l yesterday, but since I believe LDC is the one who originally set up Apache and co. on the server, I left it to him to recompile it with the same set of options plus the fix. If he doesn't do it soon, I may go ahead anyway.
> 2. Ban '&' from article titles to simplify our lives. While it is true
> that '&' is _legal_ in a url, there's no reason we have to use it, other than usability for articles like 'AT&T'.
That's cutting off your nose to spite your face.
-- brion vibber (brion @ pobox.com)
wikipedia-l@lists.wikimedia.org