On Sun, 28 Nov 2004 19:37:21 +0100, Rob Hooft rob@hooft.net wrote:
Petr Kadlec wrote: Indeed it is better to avoid the %XX codes in interwiki links. A reasonably good alternative is to use &name; and &#number; entities. Those are independent of the encoding. The pywikipediabot will take this route for all links that cannot be expressed natively, and the interwiki bot will automatically convert all %XX links upon passing (but only if other updates are needed to the page).
Yes, this is sensible, but it doesn't avoid the problem described here - the actual *URL* will still include one encoding or the other, however cunningly the wiki-code is constructed. Neither http://cs.wikipedia.org/wiki/V%C3%A1clav_Havel nor http://en.wikipedia.org/wiki/V%C3%A1clav_Havel is an existing page, since HTML escaping doesn't belong in a URL. [Amusingly, if you click "article", it takes you to the right page, since the "&" hasn't been further escaped in the HTML]
OTOH, I can't actually get that particular example to break anyway: I can happily click the interwiki links between cs.wp and en.wp and the URL gets re-encoded back and forth just fine. I guess someone fixed it already.
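
To make the distinction above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that one wiki percent-encodes titles as UTF-8 and the other as ISO-8859-1; the variable names are hypothetical and none of this is taken from the pywikipediabot code. It shows that the entity forms decode to the same title whatever the encoding, while the %XX form in the URL depends on which encoding the server expects.

```python
import html
from urllib.parse import quote, unquote

title = "Václav_Havel"

# In wiki-code, named and numeric entities both decode to the same
# title, independent of the page encoding.
from_named = html.unescape("V&aacute;clav_Havel")
from_numeric = html.unescape("V&#225;clav_Havel")

# In the URL, the title has to be percent-encoded in *some* byte
# encoding. The two encodings below are assumptions for the example.
utf8_path = quote(title, encoding="utf-8")          # V%C3%A1clav_Havel
latin1_path = quote(title, encoding="iso-8859-1")   # V%E1clav_Havel

# A server expecting UTF-8 cannot recover the title from the Latin-1
# form: the lone 0xE1 byte is not valid UTF-8 and gets replaced.
misread = unquote(latin1_path, encoding="utf-8")

# Leaving the entity in the URL just percent-encodes the "&" and ";"
# as literal characters, which names a different (nonexistent) page.
entity_in_url = quote("V&aacute;clav_Havel")

print(from_named == from_numeric == title)
print(utf8_path, latin1_path)
print(misread == title)
print(entity_in_url)
```

So the entities solve the problem only at the wiki-code level; whoever turns the link into a URL still has to pick the target wiki's encoding.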