On Wed, 11 Apr 2012 23:33:53 -0700, Roan Kattouw roan.kattouw@gmail.com wrote:
On Apr 11, 2012 11:01 PM, "Antoine Musso" hashar+wmf@free.fr wrote:
const EXT_URL_REGEX = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
ZOMG. Anyway, what I'd do if I had a MediaWiki clone handy is look through Sanitizer.php to see if there's anything in there that handles URLs.
Roan
There's always gitweb. For the de jure standard repo viewer, its URLs and navigation are awful (though that's not git's fault, as you can see from some of the other repo viewers that do it better), but it works. All we've got in Sanitizer is href="" handling in attribute sanitization: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=includes...
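From memory, that href check boils down to something like the following (paraphrased, not a verbatim quote of core; isAcceptableHref is just an illustrative wrapper name, see the gitweb link above for the real code):

    // Roughly what attribute sanitization does for href/src values:
    // only accept values that start with one of the registered URL protocols.
    function isAcceptableHref( $value ) {
        $hrefExp = '/^(' . wfUrlProtocols() . ')[^\s]*$/';
        return (bool)preg_match( $hrefExp, $value );
    }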
This case should really be handled by checking against wfUrlProtocols. Anything that doesn't match then gets sent through Title::newFromText, and anything for which Title returns null/false should be ignored.
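In code, the flow I have in mind is roughly this (untested sketch; classifyLink is just a name for illustration, wfUrlProtocols() and Title::newFromText() are the core functions mentioned above):

    // Treat the input as an external link only if it starts with a
    // registered protocol; otherwise let Title have a go at it, and
    // ignore anything Title rejects.
    function classifyLink( $text ) {
        if ( preg_match( '/^(?:' . wfUrlProtocols() . ')/i', $text ) ) {
            return array( 'type' => 'external', 'url' => $text );
        }
        $title = Title::newFromText( $text );
        if ( !$title ) {
            return null; // Title couldn't make sense of it either: ignore.
        }
        return array( 'type' => 'internal', 'title' => $title );
    }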