On 11/04/12 09:27, Kim Eik wrote:
I have created a patch for the gallery tag and have been given the following review.
https://gerrit.wikimedia.org/r/4609
- JavaScript injection: you can inject javascript: URIs which execute
code when clicked
- plain links ("link=Firefox") are taken as relative URLs which will
randomly work or not work depending on where they're viewed from
<snip>
What would be the recommended way of stripping javascript: from URIs? Are there any shared functions that do exactly this? And how would I solve the plain links problem? Do a regex check for an absolute URI, e.g. http://example.org/foo/bar?
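To make the first review point concrete, here is a small PHP sketch (not MediaWiki code) that compares a blacklist trying to strip javascript: with a whitelist of allowed schemes. The function names and the scheme list are hypothetical, and parse_url() is plain PHP used only for illustration. The blacklist is bypassed by a simple change of case. The whitelist refuses both the javascript: value and a relative value such as "Firefox":

```php
<?php
// Hypothetical helpers for illustration only.

// Naive blacklist: refuse values that begin with "javascript:".
// The check is case-sensitive, so "JavaScript:alert(1)" slips through.
function naiveIsSafe( string $url ): bool {
	return strpos( $url, 'javascript:' ) !== 0;
}

// Whitelist: accept only an absolute URL whose scheme is allowed.
// Relative values such as "Firefox" have no scheme and are refused too.
function whitelistIsSafe( string $url, array $allowedSchemes ): bool {
	$scheme = parse_url( $url, PHP_URL_SCHEME );
	return is_string( $scheme )
		&& in_array( strtolower( $scheme ), $allowedSchemes, true );
}

$allowed = [ 'http', 'https', 'ftp' ];
var_dump( naiveIsSafe( 'JavaScript:alert(1)' ) );                      // bool(true)
var_dump( whitelistIsSafe( 'JavaScript:alert(1)', $allowed ) );        // bool(false)
var_dump( whitelistIsSafe( 'Firefox', $allowed ) );                    // bool(false)
var_dump( whitelistIsSafe( 'http://example.org/foo/bar', $allowed ) ); // bool(true)
```

The whitelist idea is the same one behind wfUrlProtocols(): list what is allowed rather than what is forbidden.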
I have added some inline comments on includes/parser/Parser.php in patch #7:
https://gerrit.wikimedia.org/r/#patch,unified,4609,7,includes/parser/Parser....
Copy-pasting it here for later reference:
----------------------------------------------------------------------
const EXT_URL_REGEX = '/^(([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w].)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
We would need a parser guru to come up with a similar but simpler regex. Anyway, you will find hints in includes/parser/Parser.php: wfUrlProtocols() gives a regex of the protocols allowed in URLs.
Parser::EXT_LINK_URL_CLASS is a regex character class describing which characters are allowed in a URL and which are not. That makes sure you find the end of the URL correctly in various funny cases, such as U+3000, which is an ideographic space used on Chinese wikis.
What you are trying to achieve is really similar to the handling of the 'link' parameter in Parser::makeImage(). Some relevant code:
case 'link':
	$chars = self::EXT_LINK_URL_CLASS;
	$prots = $this->mUrlProtocols; // which is wfUrlProtocols()
	if ( preg_match( "/^($prots)$chars+$/u", $value, $m ) ) {
		$paramName = 'link-url';
		$this->mOutput->addExternalLink( $value );
		if ( $this->mOptions->getExternalLinkTarget() ) {
			$params[$type]['link-target'] = $this->mOptions->getExternalLinkTarget();
		}
		// ...

Well, you get the idea :-)
----------------------------------------------------------------------
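The two building blocks from the pasted comment, a protocol alternation and a URL character class, can be sketched on their own. This is a hedged approximation. buildProtocolRegex(), extractUrl() and URL_CHAR_CLASS are hypothetical names, and the protocol list is only an example. The character class is modelled on Parser::EXT_LINK_URL_CLASS but not copied from it. The sketch shows why U+3000 ends a URL just like an ASCII space does:

```php
<?php
// Hypothetical stand-in for the idea behind wfUrlProtocols():
// escape each allowed protocol and join them into one alternation.
function buildProtocolRegex( array $protocols ): string {
	$escaped = array_map(
		static function ( string $p ): string {
			return preg_quote( $p, '/' ); // '/' is the pattern delimiter below
		},
		$protocols
	);
	return implode( '|', $escaped );
}

// Approximation of a URL character class: anything except [ ] < > ",
// ASCII control characters and space, DEL, and Unicode space separators
// (\p{Zs}), which include U+3000 IDEOGRAPHIC SPACE.
const URL_CHAR_CLASS = '[^][<>"\\x00-\\x20\\x7F\\p{Zs}]';

// Return the leading URL in $text, or null if $text does not start
// with an allowed protocol. The /u flag makes \p{Zs} match code points.
function extractUrl( string $text, string $prots ): ?string {
	$chars = URL_CHAR_CLASS;
	if ( preg_match( "/^(?:$prots)$chars+/u", $text, $m ) ) {
		return $m[0];
	}
	return null;
}

$prots = buildProtocolRegex( [ 'http://', 'https://', 'ftp://' ] );
var_dump( extractUrl( "http://example.org/wiki\u{3000}more", $prots ) ); // string(23) "http://example.org/wiki"
var_dump( extractUrl( 'javascript:alert(1)', $prots ) );                 // NULL
```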
Reading my text again, I should have proofread it before saving that comment. Anyway, I am pretty sure we can factor out the code that handles 'link' for images and reuse it for what you are trying to do.
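The factoring suggested above could look roughly like the helper below, which classifies a link value once for both images and galleries. It is only a sketch. classifyLinkValue() and its return shape are hypothetical. Treating non-URL values as page titles is an assumption modelled on how link= behaves for images (e.g. [[File:Example.png|link=Main Page]]). Under that assumption, "Firefox" and "javascript:alert(1)" both become wiki page names and are never emitted as a raw href. That would address both review points:

```php
<?php
// Hypothetical shared helper that both the image 'link' parameter and
// the gallery tag could call. $prots is a protocol alternation (in the
// spirit of wfUrlProtocols()) and $chars a URL character class (in the
// spirit of Parser::EXT_LINK_URL_CLASS); both are passed in here.
// Returns [ type, value ] or null when the value must be rejected.
function classifyLinkValue( string $value, string $prots, string $chars ): ?array {
	if ( $value === '' ) {
		return null;
	}
	// Whitelisted protocol followed only by URL characters:
	// an external link. Schemes are matched case-insensitively.
	if ( preg_match( "/^(?:$prots)$chars+$/iu", $value ) ) {
		return [ 'link-url', $value ];
	}
	// Whitelisted protocol but illegal characters later on: reject.
	if ( preg_match( "/^(?:$prots)/i", $value ) ) {
		return null;
	}
	// Everything else, including "Firefox" and "javascript:alert(1)",
	// is treated as a wiki page name. A caller would turn it into a
	// title (e.g. with Title::newFromText()), never into a raw href.
	return [ 'link-title', $value ];
}

$prots = 'https?:\/\/|ftp:\/\/';
$chars = '[^][<>"\\x00-\\x20\\x7F\\p{Zs}]';
foreach ( [ 'http://example.org/foo/bar', 'Firefox', 'javascript:alert(1)' ] as $v ) {
	var_dump( classifyLinkValue( $v, $prots, $chars ) );
}
```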