I have created a patch for the gallery tag and have been given the following review.
https://gerrit.wikimedia.org/r/4609
- JavaScript injection: you can inject javascript: URIs which execute code when clicked
- plain links ("link=Firefox") are taken as relative URLs which will randomly work or not work depending on where they're viewed from
- need parser test cases to demo it working
So my questions are:
What would be the recommended way of stripping JavaScript from URIs? Are there any shared functions which do exactly this? And how would I solve the plain-links problem? Do a regex check for an absolute URI, e.g. http://example.org/foo/bar? And what are "parser test cases": PHPUnit tests, or some other form of testing?
Thank you! Kim Eik.
On 04/11/2012 03:27 AM, Kim Eik wrote:
I have created a patch for the gallery tag and have been given the following review.
https://gerrit.wikimedia.org/r/4609
- JavaScript injection: you can inject javascript: URIs which execute code when clicked
- plain links ("link=Firefox") are taken as relative URLs which will randomly work or not work depending on where they're viewed from
- need parser test cases to demo it working
So my questions are:
What would be the recommended way of stripping JavaScript from URIs? Are there any shared functions which do exactly this? And how would I solve the plain-links problem? Do a regex check for an absolute URI, e.g. http://example.org/foo/bar? And what are "parser test cases": PHPUnit tests, or some other form of testing?
Thank you! Kim Eik.
Hi, Kim, and thanks for the patch!
I can answer your last question: https://www.mediawiki.org/wiki/Parser_tests You might also want to skim https://www.mediawiki.org/wiki/Testing_portal and https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture
Thanks!
On Wed, Apr 11, 2012 at 12:27 AM, Kim Eik kim@heldig.org wrote:
I have created a patch for the gallery tag and have been given the following review.
https://gerrit.wikimedia.org/r/4609
- JavaScript injection: you can inject javascript: URIs which execute code when clicked
- plain links ("link=Firefox") are taken as relative URLs which will randomly work or not work depending on where they're viewed from
- need parser test cases to demo it working
So my questions are:
What would be the recommended way of stripping JavaScript from URIs? Are there any shared functions which do exactly this?
You should check to see how the 'link' parameter is handled on standalone images.
In Parser::makeImage() look for the "case 'link':"; it uses existing regexes to check if the link matches allowed URL schemes, and if not tries to treat it as a page title.
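The check Brion points at boils down to a scheme whitelist match. A minimal sketch, with the identifiers as they appear in Parser.php but the surrounding glue code purely illustrative:

```php
// Sketch only: analogous to the 'link' handling in Parser::makeImage().
// $value is the user-supplied link= parameter.
$prots = $this->mUrlProtocols;     // regex alternation built by wfUrlProtocols()
$chars = self::EXT_LINK_URL_CLASS; // character class for external-link URLs
if ( preg_match( "/^($prots)$chars+$/u", $value ) ) {
    // $value begins with an allowed scheme; javascript: is not among
    // $wgUrlProtocols, so javascript: URIs never pass this test.
    $this->mOutput->addExternalLink( $value );
}
```

Because javascript: never appears in $wgUrlProtocols, such URIs fail the match and are never emitted as raw hrefs.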
And how would I solve the plain-links problem? Do a regex check for an absolute URI, e.g. http://example.org/foo/bar? And what are "parser test cases": PHPUnit tests, or some other form of testing?
Parser test cases live in tests/parser/parserTests.txt, and can be run both through the phpunit test suite and through the standalone parserTests.php -- so a parser test failure should trigger a Jenkins test failure.
Each test case specifies input wikitext and output HTML, to confirm that things operate as expected.
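For reference, entries in tests/parser/parserTests.txt follow this shape (this particular test is a generic illustration, not one from the gallery patch):

```
!! test
Simple paragraph
!! input
Hello world
!! result
<p>Hello world
</p>
!! end
```

A test for the gallery change would put the `<gallery>` wikitext in the input section and the parser's expected HTML in the result section.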
-- brion
On 11/04/12 09:27, Kim Eik wrote:
I have created a patch for the gallery tag and have been given the following review.
https://gerrit.wikimedia.org/r/4609
- JavaScript injection: you can inject javascript: URIs which execute code when clicked
- plain links ("link=Firefox") are taken as relative URLs which will randomly work or not work depending on where they're viewed from
<snip>
What would be the recommended way of stripping JavaScript from URIs? Are there any shared functions which do exactly this? And how would I solve the plain-links problem? Do a regex check for an absolute URI, e.g. http://example.org/foo/bar?
I have added some inline comment on includes/parser/Parser.php patch #7
https://gerrit.wikimedia.org/r/#patch,unified,4609,7,includes/parser/Parser....
Copy pasting it here for later reference:
----------------------------------------------------------------------
const EXT_URL_REGEX = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
We would need a parser guru to find a similar but simpler regex. Anyway, you will find hints in includes/parser/Parser.php: wfUrlProtocols() gives a regex of the protocols allowed in URLs.
Parser::EXT_LINK_URL_CLASS is a regex of characters allowed and disallowed. That makes sure you find the end of the URL in various funny cases, such as U+3000, which is an ideographic space used on Chinese wikis.
What you are trying to achieve is really similar to the 'link' parameter handling in Parser::makeImage(). Some relevant code:

case 'link':
	$chars = self::EXT_LINK_URL_CLASS;
	$prots = $this->mUrlProtocols; // which is wfUrlProtocols()
	if ( preg_match( "/^($prots)$chars+$/u", $value, $m ) ) {
		$paramName = 'link-url';
		$this->mOutput->addExternalLink( $value );
		if ( $this->mOptions->getExternalLinkTarget() ) {
			$params[$type]['link-target'] = $this->mOptions->getExternalLinkTarget();
		}

Well, you get the idea :-)
----------------------------------------------------------------------
Reading my text again, I should have proofread that comment before saving it. Anyway, I am pretty sure we can factor out the code handling 'link' for images and reuse it for what you are trying to do.
On Apr 11, 2012 11:01 PM, "Antoine Musso" hashar+wmf@free.fr wrote:
const EXT_URL_REGEX = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
ZOMG. Anyway, what I'd do if I had a MediaWiki clone handy is look through Sanitizer.php to see if there's anything in there that handles URLs.
Roan
On Wed, 11 Apr 2012 23:33:53 -0700, Roan Kattouw roan.kattouw@gmail.com wrote:
On Apr 11, 2012 11:01 PM, "Antoine Musso" hashar+wmf@free.fr wrote:
const EXT_URL_REGEX = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
ZOMG. Anyway, what I'd do if I had a MediaWiki clone handy is look through Sanitizer.php to see if there's anything in there that handles URLs.
Roan
There's always gitweb. For the de-jure standard repo viewer, its URLs and navigation are awful (though that's not git's fault, as you can see from some of the other repo viewers that do it better), but it works. All we've got in Sanitizer is href="" handling in attribute sanitization. https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=includes... https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=includes...
This case should really be handled by checking against wfUrlProtocols. Anything that doesn't match then gets sent through Title::newFromText, and anything that further causes Title to return null/false should be ignored.
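Put together, that flow could look roughly like this. wfUrlProtocols(), Title::newFromText(), and Title::getLocalURL() are real core helpers; the wrapper function itself is hypothetical:

```php
// Hypothetical helper: resolve a gallery link= value to an href, or null.
function resolveGalleryLink( $value ) {
	$prots = wfUrlProtocols(); // regex alternation of allowed URL protocols
	if ( preg_match( "/^(?:$prots)/", $value ) ) {
		return $value; // allowed external scheme; javascript: is not in the list
	}
	$title = Title::newFromText( $value ); // otherwise try it as a page title
	if ( !$title ) {
		return null; // Title rejected it: ignore the parameter entirely
	}
	return $title->getLocalURL(); // root-based URL, so no relative-link lottery
}
```

Using getLocalURL() also fixes the "link=Firefox" problem, since the result no longer depends on where the page is viewed from.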
On 12/04/12 08:33, Roan Kattouw wrote:
<snip some long regex>
ZOMG. Anyway, what I'd do if I had a MediaWiki clone handy is look through Sanitizer.php to see if there's anything in there that handles URLs.
Parser.php already has some code dealing with the link= parameter for image tags. That is the code I copy-pasted above :-D
My latest patch reuses the regexes already defined in Parser.php. Please review at https://gerrit.wikimedia.org/r/#change,4609
On Thu, Apr 12, 2012 at 10:43 AM, Antoine Musso hashar+wmf@free.fr wrote:
On 12/04/12 08:33, Roan Kattouw wrote:
<snip some long regex>
ZOMG. Anyway, what I'd do if I had a MediaWiki clone handy is look through Sanitizer.php to see if there's anything in there that handles URLs.
Parser.php already has some code dealing with the link= parameter for image tags. That is the code I copy-pasted above :-D
-- Antoine "hashar" Musso
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l