I recently tried to modernize an extension [1] to use the Html class and ran into a problem (at least for me). I would like to receive your comments and tips.
In several cases, I had to use Html::rawElement instead of the safer Html::element because of a nested <div> structure I want to generate, like
<div id=outerdiv> outertext-with-&nbsp;-or-something-character
<div id=innerdiv> innertext </div>
</div>
Html::rawElement( 'div',
    array( 'some-outer-attributes' => 'some-outer-attribute-values' ),
    $outertext . Html::element( 'div',
        array( 'some-inner-attributes' => 'some-inner-attribute-values' ),
        $innertext
    )
);
After comparing the Html methods rawElement and element, and after asking around on #mediawiki, I found that I have to escape the content manually and could/should use one of these two possibilities:
i) htmlspecialchars(), as recommended on #mediawiki
ii) The escaping I found inside the Html::element method:
strtr( $contents, array(
    // There's no point in escaping quotes, >, etc. in the contents of
    // elements.
    '&' => '&amp;',
    '<' => '&lt;'
) )
Neither is suited to my case, where $outertext contains the "&nbsp;" entity.
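To illustrate the problem described above, here is a minimal sketch in plain PHP (no MediaWiki needed). The sample string is made up for illustration, and the strtr() call only mirrors the replacement table quoted above rather than calling Html::element itself.

```php
<?php
// Hypothetical sample input containing an entity that should survive.
$outertext = 'outer&nbsp;text';

// Possibility i): htmlspecialchars() with its default double_encode = true.
$viaHtmlspecialchars = htmlspecialchars( $outertext );

// Possibility ii): the same replacement table as quoted from Html::element.
$viaStrtr = strtr( $outertext, array( '&' => '&amp;', '<' => '&lt;' ) );

// Both give 'outer&amp;nbsp;text', so a browser shows the literal
// characters "&nbsp;" instead of a non-breaking space.
echo $viaHtmlspecialchars, "\n", $viaStrtr, "\n";
```

In both cases the leading "&" of the entity is escaped again, which is why the entity does not survive.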
After looking around in the Html and Xml classes, I found that some of the methods use $wgContLang->normalize( $string ), and this works for me, too. I put this into a private wrapper function escapeContent(), which simply calls $wgContLang->normalize() (not shown here):
Html::rawElement( 'div',
    array( 'some-outer-attributes' => 'some-outer-attribute-values' ),
    $wgContLang->normalize( $outertext ) . Html::element( 'div',
        array( 'some-inner-attributes' => 'some-inner-attribute-values' ),
        $innertext
    )
);
I am, however, not happy with that approach, because I do not know whether it is applied correctly.
Therefore my questions to you:
1. Is my approach of applying the Html class and using ->normalize() correct?
2. What could I do better? Should I perhaps apply a certain Sanitizer:: method, or something else?
3. Perhaps I am completely wrong; in that case, please guide me to a correct solution.
I will be available on #mediawiki during the evening hours (UTC+2; Wikinaut )
Have you tried seeing if changing the arguments to htmlspecialchars() will work? Note that htmlspecialchars() takes an argument $double_encode, e.g.,
htmlspecialchars( 'text', ENT_QUOTES, 'UTF-8', false );
When set to false, the function will not encode existing HTML entities in the text. More info: http://php.net/manual/en/function.htmlspecialchars.php
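A small sketch of the difference, in plain PHP; the sample string is invented for illustration:

```php
<?php
// Hypothetical input: a bare ampersand, an existing entity, and markup.
$outertext = 'foo & bar&nbsp;<b>';

// Default behaviour (double_encode = true): the entity is escaped again.
$default = htmlspecialchars( $outertext, ENT_QUOTES, 'UTF-8' );
// 'foo &amp; bar&amp;nbsp;&lt;b&gt;'

// double_encode = false: existing entities are left untouched.
$keepEntities = htmlspecialchars( $outertext, ENT_QUOTES, 'UTF-8', false );
// 'foo &amp; bar&nbsp;&lt;b&gt;'

echo $default, "\n", $keepEntities, "\n";
```

One caveat: with $double_encode set to false, anything in the input that already looks like an entity passes through unchanged, so a user who types "&lt;" gets a "<" displayed rather than the literal text.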
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
On Wed, Jun 5, 2013 at 2:47 PM, Thomas Gries mail@tgries.de wrote:
[1] https://gerrit.wikimedia.org/r/#/c/67002/
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
How about we just introduce an Html::escape for escaping text to be included in HTML chunks (not attributes, just HTML; attributes should always be required to pass through Html::expandAttributes)?
I'm not sure what the relevance of $wgContLang->normalize is. It's not applicable to HTML escaping. Xml.php just uses it to make sure that invalid UTF-8 is not output, by normalizing the binary, likely so that XML processors won't choke.
Now, permitting a "&nbsp;" inside text is done completely differently, and not with htmlspecialchars. We have a method that does exactly this type of escaping, Sanitizer::escapeHtmlAllowEntities; however, that method uses htmlspecialchars to do the escaping (and we can't be sure it would be safe to make it stop escaping quotes, so we can't update it to use Html::escape later). That method is really just a shortcut that calls two other methods. The proper way to permit "&nbsp;" inside text that is going to be escaped is to call `Sanitizer::decodeCharReferences( $text );` first. This decodes the character references into real UTF-8 so they can be passed through safely. Then you just pass the text to the proper HTML escaping method, which in the future would be Html::escape.
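A rough sketch of that decode-then-escape order, written in plain PHP so it runs without MediaWiki. Note the assumptions: html_entity_decode() is only a stand-in for Sanitizer::decodeCharReferences (the two do not behave identically in every edge case), the strtr() call is a stand-in for the proposed Html::escape, which does not exist yet, and the function name escapeContentSketch is hypothetical.

```php
<?php
/**
 * Hypothetical helper: decode character references to real UTF-8 first,
 * then escape the result for use as HTML element content.
 */
function escapeContentSketch( $text ) {
    // Stand-in for Sanitizer::decodeCharReferences( $text ):
    // "&nbsp;" becomes the actual U+00A0 character (bytes C2 A0 in UTF-8).
    $decoded = html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

    // Stand-in for the proposed Html::escape: in element content only
    // "&" and "<" need escaping.
    return strtr( $decoded, array( '&' => '&amp;', '<' => '&lt;' ) );
}

$escaped = escapeContentSketch( 'a&nbsp;<b> & c' );
// The non-breaking space is now a real character; "<" and the bare "&"
// are escaped: "a\xC2\xA0&lt;b> &amp; c"
echo $escaped, "\n";
```

In the extension itself, the result would then be concatenated with the inner Html::element() output and passed to Html::rawElement() as in the original question.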
On 06/05/2013 02:47 PM, Thomas Gries wrote:
> Neither is suited to my case, where $outertext contains the "&nbsp;" entity.
I'm not totally clear what you mean. Is the actual string something like 'foo bar &nbsp;'? If so, where is this string coming from? Ideally, the entity should not be in the string to begin with (escape at the last minute).
Or do you mean there's an actual non-breaking space (what '&nbsp;' expands to) in the string? As far as I know, that doesn't need to be escaped.
> After looking around in class Html and class Xml I found, that some of the methods use $wgContLang->normalize( $string ), and this works for me, too.
As noted, this is not correct. It cleans up Unicode-encoded text. It does not escape HTML.
If you really need to have the string '&nbsp;' in the input, and can't fix that, try Tyler's suggestion.
Matt Flaschen