On Thu, Jul 31, 2008 at 11:10 PM, Daniel Friesen <dan_the_man@telus.net> wrote:
We're escaping for content, not escaping for attributes (attribute escaping should be handled by different code). So does anyone remember the parameters of htmlspecialchars? http://ca.php.net/htmlspecialchars
string htmlspecialchars ( string $string [, int $quote_style [, string $charset [, bool $double_encode ]]] ) ($charset since 4.1.0; $double_encode since 5.2.3)
You know that you can use: $text = htmlspecialchars( $text, ENT_NOQUOTES );
And the quotes won't be encoded.
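For anyone not fluent in PHP, here is a rough Python sketch of what the $quote_style flag changes. It is a hypothetical port, not PHP's implementation: the constant names mirror PHP's ENT_COMPAT, ENT_QUOTES and ENT_NOQUOTES, and the &#039; spelling for single quotes is an assumption based on older PHP output. Note that ENT_NOQUOTES still escapes &, < and >.

```python
# Hypothetical Python port of htmlspecialchars' $quote_style handling.
ENT_COMPAT = "compat"      # escape " only
ENT_QUOTES = "quotes"      # escape both " and '
ENT_NOQUOTES = "noquotes"  # escape neither quote

def htmlspecialchars(text: str, quote_style: str = ENT_COMPAT) -> str:
    # & goes first so the entities added afterwards are not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote_style in (ENT_COMPAT, ENT_QUOTES):
        text = text.replace('"', "&quot;")
    if quote_style == ENT_QUOTES:
        text = text.replace("'", "&#039;")
    return text

if __name__ == "__main__":
    print(htmlspecialchars("a > b & \"c\" 'd'", ENT_NOQUOTES))
    # a &gt; b &amp; "c" 'd'
```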
Yes, but something like
html > body { color: red; }
will still break. You miss the point, I think. *Nothing* should be encoded inside <script> or <style>, if you want to remain compatible with HTML.
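The reason is that <script> and <style> are raw text elements: the browser does not decode entities inside them, so any escaping changes the code itself. A minimal Python sketch using the standard library's html.escape (standing in for htmlspecialchars) shows the selector being mangled. The check_raw_text helper is hypothetical and shows one alternative: pass the content through untouched and reject anything containing "</style" (or "</script"), which is what could end the element early.

```python
import html
import re

css = "html > body { color: red; }"

# Escaping turns the child combinator into an entity. Inside <style> the
# browser reads "&gt;" literally, so the rule no longer matches html > body.
escaped = html.escape(css, quote=False)

def check_raw_text(content: str, tag: str) -> str:
    """Hypothetical helper: return raw-text content unchanged, but refuse
    anything containing '</tag' (case-insensitive), which could close it."""
    if re.search(r"</" + re.escape(tag), content, re.IGNORECASE):
        raise ValueError(f"content could close <{tag}> early")
    return content

if __name__ == "__main__":
    print(escaped)                       # html &gt; body { color: red; }
    print(check_raw_text(css, "style"))  # html > body { color: red; }
```

Rejecting every "</style" is slightly stricter than the HTML parser requires, which is usually the safer trade-off for a sanitizer.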
Though personally... when I write a sanitizer, I go for what it's meant to do. Things like my cleanHtml are meant to make things safe, not to escape things.
They're meant to make things not just safe but valid. This requires escaping everything that has a special meaning.
So on that, my sanitizers only convert < and > into &lt; and &gt;. They don't do any other encoding, and they don't double-encode the entities for <>. Because the point is to make the syntax so that it won't be considered evil HTML, and only <> needs to be escaped for that purpose.
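The angle-brackets-only approach described above can be sketched in Python as follows (the function name is hypothetical). Because & is never touched, existing &lt;/&gt; entities pass through without double encoding, but so does any bare &, which is a well-formedness error in XHTML.

```python
def angle_only_escape(text: str) -> str:
    """Hypothetical sketch: escape only < and >, leaving & (and therefore
    any entities already in the text) untouched."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

if __name__ == "__main__":
    print(angle_only_escape("<b>AT&T</b> &lt;"))
    # &lt;b&gt;AT&T&lt;/b&gt; &lt;
```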
Quotes also need to be escaped if there's any possibility you'd be in an attribute. And & must always be escaped for normal HTML output if you want to ensure validity, which we do.
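The stricter rule in this reply can be sketched as a small Python function (the name and the in_attribute flag are hypothetical): & is always escaped, and escaped first, so entities added afterwards are not mangled; quotes are escaped only when the text may end up in an attribute value. Python's own html.escape(s, quote=True) does essentially the same thing, spelling the single quote as &#x27;.

```python
def escape_html(text: str, in_attribute: bool = False) -> str:
    """Hypothetical sketch: always escape &, < and >; also escape quotes
    when the text may be placed inside an attribute value."""
    # & must be first; otherwise "&lt;" produced below would become "&amp;lt;".
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if in_attribute:
        text = text.replace('"', "&quot;").replace("'", "&#39;")
    return text

if __name__ == "__main__":
    print(escape_html("AT&T <b>"))                      # AT&amp;T &lt;b&gt;
    print(escape_html('say "hi"', in_attribute=True))   # say &quot;hi&quot;
```

Unlike the angle-brackets-only version, this does re-encode text that already contains entities (&lt; becomes &amp;lt;), which is what you want when the input is plain text rather than pre-escaped markup.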