On Wed, 29 May 2013 18:33:03 -0700, Tyler Romeo tylerromeo@gmail.com wrote:
On Wed, May 29, 2013 at 9:26 PM, Tim Starling tstarling@wikimedia.org wrote:
37% for the larger replacement array in Html::expandAttributes(), or for the smaller one in Html::element()? And what was the test case size: how many replaced bytes compared to non-replaced bytes?
If it was the strtr() in Html::element(), which is the only one which gives a size reduction, perhaps you should compare it against htmlspecialchars($s, ENT_NOQUOTES), which should use the same algorithm as plain htmlspecialchars() but with the same size reduction as strtr().
Ran another test. I tested on the string '<&<&<&herllowodsiojgd<&sd<^<6&&"""' repeated 50 times, and I ran the replacement function 500,000 times. The results were:
htmlspecialchars with ENT_NOQUOTES: 14.025s
htmlspecialchars without ENT_NOQUOTES: 13.457s
strtr: 24.842s
str_replace: 13.184s
Of course, these numbers tend to vary +/- 0.25s every time, so take it with a grain of salt.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
These stats look a little less like "htmlspecialchars is the most efficient, so we should use it" and a little more like "strtr is implemented inefficiently, so we should try one of the other methods of string replacement".
Reading up online:
* http://stackoverflow.com/questions/8177296/when-to-use-strtr-vs-str-replace
* http://micro-optimization.com/strtr-vs-str_replace
* http://comments.gmane.org/gmane.comp.php.devel/77397
I get the impression that:
* strtr iterates over the string and replaces character-by-character, while str_replace applies each pair in order, as if you had called str_replace once per pair, so it only replaces rather than iterating
* strtr can safely do an `a -> b, b -> a` replacement where 'abb' becomes 'baa', while str_replace cannot
* strtr's algorithm may be even slower when the strings to be replaced are of varying sizes
* strtr is going to be faster in PHP 5.4, as they've changed the algorithm it uses
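A quick sketch of the swap case from the second point, using only the built-in strtr() and str_replace(). The variable names are just for illustration.

```php
<?php
// Swap 'a' and 'b'. strtr() never rescans text it has already replaced.
$swap = ['a' => 'b', 'b' => 'a'];
$viaStrtr = strtr('abb', $swap);

// str_replace() applies the pairs one after another, so the second pair
// ('b' => 'a') also rewrites the output of the first pair.
$viaStrReplace = str_replace(array_keys($swap), array_values($swap), 'abb');

var_dump($viaStrtr);      // string(3) "baa"
var_dump($viaStrReplace); // string(3) "aaa"
```

With str_replace the intermediate string is 'bbb' after the first pair, which is why everything ends up as 'a'.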
We aren't doing any replacements that need strtr's guarantee. As long as our `& -> &amp;` replacement is the first replacement in str_replace's array, it should work exactly as we need it to.
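To illustrate why the ampersand pair has to come first, here is a small sketch. The three-pair map is an assumption chosen to mirror htmlspecialchars() with ENT_NOQUOTES; it is not copied from the Html class.

```php
<?php
$s = 'a < b && c';

// '&' first: the later pairs only insert '&lt;' / '&gt;', and the '&' in
// those entities is never revisited by the '&' pair.
$good = str_replace(['&', '<', '>'], ['&amp;', '&lt;', '&gt;'], $s);

// '&' last: the '&' inside the freshly inserted '&lt;' gets escaped again.
$bad = str_replace(['<', '>', '&'], ['&lt;', '&gt;', '&amp;'], $s);

var_dump($good); // "a &lt; b &amp;&amp; c"
var_dump($bad);  // "a &amp;lt; b &amp;&amp; c"
```

With the ampersand first, the result matches what htmlspecialchars($s, ENT_NOQUOTES) gives for this input.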
So it looks like we should just be replacing most of our strtr uses with str_replace instead.
Also, I'd be interested to see those benchmarks re-run on PHP 5.4 now that we know they changed the algorithm.
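For anyone who wants to re-run it, something like the following might do as a rough harness. This is only a sketch: the original test script wasn't posted, so the replacement map, the closure wrappers and the variable names are my assumptions, and the iteration count is lowered from the 500,000 used above so that it finishes quickly.

```php
<?php
// Test input from the thread: the sample string repeated 50 times.
$input = str_repeat('<&<&<&herllowodsiojgd<&sd<^<6&&"""', 50);
// The thread used 500,000 iterations; raise this for real measurements.
$iterations = 10000;

// Assumed replacement map ('&' first so str_replace() does not double-escape).
$map = ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;'];
$search = array_keys($map);
$replace = array_values($map);

$candidates = [
    'htmlspecialchars ENT_NOQUOTES' => function ($s) {
        return htmlspecialchars($s, ENT_NOQUOTES);
    },
    'htmlspecialchars default' => function ($s) {
        return htmlspecialchars($s);
    },
    'strtr' => function ($s) use ($map) {
        return strtr($s, $map);
    },
    'str_replace' => function ($s) use ($search, $replace) {
        return str_replace($search, $replace, $s);
    },
];

// Time each candidate over the same input; the closure call overhead is
// the same for all of them, so the relative ordering is what matters.
$timings = [];
foreach ($candidates as $name => $fn) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    $timings[$name] = microtime(true) - $start;
}

foreach ($timings as $name => $seconds) {
    printf("%-32s %.3fs\n", $name, $seconds);
}
```

Running the same script under PHP 5.3 and 5.4 should show whether the strtr change closes the gap. Note that the default htmlspecialchars call also escapes the double quotes, so its output is larger than that of the other three.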