So I ran a brief benchmark on my vagrant instance recently (nothing fancy, just 50,000 iterations of a single line of code), and I found that htmlspecialchars() performs *significantly* faster than strtr() (a difference of like 37%).
Html::element (and other places in the Html class) prefer using strtr() with a manual list of characters rather than htmlspecialchars(). The reasoning behind this (I think) is to make the output document slightly smaller by a few bytes by not escaping unnecessary characters.
So my question is whether the byte reduction is really worth it, or whether we would rather have a 37% reduction in escaping time?
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
On Wed, May 29, 2013 at 10:21 AM, Tyler Romeo tylerromeo@gmail.com wrote:
> So I ran a brief benchmark on my vagrant instance recently (nothing fancy, just 50,000 iterations of a single line of code), and I found that htmlspecialchars() performs *significantly* faster than strtr() (a difference of like 37%).
> Html::element (and other places in the Html class) prefer using strtr() with a manual list of characters rather than htmlspecialchars(). The reasoning behind this (I think) is to make the output document slightly smaller by a few bytes by not escaping unnecessary characters.
> So my question is whether the byte reduction is really worth it, or whether we would rather have a 37% reduction in escaping time?
Sounds like you're comparing apples to oranges here. What happens to the speed when you change Html::element() and friends? To (average) page size?
-Chad
On Wed, May 29, 2013 at 10:30 AM, Chad innocentkiller@gmail.com wrote:
> Sounds like you're comparing apples to oranges here. What happens to the speed when you change Html::element() and friends? To (average) page size?
It seems to reduce the loading time by about half a millisecond (kind of insignificant), and I have no idea what the page size change is.
-- Tyler Romeo
On 29/05/13 16:21, Tyler Romeo wrote:
> So I ran a brief benchmark on my vagrant instance recently (nothing fancy, just 50,000 iterations of a single line of code), and I found that htmlspecialchars() performs *significantly* faster than strtr() (a difference of like 37%).
> Html::element (and other places in the Html class) prefer using strtr() with a manual list of characters rather than htmlspecialchars(). The reasoning behind this (I think) is to make the output document slightly smaller by a few bytes by not escaping unnecessary characters.
> So my question is whether the byte reduction is really worth it, or whether we would rather have a 37% reduction in escaping time?
I wrote a dumb benchmarking class a while ago under maintenance/benchmarks; feel free to add one there.
The original code seems to be by Aryeh in 0120d492b, and it indeed mentions that "we" liked the size difference.
I would go for speed; that might also make the code simpler.
On Wed, May 29, 2013 at 1:39 PM, Antoine Musso hashar+wmf@free.fr wrote:
> I wrote a dumb benchmarking class a while ago under maintenance/benchmarks; feel free to add one there.
> The original code seems to be by Aryeh in 0120d492b, and it indeed mentions that "we" liked the size difference.
> I would go for speed; that might also make the code simpler.
Perhaps. But if it's only marginally faster (like, half a ms), then the reduced output probably saves more in the long run. But without actual numbers we're really just guessing at possible micro-optimizations.
-Chad
On Wed, May 29, 2013 at 1:43 PM, Chad innocentkiller@gmail.com wrote:
> Perhaps. But if it's only marginally faster (like, half a ms), then the reduced output probably saves more in the long run. But without actual numbers we're really just guessing at possible micro-optimizations.
Very true. I shall delve deeper into benchmarking and see if there is any difference. At the very least, htmlspecialchars() is simpler and makes the code easier to understand.
-- Tyler Romeo
On 30/05/13 00:21, Tyler Romeo wrote:
> So I ran a brief benchmark on my vagrant instance recently (nothing fancy, just 50,000 iterations of a single line of code), and I found that htmlspecialchars() performs *significantly* faster than strtr() (a difference of like 37%).
37% for the larger replacement array in Html::expandAttributes(), or for the smaller one in Html::element()? And what was the test case size: how many replaced bytes compared to non-replaced bytes?
If it was the strtr() in Html::element(), which is the only one which gives a size reduction, perhaps you should compare it against htmlspecialchars($s, ENT_NOQUOTES), which should use the same algorithm as plain htmlspecialchars() but with the same size reduction as strtr().
-- Tim Starling
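To make the output-size side of that suggestion concrete, here is a minimal sketch comparing the escaping variants on one short input. The three-entry strtr() list is a hypothetical one chosen to mirror ENT_NOQUOTES; the actual list used in Html::element() is not shown in this thread and may differ. ENT_COMPAT is passed explicitly because the default flags of htmlspecialchars() differ between PHP versions.

```php
<?php
$s = 'a < b && c > "d"';

// Escapes &, <, > and double quotes.
$compat = htmlspecialchars( $s, ENT_COMPAT );

// Escapes &, < and >, but leaves quotes untouched.
$noQuotes = htmlspecialchars( $s, ENT_NOQUOTES );

// Hypothetical manual list (an assumption, not MediaWiki's actual list).
$manual = strtr( $s, array( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' ) );

$results = array(
	'ENT_COMPAT' => $compat,
	'ENT_NOQUOTES' => $noQuotes,
	'strtr' => $manual,
);
foreach ( $results as $label => $out ) {
	// Print the byte count next to each escaped string.
	printf( "%-12s %2d bytes  %s\n", $label, strlen( $out ), $out );
}
```

With this input, the ENT_NOQUOTES and manual-list outputs are byte-for-byte the same, and both are shorter than the ENT_COMPAT output because the two double quotes are left unescaped.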
On Wed, May 29, 2013 at 9:26 PM, Tim Starling tstarling@wikimedia.org wrote:
> 37% for the larger replacement array in Html::expandAttributes(), or for the smaller one in Html::element()? And what was the test case size: how many replaced bytes compared to non-replaced bytes?
> If it was the strtr() in Html::element(), which is the only one which gives a size reduction, perhaps you should compare it against htmlspecialchars($s, ENT_NOQUOTES), which should use the same algorithm as plain htmlspecialchars() but with the same size reduction as strtr().
Ran another test. I tested on the string '<&<&<&herllowodsiojgd<&sd<^<6&&"""' repeated 50 times, and I ran the replacement function 500,000 times. The results were:
htmlspecialchars with ENT_NOQUOTES: 14.025s
htmlspecialchars without ENT_NOQUOTES: 13.457s
strtr: 24.842s
str_replace: 13.184s
Of course, these numbers tend to vary +/- 0.25s every time, so take it with a grain of salt.
-- Tyler Romeo
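For anyone wanting to reproduce a comparison like the one above, the following is a rough sketch of such a micro-benchmark, not the script that was actually run. The helper name timeEscaper(), the replacement pairs, and the reduced iteration count are all assumptions; the thread does not show the exact arrays passed to strtr() and str_replace(). It also measures wall-clock time with microtime(), so absolute numbers will not be directly comparable to the figures quoted here.

```php
<?php
// Hypothetical helper: run $fn on $input $iterations times, return elapsed seconds.
function timeEscaper( $fn, $input, $iterations ) {
	$start = microtime( true );
	for ( $i = 0; $i < $iterations; $i++ ) {
		$fn( $input );
	}
	return microtime( true ) - $start;
}

// Test string from the thread, repeated 50 times.
$input = str_repeat( '<&<&<&herllowodsiojgd<&sd<^<6&&"""', 50 );
$iterations = 10000; // the thread used 500,000; reduced here to keep the run short

// Assumed replacement pairs; '&' comes first so str_replace() does not double-escape.
$pairs = array( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' );
$search = array_keys( $pairs );
$replace = array_values( $pairs );

$candidates = array(
	'htmlspecialchars with ENT_NOQUOTES' => function ( $s ) {
		return htmlspecialchars( $s, ENT_NOQUOTES );
	},
	'htmlspecialchars without ENT_NOQUOTES' => function ( $s ) {
		return htmlspecialchars( $s, ENT_COMPAT );
	},
	'strtr' => function ( $s ) use ( $pairs ) {
		return strtr( $s, $pairs );
	},
	'str_replace' => function ( $s ) use ( $search, $replace ) {
		return str_replace( $search, $replace, $s );
	},
);

$timings = array();
foreach ( $candidates as $label => $fn ) {
	$timings[$label] = timeEscaper( $fn, $input, $iterations );
	printf( "%s: %.3fs\n", $label, $timings[$label] );
}
```

Running each candidate in the same process on the same input keeps the comparison relative; repeating the whole run several times gives a feel for the run-to-run variation.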
On Wed, 29 May 2013 18:33:03 -0700, Tyler Romeo tylerromeo@gmail.com wrote:
> Ran another test. I tested on the string '<&<&<&herllowodsiojgd<&sd<^<6&&"""' repeated 50 times, and I ran the replacement function 500,000 times. The results were:
> htmlspecialchars with ENT_NOQUOTES: 14.025s
> htmlspecialchars without ENT_NOQUOTES: 13.457s
> strtr: 24.842s
> str_replace: 13.184s
> Of course, these numbers tend to vary +/- 0.25s every time, so take it with a grain of salt.
These stats look a little less like "htmlspecialchars is the most efficient, so we should use it" and a little more like "strtr is implemented inefficiently, so we should try one of the other methods of string replacement".
Reading up online:
* http://stackoverflow.com/questions/8177296/when-to-use-strtr-vs-str-replace
* http://micro-optimization.com/strtr-vs-str_replace
* http://comments.gmane.org/gmane.comp.php.devel/77397
I get the impression that:
* strtr iterates and replaces character by character, while str_replace replaces each pair in order, as if you called str_replace multiple times, just replacing rather than iterating
* strtr can safely do an `a -> b, b -> a` replacement where 'abb' becomes 'baa', while str_replace cannot
* strtr's algorithm may be even slower when the strings to be replaced are of varying sizes
* strtr is going to be faster in PHP 5.4, as they've changed the algorithm it uses
We aren't doing any replacements that need strtr's guarantee. As long as our & -> &amp; replacement is the first replacement in str_replace's array, it should work exactly as we need it.
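A small sketch of the ordering point, using made-up inputs: strtr() with an array does not re-scan text it has already replaced, while str_replace() applies each pair in turn to the result of the previous one. The variable names are illustrative only.

```php
<?php
// strtr() handles a swap correctly because replaced text is never re-scanned.
$swapped = strtr( 'abb', array( 'a' => 'b', 'b' => 'a' ) );

// str_replace() chains the pairs: 'abb' -> 'bbb' -> 'aaa'.
$chained = str_replace( array( 'a', 'b' ), array( 'b', 'a' ), 'abb' );

// With '&' first, the entities added by later pairs are not touched again.
$ampFirst = str_replace(
	array( '&', '<' ),
	array( '&amp;', '&lt;' ),
	'a < b & c'
);

// With '&' last, the '&' in the freshly inserted '&lt;' gets escaped a second time.
$ampLast = str_replace(
	array( '<', '&' ),
	array( '&lt;', '&amp;' ),
	'a < b & c'
);

var_dump( $swapped, $chained, $ampFirst, $ampLast );
```

Here $ampFirst matches what strtr() would produce for the same pairs, whereas $ampLast contains a double-escaped `&amp;lt;`.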
So it looks like we should just be replacing most of our strtr uses with str_replace instead.
Also, I'd be interested to see those benchmarks re-run on PHP 5.4 now that we know that they changed the algorithm.
On Wed, May 29, 2013 at 10:22 PM, Daniel Friesen <daniel@nadir-seen-fire.com> wrote:
> Also, I'd be interested to see those benchmarks re-run on PHP 5.4 now that we know that they changed the algorithm.
On PHP 5.4:
htmlspecialchars with ENT_NOQUOTES: 8.548s
htmlspecialchars without ENT_NOQUOTES: 8.655s
strtr: 18.012s
str_replace: 9.657s
-- Tyler Romeo
On 30/05/13 11:33, Tyler Romeo wrote:
> Ran another test. I tested on the string '<&<&<&herllowodsiojgd<&sd<^<6&&"""' repeated 50 times, and I ran the replacement function 500,000 times. The results were:
> htmlspecialchars with ENT_NOQUOTES: 14.025s
> htmlspecialchars without ENT_NOQUOTES: 13.457s
> strtr: 24.842s
> str_replace: 13.184s
> Of course, these numbers tend to vary +/- 0.25s every time, so take it with a grain of salt.
With the PHP package used at Wikimedia (5.3.10-1ubuntu3.6+wmf1), with "taskset 1 nice -n-10", I get:
htmlspecialchars with ENT_NOQUOTES: 11.8s
htmlspecialchars without ENT_NOQUOTES: 12.0s
strtr: 24.8s
str_replace: 12.9s
On 30/05/13 12:22, Daniel Friesen wrote:
> - strtr is going to be faster in PHP 5.4 as they've changed the algorithm it uses
In 5.4, the strtr() hashtable implementation has been optimised, but it's still a hashtable. In htmlspecialchars(), the lookup table is just a plain C array:
    static inline void find_entity_for_char_basic(
        unsigned int k,
        const entity_stage3_row *table,
        const unsigned char **entity,
        size_t *entity_len)
    {
        if (k >= 64U) {
            *entity = NULL;
            *entity_len = 0;
            return;
        }

        *entity = table[k].data.ent.entity;
        *entity_len = table[k].data.ent.entity_len;
    }
It's hard to beat that.
-- Tim Starling
On 30/05/13 06:34, Tim Starling wrote:
> With the PHP package used at Wikimedia (5.3.10-1ubuntu3.6+wmf1), with "taskset 1 nice -n-10", I get:
Added that to a maintenance/benchmarks/README file:
https://gerrit.wikimedia.org/r/66066
For the record, the times I mentioned above were user time and not real time, so changing the priority or affinity of the process wouldn't really affect it that much.
-- Tyler Romeo
On Thu, May 30, 2013 at 3:41 AM, Antoine Musso hashar+wmf@free.fr wrote:
> Added that to a maintenance/benchmarks/README file:
> https://gerrit.wikimedia.org/r/66066
> -- Antoine "hashar" Musso
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l