For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) { action1(); } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { action_not_2(); }
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do { do { if ( $x === 1 ) { action1(); break; } else { action_not_1(); } if ( $x === 2 ) { action2(); break 2; } } while ( false ); action_not_2(); } while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling
On Fri, Jul 30, 2021 at 10:10 PM Tim Starling tstarling@wikimedia.org wrote:
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 comes down to:
if ( $x == 1 ) { action1(); } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { action_not_2(); }
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I feel that some people who have an absolutist stance on this issue are directly or indirectly parroting the essay "Go To Statement Considered Harmful" written by Edsger Dijkstra in 1968 [0][1]. I wonder however how many who do so have read the short essay itself and thought about it in the context of the time it was presented. In my own reading of the essay I find Dijkstra arguing for structure and practices in writing software that I think we all take for granted today.
He is arguing for software to have a written structure which makes it easier to form a mental model of how state will change and execution will flow when the program is executed. To set up his argument, Dijkstra states two 'remarks' from his own experience. I invite you to read his original (dense academic) prose, but I will summarize the premises as:
* A program must fulfill the business requirements to be useful. * Human brains are better at static analysis than dynamic analysis, and therefore code should be written to optimize for understandability under static analysis.
These statements seem generally reasonable to me, and I believe that many "standard practices" are in service of these premises. Unit, end-to-end, and user acceptance testing are all tools to validate fulfillment of business requirements. Our collective bias for smaller functions, smaller classes, and 'separation of concerns', along with linters and static checkers like the one commenting on Tim's gerrit patch, are attempts to increase readability and comprehension of our code. None of these things were in any way widely available in 1968, but they are widely accepted tools and practices today.
Treating the title of the essay as dogma however, I feel misses some of the nuance of the argument. This statement from the essay for me is key: "The unbridled use of the go to statement has as an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress."
Dijkstra is arguing against what I would colloquially call 'spaghetti code'; code where the flow of control jumps around a lot and in the process leaves the reader confused about what is expected to happen and why. The flow of execution becomes tangled much as a plate of cooked noodles dumped from a pot. Finding both ends of a particular noodle with a quick visual inspection is a mental and physical challenge, not a trivial task.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do { do { if ( $x === 1 ) { action1(); break; } else { action_not_1(); } if ( $x === 2 ) { action2(); break 2; } } while ( false ); action_not_2(); } while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
Tim's example here is nice in that it shows how other PHP language constructs could be used, but that these are also not common constructs and that they do not immediately yield a more understandable function.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
I am in agreement with Tim. I do not think that any of us should adopt goto as a commonly used tool. I do however think there are situations where a goto and comments actually do produce understandable code which also conforms to a business requirement of keeping wall clock execution time as small as possible.
Now I guess my question to the group is, how can we describe this nuance as a replacement for statements in current coding conventions like "Do not use the goto() syntax introduced in 5.3. PHP may have introduced the feature, but that does not mean we should use it." [2]? Could it be as simple as stating the bias more like "The use of `goto` should be exceedingly rare, always accompanied by comments explaining why it is used (likely for performance), and the author should be prepared for others to challenge the usage"?
[0]: https://dl.acm.org/doi/10.1145/362929.362947 [1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html [2]: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Other
Bryan
On Sat, Jul 31, 2021 at 6:10 AM Tim Starling tstarling@wikimedia.org wrote:
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
...
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
...
I am requesting that goto be considered acceptable for micro-optimisation.
ha, what question. the single goto and its target are 5 lines apart, even me php incompetent person can understand it.
you triggered me reading more about it though. the commit comment states it takes 30% less instructions: Measuring instruction count per iteration with perf stat, averaged over 10M iterations, PS1. Test case: Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
* Baseline: 11160.7265433 * in_array(): 10390.3837233 * dropDefaults() changes: 9674.1248824 * expandAttributes() misc: 9248.1947500 * implode/explode and space check: 8318.9800417 * Sanitizer inline: 8021.7371794
does this mean these changes bring 30% speed improvement? that is incredible! how often is this part called to retrieve one article?
now i understand why legoktm is prepared to rewrite mediawiki in rust (https://www.mediawiki.org/wiki/Template:User_Rust), and why proposals exist to extend php with rust (https://github.com/rethinkphp/php-rs , https://docs.rs/solder/0.1.6/solder/ ). tempted i was to use legoktm's template on my user page, when i finally saw that php is amazing with regular expressions by including pcre c library: https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/rege... .
rupert
It might be interesting to point out to discussions around use of goto in Linux Kernel: https://koblents.com/Ches/Links/Month-Mar-2013/20-Using-Goto-in-Linux-Kernel... https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-...
IIRC (from a presentation by Greg KH I watched but I can't find it now), in Linux, goto forward is fine to use everywhere but goto backwards is only used in the scheduler. And the scheduler is special (to put it mildly). So I think using goto backwards should be highly discouraged and maybe downright banned but forward should be okay.
HTH
On Sun, Aug 1, 2021 at 8:05 AM rupert THURNER rupert.thurner@gmail.com wrote:
On Sat, Jul 31, 2021 at 6:10 AM Tim Starling tstarling@wikimedia.org wrote:
For performance sensitive tight loops, such as parsing and HTML
construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis. ...
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
...
I am requesting that goto be considered acceptable for
micro-optimisation.
ha, what question. the single goto and its target are 5 lines apart, even me php incompetent person can understand it.
you triggered me reading more about it though. the commit comment states it takes 30% less instructions: Measuring instruction count per iteration with perf stat, averaged over 10M iterations, PS1. Test case: Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
- Baseline: 11160.7265433
- in_array(): 10390.3837233
- dropDefaults() changes: 9674.1248824
- expandAttributes() misc: 9248.1947500
- implode/explode and space check: 8318.9800417
- Sanitizer inline: 8021.7371794
does this mean these changes bring 30% speed improvement? that is incredible! how often is this part called to retrieve one article?
now i understand why legoktm is prepared to rewrite mediawiki in rust (https://www.mediawiki.org/wiki/Template:User_Rust), and why proposals exist to extend php with rust (https://github.com/rethinkphp/php-rs , https://docs.rs/solder/0.1.6/solder/ ). tempted i was to use legoktm's template on my user page, when i finally saw that php is amazing with regular expressions by including pcre c library:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/rege... .
rupert _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
On 1/8/21 4:04 pm, rupert THURNER wrote:
you triggered me reading more about it though. the commit comment states it takes 30% less instructions: Measuring instruction count per iteration with perf stat, averaged over 10M iterations, PS1. Test case: Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
- Baseline: 11160.7265433
- in_array(): 10390.3837233
- dropDefaults() changes: 9674.1248824
- expandAttributes() misc: 9248.1947500
- implode/explode and space check: 8318.9800417
- Sanitizer inline: 8021.7371794
does this mean these changes bring 30% speed improvement? that is incredible!
Well, 30% reduction in instruction count. Time reduction is about 25%, although you can take the reciprocal of that (1/0.75) and call it 34% speed improvement.
I used instruction count rather than time because you can get 4-5 significant figures of accuracy, i.e. the first 4-5 digits stay the same between runs, despite background activity, so you can measure small changes very accurately.
how often is this part called to retrieve one article?
Errr... let's just say there's no need to name a second day after me. It's a small change.
The broader context is T284274 https://phabricator.wikimedia.org/T284274-- I'm trying to make sure you can view the history page with limit=5000 without seeing a timeout. My change to Thanks probably cut render time for big history pages by 50% -- that's how much the 95th and 99th percentile service times dropped by. Html::openElement() is a smaller piece of a smaller piece of the puzzle.
-- Tim Starling
I notice that sonarqubebot complains about the goto, and the reviewer suggests adding "// NOSONAR" to suppress that. I suggest we retain that behavior (frankly I'm not familiar enough with sonarqubebot to know if we could globally suppress it if we wanted to...)
I don't object to a (very) limited use of goto in (very) strategic situations. But IMO it should be unusual enough that forcing the developer to add a comment to explicitly suppress the gripe is reasonable.
Bill Pirkle Software Engineer www.wikimediafoundation.org
On Sun, Aug 1, 2021 at 5:37 AM Tim Starling tstarling@wikimedia.org wrote:
On 1/8/21 4:04 pm, rupert THURNER wrote:
you triggered me reading more about it though. the commit comment states it takes 30% less instructions: Measuring instruction count per iteration with perf stat, averaged over 10M iterations, PS1. Test case: Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
- Baseline: 11160.7265433
- in_array(): 10390.3837233
- dropDefaults() changes: 9674.1248824
- expandAttributes() misc: 9248.1947500
- implode/explode and space check: 8318.9800417
- Sanitizer inline: 8021.7371794
does this mean these changes bring 30% speed improvement? that is incredible!
Well, 30% reduction in instruction count. Time reduction is about 25%, although you can take the reciprocal of that (1/0.75) and call it 34% speed improvement.
I used instruction count rather than time because you can get 4-5 significant figures of accuracy, i.e. the first 4-5 digits stay the same between runs, despite background activity, so you can measure small changes very accurately.
how often is this part called to retrieve one article?
Errr... let's just say there's no need to name a second day after me. It's a small change.
The broader context is T284274 https://phabricator.wikimedia.org/T284274-- I'm trying to make sure you can view the history page with limit=5000 without seeing a timeout. My change to Thanks probably cut render time for big history pages by 50% -- that's how much the 95th and 99th percentile service times dropped by. Html::openElement() is a smaller piece of a smaller piece of the puzzle.
-- Tim Starling _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
On 7/31/21 6:09 AM, Tim Starling wrote:
Certain flow control patterns cannot be implemented efficiently in
PHP without using "goto". The current example in Gerrit 708880 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
If `goto` really does help, please go to it with no objections from me!
But could you split the guard logic in your example, like this? The `||` operator will short-circuit away the expensive test for "is 2" in the case of "is 1".
$state_1 = ($x == 1); $state_2 = !($state_1 || $x != 2);
if ( $state_1 ) { Â Â Â action1(); } else { Â Â Â action_not_1(); } if ( $state_2 ) { Â Â Â action2(); } else { Â Â Â action_not_2(); }
-Adam W.
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) { action1(); } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { action_not_2(); }
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do { do { if ( $x === 1 ) { action1(); break; } else { action_not_1(); } if ( $x === 2 ) { action2(); break 2; } } while ( false ); action_not_2(); } while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Along that same line of thinking, see if the data can be morphed outside of any tight loops so that inside the loop, you can check for empty(), isset(), or === null. From what I’ve read, those are a lot faster in PHP than comparing with an untyped variable.
From: Adam Wight adam.wight@wikimedia.de Sent: August 2, 2021 4:55 AM To: wikitech-l@lists.wikimedia.org Subject: [Wikitech-l] Re: Goto for microoptimisation
On 7/31/21 6:09 AM, Tim Starling wrote:
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
If `goto` really does help, please go to it with no objections from me!
But could you split the guard logic in your example, like this? The `||` operator will short-circuit away the expensive test for "is 2" in the case of "is 1".
$state_1 = ($x == 1); $state_2 = !($state_1 || $x != 2);
if ( $state_1 ) { action1(); } else { action_not_1(); } if ( $state_2 ) { action2(); } else { action_not_2(); }
-Adam W.
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) {
action1();
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
action_not_2();
}
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) {
action1();
goto not_2; // avoid unnecessary comparison $x == 2
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
not_2:
action_not_2();
}
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do {
do {
if ( $x === 1 ) {
action1();
break;
} else {
action_not_1();
}
if ( $x === 2 ) {
action2();
break 2;
}
} while ( false );
action_not_2();
} while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.orgmailto:wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.orgmailto:wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
For the record, I merged Tim's patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/ last week and was unaware of this email thread.
My thinking was as follows:
1. The implementation does not depend on the goto statement.
That is, it is not used to write overly-clever or complicated logic. If you remove the goto statement, the method behaves the exact same way. And thus the moment it ceases to serve its use (performance optimisation) it can be safely removed without further thought. think this is essential to keeping the code easy to understand and maintain. This one principle actually covers it all for me. The next three points are implied by this: 1a. This use of goto only jumps downward. Jumping backwards (up) would likely violate point 1, and either way would imho be too complicated to think about when debugging or maintaining the code in the future. Especially the potential for an infinite loop. 1b. This use of goto only jumps to a statement within the same function. (In fact, jumping to another file, class, or function is not supported by PHP in the first place. This is literally the only way it can be used. There is some sanity in the language after all!). 1c. This use of goto serves as a performance optimisation for a hot code path. Similarly implied by point 1: If it doesn't change behaviour and doesn't improve performance where it matters, we shouldn't bother using it.
2. An inline comment clearly stays it is a performance optimisation, and explains why it is safe, and how we know the destination is where we would end up regardless. (e.g. "we're inside a condition for X2, so we can skip to the else of X1, and no other statements would run between here and there").
-- Timo
On Sat, 31 Jul 2021 at 05:10, Tim Starling tstarling@wikimedia.org wrote:
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) { action1(); } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { action_not_2(); }
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do { do { if ( $x === 1 ) { action1(); break; } else { action_not_1(); } if ( $x === 2 ) { action2(); break 2; } } while ( false ); action_not_2(); } while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Just to play the devils advocate. I want to point out some example risks (which were of course included in the discussion in the earlier linked literature):
a) someone will introduce
if ($x==1.5)
in the middle or
b) someone will make action1 modifying on $x which might result in $x==2
the behavior would change.
However it is likely that those mistakes will be discover during code review, if the function is not too big and Gerrit hides the line with the goto statement. Thus jumping many lines is more dangerous than jumping only a few lines.
After all, I still think using goto is a good idea now until the php (8+$x) jit compiler might do the job for us in the future.
All the best Physikerwelt
On Sat, 14 Aug 2021, 05:16 Krinkle, krinklemail@gmail.com wrote:
For the record, I merged Tim's patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/ last week and was unaware of this email thread.
My thinking was as follows:
- The implementation does not depend on the goto statement.
That is, it is not used to write overly-clever or complicated logic. If you remove the goto statement, the method behaves the exact same way. And thus the moment it ceases to serve its use (performance optimisation) it can be safely removed without further thought. think this is essential to keeping the code easy to understand and maintain. This one principle actually covers it all for me. The next three points are implied by this: 1a. This use of goto only jumps downward. Jumping backwards (up) would likely violate point 1, and either way would imho be too complicated to think about when debugging or maintaining the code in the future. Especially the potential for an infinite loop. 1b. This use of goto only jumps to a statement within the same function. (In fact, jumping to another file, class, or function is not supported by PHP in the first place. This is literally the only way it can be used. There is some sanity in the language after all!). 1c. This use of goto serves as a performance optimisation for a hot code path. Similarly implied by point 1: If it doesn't change behaviour and doesn't improve performance where it matters, we shouldn't bother using it.
- An inline comment clearly stays it is a performance optimisation, and
explains why it is safe, and how we know the destination is where we would end up regardless. (e.g. "we're inside a condition for X2, so we can skip to the else of X1, and no other statements would run between here and there").
-- Timo
On Sat, 31 Jul 2021 at 05:10, Tim Starling tstarling@wikimedia.org wrote:
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) { action1(); } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { action_not_2(); }
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) { action1(); goto not_2; // avoid unnecessary comparison $x == 2 } else { action_not_1(); } if ( $x == 2 ) { action2(); } else { not_2: action_not_2(); }
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do { do { if ( $x === 1 ) { action1(); break; } else { action_not_1(); } if ( $x === 2 ) { action2(); break 2; } } while ( false ); action_not_2(); } while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Just a minor point about the code: in expandAtributes(), $spaceSeparatedListAttributes should be initialized outside the loop or as a private static.
From: Krinkle krinklemail@gmail.com Sent: August 13, 2021 11:15 PM To: For developers discussing technical aspects and organization of Wikimedia projects wikitech-l@lists.wikimedia.org Subject: [Wikitech-l] Re: Goto for microoptimisation
For the record, I merged Tim's patchhttps://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/ last week and was unaware of this email thread.
My thinking was as follows:
1. The implementation does not depend on the goto statement.
That is, it is not used to write overly-clever or complicated logic. If you remove the goto statement, the method behaves the exact same way. And thus the moment it ceases to serve its use (performance optimisation) it can be safely removed without further thought. think this is essential to keeping the code easy to understand and maintain. This one principle actually covers it all for me. The next three points are implied by this: 1a. This use of goto only jumps downward. Jumping backwards (up) would likely violate point 1, and either way would imho be too complicated to think about when debugging or maintaining the code in the future. Especially the potential for an infinite loop. 1b. This use of goto only jumps to a statement within the same function. (In fact, jumping to another file, class, or function is not supported by PHP in the first place. This is literally the only way it can be used. There is some sanity in the language after all!). 1c. This use of goto serves as a performance optimisation for a hot code path. Similarly implied by point 1: If it doesn't change behaviour and doesn't improve performance where it matters, we shouldn't bother using it.
2. An inline comment clearly stays it is a performance optimisation, and explains why it is safe, and how we know the destination is where we would end up regardless. (e.g. "we're inside a condition for X2, so we can skip to the else of X1, and no other statements would run between here and there").
-- Timo
On Sat, 31 Jul 2021 at 05:10, Tim Starling <tstarling@wikimedia.orgmailto:tstarling@wikimedia.org> wrote:
For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545 comes down to:
if ( $x == 1 ) {
action1();
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
action_not_2();
}
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
I am proposing
if ( $x == 1 ) {
action1();
goto not_2; // avoid unnecessary comparison $x == 2
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
not_2:
action_not_2();
}
I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do {
do {
if ( $x === 1 ) {
action1();
break;
} else {
action_not_1();
}
if ( $x === 2 ) {
action2();
break 2;
}
} while ( false );
action_not_2();
} while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
-- Tim Starling _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.orgmailto:wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.orgmailto:wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
wikitech-l@lists.wikimedia.org