On Fri, Jul 30, 2021 at 10:10 PM Tim Starling <tstarling@wikimedia.org> wrote:
For performance-sensitive tight loops, such as parsing and HTML construction, getting the best performance requires thinking about what PHP is doing on an opcode-by-opcode basis.
Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 comes down to:
if ( $x == 1 ) {
    action1();
} else {
    action_not_1();
}
if ( $x == 2 ) {
    action2();
} else {
    action_not_2();
}
If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
It's not feasible to just duplicate the actions: they are not as simple as portrayed here, and splitting them out into a separate function would incur function call overhead exceeding the proposed benefit.
I am proposing:
if ( $x == 1 ) {
    action1();
    goto not_2; // avoid unnecessary comparison $x == 2
} else {
    action_not_1();
}
if ( $x == 2 ) {
    action2();
} else {
    not_2:
    action_not_2();
}
I'm familiar with the cultivated distaste for goto. Some people are just parroting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.
I feel that some people who take an absolutist stance on this issue are directly or indirectly parroting the essay "Go To Statement Considered Harmful", written by Edsger Dijkstra in 1968 [0][1]. I wonder, however, how many of them have read the short essay itself and considered it in the context of the time in which it was presented. In my own reading of the essay, I find Dijkstra arguing for structure and practices in writing software that I think we all take for granted today.
He is arguing for software to have a written structure which makes it easier to form a mental model of how state will change and execution will flow when the program is executed. To set up his argument, Dijkstra states two 'remarks' from his own experience. I invite you to read his original (dense academic) prose, but I will summarize the premises as:
* A program must fulfill the business requirements to be useful.
* Human brains are better at static analysis than dynamic analysis, and therefore code should be written to optimize for understandability under static analysis.
These statements seem generally reasonable to me, and I believe that many "standard practices" are in service of these premises. Unit, end-to-end, and user acceptance testing are all tools to validate fulfillment of business requirements. Our collective bias for smaller functions, smaller classes, and 'separation of concerns', along with linters and static checkers like the one commenting on Tim's Gerrit patch, are attempts to increase readability and comprehension of our code. None of these were widely available in 1968, but they are widely accepted tools and practices today.
Treating the title of the essay as dogma, however, misses some of the nuance of the argument, I feel. For me, this statement from the essay is the key one: "The unbridled use of the go to statement has as an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress."
Dijkstra is arguing against what I would colloquially call 'spaghetti code': code where the flow of control jumps around a lot and, in the process, leaves the reader confused about what is expected to happen and why. The flow of execution becomes as tangled as a plate of cooked noodles dumped from a pot. Finding both ends of a particular noodle with a quick visual inspection is a mental and physical challenge, not a trivial task.
I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
do {
    do {
        if ( $x === 1 ) {
            action1();
            break;
        } else {
            action_not_1();
        }
        if ( $x === 2 ) {
            action2();
            break 2;
        }
    } while ( false );
    action_not_2();
} while ( false );
But I don't think that's an improvement for readability.
You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.
Tim's example here is nice in that it shows how other PHP language constructs could be used, but also that these constructs are uncommon and do not immediately yield a more understandable function.
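For anyone who wants to convince themselves that the goto and break variants really are drop-in replacements for the plain two-if version, here is a quick harness. It is only a sketch: the function names viaIfs(), viaGoto() and viaBreak() are my own, and the real actions from Gerrit 708880 are replaced by hypothetical stubs that just record their own name, so the variants can be compared by which actions ran for a given $x.

```php
<?php
// Sketch only: action1(), action_not_1(), etc. are replaced by hypothetical
// stubs that append their name to $log, so the variants compare by output.

function viaIfs( int $x ): array {
    $log = [];
    if ( $x == 1 ) {
        $log[] = 'action1';
    } else {
        $log[] = 'action_not_1';
    }
    if ( $x == 2 ) {
        $log[] = 'action2';
    } else {
        $log[] = 'action_not_2';
    }
    return $log;
}

function viaGoto( int $x ): array {
    $log = [];
    if ( $x == 1 ) {
        $log[] = 'action1';
        goto not_2; // $x == 1, so $x == 2 cannot hold; skip that comparison
    } else {
        $log[] = 'action_not_1';
    }
    if ( $x == 2 ) {
        $log[] = 'action2';
    } else {
        not_2: // jumping into an if/else block is allowed; only loops/switch are not
        $log[] = 'action_not_2';
    }
    return $log;
}

function viaBreak( int $x ): array {
    $log = [];
    do {
        do {
            if ( $x === 1 ) {
                $log[] = 'action1';
                break; // leave the inner loop, fall through to action_not_2
            } else {
                $log[] = 'action_not_1';
            }
            if ( $x === 2 ) {
                $log[] = 'action2';
                break 2; // leave both loops, skipping action_not_2
            }
        } while ( false );
        $log[] = 'action_not_2';
    } while ( false );
    return $log;
}

// Compare all three variants for a few representative values of $x.
foreach ( [ 0, 1, 2, 3 ] as $x ) {
    $expected = viaIfs( $x );
    if ( viaGoto( $x ) !== $expected || viaBreak( $x ) !== $expected ) {
        throw new RuntimeException( "Variants disagree for \$x = $x" );
    }
}
echo "all variants agree\n";
```

This only shows that the transformations preserve behaviour; it says nothing about speed. Whether skipping one comparison is measurable would still need a benchmark against the real code.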
I am requesting that goto be considered acceptable for micro-optimisation.
When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.
I am in agreement with Tim. I do not think that any of us should adopt goto as a commonly used tool. I do, however, think there are situations where a goto plus comments actually does produce understandable code that also meets a business requirement of keeping wall-clock execution time as small as possible.
Now I guess my question to the group is: how can we describe this nuance as a replacement for statements in the current coding conventions like "Do not use the goto() syntax introduced in 5.3. PHP may have introduced the feature, but that does not mean we should use it." [2]? Could it be as simple as restating the bias as something like "The use of `goto` should be exceedingly rare, always accompanied by comments explaining why it is used (likely for performance), and the author should be prepared for others to challenge the usage"?
[0]: https://dl.acm.org/doi/10.1145/362929.362947
[1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html
[2]: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Other
Bryan