Dear all,
should we allow using PHP's assert [1] in MediaWiki code?
It would allow us to formulate and automatically verify conditions about code, while at the same time providing readable documentation of code for free.
Possible, exemplary use cases would be:
- automatically verifiable documentation of code's intent
- guarding against logic pitfalls like forgetting to set a variable in all branches of switches, if/else cascades
- guarding against using uninitialized variables
What do you think?
Kind regards, Christian
P.S.: For typical MediaWiki use cases, PHP's assert is even faster than throwing exceptions behind 'if'-guards.
[1] http://php.net/manual/en/function.assert.php (Not to be confused with PHPUnit's assertion functions, which solve a different problem.)
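The second use case above (a variable that is not set in every branch) can be illustrated with a minimal sketch. The function signLabel() is hypothetical and is not MediaWiki code; it passes an expression to assert() rather than a string, which is an assumption made so that the sketch also runs on current PHP versions.

```php
<?php
// Hypothetical example, not taken from MediaWiki.
// Classify a number; every branch is supposed to set $label.
function signLabel( $n ) {
    if ( $n < 0 ) {
        $label = 'negative';
    } elseif ( $n === 0 ) {
        $label = 'zero';
    } elseif ( $n > 0 ) {
        $label = 'positive';
    }
    // States the intent: at this point $label has been set.
    // When assertion checking is enabled, a missed branch is reported here.
    assert( isset( $label ) );
    return $label;
}

echo signLabel( -3 ), "\n";
echo signLabel( 0 ), "\n";
echo signLabel( 5 ), "\n";
```

With assertion checking enabled, signLabel( 0.0 ) would trip the assertion: a float zero is neither less than zero, nor identical to the integer 0, nor greater than zero, so no branch sets $label. That is the kind of logic pitfall the proposal has in mind.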
On 18/03/12 20:37, Christian Aistleitner wrote:
Dear all,
should we allow using PHP's assert [1] in MediaWiki code?
It would allow us to formulate and automatically verify conditions about code, while at the same time providing readable documentation of code for free.
Possible, exemplary use cases would be:
- automatically verifiable documentation of code's intent
- guarding against logic pitfalls like forgetting to set a variable in all branches of switches, if/else cascades
- guarding against using uninitialized variables
What do you think?
We use exceptions for that.
P.S.: For typical MediaWiki use cases, PHP's assert is even faster than throwing exceptions behind 'if'-guards.
That's funny, for me "if" is about 10 times faster than assert() in the non-throwing case. Micro-optimisation in PHP usually revolves around minimising the number of function calls, since a function call is relatively complex and expensive compared to other opcodes.
-- Tim Starling
+1 to what Tim said. I effectively said as much about a week ago when this was brought up on IRC.
I'd also add that the behavior of assertions varies based on configuration, which is confusing at best. Unlike MWExceptions, which are all handled the same way.
-Chad
On Mar 18, 2012 6:10 PM, "Tim Starling" tstarling@wikimedia.org wrote:
On 18/03/12 20:37, Christian Aistleitner wrote:
Dear all,
should we allow using PHP's assert [1] in MediaWiki code?
It would allow us to formulate and automatically verify conditions about code, while at the same time providing readable documentation of code for free.
Possible, exemplary use cases would be:
- automatically verifiable documentation of code's intent
- guarding against logic pitfalls like forgetting to set a variable in all branches of switches, if/else cascades
- guarding against using uninitialized variables
What do you think?
We use exceptions for that.
P.S.: For typical MediaWiki use cases, PHP's assert is even faster than throwing exceptions behind 'if'-guards.
That's funny, for me "if" is about 10 times faster than assert() in the non-throwing case. Micro-optimisation in PHP usually revolves around minimising the number of function calls, since a function call is relatively complex and expensive compared to other opcodes.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi Chad,
on Sun, Mar 18, 2012 at 07:18:01PM -0400, Chad wrote:
I'd also add that the behavior of assertions varies based on configuration, which is confusing at best.
Being able to vary based on configuration actually is a feature. An essential one. It lowers assert's impact on performance. But there is no need to mess with configuration. asserts work out of the box. You are only given the possibility to turn them off.
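As a rough illustration of what "vary based on configuration" means in practice, the sketch below reports how assert() is currently treated. It assumes PHP 7 or later, where the zend.assertions INI setting exists; the PHP 5.3 of this thread used assert_options( ASSERT_ACTIVE, ... ) instead. The function name describeAssertMode() is hypothetical, and the sketch ignores the legacy assert.active setting.

```php
<?php
// Hypothetical helper: describe how assert() behaves right now.
// Assumes PHP 7+, where zend.assertions is available.
function describeAssertMode() {
    $mode = (int)ini_get( 'zend.assertions' );
    if ( $mode === 1 ) {
        // Assertion code is generated and executed.
        return 'checked';
    } elseif ( $mode === 0 ) {
        // Assertion code is generated but skipped at run time.
        return 'skipped';
    }
    // -1: assertion code is not generated at all (production mode).
    return 'not compiled';
}

echo describeAssertMode(), "\n";
```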
The same holds true for the very software MediaWiki is built on. The software uses and relies on asserts, but gives you the possibility to turn assertion checking off.
* MySQL uses asserts [1].
* PHP uses asserts [2].
asserts and the possibility to turn them on and off are not confusing there.
But MySQL and PHP are not the only adopters of asserts. Take just about any quality software: its source code takes advantage of asserts (e.g.: LibreOffice [3]).
But it's not only practical software engineering. The literature is also strongly in favor of using asserts:
- In books, e.g.: S. McConnell. Code Complete [4]
- In papers, e.g.: G. Kudrjavets, N. Nagappan, T. Ball. Assessing the Relationship between Software Assertions and Code Quality: An Empirical Investigation [5]
- In talks, e.g.: T. Hoare. Assert early, assert often [6]
Kind regards, Christian
[1] E.g.: ./mysql-5.1.59/regex/engine.c:199--206 in the MySQL 5.1.59 tarball:
-----8<-----BEGIN-----8<-----
        assert(dp == NULL || dp == endp);
        if (dp != NULL)     /* found a shorter one */
            break;

        /* despite initial appearances, there is no match here */
        NOTE("false alarm");
        start = m->coldp + 1;   /* recycle starting later */
        assert(start <= stop);
-----8<-----END-----8<-----
And here you clearly see what asserts buy you. With just this snippet of code, the first assert tells you what to expect from “dp” at this point. At development time, this contract is automatically checked and a breach thereof is signalled. On production systems, the asserts are deactivated and are ignored.
[2] E.g.: main/streams/memory.c:86--97 in the PHP 5.3.9 tarball:
-----8<-----BEGIN-----8<-----
static size_t php_stream_memory_read(php_stream *stream, char *buf, size_t count TSRMLS_DC)
{
    php_stream_memory_data *ms = (php_stream_memory_data*)stream->abstract;
    assert(ms != NULL);

    if (ms->fpos + count >= ms->fsize) {
        count = ms->fsize - ms->fpos;
        stream->eof = 1;
    }
    if (count) {
        assert(ms->data!= NULL);
        assert(buf!= NULL);
-----8<-----END-----8<-----
Again, the asserts tell you what to expect from “ms” etc.
[3] E.g.: sc/source/core/data/markdata.cxx:241--243 in the libreoffice-calc 3.4.4.2 tarball:
-----8<-----BEGIN-----8<-----
    if ( bMultiMarked )
    {
        DBG_ASSERT(pMultiSel, "bMultiMarked, aber pMultiSel == 0");
-----8<-----END-----8<-----
(At this point, you see the German StarOffice roots of LibreOffice/OpenOffice.org. The German “aber” means “but” in English, so the assertion message would read “bMultiMarked, but pMultiSel == 0” in English.)
[4] isbn:9780735619678
[5] http://research.microsoft.com/pubs/70290/tr-2006-54.pdf
[6] http://research.microsoft.com/en-us/people/thoare/assertearlyassertoften.ppt Be sure to read the notes within the ppt.
On 19/03/12 21:43, Christian Aistleitner wrote:
Being able to vary based on configuration actually is a feature. An essential one. It lowers assert's impact on performance. But there is no need to mess with configuration. asserts work out of the box. You are only given the possibility to turn them off.
The ability to turn off asserts in C is damaging to system security and stability, and is part of C's toxic culture of trading off program correctness for negligible performance improvements.
There are cases where it does make sense to optimise for every last clock cycle, but such cases are very rare in modern programming.
In another post:
Have you tried real world examples?
[...]
function funcAssert() {
    assert( '$this->isOpen() && $this->mConn' );
    $this->mConn++;
}
[...]
assert_options( ASSERT_ACTIVE, 0 );
Yeah, very clever. Look, I have a test case where assert() is faster as well:
<?php
function foo() {
    sleep(100);
    return true;
}

// With assertions switched off, the string 'foo()' is never evaluated.
assert_options( ASSERT_ACTIVE, 0 );
$t = microtime(true);
assert('foo()');
print (microtime(true) - $t) . "\n";
// The if-guard always calls foo(), so it pays for the sleep().
$t = microtime(true);
if (!foo()) {
    throw new Exception('assert!');
}
print (microtime(true) - $t) . "\n";
?>
Wow, assert() is 17 million times faster than if() in this case! We should really use assert()!
My previous test of assert() involved a case where the assert() and the if() were doing roughly the same thing. In such cases, if() is faster, because it is not a function call.
The same holds true for the very software MediaWiki is built on. The software uses and relies on asserts, but gives you the possibility to turn assertion checking off.
* MySQL uses asserts [1].
* PHP uses asserts [2].
asserts and the possibility to turn them on and off are not confusing there.
But MySQL and PHP are not the only adopters of asserts. Take just about any quality software: its source code takes advantage of asserts (e.g.: LibreOffice [3]).
But it's not only practical software engineering. The literature is also strongly in favor of using asserts:
- In books, e.g.: S. McConnell. Code Complete [4]
- In papers, e.g.: G. Kudrjavets, N. Nagappan, T. Ball. Assessing the Relationship between Software Assertions and Code Quality: An Empirical Investigation [5]
- In talks, e.g.: T. Hoare. Assert early, assert often [6]
assert() is better than nothing. It's not better than exceptions and unit tests, especially not in PHP.
assert() in PHP shares very little in common with assert() in C. In C, assert() is an empty macro by default. In PHP, by default it raises a warning. PHP doesn't have macros, so to simulate the C performance feature, you have to put the source code inside a string, hiding it from automated source analysis and maintenance tools, and breaking syntax highlighting. I don't think you can defend the PHP feature with references that talk about the C feature.
-- Tim Starling
Hi Tim,
On Mon, Mar 19, 2012 at 11:20:53PM +1100, Tim Starling wrote:
assert_options( ASSERT_ACTIVE, 0 );
[ unmotivated ranting ]
I was talking about performance on production servers. Obviously. Where else would performance matter?
And yes. On production servers, one typically turns checking assertions off.
asserts are used to catch logic errors. You want to do that during /development/. They are a tool for development and documentation.
Asserts are not just another nice way to burn cycles on production systems :D
I doubt that the PHP and MySQL binaries used in production were built with assertion checking enabled, were left unstripped, ...
My previous test of assert() involved a case where the assert() and the if() were doing roughly the same thing.
'assert' and 'if' are not designed to do the same thing ... So why should we care to cripple one to simulate the other?
I'd much rather compare how the available tools get the required job done. And for conditions that /always/ hold, assert would be the right tool.
assert() is better than nothing. It's not better than exceptions and unit tests, especially not in PHP.
PHP's assert and unit tests have nothing to do with each other. They are orthogonal tools.
Either way MediaWiki's stance on the issue is clear now: To MediaWiki, PHP's assert is evil.
Fair enough!
Kind regards, Christian
Hi Tim,
on Mon, Mar 19, 2012 at 09:09:58AM +1100, Tim Starling wrote:
On 18/03/12 20:37, Christian Aistleitner wrote:
Dear all,
should we allow using PHP's assert [1] in MediaWiki code?
It would allow us to formulate and automatically verify conditions about code, while at the same time providing readable documentation of code for free.
Possible, exemplary use cases would be:
- automatically verifiable documentation of code's intent
- guarding against logic pitfalls like forgetting to set a variable in all branches of switches, if/else cascades
- guarding against using uninitialized variables
What do you think?
We use exceptions for that.
Yes, this was the motivation for my email.
'If'-guards are fine. Just as exceptions are. They are excellent tools for conditions that /typically/ hold at run-time--but eventually they might fail. In such a case, we want to do classical error handling. It's the right tool for the job.
We can of course decide to keep using if-guards/exceptions when modelling conditions that /unconditionally and always/ hold. However, PHP introduced asserts some 12 years back for just this and only this use case. It's a proven tool. assert is tailored for conditions that /unconditionally and always/ hold. So why not allow this standard tool in our toolbox?
Due to this narrower use case, assert comes with some benefits over if-guards/exceptions in terms of code readability and quality:
- We can turn off checking the conditions on production machines, to lower the impact.
- assert's syntax shows the condition that holds. [1]
- asserts produce good error messages without condition duplication. [2]
- asserts clearly stand out in code. [3]
- asserts just add the bare necessities to the code and do not clutter it up as much.
- asserts are less code to write.
P.S.: For typical MediaWiki use cases, PHP's assert is even faster than throwing exceptions behind 'if'-guards.
That's funny, for me "if" is about 10 times faster than assert() in the non-throwing case.
Have you tried real world examples?
Consider for example
$this->isOpen() && $this->mConn
This is a typical condition one could add in many places of DatabaseMysql.php. For this condition asserts are ~16% faster [4].
For this real-world example, the fact that assert takes the condition as a string (hence unevaluated) outweighs the penalty due to the function call.
But speed is just in the "P.S.". assert's real benefit would be improved readability, as pointed out above.
Kind regards, Christian
[1] If guards show the negated condition. Hence, when reading the code, you have to mentally negate the condition again before actually knowing what has to hold.
[2] An
assert( 'condA && condB' );
would relate to
if ( ! condA || ! condB ) {
    throw new MWException( 'condA && condB was violated' );
}
Hence, if e.g. condA changes, with asserts we just change condA and are done. For if-guards/exceptions, we have to adapt both occurrences of condA. This is somewhat error-prone, and it's easier for the two copies of the condition to drift apart.
[3] if-guards/exceptions look like normal code. Hence, you have to mentally reparse it again and again and detect them. Typically IDEs cannot help or highlight only those guards that document code.
IDEs can easily detect and understand asserts. Even REs can find them ;)
[4] Please verify the number yourself. It was obtained by the attached assert_test.php. The output for me was:
RUNS: 10, ITERATIONS: 1000000
assert: 1.818 ifGuard: 2.148
assert: 1.798 ifGuard: 2.151
assert: 1.795 ifGuard: 2.162
assert: 1.798 ifGuard: 2.148
assert: 1.801 ifGuard: 2.154
assert: 1.800 ifGuard: 2.134
assert: 1.788 ifGuard: 2.140
assert: 1.790 ifGuard: 2.141
assert: 1.791 ifGuard: 2.146
assert: 1.797 ifGuard: 2.141
total: assert: 17.976
total: ifGuard: 21.464
assert is ~16% faster than ifGuard
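The attached assert_test.php is not reproduced in this archive. The following is only a sketch of what a comparable harness might look like, not the original script. The class FakeConnection, its members, and the function timeCalls() are hypothetical stand-ins, and plain Exception is used instead of MWException to keep the sketch self-contained. It passes an expression to assert() instead of a string (the string form is no longer supported in PHP 8), so its timings are not comparable to the numbers above.

```php
<?php
// Hypothetical stand-in for a database wrapper; not MediaWiki's DatabaseMysql.
class FakeConnection {
    public $mConn = 1;

    public function isOpen() {
        return true;
    }

    // Variant that states the condition with assert().
    public function funcAssert() {
        assert( $this->isOpen() && $this->mConn );
        $this->mConn++;
    }

    // Variant that checks the negated condition and throws.
    public function funcIfGuard() {
        if ( !$this->isOpen() || !$this->mConn ) {
            throw new Exception( 'isOpen() && mConn was violated' );
        }
        $this->mConn++;
    }
}

// Call $method $iterations times on a fresh object; return elapsed seconds.
function timeCalls( $method, $iterations ) {
    $conn = new FakeConnection();
    $start = microtime( true );
    for ( $i = 0; $i < $iterations; $i++ ) {
        $conn->$method();
    }
    return microtime( true ) - $start;
}

$iterations = 100000;
printf( "assert:  %.3f\n", timeCalls( 'funcAssert', $iterations ) );
printf( "ifGuard: %.3f\n", timeCalls( 'funcIfGuard', $iterations ) );
```

Which variant comes out ahead depends on the PHP version and on whether assertion checking is enabled, so the numbers should be measured on the target setup rather than assumed.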