On 5/18/05, Tim Starling t.starling@physics.unimelb.edu.au wrote:
I thought I'd be clever and make a tighter loop by hand:
    mov ecx, 1000000000
    .p2align 4,,15
L6: loop L6
But to my disappointment it was slower than the machine generated version.
Note that, unless this has changed (I do not really study instruction-level optimization nowadays), the "loop" instruction is slower than the nearly identical pair "dec (e)cx" / "jnz label". The only functional difference is in the flags: "loop" leaves them untouched, while "dec" updates them. Go figure.
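
For illustration, here is roughly what the hand-written loop would look like with "dec"/"jnz" in place of "loop". This is only a sketch: it reuses the label and iteration count from the quoted snippet, assumes 32-bit code assembled with GNU as in Intel syntax (as the ".p2align" directive suggests), and I have not timed it.

```asm
        mov     ecx, 1000000000     # same iteration count as the quoted version
        .p2align 4,,15              # align the loop head, as in the quoted version
L6:     dec     ecx                 # decrement the counter; sets ZF when it reaches zero
        jnz     L6                  # jump back while the counter is nonzero
```

It runs the same number of iterations as the "loop" version, but it clobbers the arithmetic flags (except CF), so it is not a drop-in replacement where the flags must survive the loop.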
-- [[cs:User:Mormegil | Petr Kadlec]]