On 5/18/05, Tim Starling <t.starling(a)physics.unimelb.edu.au> wrote:
> I thought I'd be clever and make a tighter loop by hand:
>
>         mov ecx, 1000000000
>         .p2align 4,,15
> L6:
>         loop L6
>
> But to my disappointment it was slower than the machine generated
> version.

Note that, unless this has changed (I do not really follow
instruction-level optimization nowadays), the "loop" instruction is
slower than the nearly equivalent pair "dec (e)cx" / "jnz label"; the
only difference is that dec modifies the flags while loop does not.
Go figure.
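
If you want to check this on your own machine, here is a minimal sketch of the two variants side by side. It assumes GCC or Clang on x86 (inline assembly in the default AT&T syntax) and falls back to plain C elsewhere. The names spin_with_loop, spin_with_dec_jnz and seconds_for are made up for this illustration, and the timing uses clock(), which is coarse, so treat the numbers as a rough indication only.

```c
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Hypothetical helper: empty countdown using the LOOP instruction.
 * LOOP decrements (E)CX (RCX in 64-bit mode) and jumps while it is
 * nonzero, without touching the flags. Returns the final counter. */
static uintptr_t spin_with_loop(uintptr_t count)
{
    if (count == 0)
        return 0;               /* 0 would wrap around and spin for ages */
#if HAVE_X86_ASM
    __asm__ volatile(
        "1:\n\t"
        "loop 1b\n\t"
        : "+c"(count));         /* "c" pins the counter to (E/R)CX */
#else
    /* Assumption: non-x86 fallback, a plain C countdown. */
    volatile uintptr_t c = count;
    while (--c != 0) { }
    count = c;
#endif
    return count;
}

/* Hypothetical helper: the same countdown written as dec + jnz.
 * DEC modifies the flags, hence the "cc" clobber. */
static uintptr_t spin_with_dec_jnz(uintptr_t count)
{
    if (count == 0)
        return 0;
#if HAVE_X86_ASM
    __asm__ volatile(
        "1:\n\t"
        "dec %0\n\t"
        "jnz 1b\n\t"
        : "+r"(count)
        :
        : "cc");
#else
    volatile uintptr_t c = count;
    while (--c != 0) { }
    count = c;
#endif
    return count;
}

/* Rough CPU time in seconds for one run of the given spin function. */
static double seconds_for(uintptr_t (*spin)(uintptr_t), uintptr_t count)
{
    clock_t start = clock();
    (void)spin(count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(spin_with_loop, 1000000000) and seconds_for(spin_with_dec_jnz, 1000000000) from a small main() and printing both values should show whether the difference still holds on your CPU. I make no promise about which one wins on current hardware.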
-- [[cs:User:Mormegil | Petr Kadlec]]