In a modern processor, such as x86_64 or ARM, it’s remarkably complicated. Here’s a quick example I knocked up to demonstrate.
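
The C source isn't reproduced here. Going by the assembly further down and its debug comments, it was probably something along these lines. The names example_for_loop, loop_body and i, the start value 0 and the limit 100 all appear in the listing; the printf format string, the `i < 100` form of the condition and the exact layout are my guesses, so treat this as a sketch rather than the original file.

```c
#include <stdio.h>

/* Loop body kept in its own function, as described in the text. */
void loop_body(int i);

void example_for_loop(void)
{
    /* 0 and 100 come from the listing (xorl %ebx,%ebx and cmpl $100,%ebx);
       writing the test as i < 100 is an assumption. */
    for (int i = 0; i < 100; ++i)
        loop_body(i);
}

void loop_body(int i)
{
    /* The listing only shows a call to printf with a string and i;
       the format string here is assumed. */
    printf("%d\n", i);
}
```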

I pulled out the loop body into a second function to stop the compiler from optimizing the empty loop into nothing, and so that the loop itself is separate from the stuff it does. When the above code is compiled for x86_64 (the registers and instructions in the listing, such as %rbp and pushq, are x86_64 rather than ARM), you get this:

_example_for_loop:                      ## @example_for_loop
Lfunc_begin0:
    .file 2 "/Users/grahamcox/Projects/test" "/Users/grahamcox/Projects/test/test/forloop.c"
    .loc 2 13 0                         ## /Users/grahamcox/Projects/test/test/forloop.c:13:0
    .cfi_startproc
## %bb.0:
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register %rbp
    pushq %r14
    pushq %rbx
    .cfi_offset %rbx, -32
    .cfi_offset %r14, -24
    leaq L_.str(%rip), %r14
    xorl %ebx, %ebx
Ltmp0:
    ##DEBUG_VALUE: i <- 0
    .p2align 4, 0x90
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
    ##DEBUG_VALUE: loop_body:i <- %ebx
    ##DEBUG_VALUE: i <- %ebx
    .loc 2 24 2 prologue_end            ## /Users/grahamcox/Projects/test/test/forloop.c:24:2
    xorl %eax, %eax
    movq %r14, %rdi
    movl %ebx, %esi
    callq _printf
Ltmp1:
    .loc 2 14 27                        ## /Users/grahamcox/Projects/test/test/forloop.c:14:27
    incl %ebx
Ltmp2:
    ##DEBUG_VALUE: i <- %ebx
    .loc 2 14 20 is_stmt 0              ## /Users/grahamcox/Projects/test/test/forloop.c:14:20
    cmpl $100, %ebx
Ltmp3:
    .loc 2 14 2                         ## /Users/grahamcox/Projects/test/test/forloop.c:14:2
    jne LBB0_1
Ltmp4:
## %bb.2:
    .loc 2 18 1 is_stmt 1               ## /Users/grahamcox/Projects/test/test/forloop.c:18:1
    popq %rbx
    popq %r14
    popq %rbp
    retq
Ltmp5:
Lfunc_end0:
    .cfi_endproc

This was compiled with optimization at max, which makes the code as short and efficient as the compiler knows how. Note, for example, that the call out to loop_body has been inlined: the loop calls printf directly (the callq _printf on line 28 of the listing).

Some of this is just debugging hints, such as the various ‘.loc’ lines, so they’re not part of the executable code itself. But they do give you a link to what part of the source code the assembled code is derived from.
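
Strip out the hints and the loop itself boils down to four steps: zero the counter (xorl %ebx, %ebx), do the body, add one (incl %ebx), then compare with 100 and jump back if they differ (cmpl $100, %ebx and jne LBB0_1). Since the compiler can see that 0 is less than 100, it skips the test on entry and only checks at the bottom. Here is a rough sketch of that shape written back in C with a goto. The function names are made up for illustration, and I've swapped the printf call for a running total so the two versions can be compared.

```c
/* The loop as you would normally write it.
   The body just accumulates i; it stands in for the printf call. */
int sum_for_loop(void)
{
    int total = 0;
    for (int i = 0; i < 100; ++i)
        total += i;
    return total;
}

/* The same loop in the shape the optimized assembly takes. */
int sum_goto_loop(void)
{
    int total = 0;
    int i = 0;              /* xorl %ebx, %ebx */
loop:
    total += i;             /* loop body (callq _printf in the listing) */
    ++i;                    /* incl %ebx */
    if (i != 100)           /* cmpl $100, %ebx */
        goto loop;          /* jne LBB0_1 */
    return total;
}
```

Both functions go round exactly 100 times and return the same total.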

Not all machine code is this difficult. Back in the day when the 6502 was popular, a ‘for’ loop like this could be written in 4 or 5 instructions. That was just as well, given that 1MHz was considered ‘fast’.

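        LDA #100        ; load the loop count into A
        TAX             ; copy it to X, the loop counter
LOOP:   JSR LOOP_BODY   ; call the loop body
        DEX             ; count down; sets the zero flag when X reaches 0
        CPX #0          ; explicit compare (redundant, since DEX already set the flag)
        BNE LOOP        ; go round again until X is zero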

(N.B. this might not be genuine 6502 assembly code, but it follows the general idea.) Hope it's useful.