When I wrote this answer, I was only looking at the title question about < vs. <= in general, not the specific example of a constant a < 901 vs. a <= 900 . Many compilers always shrink the magnitude of constants by converting between < and <= , e.g. because x86 immediate operands have a shorter 1-byte encoding for -128..127.

For ARM and especially AArch64, being able to encode as an immediate depends on being able to rotate a narrow field into any position in a word. So cmp w0, #0x00f000 would be encodeable, while cmp w0, #0x00effff might not be. So the make-it-smaller rule for comparison vs. a compile-time constant doesn't always apply for AArch64.

< vs. <= in general, including for runtime-variable conditions

In assembly language on most machines, a comparison for <= has the same cost as a comparison for < . This applies whether you're branching on it, booleanizing it to create a 0/1 integer, or using it as a predicate for a branchless select operation (like x86 CMOV). The other answers have only addressed this part of the question.

But this question is about the C++ operators, the input to the optimizer. Normally they're both equally efficient; the advice from the book sounds totally bogus because compilers can always transform the comparison that they implement in asm. But there is at least one exception where using <= can accidentally create something the compiler can't optimize.

As a loop condition, there are cases where <= is qualitatively different from < , when it stops the compiler from proving that a loop is not infinite. This can make a big difference, disabling auto-vectorization.

Unsigned overflow is well-defined as base-2 wrap around, unlike signed overflow (UB). Signed loop counters are generally safe from this with compilers that optimize based on signed-overflow UB not happening: ++i <= size will always eventually become false. (What Every C Programmer Should Know About Undefined Behavior)

```c
void foo(unsigned size) {
    unsigned upper_bound = size - 1;  // or any calculation that could produce UINT_MAX
    for(unsigned i=0 ; i <= upper_bound ; i++)
        ...
}
```

Compilers can only optimize in ways that preserve the (defined and legally observable) behaviour of the C++ source for all possible input values, except ones that lead to undefined behaviour.

(A simple i <= size would create the problem too, but I thought calculating an upper bound was a more realistic example of accidentally introducing the possibility of an infinite loop for an input you don't care about but which the compiler must consider.)

In this case, size=0 leads to upper_bound=UINT_MAX , and i <= UINT_MAX is always true. So this loop is infinite for size=0 , and the compiler has to respect that even though you as the programmer probably never intend to pass size=0. If the compiler can inline this function into a caller where it can prove that size=0 is impossible, then great, it can optimize like it could for i < size .

Asm like if(!size) skip the loop; do{...}while(--size); is one normally-efficient way to optimize a for( i<size ) loop, if the actual value of i isn't needed inside the loop (Why are loops always compiled into "do...while" style (tail jump)?).

But that do{}while can't be infinite: if entered with size==0 , we get 2^n iterations. (Iterating over all unsigned integers in a for loop: C makes it possible to express a loop over all unsigned integers including zero, but it's not easy without a carry flag the way it is in asm.)

With wraparound of the loop counter being a possibility, modern compilers often just "give up", and don't optimize nearly as aggressively.

Example: sum of integers from 1 to n

Using unsigned i <= n defeats clang's idiom-recognition that optimizes sum(1 .. n) loops with a closed form based on Gauss's n * (n+1) / 2 formula.

```c
unsigned sum_1_to_n_finite(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0 ; i < n+1 ; ++i)
        total += i;
    return total;
}
```

x86-64 asm from clang7.0 and gcc8.2 on the Godbolt compiler explorer

```asm
 # clang7.0 -O3 closed-form
    cmp     edi, -1         # n passed in EDI: x86-64 System V calling convention
    je      .LBB1_1         # if (n == UINT_MAX) return 0;  // C++ loop runs 0 times
                            # else fall through into the closed-form calc
    mov     ecx, edi        # zero-extend n into RCX
    lea     eax, [rdi - 1]  # n-1
    imul    rax, rcx        # n * (n-1)            # 64-bit
    shr     rax             # n * (n-1) / 2
    add     eax, edi        # n + (stuff / 2) = n * (n+1) / 2   # truncated to 32-bit
    ret                     # computed without possible overflow of the product before right shifting
.LBB1_1:
    xor     eax, eax
    ret
```

But for the naive version, we just get a dumb loop from clang.

```c
unsigned sum_1_to_n_naive(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0 ; i<=n ; ++i)
        total += i;
    return total;
}
```

```asm
 # clang7.0 -O3
sum_1_to_n(unsigned int):
    xor     ecx, ecx        # i = 0
    xor     eax, eax        # retval = 0
.LBB0_1:                    # do {
    add     eax, ecx        #   retval += i
    add     ecx, 1          #   ++i
    cmp     ecx, edi
    jbe     .LBB0_1         # } while( i <= n );   unsigned below-or-equal
    ret
```

GCC doesn't use a closed-form either way, so the choice of loop condition doesn't really hurt it; it auto-vectorizes with SIMD integer addition, running 4 i values in parallel in the elements of an XMM register.

# "naive" inner loop .L3: add eax, 1 # do { paddd xmm0, xmm1 # vect_total_4.6, vect_vec_iv_.5 paddd xmm1, xmm2 # vect_vec_iv_.5, tmp114 cmp edx, eax # bnd.1, ivtmp.14 # bound and induction-variable tmp, I think. ja .L3 #, # }while( n > i ) "finite" inner loop # before the loop: # xmm0 = 0 = totals # xmm1 = {0,1,2,3} = i # xmm2 = set1_epi32(4) .L13: # do { add eax, 1 # i++ paddd xmm0, xmm1 # total[0..3] += i[0..3] paddd xmm1, xmm2 # i[0..3] += 4 cmp eax, edx jne .L13 # }while( i != upper_limit ); then horizontal sum xmm0 and peeled cleanup for the last n%3 iterations, or something.

It also has a plain scalar loop which I think it uses for very small n , and/or for the infinite loop case.