"sub eax, 128" -> "add eax, -128". What's the improvement here?

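It's a space optimization. Shorter code is sometimes faster code too, because it takes up less room in the instruction cache. The trick is that -128 fits in a sign-extended 8-bit immediate (a signed char), while +128 does not and needs a full 32-bit immediate. Both instructions leave the same value in the register. Here are some examples, along with their x86 machine code representation: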

    sub eax,+128    2D 80 00 00 00
    add eax,-128    83 C0 80
    sub ebx,+128    81 EB 80 00 00 00
    add ebx,-128    83 C3 80
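
To make the size rule concrete, here is a minimal Python sketch of how an assembler might pick the shortest encoding for add/sub with a 32-bit register and an immediate. The names encode_add_sub, REG and OPS are made up for this illustration. It only covers register-direct operands in 32-bit mode and is not how any particular assembler is implemented.

```python
import struct

# Register numbers as used in the ModRM byte (32-bit registers only).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# For each operation: (opcode extension in ModRM.reg, short "eax, imm32" opcode).
OPS = {"add": (0, 0x05), "sub": (5, 0x2D)}


def encode_add_sub(op: str, reg: str, imm: int) -> bytes:
    """Return the shortest encoding of `op reg, imm` (hypothetical helper)."""
    ext, eax_opcode = OPS[op]
    r = REG[reg]
    modrm = 0xC0 | (ext << 3) | r  # mod=11 (register direct)

    # Immediate fits in a sign-extended byte: 83 /ext ib, 3 bytes total.
    if -128 <= imm <= 127:
        return bytes([0x83, modrm, imm & 0xFF])

    imm32 = struct.pack("<I", imm & 0xFFFFFFFF)  # little-endian 32-bit
    if r == 0:
        # eax has a dedicated opcode with no ModRM byte: 5 bytes total.
        return bytes([eax_opcode]) + imm32
    # General form: 81 /ext id, 6 bytes total.
    return bytes([0x81, modrm]) + imm32


if __name__ == "__main__":
    for op, reg, imm in [("sub", "eax", 128), ("add", "eax", -128),
                         ("sub", "ebx", 128), ("add", "ebx", -128)]:
        code = encode_add_sub(op, reg, imm)
        print(f"{op} {reg},{imm:+d}  {code.hex(' ').upper()}  ({len(code)} bytes)")
```

Running it reproduces the four byte sequences listed above: the add forms take 3 bytes, while the sub forms take 5 bytes (eax) or 6 bytes (ebx).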