Consider this simple JMH benchmark: we increment a field with and without synchronizing on a new object:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class LockElision {
    int x;

    @Benchmark
    public void baseline() {
        x++;
    }

    @Benchmark
    public void locked() {
        synchronized (new Object()) {
            x++;
        }
    }
}
```
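Functionally, eliding such a lock is safe: a monitor on an object that never escapes can never be contended, so removing it cannot change the result. A minimal standalone sketch (the class name and iteration count are mine, not part of the benchmark) illustrating that the locked increment computes the same value as a plain one:

```java
public class LockElisionDemo {
    static int x;

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            // The lock object never escapes this block, so no other thread
            // can ever synchronize on it; the JIT is free to drop the
            // monitorenter/monitorexit without changing the outcome.
            synchronized (new Object()) {
                x++;
            }
        }
        System.out.println(x); // prints 1000000
    }
}
```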

If we run this test with the -prof perfnorm profiler enabled, this is what we see:

```
Benchmark                              Mode  Cnt   Score    Error  Units
LockElision.baseline                   avgt   15   0.268 ±  0.001  ns/op
LockElision.baseline:CPI               avgt    3   0.200 ±  0.009   #/op
LockElision.baseline:L1-dcache-loads   avgt    3   2.035 ±  0.101   #/op
LockElision.baseline:L1-dcache-stores  avgt    3  ≈ 10⁻³            #/op
LockElision.baseline:branches          avgt    3   1.016 ±  0.046   #/op
LockElision.baseline:cycles            avgt    3   1.017 ±  0.024   #/op
LockElision.baseline:instructions      avgt    3   5.076 ±  0.346   #/op
LockElision.locked                     avgt   15   0.268 ±  0.001  ns/op
LockElision.locked:CPI                 avgt    3   0.200 ±  0.005   #/op
LockElision.locked:L1-dcache-loads     avgt    3   2.024 ±  0.237   #/op
LockElision.locked:L1-dcache-stores    avgt    3  ≈ 10⁻³            #/op
LockElision.locked:branches            avgt    3   1.014 ±  0.047   #/op
LockElision.locked:cycles              avgt    3   1.015 ±  0.012   #/op
LockElision.locked:instructions        avgt    3   5.062 ±  0.154   #/op
```

Whoa, the tests perform exactly the same: the timing is the same, and the numbers of loads, stores, cycles, and instructions are the same. With high probability, this means the generated code is the same. Indeed it is, and it looks like this:

```
14.50%  16.97%  ↗  incl   0xc(%r8)               ; increment field
76.82%  76.05%  │  movzbl 0x94(%r9),%r10d        ; JMH infra: do another @Benchmark
 0.83%   0.10%  │  add    $0x1,%rbp
 0.47%   0.78%  │  test   %eax,0x15ec6bba(%rip)
 0.47%   0.36%  │  test   %r10d,%r10d
                ╰  je     BACK
```

The lock is completely elided; nothing is left of the allocation or the synchronization. If we supply the JVM flag -XX:-EliminateLocks, or disable escape analysis altogether with -XX:-DoEscapeAnalysis (which breaks every optimization that depends on EA, including lock elision), then the locked counters balloon up:

```
Benchmark                              Mode  Cnt   Score    Error  Units
LockElision.baseline                   avgt   15   0.268 ±  0.001  ns/op
LockElision.baseline:CPI               avgt    3   0.200 ±  0.001   #/op
LockElision.baseline:L1-dcache-loads   avgt    3   2.029 ±  0.082   #/op
LockElision.baseline:L1-dcache-stores  avgt    3   0.001 ±  0.001   #/op
LockElision.baseline:branches          avgt    3   1.016 ±  0.028   #/op
LockElision.baseline:cycles            avgt    3   1.015 ±  0.014   #/op
LockElision.baseline:instructions      avgt    3   5.078 ±  0.097   #/op
LockElision.locked                     avgt   15  11.590 ±  0.009  ns/op
LockElision.locked:CPI                 avgt    3   0.998 ±  0.208   #/op
LockElision.locked:L1-dcache-loads     avgt    3  11.872 ±  0.686   #/op
LockElision.locked:L1-dcache-stores    avgt    3   5.024 ±  1.019   #/op
LockElision.locked:branches            avgt    3   9.027 ±  1.840   #/op
LockElision.locked:cycles              avgt    3  44.236 ±  3.364   #/op
LockElision.locked:instructions        avgt    3  44.307 ±  9.954   #/op
```
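For contrast, once the lock object escapes, the JVM cannot prove the monitor is uncontended and has to keep the real synchronization, which is exactly what pays for correctness under contention. A hypothetical sketch (class name, lock field, and counts are mine): two threads increment a shared field under a shared, escaping lock, and the synchronization guarantees no updates are lost.

```java
public class EscapedLockDemo {
    // The lock object escapes: it is reachable from a static field and
    // shared across threads, so lock elision does not apply here.
    static final Object LOCK = new Object();
    static int x;

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (LOCK) { // real monitorenter/monitorexit
                    x++;
                }
            }
        };
        Thread t1 = new Thread(r);
        Thread t2 = new Thread(r);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(x); // prints 2000000: no lost updates
    }
}
```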