In theory, (JIT) compiler can know that the object is not null and elide the runtime null check, for example when something is constant :

If that does not work, for example when the nullity cannot be inferred automatically, compilers can also employ dataflow analysis to remove the successive null checks after first null check for the object was done. For example:

Those optimizations are very useful, but quite boring, and they don’t solve the need for null checks in all other cases.

Fortunately, there is even a smarter way to do this: let the user code access the object without the explicit check ! Most of the time, nothing bad is going to happen, as most object accesses do not ever see the null object. But we still need to handle the corner case when the null access does happen. When it does, the JVM can intercept the resulting SIGSEGV ("Signal: Segmentation Fault"), look at the return address for that signal, and figure out where that access was made in the generated code. Once it figures that bit out, it can then know where to dispatch the control to handle this case — in most cases, throwing NullPointerException or branching somewhere.

This mechanism is known in Hotspot under the name "implicit null checks" . It was recently added to LLVM under the similar name , to cater for the same use case.

Practice

Consider this cunningly simple JMH benchmark:

import org.openjdk.jmh.annotations.*; import java.util.concurrent.TimeUnit; @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) @Fork(value = 3, jvmArgsAppend = {"-XX:LoopUnrollLimit=1"}) @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ImplicitNP { @Param({"false", "true"}) boolean blowup; volatile Holder h; int itCnt; @Setup public void setup() { h = null; if (blowup && ++itCnt == 3) { // blow it up on 3-rd iteration for (int c = 0; c < 10000; c++) { try { test(); } catch (NullPointerException npe) { // swallow } } System.out.print("Boom! "); } h = new Holder(); } @CompilerControl(CompilerControl.Mode.DONT_INLINE) @Benchmark public int test() { int sum = 0; for (int c = 0; c < 100; c++) { sum += h.x; } return sum; } static class Holder { int x; } }

On the surface, this benchmark is simple: it performs the 100x integer addition.

Methodology-wise, this benchmark is cunning in several ways:

It is parametrized by blowup flag that would expose null object to test() method at the 3-rd iteration when blowup = true , and leave it alone otherwise. It uses the looping in benchmark-unsafe manner. That is mitigated by asking Hotspot to not to unroll the loops with LoopUnrollLimit . It accesses the same object over and over again. A smart optimizer would be able to hoist the load of h outside the loop, and then aggressively optimize. This is mitigated by declaring h as volatile : unless we are dealing with a God-like-smart optimizer, this is enough to break hoisting. It uses compiler hints to break inlining for test . This is not, strictly speaking, needed for this benchmark, but it is safety measure. The reasoning goes as follows: the test relies on profiling information for test , and smarter compilers can use caller-callee profiles to split the profile between the version called from setup() , and from the benchmark loop itself.

Out of the curiosity, with recent 8u232, it yields the following result:

Benchmark (blowup) Mode Cnt Score Error Units ImplicitNP.test false avgt 15 40.417 ± 0.030 ns/op ImplicitNP.test true avgt 15 63.187 ± 0.156 ns/op

Absolute numbers do not matter much here, the important bit is that one of the cases is much faster than the other one. The blowup = false case is significantly faster here. If we drill down into why, we would probably start with characterizing it with the help of -prof perfnorm , which can show the low-level machine counters for both tests:

Benchmark (blowup) Mode Cnt Score Error Units ImplicitNP.test false avgt 15 40.484 ± 0.090 ns/op ImplicitNP.test:L1-dcache-loads false avgt 3 206.606 ± 24.336 #/op ImplicitNP.test:L1-dcache-stores false avgt 3 5.861 ± 0.426 #/op ImplicitNP.test:branches false avgt 3 102.972 ± 13.679 #/op ImplicitNP.test:cycles false avgt 3 141.252 ± 22.330 #/op ImplicitNP.test:instructions false avgt 3 521.998 ± 87.292 #/op ImplicitNP.test true avgt 15 63.254 ± 0.047 ns/op ImplicitNP.test:L1-dcache-loads true avgt 3 206.154 ± 15.231 #/op ImplicitNP.test:L1-dcache-stores true avgt 3 4.971 ± 0.677 #/op ImplicitNP.test:branches true avgt 3 199.993 ± 20.805 #/op ; +100 branches ImplicitNP.test:cycles true avgt 3 221.388 ± 13.126 #/op ; +80 cycles ImplicitNP.test:instructions true avgt 3 714.439 ± 64.476 #/op ; +190 insns

So, we are hunting some excess branches. Note we had the loop with 100 iterations, so there must be the excess branch per iteration? Also, we have about 200 excess instructions, which makes sense as "branch" is really the test and jcc on x86_64.

Now that we have that hypothesis, let’s see the actual hot code for both cases, with the help of -prof perfasm . The highly edited snippets are below.

First, blowup = false case:

... 1.71% ↗ 0x...020: mov 0x10(%rsi),%r11d ; get field "h" 9.19% │ 0x...024: add 0xc(%r12,%r11,8),%eax ; sum += h.x │ ; implicit exception: │ ; dispatches to 0x...03e 59.60% │ 0x...029: inc %r10d ; increment "c" and loop 0.02% │ 0x...02c: cmp $0x64,%r10d ╰ 0x...030: jl 0x...d204020 4.57% 0x...032: add $0x10,%rsp 3.16% 0x...036: pop %rbp 3.37% 0x...037: test %eax,0x16a18fc3(%rip) 0x...03d: retq 0x...03e: mov $0xfffffff6,%esi 0x...043: callq 0x00007f8aed0453e0 ; <uncommon trap> ...

Here, we can see a very tight loop, and the instruction at 0x…​024 combines the compressed reference decoding of h , the access to h.x , and the implicit null check. We do not pay with any additional instructions to check h for nullity.

The implicit exception: dispatches to 0x…​03e line is the part of VM output that says VM knows SEGV exception coming from that instruction had actually failed null check. The JVM signal handler would then do its bidding and dispatch the control to 0x…​03e , which would then go on to throwing the exception.

Of course, if null -s are frequent on that path, going via the signal handler every time is rather slow. For our current case, we could have said that throwing the exception would still be heavy, but it runs into two logistical problems. First, even though exceptions are sometimes slow, there is no reason to make them even slower if we can avoid it. Second, we would like to deal with user-written null-checks using the same machinery, and users would not like their simple if (h == null) { …​ } else { …​ } branches run dramatically worse depending on the nullity of h . Therefore, we would like to use implicit null-checks only when the frequency of actual null -s is very low.

Luckily, the JVM can compile the code knowing the runtime profile. That is, when JIT compiler decides whether to emit the implicit null check, it can look into profile and see if the object was ever null . Moreover, even if it does emit the implicit null check, it can recompile the code later when that optimistic assumption about null frequency is violated. blowup = true case specifically violates that assumption by feeding null to our code. As the result, the JVM recompiles the whole thing into:

... 11.36% ↗ 0x...bd1: mov 0x10(%rsi),%r11d ; get field "h" 12.81% │ 0x...bd5: test %r11d,%r11d ; EXPLICIT NULL CHECK 0.02% ╭│ 0x...bd8: je 0x...bf4 17.23% ││ 0x...bda: add 0xc(%r12,%r11,8),%eax ; sum += h.x 25.07% ││ 0x...bdf: inc %r10d ; increment "c" and loop 8.70% ││ 0x...be2: cmp $0x64,%r10d 0.02% │╰ 0x...be6: jl 0x...bd1 3.31% │ 0x...be8: add $0x10,%rsp 2.49% │ 0x...bec: pop %rbp 2.72% │ 0x...bed: test %eax,0x160e640d(%rip) │ 0x...bf3: retq ↘ 0x...bf4: movabs $0x7821044f8,%rsi ; <preallocated NullPointerException> 0x...bfe: mov %r12d,0x10(%rsi) ; WTF 0x...c02: add $0x10,%rsp 0x...c06: pop %rbp 0x...c07: jmpq 0x00007f887d1053a0 ; throw_exception ...

Bam! There is the explicit null check in the generated code now! Implicit null check turned itself into explicit one, without user intervention.

You can see that in flight when looking into the full benchmark log:

# JMH version: 1.22 # VM version: JDK 1.8.0_232, OpenJDK 64-Bit Server VM, 25.232-b09 # VM options: -XX:LoopUnrollLimit=1 # Warmup: 5 iterations, 1 s each # Measurement: 5 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: org.openjdk.ImplicitNP.test # Parameters: (blowup = true) # Run progress: 50.00% complete, ETA 00:00:30 # Fork: 1 of 3 Warmup Iteration 1: 40.900 ns/op Warmup Iteration 2: 40.698 ns/op Warmup Iteration 3: Boom! 63.157 ns/op // <--- recompilation happened here Warmup Iteration 4: 63.158 ns/op Warmup Iteration 5: 63.130 ns/op Iteration 1: 63.188 ns/op Iteration 2: 63.208 ns/op Iteration 3: 63.128 ns/op Iteration 4: 63.137 ns/op Iteration 5: 63.143 ns/op