Every time we do new MyClass() , the runtime environment has to allocate storage for the instance in question. The textbook GC (memory manager) interface for allocation is very simple:

ref Allocate(T type);
ref AllocateArray(T type, int size);

Of course, since memory managers are usually written in a language different from the one the runtime targets (e.g. Java programs target the JVM, yet the HotSpot JVM is written in C++), the interface gets murkier. For example, such a call from a Java program needs to transit into native VM code. Does that cost much? Probably. Does the memory manager have to cope with multiple threads begging for memory? For sure.

So to optimize this, we may instead allow threads to allocate entire blocks of memory for their needs, and only transit into the VM to get a new block. In HotSpot, these blocks are called Thread Local Allocation Buffers (TLABs), and there is sophisticated machinery built to support them. Notice that TLABs are thread-local in the temporal sense, meaning they act as the buffers accepting the current allocations. They are still part of the Java heap: a thread can still write a reference to a newly allocated object into a field outside of its TLAB, and so on.
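The scheme above can be sketched in plain Java. This is a simplified illustration of the idea, not HotSpot's actual code: each thread bumps a private cursor on the fast path, and only touches the shared (contended) heap cursor when its buffer runs out. All names and sizes here are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of TLAB-style allocation, NOT HotSpot internals.
// "Addresses" are just offsets into a pretend heap.
public class TlabSketch {
    static final long HEAP_SIZE = 1 << 20;   // pretend 1 MB heap
    static final long TLAB_SIZE = 4096;      // pretend 4 KB buffers

    // shared heap cursor: contended, so we touch it only once per TLAB
    static final AtomicLong heapTop = new AtomicLong(0);

    // per-thread buffer bounds: the uncontended fast path
    long tlabCurrent, tlabEnd;

    long allocate(long size) {
        long obj = tlabCurrent;
        if (obj + size <= tlabEnd) {   // fast path: pointer bump, no VM transition
            tlabCurrent = obj + size;
            return obj;
        }
        return allocateSlow(size);     // slow path: get a fresh TLAB from the shared heap
    }

    long allocateSlow(long size) {
        long start = heapTop.getAndAdd(TLAB_SIZE);   // one shared operation per TLAB
        if (start + TLAB_SIZE > HEAP_SIZE) throw new OutOfMemoryError();
        tlabCurrent = start + size;
        tlabEnd = start + TLAB_SIZE;
        return start;
    }
}
```

Note how the fast path is a comparison and an addition on thread-private state, which is exactly what makes it cheap enough to inline.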

All known OpenJDK GCs support TLAB allocation, and this part of the VM code is pretty well shared among them. All HotSpot compilers support TLAB allocation too, so you would usually see generated code for object allocation like this:

0x00007f3e6bb617cc: mov    0x60(%r15),%rax        ; TLAB "current"
0x00007f3e6bb617d0: mov    %rax,%r10              ; tmp = current
0x00007f3e6bb617d3: add    $0x10,%r10             ; tmp += 16 (object size)
0x00007f3e6bb617d7: cmp    0x70(%r15),%r10        ; tmp > tlab_size?
0x00007f3e6bb617db: jae    0x00007f3e6bb61807     ; TLAB is done, jump and request another one
0x00007f3e6bb617dd: mov    %r10,0x60(%r15)        ; current = tmp (TLAB is fine, alloc!)
0x00007f3e6bb617e1: prefetchnta 0xc0(%r10)        ; ...
0x00007f3e6bb617e9: movq   $0x1,(%rax)            ; store header to (obj+0)
0x00007f3e6bb617f0: movl   $0xf80001dd,0x8(%rax)  ; store klass to (obj+8)
0x00007f3e6bb617f7: mov    %r12d,0xc(%rax)        ; zero out the rest of the object

The allocation path is inlined in the generated code, and as such does not require calling into the GC to allocate the object. If we request an allocation that depletes the TLAB, or the object is large enough to never fit into a TLAB, then we take the "slow path", and either satisfy the allocation there, or come back with a fresh TLAB. Notice how the most frequent "normal" path just adds the object size to the TLAB's current cursor and moves on.

This is why this allocation mechanism is sometimes called "pointer bump allocation". Pointer bump requires a contiguous chunk of memory to allocate into, though, which brings back the need for heap compaction. Notice how CMS does free-list allocation in the "old" generation, thus enabling concurrent sweep, but has compacting stop-the-world "young" collections that benefit from pointer bump allocation. Only the much smaller population of objects that survive a young collection pays the cost of free-list allocation.
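The contrast between the two strategies can be made concrete with a small sketch. This is a hypothetical side-by-side, assuming a first-fit free list; the names and data structures are mine, not CMS internals:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative contrast between pointer bump and free-list allocation.
// Offsets into a pretend heap stand in for real addresses.
public class AllocStrategies {
    // Pointer bump: O(1), but needs one contiguous free region (hence compaction).
    static long bumpTop = 0;
    static long bumpAllocate(long size) {
        long obj = bumpTop;
        bumpTop += size;
        return obj;
    }

    // A free chunk of the pretend heap: [start, start + size)
    static final class Chunk {
        final long start, size;
        Chunk(long start, long size) { this.start = start; this.size = size; }
    }

    // Free list (first fit): walk the free chunks until one is big enough.
    // More work per allocation, but tolerates a fragmented heap, which is
    // what lets a concurrent sweeper avoid moving objects.
    static final List<Chunk> freeList = new ArrayList<>();
    static long freeListAllocate(long size) {
        for (int i = 0; i < freeList.size(); i++) {
            Chunk c = freeList.get(i);
            if (c.size >= size) {
                freeList.remove(i);
                if (c.size > size) {                 // return the leftover tail
                    freeList.add(new Chunk(c.start + size, c.size - size));
                }
                return c.start;
            }
        }
        return -1;  // no chunk fits: caller must collect or expand the heap
    }
}
```

The asymmetry in per-allocation cost is the point: bump allocation is a couple of instructions, while the free-list walk scales with fragmentation.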

For the sake of experiment, we can turn the TLAB machinery off with -XX:-UseTLAB . Then all allocations take the path into native VM code, like this:

- 17.12% 0.00% org.openjdk.All perf-31615.map
   - 0x7faaa3b2d125
      - 16.59% OptoRuntime::new_instance_C
         - 11.49% InstanceKlass::allocate_instance
              2.33% BlahBlahBlahCollectedHeap::mem_allocate   <---- entry point to GC
              0.35% AllocTracer::send_allocation_outside_tlab_event
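A profile like the one above can be reproduced with any allocation-heavy loop. A minimal sketch (class and method names are my own) would look like this; run it once with defaults and once with -XX:-UseTLAB under a profiler such as perf, and the native VM entry points show up only in the second run:

```java
// Illustrative allocation-heavy workload for comparing the two paths.
// With defaults, each new Object() is (mostly) a TLAB pointer bump;
// with -XX:-UseTLAB, each one transits into native VM code.
public class AllocLoop {
    static Object allocate(int n) {
        Object sink = null;
        for (int i = 0; i < n; i++) {
            sink = new Object();
        }
        return sink;  // return the last object so the loop is not trivially dead
    }

    public static void main(String[] args) {
        System.out.println(allocate(10_000_000) != null);
    }
}
```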