DDIQ: Why do we need to store the identity hash code? How does this affect the user-specified hash code?

Hash codes are supposed to have two properties: a) good distribution, meaning the values for distinct objects are more or less distinct; b) idempotence, meaning objects that have the same key components have the same hash code. Note the latter implies that if an object has not changed its key components, its hash code should not change either.

It is a frequent source of bugs to change an object in such a way that its hashCode changes after it was used. For example, adding the object to a HashMap as a key, then changing its fields so that its hashCode changes as well, leads to surprising behavior: the object might not be found in the map at all, because the internal implementation would look in the "wrong" bucket. Likewise, badly distributed hash codes are a frequent source of performance anomalies. Returning a constant value, for example, makes every key collide in the same bucket.
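A minimal sketch of the mutable-key bug, assuming a hypothetical MutableKey class whose hashCode() depends on a non-final field. The java.util.Map documentation leaves the behavior unspecified when a key is mutated like this; the outputs noted below are what the OpenJDK HashMap does, since it remembers each entry's hash at insertion time and the lookup searches using the new one.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {

    // Hypothetical key class: hashCode() is derived from a mutable field.
    static final class MutableKey {
        int id; // NOT final: changing it changes hashCode()

        MutableKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 42; // mutate the key after insertion: its hash code changes too

        // The lookup uses the new hash code, so it no longer finds the entry.
        System.out.println(map.get(key));          // null
        System.out.println(map.containsKey(key));  // false
        // The entry is still physically present, just unreachable by lookup.
        System.out.println(map.size());            // 1
        System.out.println(map.keySet().iterator().next() == key); // true
    }
}
```

Iterating over the map still finds the entry, which is what makes this class of bugs so confusing to debug.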

For a user-specified hash code, both properties are achieved by computing it over a set of user-selected fields. With enough variety of fields and field values, it would be well distributed, and by computing it over unchanging (for example, final) fields we get idempotence. In this case, we don’t need to store the hash code anywhere. Some hashCode implementations may choose to cache it in another field, but that is not required.
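As an illustration only, here is a sketch of both variants using two hypothetical value classes, Point and CachedPoint. Point recomputes its hash code from final fields on every call, so nothing is stored. CachedPoint memoizes the result in an extra field, loosely in the spirit of java.lang.String; this is an optional optimization, not something correctness depends on.

```java
import java.util.Objects;

public class UserHashDemo {

    // Hypothetical value class: hashCode() uses final fields only, so it
    // cannot change after construction (idempotence), and mixing several
    // fields spreads the values out (distribution).
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // recomputed on every call, nothing stored
        }
    }

    // Hypothetical variant that caches the computed value in an extra field.
    static final class CachedPoint {
        private final int x;
        private final int y;
        private int hash; // 0 means "not computed yet"

        CachedPoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof CachedPoint
                    && ((CachedPoint) o).x == x && ((CachedPoint) o).y == y;
        }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0) {
                // Benign race: every thread computes the same value from final
                // fields. If the real hash is 0, it is simply recomputed each call.
                h = Objects.hash(x, y);
                hash = h;
            }
            return h;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.hashCode() == b.hashCode());                      // true
        System.out.println(a.hashCode() == new CachedPoint(1, 2).hashCode());  // true
    }
}
```

Either way, two objects with the same key fields get the same hash code, and that code never changes over the object's lifetime.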

For the identity hash code, there is no guarantee that there are fields to compute it from, and even if there are some, it is unknown how stable those fields actually are. Consider java.lang.Object, which has no fields: what is its hash code? Two allocated Objects are pretty much mirror images of each other: they have the same metadata and the same (that is, empty) contents. The only distinct thing about them is their allocated address, but even then there are two troubles. First, addresses have very low entropy, especially when they come from a bump-pointer allocator like most Java GCs employ, so they are not well distributed. Second, the GC moves objects, so the address is not idempotent. Returning a constant value is a no-go from a performance standpoint.
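The observable contract can be checked from plain Java, without looking at how the JVM keeps the value internally. The sketch below (the class and helper names are hypothetical) relies only on documented behavior: Object.hashCode() and System.identityHashCode() agree for an object that does not override hashCode(), and the value must stay the same for the whole execution, even if a GC moves the object in the meantime.

```java
public class IdentityHashDemo {

    // Hypothetical helper: records the identity hash, requests a GC that may
    // move the object, then checks the identity hash did not change.
    static boolean identityHashSurvivesGc(Object o) {
        int before = System.identityHashCode(o);
        System.gc(); // only a hint; the JVM is free to ignore it
        return System.identityHashCode(o) == before;
    }

    public static void main(String[] args) {
        Object a = new Object(); // no fields to derive a hash code from
        Object b = new Object();

        // Object.hashCode() is the identity hash code here.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
        System.out.println(identityHashSurvivesGc(a));                   // true

        // Usually differs for two distinct objects, but that is NOT guaranteed.
        System.out.println(System.identityHashCode(a) != System.identityHashCode(b));
    }
}
```

Since the value cannot be recomputed from the object's contents, and the address it might be derived from can change, the JVM has to remember it somewhere once it has been handed out.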