$\begingroup$

There are a few ways of looking at this. To make this easier I'm going to describe things in terms of HDBSCAN*, a successor to OPTICS. HDBSCAN* is essentially the same as OPTICS but removes the $\varepsilon$ parameter (or, if you like, fixes $\varepsilon = \infty$). HDBSCAN* also has a different approach to cluster extraction, but that's not overly relevant here. Thus under HDBSCAN* we define

$\text{core-dist}_\mathrm{MinPts}(p) = \text{distance to the MinPts-th nearest neighbor of } p$

$\text{mutual-reachability-dist}_\mathrm{MinPts}(o, p) = \\ \qquad\max(\text{core-dist}_\mathrm{MinPts}(p), \text{core-dist}_\mathrm{MinPts}(o), \mathrm{dist}(p,o))$

much the same as OPTICS, but now we don't require any undefined values (we also include the core-distance of $o$ to ensure that mutual reachability is a metric).
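To make the two definitions concrete, here is a minimal NumPy sketch. The function names are mine, and I'm assuming Euclidean distance and that a point does not count as its own neighbor (conventions differ between implementations on that last point).

```python
import numpy as np

def core_distances(X, min_pts):
    """Return (pairwise distances, core distances) for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting a row, column 0 is the point itself (distance 0),
    # so column min_pts is the min_pts-th nearest *other* point.
    core = np.sort(dist, axis=1)[:, min_pts]
    return dist, core

def mutual_reachability(X, min_pts):
    """max(core-dist(p), core-dist(o), dist(p, o)) for every pair."""
    dist, core = core_distances(X, min_pts)
    return np.maximum(dist, np.maximum(core[:, None], core[None, :]))

# Three nearby points and one far-away point on a line.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
dist, core = core_distances(X, min_pts=2)
M = mutual_reachability(X, min_pts=2)
```

Here the isolated point at 10 gets a large core distance, so its mutual reachability distance to every other point is large too, even to the point at 2 which is "only" 8 away.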

One can view both OPTICS and HDBSCAN* as an extension of DBSCAN. DBSCAN takes two parameters, MinPts and $\epsilon$ (note that this is a different use of epsilon than in OPTICS -- if we focus on HDBSCAN*, which has no such parameter, we can avoid confusion: epsilon will refer only to the parameter used in DBSCAN).

HDBSCAN* generates a complete hierarchy of clusterings for a range of possible $\epsilon$ values, and thus for any fixed $\epsilon$ value you want the clustering at that level in the hierarchy to be the clustering DBSCAN would give for that $\epsilon$ value (OPTICS works similarly, but simply constrains the range of values that $\epsilon$ may take). To be in a cluster in DBSCAN you must be a core-point (I'm going to ignore border points for now); that is, the point must have at least MinPts other points within a ball of radius $\epsilon$ of itself. If a point isn't a core-point it is noise. The HDBSCAN* (and OPTICS) reachability-distance captures this distinction by ensuring a point is not joined into a cluster until the DBSCAN $\epsilon$ value is such that the point is within the relevant distance of the other points in the cluster and the point is a core-point at that DBSCAN $\epsilon$ value. The alternative reachability-distance you suggest covers the first part, but not the second: noise points would be included in clusters as long as they were close enough -- and by transitive chaining this could result in clusters merging together prematurely via noise points.
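A small toy example of that chaining effect, as a sketch rather than anything resembling a real implementation: two tight groups on a line with a single stray point halfway between them. The data, the helper `components_at`, and the choice MinPts = 3, $\epsilon = 3$ are all made up for illustration, and border points are ignored as above.

```python
import numpy as np

def components_at(D, eps):
    """Component label per point for the graph with edge i-j iff D[i, j] <= eps."""
    n = len(D)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] <= eps:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Two groups {0,1,2,3} and {9,10,11,12} with one stray point at 6.
x = np.array([0, 1, 2, 3, 6, 9, 10, 11, 12], dtype=float)
dist = np.abs(x[:, None] - x[None, :])

min_pts, eps = 3, 3.0
core = np.sort(dist, axis=1)[:, min_pts]  # self excluded (assumed convention)
mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))

n_plain = len(set(components_at(dist, eps)))     # plain distance only
n_mreach = len(set(components_at(mreach, eps)))  # with core distances
```

With plain distances the stray point is within 3 of both groups, so everything chains into a single component. With mutual reachability its core distance is 4, so at $\epsilon = 3$ it stays on its own and the two groups remain separate (three components in total).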

An alternative way of looking at this is by comparing with other algorithms like Robust Single Linkage (which also produces a cluster hierarchy). Robust Single Linkage aims to improve single linkage clustering by making it more robust to noise. In single linkage at each distance-scale $r$ you form a graph by joining points with an edge if they are within $r$ of each other, and then the clustering at level $r$ of the hierarchy is simply the connected components of that graph. The problem is that single linkage is very susceptible to noise -- a point or two in the wrong place can join clusters together when perhaps they shouldn't be. Robust Single Linkage attempts to fix this by requiring a point to have at least $k$ neighbors within $\alpha r$ of it before adding any edges to it. This discounts sparse points until larger distance-scales and makes the approach more robust to noise. It can be demonstrated that this approach provides convergence to the level set tree of the PDF that generated the data -- i.e. it does a good job of fixing the noise issues! If we fix $\alpha = 1$ and $k = \mathrm{MinPts}$ then we recover HDBSCAN*; if we additionally restrict to $r \leq \varepsilon$ then we get essentially OPTICS. Again we are using core-distance to discount noise and make the clustering more noise robust.
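As a rough sketch of the level-$r$ graph, and of why $\alpha = 1$ lines up with mutual reachability: the function below is hypothetical, and for $\alpha \neq 1$ it assumes one particular formulation (a point becomes active once its $k$-th neighbor is within $r$, and active points are joined when within $\alpha r$), which may differ in detail from other statements of the algorithm. For $\alpha = 1$ that distinction disappears.

```python
import numpy as np

def rsl_level_edges(dist, k, r, alpha=1.0):
    """Boolean adjacency of the level-r graph (assumed formulation, see text)."""
    core = np.sort(dist, axis=1)[:, k]   # distance to k-th neighbor, self excluded
    active = core <= r                   # sparse points stay switched off at this r
    adj = (dist <= alpha * r) & active[:, None] & active[None, :]
    np.fill_diagonal(adj, False)
    return adj

# Toy data: two groups on a line and one stray point at 6.
x = np.array([0, 1, 2, 3, 6, 9, 10, 11, 12], dtype=float)
dist = np.abs(x[:, None] - x[None, :])
k, r = 3, 3.0

# Thresholding mutual reachability (with MinPts = k) at the same level r.
core = np.sort(dist, axis=1)[:, k]
mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
mreach_adj = mreach <= r
np.fill_diagonal(mreach_adj, False)

adj = rsl_level_edges(dist, k, r, alpha=1.0)
same = np.array_equal(adj, mreach_adj)
```

For this data `same` is true: with $\alpha = 1$, "both endpoints are active and within $r$ of each other" is exactly the condition that the mutual reachability distance is at most $r$, so the connected components at every level agree.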