This PEP proposes the addition of an isclose() function to the standard library math module that determines whether one value is approximately equal or "close" to another value.

These existing implementations indicate that this is a common need and not trivial to write oneself, making it a candidate for the standard library.

Many other non-Python systems provide such a test, including the Boost C++ library and the APL language.

One can also find discussion and sample implementations on Stack Overflow and other help sites.

The numpy package has the allclose() and isclose() functions, but they are only available with numpy.

Floating point values have limited precision, which makes them unable to exactly represent some values, and allows errors to accumulate with repeated computation. As a result, it is common advice to use an equality comparison only in very specific situations. Often an inequality comparison fits the bill, but there are times (often in testing) when the programmer wants to determine whether a computed value is "close" to an expected value, without requiring them to be exactly equal. This need is common enough, particularly in testing, and it is not always obvious how to meet it, so it would be a useful addition to the standard library.

Drafts of this PEP, test code, etc. can also be found there.

This implementation has a flag that lets the user select which relative tolerance test to apply. This PEP does not suggest that the flag be retained, but rather that the weak test be selected.

If the user sets the rel_tol parameter to 0.0, then only the absolute tolerance will affect the result. While not the goal of the function, this does allow it to be used as a purely absolute tolerance check as well.
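With the function as it now exists in the math module (Python 3.5+), this pure-absolute-tolerance usage looks like:

```python
import math

# With rel_tol=0.0, only the absolute tolerance matters, so isclose()
# acts as a plain "is the difference within abs_tol?" check.
print(math.isclose(100.0, 100.005, rel_tol=0.0, abs_tol=0.01))  # True: diff 0.005 <= 0.01
print(math.isclose(100.0, 100.02, rel_tol=0.0, abs_tol=0.01))   # False: diff 0.02 > 0.01
```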

NOTE: the author of this PEP has resolved to go back over many of his tests that use the numpy allclose() function, which provides a default absolute tolerance, and make sure that the default value is appropriate.

Any non-zero default might result in users' tests passing totally inappropriately. If, on the other hand, a test against zero fails the first time with the defaults, the user will be prompted to select an appropriate value for the problem at hand in order to get the test to pass.

The default absolute tolerance value is set to zero because there is no value that is appropriate for the general case. It is impossible to know an appropriate value without knowing the likely values expected for a given use case. If all the values tested are on order of one, then a value of about 1e-9 might be appropriate, but that would be far too large if expected values are on order of 1e-9 or smaller.

To handle this case, an optional parameter, abs_tol, can be used to set a minimum tolerance used in the case of very small or zero computed relative tolerance. That is, the values will always be considered close if the difference between them is less than abs_tol.
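For instance, with the function as shipped in Python 3.5+, a comparison against an expected value of zero can never pass on relative tolerance alone, but an absolute floor makes it work:

```python
import math

expected = 0.0
computed = 1e-12  # e.g. roundoff left over from a cancellation

# A relative test alone can never pass against zero...
print(math.isclose(computed, expected))                # False
# ...but an absolute floor makes the comparison usable:
print(math.isclose(computed, expected, abs_tol=1e-9))  # True
```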

There is a similar issue if the two values to be compared straddle zero: if a is approximately equal to -b, then a and b will never be computed as "close".

However, while mathematically correct, there are many use cases where a user will need to know if a computed value is "close" to zero. This calls for an absolute tolerance test. If the user needs to call this function inside a loop or comprehension, where some, but not all, of the expected values may be zero, it is important that both a relative tolerance and absolute tolerance can be tested for with a single function with a single set of parameters.

Relative comparison is problematic if either value is zero. By definition, no value is small relative to zero. And computationally, if either value is zero, the difference is the absolute value of the other value, and the computed absolute tolerance will be rel_tol times that value. When rel_tol is less than one, the difference will never be less than the tolerance.

NOTE: it may make sense to add a Decimal.isclose() that works properly and completely with the decimal type, but that is not included as part of this PEP.

The code will be tested to accommodate at least some values of these types.

The primary use-case is expected to be floating point numbers. However, users may want to compare other numeric types similarly. In theory, it should work for any type that supports abs(), multiplication, comparisons, and subtraction. However, the implementation in the math module is written in C, and thus cannot (easily) use Python's duck typing. Rather, the values passed into the function will be converted to the float type before the calculation is performed. Passing in types (or values) that cannot be converted to floats will raise an appropriate exception (TypeError, ValueError, or OverflowError).

The IEEE 754 special values of NaN, inf, and -inf will be handled according to IEEE rules. Specifically, NaN is not considered close to any other value, including NaN. inf and -inf are only considered close to themselves.
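These rules can be seen with the function as shipped in Python 3.5+:

```python
import math

nan, inf = float("nan"), float("inf")

print(math.isclose(nan, nan))    # False: NaN is not close to anything, even itself
print(math.isclose(inf, inf))    # True: inf is close only to itself
print(math.isclose(-inf, -inf))  # True
print(math.isclose(inf, -inf))   # False
print(math.isclose(inf, 1e308))  # False: inf is not close to any finite value
```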

The name, isclose, is selected for consistency with the existing isnan and isinf.

rel_tol: the relative tolerance -- it is the amount of error allowed, relative to the larger absolute value of a or b. For example, to set a tolerance of 5%, pass rel_tol=0.05. The default tolerance is 1e-9, which assures that the two values are the same within about 9 decimal digits. rel_tol must be greater than 0.0.
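A 5% tolerance in use, with the function as shipped in Python 3.5+ (the error allowed is 5% of the larger magnitude):

```python
import math

print(math.isclose(10.0, 10.4, rel_tol=0.05))  # True:  |10.0-10.4| = 0.4 <= 0.05*10.4 = 0.52
print(math.isclose(10.0, 10.6, rel_tol=0.05))  # False: |10.0-10.6| = 0.6 >  0.05*10.6 = 0.53
```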

a and b: the two values to be tested for relative closeness.

The new function will go into the math module, and have the following signature:

    isclose(a, b, rel_tol=1e-9, abs_tol=0.0)
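A pure-Python sketch of the proposed semantics (the actual math module implementation is in C; the name isclose_sketch is used here just to avoid shadowing the real function):

```python
from math import isinf

def isclose_sketch(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Pure-Python sketch of the proposed isclose() semantics."""
    if a == b:
        return True         # short-circuit exact equality; also makes inf close to inf
    if isinf(a) or isinf(b):
        return False        # inf is only close to itself
    diff = abs(b - a)       # a NaN input makes diff NaN, failing every test below
    # the Boost "weak" relative test, with an absolute-tolerance floor
    return (diff <= abs(rel_tol * b) or
            diff <= abs(rel_tol * a) or
            diff <= abs_tol)

print(isclose_sketch(1.0, 1.0 + 1e-10))  # True
print(isclose_sketch(1.0, 1.1))          # False
```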

There are essentially two ways to think about how close two numbers are to each other:

Absolute difference: simply abs(a-b)

Relative difference: abs(a-b)/scale_factor.

The absolute difference is trivial enough that this proposal focuses on the relative difference.

Usually, the scale factor is some function of the values under consideration, for instance:

The absolute value of one of the input values
The maximum absolute value of the two
The minimum absolute value of the two
The absolute value of the arithmetic mean of the two

These lead to the following possibilities for determining if two values, a and b, are close to each other.

1. abs(a-b) <= tol*abs(a)
2. abs(a-b) <= tol * max( abs(a), abs(b) )
3. abs(a-b) <= tol * min( abs(a), abs(b) )
4. abs(a-b) <= tol * abs(a + b)/2

NOTE: (2) and (3) can also be written as:

2. (abs(a-b) <= abs(tol*a)) or (abs(a-b) <= abs(tol*b))
3. (abs(a-b) <= abs(tol*a)) and (abs(a-b) <= abs(tol*b))

(Boost refers to these as the "weak" and "strong" formulations.) These can be a tiny bit more computationally efficient, and thus are used in the example code.

Each of these formulations can lead to slightly different results. However, if the tolerance value is small, the differences are quite small -- in fact, often smaller than the available floating point precision.

How much difference does it make?

When selecting a method to determine closeness, one might want to know how much of a difference it could make to use one test or the other -- i.e. how many values are there (or what range of values) that will pass one test, but not the other. The largest difference is between options (2) and (3), where the allowable absolute difference is scaled by either the larger or smaller of the values.

Define delta to be the difference between the allowable absolute tolerance defined by the larger value and that defined by the smaller value. That is, the amount that the two input values need to be different in order to get a different result from the two tests. tol is the relative tolerance value.

Assume that a is the larger value and that both a and b are positive, to make the analysis a bit easier. delta is therefore:

    delta = tol * (a-b)

or:

    delta / tol = (a-b)

The largest absolute difference that would pass the test, (a-b), equals the tolerance times the larger value:

    (a-b) = tol * a

Substituting into the expression for delta:

    delta / tol = tol * a

so:

    delta = tol**2 * a

For example, for a = 10, b = 9, tol = 0.1 (10%):

    maximum tolerance: tol * a == 0.1 * 10 == 1.0
    minimum tolerance: tol * b == 0.1 * 9.0 == 0.9
    delta = (1.0 - 0.9) = 0.1
    or: tol**2 * a = 0.1**2 * 10 = 0.1

The absolute difference between the maximum and minimum tolerance tests in this case could be substantial. However, the primary use case for the proposed function is testing the results of computations. In that case, a relative tolerance of much smaller magnitude is likely to be selected. For example, a relative tolerance of 1e-8 is about half the precision available in a Python float. In that case, the difference between the two tests is 1e-8**2 * a, or 1e-16 * a, which is close to the limit of precision of a Python float.
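The worked numbers above can be checked directly (the comparison allows for floating point rounding):

```python
a, b, tol = 10.0, 9.0, 0.1

max_tolerance = tol * a                 # weak test allowance:   0.1 * 10  -> 1.0
min_tolerance = tol * b                 # strong test allowance: 0.1 * 9.0 -> 0.9
delta = max_tolerance - min_tolerance   # about 0.1

# At the edge of the weak test, (a-b) == tol*a, so delta == tol**2 * a:
print(abs(delta - tol**2 * a) < 1e-15)  # True, up to floating point rounding
```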
If the relative tolerance is set to the proposed default of 1e-9 (or smaller), the difference between the two tests will be lost to the limits of precision of floating point. That is, each of the four methods will yield exactly the same results for all values of a and b. In addition, in common use, tolerances are defined to 1 significant figure -- that is, 1e-9 is specifying about 9 decimal digits of accuracy. So the difference between the various possible tests is well below the precision to which the tolerance is specified.

Symmetry

A relative comparison can be either symmetric or non-symmetric. For a symmetric algorithm:

    isclose(a, b) is always the same as isclose(b, a)

If a relative closeness test uses only one of the values (such as (1) above), then the result is asymmetric, i.e. isclose(a, b) is not necessarily the same as isclose(b, a).

Which approach is most appropriate depends on what question is being asked. If the question is: "are these two numbers close to each other?", there is no obvious ordering, and a symmetric test is most appropriate. However, if the question is: "is the computed value within x% of this known value?", then it is appropriate to scale the tolerance to the known value, and an asymmetric test is most appropriate.

From the previous section, it is clear that either approach would yield the same or similar results in the common use cases. In that case, the goal of this proposal is to provide a function that is least likely to produce surprising results.

The symmetric approach provides an appealing consistency -- it mirrors the symmetry of equality, and is less likely to confuse people. A symmetric test also relieves the user of the need to think about the order in which to set the arguments. It was also pointed out that there may be some cases where the order of evaluation may not be well defined, for instance in the case of comparing a set of values all against each other.

There may be cases when a user does need to know that a value is within a particular range of a known value. In that case, it is easy enough to simply write the test directly:

    if a - b <= tol * a:

(assuming a > b in this case). There is little need to provide a function for this particular case.

This proposal uses a symmetric test.
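The contrast can be demonstrated with a small helper (asym_close here is a hypothetical name for test (1), scaling the tolerance to the first argument only) against the symmetric function shipped in Python 3.5+:

```python
import math

def asym_close(a, b, tol):
    # test (1): tolerance scaled to the first argument only
    return abs(a - b) <= tol * abs(a)

# The asymmetric test gives order-dependent answers...
print(asym_close(10.0, 9.0, 0.1), asym_close(9.0, 10.0, 0.1))  # True False
# ...while the symmetric test does not:
print(math.isclose(10.0, 9.0, rel_tol=0.1),
      math.isclose(9.0, 10.0, rel_tol=0.1))                    # True True
```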

Which symmetric test?

There are three symmetric tests considered.

The case that uses the arithmetic mean of the two values requires that the values either be added together before dividing by 2, which could result in extra overflow to inf for very large numbers, or that each value be divided by two before being added together, which could result in underflow to zero for very small numbers. This effect would only occur at the very limit of float values, but it was decided there was no benefit to the method worth reducing the range of functionality or adding the complexity of checking values to determine the order of computation.

This leaves the Boost "weak" test (2), which uses the larger value to scale the tolerance, and the Boost "strong" test (3), which uses the smaller of the values to scale the tolerance. For small tolerances they yield the same result, but this proposal uses the Boost "weak" test: it is symmetric and provides a more useful result for very large tolerances.
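A quick sketch of that last point: for small tolerances the weak and strong tests agree, but for a large tolerance the weak test keeps behaving intuitively while the strong test becomes much harder to satisfy.

```python
def weak(a, b, tol):
    # Boost "weak": scale the tolerance by the larger magnitude
    return abs(a - b) <= tol * max(abs(a), abs(b))

def strong(a, b, tol):
    # Boost "strong": scale the tolerance by the smaller magnitude
    return abs(a - b) <= tol * min(abs(a), abs(b))

# Small tolerance: the two agree.
print(weak(1.0, 1.0 + 1e-10, 1e-9), strong(1.0, 1.0 + 1e-10, 1e-9))  # True True

# Large tolerance ("within 50%"): the weak test still gives a useful answer,
# while the strong test rejects the pair.
print(weak(2.0, 3.5, 0.5), strong(2.0, 3.5, 0.5))  # True False
```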