Having recently come across a method for automatic differentiation on sigfpe’s cornucopia of amazingly cool stuff masquerading as a blog, I decided to start playing around with it a little to see what might come out.

So, let’s say we take the standard definition of the derivative,

,

look at it for a bit, and decide that we don’t like the limit symbol, and that, in fact, we’re going to drop it entirely. After rearranging, we would then obtain the weird-looking f(x) + d f'(x) = f(x+d), and, presumably, set out to find what d could possibly look like.

To this end, we might expand f(x+d) about x, which yields

and, after subtracting f(x) + d f'(x) from both sides, degenerates into

It appears that d^n should be zero for all n>1. To see this, we can plug in f(x) = e^x. The coefficients of d^n become constants, and, dividing both sides by e^x, we obtain d^2/2! + d^3/3! + … = 0, which is the MacLaurin series for e^d with the first two terms missing, so e^d=d+1. Ignoring, momentarily, the troublesome fact that this gives d=0 in the reals, we certainly at least have d^2=0 (pardon the handwaving).

But in order for f(x) + d f'(x) = f(x+d) to be a remotely interesting statement, we must also have d != 0, and we want d to be unique. Given some structure whose objects we’d care to differentiate, we’re going to cheat a little, and extend it with the element d such that d != 0, but d^2 = 0. Having allowed such a number, all the weirdness disappears, the equality f(x+d) = f(x) + d f'(x) holds, and we’re left with a sort of infinitesimal constant which we’re free to plug into random things.

To test whether this makes any sense, we might start by taking the quadratic Q(x) = c_2 x^2 + c_1 x + c_0, and computing Q(x+d):

The derivative of Q(x) “fell out” into the coefficient of d. How about e^{x+d}?

Anything else? Let’s see. By the binomial theorem,

.

The d^{n-k} factor vanishes whenever n >= k+2, so



and we’ve just obtained the power law.

Trigonometric functions:

Using the power law experiment from earlier, we get



Plotting coefficients of d along the vertical axis, and the reals along the horizontal, we get the unit circle, while exponential maps are lines through the origin, which is kind of cool in and out of itself.

Ratios:

One useful thing here is that d is small enough to be in any radius of convergence, so we get logarithms “for free”:

So, what’s the point? The point is that we automatically obtain the derivative as a side effect of any computation, since f(x+d) = f(x) + d f'(x). In other words, by switching to dual numbers (numbers of the form x+y d) for things that need to be differentiated (and making it transparent through operator overloading), we can ask for the derivative of any differentiable function we’ve ever defined, and it’ll be evaluated symbolically.

To that end, here’s the first approximation in Haskell.



module Diff where

data Diffable a = !a :+ a

funcPart :: Diffable a -> a

funcPart (x :+ x’) = x

diffPart :: Diffable a -> a

diffPart (x :+ x’) = x’

instance (RealFloat a) => Eq (Diffable a) where

(x :+ x’) == (y :+ y’) = (x == y) && (x’ == y’)

instance (RealFloat a) => Show (Diffable a) where

show (x :+ x’) = show (x, x’)

instance (RealFloat a) => Num (Diffable a) where

(x :+ x’) + (y :+ y’) = (x + y) :+ (x’ + y’)

(x :+ x’) * (y :+ y’) = (x * y) :+ (x’ * y + y’ * x)

abs (x :+ x’) = if x < 0 then ((-x) :+ (-x’)) else (x :+ x’)

signum (x :+ x’) = (signum x) :+ 0

fromInteger x = (fromInteger x) :+ 1

instance (RealFloat a) => Fractional (Diffable a) where

fromRational x = (fromRational x) :+ 1

recip (x :+ x’) = (recip x) :+ (negate x’ / (x^2))

instance (RealFloat a) => Floating (Diffable a) where

pi = pi :+ 0

exp (x :+ x’) = (exp x) :+ (x’ * exp x)

log (x :+ x’) = (log x) :+ (x’ * recip x)

sin (x :+ x’) = (sin x) :+ (x’ * cos x)

cos (x :+ x’) = (cos x) :+ (x’ * (negate $ sin x))

sinh (x :+ x’) = (sinh x) :+ (x’ * cosh x)

cosh (x :+ x’) = (cosh x) :+ (x’ * sinh x)

asin (x :+ x’) = (asin x) :+ (x’ / (sqrt $ 1-x^2))

acos (x :+ x’) = (acos x) :+ (x’ / (negate (sqrt $ 1-x^2)))

atan (x :+ x’) = (atan x) :+ (x’ / (x^2+1))

asinh (x :+ x’) = (asinh x) :+ (x’ / (sqrt $ x^2+1))

acosh (x :+ x’) = (acosh x) :+ (x’ / (negate (sqrt $ x^2-1)))

atanh (x :+ x’) = (atanh x) :+ (x’ / (1 – x^2))



And now let’s try a sample GHCI session:

Derivatives of x^2 for x in {0, 0.5, …, 5}:



*Diff> map (diffPart . (^2) . fromRational) [0,0.5..5]

[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]



Derivatives of cos^2 x + sin^2 x (should be identically zero):



*Diff> map (diffPart . (\x -> (cos x)^2 + (sin x)^2) . fromRational) [0..10]

[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]



So there we go, AD in half a page of code.