Rotation & Quaternions

The rotational algebra of our world is a beautiful bedeviling thing. The reason is that although rotations act on a three dimensional space, when embedded in three dimensions, rotations are not smooth or unique. When represented with Euler angles or matrices, every rotation has multiple representations. Change the order of rotations and you also change the endpoint (rotations are non-commutative) Traveling smoothly along some paths of rotations using a three dimensional embedding, suddenly the third degree of freedom can become inaccessible (the phenomenon of “Gimbal lock”). If you try to define an average or interpolated point-of-view in a naive way (axes=> angles => interpolated angles) you will find gibberish zero axes, and jerky non-smooth behavior.

Pause for a moment and realize that this is a sort of geometry your brain does poorly, imagine trying to right yourself after being dropped randomly into water upside down and backwards. Euler’s theorem tells us you only need to execute a single rotation around one axis to orient yourself, but your brain will invert rotations along a few axes one-after-another. You will perform two or more moves rather than one (getting right side up …then spinning around). You might lose your breath if you fall into the trap that rotations do not commute!

Rotational control doesn’t come naturally, it takes practice

To have smooth topology rotations must be embedded within a four-dimensional hypersphere, so we can forgive your brain. In this space a rotation is a 4-dimensional point, a quaternion, whose components can be thought of as the angle and 3 axis components of the rotation. Given a 3x3 rotation matrix Q, one can parameterize a quaternion (w,x,y,z)

Given any set of orthogonal axes (rows of Q), Euler’s theorem guarantees an axis-angle rotation which can map the natural xyz axes back and forth into the new frame. The formula above yields the natural 4-d form of that rotation.

Now suppose you have two, three or four systems of axes (ax_1, ax_2, ax_3). For example you want to look at the sun then the moon, or you want to fit 4 pretty objects in your field of vision, or define invariant axes for a cloud of points (the reason we use this math in TensorMol). Can you simply average the quaternion components q_av = (ax_1+ax_2+ax_3)/3? Sadly no… You can immediately understand why if you imagine averaging rotations around opposite axis vectors as an owl might when spinning his head. The “good, smooth” quaternions keep to the surface of the 4-d hypersphere (a curvy subset of 4d-euclidean space). To interpolate lines on that sphere, you can use SLERP. To average multiple quaternions we must construct the 4x4 matrix which is the outer product of the list of quaternions (Nx4) with itself, weighted if desired:

The largest eigenvector of this matrix is the desired average quaternion.

Implementing complex math in tf.

Again, my goal is to get rotationally invariant axes for a set of points which smooth, differentiable and local. I will walk through my whole implementation of this in tf.

Step 1- Don’t use tf. Write a simple test of your formulas in a notebook like math interface (mathematica, ipython/sage). Verify everything is working when you use all the fancy library routines tf. doesn’t have (eigenvectors etc.). Here’s what that looks like using mathematica.

Those fancy manipulate sliders are a nice way to get tangible faith that the point cloud is rotationally invariant when transformed using an averaged axis system depending on points in the cloud. It remains for us to do this same thing in tf. We’re ready for step 2:

Step 2: Power up a jupyter notebook and translate into tensorflow. It’s simply faster to avoid integration with the rest of the package while trying to debug some involved set of mathematical manipulations like this. The idea is to make a minimal notebook that also tests all the possible edge cases which could cause numerical instability. Here for example is some code showing that the mapping between axis systems and quaternions works invertibly:

def slerp(v0, v1, t=0.05):

"""

Interpolate between quaternions v0 and v1.

"""

v0 = safe_normalize(v0)

v1 = safe_normalize(v1)

dot = tf.reduce_sum(v0*v1,axis=-1,keepdims=True)

# If the dot product is negative, slerp won't take

# the shorter path. Note that v1 and -v1 are equivalent when

# the negation is applied to all four components. Fix by

# reversing one quaternion.

signflip = tf.where(tf.less_equal(dot,0.),-1.*tf.ones_like(dot),tf.ones_like(dot))

v1 *= signflip

dot *= signflip

# Linear answer.

linq = safe_normalize(v0 + t*(v1-v0))

#

sdot = tf.clip_by_value(dot,-1.0,1.0)

theta_0 = tf.acos(sdot)

theta = theta_0*t

sin_theta = tf.sin(theta)

sin_theta_0 = tf.sin(theta_0)

s0 = tf.cos(theta) - dot * sin_theta / (sin_theta_0+1e-19)

s1 = sin_theta / (sin_theta_0+1e-19)

sq = safe_normalize((s0 * v0) + (s1 * v1))

#

DOT_THRESHOLD = 0.9995

tdot = tf.concat([dot,dot,dot,dot],axis=-1)

slerpd = tf.where(tf.greater(tdot,DOT_THRESHOLD),linq,sq)

ttiled = tf.concat([t,t,t,t],axis=-1)

slerpdorv1 = tf.where(tf.greater(ttiled,1.0-1e-14),v1,slerpd)

return tf.where(tf.less(ttiled,1e-14),v0,slerpdorv1) def sftpluswparam(x):

return tf.log(1.0 + tf.exp(100. * x)) / 100.0 def RotToQuat(axes_):

"""

axes is a ... X 3 3 tensor of axes

this generates a ... X 4 tensor of quaternions.

which are 1:1 with those axes.

"""

w = (1./2.)*tf.sqrt(1e-15+tf.abs(1 + axes_[...,0, 0] + axes_[...,1, 1] + axes_[...,2, 2]))

x = tf.sign(axes_[...,2, 1] - axes_[...,1, 2])*tf.abs(0.5*tf.sqrt(1e-15+tf.abs(1.0 + axes_[...,0, 0] - axes_[...,1, 1] - axes_[...,2, 2])))

y = tf.sign(axes_[...,0, 2] - axes_[...,2, 0])*tf.abs(0.5*tf.sqrt(1e-15+tf.abs(1.0 - axes_[...,0, 0] + axes_[...,1, 1] - axes_[...,2, 2])))

z = tf.sign(axes_[...,1, 0] - axes_[...,0, 1])*tf.abs(0.5*tf.sqrt(1e-15+tf.abs(1.0 - axes_[...,0, 0] - axes_[...,1, 1] + axes_[...,2, 2])))

return tf.stack([w,x,y,z],axis=-1) def QuatToRot(q):

"""

a_ ... X 4 tensor of quaternions

this generates a ... X 3 X 3 of rotation matrices.

"""

tmp=tf.stack([1 - 2.*(q[...,2]*q[...,2] + q[...,3]*q[...,3]), 2*(q[...,1]*q[...,2] - q[...,3]*q[...,0]),

2*(q[...,1]*q[...,3] + q[...,2]*q[...,0]),2*(q[...,1]*q[...,2] + q[...,3]*q[...,0]), 1 - 2.*(q[...,1]*q[...,1] + q[...,3]*q[...,3]),

2*(q[...,2]*q[...,3] - q[...,1]*q[...,0]),2*(q[...,1]*q[...,3] - q[...,2]*q[...,0]), 2*(q[...,2]*q[...,3] + q[...,1]*q[...,0]),

1 - 2.*(q[...,1]*q[...,1] + q[...,2]*q[...,2])],axis=-1)

return tf.reshape(tmp,[-1,3,3]) def VectorsToOrient(v1,v2):

v1n = safe_normalize(v1)

v2n = safe_normalize(v2)

v3 = safe_normalize(tf.cross(v1n, v2n)+tf.constant(np.array([0., 0., 1e-19]), dtype=tf.float64))

# Compute the average of v1, v2, and their projections onto the

# plane.

v_av = (v1n + v2n) / 2.0

v_av = safe_normalize(v_av)

# Rotate pi/4 cw and ccw to obtain v1,v2

first = TF_AxisAngleRotation(v3, v_av, tf.constant(-Pi / 4., dtype=tf.float64))

second = TF_AxisAngleRotation(v3, v_av,tf.constant(Pi / 4., dtype=tf.float64))

vs = tf.concat([first[:, tf.newaxis, :], second[:, tf.newaxis, :],v3[:, tf.newaxis, :]],axis=1)

return vs def VectorsToAxisQs(v1,v2):

return tf.reshape(RotToQuat(VectorsToOrient(v1,v2)),(-1, 4)) def safe_normalize(x_):

nrm = tf.clip_by_value(tf.norm(x_,axis=-1,keepdims=True),1e-36,1e36)

nrm_ok = tf.logical_and(tf.not_equal(nrm,0.),tf.logical_not(tf.is_nan(nrm)))

safe_nrm = tf.where(nrm_ok,nrm,tf.ones_like(nrm))

return x_*tf.where(nrm_ok,1.0/safe_nrm,tf.zeros_like(nrm)) def safe_inv_norm(x_):

nrm = tf.clip_by_value(tf.norm(x_,axis=-1,keepdims=True),1e-36,1e36)

nrm_ok = tf.logical_and(tf.not_equal(nrm,0.),tf.logical_not(tf.is_nan(nrm)))

safe_nrm = tf.where(nrm_ok,nrm,tf.ones_like(nrm))

return tf.where(nrm_ok,1.0/safe_nrm,tf.zeros_like(nrm)) def safe_norm(x_):

nrm = tf.clip_by_value(tf.norm(x_, axis=-1, keepdims=True), 1e-36, 1e36)

nrm_ok = tf.logical_and(

tf.not_equal(nrm, 0.), tf.logical_not(tf.is_nan(nrm)))

safe_nrm = tf.where(nrm_ok, nrm, tf.zeros_like(nrm))

return safe_nrm with tf.Graph().as_default():

xyzs = tf.Variable(np.random.random((batch_size,MaxNAtom,3))*7.0 - 5.0)

init = tf.global_variables_initializer()

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

sess.run(init)

print sess.run(xyzs[0,:2])

print sess.run(VectorsToOrient(xyzs[:,0],xyzs[:,1]))

print sess.run(RotToQuat(VectorsToOrient(xyzs[:,0],xyzs[:,1])))

print sess.run(QuatToRot(RotToQuat(VectorsToOrient(xyzs[:,0],xyzs[:,1]))))

Style notes about the tf. code given above:

Each routine can be easily compared with the mathematica output, to quickly debug.

In general it is best to choose an order of dimensions for your tensor which goes (least often contracted,…,most often contracted), because several helpful tf. functions assume the first dimension of a tensor is the batch dimension.

tf.sqrt, and 1/(tensor) are both unstable operations in tf. They are unstable in a tricky way, because the implied chain-rule derivative graph (coming from tf.gradients(… op…, var) will still often evaluate NaN’s even when it appears that the arguments to the routine would always be in the well-behaved domain. One must make liberal use of tf.clip_by_value() , tf.where(), and infinitesimals to ensure both the routine and routine’s derivatives are well-behaved. Safe_norm is a good example.

Epilog

So is all this serious rotational mathematics only good for defining axis systems for atomic positions. No! Facebook AI-Research and collaborators from the EPFL published a nice use of quaternions for skeletal motion planning last week.

Again in the case of this “QuaterNet” application, quaternions found use as really the only way to smoothly transform between axis systems.