The Coordinate Transformation Notation


Introduction

Transformations among coordinate frames are ubiquitous in computer vision, robotics, and many other fields. They feel so common and “straightforward” that they might not seem to deserve a separate post. However, more than once I have seen a lengthy argument trace back to a misunderstanding of the transformation notation. People assumed that others saw what they saw, which was not always true. The notation has its nitty-gritty details, and it can even be confusing at times. This post is about the coordinate transformation notation that makes the most sense to me.

Notation In A Nutshell

Getting straight to the point, the commonly used notation for matrix transforms (here I use “transformation” and “transform” interchangeably) and vectors is:

| Type | Notation | Description |
| --- | --- | --- |
| Transform | $T_{{dest}\text{\textunderscore}{src}}$ | Transform of a point in Frame “src” to Frame “dest” |
| Vector | $v_{{dest}\text{\textunderscore}{vec}}$ | A vector “vec” (defined from one point to another) expressed in Frame “dest” |

A pose, the position and orientation of an object in space, is also a transform from the object frame to a reference frame. Say you want to track your car from home to a grocery store. Your car pose is the transform from the car frame to the world frame, $T_{world\text{\textunderscore}car}$ in the notation above. To make the math easy, the position of the car can be calculated by transforming the car frame origin to the world frame.
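As a small worked example in 2D (the numbers are made up for illustration), suppose the car sits at position $(3, 4)$ in the world frame and is rotated by $90^\circ$ counter-clockwise. In homogeneous coordinates its pose would be

$$T_{world\text{\textunderscore}car} = \begin{bmatrix} 0 & -1 & 3 \\ 1 & 0 & 4 \\ 0 & 0 & 1 \end{bmatrix}$$

Applying it to the car frame origin recovers the car position in the world frame, and applying it to a point one unit ahead of the car, $(1, 0)$ in the car frame, gives $(3, 5)$ in the world frame:

$$T_{world\text{\textunderscore}car} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}, \qquad T_{world\text{\textunderscore}car} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix}$$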

Chain Rule

The “chain rule” explained below makes the notation above really shine. Say we want to transform a point from Frame $A_1$ to Frame ${A_N}$. The transform matrix is then

$$T_{A_N\text{\textunderscore}A_1} = T_{A_N\text{\textunderscore}A_{N-1}} T_{A_{N-1}\text{\textunderscore}A_{N-2}} \cdots T_{A_2\text{\textunderscore}A_{1}}$$

Similarly, to transform a vector $a$ expressed in Frame $A_1$ to Frame ${A_N}$:

$$v_{A_N\text{\textunderscore}a} = T_{A_N\text{\textunderscore}A_1} v_{A_1\text{\textunderscore}a}$$

and then we can expand the $T_{A_N\text{\textunderscore}A_1}$ above.

You can easily see that the adjacent coordinate frames on both sides of each multiplication operator are always the same, hence the name “chain rule”.

Many times, when I got confused by complicated transforms in some documentation or code, I used this simple chain rule to dissect the situation. The chain rule also helps developers implement and debug transforms in code, because it becomes very easy to visually validate the order of coordinate frames pairwise.
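To illustrate how the naming convention carries over to code, here is a minimal sketch with a hand-rolled 2D rigid transform. The `Transform2D` and `Point2D` types, the helper names, and the world/car/camera frames with their numbers are all hypothetical, chosen only to show that variables named `T_dest_src` make the chain easy to check by eye.

```cpp
#include <cmath>

// Minimal 2D rigid transform (rotation + translation), for illustration only.
struct Transform2D {
  double c, s;    // cosine and sine of the rotation angle
  double tx, ty;  // translation
};

struct Point2D {
  double x, y;
};

Transform2D MakeTransform(double angle_rad, double tx, double ty) {
  return {std::cos(angle_rad), std::sin(angle_rad), tx, ty};
}

// Composition: (a * b) applies b first, then a, like matrix multiplication.
Transform2D operator*(const Transform2D& a, const Transform2D& b) {
  return {a.c * b.c - a.s * b.s,
          a.s * b.c + a.c * b.s,
          a.c * b.tx - a.s * b.ty + a.tx,
          a.s * b.tx + a.c * b.ty + a.ty};
}

// Applies the transform to a point.
Point2D operator*(const Transform2D& t, const Point2D& p) {
  return {t.c * p.x - t.s * p.y + t.tx, t.s * p.x + t.c * p.y + t.ty};
}

// Hypothetical frames: the camera is mounted 1 unit ahead of the car origin,
// and the car sits at (3, 4) in the world, rotated by 90 degrees.
Point2D CameraPointInWorld(const Point2D& p_camera) {
  const double kHalfPi = std::acos(0.0);
  const Transform2D T_world_car = MakeTransform(kHalfPi, 3.0, 4.0);
  const Transform2D T_car_camera = MakeTransform(0.0, 1.0, 0.0);
  // The inner frame names match: world_[car * car]_camera.
  const Transform2D T_world_camera = T_world_car * T_car_camera;
  return T_world_camera * p_camera;
}
```

With this naming, a product such as `T_world_car * T_camera_car` stands out as suspicious during review, because the adjacent frame names do not match.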

Example

Let’s see how this notation helps with problem-solving in an example. Nowadays low-cost inertial measurement units (IMUs) are widely used to track the motion of movable platforms (e.g., drones, cars, vacuum cleaners, VR/AR headsets, etc.). The tracking result, or odometry, is in the IMU frame at first. But the consumer of the odometry is usually in another frame, such as the platform base. Hence the ubiquitous need to transform odometry among different frames.

The odometry poses in the IMU body frame relate the IMU body frame to the IMU world frame (usually the IMU world frame is the frame of the first IMU pose after the platform boots). Here we write them as $T_{imuBody\text{\textunderscore}imuWorld}$, which in the notation above takes a point from the IMU world frame to the IMU body frame (that is, the inverse of the pose $T_{imuWorld\text{\textunderscore}imuBody}$ as defined earlier). Similarly, we can write the odometry poses in the platform base frame as $T_{base\text{\textunderscore}world}$, with the world frame being the frame of the first platform base odometry pose. Our job now is to get the latter from the former.

Applying the chain rule, we can expand the odometry transform as:

$$T_{base\text{\textunderscore}world} = T_{base\text{\textunderscore}imuBody} T_{imuBody\text{\textunderscore}imuWorld} T_{imuWorld\text{\textunderscore}world}$$

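On the right-hand side of the equation above, we see two new items: $T_{base\text{\textunderscore}imuBody}$, the transform from the IMU body frame to the platform base frame, and $T_{imuWorld\text{\textunderscore}world}$, the transform from the world frame to the IMU world frame. Neither depends on the motion of the platform. When the IMU is rigidly mounted, both are fixed by how the IMU sits on the platform.

Therefore, once the mounting is known, the equation is sufficient to transform the odometry from the IMU frame to the platform base frame.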
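The last step can be made concrete with a short derivation, under the assumptions that the IMU is rigidly mounted (so $T_{base\text{\textunderscore}imuBody}$ is a constant extrinsic) and that both world frames are fixed at the same boot instant. At that instant the base frame coincides with the world frame and the IMU body frame coincides with the IMU world frame, so

$$T_{imuWorld\text{\textunderscore}world} = T_{imuBody\text{\textunderscore}base} = T_{base\text{\textunderscore}imuBody}^{-1}$$

Substituting this into the chain expresses the conversion in terms of a single extrinsic:

$$T_{base\text{\textunderscore}world} = T_{base\text{\textunderscore}imuBody} \, T_{imuBody\text{\textunderscore}imuWorld} \, T_{base\text{\textunderscore}imuBody}^{-1}$$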

How It Got Confusing

(To from, or not to from: that is the question…)

The notation might seem neither special nor confusing, but it is not what everyone uses. For example, some people find it more intuitive to define $T_{A\text{\textunderscore}B}$ as the transform of “A to B” instead of “A from B”, which leads to the divide between the “to notation” and the “from notation” (this post is in the “from” camp). People even understand “to” and “from” differently: the “transform of A from B” in this post may actually mean the “transform of A to B” in someone else's eyes. If you need to deal with all of this when reading some sparsely documented (read “undocumented”) code, things can get ugly quickly 😓.

Meanwhile, I have also seen people summarize rules such as the one below (ref):

If the rotations / translations are performed in relation to the current, newly defined (or changed) coordinate system, the newly added transformation matrices need to be multiplicatively appended on the right-hand side. If all of them are performed in relation to the fixed reference coordinate system, the transformation matrices need to be multiplicatively appended on the left-hand side.

Rules like these are not wrong. They just look like derived rules to me. With the notation in this article, I don’t have to remember extra rules.

Baking It In The Code

Although there is nothing wrong with rolling your own API to implement the various kinds of transforms, I recommend using Eigen::Transform from the Eigen library when possible, as I see most people do. Its API is easy to use, and it is widely used and well tested.

Conclusion