Spatial transforms
This article covers ways to interpret homogeneous transforms while recommending notational practices.
Published: 2021-09-06
Homogeneous Matrices
In robot kinematics, coordinate frames are commonly used to keep track of the position and orientation of entities in space. A coordinate frame $\{O, (\hat{i}, \hat{j}, \hat{k})\}$ in 3D Euclidean space is commonly defined by an origin $O$ and an orthonormal basis $(\hat{i}, \hat{j}, \hat{k})$.
With two frames $\mathsf{F}$ and $\mathsf{F}'$ defined, one can measure the position and orientation of one frame relative to the other, or one can translate and rotate one frame relative to the other. Either task requires a representation of translation and rotation, for which a variety of formalisms exist.
A common representation in forward kinematics is the homogeneous matrix. Recall the familiar homogeneous transformation matrix $\mathrm{T}$, $$ \mathrm{T} = \begin{bmatrix} \mathrm{R} & \mathbf{d} \\ \mathbf{0}& 1 \end{bmatrix} = \begin{bmatrix} r_{xx'} & r_{xy'} & r_{xz'} & d_x \\ r_{yx'} & r_{yy'} & r_{yz'} & d_y \\ r_{zx'} & r_{zy'} & r_{zz'} & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ The matrix $\mathrm{T}$ measures the state of frame $\mathsf{F}' = \{o', (\hat{x}', \hat{y}', \hat{z}')\}$ relative to frame $\mathsf{F} = \{o, (\hat{x}, \hat{y}, \hat{z})\}$. Here $\mathsf{F}'$ is the measured frame and $\mathsf{F}$ is the reference frame. Furthermore:
- $\mathrm{R}$ and $\mathbf{d}$ are the rotation and translation components, respectively.
- $\mathrm{T} \in \mathsf{SE}(3)$ and $\mathrm{R} \in \mathsf{SO}(3)$, denoting the special Euclidean and special orthogonal groups respectively.
- $r_{uv} = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}} = \cos\measuredangle(\hat{\mathbf{u}}, \hat{\mathbf{v}})$, where $\hat{\mathbf{u}}$ is a basis vector of $\mathsf{F}$ and $\hat{\mathbf{v}}$ a basis vector of $\mathsf{F}'$, is a direction cosine: it quantifies the relative orientation of the two unit vectors through the oriented (typically anticlockwise) angle $\measuredangle(\hat{\mathbf{u}}, \hat{\mathbf{v}})$ measured from $\hat{\mathbf{u}}$ to $\hat{\mathbf{v}}$.
- $d_u = \overrightarrow{oo'} \cdot \hat{\mathbf{u}}$ is the component, along $\hat{\mathbf{u}}$, of the displacement from origin $o$ to origin $o'$.
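To make these entries concrete, here is a minimal sketch in plain Python that assembles $\mathrm{T}$ directly from the definitions above: each $r_{uv}$ as a dot product of basis vectors and each $d_u$ as a projection of $\overrightarrow{oo'}$. It assumes both frames are given in one common ambient coordinate system. The helper names `dot` and `homogeneous_from_frame`, and the example frame (rotated 90° about $\hat{z}$, origin at $(1, 2, 0)$), are illustrative choices, not part of the original text.

```python
import math

def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def homogeneous_from_frame(o, basis, o_p, basis_p):
    """Build T measuring frame F' = (o_p, basis_p) relative to F = (o, basis).

    All inputs are 3-vectors expressed in one common ambient coordinate system.
    Entries follow the definitions: r_uv = u . v and d_u = (o' - o) . u.
    """
    oo_p = [b - a for a, b in zip(o, o_p)]      # displacement from o to o'
    T = [[0.0] * 4 for _ in range(4)]
    for i, u in enumerate(basis):                # rows: x, y, z of F
        for j, v in enumerate(basis_p):          # columns: x', y', z' of F'
            T[i][j] = dot(u, v)                  # direction cosine r_uv
        T[i][3] = dot(oo_p, u)                   # translation component d_u
    T[3][3] = 1.0
    return T

# Reference frame F: coincides with the ambient coordinates.
o, basis = [0.0, 0.0, 0.0], [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Measured frame F': rotated 90 degrees about z, origin at (1, 2, 0).
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
o_p = [1.0, 2.0, 0.0]
basis_p = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

T = homogeneous_from_frame(o, basis, o_p, basis_p)
for row in T:
    print([round(x, 6) for x in row])
```

Reading the result column by column, the first three columns of $\mathrm{T}$ are the basis vectors of $\mathsf{F}'$ expressed in $\mathsf{F}$, and the last column is $o'$ expressed in $\mathsf{F}$.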
Interpretations
A general way to express the transform is $\mathrm{T}: x \mapsto y$, or $y = \mathrm{T} x$. The object $x$ can typically be a point $p$, a vector $v$, or a frame $\mathsf{F}$.
There are two obvious ways to interpret the transformation $\mathrm{T}: x \mapsto y$:
- The scene is unchanged, but the object is measured from a different reference frame.
- The scene has changed: the object has been moved within the same reference frame.
The notation above hides this nuance, so each case is written below with a more explicit notation.
Case 1: Change of Basis
$${}^\mathsf{target}x_0 = {}^\mathsf{target}\mathrm{T}_\mathsf{source} \; {}^\mathsf{source}x_0$$
The object $x_0$ is unchanged. It was measured in the frame $\mathsf{source}$, and $\mathrm{T}$ now expresses it in the frame $\mathsf{target}$.
Case 2: State Operator
$${}^\mathsf{reference}x_1 = {}^\mathsf{reference}\mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^\mathsf{reference}x_0$$
The object $x_0$ has been changed into the object $x_1$ by $\mathrm{T}$, which rotates it by $\mathrm{R}$ and translates it by $\mathbf{d}$, all within the same frame $\mathsf{reference}$.
Therefore, $\mathrm{T}$ either converts an object’s representation $({}^{\mathsf{S}}\Box_0 \to {}^{\mathsf{T}}\Box_0)$ or modifies its state $({}^{\mathsf{R}}\Box_0 \to {}^{\mathsf{R}}\Box_1)$, where $\mathsf{S}$, $\mathsf{T}$ and $\mathsf{R}$ abbreviate $\mathsf{source}$, $\mathsf{target}$ and $\mathsf{reference}$.
These two interpretations are further discussed in the next sections.
Coordinate Transformation
Transform the coordinates of the same object $x_0$ from a $\mathsf{source}$ frame to a $\mathsf{target}$ frame.
- This measures the same object $x$ from a different location ($\mathsf{target}$), using a known location ($\mathsf{source}$).
- This is a change of basis that re-expresses the coordinates of $x$ in a new frame: no objects were moved and no new objects were created.
- In other words, the state of the workspace has not been modified. It is just expressed differently.
Example: $^{\mathsf{A}}p = {}^{\mathsf{A}} \mathrm{T}_\mathsf{B} {}^{\mathsf{B}}p$
- The object $x$ here is a point $p$.
- There is one point $p$ and two frames $\mathsf{A}$ and $\mathsf{B}$.
- Transforms the coordinates of the unchanged point $p$ from frame $\mathsf{B}$ to frame $\mathsf{A}$.
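The change-of-basis reading can be checked numerically. The sketch below, in plain Python, assumes an arbitrary ${}^{\mathsf{A}}\mathrm{T}_\mathsf{B}$ (frame $\mathsf{B}$ rotated 90° about $\hat{z}$, origin at $(1, 2, 0)$ in $\mathsf{A}$). The helpers `matvec` and `inverse` are illustrative; `inverse` uses the closed form $\begin{bmatrix}\mathrm{R}^\top & -\mathrm{R}^\top\mathbf{d}\\ \mathbf{0} & 1\end{bmatrix}$ valid for rigid transforms. The point itself never changes; only the frame its coordinates refer to.

```python
def matvec(T, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (homogeneous coordinate w = 1)."""
    ph = list(p) + [1.0]
    return [sum(T[i][k] * ph[k] for k in range(4)) for i in range(3)]

def inverse(T):
    """Closed-form inverse of a rigid transform: [R^T, -R^T d; 0, 1]."""
    R = [row[:3] for row in T[:3]]
    d = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * d[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

# Pose of frame B measured in frame A: 90 degrees about z, origin at (1, 2, 0).
A_T_B = [[0.0, -1.0, 0.0, 1.0],
         [1.0,  0.0, 0.0, 2.0],
         [0.0,  0.0, 1.0, 0.0],
         [0.0,  0.0, 0.0, 1.0]]

B_p = [1.0, 0.0, 0.0]                     # the point p, measured in B
A_p = matvec(A_T_B, B_p)                  # the same point p, measured in A
B_p_again = matvec(inverse(A_T_B), A_p)   # back to B via B_T_A = inverse(A_T_B)
print(A_p, B_p_again)
```

Here ${}^{\mathsf{A}}p = (1, 3, 0)$: one unit along $\mathsf{B}$'s $\hat{x}$ axis, which points along $\mathsf{A}$'s $\hat{y}$ axis, starting from $\mathsf{B}$'s origin at $(1, 2, 0)$.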
State Operator
Create a new state $\Box_1$ from an input state $\Box_0$, both measured in the same frame $\mathsf{reference}$.
- This means either modifying the state of an existing object, or creating a new object using an old one.
- This is an operator that actively changes the scene by moving or creating objects.
- The state of the workspace has been modified, by mutating existing objects or introducing new ones.
For a more programmatic illustration, consider this pseudocode:
```
# Create new object
object2 ← transform(object1);
# Modify an existing object
object1 ← transform(object1);
```
- Here `object2` is assigned a state representing a transformed `object1`.
- The function `transform` is assumed to read `object1`'s state by copy, modify that copy, and return it; it neither modifies `object1` in place nor deletes it.
- Here `object1`'s state is updated, i.e., it is translated and/or rotated.
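A runnable Python version of this pseudocode might look as follows. It assumes the object is a 3D point stored as a tuple, and `transform` is a hypothetical state operator that rotates by a fixed angle about $\hat{z}$ and then translates; the default angle and offset are arbitrary. Because tuples are immutable, `transform` necessarily returns a new state instead of editing its input, which matches the copy semantics assumed above.

```python
import math

def transform(p, angle=math.pi / 2, d=(1.0, 2.0, 0.0)):
    """Hypothetical state operator: rotate point p about z by `angle`, then translate by d.

    Returns a new tuple; the input is never modified in place.
    """
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + d[0], s * x + c * y + d[1], z + d[2])

object1 = (1.0, 0.0, 0.0)

# Create new object: object1 keeps its state, object2 holds a transformed copy.
object2 = transform(object1)

# Modify an existing object: the name object1 now refers to the transformed state.
old_state = object1
object1 = transform(object1)
print(old_state, object1, object2)
```

After both steps, `object1` and `object2` hold the same state, while the original state survives only because it was kept in `old_state`.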
Example: ${}^{\mathsf{A}}v'=\mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^{\mathsf{A}}v$
- There is one frame $\mathsf{A}$ and two vectors $v$ and $v'$.
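- This changes the vector $v$ into a new vector $v'$ in the same frame $\mathsf{A}$, using rotation $\mathrm{R}$ and translation $\mathbf{d}$. If $v$ is a free (direction) vector, written with homogeneous coordinate $0$, only $\mathrm{R}$ affects it; $\mathbf{d}$ acts only on points and position vectors.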
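In homogeneous coordinates, a point is padded with $w = 1$ and a free vector with $w = 0$, so the same $\mathrm{T}_{\mathrm{R}, \mathbf{d}}$ treats them differently. A minimal Python sketch, with an arbitrarily chosen $\mathrm{R}$ (90° about $\hat{z}$) and $\mathbf{d} = (1, 2, 0)$; the helper `apply` is illustrative.

```python
def apply(T, xyz, w):
    """Multiply 4x4 matrix T by the homogeneous column [x, y, z, w]; return the xyz part."""
    h = list(xyz) + [w]
    return [sum(T[i][k] * h[k] for k in range(4)) for i in range(3)]

# T_{R,d}: rotation of 90 degrees about z and translation d = (1, 2, 0), both in frame A.
T_Rd = [[0.0, -1.0, 0.0, 1.0],
        [1.0,  0.0, 0.0, 2.0],
        [0.0,  0.0, 1.0, 0.0],
        [0.0,  0.0, 0.0, 1.0]]

A_p = [1.0, 0.0, 0.0]   # a point in A
A_v = [1.0, 0.0, 0.0]   # a free vector in A (same numbers, different meaning)

A_p_new = apply(T_Rd, A_p, 1.0)  # point, w = 1: rotated and translated -> [1.0, 3.0, 0.0]
A_v_new = apply(T_Rd, A_v, 0.0)  # free vector, w = 0: rotated only   -> [0.0, 1.0, 0.0]
```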
Example: ${}^{\mathsf{A}} \mathrm{T}_\mathsf{B'} = \mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^{\mathsf{A}} \mathrm{T}_\mathsf{B}$
- There are three frames $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{B'}$.
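- This encodes the pose of the transformed output frame $\mathsf{B'}$, measured from the same frame $\mathsf{A}$. Because $\mathrm{T}_{\mathrm{R}, \mathbf{d}}$ multiplies from the left, its rotation and translation are expressed in $\mathsf{A}$.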
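Moving a frame can be checked the same way. The sketch below (plain Python, illustrative helpers `matmul` and `apply`, arbitrarily chosen poses) computes ${}^{\mathsf{A}}\mathrm{T}_\mathsf{B'} = \mathrm{T}_{\mathrm{R}, \mathbf{d}} \, {}^{\mathsf{A}}\mathrm{T}_\mathsf{B}$ and confirms that a point rigidly attached to $\mathsf{B}$ ends up exactly where $\mathrm{T}_{\mathrm{R}, \mathbf{d}}$ sends it in $\mathsf{A}$.

```python
def matmul(X, Y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(T, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (w = 1)."""
    h = list(p) + [1.0]
    return [sum(T[i][k] * h[k] for k in range(4)) for i in range(3)]

# Initial pose of frame B in A: pure translation by (0, 0, 5).
A_T_B = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 5.0],
         [0.0, 0.0, 0.0, 1.0]]
# Operator T_{R,d} expressed in A: rotate 90 degrees about A's z axis, then translate by (1, 2, 0).
T_Rd = [[0.0, -1.0, 0.0, 1.0],
        [1.0,  0.0, 0.0, 2.0],
        [0.0,  0.0, 1.0, 0.0],
        [0.0,  0.0, 0.0, 1.0]]

A_T_Bp = matmul(T_Rd, A_T_B)     # pose of the moved frame B' in A

B_p = [1.0, 0.0, 0.0]            # a point fixed to B (same coordinates in B')
via_frame = apply(A_T_Bp, B_p)   # move the frame, then read the point in A
via_point = apply(T_Rd, apply(A_T_B, B_p))  # read the point in A, then move it
print(via_frame, via_point)
```

Both routes give $(1, 3, 5)$, which is the state-operator reading of the equation: moving the frame is the same as moving every point attached to it.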
Tasteful Notation
Notation often implies the preferred interpretation. Consider this example:
$${}^0\mathrm{T}_n = \prod_{k=0}^{n-1}{}^{k}\mathrm{T}_{k+1} = {}^\mathrm{0}\mathrm{T}_1 {}^\mathrm{1}\mathrm{T}_2 \cdots {}^{n-2}\mathrm{T}_{n-1} {}^{n-1}\mathrm{T}_n$$
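There are $n+1$ different frames (numbered $0$ to $n$), and each factor ${}^{k}\mathrm{T}_{k+1}$ re-expresses coordinates from frame $k+1$ in frame $k$, so the same scene is simply read from one frame to the next.
Equivalently, the same product can be written with the following notation: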
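$${}^0\mathrm{F}_n = {}^{0}\mathrm{T}_1 \, {}^{0}\mathrm{T}_2 \cdots {}^{0}\mathrm{T}_{n-1} \, {}^{0}\mathrm{T}_n \, {}^0\mathrm{F}_0$$
Here there is one reference frame (numbered $0$), and the frame $\mathrm{F}_0$, initially coincident with it (${}^0\mathrm{F}_0 = \mathrm{I}$), is moved $n$ times into its final pose $\mathrm{F}_n$. Each factor ${}^{0}\mathrm{T}_k$ is now read as a state operator acting in frame $0$, applied from right to left, with the same numerical value as ${}^{k-1}\mathrm{T}_k$ in the previous equation.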
Both equations are perfectly equivalent; only the notation has changed. It is important to remember that the interpretations are conceptually useful, but mathematically equivalent.
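The equivalence of the two readings comes down to associativity of matrix multiplication, which a few lines of Python can illustrate. The three link transforms below are arbitrary examples built by a hypothetical helper `rot_z`. Multiplying left to right chains changes of basis $0 \leftarrow 1 \leftarrow 2 \leftarrow 3$; applying the same matrices right to left to ${}^0\mathrm{F}_0 = \mathrm{I}$ moves a single frame within frame $0$.

```python
import math

def matmul(X, Y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rot_z(theta, d):
    """Homogeneous transform: rotation by theta about z, translation d."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, d[0]],
            [s, c, 0.0, d[1]],
            [0.0, 0.0, 1.0, d[2]],
            [0.0, 0.0, 0.0, 1.0]]

I4 = [[float(i == j) for j in range(4)] for i in range(4)]

# Arbitrary link transforms {k-1}T_k for k = 1, 2, 3 (so n = 3).
links = [rot_z(0.3, (1.0, 0.0, 0.0)),
         rot_z(-0.7, (0.0, 2.0, 0.5)),
         rot_z(1.1, (0.4, 0.0, 0.0))]

# Reading 1: change of basis, 0T3 = 0T1 1T2 2T3, multiplied left to right.
T_chain = I4
for T in links:
    T_chain = matmul(T_chain, T)

# Reading 2: state operators in frame 0, applied right to left to F0 = I.
F = I4
for T in reversed(links):
    F = matmul(T, F)

max_diff = max(abs(a - b) for ra, rb in zip(T_chain, F) for a, b in zip(ra, rb))
print(max_diff)  # agrees up to floating-point rounding
```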