Reading Spatial Transforms
Last updated: Oct 7, 2023
This article covers different ways to interpret homogeneous spatial transforms and provides appropriate notational practices for each.
Homogeneous Matrices
In robot kinematics, coordinate frames are commonly used to keep track of the position and orientation of entities in space. A coordinate frame $\{O, (\hat{i}, \hat{j}, \hat{k})\}$ in 3D Euclidean space is commonly defined by an origin $O$ and orthonormal basis vectors $(\hat{i}, \hat{j}, \hat{k})$.
With two frames $\mathsf{F}$ and $\mathsf{F}'$ defined, one can measure the position and orientation of one frame relative to the other, or one can translate and rotate one frame relative to the other. Either task requires a representation of translation and rotation, for which several formalisms exist.
A common representation used in forward kinematics is the homogeneous matrix. Recall the familiar homogeneous transformation matrix $\mathrm{T}$, $$ \mathrm{T} = \begin{bmatrix} \mathrm{R} & \mathbf{d} \\ \mathbf{0}& 1 \end{bmatrix} = \begin{bmatrix} r_{xx'} & r_{xy'} & r_{xz'} & d_x \\ r_{yx'} & r_{yy'} & r_{yz'} & d_y \\ r_{zx'} & r_{zy'} & r_{zz'} & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ The matrix $\mathrm{T}$ measures the state of frame $\mathsf{F}' = \{o', (\hat{x}', \hat{y}', \hat{z}')\}$ relative to frame $\mathsf{F} = \{o, (\hat{x}, \hat{y}, \hat{z})\}$. Here $\mathsf{F}'$ is the measured frame and $\mathsf{F}$ is the reference frame. Furthermore:
- $\mathrm{R}$ and $\mathbf{d}$ are the rotation and translation components, respectively.
- $\mathrm{T} \in \mathsf{SE}(3)$ and $\mathrm{R} \in \mathsf{SO}(3)$, denoting the special Euclidean and special orthogonal groups respectively.
- $r_{uv} = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}} = \cos \measuredangle(\hat{\mathbf{u}}, \hat{\mathbf{v}})$, with $\hat{\mathbf{u}}$ a basis vector of $\mathsf{F}$ and $\hat{\mathbf{v}}$ a basis vector of $\mathsf{F}'$, quantifies the relative orientation between the basis unit vectors through the oriented (typically, anticlockwise) angle $\measuredangle(\hat{\mathbf{u}}, \hat{\mathbf{v}})$ at $\hat{\mathbf{v}}$ measured from $\hat{\mathbf{u}}$.
- $d_u = \overrightarrow{oo'} \cdot \hat{\mathbf{u}}$ is the displacement between the two origins, at $o'$ measured from $o$, projected onto the direction of $\hat{\mathbf{u}}$.
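As a minimal sketch of the last two definitions, the entries of $\mathrm{T}$ can be assembled directly from dot products. The helper name `homogeneous_from_frames` and the sample frames are illustrative assumptions, and both frames are assumed to be given in one common set of coordinates.

```python
def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def homogeneous_from_frames(basis, origin, basis_p, origin_p):
    """Build T describing frame F' (basis_p, origin_p) relative to F (basis, origin).

    Hypothetical helper: every input is expressed in the same coordinates.
    """
    # Displacement from o to o'
    offset = [b - a for a, b in zip(origin, origin_p)]
    rows = []
    for u in basis:
        # r_uv = u . v for each basis vector v of F', then d_u = (o -> o') . u
        rows.append([dot(u, v) for v in basis_p] + [dot(offset, u)])
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


# F is the standard frame; F' is rotated 90 degrees about z and shifted by (1, 2, 0).
F_basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
Fp_basis = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
T = homogeneous_from_frames(F_basis, (0.0, 0.0, 0.0), Fp_basis, (1.0, 2.0, 0.0))
```

The columns of the rotation block are the basis vectors of $\mathsf{F}'$ expressed in $\mathsf{F}$, and the last column holds the origin $o'$ expressed in $\mathsf{F}$.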
Interpretations
A general way to express the transform is $\mathrm{T}: x \mapsto y$ or $y = \mathrm{T} x$. The object $x$ is typically a point $p$, a vector $v$, or a frame $\mathsf{F}$.
There are two obvious ways to interpret the transformation $\mathrm{T}: x \mapsto y$.
- The scene is unchanged, but the object is measured from a different reference.
- The scene has changed: the object has been moved within the same reference.
The notation above hides this nuance, so a different notation is written below for each case.
Case 1: Change of Basis
$${}^\mathsf{target}x_0 = {}^\mathsf{target}\mathrm{T}_\mathsf{source} \; {}^\mathsf{source}x_0$$
The object $x_0$ is unchanged. It was measured in the frame $\mathsf{source}$ and now it is measured in the frame $\mathsf{target}$ by using $\mathrm{T}$.
Case 2: State Operator
$${}^\mathsf{reference}x_1 = {}^\mathsf{reference}\mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^\mathsf{reference}x_0$$
The object $x_0$ has been changed into the object $x_1$ by $\mathrm{T}$: it is rotated by $\mathrm{R}$ and translated by $\mathbf{d}$, all in the same frame $\mathsf{reference}$.
Therefore, $\mathrm{T}$ either converts an object’s representation $({}^{\mathsf{S}}\Box_0 \to {}^{\mathsf{T}}\Box_0)$ or modifies its state $({}^{\mathsf{R}}\Box_0 \to {}^{\mathsf{R}}\Box_1)$.
These two interpretations are further discussed in the next sections.
Coordinate Transformation
Transform the coordinates of the same object $x_0$ from a $\mathsf{source}$ frame to a $\mathsf{target}$ frame.
- This measures the same object $x_0$ from a different location ($\mathsf{target}$), using a known location ($\mathsf{source}$).
- This is a change of basis that re-expresses the coordinates of $x_0$ in a new frame; no objects were moved and no new objects were created.
- In other words, the state of the workspace has not been modified. It is just expressed differently.
Example: $^{\mathsf{A}}p = {}^{\mathsf{A}} \mathrm{T}_\mathsf{B} {}^{\mathsf{B}}p$
- The object $x$ here is a point $p$.
- There is one point $p$ and two frames $\mathsf{A}$ and $\mathsf{B}$.
- Transforms the coordinates of the unchanged point $p$ from frame $\mathsf{B}$ to frame $\mathsf{A}$.
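A small numerical sketch of this example follows. The matrix values and the helper names `apply` and `inverse` are assumptions chosen for illustration: frame $\mathsf{B}$ is taken to sit at $(1, 2, 0)$ in $\mathsf{A}$, rotated 90 degrees about $z$.

```python
def apply(T, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (homogeneous coordinate w = 1)."""
    ph = list(p) + [1.0]
    out = [sum(T[i][j] * ph[j] for j in range(4)) for i in range(4)]
    return out[:3]


def inverse(T):
    """Invert a rigid transform using [R d]^-1 = [R^T, -R^T d]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    d = [T[i][3] for i in range(3)]
    t = [-sum(Rt[i][j] * d[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Pose of frame B measured in frame A (assumed values).
A_T_B = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

B_p = [1.0, 0.0, 0.0]                    # the point p, measured in B
A_p = apply(A_T_B, B_p)                  # the same point p, measured in A
B_p_again = apply(inverse(A_T_B), A_p)   # converting back recovers the B coordinates
```

The point itself never moves: `A_p` and `B_p` are two coordinate descriptions of the same $p$, which is why the round trip through the inverse returns the original coordinates.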
State Operator
Create a new state $\Box_1$ from an input state $\Box_0$, both measured in the same frame $\mathsf{reference}$.
- This means either modifying the state of an existing object, or creating a new object using an old one.
- This is an operator, which is actively changing the scene, by moving or creating objects.
- The state of the workspace has been modified, by mutating existing objects or introducing new ones.
For a more programmatic illustration, consider this pseudocode:
# Create new object
object2 ← transform(object1);
# Modify an existing object
object1 ← transform(object1);
- Here `object2` is assigned a state representing a transformed `object1`.
- The function `transform` is assumed to read `object1`'s state by copy, modify it, and then return it; it does not modify it in place, nor delete it.
- Here `object1`'s state is updated, i.e., it is translated and/or rotated.
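The pseudocode above could be written in Python roughly as follows. The `Point` class and the body of `transform` (a shift of one unit along $x$) are hypothetical stand-ins; the only property that matters is that `transform` returns a new state instead of mutating its input.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    """Immutable state of an object, measured in one fixed reference frame."""
    x: float
    y: float
    z: float


def transform(obj: Point) -> Point:
    """Hypothetical state operator: translate by (1, 0, 0) and return a new Point."""
    return replace(obj, x=obj.x + 1.0)


object1 = Point(0.0, 0.0, 0.0)

# Create new object: object1 is left untouched.
object2 = transform(object1)

# Modify an existing object: the name object1 is rebound to the updated state.
original = object1
object1 = transform(object1)
```

Freezing the dataclass makes the copy-then-return assumption explicit: any attempt to modify a `Point` in place raises an error.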
Example: ${}^{\mathsf{A}}v'={}^{\mathsf{A}}\mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^{\mathsf{A}}v$
- There is one frame $\mathsf{A}$ and two vectors $v$ and $v'$.
- This rotates the vector $v$ by $\mathrm{R}$ and translates it by $\mathbf{d}$, producing a new vector $v'$ in the same frame $\mathsf{A}$.
Example: ${}^{\mathsf{A}} \mathrm{T}_\mathsf{B'} = {}^{\mathsf{A}}\mathrm{T}_{\mathrm{R}, \mathbf{d}} \; {}^{\mathsf{A}} \mathrm{T}_\mathsf{B}$
- There are three frames $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{B'}$.
- This encodes the pose of a transformed output frame $\mathsf{B'}$ measured from the same frame $\mathsf{A}$.
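The frame example can be checked numerically with a short sketch. The values are assumptions: $\mathsf{B}$ is placed one unit along the $x$ axis of $\mathsf{A}$, and the operator is a pure 90 degree rotation about the $z$ axis of $\mathsf{A}$ with $\mathbf{d} = \mathbf{0}$. The helper `matmul` is illustrative.

```python
def matmul(X, Y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Pose of frame B measured in A: no rotation, origin at (1, 0, 0).
A_T_B = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# State operator expressed in A: rotate 90 degrees about z, no translation.
T_op = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Left-multiplication moves B into its new pose B', still measured in A.
A_T_Bp = matmul(T_op, A_T_B)
origin_Bp = [A_T_Bp[i][3] for i in range(3)]
```

With these values the origin of $\mathsf{B'}$ lands at $(0, 1, 0)$ in $\mathsf{A}$: the operator swings the frame around the $z$ axis of $\mathsf{A}$, while $\mathsf{A}$ itself stays fixed.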
Tasteful Notation
Notation often implies the preferred interpretation. Consider this example,
$${}^0\mathrm{T}_n = \prod_{k=0}^{n-1}{}^{k}\mathrm{T}_{k+1} = {}^\mathrm{0}\mathrm{T}_1 {}^\mathrm{1}\mathrm{T}_2 \cdots {}^{n-2}\mathrm{T}_{n-1} {}^{n-1}\mathrm{T}_n$$
There are $n$ different reference frames (numbered $0$ to $n-1$), and the pose of the same frame $n$ is simply re-read from one reference to the next.
Equivalently, rewrite this with the following notation,
$${}^0\mathrm{F}_1 = {}^\mathrm{0}\mathrm{T}_1 {}^\mathrm{0}\mathrm{T}_2 \cdots {}^{0}\mathrm{T}_{n-1} {}^0\mathrm{F}_0$$
Here there is one reference frame (numbered $0$) and the frame $\mathrm{F}_0$ is moved $n-1$ times into its final pose $\mathrm{F}_1$.
Both equations are equivalent; only the notation has changed. It is important to remember that the two interpretations are conceptually distinct and useful, but mathematically equivalent.
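The equivalence can be illustrated with two matrices. This is only a sketch under assumed values: `T1` and `T2` stand for two consecutive transforms, and the second reading is taken to mean applying the same matrices as operators in the fixed frame $0$, rightmost first, starting from a frame coincident with frame $0$.

```python
from functools import reduce


def matmul(X, Y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Assumed values: each matrix is a 90 degree rotation about z plus a translation.
T1 = [[0.0, -1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
T2 = [[0.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 2.0],
      [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Reading 1: a chain of changes of basis, multiplied left to right.
chain = reduce(matmul, [T1, T2])

# Reading 2: one fixed reference; a frame starting at the identity pose is
# moved by each operator in turn, rightmost matrix first.
frame = IDENTITY
for T in reversed([T1, T2]):
    frame = matmul(T, frame)

same_result = chain == frame
```

Both readings produce the same matrix because matrix multiplication is associative; the difference lies only in how each factor is read.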