Gradient rules

A cheat sheet for differentiating expressions with the \( \nabla \) operator, i.e. computing gradients of various functions.

\( \newcommand{\dotprod}[2]{ \langle #1 \cdot #2 \rangle } \)

Usual operations

Sum rule, product rule, division rule, and scalar rule, with \(f\) and \(g\) both scalar functions:
\(f: \mathbb R^n \rightarrow \mathbb R\)
\(g: \mathbb R^n \rightarrow \mathbb R\)

$$ \begin{array}{lcl} \nabla [ f + g ] & = &\nabla f + \nabla g \\ \nabla [ f . g ] & = & \nabla f . g + f . \nabla g \\ \nabla \left [ \frac{f}{g} \right ] & = &\frac{\nabla f . g - f . \nabla g}{g^2} \\ \nabla [ \alpha . f ] & = & \alpha . \nabla f \end{array} $$
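These identities are easy to sanity-check numerically. Below is a minimal sketch (assuming NumPy is available) comparing the product and division rules against central finite differences; the helper `num_grad` and the test functions `f`, `g` are made up for the illustration.

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

# Two arbitrary scalar functions R^3 -> R used for the check.
f = lambda x: np.sin(x[0]) * x[1] + x[2] ** 2
g = lambda x: np.exp(x[0] - x[2]) + x[1]

x = np.array([0.3, -1.2, 0.7])
gf, gg = num_grad(f, x), num_grad(g, x)

# Product rule: grad[f.g] = grad(f).g + f.grad(g)
lhs = num_grad(lambda y: f(y) * g(y), x)
print(np.allclose(lhs, gf * g(x) + f(x) * gg, atol=1e-4))   # True

# Division rule: grad[f/g] = (grad(f).g - f.grad(g)) / g^2
lhs = num_grad(lambda y: f(y) / g(y), x)
print(np.allclose(lhs, (gf * g(x) - f(x) * gg) / g(x) ** 2, atol=1e-4))   # True
```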

Gradient of the norm

The norm is a scalar function, \( \| \|: \mathbb R^n \rightarrow \mathbb R\)
\( \vec x \) is a vector of dimension \(n\): \( \{x_1, \cdots, x_n\} \)

$$\nabla \| \vec {x} \| = \frac{ \vec {x}}{ \| \vec {x} \|} $$ If we re-write the norm as \( \| \| = norm() \), it may be more legible to some readers: $$ \nabla \, norm(x_1, \cdots, x_n) = \frac{ \{x_1, \cdots, x_n\} } { norm(x_1, \cdots, x_n) } $$

Related: gradient of the squared norm (i.e. the dot product of the vector with itself).
The squared norm is a scalar function, \( \| \|^2: \mathbb R^n \rightarrow \mathbb R\)
$$\nabla \Big [ \| \vec {x} \|^2 \Big ] = \nabla \dotprod{\vec x }{ \vec x} = 2 . \vec {x} $$
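A quick numerical check of both formulas (same hypothetical finite-difference helper `num_grad`, repeated so the snippet stands alone):

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

x = np.array([1.0, -2.0, 0.5])

# grad ||x|| = x / ||x||
print(np.allclose(num_grad(np.linalg.norm, x), x / np.linalg.norm(x), atol=1e-5))

# grad ||x||^2 = 2.x
print(np.allclose(num_grad(lambda y: y @ y, x), 2 * x, atol=1e-4))
```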

Not to be confused with the case where we compose with a vector-valued function:
\( \| \vec u(t) \|^2: \mathbb R \rightarrow\mathbb R^n \rightarrow \mathbb R\)
\(\Big[\| \vec u(t) \|^2\Big]' = \Big[\dotprod{\vec u}{\vec u}\Big]' = 2 \dotprod{\vec u}{\vec u'}\)

Gradient of matrix expressions

With \( M \) a \( n \times n \) matrix:

$$\nabla [  M \vec {x} ] = M $$
$$\nabla [ \vec{x}^T M ] = M $$

Likewise for rigid transformations (rotations \( M \in \mathbb R^{3 \times 3}\) and translations \(\vec t \in \mathbb R^3\)):

$$\nabla [  M \vec {x} + \vec t ] = M $$

If, and only if, \( M \) is symmetric (for a general matrix, \( \nabla [ \vec{x}^T M \vec x ] = (M + M^\mathsf{T})\vec x \)):

$$\nabla [ \vec{x}^T M \vec x ] = 2.M\vec{x}$$
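If in doubt, these matrix rules can also be checked numerically; the matrices below and the `num_grad` helper are arbitrary choices for the illustration.

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))      # arbitrary matrix
S = M + M.T                          # symmetric matrix
x = rng.standard_normal(3)

# Jacobian of x -> M x is M (checked column by column)
num_jac = np.column_stack([(M @ (x + 1e-6 * e) - M @ (x - 1e-6 * e)) / 2e-6
                           for e in np.eye(3)])
print(np.allclose(num_jac, M, atol=1e-4))

# grad [x^T S x] = 2.S x   (S symmetric)
print(np.allclose(num_grad(lambda y: y @ S @ y, x), 2 * S @ x, atol=1e-4))

# general (non-symmetric) M:  grad [x^T M x] = (M + M^T) x
print(np.allclose(num_grad(lambda y: y @ M @ y, x), (M + M.T) @ x, atol=1e-4))
```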

Chain rules

With \(s: \mathbb R \rightarrow \mathbb R\) univariate and \(f: \mathbb R^n \rightarrow \mathbb R\) a multivariate real-valued function, the operation boils down to a uniform scaling of the gradient:

$$ \nabla \left [ s( f(\vec {x}) ) \right ] = s'( f(\vec {x}) ) \nabla f(\vec {x}) $$
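A minimal numerical check of this rule, with arbitrary choices of \(s\) and \(f\) (made up for the illustration):

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

f = lambda x: x @ x              # f: R^n -> R, with grad f(x) = 2x
s, s_prime = np.sin, np.cos      # s: R -> R and its derivative

x = np.array([0.4, -0.3, 1.1])

# grad [s(f(x))] = s'(f(x)) . grad f(x)
lhs = num_grad(lambda y: s(f(y)), x)
print(np.allclose(lhs, s_prime(f(x)) * (2 * x), atol=1e-4))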

Similarly to the above, consider \(s: \mathbb R^n \rightarrow \mathbb R\) and plug a scalar function \(f_i: \mathbb R^n \rightarrow \mathbb R\) into each of its parameters. The gradient is now a weighted sum of the gradients \( \nabla f_i \):

$$ \begin{array}{lcl}
\nabla \left [  s \left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) \right ] & = & \frac{ \partial s}{\partial x_1}\left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_1(\vec {p}) + \cdots +  \frac{ \partial s}{\partial x_n}\left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_n(\vec {p}) \\
\text{(or expressed as a matrix product)} & = & \left [  \nabla f_1(\vec {p}), \cdots,  \nabla f_n(\vec {p}) \right ] . \nabla s\left( f_1(\vec p), \cdots, f_n(\vec p)\right)
\end{array}
$$
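A small NumPy sketch of this weighted-sum / matrix-product form, with two made-up inner functions \(f_1, f_2\) over \(\mathbb R^3\) and an arbitrary outer function \(s\):

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

# Outer function s: R^2 -> R and two inner scalar functions f1, f2: R^3 -> R.
s  = lambda a: a[0] * np.sin(a[1])
f1 = lambda p: p @ p
f2 = lambda p: p[0] - 2.0 * p[2]

p = np.array([0.2, 0.7, -0.5])

# grad [s(f1(p), f2(p))] = [grad f1, grad f2] . grad s evaluated at (f1(p), f2(p))
lhs = num_grad(lambda q: s(np.array([f1(q), f2(q)])), p)
inner = np.column_stack([num_grad(f1, p), num_grad(f2, p)])      # 3x2 matrix of gradients
outer = num_grad(s, np.array([f1(p), f2(p)]))                    # gradient of s, a 2-vector
print(np.allclose(lhs, inner @ outer, atol=1e-4))
```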

With \(m: \mathbb R^n \rightarrow \mathbb R^n\) a deformation map and \(f: \mathbb R^n \rightarrow \mathbb R\) a multivariate scalar function, the operation boils down to transforming the gradient with a matrix: $$ \nabla \left [ f( m(\vec {x}) ) \right ] = \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} \nabla f(m(\vec {x}))$$

Where \(  \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} \) denotes the transpose of the \(n \times n \) Jacobian matrix of \(m\) evaluated at \(\vec x\).
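A quick numerical check of this Jacobian-transpose rule; the deformation map `m`, the scalar field `f`, and the `num_grad`/`num_jac` helpers below are all made up for the illustration.

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

def num_jac(m, x, eps=1e-6):
    # Central-difference Jacobian of a vector-valued map m at x (one column per partial).
    return np.column_stack([(m(x + eps * e) - m(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

# Toy deformation map m: R^3 -> R^3 and scalar field f: R^3 -> R.
m = lambda x: np.array([np.sin(x[0]) + x[1], x[1] * x[2], x[0] - x[2] ** 2])
f = lambda x: x @ x

x = np.array([0.3, -0.8, 0.6])

# grad [f(m(x))] = J[m(x)]^T . grad f(m(x))
lhs = num_grad(lambda y: f(m(y)), x)
print(np.allclose(lhs, num_jac(m, x).T @ num_grad(f, m(x)), atol=1e-4))
```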

Related Jacobian rules

The Jacobian of an affine function is the identity matrix \( I \) scaled by the slope \( a \): $$ J[a \vec {x} + \vec c]  =  a.I $$
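And a short numerical check of that Jacobian, with arbitrary values for \(a\), \(\vec c\) and \(\vec x\):

```python
import numpy as np

def num_jac(m, x, eps=1e-6):
    # Central-difference Jacobian of a vector-valued map m at x.
    return np.column_stack([(m(x + eps * e) - m(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

a, c = 2.5, np.array([1.0, -3.0, 0.2])
x = np.array([0.1, 0.4, -0.9])

# J[a x + c] = a.I  (the constant offset c vanishes under differentiation)
print(np.allclose(num_jac(lambda y: a * y + c, x), a * np.eye(3), atol=1e-4))
```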

Vector-valued function differentiation

Vector-valued functions or parametric functions have the following form: \(\vec g(t) = \{g_1(t), \cdots, g_n(t) \}^T \)

Chain rule

Differentiating with respect to a single variable can still lead to the use of \( \nabla \):

\(f: \mathbb R^n \rightarrow \mathbb R\) multivariable scalar function
\(\vec g: \mathbb R \rightarrow \mathbb R^n\) a parametric function with components \(\vec g(x) = \{g_1, \cdots, g_n \}^T \):

$$ \begin{array}{lcl} \left [ f(\vec g(x)) \right ] ' & = & \dotprod{\nabla f( \vec g(x)) }{ \vec g'(x)} \\ \text{(otherwise said)} & = & \frac{ \partial f(\vec {g}) }{\partial x_1} . \frac{ \partial  g_1(x) }{\partial x} + \cdots + \frac{ \partial f(\vec {g}) }{\partial x_n} . \frac{ \partial g_n(x) }{\partial x} \end{array} $$

In short, we take the dot product between the gradient of \( f \) and the velocity of \( \vec g \).
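For example, the following small NumPy sketch compares both sides of the rule for an arbitrary field `f` and curve `g` (both made up for the illustration):

```python
import numpy as np

def num_grad(h, x, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar function h at x.
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

# Scalar field f: R^3 -> R and parametric curve g: R -> R^3 with its derivative.
f       = lambda p: p[0] * p[1] + p[2] ** 2
g       = lambda t: np.array([np.cos(t), np.sin(t), t])
g_prime = lambda t: np.array([-np.sin(t), np.cos(t), 1.0])

t, eps = 0.7, 1e-6

# [f(g(t))]' = < grad f(g(t)) . g'(t) >
lhs = (f(g(t + eps)) - f(g(t - eps))) / (2 * eps)
print(np.isclose(lhs, num_grad(f, g(t)) @ g_prime(t), atol=1e-4))
```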

Example: differentiate the norm of the position vector

Consider \(\| \vec g(t) \|: \mathbb R \rightarrow \mathbb R^n \rightarrow \mathbb R\) where:

\(\| \|: \mathbb R^n \rightarrow \mathbb R\) is a multivariable scalar function
\(\vec g: \mathbb R \rightarrow \mathbb R^n\) is a parametric function,

$$ \Big [\| \vec g(t) \| \Big ]' = \dotprod{ \nabla \| \vec g(t) \| }{ \vec g'(t) } = \dotprod{ \frac{\vec g(t)}{ \| \vec g(t) \| } }{ \vec g'(t) } $$
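A quick numerical check of this example, with a made-up trajectory `g`:

```python
import numpy as np

# Toy trajectory g: R -> R^3 and its hand-computed derivative.
g       = lambda t: np.array([t ** 2, np.sin(t), 1.0 + t])
g_prime = lambda t: np.array([2.0 * t, np.cos(t), 1.0])

t, eps = 1.3, 1e-6

# [ ||g(t)|| ]' = < g(t)/||g(t)|| . g'(t) >
lhs = (np.linalg.norm(g(t + eps)) - np.linalg.norm(g(t - eps))) / (2 * eps)
rhs = (g(t) / np.linalg.norm(g(t))) @ g_prime(t)
print(np.isclose(lhs, rhs, atol=1e-4))
```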

Usual operations

Sum rule, product rule, division rule, and scalar rule, but also the dot product rule and the cross product rule.
Two vector-valued (\( \cong \) parametric) functions:
\(\vec u: \mathbb R \rightarrow \mathbb R^n\)
\(\vec v: \mathbb R \rightarrow \mathbb R^n\)

$$ \newcommand{annotation}[1]{ ~~~~~~\raise 0.8em {\color{grey}\scriptsize{ \text{#1} }} } \begin{array}{lcl} (\vec u + \vec v)' & = & \vec u' + \vec v' & \annotation{sum rule (component wise)} \\ (\vec u / \vec v)' & = & \frac{\vec u' . \vec v - \vec u . \vec v' }{ \vec v^2 } & \annotation{division rule (component wise)} \\ (\alpha . \vec u)' & = & \alpha \vec u' & \annotation{scalar rule} \\ \dotprod{ \vec u }{ \vec v }' & = & \dotprod{\vec u' }{ \vec v} + \dotprod{\vec v' }{ \vec u } & \annotation{dot product rule} \\ \langle \vec u \times \vec v \rangle' & = & \vec u' \times \vec v + \vec u \times \vec v' & \annotation{cross product rule (warning: non-commutative!) } \\ \end{array} $$
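The dot and cross product rules can be checked numerically in the same spirit; `u`, `v` below are arbitrary toy curves with hand-computed derivatives:

```python
import numpy as np

# Two toy parametric curves u, v: R -> R^3 with hand-computed derivatives.
u, up = (lambda t: np.array([np.cos(t), np.sin(t), t]),
         lambda t: np.array([-np.sin(t), np.cos(t), 1.0]))
v, vp = (lambda t: np.array([t ** 2, 1.0, np.exp(t)]),
         lambda t: np.array([2.0 * t, 0.0, np.exp(t)]))

t, eps = 0.4, 1e-6

# Dot product rule: <u.v>' = <u'.v> + <v'.u>
lhs = (u(t + eps) @ v(t + eps) - u(t - eps) @ v(t - eps)) / (2 * eps)
print(np.isclose(lhs, up(t) @ v(t) + vp(t) @ u(t), atol=1e-4))

# Cross product rule: (u x v)' = u' x v + u x v'  (order matters!)
lhs = (np.cross(u(t + eps), v(t + eps)) - np.cross(u(t - eps), v(t - eps))) / (2 * eps)
print(np.allclose(lhs, np.cross(up(t), v(t)) + np.cross(u(t), vp(t)), atol=1e-4))
```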

With \(f: \mathbb R \rightarrow \mathbb R\), the usual univariate chain rule holds:

$$ \newcommand{annotation}[1]{ ~~~~~~\raise 0.8em {\color{grey}\scriptsize{ #1 }} } \Big [\vec u \big (f(x) \big ) \Big]' = \vec u'\big(f(x)\big) . f'(x) $$

Division by the scalar function \(f\) (applied component-wise):

$$ \left [ {\vec u(t) } / {f(t)} \right ]' = (\vec u' . f - f' . \vec u) / f^2 $$
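Same kind of check for this rule, with arbitrary choices of `u` and `f`:

```python
import numpy as np

# Curve u: R -> R^3, scalar function f: R -> R, and their derivatives.
u, up = (lambda t: np.array([t, t ** 2, np.sin(t)]),
         lambda t: np.array([1.0, 2.0 * t, np.cos(t)]))
f, fp = np.cosh, np.sinh

t, eps = 0.9, 1e-6

# [u(t)/f(t)]' = (u'.f - f'.u) / f^2
lhs = (u(t + eps) / f(t + eps) - u(t - eps) / f(t - eps)) / (2 * eps)
print(np.allclose(lhs, (up(t) * f(t) - fp(t) * u(t)) / f(t) ** 2, atol=1e-4))
```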

Property: constant norm is equivalent to an orthogonal derivative

If the norm of the vector-valued function \(\vec u\) is constant, then its derivative \(\vec u'\) is perpendicular to it, and the converse is also true:

$$ \| \vec u(t) \| = c \quad \Leftrightarrow \quad \vec u \perp \vec u' \annotation{c \in \mathbb R } $$

Constant norm also implies that \(\vec u\) stays on a circle (a sphere in higher dimensions) centered at the origin; however, \(\vec u\) describing a circle does not imply the above (e.g. a circle not centered at the origin).


When the norm is constant, the squared norm is constant as well, and can be written as a dot product: $$ \left \{ \begin{matrix} \| \vec u(t) \| & = & c \\ \| \vec u(t) \|^2 & = & c \\ \dotprod{\vec u(t) }{ \vec u(t) } & = & c \\ \end{matrix} \right . \quad \Leftrightarrow \quad \left . \begin{matrix} \vec u & \perp & \vec u{\color{purple}'} \\ \dotprod{ \vec u(t) }{ \vec u{\color{purple}'}(t) } & = & 0 \\ \end{matrix} \right . $$

Differentiate the dot product of \(\vec u(t)\) with itself: $$ \begin{aligned} \Big( \vec u(t) . \vec u(t) \Big)' & = \Big(c \Big )' \\ \vec u'(t) . \vec u(t) + \vec u(t) . \vec u'(t) & = 0 \annotation{\text{apply dot product rule}} \\ 2(\vec u' . \vec u) & = 0 \\ \vec u' . \vec u & = 0 \\ 😀\\ \end{aligned} $$ A null dot product \(\vec u' . \vec u\) means \(\vec u'\) and \(\vec u\) are perpendicular: \( \vec u \perp \vec u' \). In physics, it is well known that constant speed \(\| \vec v \| = c\) implies an orthogonal acceleration \(\vec v'\)! In that case, only the direction of the velocity can change, not its magnitude. On the other hand, the magnitude of the acceleration affects how fast the velocity's orientation changes.
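To see the property in action, here is a tiny sketch with a curve of constant norm (a tilted circle of radius 2 centered at the origin, an arbitrary choice): the norm stays constant and the dot product with the derivative stays zero.

```python
import numpy as np

# A curve with constant norm: a tilted circle of radius 2 centered at the origin.
u, up = (lambda t: 2.0 * np.array([np.cos(t), 0.6 * np.sin(t), 0.8 * np.sin(t)]),
         lambda t: 2.0 * np.array([-np.sin(t), 0.6 * np.cos(t), 0.8 * np.cos(t)]))

for t in np.linspace(0.0, 6.0, 5):
    # ||u(t)|| stays equal to 2, and <u(t) . u'(t)> stays (numerically) equal to 0.
    print(round(np.linalg.norm(u(t)), 6), round(u(t) @ up(t), 6))
```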

Some links

Wikipedia:
- Gradient as a derivative
- Vector Calculus identities
- Vector algebra relations
Others:
- Multivariate / multivariable chain rule
- Vector-valued functions derivatives
- List of uni-variate differentiation rules
