Gradient rules

Multivariate calculus - differentiate expressions with nabla/gradient operator. - 01/2015 - #Math

Cheat sheet to differentiate expressions with the $ \nabla $ operator to compute gradients of various functions.

$$ \newcommand{\Annotation}[1]{\mathrlap{~~~~~~\raisebox{0.8em}{$\scriptstyle\htmlClass{annotation-text}{\begin{array}{l}#1\end{array}}$}}} \newcommand{\annotation}[1]{{~~~~~~\raisebox{0.8em}{$\scriptstyle\htmlClass{annotation-text}{\begin{array}{l}#1\end{array}}$}}} $$

$ \newcommand{\dotprod}[2]{ \langle #1 \cdot #2 \rangle } $

Usual operations

$g: \mathbb R^n \rightarrow \mathbb R$
$f: \mathbb R^n \rightarrow \mathbb R$

$$ \begin{array}{lcl} \nabla [ f . g ] & = & \nabla f . g + f . \nabla g \\ \nabla \left [ \frac{f}{g} \right ] & = &\frac{\nabla f . g - f . \nabla g}{g^2} \\ \nabla [ \alpha . f ] & = & \alpha . \nabla f \end{array} $$
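The product and quotient rules above can be sanity-checked numerically. The sketch below is only an illustration in plain Python: the functions `f` and `g`, the evaluation point and the finite-difference helper `num_grad` are hypothetical choices made for this example, not part of any library.

```python
def num_grad(func, x, h=1e-6):
    """Central finite-difference gradient of a scalar function func: R^n -> R."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

# Two arbitrary scalar functions of two variables (illustrative assumptions).
def f(x):
    return x[0] ** 2 + 3.0 * x[1]

def g(x):
    return x[0] * x[1] + 2.0

# Their gradients, derived by hand.
def grad_f(x):
    return [2.0 * x[0], 3.0]

def grad_g(x):
    return [x[1], x[0]]

x = [1.5, -0.5]  # g(x) = 1.25, so the quotient is well defined here

# Product rule: grad(f.g) = grad(f).g + f.grad(g)
product_rule = [df * g(x) + f(x) * dg for df, dg in zip(grad_f(x), grad_g(x))]
product_numeric = num_grad(lambda p: f(p) * g(p), x)

# Quotient rule: grad(f/g) = (grad(f).g - f.grad(g)) / g^2
quotient_rule = [(df * g(x) - f(x) * dg) / g(x) ** 2
                 for df, dg in zip(grad_f(x), grad_g(x))]
quotient_numeric = num_grad(lambda p: f(p) / g(p), x)

print("product :", product_rule, product_numeric)
print("quotient:", quotient_rule, quotient_numeric)
```

Both pairs should agree up to finite-difference error, which mostly helps catch sign mistakes in the quotient rule.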

Gradient of the norm

The norm is a scalar function, $ \| \|: \mathbb R^n \rightarrow \mathbb R$
$ \vec x $ is a vector of dimension $n$: $ \{x_1, \cdots, x_n\} $

$$\nabla \| \vec {x} \| = \frac{ \vec {x}}{ \| \vec {x} \|} $$ If we rewrite the norm as $ \| \| = norm() $ it may be more legible to some people: $$ \nabla norm(x_1, \cdots, x_n) = \frac{ \{x_1, \cdots, x_n\} } { norm(x_1, \cdots, x_n) } $$

Related: gradient of the squared norm (i.e. vector dot product).
The squared norm is a scalar function, $ \| \|^2: \mathbb R^n \rightarrow \mathbb R$
$$\nabla \Big [ \| \vec {x} \|^2 \Big ] = \nabla \dotprod{\vec x }{ \vec x} = 2 . \vec {x} $$
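Both results can be recovered component by component. The short derivation below assumes the Euclidean norm, and $ \vec x \neq \vec 0 $ for the first line, since the norm is not differentiable at the origin:

$$ \begin{array}{lcl} \frac{\partial}{\partial x_i} \| \vec x \| & = & \frac{\partial}{\partial x_i} \sqrt{x_1^2 + \cdots + x_n^2} = \frac{2 . x_i}{2 \sqrt{x_1^2 + \cdots + x_n^2}} = \frac{x_i}{\| \vec x \|} \\ \frac{\partial}{\partial x_i} \| \vec x \|^2 & = & \frac{\partial}{\partial x_i} \left( x_1^2 + \cdots + x_n^2 \right) = 2 . x_i \end{array} $$

Stacking the $n$ partial derivatives into a vector gives $ \vec x / \| \vec x \| $ and $ 2 . \vec x $ respectively.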

Not to be confused with the case where we compose with a vector-valued function:
$ \| \vec u(t) \|^2: \mathbb R \rightarrow\mathbb R^n \rightarrow \mathbb R$
$\Big[\| \vec u(t) \|^2\Big]_t = \Big[\dotprod{\vec u}{\vec u}\Big]_t = 2 \dotprod{\vec u}{\vec u'}$

Gradient of a matrix

With $ M $ a $ n \times n $ matrix:

$$\nabla [ M \vec {x} ] = M $$

$\nabla [ \vec{x}^T M ] = M$

Or similarly with a vector $\vec v \in \mathbb R^n$

$$ \nabla \vec v^T \vec x = \vec v^T $$

Likewise for rigid transformations (rotations $ M \in \mathbb R^{3 \times 3}$ and translations $\vec t \in \mathbb R^3$):

$$\nabla [ M \vec {x} + \vec t ] = M $$

For any square matrix $M$:

$\nabla [ \vec{x}^T M \vec x ] = (M + M^T) \vec x$

If, and only if, $M$ is symmetric, this simplifies to:

$\nabla [ \vec{x}^T M \vec x ] = 2.M\vec{x}$
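A quick numerical check of the quadratic-form gradient, as a minimal sketch in plain Python. The matrix `M` and the point `x` are arbitrary illustrative values, and `num_grad`, `mat_vec`, `transpose` and `quad_form` are hypothetical helpers written for this example. `M` is deliberately non-symmetric so that $(M + M^T) \vec x$ and $2.M\vec x$ can be told apart.

```python
def num_grad(func, x, h=1e-6):
    """Central finite-difference gradient of a scalar function func: R^n -> R."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

def mat_vec(M, x):
    """Matrix-vector product M x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def quad_form(M, x):
    """Quadratic form x^T M x."""
    return sum(x_i * y_i for x_i, y_i in zip(x, mat_vec(M, x)))

M = [[2.0, 1.0],
     [-3.0, 4.0]]  # non-symmetric on purpose (illustrative values)
x = [1.0, 2.0]

numeric = num_grad(lambda p: quad_form(M, p), x)

# (M + M^T) x, the expression that does not rely on symmetry
general = [a + b for a, b in zip(mat_vec(M, x), mat_vec(transpose(M), x))]

# 2 M x, only expected to match when M is symmetric
symmetric_only = [2.0 * a for a in mat_vec(M, x)]

print("numeric       :", numeric)
print("(M + M^T) x   :", general)
print("2 M x         :", symmetric_only)
```

With this `M`, the finite-difference gradient should match `general` but not `symmetric_only`.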

Chain rules

With $s: \mathbb R \rightarrow \mathbb R$ univariate and $f: \mathbb R^n \rightarrow \mathbb R$ multivariate and real-valued, the operation boils down to a uniform scaling of the gradient:

$$ \nabla \left [ s( f(\vec {x}) ) \right ] = s'( f(\vec {x}) ) \nabla f(\vec {x}) $$
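As a small worked example (the choice of $s$ and $f$ is only an illustration), take $ s(y) = e^{-y} $, so $ s'(y) = -e^{-y} $, and $ f(\vec x) = \| \vec x \|^2 $, whose gradient is $ 2 . \vec x $:

$$ \nabla \left [ e^{- \| \vec x \|^2} \right ] = - e^{- \| \vec x \|^2} . \nabla \Big [ \| \vec x \|^2 \Big ] = -2 . e^{- \| \vec x \|^2} . \vec x $$

The gradient of $f$ keeps its direction and is only rescaled by the scalar $ s'(f(\vec x)) $ (here a negative factor, so it is flipped).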

Similarly, consider $s: \mathbb R^n \rightarrow \mathbb R$ and plug into each of its parameters a scalar function $f_i: \mathbb R^n \rightarrow \mathbb R$, with $i = 1, \cdots, n$. The gradient is now a weighted sum of the $ \nabla f_i $, where each partial derivative of $s$ is evaluated at $ \left( f_1(\vec p), \cdots, f_n(\vec p) \right) $:

$$ \begin{array}{lcl} \nabla \left [ s \left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) \right ] & = & \frac{ \partial s }{\partial x_1} \left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_1(\vec {p}) + \cdots + \frac{ \partial s }{\partial x_n} \left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_n(\vec {p}) \\ \text{(or expressed as a dot product)} & = & \left [ \nabla f_1(\vec {p}), \cdots, \nabla f_n(\vec {p}) \right ] . \nabla s( f_1(\vec p), \cdots, f_n(\vec p)) \end{array} $$

With $m: \mathbb R^n \rightarrow \mathbb R^n$ a deformation map and $f: \mathbb R^n \rightarrow \mathbb R$ a multivariate scalar function, the operation boils down to transforming the gradient with a matrix: $$ \nabla \left [ f( m(\vec {x}) ) \right ] = \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} \nabla f(m(\vec {x}))$$

Where $ \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} $ denotes the transpose of the $n \times n $ Jacobian matrix.
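The rule with the transposed Jacobian can be checked numerically. The sketch below is a plain-Python illustration under stated assumptions: the deformation map `m`, the scalar function `f`, the evaluation point and the helper `num_grad` are all hypothetical choices made for this example.

```python
import math

def num_grad(func, x, h=1e-6):
    """Central finite-difference gradient of a scalar function func: R^n -> R."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

# Hypothetical deformation map m: R^2 -> R^2
def m(x):
    return [x[0] * x[1], x[0] + math.sin(x[1])]

def jacobian_m(x):
    # J[i][j] = d m_i / d x_j, derived by hand from m above
    return [[x[1], x[0]],
            [1.0, math.cos(x[1])]]

# Hypothetical scalar function f: R^2 -> R and its hand-derived gradient
def f(y):
    return y[0] ** 2 + 3.0 * y[1]

def grad_f(y):
    return [2.0 * y[0], 3.0]

x = [0.7, -1.2]
J = jacobian_m(x)
gf = grad_f(m(x))  # gradient of f evaluated at the deformed point m(x)

# J^T . grad f(m(x)): entry j sums J[i][j] * gf[i] over the rows i
chain_rule = [sum(J[i][j] * gf[i] for i in range(2)) for j in range(2)]

numeric = num_grad(lambda p: f(m(p)), x)

print("J^T grad f(m(x)):", chain_rule)
print("finite difference:", numeric)
```

Note that the gradient of $f$ is evaluated at $m(\vec x)$, not at $\vec x$, and that the Jacobian is applied transposed.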

Related Jacobian rules

The Jacobian of a linear function with scalar slope $ a $ is the identity matrix $ I $ times $ a $: $$ J[a \vec {x} + c] = a.I $$

Vector-valued function differentiation

Vector-valued functions or parametric functions have the following form: $\vec g(t) = \{g_1(t), \cdots, g_n(t) \}^T $

Chain rule

A univariate differentiation can lead to the use of $ \nabla $:

$f: \mathbb R^n \rightarrow \mathbb R$ multivariable scalar function
$\vec g: \mathbb R \rightarrow \mathbb R^n$ a parametric function with components $\vec g(x) = \{g_1, \cdots, g_n \}^T $:

$$ \begin{array}{lcl} \left [ f(\vec g(x)) \right ] ' & = & \langle\nabla f( \vec g(x))^T . \vec {g'(x)}\rangle \\ \text{(otherwise said)} & = & \frac{ \partial f(\vec {g}) }{\partial x_1} . \frac{ \partial g_1(x) }{\partial x} + \cdots + \frac{ \partial f(\vec {g}) }{\partial x_n} . \frac{ \partial g_n(x) }{\partial x} \end{array} $$

In short, we take the dot product between the gradient of $ f $ and the velocity of $ g $.

Example: differentiate the norm of the position vector

Consider $\| g(t) \|: \mathbb R \rightarrow \mathbb R^n \rightarrow \mathbb R$, where:

$\| \|: \mathbb R^n \rightarrow \mathbb R$ is a multivariable scalar function
$\vec g: \mathbb R \rightarrow \mathbb R^n$ is a parametric function,

\[ \Big [\| g(t) \| \Big ]' = \nabla \| g(t) \| . g'(t) = \frac{g(t)}{ \| g(t) \| } \ . \ g'(t) \]
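As a concrete instance (the curve is an arbitrary choice for illustration), take the parabola $ g(t) = \{t, t^2\}^T $ with $ t > 0 $, so that $ g'(t) = \{1, 2t\}^T $ and $ \| g(t) \| = \sqrt{t^2 + t^4} = t \sqrt{1 + t^2} $:

$$ \Big [\| g(t) \| \Big ]' = \frac{ \{t, t^2\}^T }{ t \sqrt{1 + t^2} } \ . \ \{1, 2t\}^T = \frac{t + 2t^3}{t \sqrt{1 + t^2}} = \frac{1 + 2t^2}{\sqrt{1 + t^2}} $$

Differentiating $ t \sqrt{1 + t^2} $ directly with the univariate product rule gives $ \sqrt{1 + t^2} + \frac{t^2}{\sqrt{1 + t^2}} $, which is the same expression.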

Usual operations

Sum rule, product rule, division rule, scalar rule, but also dot product rule and cross product rule, for two vector-valued ($ \cong $ parametric) functions:
$\vec u: \mathbb R \rightarrow \mathbb R^n$
$\vec v: \mathbb R \rightarrow \mathbb R^n$

$$ \begin{array}{lcll} (\vec u + \vec v)' & = & \vec u' + \vec v' & \annotation{\text{sum rule (component wise)}} \\ (\vec u / \vec v)' & = & \frac{\vec u' . \vec v - \vec v' . \vec u }{ \vec v^2 } & \annotation{\text{division rule (component wise)}} \\ (\alpha . \vec u)' & = & \alpha \vec u' & \annotation{\text{scalar rule}} \\ \dotprod{ \vec u }{ \vec v }' & = & \dotprod{\vec u' }{ \vec v} + \dotprod{\vec v' }{ \vec u } & \annotation{\text{dot product rule}} \\ \langle \vec u \times \vec v \rangle' & = & \vec u' \times \vec v + \vec u \times \vec v' & \annotation{\text{cross product rule (warning: non-commutative!)}} \\ \end{array} $$
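The dot product and cross product rules can be verified numerically in $\mathbb R^3$. This is a minimal plain-Python sketch: the curves `u` and `v`, their hand-written derivatives `du` and `dv`, and the helpers `dot`, `cross` and `num_diff` are hypothetical names introduced only for this illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def num_diff(curve, t, h=1e-6):
    """Central finite-difference derivative of a vector-valued function of t."""
    plus, minus = curve(t + h), curve(t - h)
    return [(p - q) / (2.0 * h) for p, q in zip(plus, minus)]

# Two arbitrary parametric curves in R^3 and their derivatives (by hand).
def u(t):
    return [t, t * t, 1.0]

def du(t):
    return [1.0, 2.0 * t, 0.0]

def v(t):
    return [math.sin(t), math.cos(t), t]

def dv(t):
    return [math.cos(t), -math.sin(t), 1.0]

t = 0.8

# Dot product rule: <u, v>' = <u', v> + <v', u>
dot_rule = dot(du(t), v(t)) + dot(dv(t), u(t))
dot_numeric = num_diff(lambda s: [dot(u(s), v(s))], t)[0]

# Cross product rule: (u x v)' = u' x v + u x v'  (operand order matters)
cross_rule = add(cross(du(t), v(t)), cross(u(t), dv(t)))
cross_numeric = num_diff(lambda s: cross(u(s), v(s)), t)

# Swapping the operands in each term flips the sign of the result.
cross_swapped = add(cross(v(t), du(t)), cross(dv(t), u(t)))

print("dot  :", dot_rule, dot_numeric)
print("cross:", cross_rule, cross_numeric)
print("swapped operands:", cross_swapped)
```

The last line illustrates the warning in the table: with the operands swapped, every component comes out with the opposite sign.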

With $f: \mathbb R \rightarrow \mathbb R$, the usual univariate chain rule holds:

$$ \Big [\vec u \big (f(x) \big ) \Big]' = \vec u'\big(f(x)\big) . f'(x) $$

Division by a scalar function $f$ (component-wise division rule):

$$ \left [ {\vec u(t) } / {f(t)} \right ]' = (\vec u' . f - f' . \vec u) / f^2 $$

Property: constant norm equates orthogonal derivative

If the norm of the vector-valued function $u$ is constant, then its derivative $u'$ is perpendicular to it and the converse is also true:

$$ \| \vec u(t) \| = c \quad \Leftrightarrow \quad \vec u \perp \vec u' \annotation{c \in \mathbb R} $$

This also implies that $u$ stays on a circle (a sphere in higher dimensions) centered at the origin. However, if $u$ describes a circle, it does not imply the above (e.g. a circle not centered at the origin).

When the norm is constant, the squared norm is constant as well, and can be written as a dot product: $$ \left \{ \begin{matrix} \| \vec u(t) \| & = & c \\ \| \vec u(t) \|^2 & = & c^2 \\ \dotprod{u(t) }{ u(t) } & = & c^2 \\ \end{matrix} \right . \quad \Leftrightarrow \quad \left . \begin{matrix} \vec u \perp \vec u{\color{purple}'} \\ \langle u(t) . u{\color{purple}'}(t) \rangle & = & 0 \\ \end{matrix} \right . $$

Differentiate the dot product of $u(t)$ with itself: $$ \begin{aligned} \Big( u(t) . u(t) \Big)' & = \Big(c^2 \Big )' \\ u'(t) . u(t) + u(t) . u'(t) & = 0 \annotation{\text{apply dot product rule}} \\ 2(u' . u) & = 0 \\ u' . u & = 0 \\ 😀\\ \end{aligned} $$ A null dot product $u' . u$ means $u'$ and $u$ are perpendicular: $ u \perp u' $. In physics, it is well known that a constant speed $\| v \| = c$ implies an acceleration $v'$ orthogonal to the velocity $v$! In that case, only the direction of the velocity can change, not its magnitude. On the other hand, the magnitude of the acceleration affects the rate at which the velocity's direction changes. This is discussed further here
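The property, and the caveat about circles not centered at the origin, can be illustrated numerically. This is a minimal plain-Python sketch under stated assumptions: the two curves `centered` and `offset`, the radius, the shift and the helper `derivative` are hypothetical choices for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def derivative(curve, t, h=1e-6):
    """Central finite-difference derivative of a vector-valued function of t."""
    plus, minus = curve(t + h), curve(t - h)
    return [(p - q) / (2.0 * h) for p, q in zip(plus, minus)]

def centered(t):
    # Circle of radius 2 centered at the origin: the norm is constant.
    return [2.0 * math.cos(t), 2.0 * math.sin(t)]

def offset(t):
    # Same circle shifted by (3, 0): the norm now varies with t.
    return [3.0 + 2.0 * math.cos(t), 2.0 * math.sin(t)]

t = 0.9

# <u, u'> for each curve
dot_centered = dot(centered(t), derivative(centered, t))
dot_offset = dot(offset(t), derivative(offset, t))

print("centered circle, <u, u'>:", dot_centered)
print("offset circle,   <u, u'>:", dot_offset)
```

For the centered circle the dot product should vanish up to finite-difference error. For the shifted one it works out by hand to $-6 \sin(t)$, which is non-zero at this $t$.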

Some links

Wikipedia:
- Gradient as a derivative
- Vector Calculus identities
- Vector algebra relations
Others:
- Multivariate / multivariable chain rule
- Vector-valued functions derivatives
- List of uni-variate differentiation rules

One comment

Your proof for "gradient of matrix when M is symmetric" seems wrong.

In your last 3 steps, you transform (x^T)(A^T) to Ax, which seems wrong; that should be (Ax)^T.
I think the mistake starts from this: LUx (row vector) + (x^T)LU (column vector)
Lam - 22/01/2026 -- 05:05

Rodolphe Vaillant's homepage
