Matrix Calculus · Calculus 03

Keywords: matrix calculus, sum rule, product rule, chain rule, logistic function, softmax function, gradient

Abstract: This article introduces the rules of matrix calculus and demonstrates how to apply them through two typical worked examples.

Matrix calculus uses matrices and vectors to organize the partial derivatives of each component of the dependent variable with respect to each component of the independent variable. The key trick is to keep track of the dimensions of each partial derivative.

Rules of operation

1. Vector $\to$ scalar: $\forall x \in \mathbb{R}^p, \forall y = f(x) \in \mathbb{R}$,

$$ \frac{\partial y}{\partial x} = \bigg[ \frac{\partial y}{\partial x_1}, ..., \frac{\partial y}{\partial x_p} \bigg]^\textrm{T} \in \mathbb{R}^p $$

2. Scalar $\to$ vector: $\forall x \in \mathbb{R}, \forall y = f(x) \in \mathbb{R}^q$,

$$ \frac{\partial y}{\partial x} = \bigg[ \frac{\partial y_1}{\partial x}, ..., \frac{\partial y_q}{\partial x} \bigg] \in \mathbb{R}^{1 \times q} $$

3. Vector $\to$ vector: $\forall x \in \mathbb{R}^p, \forall y = f(x) \in \mathbb{R}^q$,

$$ \frac{\partial y}{\partial x} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_q}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_p} & \cdots & \frac{\partial y_q}{\partial x_p} \end{matrix} \right] \in \mathbb{R}^{p \times q} $$
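This layout (denominator layout) can be checked numerically. The NumPy sketch below (function and variable names are illustrative, not from the original) estimates the Jacobian of the linear map $y = Ax$ by central differences; in denominator layout the result should be $A^\textrm{T} \in \mathbb{R}^{p \times q}$:

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Denominator-layout Jacobian: entry (i, j) = d y_j / d x_i."""
    y = f(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        # Central difference along the i-th coordinate fills row i.
        J[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # A maps R^3 -> R^2, so p = 3, q = 2
x = np.array([0.5, -1.0, 2.0])
J = jacobian_fd(lambda v: A @ v, x)      # expected: A.T, shape (p, q) = (3, 2)
```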

4. Sum rule: $\forall x \in \mathbb{R}^p, \forall y = f(x) \in \mathbb{R}^q, \forall z = g(x) \in \mathbb{R}^q$,

$$ \frac{\partial (y + z)}{\partial x} = \frac{\partial y}{\partial x} + \frac{\partial z}{\partial x} \in \mathbb{R}^{p \times q} $$

5. Product rules: let $x \in \mathbb{R}^p$.

(1) For $y = f(x) \in \mathbb{R}^q$ and $z = g(x) \in \mathbb{R}^q$,

$$ \frac{\partial y^\textrm{T} z}{\partial x} = \frac{\partial y}{\partial x} z + \frac{\partial z}{\partial x} y \in \mathbb{R}^p $$

(2) For $y = f(x) \in \mathbb{R}^q$, $z = g(x) \in \mathbb{R}^s$, and a constant matrix $A \in \mathbb{R}^{q \times s}$,

$$ \frac{\partial y^\textrm{T} A z}{\partial x} = \frac{\partial y}{\partial x} A z + \frac{\partial z}{\partial x} A^\textrm{T} y \in \mathbb{R}^p $$

(3) For a scalar $y = f(x) \in \mathbb{R}$ and $z = g(x) \in \mathbb{R}^q$,

$$ \frac{\partial y z}{\partial x} = y \frac{\partial z}{\partial x} + \frac{\partial y}{\partial x} z^\textrm{T} \in \mathbb{R}^{p \times q} $$

6. Chain rule: $\forall x \in \mathbb{R}^p, \forall y = g(x) \in \mathbb{R}^s, \forall z = f(y) \in \mathbb{R}^q$,

$$ \frac{\partial z}{\partial x} = \frac{\partial y}{\partial x} \cdot \frac{\partial z}{\partial y} \in \mathbb{R}^{p \times q} $$

7. Frequently used results (all in denominator layout, with $x \in \mathbb{R}^p$ and $A$ a constant matrix):

$$ \frac{\partial A x}{\partial x} = A^\textrm{T}, \quad \frac{\partial x^\textrm{T} A}{\partial x} = A, \quad \frac{\partial \| x \|^2}{\partial x} = 2x, \quad \frac{\partial x^\textrm{T} A x}{\partial x} = (A + A^\textrm{T}) x $$

For more identities of this kind, The Matrix Cookbook is the recommended reference.
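Such identities are easy to verify numerically. The sketch below (names are illustrative) checks $\frac{\partial x^\textrm{T} A x}{\partial x} = (A + A^\textrm{T}) x$ against a central-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a general (non-symmetric) constant matrix
x = rng.standard_normal(4)

def grad_fd(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

g_numeric = grad_fd(lambda v: v @ A @ v, x)
g_closed = (A + A.T) @ x          # closed-form result from the table above
```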

Examples

Gradient of the logistic function

$\forall x \in \mathbb{R}$, $\sigma (x) = \frac{1}{1 + e^{-x}}$. One can verify that $\sigma'(x) = \sigma(x) \big(1 - \sigma (x) \big)$. When the input is a vector $x \in \mathbb{R}^n$ and $\sigma$ is applied elementwise,

$$ \sigma' (x) = \operatorname{diag} \Big( \sigma (x) \odot \big( 1- \sigma (x) \big) \Big) \in \mathbb{R}^{n \times n} $$
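Because $\sigma$ acts elementwise, the Jacobian is diagonal. The sketch below (names are illustrative) builds the closed-form Jacobian and compares it with central differences:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 1.5])

# Closed form: diag(sigma(x) * (1 - sigma(x))).
J_closed = np.diag(sigmoid(x) * (1.0 - sigmoid(x)))

# Central-difference check, one coordinate direction per row.
eps = 1e-6
J_numeric = np.stack([
    (sigmoid(x + eps * e) - sigmoid(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
```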

Gradient of the softmax function

$\forall x = [x_1, ..., x_n]^\textrm{T} \in \mathbb{R}^n$, the output $z = [z_1, ..., z_n]^\textrm{T}$ of the softmax function is defined as:

$$ z_i = \operatorname{softmax} (x_i) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} $$
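As an aside on implementation: evaluating this formula directly can overflow for large $x_i$, so softmax is usually computed after subtracting $\max_i x_i$, which leaves the output unchanged. A minimal sketch (names are illustrative):

```python
import numpy as np

def softmax(x):
    # Subtracting max(x) does not change the result (numerator and
    # denominator are both scaled by exp(-max(x))) but avoids overflow.
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = softmax(np.array([1.0, 2.0, 3.0]))
```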

Below we derive the derivative of the softmax function.

First,

$$ z = \operatorname{softmax} (x) = \frac{1}{\sum_i e^{x_i}} [e^{x_1}, ..., e^{x_n}]^\textrm{T} = \frac{\exp (x)}{\sum_i e^{x_i}} = \frac{\exp (x)}{I_n^\textrm{T} \exp (x)} \in \mathbb{R}^n $$

where $I_n = [1, ..., 1]^\textrm{T} \in \mathbb{R}^n$ denotes the all-ones vector. Therefore

$$ \frac{\partial z}{\partial x} = \frac{\partial \Big( \frac{\exp (x)}{I_n^\textrm{T} \exp (x)} \Big)}{\partial x} = \frac{\partial \Big( \exp (x) \cdot \frac{1}{I_n^\textrm{T} \exp (x)} \Big)}{\partial x} $$

By product rule (3), we further have

$$ \begin{aligned} \frac{\partial \Big( \exp (x) \cdot \frac{1}{I_n^\textrm{T} \exp (x)} \Big)}{\partial x} &= \frac{\partial \exp (x)}{\partial x} \cdot \frac{1}{I_n^\textrm{T} \exp (x)} + \frac{\partial \Big( \frac{1}{I_n^\textrm{T} \exp (x)} \Big)}{\partial x} \exp^\textrm{T} (x) \\ &= \frac{\operatorname{diag} \Big( \exp (x) \Big)}{I_n^\textrm{T} \exp (x)} - \Bigg( \frac{1}{I_n^\textrm{T} \exp (x)} \Bigg)^2 \cdot \frac{\partial \Big( I_n^\textrm{T} \exp (x) \Big)}{\partial x} \exp^\textrm{T} (x) \end{aligned} $$

Next, by product rule (1), we have

$$ \frac{\partial \Big( I_n^\textrm{T} \exp (x) \Big)}{\partial x} = \frac{\partial \exp (x)}{\partial x} \cdot I_n + \frac{\partial I_n}{\partial x} \exp (x) = \operatorname{diag} \Big( \exp(x) \Big) I_n = \exp (x) $$

where the second term vanishes because $I_n$ is a constant vector.

Therefore

$$ \begin{aligned} \frac{\partial z}{\partial x} &= \operatorname{diag} \Bigg( \frac{\exp (x) }{I_n^\textrm{T} \exp (x)} \Bigg) - \frac{\exp (x)}{I_n^\textrm{T} \exp (x)} \cdot \frac{\exp^\textrm{T} (x)}{I_n^\textrm{T} \exp (x)} \\ &= \operatorname{diag} \Big( \operatorname{softmax} (x) \Big) - \operatorname{softmax} (x) \cdot \operatorname{softmax}^\textrm{T} (x) \end{aligned} $$
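The final result $\operatorname{diag}(\operatorname{softmax}(x)) - \operatorname{softmax}(x) \operatorname{softmax}^\textrm{T}(x)$ translates directly into code. The sketch below (names are illustrative) compares it with a central-difference Jacobian; note also that each row of the Jacobian sums to zero, since the softmax outputs always sum to one:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # stable evaluation
    return e / e.sum()

def softmax_jacobian(x):
    s = softmax(x)
    # diag(softmax(x)) - softmax(x) softmax(x)^T
    return np.diag(s) - np.outer(s, s)

x = np.array([0.2, -1.0, 3.0])
J = softmax_jacobian(x)

# Central-difference check, one coordinate direction per row.
eps = 1e-6
J_fd = np.stack([
    (softmax(x + eps * e) - softmax(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
```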

Closing remarks

At the undergraduate level, students outside mathematics-related majors are rarely taught matrix calculus. Yet matrix calculus underpins machine learning and its surrounding theory, and fluency with it is essential. This article is only a starting point; in actual research, complex problems should be analyzed case by case with The Matrix Cookbook at hand.

Reprint notice

This work is licensed under the Creative Commons Attribution 4.0 International License. When reprinting, please cite the original link. You must give appropriate credit and indicate whether any changes were made.

Designed & written by Hailiang Zhao.
hliangzhao.cn. Copyright © 2021 - 2022 | 浙ICP备2021026965号-1