Matrix Differentiation Methods

1. Total differential

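  • When $X$ is a matrix and $y$ is a scalar, $dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$

  • When $X$ is a (column) vector and $y$ is a scalar, $dy = \left(\frac{\partial y}{\partial X}\right)^T dX = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$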
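
The trace form is a compact way of writing the familiar elementwise sum $\sum_{i,j} \frac{\partial y}{\partial X_{ij}} dX_{ij}$. The following is a minimal NumPy sketch of that equivalence. The arrays `G` and `dX` are random placeholders standing in for $\frac{\partial y}{\partial X}$ and $dX$; they are illustrative assumptions, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 4))    # placeholder for dy/dX (hypothetical values)
dX = rng.standard_normal((3, 4))   # placeholder for a small perturbation dX

# tr(G^T dX) sums the diagonal of G^T dX ...
trace_form = np.trace(G.T @ dX)
# ... which is the same as summing G_ij * dX_ij over all entries.
elementwise_form = np.sum(G * dX)

print(np.isclose(trace_form, elementwise_form))
```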

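2. Making good use of the trace (tr)

(1) If $a$ is a scalar, then $a = \mathrm{tr}(a)$

(2) If $A$ is $m \times n$ and $B$ is $n \times m$ (in particular, if both are square matrices of the same size), then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$

(3) For a square matrix $A$, $\mathrm{tr}(A) = \mathrm{tr}(A^T)$

(4) For square matrices $A$, $B$ of the same size, $\mathrm{tr}(A+B) = \mathrm{tr}(A)+\mathrm{tr}(B)$

(5) The differential commutes with the transpose: $d(X^T) = (dX)^T$

These identities are used in the derivations below.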
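
As a quick sanity check, identities (2) to (4) can be verified numerically. The sketch below uses NumPy with random matrices; the names `A`, `B`, `C`, `D` and the shapes are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # m x n
B = rng.standard_normal((4, 3))   # n x m, so both AB and BA are defined
C = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))

# (2) tr(AB) = tr(BA), even though AB is 3x3 and BA is 4x4
cyclic_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
# (3) tr(C) = tr(C^T)
transpose_ok = np.isclose(np.trace(C), np.trace(C.T))
# (4) tr(C + D) = tr(C) + tr(D)
linear_ok = np.isclose(np.trace(C + D), np.trace(C) + np.trace(D))

print(cyclic_ok, transpose_ok, linear_ok)
```

A numerical check like this does not prove the identities, but it catches misremembered ones quickly.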

3. Matrix differentiation examples

The following examples use the identities above to compute matrix derivatives:


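Example 1:

Let $X = (x_1,\dots,x_n)^T$ be a vector and $A$ an $n \times n$ matrix that does not depend on $X$. Given $y = X^TAX$, find $\frac{\partial y}{\partial X}$.

Total differential (product rule, with $A$ constant):
$dy = d(X^T)AX + X^TA(dX)$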

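By identity (1), since $dy$ is a scalar:

$dy = \mathrm{tr}\left(d(X^T)AX + X^TA(dX)\right)$

By identity (4):

$dy = \mathrm{tr}\left(d(X^T)AX\right)+\mathrm{tr}\left(X^TA(dX)\right)$

By identity (5), and then identity (3) applied to the first term:

$dy = \mathrm{tr}\left((dX)^TAX\right)+\mathrm{tr}\left(X^TA(dX)\right)= \mathrm{tr}\left(X^TA^TdX\right)+\mathrm{tr}\left(X^TA(dX)\right)$

By identity (4) again:

$dy = \mathrm{tr}\left(X^TA^T(dX) + X^TA(dX)\right)=\mathrm{tr}\left(X^T(A^T+A)dX\right)$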

Comparing with $dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$:

$\Rightarrow \left(\frac{\partial y}{\partial X}\right)^T = X^T(A^T+A)$

$\Rightarrow \frac{\partial y}{\partial X} = \left(X^T(A^T+A)\right)^T= (A+A^T)X$
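
The result $(A+A^T)X$ can be checked against a finite-difference gradient. The following is a minimal sketch, assuming NumPy; the helper names `quadratic_form` and `numerical_gradient`, the step size `eps`, and the random test data are all illustrative choices rather than part of the derivation.

```python
import numpy as np

def quadratic_form(x, A):
    """y = x^T A x for a 1-D vector x and a square matrix A."""
    return x @ A @ x

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately not symmetric
x = rng.standard_normal(n)

analytic = (A + A.T) @ x          # result derived above
numeric = numerical_gradient(lambda v: quadratic_form(v, A), x)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Using a non-symmetric `A` matters here: for symmetric `A` the gradient reduces to $2AX$, which would hide a mistake such as dropping the $A^T$ term.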


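Example 2:

Given $y = \mathrm{tr}(AB)$, where $B$ does not depend on $A$, find $\frac{\partial y}{\partial A}$.

Total differential:
$dy = \mathrm{tr}[(dA)B]$

By identity (2):
$dy = \mathrm{tr}[B\ dA]$

Comparing with $dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial A}\right)^T dA\right)$:

$\Rightarrow \left(\frac{\partial y}{\partial A}\right)^T = B$

$\Rightarrow \frac{\partial y}{\partial A} =B^T$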
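
The same kind of finite-difference check works for a matrix argument. The sketch below assumes NumPy; the helper `numerical_gradient_matrix` and the shapes of `A` and `B` are hypothetical choices for illustration. A non-square `A` is used so that a missing transpose shows up as a shape mismatch.

```python
import numpy as np

def numerical_gradient_matrix(f, A, eps=1e-6):
    """Central-difference gradient of scalar f with respect to each entry of A."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # m x n
B = rng.standard_normal((5, 3))   # n x m, held fixed

numeric = numerical_gradient_matrix(lambda M: np.trace(M @ B), A)
analytic = B.T                    # result derived above, same shape as A

print(np.allclose(numeric, analytic, atol=1e-6))
```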