1. Total differential
When $X$ is a matrix, $dy = tr(\frac{\partial y}{\partial X}^T dX)$
When $X$ is a vector, $dy = \frac{\partial y}{\partial X}^T dX=tr(\frac{\partial y}{\partial X}^T dX)$
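2. Working with the trace (tr)
(1) If $a$ is a scalar, $a = tr(a)$
(2) If $A$ is $m \times n$ and $B$ is $n \times m$ (in particular, if both are square matrices of the same size), $tr(AB) = tr(BA)$
(3) $tr(A) = tr(A^T)$
(4) $tr(A+B) = tr(A)+tr(B)$
(5) The differential commutes with transposition: $d(X^T) = (dX)^T$
These identities are used in the derivations below.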
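As a quick sanity check, identities (2), (3), and (4) can be tested numerically on random matrices. The sketch below assumes NumPy is available; the shapes, the seed, and the variable names are arbitrary choices made for illustration. `A` and `B` are deliberately non-square, because identity (2) only needs both products $AB$ and $BA$ to be defined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Identity (2): A is 3x4 and B is 4x3, so AB is 3x3 and BA is 4x4.
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
cyclic_ok = bool(np.isclose(np.trace(A @ B), np.trace(B @ A)))

# Identities (3) and (4) need square matrices of the same size.
S = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3))
transpose_ok = bool(np.isclose(np.trace(S), np.trace(S.T)))
linear_ok = bool(np.isclose(np.trace(S + T), np.trace(S) + np.trace(T)))

print(cyclic_ok, transpose_ok, linear_ok)
```

A check like this does not prove the identities, but it catches a misremembered formula quickly.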
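3. Matrix differentiation examples
The following examples use the identities above to compute matrix derivatives:
Example 1:
$X = (x_1,\dots,x_n)^T$ is a vector and $A$ is an $n \times n$ matrix that does not depend on $X$. For $y = X^TAX$, find $\frac{\partial y}{\partial X}$.
Take the total differential: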
$dy = (dX^T)AX + X^TA(dX)$
By identity (1), since $dy$ is a scalar:
$dy = tr((dX^T)AX + X^TA(dX))$
By identity (4):
$dy = tr((dX^T)AX)+tr( X^TA(dX))$
By identity (5), then identity (3) applied to the first term:
$dy = tr((dX)^TAX)+tr( X^TA(dX))= tr(X^TA^TdX)+tr( X^TA(dX))$
By identity (4), combining the two traces:
$dy = tr(X^TA^T(dX) + X^TA(dX))=tr(X^T(A^T+A)dX)$
$\because dy = tr(\frac{\partial y}{\partial X}^T dX)$
$\Rightarrow \frac{\partial y}{\partial X}^T = X^T(A^T+A)$
$\Rightarrow \frac{\partial y}{\partial X} = (X^T(A^T+A))^T= (A+A^T)X$
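The result $(A+A^T)X$ can be compared against a central finite-difference approximation of the gradient. This is only a minimal sketch, assuming NumPy; the helper names `quadratic_form` and `numerical_gradient`, the step size `eps`, and the random test data are hypothetical choices for illustration.

```python
import numpy as np

def quadratic_form(x, A):
    """y = x^T A x for a 1-D array x and a square matrix A."""
    return x @ A @ x

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))  # a general matrix, not assumed symmetric
x = rng.standard_normal(n)

analytic = (A + A.T) @ x  # the formula derived in Example 1
numeric = numerical_gradient(lambda v: quadratic_form(v, A), x)
max_abs_error = float(np.max(np.abs(analytic - numeric)))

print(max_abs_error)
```

Because `A` is not symmetric here, the check also distinguishes $(A+A^T)X$ from the symmetric special case $2AX$.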
Example 2:
For $y = tr(AB)$, where $B$ does not depend on $A$, find $\frac{\partial y}{\partial A}$.
Take the total differential:
$dy = tr[(dA)B]$
By identity (2):
$dy = tr[B\ dA]$
$\because dy = tr(\frac{\partial y}{\partial A}^T dA)$
$\Rightarrow \frac{\partial y}{\partial A}^T = B$
$\Rightarrow \frac{\partial y}{\partial A} =B^T$
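The same kind of finite-difference check applies to Example 2. The sketch below assumes NumPy and takes $A$ to be $3 \times 5$ and $B$ to be $5 \times 3$; the helper name `numerical_gradient_matrix` and the test data are hypothetical. Each entry of the numerical gradient perturbs a single element of $A$.

```python
import numpy as np

def numerical_gradient_matrix(f, M, eps=1e-6):
    """Central finite-difference gradient of a scalar function f of a matrix M."""
    grad = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            step = np.zeros_like(M)
            step[i, j] = eps
            grad[i, j] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

analytic = B.T  # the formula derived in Example 2
numeric = numerical_gradient_matrix(lambda M: np.trace(M @ B), A)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Note that the gradient has the same shape as $A$ ($3 \times 5$), which is why the answer is $B^T$ rather than $B$.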