Matrix Derivatives
2015-12-13 17:36
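The most detailed treatment of matrix derivatives is still the Wikipedia page: https://en.wikipedia.org/wiki/Matrix_calculus
Matrix derivatives are written differently in different places, but it all comes down to one question: when an m-dimensional vector $\mathbf{y}$ is differentiated with respect to an n-dimensional vector $\mathbf{x}$, should the result $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ be m×n or n×m? See Wikipedia for the details.
Either the elements of $\mathbf{y}$ are laid out as a column and those of $\mathbf{x}$ as a row, or the other way around, which leads to different possibilities:
Numerator layout: lay out according to $\mathbf{y}$ and $\mathbf{x}^T$; also called the Jacobian formulation.
Denominator layout: lay out according to $\mathbf{y}^T$ and $\mathbf{x}$; also called the Hessian formulation.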
A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces the same results as the numerator layout.
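The difference between the two layouts can be checked numerically. The following is a minimal sketch, not taken from the original text: the map `f` and the helpers `jacobian_numerator` and `transpose` are hypothetical names chosen for illustration, and the derivative is only approximated by central differences. With m = 3 outputs and n = 2 inputs, numerator layout gives a 3×2 matrix and denominator layout gives its 2×3 transpose.

```python
def f(x):
    """Assumed example map from R^2 to R^3 (m = 3 outputs, n = 2 inputs)."""
    x1, x2 = x
    return [x1 * x2, x1 + 2.0 * x2, x1 ** 2]


def jacobian_numerator(func, x, h=1e-6):
    """Central-difference estimate of dy/dx in numerator layout (m x n).

    Entry [i][j] approximates the partial derivative of y_i with respect to x_j.
    """
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        y_plus, y_minus = func(x_plus), func(x_minus)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2.0 * h)
    return jac


def transpose(a):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


x0 = [1.0, 2.0]
J_num = jacobian_numerator(f, x0)  # numerator layout: m x n
J_den = transpose(J_num)           # denominator layout: n x m

print("numerator layout shape:", (len(J_num), len(J_num[0])))
print("denominator layout shape:", (len(J_den), len(J_den[0])))
```

Both matrices hold the same partial derivatives; only the arrangement of rows and columns changes.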
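When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$, we have the same issues. To be consistent, we should do one of the following:
1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.
2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.
3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.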
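As a small worked illustration of the first two choices (the functions here are assumed examples, not part of the original text), take the linear scalar function $y = \mathbf{a}^T \mathbf{x}$ with constant $\mathbf{a} \in \mathbb{R}^n$, and the vector function $\mathbf{y} = t\,\mathbf{a}$ of a scalar $t$:

```latex
% Assumed example 1: y = a^T x, a scalar function of the vector x.
% Numerator layout: the gradient is a 1 x n row vector.
% Denominator layout: the gradient is an n x 1 column vector.
\begin{align*}
\text{numerator layout:}   \quad & \frac{\partial y}{\partial \mathbf{x}} = \mathbf{a}^T, \\
\text{denominator layout:} \quad & \frac{\partial y}{\partial \mathbf{x}} = \mathbf{a}.
\end{align*}

% Assumed example 2: y = t a, a vector function of the scalar t.
% The roles are reversed: column in numerator layout, row in denominator layout.
\begin{align*}
\text{numerator layout:}   \quad & \frac{\partial \mathbf{y}}{\partial t} = \mathbf{a}, \\
\text{denominator layout:} \quad & \frac{\partial \mathbf{y}}{\partial t} = \mathbf{a}^T.
\end{align*}
```

In both examples the entries are identical; only the orientation of the result depends on the chosen layout.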
Not all math textbooks and papers are consistent in this respect throughout the entire paper. That is, sometimes different conventions are used in different contexts within the same paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.
Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^T$, while consistent denominator layout lays out according to $\mathbf{Y}^T$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$, and laying the result out according to $\mathbf{Y}^T$, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:
1. Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^T$.
2. Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.
3. Use the notation $\frac{\partial y}{\partial \mathbf{X}^T}$, with results the same as consistent numerator layout.
In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.
Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.
When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a $k \times 1$ column vector, then the result using the numerator layout will be in the form of a $1 \times k$ row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
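The transposition issue shows up as soon as a derivative is used to update the variable it was taken with respect to. The sketch below is an illustration under assumptions, not the maximum-likelihood problem mentioned above: it minimizes the assumed quadratic f(x) = ½ xᵀAx − bᵀx with a symmetric A, and the names `grad_row`, `transpose`, `shape` and `step` are hypothetical. The numerator-layout derivative is a 1×k row, so it is transposed into a k×1 column before being subtracted from x.

```python
def grad_row(A, b, x):
    """Numerator-layout derivative of f(x) = 0.5 x^T A x - b^T x (A symmetric).

    x and b are k x 1 columns; the result x^T A - b^T is a 1 x k row.
    """
    k = len(x)
    return [[sum(x[i][0] * A[i][j] for i in range(k)) - b[j][0] for j in range(k)]]


def transpose(a):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


def shape(a):
    return (len(a), len(a[0]))


A = [[2.0, 0.0], [0.0, 4.0]]  # assumed symmetric positive-definite matrix
b = [[2.0], [4.0]]            # k x 1 column
x = [[0.0], [0.0]]            # k x 1 column, starting point

step = 0.25                   # assumed step size, small enough for this A
for _ in range(200):
    g_row = grad_row(A, b, x)   # 1 x k row (numerator layout)
    g_col = transpose(g_row)    # k x 1 column (denominator layout)
    assert shape(g_col) == shape(x)
    x = [[x[i][0] - step * g_col[i][0]] for i in range(len(x))]

print(x)  # approaches the solution of A x = b
```

Working in denominator layout from the start would make the explicit transpose unnecessary, which is the trade-off the paragraph above describes.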
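=== Numerator-layout notation ===
Using numerator-layout notation, we have (Minka, Thomas P. "Old and New Matrix Algebra Useful for Statistics." December 28, 2000. http://research.microsoft.com/en-us/um/people/minka/papers/matrix/):

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

The following definitions are only provided in numerator-layout notation:

$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$

$$d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.$$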
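=== Denominator-layout notation ===
Using denominator-layout notation, we have (http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.AppD.pdf):

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$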
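The two scalar-by-matrix layouts can be compared on a concrete function. This is a minimal sketch under assumptions: the bilinear function f(X) = aᵀXb and the helper names `dfdX_denominator` and `transpose` are hypothetical choices for illustration, and the derivative is approximated by central differences. For a p×q matrix X, the denominator-layout (and mixed-layout) result has the shape of X, while the numerator-layout result has the shape of Xᵀ.

```python
def f(X, a, b):
    """Assumed example: the bilinear form a^T X b, a scalar function of X."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def dfdX_denominator(func, X, h=1e-6):
    """Central-difference estimate of dy/dX laid out like X (p x q).

    Entry [i][j] approximates the partial derivative of y with respect to x_ij.
    """
    p, q = len(X), len(X[0])
    G = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus = [row[:] for row in X]
            X_minus[i][j] -= h
            G[i][j] = (func(X_plus) - func(X_minus)) / (2.0 * h)
    return G


def transpose(a):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


a = [1.0, 2.0]                            # length p = 2
b = [3.0, 4.0, 5.0]                       # length q = 3
X = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]  # p x q

G_den = dfdX_denominator(lambda M: f(M, a, b), X)  # p x q, entries a_i * b_j
G_num = transpose(G_den)                           # q x p, entries b_j * a_i

print("denominator/mixed layout shape:", (len(G_den), len(G_den[0])))
print("numerator layout shape:", (len(G_num), len(G_num[0])))
```

For this particular function the denominator-layout result equals the outer product a bᵀ and the numerator-layout result equals b aᵀ, which matches the statement that the two notations differ by a transpose.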