KL Divergence between two multivariate normal distributions (using properties of the trace and the expectation)
2015-10-29 12:58
http://stats.stackexchange.com/questions/60680/kl-divergence-between-two-multivariate-gaussians
I give a detailed derivation of the KL divergence between two multivariate normal distributions.
Given $p(x)\sim N(\mu_1, \Sigma_1)$ and $q(x)\sim N(\mu_2, \Sigma_2)$, we want to compute the KL divergence
$$KL=\int \left[\log p(x) - \log q(x) \right] p(x)\, dx$$
First, recall the multivariate normal pdf and the log-densities of $p$ and $q$, where $d$ is the dimension of $x$:
$$p(x) = (2\pi)^{-\frac{d}{2}}|\Sigma_1|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right\}$$
$$\log p(x) = -\frac{d}{2} \log 2\pi -\frac{1}{2} \log|\Sigma_1|-\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)$$
$$\log q(x) = -\frac{d}{2} \log 2\pi -\frac{1}{2} \log|\Sigma_2|-\frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)$$
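As a rough numerical illustration of the definition, the sketch below evaluates the two log-densities with NumPy and averages $\log p(x) - \log q(x)$ over samples drawn from $p$. The function name `mvn_logpdf` and all parameter values are assumptions chosen for illustration; they do not come from the original post.

```python
import numpy as np

def mvn_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at each row of x (shape (n, d))."""
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), one value per row.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

# Illustrative parameters (hypothetical, not from the post).
mu1 = np.array([0.0, 0.0])
sigma1 = np.array([[1.0, 0.3], [0.3, 2.0]])
mu2 = np.array([1.0, -1.0])
sigma2 = np.array([[1.5, 0.0], [0.0, 1.0]])

# Monte Carlo estimate of KL = E_p[log p(x) - log q(x)].
rng = np.random.default_rng(0)
x = rng.multivariate_normal(mu1, sigma1, size=100_000)
kl_mc = np.mean(mvn_logpdf(x, mu1, sigma1) - mvn_logpdf(x, mu2, sigma2))
print(kl_mc)
```

The estimate carries sampling noise, so it should only be expected to agree with an exact value to a couple of decimal places.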
The reference linked above gives the general derivation (the $\log 2\pi$ terms of $\log p$ and $\log q$ cancel):
$$\begin{aligned}
KL &= \int \left[ \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} (x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1) + \frac{1}{2} (x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2) \right] p(x)\, dx \\
&= \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} \text{tr}\left\{\mathbb{E}_p[(x - \mu_1)(x - \mu_1)^T]\, \Sigma_1^{-1} \right\} + \frac{1}{2} \mathbb{E}_p[(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)] \\
&= \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} \text{tr}\{I_d \} + \frac{1}{2} (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \text{tr} \{ \Sigma_2^{-1} \Sigma_1 \} \\
&= \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \text{tr} (\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^T \Sigma_2^{-1}(\mu_2 - \mu_1)\right].
\end{aligned}$$
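The closed form in the last line translates almost directly into code. The following is a minimal sketch, assuming NumPy and symmetric positive definite covariance matrices; the function name `kl_mvn` is hypothetical.

```python
import numpy as np

def kl_mvn(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1) || N(mu2, sigma2) ) in nats, using the closed form."""
    d = mu1.shape[0]
    diff = mu2 - mu1
    # log|sigma| via slogdet is more stable than log(det(sigma)).
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    # tr(sigma2^{-1} sigma1), computed with a linear solve instead of an inverse.
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    # (mu2 - mu1)^T sigma2^{-1} (mu2 - mu1)
    quad_term = diff @ np.linalg.solve(sigma2, diff)
    return 0.5 * (logdet2 - logdet1 - d + trace_term + quad_term)

# Univariate check: p = N(0, 1), q = N(1, 4), written as 1x1 matrices.
kl_1d = kl_mvn(np.array([0.0]), np.array([[1.0]]),
               np.array([1.0]), np.array([[4.0]]))
print(kl_1d)
```

For this univariate case the formula reduces to $\frac{1}{2}(\log 4 - 1 + \frac{1}{4} + \frac{1}{4}) \approx 0.4431$. Swapping the roles of $p$ and $q$ generally gives a different value, since the KL divergence is not symmetric.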
Below I work out in detail the two expectations $\int \frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\, p(x)\, dx$ and $\int \frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)\, p(x)\, dx$.
Trace and Expectation Tricks
A scalar equals its own trace, so $\mathbb{E}(c)=\mathbb{E}(\text{tr}(c))$ for any scalar random variable $c$. In particular, for the expectation of a quadratic form with a random vector $x$, $\mathbb{E}(x^TAx)=\mathbb{E}(\text{tr}(x^TAx))=\mathbb{E}(\text{tr}(Axx^T))=\text{tr}(\mathbb{E}(Axx^T))$, based on the properties below. Once the quadratic form is wrapped in a trace, its factors can be cyclically permuted.
$\text{tr}(AB) = \text{tr}(BA)$
$\text{tr}(ABC) = \text{tr}(BCA)$
$\text{tr}(ABC) = \text{tr}(CAB)$
$\text{tr}(ABC) \neq \text{tr}(ACB)$ in general
$\mathbb{E}(\text{tr}(X))= \text{tr}(\mathbb{E}(X))$ for a random square matrix $X$: expectation and trace can be exchanged because both are linear.
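These identities are easy to check numerically. The sketch below uses small hand-picked matrices (illustrative values, not from the original post) for the cyclic property, and an empirical average over random vectors for the exchange of expectation and trace.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[1.0, 0.0], [0.0, 2.0]])

# Cyclic permutations leave the trace unchanged.
t_abc = np.trace(A @ B @ C)   # 8.0
t_bca = np.trace(B @ C @ A)   # 8.0
t_cab = np.trace(C @ A @ B)   # 8.0
# A non-cyclic reordering can change it.
t_acb = np.trace(A @ C @ B)   # 7.0

# Quadratic form as a trace: x^T A x = tr(A x x^T).
x = np.array([1.0, 2.0])
quad = x @ A @ x                        # 27.0
quad_tr = np.trace(A @ np.outer(x, x))  # 27.0

# Exchange of expectation and trace, shown on an empirical distribution:
# mean of x^T A x over samples equals tr(A @ mean of x x^T).
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 2))
lhs = np.mean([s @ A @ s for s in samples])
second_moment = samples.T @ samples / len(samples)  # empirical E[x x^T]
rhs = np.trace(A @ second_moment)
print(t_abc, t_acb, quad, quad_tr, np.isclose(lhs, rhs))
```

The last comparison holds up to floating-point rounding for any sample set, because it relies only on the linearity of the mean and the trace.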
Part 1
$$\begin{aligned}
&\int \frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\, p(x)\, dx\\
&=\mathbb{E}_p\left[\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right] \quad \text{(the expectation is taken under } p(x) \text{, not } q(x)\text{)}\\
&=\mathbb{E}_p\left[\text{tr}\left(\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right)\right]\\
&=\mathbb{E}_p\left[\text{tr}\left(\frac{1}{2}(x-\mu_1)(x-\mu_1)^T\Sigma_1^{-1}\right)\right]\\
&=\text{tr}\left(\mathbb{E}_p\left[\frac{1}{2}(x-\mu_1)(x-\mu_1)^T\Sigma_1^{-1}\right]\right)\\
&=\text{tr}\left(\mathbb{E}_p\left[(x-\mu_1)(x-\mu_1)^T\right]\frac{1}{2}\Sigma_1^{-1}\right)\\
&=\text{tr}\left(\Sigma_1 \frac{1}{2}\Sigma_1^{-1}\right) \quad \text{(definition of the covariance matrix)}\\
&=\frac{1}{2}\text{tr}(I_d)\\
&=\frac{d}{2}
\end{aligned}$$
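Carrying the factor $\frac{1}{2}$ through, this expectation equals $\frac{1}{2}\text{tr}(I_d) = \frac{d}{2}$, which a quick simulation can confirm. The sketch below is a Monte Carlo check with NumPy; the dimension, mean, covariance, and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
mu1 = np.array([1.0, -2.0, 0.5])

# Build an arbitrary symmetric positive definite covariance matrix.
L = rng.standard_normal((d, d))
sigma1 = L @ L.T + d * np.eye(d)

# Draw samples from p = N(mu1, sigma1).
x = rng.multivariate_normal(mu1, sigma1, size=200_000)
diff = x - mu1

# (x - mu1)^T sigma1^{-1} (x - mu1) for every sample.
maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(sigma1), diff)

estimate = 0.5 * maha.mean()
print(estimate, d / 2)  # the estimate should be close to d / 2 = 1.5
```

The quadratic form follows a chi-squared distribution with $d$ degrees of freedom, whose mean is $d$, so half of its sample mean should land near $d/2$ regardless of the particular $\mu_1$ and $\Sigma_1$.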
Part 2
$$\begin{aligned}
&\int \frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)\, p(x)\, dx\\
&=\int \frac{1}{2}[(x-\mu_1) + (\mu_1 - \mu_2)]^T\Sigma_2^{-1}[(x-\mu_1) + (\mu_1 - \mu_2)]\, p(x)\, dx\\
&=\int \frac{1}{2}\left\{(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1) + 2(x-\mu_1)^T\Sigma_2^{-1}(\mu_1-\mu_2) + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\right\} p(x)\, dx\\
&=\frac{1}{2}\left[\text{tr}(\Sigma_2^{-1}\Sigma_1) + 0 + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\right]
\end{aligned}$$
The first term follows from the same trace trick as in Part 1, with $\Sigma_2^{-1}$ in place of $\Sigma_1^{-1}$. The cross term vanishes because $\mathbb{E}_p[x-\mu_1]=0$. The last term does not depend on $x$, and $p(x)$ integrates to 1.
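Substituting the two expectations back into the KL integral recovers the closed form. As a sanity check, the second line writes out the univariate case $d=1$, using the assumed notation $\Sigma_1=\sigma_1^2$ and $\Sigma_2=\sigma_2^2$, which does not appear in the original post.

```latex
\begin{aligned}
KL &= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{d}{2}
      + \frac{1}{2}\left[\text{tr}(\Sigma_2^{-1}\Sigma_1)
      + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\right] \\
KL\big|_{d=1} &= \log\frac{\sigma_2}{\sigma_1}
      + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}
\end{aligned}
```

When $\mu_1=\mu_2$ and $\sigma_1=\sigma_2$, every term cancels and the divergence is zero, as expected for identical distributions.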