
KL Divergence between two multivariate normal distributions (using properties of trace and expectation)

2015-10-29 12:58
http://stats.stackexchange.com/questions/60680/kl-divergence-between-two-multivariate-gaussians

I give a detailed derivation of the KL divergence between two multivariate normal distributions.

Given $p(x)\sim N(\mu_1, \Sigma_1)$ and $q(x)\sim N(\mu_2, \Sigma_2)$, we need to compute the KL divergence

$$KL=\int \left[\log p(x) - \log q(x) \right] p(x)\, dx$$

First, recall the pdf of the $d$-dimensional multivariate normal distribution:

$$p(x) = (2\pi)^{-\frac{d}{2}}|\Sigma_1|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right\}$$

$$\log p(x) = -\frac{d}{2} \log 2\pi -\frac{1}{2} \log|\Sigma_1|-\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)$$

$$\log q(x) = -\frac{d}{2} \log 2\pi -\frac{1}{2} \log|\Sigma_2|-\frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)$$
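As a sketch of how the log-density formula translates to code, the function below evaluates $\log p(x)$ term by term. It assumes NumPy (not used in the original post), and the name `mvn_logpdf` is hypothetical. It computes $\log|\Sigma|$ with `slogdet` and the quadratic form with a linear solve rather than an explicit inverse, which is the usual numerically safer choice.

```python
import math
import numpy as np

def mvn_logpdf(x, mu, sigma):
    """log N(x; mu, sigma), term by term as in log p(x) above (d = len(mu))."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    sign, logdet = np.linalg.slogdet(sigma)  # log|Sigma|, computed stably
    if sign <= 0:
        # a valid (positive definite) covariance has a positive determinant
        raise ValueError("sigma must be positive definite")
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu) via a linear solve instead of an explicit inverse
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * logdet - 0.5 * quad

# Sanity check in 1-D: N(0, 4) at x = 1 against the scalar normal log-density
lp = mvn_logpdf([1.0], [0.0], [[4.0]])
print(math.isclose(lp, -0.5 * math.log(2 * math.pi * 4.0) - 1.0 / 8.0))  # True
```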

From the reference link above, the general derivation is:

$$
\begin{aligned}
KL &= \int \left[ \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} (x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1) + \frac{1}{2} (x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2) \right] p(x)\, dx \\
&= \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} \text{tr} \left\{\mathbb{E}_p[(x - \mu_1)(x - \mu_1)^T] \, \Sigma_1^{-1} \right\} + \frac{1}{2} \mathbb{E}_p[(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)] \\
&= \frac{1}{2} \log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} \text{tr} \{I_d \} + \frac{1}{2} (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \text{tr} \{ \Sigma_2^{-1} \Sigma_1 \} \\
&= \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \text{tr} (\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^T \Sigma_2^{-1}(\mu_2 - \mu_1)\right].
\end{aligned}
$$
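The final closed form can be evaluated directly. Below is a minimal sketch assuming NumPy (not part of the original post); `kl_mvn` is a hypothetical name. It computes $\text{tr}(\Sigma_2^{-1}\Sigma_1)$ and the Mahalanobis term with `np.linalg.solve`, and the log-determinants with `slogdet`.

```python
import math
import numpy as np

def kl_mvn(mu1, sigma1, mu2, sigma2):
    """KL(p || q) for p = N(mu1, sigma1), q = N(mu2, sigma2), using the closed form above."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    d = mu1.shape[0]
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    diff = mu2 - mu1
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))  # tr(Sigma2^{-1} Sigma1)
    quad_term = diff @ np.linalg.solve(sigma2, diff)         # (mu2-mu1)^T Sigma2^{-1} (mu2-mu1)
    return 0.5 * (logdet2 - logdet1 - d + trace_term + quad_term)

# 1-D check: KL(N(0, 1) || N(1, 4)) should equal log 2 - 1/4
print(math.isclose(kl_mvn([0.0], [[1.0]], [1.0], [[4.0]]), math.log(2) - 0.25))  # True
```

Swapping the arguments gives a different value, a reminder that the KL divergence is not symmetric.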

Here I give the detailed steps for $\int \frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\, p(x)\, dx$ and $\int \frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)\, p(x)\, dx$.

Trace and Expectation Tricks

Since the trace of a scalar is the scalar itself, $\mathbb{E}(s)=\mathbb{E}(\text{tr}(s))$ for any scalar $s$. In particular, for the expectation of a quadratic form, $\mathbb{E}(x^TAx)=\mathbb{E}(\text{tr}(x^TAx))=\mathbb{E}(\text{tr}(Axx^T))=\text{tr}(\mathbb{E}(Axx^T))$, based on the properties below. Once the scalar is written as a trace, the factors of the quadratic form can be cyclically permuted.

$\text{tr}(AB) = \text{tr}(BA)$

$\text{tr}(ABC) = \text{tr}(BCA)$

$\text{tr}(ABC) = \text{tr}(CAB)$

$\text{tr}(ABC) \neq \text{tr}(ACB)$ in general (only cyclic permutations preserve the trace)

$\mathbb{E}(\text{tr}(X)) = \text{tr}(\mathbb{E}(X))$: expectation and trace can be interchanged, because the trace is a linear function of the matrix entries.
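The trace identities can be checked on small matrices. This sketch assumes NumPy, and the matrices `A`, `B`, `C` and vector `x` are arbitrary illustrative choices, small enough to verify by hand. It shows that cyclic reorderings keep the trace, a swap of two factors can change it, and a quadratic form equals the trace of $Axx^T$.

```python
import numpy as np

# Small hand-checkable integer matrices (arbitrary choices for illustration)
A = np.array([[1, 2], [0, 1]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[1, 0], [0, 2]])

# Cyclic permutations preserve the trace
print(np.trace(A @ B @ C), np.trace(B @ C @ A), np.trace(C @ A @ B))  # 2 2 2
# Swapping two factors (a non-cyclic reordering) can change it
print(np.trace(A @ C @ B))  # 4

# A quadratic form is a scalar, so x^T A x = tr(x^T A x) = tr(A x x^T)
x = np.array([1, 2])
print(x @ A @ x, np.trace(A @ np.outer(x, x)))  # 9 9
```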

Part 1

$$
\begin{aligned}
&\int \frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\, p(x)\, dx\\
&=\mathbb{E}_p\left[\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right] \quad \text{(the expectation is taken w.r.t. } p(x)\text{, not } q(x)\text{)}\\
&=\mathbb{E}_p\left[\text{tr}\left(\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right)\right]\\
&=\mathbb{E}_p\left[\text{tr}\left(\frac{1}{2}(x-\mu_1)(x-\mu_1)^T\Sigma_1^{-1}\right)\right]\\
&=\text{tr}\left(\mathbb{E}_p\left[\frac{1}{2}(x-\mu_1)(x-\mu_1)^T\Sigma_1^{-1}\right]\right)\\
&=\text{tr}\left(\mathbb{E}_p\left[(x-\mu_1)(x-\mu_1)^T\right]\frac{1}{2}\Sigma_1^{-1}\right)\\
&=\text{tr}\left(\Sigma_1 \frac{1}{2}\Sigma_1^{-1}\right) \quad \text{(definition of the covariance matrix: } \mathbb{E}_p[(x-\mu_1)(x-\mu_1)^T]=\Sigma_1\text{)}\\
&=\frac{1}{2}\text{tr}(I_d)\\
&=\frac{d}{2}
\end{aligned}
$$

This is the $-\frac{1}{2}\text{tr}\{I_d\} = -\frac{d}{2}$ term in the general derivation above.
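A quick Monte Carlo check of Part 1, as a sketch assuming NumPy: sample $x \sim p = N(\mu_1, \Sigma_1)$ and average $\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)$; the estimate should come out close to $d/2$. The helper name `part1_monte_carlo`, the sample size, and the example $\mu_1, \Sigma_1$ are hypothetical choices for illustration.

```python
import numpy as np

def part1_monte_carlo(mu1, sigma1, n_samples=200_000, seed=0):
    """Estimate E_p[1/2 (x-mu1)^T Sigma1^{-1} (x-mu1)] by sampling x ~ N(mu1, Sigma1)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu1, sigma1, size=n_samples)  # shape (n, d)
    diff = x - mu1
    # Solve Sigma1 z = (x - mu1) for every sample at once, then take row-wise dot products
    z = np.linalg.solve(sigma1, diff.T).T
    quad = np.einsum("ij,ij->i", diff, z)
    return 0.5 * quad.mean()

mu1 = np.array([1.0, -2.0, 0.5])
sigma1 = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, 0.2],
                   [0.0, 0.2, 0.5]])
print(part1_monte_carlo(mu1, sigma1))  # expected to be close to d/2 = 1.5
```

The result does not depend on $\mu_1$ or $\Sigma_1$, only on the dimension $d$, which is what the trace argument predicts.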

Part 2

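$$
\begin{aligned}
&\int \frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)\, p(x)\, dx\\
&=\int \frac{1}{2}[(x-\mu_1) + (\mu_1 - \mu_2)]^T\Sigma_2^{-1}[(x-\mu_1) + (\mu_1 - \mu_2)]\, p(x)\, dx\\
&=\int \frac{1}{2}\left\{(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1) + 2(x-\mu_1)^T\Sigma_2^{-1}(\mu_1-\mu_2) + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\right\} p(x)\, dx\\
&=\frac{1}{2}\text{tr}(\Sigma_2^{-1}\Sigma_1) + 0 + \frac{1}{2}(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)
\end{aligned}
$$

The two cross terms combine into $2(x-\mu_1)^T\Sigma_2^{-1}(\mu_1-\mu_2)$ because $\Sigma_2^{-1}$ is symmetric. The first term follows from the same trace trick as in Part 1, with $\Sigma_2^{-1}$ in place of $\Sigma_1^{-1}$: $\mathbb{E}_p[(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1)] = \text{tr}(\Sigma_2^{-1}\Sigma_1)$. The cross term vanishes because $\mathbb{E}_p[x-\mu_1]=0$, and the last term is a constant, so integrating it against $p(x)$ leaves it unchanged. Substituting Part 1 and Part 2 into the general derivation gives the closed form above.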
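Part 2 can be checked the same way. This sketch assumes NumPy; `part2_monte_carlo`, `part2_closed_form`, and the example parameters are hypothetical. It samples from $p$, averages $\frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)$, and compares the estimate with $\frac{1}{2}\text{tr}(\Sigma_2^{-1}\Sigma_1) + \frac{1}{2}(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)$, i.e. the integral with its leading factor $\frac{1}{2}$ carried through every term.

```python
import numpy as np

def part2_monte_carlo(mu1, sigma1, mu2, sigma2, n_samples=200_000, seed=1):
    """Estimate E_p[1/2 (x-mu2)^T Sigma2^{-1} (x-mu2)] with x ~ p = N(mu1, Sigma1)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu1, sigma1, size=n_samples)
    diff = x - mu2
    z = np.linalg.solve(sigma2, diff.T).T          # Sigma2^{-1} (x - mu2), row by row
    return 0.5 * np.einsum("ij,ij->i", diff, z).mean()

def part2_closed_form(mu1, sigma1, mu2, sigma2):
    """1/2 tr(Sigma2^{-1} Sigma1) + 1/2 (mu1-mu2)^T Sigma2^{-1} (mu1-mu2)."""
    dmu = mu1 - mu2
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    quad_term = dmu @ np.linalg.solve(sigma2, dmu)
    return 0.5 * trace_term + 0.5 * quad_term

mu1 = np.array([0.0, 1.0])
sigma1 = np.array([[1.0, 0.4], [0.4, 2.0]])
mu2 = np.array([1.0, -1.0])
sigma2 = np.array([[2.0, -0.3], [-0.3, 1.5]])

# The two numbers should agree up to Monte Carlo error (roughly two decimal places here)
print(part2_monte_carlo(mu1, sigma1, mu2, sigma2), part2_closed_form(mu1, sigma1, mu2, sigma2))
```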