The Earth Mover's Distance
2014-01-12 22:25
The EMD is based on the minimal cost that must be paid to transform one distribution into the other. Intuitively, given two distributions, one can be seen as a mass of earth properly spread in space, the other as a collection of holes in that same space. Then, the EMD measures the least amount of work needed to fill the holes with earth. Here, a unit of work corresponds to transporting a unit of earth by a unit of ground distance.
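To make the earth-and-holes picture concrete, here is a minimal sketch for one special case the post does not cover: two histograms over the same equally spaced 1-D bins, with equal total mass and ground distance |i - j| between bins i and j (all assumptions). In that case the least work has a closed form. The earth that must cross the boundary after bin k equals the running surplus of the first histogram over the second, so the work is the sum of the absolute running surpluses. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1-D histograms on the same unit-spaced bins.

    Assumes len(p) == len(q), equal positive total mass, and ground distance
    |i - j|. Returns the least work divided by the total mass (the total flow).
    """
    total = sum(p)
    if total <= 0 or abs(total - sum(q)) > 1e-9:
        raise ValueError("this shortcut needs equal, positive total mass")
    # Running surplus after bin k = earth that must cross the k / k+1 boundary.
    surplus = accumulate(a - b for a, b in zip(p, q))
    work = sum(abs(s) for s in surplus)
    return work / total


print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # every unit moves one bin; prints 1.0
```

With unequal masses or arbitrary ground distances this shortcut no longer applies, and the general linear program below is needed.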
This can be formalized as the following linear programming problem:
Let $P=\{(p_1,w_{p_1}),\dots,(p_m,w_{p_m})\}$ be the first signature with $m$ clusters, where $p_i$ is the cluster representative and $w_{p_i}$ is the weight of the cluster; $Q=\{(q_1,w_{q_1}),\dots,(q_n,w_{q_n})\}$ the second signature with $n$ clusters; and $D=[d_{ij}]$ the ground distance matrix, where $d_{ij}$ is the ground distance between clusters $p_i$ and $q_j$.
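The definitions above leave the type of cluster representative open. As a minimal sketch, assume each representative is a point given as a tuple of floats and use Euclidean distance as the ground distance (both are assumptions; the post only says $d_{ij}$ can be any distance). The alias `Signature` and the function `ground_distance_matrix` are hypothetical names; `math.dist` needs Python 3.8+ and the `list[...]` alias needs 3.9+.

```python
import math

# Hypothetical representation: a signature is a list of (representative, weight)
# pairs, where each representative p_i / q_j is a point given as a tuple of floats.
Signature = list[tuple[tuple[float, ...], float]]


def ground_distance_matrix(P: Signature, Q: Signature) -> list[list[float]]:
    """Build the m x n matrix D = [d_ij], with Euclidean ground distance."""
    return [[math.dist(p, q) for q, _ in Q] for p, _ in P]


P = [((0.0, 0.0), 0.4), ((1.0, 0.0), 0.6)]  # m = 2 clusters
Q = [((0.0, 3.0), 1.0)]                     # n = 1 cluster
D = ground_distance_matrix(P, Q)            # D[0][0] = 3.0, D[1][0] = sqrt(10)
```

Swapping in another ground distance only means replacing `math.dist` with a different function of two representatives.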
We want to find a flow $F=[f_{ij}]$, with $f_{ij}$ the flow between $p_i$ and $q_j$, that minimizes the overall cost
$$\mathrm{WORK}(P,Q,F)=\sum_{i=1}^{m}\sum_{j=1}^{n} d_{ij}f_{ij}$$
subject to the following constraints:
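$$
\begin{aligned}
&f_{ij} \ge 0, \quad 1 \le i \le m,\ 1 \le j \le n && (1)\\
&\sum_{j=1}^{n} f_{ij} \le w_{p_i}, \quad 1 \le i \le m && (2)\\
&\sum_{i=1}^{m} f_{ij} \le w_{q_j}, \quad 1 \le j \le n && (3)\\
&\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij} = \min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Big) && (4)
\end{aligned}
$$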
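A minimal sketch of the objective and of constraints (1)-(4), assuming the flow F, the matrix D and the weights are plain nested Python lists. The helper names `work` and `is_feasible` and the tolerance `tol` are hypothetical. It does not solve the problem; it only evaluates a candidate flow, which is handy for checking a solver's output.

```python
def work(D, F):
    """WORK(P, Q, F) = sum over i, j of d_ij * f_ij."""
    return sum(d * f for drow, frow in zip(D, F) for d, f in zip(drow, frow))


def is_feasible(F, wp, wq, tol=1e-9):
    """Check constraints (1)-(4) for a candidate m x n flow F."""
    m, n = len(wp), len(wq)
    nonneg = all(F[i][j] >= -tol for i in range(m) for j in range(n))            # (1)
    supply = all(sum(F[i]) <= wp[i] + tol for i in range(m))                         # (2)
    demand = all(sum(F[i][j] for i in range(m)) <= wq[j] + tol for j in range(n))  # (3)
    total = sum(map(sum, F))
    full = abs(total - min(sum(wp), sum(wq))) <= tol                                # (4)
    return nonneg and supply and demand and full


D = [[0.0, 1.0], [1.0, 0.0]]
wp, wq = [0.4, 0.6], [0.5, 0.5]
F = [[0.4, 0.0], [0.1, 0.5]]  # only 0.1 units travel a distance of 1
print(is_feasible(F, wp, wq), work(D, F))
```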
Constraint (1) allows moving "supplies" from $P$ to $Q$ and not vice versa. Constraint (2) limits the amount of supplies that the clusters in $P$ can send to their weights. Constraint (3) limits the clusters in $Q$ to receive no more supplies than their weights; and constraint (4) forces the maximum possible amount of supplies to be moved. We call this amount the total flow. Once the transportation problem is solved and we have found the optimal flow $F$, the earth mover's distance is defined as the resulting work normalized by the total flow:
$$\mathrm{EMD}(P,Q)=\frac{\sum_{i=1}^{m}\sum_{j=1}^{n} d_{ij}f_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}$$
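The post does not say how the transportation problem is solved. One option, swapped in here as an assumption and not necessarily the method behind the post, is to view the LP as a min-cost flow: a source feeds each $p_i$ up to $w_{p_i}$, each $q_j$ drains into a sink up to $w_{q_j}$, edges $p_i \to q_j$ have cost $d_{ij}$ and unlimited capacity, and exactly $\min(\sum_i w_{p_i}, \sum_j w_{q_j})$ units are sent. The sketch uses successive shortest paths with Bellman-Ford; the function `emd` and its argument layout are hypothetical.

```python
import math


def emd(wp, wq, d):
    """EMD between signatures with weights wp (m) and wq (n), ground distances d (m x n).

    Solves the transportation LP as a min-cost flow and returns WORK / total flow.
    Assumes non-negative weights and positive totals on both sides.
    """
    m, n = len(wp), len(wq)
    src, snk = 0, m + n + 1          # nodes 1..m are P clusters, m+1..m+n are Q clusters
    size = m + n + 2
    adj = [[] for _ in range(size)]  # adjacency lists of edge ids
    head, cap, cost = [], [], []     # edge e and its reverse edge e ^ 1 are stored in pairs

    def add_edge(u, v, c, w):
        adj[u].append(len(head)); head.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(head)); head.append(u); cap.append(0.0); cost.append(-w)

    for i in range(m):
        add_edge(src, 1 + i, float(wp[i]), 0.0)                    # constraint (2)
    for j in range(n):
        add_edge(1 + m + j, snk, float(wq[j]), 0.0)                # constraint (3)
    for i in range(m):
        for j in range(n):
            add_edge(1 + i, 1 + m + j, math.inf, float(d[i][j]))   # P -> Q only, (1)

    target = min(sum(wp), sum(wq))   # constraint (4): the total flow
    eps = 1e-12
    flow = total_work = 0.0
    while target - flow > eps:
        # Bellman-Ford, because reverse residual edges carry negative costs.
        dist = [math.inf] * size
        via = [-1] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in adj[u]:
                    v = head[e]
                    if cap[e] > eps and dist[u] + cost[e] < dist[v] - eps:
                        dist[v], via[v], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if dist[snk] == math.inf:
            break                    # no augmenting path left
        # Bottleneck capacity on the cheapest path, capped by the remaining flow.
        push, v = target - flow, snk
        while v != src:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = snk
        while v != src:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        flow += push
        total_work += push * dist[snk]
    return total_work / flow


print(emd([1.0, 1.0], [1.0], [[3.0], [5.0]]))  # total flow is min(2, 1) = 1
```

The nested loops are fine for small signatures; for large ones, a dedicated LP or network-simplex solver would likely be much faster.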
The normalization factor is the total weight of the smaller signature, because of constraint (4). This factor is needed when the two signatures have different total weights, in order to avoid favoring smaller signatures. In general, the ground distance $d_{ij}$ can be any distance and is chosen according to the problem at hand.
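A small worked example with hypothetical 1-D signatures and $d_{ij} = |p_i - q_j|$ illustrates the point about smaller signatures. Compare $Q$ against a candidate $P_1$ of the same total weight and a lighter candidate $P_2$ at the same location. The raw work would rank $P_2$ as closer only because constraint (4) lets it move less earth, while the normalized EMD treats the two equally.

$$
\begin{aligned}
&Q = \{(0,\,2)\},\quad P_1 = \{(3,\,2)\},\quad P_2 = \{(3,\,1)\},\qquad d_{11} = |3 - 0| = 3\\
&\mathrm{WORK}(P_1,Q,F) = 3 \cdot 2 = 6, \qquad \mathrm{EMD}(P_1,Q) = 6/2 = 3\\
&\mathrm{WORK}(P_2,Q,F) = 3 \cdot \min(1,2) = 3, \qquad \mathrm{EMD}(P_2,Q) = 3/1 = 3
\end{aligned}
$$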