
Hierarchical clustering: a concrete implementation

2013-10-21 10:25
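Single linkage (nearest neighbor): the distance between two clusters is the distance between their two closest objects;

Complete linkage (furthest neighbor): the distance between two clusters is the distance between their two furthest objects;

Group average linkage: the distance between two clusters is the average distance over all pairs of objects, one from each cluster.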
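
The three definitions above can be sketched in a few lines of Java. This is only an illustration: the class name Linkage, its method names, and the choice of Euclidean distance between objects are assumptions made for the example, not something the definitions prescribe.

```java
final class Linkage {
    // Euclidean distance between two points of equal dimension (assumed metric)
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++)
            sum += (p[k] - q[k]) * (p[k] - q[k]);
        return Math.sqrt(sum);
    }

    // single linkage: distance of the closest cross-cluster pair
    static double single(double[][] a, double[][] b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : a)
            for (double[] q : b)
                best = Math.min(best, distance(p, q));
        return best;
    }

    // complete linkage: distance of the furthest cross-cluster pair
    static double complete(double[][] a, double[][] b) {
        double worst = 0.0;
        for (double[] p : a)
            for (double[] q : b)
                worst = Math.max(worst, distance(p, q));
        return worst;
    }

    // group average linkage: mean distance over all cross-cluster pairs
    static double average(double[][] a, double[][] b) {
        double sum = 0.0;
        for (double[] p : a)
            for (double[] q : b)
                sum += distance(p, q);
        return sum / (a.length * b.length);
    }

    public static void main(String[] args) {
        // two small clusters on the x-axis (made-up data)
        double[][] a = {{0, 0}, {1, 0}};
        double[][] b = {{3, 0}, {5, 0}};
        System.out.println("single   = " + single(a, b));   // single   = 2.0
        System.out.println("complete = " + complete(a, b)); // complete = 5.0
        System.out.println("average  = " + average(a, b));  // average  = 3.5
    }
}
```

The same two clusters get three different distances depending on the linkage, so the choice of linkage affects which clusters are merged first.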

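The following uses single linkage as an example to walk through a concrete implementation. The algorithm keeps a distance matrix d, where d[i][j] is the distance between clusters i and j, and an array dmin, where dmin[i] is the index of the cluster closest to cluster i.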



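Repeat until a single cluster remains:

    Find the two closest clusters (i, j);

    Set row i to the element-wise minimum of row i and row j;

    Set column i to the element-wise minimum of column i and column j;

    For every i' with dmin[i'] == j, update dmin[i'] so that it points to i.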
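
To make one round of the iteration above concrete, here is a minimal sketch of a single merge step applied to a hand-written 4x4 distance matrix. The class name MergeStep, the method mergeClosest, and the sample data (four points on a line at 0, 1, 3 and 7) are hypothetical and chosen only for illustration. The sketch also sets row and column j to infinity so that the absorbed cluster is never selected again.

```java
import java.util.Arrays;

final class MergeStep {
    static final double INF = Double.POSITIVE_INFINITY;

    // Performs one single-linkage merge in place and returns the merged pair {i1, i2}.
    // d is the symmetric distance matrix; dmin[i] is the index of the cluster nearest to i.
    static int[] mergeClosest(double[][] d, int[] dmin) {
        int n = d.length;

        // find the closest pair of clusters (i1, i2)
        int i1 = 0;
        for (int i = 0; i < n; i++)
            if (d[i][dmin[i]] < d[i1][dmin[i1]]) i1 = i;
        int i2 = dmin[i1];

        // row/column i1 becomes the minimum of rows/columns i1 and i2
        for (int j = 0; j < n; j++)
            if (d[i2][j] < d[i1][j]) d[i1][j] = d[j][i1] = d[i2][j];
        d[i1][i1] = INF;

        // retire cluster i2
        for (int i = 0; i < n; i++)
            d[i2][i] = d[i][i2] = INF;

        // redirect pointers from i2 to i1 and recompute the nearest neighbor of i1
        for (int j = 0; j < n; j++) {
            if (dmin[j] == i2) dmin[j] = i1;
            if (d[i1][j] < d[i1][dmin[i1]]) dmin[i1] = j;
        }
        return new int[] {i1, i2};
    }

    public static void main(String[] args) {
        // pairwise distances of four points on a line at 0, 1, 3, 7 (made-up data)
        double[][] d = {
            {INF, 1, 3, 7},
            {1, INF, 2, 6},
            {3, 2, INF, 4},
            {7, 6, 4, INF}
        };
        int[] dmin = {1, 0, 1, 2};

        int[] pair = mergeClosest(d, dmin);
        System.out.println(Arrays.toString(pair));  // [0, 1]
        System.out.println(Arrays.toString(d[0]));  // [Infinity, Infinity, 2.0, 6.0]
        System.out.println(Arrays.toString(dmin));  // [2, 0, 0, 2]
    }
}
```

After the merge, cluster 0 stands for the points at 0 and 1, so its distance to the point at 3 drops from 3 to 2, and dmin[2], which used to point to cluster 1, now points to cluster 0.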

The code:

package snippet;

import java.util.Locale;
import java.util.Scanner;

public class Snippet {

    // Euclidean distance between two points of equal dimension
    // (stands in for the distanceTo call of the original Vector type)
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++)
            sum += (a[k] - b[k]) * (a[k] - b[k]);
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // input format: M, N, then N lines of "name x1 ... xM"
        Scanner in = new Scanner(System.in).useLocale(Locale.US);
        int M = in.nextInt();
        int N = in.nextInt();

        // read in N vectors of dimension M
        double[][] vectors = new double[N][M];
        String[] names = new String[N];
        for (int i = 0; i < N; i++) {
            names[i] = in.next();
            for (int j = 0; j < M; j++)
                vectors[i][j] = in.nextDouble();
        }

        double INFINITY = Double.POSITIVE_INFINITY;
        // d[i][j] = distance between clusters i and j
        double[][] d = new double[N][N];
        // dmin[i] = index of the cluster closest to cluster i
        int[] dmin = new int[N];
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                if (i == j) d[i][j] = INFINITY;
                else        d[i][j] = distance(vectors[i], vectors[j]);
                if (d[i][j] < d[i][dmin[i]]) dmin[i] = j;
            }
        }

        // N-1 merges reduce N singleton clusters to one
        for (int s = 0; s < N - 1; s++) {
            // find closest pair of clusters (i1, i2)
            int i1 = 0;
            for (int i = 0; i < N; i++)
                if (d[i][dmin[i]] < d[i1][dmin[i1]]) i1 = i;
            int i2 = dmin[i1];

            // report the merge; a cluster is labeled by the name at its surviving index
            System.out.println(names[i1] + " + " + names[i2] + " at distance " + d[i1][i2]);

            // overwrite row i1 with minimum of entries in row i1 and i2
            for (int j = 0; j < N; j++)
                if (d[i2][j] < d[i1][j]) d[i1][j] = d[j][i1] = d[i2][j];
            d[i1][i1] = INFINITY;

            // infinity-out old row i2 and column i2
            for (int i = 0; i < N; i++)
                d[i2][i] = d[i][i2] = INFINITY;

            // update dmin and replace ones that previously pointed to
            // i2 to point to i1
            for (int j = 0; j < N; j++) {
                if (dmin[j] == i2) dmin[j] = i1;
                if (d[i1][j] < d[i1][dmin[i1]]) dmin[i1] = j;
            }
        }
    }
}

Reference: www.cs.princeton.edu/courses/archive/spring10/cos233/lectures/cos233-234-lecture7.pdf
Tags: algorithms, clustering