Kmeans && Kmeans++ && Davies-Bouldin && Dunn index
2013-04-07 01:32
K-means is a widely used, general-purpose clustering algorithm that separates points into clusters in four steps. It works as follows:
1. Initialization: assign every point a random cluster ID.
2. Update: compute the center of each cluster from the points currently assigned to it.
3. Reassignment: assign each point to the cluster whose center is nearest to it.
4. Convergence check: go back to step 2 if any center or cluster assignment changed; otherwise stop.
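The four steps above can be sketched in plain Python. This is only a minimal illustration, not the code behind this post: the function names, the `seed` and `max_iter` parameters, the sample points, and the way an empty cluster is re-seeded with a random point are all assumptions made for the example.

```python
import math
import random


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (labels, centers, iterations)."""
    rng = random.Random(seed)
    # Step 1: give every point a random cluster ID.
    labels = [rng.randrange(k) for _ in points]
    centers = [None] * k
    for iteration in range(1, max_iter + 1):
        # Step 2: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random point.
            centers[c] = mean_point(members) if members else rng.choice(points)
        # Step 3: move every point to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 4: stop once no point changes cluster.
        if new_labels == labels:
            return labels, centers, iteration
        labels = new_labels
    return labels, centers, max_iter


# Two small, well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers, iterations = kmeans(points, k=2)
```

With data this well separated, the first three points should end up in one cluster and the last three in the other. Which cluster gets ID 0 depends on the random initialization.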
To evaluate how many clusters should be used, we can compute the Davies-Bouldin Index (DBI) with the following formulas; lower values are better.
![](http://images.cnitblog.com/blog/412280/201303/16230418-03d7fb88617c4d0f932449a325b17cb8.png)
![](http://images.cnitblog.com/blog/412280/201303/16230511-80d9a8d419c74133b83a6d6bbf95b7b0.png)
The term above is the average distance from the points of a cluster to the center of that cluster. The center can be the median, the mean, etc., and the distance can be Euclidean or another metric.
![](http://images.cnitblog.com/blog/412280/201303/16230529-65ffbfd9cfd24cc1b04a787a27887abe.png)
The term above is the distance between centers i and j, that is, a measure of separation between clusters i and j.
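As a rough sketch of how the two terms combine, the snippet below computes the index in its usual form: for every cluster, take the worst ratio of (scatter of i + scatter of j) to the distance between centers i and j, then average over clusters. It assumes the mean as the center and Euclidean distance; the function names and sample data are made up for illustration, and it does not guard against two centers that coincide.

```python
import math


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(points, labels):
    """Davies-Bouldin index; returns 0.0 when there are fewer than two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    ids = sorted(clusters)
    centers = {c: mean_point(clusters[c]) for c in ids}
    # Scatter: average distance from a cluster's points to its own center.
    scatter = {
        c: sum(math.dist(p, centers[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }
    total = 0.0
    for i in ids:
        # Worst ratio of combined scatter to center separation for cluster i.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = davies_bouldin(points, [0, 1, 0, 1])   # interleaved clusters
```

Here `good` is (1 + 1) / 10 = 0.2 and `bad` is (5 + 5) / 2 = 5.0, which matches the rule that a lower value means a better clustering.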
Source code link: <View code>. A sample run prints the following:
seperate the dataset into 6 parts
the iterations is: 14
by using the initialization of kmeans++.
Vector write with cluster_id finished
Only have one cluster or Max intra cluster distance is 0
the return value will be '0'.
Dunn cluster_num =1 0.0
Dunn cluster_num =2 1.4224045250244335
Dunn cluster_num =3 0.3787325061720893
Dunn cluster_num =4 0.4329611146967893
Dunn cluster_num =5 0.4504612854441182
cluster number is 1, the value will be 0
Davies_Bouldin cluster_num =1 0.0
Davies_Bouldin cluster_num =2 0.436282523420732
Davies_Bouldin cluster_num =3 1.0864451744194168
Davies_Bouldin cluster_num =4 1.0391365922042606
Davies_Bouldin cluster_num =5 1.0061221318606566
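The Dunn values in the output can be read the opposite way from the Davies-Bouldin values: a higher Dunn index is better. A minimal sketch of one common definition follows, assuming the inter-cluster distance is the smallest distance between two points in different clusters and the intra-cluster distance is the largest distance between two points in the same cluster. The code behind this post may use a different variant, so the numbers need not match; the function name and data are illustrative only. Like the output above, it returns 0 for a single cluster or a zero maximum intra-cluster distance.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    # Largest distance between two points that share a cluster.
    max_intra = max(
        (
            math.dist(p, q)
            for members in clusters.values()
            for p, q in combinations(members, 2)
        ),
        default=0.0,
    )
    if max_intra == 0.0:
        return 0.0
    # Smallest distance between two points in different clusters.
    min_inter = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_inter / max_intra


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # interleaved clusters
```

Here `good` is 8 / 2 = 4.0 and `bad` is 2 / 10 = 0.2, so the better clustering gets the higher score.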
![](http://images.cnitblog.com/blog/412280/201304/10200828-29d3cce06d9141f8b54c8b3edf048af0.png)