Using K-means in Python
2013-03-19 09:07
from __future__ import print_function  # print() works on both Python 2 and 3
import sys

import numpy as np
import matplotlib.pylab as pl
import Pycluster as pc

# Read the data file name and the desired number of clusters from the command line
filename, n = sys.argv[1], int(sys.argv[2])

# x and y coordinates, whitespace-separated
data = np.loadtxt(filename, usecols=(0, 1))

# Perform the clustering.
# kcluster() receives the data array (parameter 'data') and runs k-means on it.
# nclusters is the number of clusters; npass is the number of times the algorithm
# is repeated from a different random starting point (the best result is kept).
# It returns three values. The first is an array giving, for every point in the
# data set, the index of the cluster it was assigned to; we store it in clustermap.
# No distance function is specified, so the default (dist='e', Euclidean) is used.
clustermap = pc.kcluster(data, nclusters=n, npass=50)[0]

# Compute the cluster centroids
centroids = np.asarray(pc.clustercentroids(data, clusterid=clustermap)[0])

# Obtain the distance matrix (lower-triangular: m[j][i] is defined for j > i)
m = pc.distancematrix(data)

# Find the mass (number of points) of every cluster
mass = np.zeros(n)
for c in clustermap:
    mass[c] += 1

# Coordinates of the points and of the centroids, for plotting
x, y = data[:, 0], data[:, 1]
xcenter, ycenter = centroids[:, 0], centroids[:, 1]

# sil[i, k] = sum of the distances from point i to all points in cluster k
sil = np.zeros((len(data), n))

# Evaluate the distance for all pairs of points
for i in range(len(data)):
    for j in range(i + 1, len(data)):
        d = m[j][i]
        sil[i, clustermap[j]] += d
        sil[j, clustermap[i]] += d

# Evaluate the silhouette coefficient
s = 0.0
for i in range(len(data)):
    c = clustermap[i]
    if mass[c] == 1:
        continue  # the silhouette of a point in a singleton cluster is defined as 0
    # a: mean distance to the OTHER members of its own cluster (point i excluded)
    a = sil[i, c] / (mass[c] - 1)
    # b: smallest mean distance to the members of any other cluster
    b = min(sil[i, k] / mass[k] for k in range(n) if k != c)
    s += (b - a) / max(a, b)  # silhouette coefficient of point i

# Print the overall (mean) silhouette coefficient
print(n, s / len(data))

pl.xlim(0, 11)
pl.ylim(0, 11)
pl.plot(x, y, 'o')
pl.plot(xcenter, ycenter, 'or')
pl.show()
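If Pycluster is not available, the silhouette part of the script can be reproduced with NumPy alone. The sketch below is an illustration under stated assumptions, not Pycluster's code: the function name `silhouette` is hypothetical, it uses the plain Euclidean distance (which may not match Pycluster's own distance definition, so the numbers can differ slightly from the script's output), and it needs at least two clusters.

```python
import numpy as np


def silhouette(data, labels):
    """Mean silhouette coefficient of a clustering (hypothetical helper).

    data:   array of shape (num_points, num_features)
    labels: integer cluster index for every point, in 0..k-1 (k >= 2 assumed)
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    k = labels.max() + 1
    # Full pairwise Euclidean distance matrix, shape (num_points, num_points)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # sums[i, c] = total distance from point i to every point of cluster c
    onehot = np.eye(k)[labels]           # one row per point, one column per cluster
    sums = dist @ onehot
    sizes = onehot.sum(axis=0)           # number of points in each cluster
    scores = np.zeros(len(data))
    for i, c in enumerate(labels):
        if sizes[c] == 1:
            continue                     # singleton cluster: silhouette is 0
        # dist[i, i] == 0, so dividing by (size - 1) averages over the others
        a = sums[i, c] / (sizes[c] - 1)
        b = min(sums[i, o] / sizes[o] for o in range(k) if o != c and sizes[o] > 0)
        scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

The pair loop of the original script is replaced here by a full distance matrix and one matrix product; for a few hundred points this is simpler and fast, but memory grows with the square of the number of points.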
Data (contents of the input file: one whitespace-separated x y pair per line)
1 2.6
2 1
2 1.5
3 4
2.7 3.5
2.4 3.2
5.5 9.2
6 9
5.8 9
3 2
1 2.8
2 1.6
7 8
7.3 8.2
6.9 8.5
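As a dependency-free illustration of what `pc.kcluster(data, nclusters=n, npass=50)` does conceptually, here is a minimal sketch of Lloyd's k-means run on the points above. The function name `kmeans`, its `max_iter` and `seed` parameters, and the choice to start each pass from randomly chosen data points are assumptions of this sketch; Pycluster may initialize and iterate differently, so its cluster assignments are not guaranteed to match.

```python
import numpy as np

# The 15 points listed above (x, y)
POINTS = np.array([
    [1, 2.6], [2, 1], [2, 1.5], [3, 4], [2.7, 3.5], [2.4, 3.2],
    [5.5, 9.2], [6, 9], [5.8, 9], [3, 2], [1, 2.8], [2, 1.6],
    [7, 8], [7.3, 8.2], [6.9, 8.5],
])


def kmeans(data, k, npass=50, max_iter=100, seed=0):
    """Hypothetical Lloyd's k-means, restarted npass times.

    Returns (labels, centroids, sse) of the pass with the smallest
    within-cluster sum of squared distances (sse).
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(npass):
        # Start from k distinct data points chosen at random (assumed strategy)
        centroids = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(max_iter):
            # Assignment step: index of the nearest centroid for every point
            d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            # Update step: mean of each cluster; an emptied cluster keeps its centroid
            new = np.array([data[labels == c].mean(axis=0) if np.any(labels == c)
                            else centroids[c] for c in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Final assignment and squared error for this pass
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(data)), labels].sum()
        if best is None or sse < best[2]:
            best = (labels, centroids, sse)
    return best


labels, centroids, sse = kmeans(POINTS, 3)
print(labels)
print(centroids)
```

Keeping the best of several passes mirrors the role of `npass` in the original script: a single run of Lloyd's algorithm only reaches a local optimum, so repeating it from different starting points and keeping the lowest error makes a poor result less likely.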