
Using K-means in Python

2013-03-19 09:07
import Pycluster as pc
import numpy as np
import sys
import matplotlib.pylab as pl

# Read data filename and desired number of clusters from command line
filename, n = sys.argv[1], int( sys.argv[2] )
# x and y coordinates, whitespace-separated
data = np.loadtxt( filename, usecols =(0,1) )
# Perform clustering and find centroids.
# kcluster runs k-means on 'data': nclusters is the number of clusters and
# npass is the number of times the algorithm is repeated from different
# initial assignments (the best solution is kept).
# It returns three values; the first is an array giving the cluster index
# assigned to each point of the data set, which is stored in clustermap.
# No distance function is specified here, so the default is used.
clustermap = pc.kcluster( data, nclusters=n, npass=50 )[0]

# Compute the centroid of each cluster
centroids = pc.clustercentroids( data, clusterid=clustermap )[0]
# Obtain distance matrix
m = pc.distancematrix( data )
# Find the masses (number of points) of all clusters
mass = np.zeros( n )  # array of n zeros, one counter per cluster
# x and y coordinates of the data points
x = [d[0] for d in data]
y = [d[1] for d in data]

# x and y coordinates of the cluster centroids
xcenter = [c[0] for c in centroids]
ycenter = [c[1] for c in centroids]

for c in clustermap:
    mass[c] += 1
# Create a matrix for individual silhouette coefficients
sil = np.zeros( (len(data), n) )
# Evaluate the distance for all pairs of points
for i in range( 0, len(data) ):
    for j in range( i+1, len(data) ):
        d = m[j][i]
        sil[i, clustermap[j] ] += d
        sil[j, clustermap[i] ] += d
# Normalize by cluster size (that is: form average over cluster)
for i in range( 0, len(data) ):
    sil[i,:] /= mass

# Evaluate the silhouette coefficient
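s = 0
for i in range( 0, len(data) ):
    c = clustermap[i]
    # Average distance to the point's own cluster (includes its zero self-distance)
    a = sil[i,c]
    # Smallest average distance to any cluster other than c
    b = min( sil[i, list(range(0,c)) + list(range(c+1,n))] )
    si = (b-a)/max(b,a) # This is the silhouette coeff of point i
    s += si
# Print overall silhouette coefficient
print( n, s/len(data) )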

pl.xlim(0,11)
pl.ylim(0,11)
pl.plot(x,y,'o')
pl.plot(xcenter,ycenter,'or')
pl.show()
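
The silhouette loops above can also be written with NumPy array operations. The following is only a sketch under stated assumptions: the function name `mean_silhouette` is made up for illustration, it uses plain Euclidean distance (which may differ from the default measure of `pc.distancematrix`), and it follows the script's convention of averaging over the full cluster size. It expects at least two clusters, none of them empty.

```python
import numpy as np

def mean_silhouette(data, labels, n):
    """Mean silhouette coefficient (hypothetical helper, script's convention)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pairwise distances; assumption: plain Euclidean distance
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # onehot[j, k] is 1 if point j belongs to cluster k, else 0
    onehot = (labels[:, None] == np.arange(n)[None, :]).astype(float)
    mass = onehot.sum(axis=0)           # number of points per cluster
    # sil[i, k] = average distance from point i to the points of cluster k
    sil = dist.dot(onehot) / mass
    idx = np.arange(len(data))
    a = sil[idx, labels]                # average distance within own cluster
    others = sil.copy()
    others[idx, labels] = np.inf        # exclude own cluster from the minimum
    b = others.min(axis=1)              # nearest other cluster
    return float(np.mean((b - a) / np.maximum(a, b)))
```

Values close to 1 suggest compact, well-separated clusters; values near 0 suggest overlapping ones.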


Data

1 2.6
2 1
2 1.5
3 4
2.7 3.5
2.4 3.2
5.5 9.2
6 9
5.8 9
3 2
1 2.8
2 1.6
7 8
7.3 8.2
6.9 8.5
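
For readers without Pycluster installed, the sample data can be clustered with a small NumPy-only k-means. This is a minimal sketch of the standard Lloyd iteration, not Pycluster's implementation; the helper name `kmeans` and its parameters `max_iter` and `seed` are assumptions made for illustration, while `npass` mirrors the repeated-trial idea used above.

```python
import io
import numpy as np

SAMPLE = """1 2.6
2 1
2 1.5
3 4
2.7 3.5
2.4 3.2
5.5 9.2
6 9
5.8 9
3 2
1 2.8
2 1.6
7 8
7.3 8.2
6.9 8.5
"""

def assign(data, centroids):
    # Squared distance from every point to every centroid
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(data)), labels].sum()

def kmeans(data, n, npass=50, max_iter=100, seed=0):
    """Hypothetical helper: repeat Lloyd's algorithm npass times, keep the best run."""
    rng = np.random.RandomState(seed)
    best = (None, None, np.inf)
    for _ in range(npass):
        # Start from n distinct data points chosen at random
        centroids = data[rng.choice(len(data), n, replace=False)]
        for _ in range(max_iter):
            labels, _err = assign(data, centroids)
            # Move each centroid to the mean of its points (keep it if the cluster is empty)
            new = np.array([data[labels == k].mean(axis=0) if np.any(labels == k)
                            else centroids[k] for k in range(n)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels, error = assign(data, centroids)
        if error < best[2]:
            best = (labels, centroids, error)
    return best

data = np.loadtxt(io.StringIO(SAMPLE), usecols=(0, 1))
labels, centroids, error = kmeans(data, 2)
print(labels)
print(centroids)
```

Keeping the run with the smallest within-cluster sum of squared distances is one common way to choose among trials; other criteria are possible.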
