Two Python K-Means Examples
2014-10-10 19:27
k-means clustering example (Python)
http://www.cnblogs.com/lexus/archive/2012/12/08/2808826.html
I had to illustrate a k-means algorithm for my thesis, but I could not find any existing examples that were both simple and looked good on paper. See below for Python code that does just what I wanted.
#!/usr/bin/python
# Adapted from http://hackmap.blogspot.com/2007/09/k-means-clustering-in-scipy.html
import numpy
import matplotlib
matplotlib.use('Agg')
from scipy.cluster.vq import kmeans2
import pylab
pylab.close()
# generate 3 sets of normally distributed points around
# different means with different variances
pt1 = numpy.random.normal(1, 0.2, (100,2))
pt2 = numpy.random.normal(2, 0.5, (300,2))
pt3 = numpy.random.normal(3, 0.3, (100,2))
# slightly move sets 2 and 3 (for a prettier output)
pt2[:,0] += 1
pt3[:,0] -= 0.5
xy = numpy.concatenate((pt1, pt2, pt3))
# kmeans for 3 clusters
# xy is already an (N, 2) array, so it can be passed directly
# (numpy.array(zip(...)) no longer builds a 2-D array on Python 3)
res, idx = kmeans2(xy, 3)
colors = ([([0.4,1,0.4],[1,0.4,0.4],[0.1,0.8,1])[i] for i in idx])
# plot colored points
pylab.scatter(xy[:,0],xy[:,1], c=colors)
# mark centroids as (X)
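# hollow circles: set the edge color explicitly, since newer matplotlib
# versions draw nothing when only c='none' is given
pylab.scatter(res[:,0],res[:,1], marker='o', s = 500, linewidths=2, facecolors='none', edgecolors='black')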
pylab.scatter(res[:,0],res[:,1], marker='x', s = 500, linewidths=2)
pylab.savefig('/tmp/kmeans.png')
The output looks like this (also available in vector format here):
The X’s mark cluster centers. Feel free to use any of these files for whatever purposes. An attribution would be nice, but is not required.
Using python and k-means to find the dominant colors in images
http://www.tuicool.com/articles/IRN7Fj
Time: 2012-10-24 01:23:09
Source: charlesleifer.com
Original article: http://charlesleifer.com/blog/using-python-and-k-means-to-find-the-dominant-colors-in-images/
October 23, 2012 17:23
I'm working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to group the pixels based on their color. The centers of the resulting clusters are then the "dominant" colors. k-means is a great fit for this problem because it is (usually) fast. It has the caveat of requiring you to specify up-front how many clusters you want -- I found that it works well when I specified around 3.
A warning
I'm no expert on data-mining -- almost all my experience comes from reading Toby Segaran's excellent book Programming Collective Intelligence. In one of the first chapters Toby covers clustering algorithms, including a nice treatment of k-means, so if you want to really learn from an expert I'd suggest picking up a copy. You won't be disappointed.
How it works
The way I understand it to work is you start with a bunch of data points. For simplicity let's say they're numbers on a number-line. You want to group the numbers into "k" clusters, so pick "k" points randomly from the data to use as the initial centers of your "clusters".
Now loop over every point in the data and calculate its distance to each of the "k" cluster centers. Find the nearest center and associate that point with its cluster. When you've looped over all the points they should all be assigned to one of the "k" clusters.
Now, for each cluster recalculate its center by averaging the positions of all the associated points and start over.
When the centers stop moving very much you can stop looping. You will end up with something like this -- the points are colored based on what "cluster" they are in and the dark-black circles indicate the centers of each cluster.
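The loop described above can be sketched for the number-line case in a few lines of Python. This is only an illustration of the idea, not the code used for the photographs: the function name kmeans_1d, the seed and min_diff parameters, and the choice to leave an empty cluster's center where it was are all assumptions made for this sketch.

```python
import random

def kmeans_1d(values, k, min_diff=1e-6, seed=0):
    """Cluster numbers on a number line into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Pick k distinct data points as the initial centers.
    centers = rng.sample(sorted(set(values)), k)
    while True:
        # Assignment step: attach every value to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its old center (a choice made for this sketch).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        # Stop once no center moved more than min_diff.
        moved = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < min_diff:
            return centers, groups

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0]
centers, groups = kmeans_1d(data, k=3)
print(sorted(round(c, 2) for c in centers))
```

Because the starting centers are random, different seeds can settle on different groupings; k-means only guarantees that the result stops changing, not that it is the best possible split.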
Applying it to photographs
The neat thing about this algorithm is, since it relies only on a simple distance calculation, you can extend it out to multi-dimensional data. Color is often represented using 3 channels, Red, Green, and Blue. So what I did was treat all the pixels in the image like points in a 3-dimensional space. That's all there was to it!
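To make the "pixels as 3-dimensional points" idea concrete, here is a small sketch of the distance calculation on (R, G, B) tuples. The helper names color_distance and nearest_center and the three example centers are made up for illustration; they are not part of the code from this post.

```python
from math import sqrt

def color_distance(c1, c2):
    """Euclidean distance between two (R, G, B) tuples."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def nearest_center(pixel, centers):
    """Index of the center closest to the pixel."""
    return min(range(len(centers)),
               key=lambda i: color_distance(pixel, centers[i]))

# Made-up example centers: pure red, green, and blue.
centers = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

print(color_distance((0, 0, 0), (3, 4, 0)))    # 5.0
print(nearest_center((200, 30, 40), centers))  # 0 (closest to red)
```

Nothing here is specific to three channels: zip simply walks over however many coordinates the points have, which is why the same algorithm carries over from a number-line to colors.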
I made a few optimizations along the way:
- resize the image down to 200x200 or so using PIL
- instead of storing "duplicate" points, store a count with each -- saves on calculations
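The second optimization works because a cluster center is just an average, and a count-weighted average over distinct colors gives the same answer as averaging every duplicate pixel. Below is a minimal sketch using only the standard library; weighted_center and the sample pixels are hypothetical. Note that this sketch stores (color, count) pairs, whereas PIL's getcolors() yields them as (count, color).

```python
from collections import Counter

def weighted_center(color_counts):
    """Mean color of (color, count) pairs, weighting each color by its count."""
    total = sum(count for _, count in color_counts)
    dims = len(color_counts[0][0])
    return tuple(
        sum(color[i] * count for color, count in color_counts) / total
        for i in range(dims)
    )

# Four pixels, but only two distinct colors.
pixels = [(10, 20, 30)] * 3 + [(50, 60, 70)]
color_counts = list(Counter(pixels).items())

print(color_counts)                   # [((10, 20, 30), 3), ((50, 60, 70), 1)]
print(weighted_center(color_counts))  # (20.0, 30.0, 40.0)
```

With counts, each pass of k-means loops over the distinct colors rather than over every pixel, which is where the savings come from.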
Looking at some results
The results:
The source code
Below is the source code. It requires PIL to resize the image down to 200x200 and to extract the colors/counts. The "colorz" function is the one that returns the actual color codes for a filename.

from collections import namedtuple
from math import sqrt
import random

try:
    import Image  # very old standalone PIL installs
except ImportError:
    from PIL import Image

# coords: channel values, n: number of dimensions, ct: pixels sharing this color
Point = namedtuple('Point', ('coords', 'n', 'ct'))
Cluster = namedtuple('Cluster', ('points', 'center', 'n'))

def get_points(img):
    # getcolors() yields (count, color) pairs, one per distinct color
    points = []
    w, h = img.size
    for count, color in img.getcolors(w * h):
        points.append(Point(color, 3, count))
    return points

def rtoh(rgb):
    # Convert a sequence of ints such as (10, 20, 30) to a hex string, '#0a141e'
    return '#%s' % ''.join('%02x' % p for p in rgb)

def colorz(filename, n=3):
    img = Image.open(filename)
    img.thumbnail((200, 200))
    points = get_points(img)
    clusters = kmeans(points, n, 1)
    # Build lists explicitly: on Python 3, map() returns a lazy iterator
    rgbs = [[int(v) for v in c.center.coords] for c in clusters]
    return [rtoh(rgb) for rgb in rgbs]

def euclidean(p1, p2):
    return sqrt(sum([
        (p1.coords[i] - p2.coords[i]) ** 2 for i in range(p1.n)
    ]))

def calculate_center(points, n):
    # Average of the coordinates, weighted by each point's pixel count
    vals = [0.0 for i in range(n)]
    plen = 0
    for p in points:
        plen += p.ct
        for i in range(n):
            vals[i] += (p.coords[i] * p.ct)
    return Point([(v / plen) for v in vals], n, 1)

def kmeans(points, k, min_diff):
    # Start from k randomly chosen points as the initial centers
    clusters = [Cluster([p], p, p.n) for p in random.sample(points, k)]

    while True:
        # Assign every point to its nearest center
        plists = [[] for i in range(k)]
        for p in points:
            smallest_distance = float('Inf')
            for i in range(k):
                distance = euclidean(p, clusters[i].center)
                if distance < smallest_distance:
                    smallest_distance = distance
                    idx = i
            plists[idx].append(p)

        # Recompute each center and track the largest movement
        diff = 0
        for i in range(k):
            old = clusters[i]
            center = calculate_center(plists[i], old.n)
            new = Cluster(plists[i], center, old.n)
            clusters[i] = new
            diff = max(diff, euclidean(old.center, new.center))

        # Stop once no center moved by min_diff or more
        if diff < min_diff:
            break

    return clusters
Playing with it in the browser
I ported the code over to JavaScript -- let me tell you, it's pretty rough, but it works and is fast. If you'd like to take a look at a live example, check out: http://charlesleifer.com/static/colors/ -- you can view the source to see the js version, but basically it is just using the HTML5 canvas and its getImageData method.
Thanks for reading
Thanks for reading, I hope you found this post interesting. I am sure this is not the only approach, so if you have other ideas please feel free to leave a comment or contact me directly.