K-means Notes
2016-01-23 18:16
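Work recently required me to dig into kmeans in OpenCV. There is little material online about OpenCV's kmeans, and what exists is hard to follow, so I had to grit my teeth and work it out myself. I took the official demo, printed points with cout, compared the output back and forth, and came away with a basic understanding. Code first, then a step-by-step analysis.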
int main( int /*argc*/, char** /*argv*/ )
{
    const int MAX_CLUSTERS = 5;
    Scalar colorTab[] =
    {
        Scalar(0, 0, 255),
        Scalar(0,255,0),
        Scalar(255,100,100),
        Scalar(255,0,255),
        Scalar(0,255,255)
    };

    Mat img(500, 500, CV_8UC3);
    RNG rng(12345);

    for(;;)
    {
        int k, clusterCount = rng.uniform(2, MAX_CLUSTERS+1);
        int i, sampleCount = rng.uniform(1, 1001);
        Mat points(sampleCount, 1, CV_32FC2), labels;

        clusterCount = MIN(clusterCount, sampleCount);
        Mat centers;

        /* generate random sample from multigaussian distribution */
        for( k = 0; k < clusterCount; k++ )
        {
            Point center;
            center.x = rng.uniform(0, img.cols);
            center.y = rng.uniform(0, img.rows);
            Mat pointChunk = points.rowRange(k*sampleCount/clusterCount,
                                             k == clusterCount - 1 ? sampleCount :
                                             (k+1)*sampleCount/clusterCount);
            rng.fill(pointChunk, RNG::NORMAL, Scalar(center.x, center.y), Scalar(img.cols*0.05, img.rows*0.05));
        }

        randShuffle(points, 1, &rng);

        kmeans(points, clusterCount, labels,
            TermCriteria( TermCriteria::EPS+TermCriteria::COUNT, 10, 1.0),
               3, KMEANS_PP_CENTERS, centers);

        img = Scalar::all(0);

        for( i = 0; i < sampleCount; i++ )
        {
            int clusterIdx = labels.at<int>(i);
            Point ipt = points.at<Point2f>(i);
            circle( img, ipt, 2, colorTab[clusterIdx], FILLED, LINE_AA );
        }

        imshow("clusters", img);

        char key = (char)waitKey();
        if( key == 27 || key == 'q' || key == 'Q' ) // 'ESC'
            break;
    }

    return 0;
}
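Line 20 (Mat points(sampleCount, 1, CV_32FC2), labels;) creates a Mat instance named points. It has sampleCount rows, where sampleCount is a random value from 1 to 1000, and a single column of 2-channel float data. You can print it with the statement below to see the result. I suggest lowering the upper bound on line 19 first, for example to 101, to keep the output short.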
cout << "points = " << endl << points << endl;
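You will see something like the following. Every channel value was 0 in my run, but this Mat constructor does not initialize the data, so other values can appear.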
points = [0, 0; 0, 0; 0, 0; 0, 0;]
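The for loop on lines 26 to 35 generates clusterCount centers, splits points into clusterCount consecutive row ranges, and fills each range with normally distributed random values around its center (standard deviation 5% of the image width and height).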
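The row ranges come from plain integer arithmetic, which is easy to check by hand. The sketch below repeats the same expressions as the rowRange call in the demo; the function name chunkRange is made up for illustration and is not part of OpenCV.

```cpp
#include <cassert>
#include <utility>

// Row range [begin, end) that the demo's loop hands to cluster k.
// Same integer arithmetic as the rowRange() arguments in the demo.
// The chunks differ in size by at most one row and together cover
// every row exactly once.
std::pair<int, int> chunkRange(int k, int sampleCount, int clusterCount)
{
    assert(clusterCount > 0 && k >= 0 && k < clusterCount);
    int begin = k * sampleCount / clusterCount;
    int end = (k == clusterCount - 1)
                  ? sampleCount
                  : (k + 1) * sampleCount / clusterCount;
    return {begin, end};
}
```

With sampleCount = 10 and clusterCount = 3, the three chunks are rows 0 to 2, 3 to 5 and 6 to 9.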
Line 37 (randShuffle(points, 1, &rng);) shuffles points. Print points before and after line 37 with the statement below to see what randShuffle does.
cout << "points = " << endl << points << endl;
Line 39 calls kmeans to cluster the samples. The C++ prototype is:
double kmeans(InputArray data, int K, InputOutputArray bestLabels, TermCriteria criteria, int attempts, int flags, OutputArray centers=noArray() )
where:
data – Data for clustering. An array of N-Dimensional points with float coordinates is needed. Examples of this array can be:
– Mat points(count, 2, CV_32F);
– Mat points(count, 1, CV_32FC2);
– Mat points(1, count, CV_32FC2);
– std::vector<cv::Point2f> points(sampleCount);
The input is a set of N-dimensional floating-point points; N may be 1.
K – Number of clusters to split the set by.
The number of clusters has to be specified by the caller.
labels – Input/output integer array that stores the cluster indices for every sample.
Stores the cluster index assigned to each sample. This is the bestLabels parameter in the prototype above.
criteria – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy. The accuracy is specified as criteria.epsilon. As soon as each of the cluster centers moves by less than criteria.epsilon on some iteration, the algorithm stops.
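The termination criteria. The demo passes EPS+COUNT with 10 and 1.0: run at most 10 iterations, or stop earlier once the centers move by less than 1.0.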
attempts – Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness (see the last function parameter).
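The number of runs, used together with flags: each run starts from a different initialization chosen according to flags, and the most compact result is kept. The demo uses 3.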
flags – Flag that can take the following values:
– KMEANS_RANDOM_CENTERS Select random initial centers in each attempt.
Random initial centers.
– KMEANS_PP_CENTERS Use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007].
– KMEANS_USE_INITIAL_LABELS During the first (and possibly the only) attempt, use the user-supplied labels instead of computing them from the initial centers. For the second and further attempts, use the random or semi-random centers. Use one of KMEANS_*_CENTERS flag to specify the exact method.
Start from labels supplied by the caller. Note that the caller supplies initial labels here, not initial centers.
centers – Output matrix of the cluster centers, one row per each cluster center.
The cluster centers. Printing centers with cout shows values close to the center values generated in the for loop.
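To see how labels, centers, criteria and the returned compactness relate to each other, here is a minimal sketch of a plain Lloyd iteration written with the standard library only. It is not OpenCV's implementation: it starts from centers the caller passes in, runs a single attempt, and the names Pt, sqDist and lloyd are made up for illustration. The compactness follows the definition in the OpenCV documentation, the sum of squared distances from each sample to the center of its cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Pt { float x, y; };

static float sqDist(Pt a, Pt b)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// One attempt of plain Lloyd iteration from caller-supplied centers.
// Stops after maxIter passes, or earlier once every center has moved
// by less than epsilon in a pass. Returns the compactness: the sum of
// squared distances from each sample to the center of its label.
double lloyd(const std::vector<Pt>& data, std::vector<Pt>& centers,
             std::vector<int>& labels, int maxIter, float epsilon)
{
    assert(!data.empty() && !centers.empty() && maxIter > 0);
    labels.assign(data.size(), 0);
    for (int iter = 0; iter < maxIter; ++iter) {
        // Assignment step: label each sample with its nearest center.
        for (std::size_t i = 0; i < data.size(); ++i) {
            float best = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < centers.size(); ++c) {
                float d = sqDist(data[i], centers[c]);
                if (d < best) { best = d; labels[i] = static_cast<int>(c); }
            }
        }
        // Update step: move each center to the mean of its samples.
        std::vector<Pt> sum(centers.size(), Pt{0.f, 0.f});
        std::vector<int> count(centers.size(), 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            sum[labels[i]].x += data[i].x;
            sum[labels[i]].y += data[i].y;
            ++count[labels[i]];
        }
        float maxShift = 0.f;  // largest squared move in this pass
        for (std::size_t c = 0; c < centers.size(); ++c) {
            if (count[c] == 0) continue;  // empty cluster: keep its center
            Pt moved{sum[c].x / count[c], sum[c].y / count[c]};
            float shift = sqDist(moved, centers[c]);
            if (shift > maxShift) maxShift = shift;
            centers[c] = moved;
        }
        if (maxShift < epsilon * epsilon) break;  // squared comparison
    }
    double compactness = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i)
        compactness += sqDist(data[i], centers[labels[i]]);
    return compactness;
}
```

For the four points (0,0), (0,2), (10,0), (10,2) and starting centers (1,1) and (9,1), the centers settle at (0,1) and (10,1) on the second pass and the compactness is 4. Running this from several different starting centers and keeping the result with the smallest return value is the idea behind attempts.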
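That is about all. Convert your samples into points in one of the accepted layouts, specify the number of clusters, declare the two variables labels and centers, and keep everything else the same as the demo.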