A roundup of K-means usage in OpenCV
2014-05-07 18:41
1. K-Means clustering in OpenCV
K-Means is an algorithm that detects clusters in a given set of points. It does this without supervision: you neither label the data nor correct the results. It works in any number of dimensions (a plane, 3D space, 4D space, or any other finite-dimensional space). And OpenCV comes with this algorithm built right into it!
K-means with OpenCV’s C++ interface
The function you need to call to execute the algorithm is:
double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers)
This function is in the cv namespace, so you can call it as cv::kmeans, or as plain kmeans after a using namespace cv; directive. If you know how K-means works, the parameters should be self-explanatory.
Parameters
samples: (input) The actual data points that you need to cluster, with exactly one point per row. For example, 50 points in a 2D plane give a matrix with 50 rows and 2 columns.
clusterCount: (input) The number of clusters to find in the data points.
labels: (output) Returns the index of the cluster each point belongs to. It can also be used to pass in an initial guess for each point.
termcrit: (input) K-means is an iterative algorithm, so you need to specify the termination criteria (maximum number of iterations and/or desired accuracy).
attempts: (input) The number of times the algorithm is run with different initial center placements.
flags: (input) Possible values include:
  KMEANS_RANDOM_CENTERS: Centers are generated randomly.
  KMEANS_PP_CENTERS: Uses the kmeans++ center initialization.
  KMEANS_USE_INITIAL_LABELS: The first attempt uses the supplied labels to calculate the initial centers. Later attempts use random or semi-random centers (combine with one of the two flags above to choose which).
centers: (output) This matrix holds the center of each cluster, one center per row.
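To make the KMEANS_USE_INITIAL_LABELS idea concrete, here is a minimal standard-library sketch of how centers can be derived from supplied labels: each center is the mean of the points that carry its label. This is not OpenCV's code; the Matrix alias and the centersFromLabels name are made up for illustration, and leaving an empty cluster at the origin is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major sample matrix: one point per row (hypothetical alias, not an OpenCV type).
using Matrix = std::vector<std::vector<float>>;

// Compute one center per cluster as the mean of the points carrying that label.
// A cluster that received no points keeps an all-zero center in this sketch.
Matrix centersFromLabels(const Matrix& samples,
                         const std::vector<int>& labels,
                         int clusterCount)
{
    assert(samples.size() == labels.size());
    const std::size_t dims = samples.empty() ? 0 : samples[0].size();

    Matrix centers(clusterCount, std::vector<float>(dims, 0.0f));
    std::vector<int> counts(clusterCount, 0);

    // Accumulate per-cluster sums and point counts.
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const int k = labels[i];
        assert(k >= 0 && k < clusterCount);
        for (std::size_t d = 0; d < dims; ++d)
            centers[k][d] += samples[i][d];
        ++counts[k];
    }

    // Turn sums into means.
    for (int k = 0; k < clusterCount; ++k)
        if (counts[k] > 0)
            for (std::size_t d = 0; d < dims; ++d)
                centers[k][d] /= static_cast<float>(counts[k]);

    return centers;
}
```

For example, four 2D points labelled {0, 0, 1, 1} produce a 2 x 2 result, one row per cluster.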
Returns
The function returns the compactness of the final clustering. What is compactness? It's a measure of how tightly the points sit around their assigned centers. The smaller, the better. When attempts is 1, the value returned is the compactness of that single attempt. If attempts is greater than 1, the labeling returned is the one with the lowest compactness value.
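As a rough illustration of what the return value measures, the sketch below computes compactness as the sum of squared Euclidean distances from each point to its assigned center, which is the usual k-means objective. It uses only the standard library; the Matrix alias and the function name are assumptions of this example rather than OpenCV identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix: one point (or one center) per row. Hypothetical alias.
using Matrix = std::vector<std::vector<float>>;

// Sum of squared Euclidean distances from each sample to its assigned center.
double compactnessOf(const Matrix& samples,
                     const std::vector<int>& labels,
                     const Matrix& centers)
{
    assert(samples.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // at() throws if a label does not refer to an existing center.
        const std::vector<float>& c = centers.at(labels[i]);
        assert(c.size() == samples[i].size());
        for (std::size_t d = 0; d < c.size(); ++d) {
            const double diff = double(samples[i][d]) - double(c[d]);
            total += diff * diff;
        }
    }
    return total;
}
```

A labeling that puts every point exactly on its center scores 0, and the score grows as points drift away from their centers.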
Source: http://www.aishack.in/2010/08/k-means-clustering-in-opencv/
2. Kmeans clustering in OpenCV with C++
Kmeans clustering is one of the most widely used unsupervised learning algorithms. If you are not sure what Kmeans is, refer to this article. Also, if you have heard of the term vector quantization, Kmeans is closely related to it (refer to this article to know more about it). Autonlab has a great presentation on Kmeans clustering.
First, I'll talk about kmeans usage in OpenCV with C++, and then I'll explain it with a program. If you are not yet comfortable with OpenCV in C++, please refer to this article; pretty much everything else is the same as in the C API (where you use IplImage*, etc.).
The kmeans function in the C++ API of OpenCV has the following signature:
double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers);
The parameters are explained as follows:
samples: Contains the data. Each row represents a feature vector, and each column in a row represents one dimension, so a feature vector can have multiple dimensions. For example, 50 five-dimensional feature vectors give a matrix with 50 rows and 5 columns. One interesting thing I've noticed is that kmeans doesn't work with the CV_64F type; the data has to be CV_32F.
clusterCount: The number of clusters to divide the data into. It is an integer and must be specified beforehand.
labels: An output matrix. For an input matrix of the size above (i.e. 50 x 5), labels is a 50 x 1 matrix that says which cluster each feature vector belongs to. Cluster indices run over 0, 1, ..., (number of clusters - 1).
termcrit (TermCriteria): Determines when the algorithm stops: maximum number of iterations, accuracy, or both.
attempts: The number of attempts made with different initial labellings. Refer to the documentation for more detail on this parameter.
flags: It can be
  KMEANS_RANDOM_CENTERS (for random initialization of cluster centers).
  KMEANS_PP_CENTERS (for the kmeans++ way of initializing cluster centers).
  KMEANS_USE_INITIAL_LABELS (for user-defined initialization).
centers: Matrix holding the center of each cluster. If we divide the 50 x 5 feature matrix into 2 clusters, we get 2 centers with 5 dimensions each, i.e. a 2 x 5 matrix.
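The relationship between samples, centers and labels can be sketched without OpenCV: given a sample matrix and a set of centers, each label is the index of the nearest center. The snippet below is only an illustration of that assignment step under the shapes described above; Matrix and assignLabels are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Row-major matrix: one feature vector (or one center) per row. Hypothetical alias.
using Matrix = std::vector<std::vector<float>>;

// For every sample, return the 0-based index of the closest center
// (squared Euclidean distance; ties go to the lower index).
std::vector<int> assignLabels(const Matrix& samples, const Matrix& centers)
{
    assert(!centers.empty());
    std::vector<int> labels(samples.size(), 0);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centers.size(); ++k) {
            assert(centers[k].size() == samples[i].size());
            double dist = 0.0;
            for (std::size_t d = 0; d < samples[i].size(); ++d) {
                const double diff = double(samples[i][d]) - double(centers[k][d]);
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                labels[i] = static_cast<int>(k);
            }
        }
    }
    return labels;
}
```

With 50 five-dimensional samples and 2 centers, the result has 50 entries, each either 0 or 1, matching the 50 x 1 labels matrix described above.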
A sample program follows:
    #include "opencv2/core/core.hpp"
    #include <iostream>

    using namespace cv;
    using namespace std;

    int main( int /*argc*/, char** /*argv*/ )
    {
        cout << "\n Usage in C++ API:\n double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers) \n\n\n" << endl;

        // Declare the sizes before the matrices that depend on them.
        int clusterCount = 2;
        int dimensions = 5;
        int sampleCount = 50;

        // Every value starts at 10; kmeans needs CV_32F data.
        Mat points(sampleCount, dimensions, CV_32F, Scalar(10));
        Mat labels;
        Mat centers; // filled in by kmeans, one row per cluster

        // Values of the 1st half of the data set stay at 10.
        // Change the values of the 2nd half of the data set, i.e. set them to 20.
        for (int i = sampleCount / 2; i < points.rows; i++)
        {
            for (int j = 0; j < points.cols; j++)
            {
                points.at<float>(i, j) = 20;
            }
        }

        // Stop after 10 iterations or at an accuracy of 1.0; make 3 attempts.
        kmeans(points, clusterCount, labels,
               TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 1.0),
               3, KMEANS_PP_CENTERS, centers);

        // We can print the matrices directly.
        cout << "Data: \n" << points << endl;
        cout << "Center: \n" << centers << endl;
        cout << "Labels: \n" << labels << endl;
        return 0;
    }
Source: http://www.developerstation.org/2012/01/kmeans-clustering-in-opencv-with-c.html
3. kmeans
Finds centers of clusters and groups input samples around the clusters.
C++: double kmeans(InputArray data, int K, InputOutputArray bestLabels, TermCriteria criteria, int attempts, int flags, OutputArray centers=noArray())
Python: cv2.kmeans(data, K, criteria, attempts, flags[, bestLabels[, centers]]) → retval, bestLabels, centers
C: int cvKMeans2(const CvArr* samples, int cluster_count, CvArr* labels, CvTermCriteria termcrit, int attempts=1, CvRNG* rng=0, int flags=0, CvArr* _centers=0, double* compactness=0 )
Python: cv.KMeans2(samples, nclusters, labels, termcrit, attempts=1, flags=0, centers=None) → float
Parameters:
samples – Floating-point matrix of input samples, one row per sample.
data – Data for clustering.
cluster_count – Number of clusters to split the set by.
K – Number of clusters to split the set by.
labels – Input/output integer array that stores the cluster indices for every sample.
criteria – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy. The accuracy is specified as criteria.epsilon. As soon as each of the cluster centers moves by less than criteria.epsilon on some iteration, the algorithm stops.
termcrit – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy.
attempts – Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness (see the last function parameter).
rng – CvRNG state initialized by RNG().
flags – Flag that can take the following values:
  KMEANS_RANDOM_CENTERS – Select random initial centers in each attempt.
  KMEANS_PP_CENTERS – Use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007].
  KMEANS_USE_INITIAL_LABELS – During the first (and possibly the only) attempt, use the user-supplied labels instead of computing them from the initial centers. For the second and further attempts, use the random or semi-random centers. Use one of the KMEANS_*_CENTERS flags to specify the exact method.
centers – Output matrix of the cluster centers, one row per each cluster center.
_centers – Output matrix of the cluster centers, one row per each cluster center.
compactness – The returned value that is described below.
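For readers curious what kmeans++ initialization does, the following is a minimal sketch of the basic seeding idea from Arthur and Vassilvitskii: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. It is written against the standard library only and is not a copy of OpenCV's implementation, which may differ in its details; Matrix, squaredDistance and seedPlusPlus are names invented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Row-major sample matrix: one point per row. Hypothetical alias.
using Matrix = std::vector<std::vector<float>>;

static double squaredDistance(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = double(a[d]) - double(b[d]);
        s += diff * diff;
    }
    return s;
}

// Returns the row indices of the samples chosen as the K initial centers.
std::vector<std::size_t> seedPlusPlus(const Matrix& samples, std::size_t K, std::mt19937& rng)
{
    assert(K > 0 && K <= samples.size());
    std::uniform_int_distribution<std::size_t> pick(0, samples.size() - 1);

    // First center: uniformly at random.
    std::vector<std::size_t> chosen;
    chosen.push_back(pick(rng));

    // d2[i]: squared distance from sample i to its nearest chosen center.
    std::vector<double> d2(samples.size(), std::numeric_limits<double>::max());
    while (chosen.size() < K) {
        double total = 0.0;
        for (std::size_t i = 0; i < samples.size(); ++i) {
            d2[i] = std::min(d2[i], squaredDistance(samples[i], samples[chosen.back()]));
            total += d2[i];
        }
        if (total <= 0.0) {
            // Every sample coincides with a chosen center; fall back to a uniform pick.
            chosen.push_back(pick(rng));
            continue;
        }
        // Draw r in [0, total) and walk the weights; zero-weight samples are skipped.
        std::uniform_real_distribution<double> uniform(0.0, total);
        double r = uniform(rng);
        std::size_t next = 0;
        for (std::size_t i = 0; i < samples.size(); ++i) {
            if (d2[i] <= 0.0) continue;
            next = i;          // remember the last positive-weight sample as a fallback
            r -= d2[i];
            if (r < 0.0) break;
        }
        chosen.push_back(next);
    }
    return chosen;
}
```

Because a sample that already coincides with a chosen center has weight zero, duplicates of a center are never picked while other distinct points remain, which is what spreads the initial centers out.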
The function kmeans implements a k-means algorithm that finds the centers of cluster_count clusters and groups the input samples around the clusters. As an output, $\mathrm{labels}_i$ contains a 0-based cluster index for the sample stored in the $i$-th row of the samples matrix.
The function returns the compactness measure that is computed as
$$\sum_i \left\| \mathrm{samples}_i - \mathrm{centers}_{\mathrm{labels}_i} \right\|^2$$
after every attempt. The best (minimum) value is chosen and the corresponding labels and the compactness value are returned by the function. Basically, you can use only the core of the function, set the number of attempts to 1, initialize labels each time using a custom algorithm, pass them with the ( flags = KMEANS_USE_INITIAL_LABELS ) flag, and then choose the best (most-compact) clustering.
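The "custom initialization, one attempt at a time, keep the most compact result" workflow can be sketched as follows. This is a plain standard-library k-means loop standing in for the real function so that the example is self-contained; it is not OpenCV's implementation. The names Clustering, runFromLabels and bestOfCustomInits are hypothetical, and the handling of empty clusters (they keep their previous center) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // one point per row (hypothetical alias)

struct Clustering {
    std::vector<int> labels;
    Matrix centers;
    double compactness;
};

// One run that starts from caller-supplied labels. It stops after maxIter
// iterations, or once every center moved by less than epsilon.
Clustering runFromLabels(const Matrix& samples, std::vector<int> labels,
                         int K, int maxIter, double epsilon)
{
    assert(!samples.empty() && samples.size() == labels.size());
    assert(K > 0 && maxIter > 0);
    const std::size_t dims = samples[0].size();
    Matrix centers(K, std::vector<float>(dims, 0.0f));
    double compactness = 0.0;

    for (int iter = 0; iter < maxIter; ++iter) {
        // Update step: each center becomes the mean of its assigned samples.
        std::vector<std::vector<double>> sums(K, std::vector<double>(dims, 0.0));
        std::vector<int> counts(K, 0);
        for (std::size_t i = 0; i < samples.size(); ++i) {
            assert(labels[i] >= 0 && labels[i] < K);
            for (std::size_t d = 0; d < dims; ++d)
                sums[labels[i]][d] += samples[i][d];
            ++counts[labels[i]];
        }
        double maxShift = 0.0;
        for (int k = 0; k < K; ++k) {
            if (counts[k] == 0) continue;  // empty cluster keeps its previous center
            double shift = 0.0;
            for (std::size_t d = 0; d < dims; ++d) {
                const float mean = static_cast<float>(sums[k][d] / counts[k]);
                const double diff = double(mean) - double(centers[k][d]);
                shift += diff * diff;
                centers[k][d] = mean;
            }
            maxShift = std::max(maxShift, std::sqrt(shift));
        }

        // Assignment step: relabel each sample with its nearest center and
        // accumulate the sum of squared distances (the compactness).
        compactness = 0.0;
        for (std::size_t i = 0; i < samples.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (int k = 0; k < K; ++k) {
                double dist = 0.0;
                for (std::size_t d = 0; d < dims; ++d) {
                    const double diff = double(samples[i][d]) - double(centers[k][d]);
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; labels[i] = k; }
            }
            compactness += best;
        }

        // The first iteration only builds the centers, so its shift is not meaningful.
        if (iter > 0 && maxShift < epsilon) break;
    }
    return {labels, centers, compactness};
}

// Run once per custom initial labeling and keep the most compact clustering.
Clustering bestOfCustomInits(const Matrix& samples,
                             const std::vector<std::vector<int>>& initialLabelings,
                             int K, int maxIter, double epsilon)
{
    assert(!initialLabelings.empty());
    Clustering best = runFromLabels(samples, initialLabelings[0], K, maxIter, epsilon);
    for (std::size_t a = 1; a < initialLabelings.size(); ++a) {
        Clustering c = runFromLabels(samples, initialLabelings[a], K, maxIter, epsilon);
        if (c.compactness < best.compactness) best = c;
    }
    return best;
}
```

With the real function, runFromLabels would correspond to a single call with attempts set to 1 and the KMEANS_USE_INITIAL_LABELS flag, and the comparison in bestOfCustomInits would use the compactness value that call returns.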
Note
An example on K-means clustering can be found at opencv_source_code/samples/cpp/kmeans.cpp
(Python) An example on K-means clustering can be found at opencv_source_code/samples/python2/kmeans.py
partition
Splits an element set into equivalency classes.
C++: template<typename _Tp, class _EqPredicate> int partition(const vector<_Tp>& vec, vector<int>& labels, _EqPredicate predicate=_EqPredicate())
Parameters:
vec – Set of elements stored as a vector.
labels – Output vector of labels. It contains as many elements as vec. Each label labels[i] is a 0-based cluster index of vec[i].
predicate – Equivalence predicate (pointer to a boolean function of two arguments or an instance of the class that has the method bool operator()(const _Tp& a, const _Tp& b) ). The predicate returns true when the elements are certainly in the same class, and returns false if they may or may not be in the same class.
The generic function partition implements an $O(N^2)$ algorithm for splitting a set of $N$ elements into one or more equivalency classes, as described in http://en.wikipedia.org/wiki/Disjoint-set_data_structure .
The function returns the number of equivalency classes.
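The disjoint-set idea behind this can be sketched in a few lines: test every pair of elements with the predicate, merge the sets of elements that are reported equivalent, then number the resulting sets. The code below is an illustration using only the standard library, not OpenCV's source; partitionSketch and findRoot are hypothetical names, and the order in which classes are numbered is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Follow parent links to the root of i's tree, shortening the path on the way.
static std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];  // path halving
        i = parent[i];
    }
    return i;
}

// Splits vec into equivalency classes and returns how many classes were found.
// labels[i] receives the 0-based class index of vec[i].
template <typename T, typename EqPredicate>
int partitionSketch(const std::vector<T>& vec, std::vector<int>& labels, EqPredicate predicate)
{
    const std::size_t n = vec.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});  // every element starts alone

    // O(N^2): test each pair once and merge the trees of equivalent elements.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (!predicate(vec[i], vec[j])) continue;
            const std::size_t ri = findRoot(parent, i);
            const std::size_t rj = findRoot(parent, j);
            if (ri != rj) parent[rj] = ri;
        }
    }

    // Number the roots 0, 1, ... in order of first appearance.
    labels.assign(n, -1);
    std::vector<int> rootLabel(n, -1);
    int classes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t r = findRoot(parent, i);
        if (rootLabel[r] < 0) rootLabel[r] = classes++;
        labels[i] = rootLabel[r];
    }
    return classes;
}
```

Because merging is transitive, two elements end up in the same class whenever a chain of pairwise-equivalent elements connects them, even if the predicate returns false for the two of them directly.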
[Arthur2007] D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 2007.
Source: http://docs.opencv.org/modules/core/doc/clustering.html