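scikit-learn (an introduction to the models used relatively often in engineering): 2.3. Clustering (usable for unsupervised dimensionality reduction of features)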
2015-08-11 08:37
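Reference: http://scikit-learn.org/stable/modules/clustering.html
In real projects we rarely use the simple models such as LR, kNN, and NB; they are classics, but they are not very practical in engineering work.
Today we focus not on a specific model but on unsupervised clustering methods.
We look at unsupervised clustering because, in real projects, besides reducing dimensionality with methods such as PCA, we sometimes also consider using clustering to reduce the dimensionality of the features.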
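As a rough illustration of using clustering to reduce feature dimensionality, here is a minimal sketch. The post does not say which approach it has in mind, so both options below are assumptions: `FeatureAgglomeration` merges similar features into groups, and `KMeans.transform` re-expresses each sample as its distances to the cluster centers. The data is synthetic and the choice of 5 clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration, KMeans

# Synthetic data (illustrative only): 200 samples, 20 features
rng = np.random.RandomState(0)
X = rng.rand(200, 20)

# Option 1 (assumed approach): cluster the features themselves,
# pooling each group of similar features into a single new feature
agglo = FeatureAgglomeration(n_clusters=5)
X_agglo = agglo.fit_transform(X)

# Option 2 (assumed approach): cluster the samples, then describe each
# sample by its distance to each of the 5 centroids
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
X_kmeans = kmeans.fit_transform(X)

print(X_agglo.shape, X_kmeans.shape)
```

In both cases the 20 original columns are replaced by 5 derived ones, which can then be passed to a downstream model in place of, or alongside, a PCA projection.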
Overview of clustering methods:
![](https://oscdn.geek-share.com/Uploads/Images/Content/201909/26/748d02429b0b79895f8924078dadb801)
A comparison of the clustering algorithms in scikit-learn
Method name | Parameters | Scalability | Use case | Geometry (metric used) |
---|---|---|---|---|
K-Means | number of clusters | Very large n_samples, medium n_clusters with MiniBatch code | General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
Mean-shift | bandwidth | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
Ward hierarchical clustering | number of clusters | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints | Distances between points |
Agglomerative clustering | number of clusters, linkage type, distance | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints, non-Euclidean distances | Any pairwise distance |
DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |
Birch | branching factor, threshold, optional global clusterer | Large n_clusters and n_samples | Large dataset, outlier removal, data reduction | Euclidean distance between points |
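To make two rows of the table concrete, the sketch below runs the MiniBatch variant of K-Means (the "MiniBatch code" mentioned in the Scalability column) and DBSCAN on the same data. The two-moons dataset, the sample count, and the `eps`, `min_samples`, and `batch_size` values are illustrative assumptions, not settings from the original post.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans
from sklearn.datasets import make_moons

# Synthetic non-flat data (illustrative only): two interleaved half-circles
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# K-Means row: the only required parameter is the number of clusters;
# MiniBatchKMeans fits on small random batches to scale to large n_samples
mbk = MiniBatchKMeans(n_clusters=2, batch_size=256, n_init=3, random_state=0)
mbk_labels = mbk.fit_predict(X)

# DBSCAN row: the parameter is the neighborhood size (eps); the number of
# clusters is not given up front, and noise points get the label -1
db = DBSCAN(eps=0.2, min_samples=5)
db_labels = db.fit_predict(X)

print("MiniBatchKMeans labels:", np.unique(mbk_labels))
print("DBSCAN labels:", np.unique(db_labels))
```

Plotting the two label arrays against `X` is a quick way to see the "flat geometry" versus "non-flat geometry" distinction from the Use case column on your own data.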