How many column families should you use in HBase?
2011-05-14 20:56
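The main point here is that when designing an HBase schema, you should try to use only one column family. The reason comes down to flushes and compactions, both of which are triggered at the region level. When one column family accumulates a large amount of data and triggers a flush, the memstores of every other column family in the same region are flushed too, even though those memstores may hold very little data and do not need flushing yet. Compaction, in turn, is triggered when the number of store files (not their total size) reaches a threshold, and the many store files produced by these flushes usually lead to compactions. Flushes and compactions generate a lot of I/O load, which has a significant impact on overall HBase performance, so choosing an appropriate number of column families matters.
The original English text on this topic follows: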
HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per Region basis, so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed though the amount of data they carry is small. Compaction is currently triggered by the total number of files under a column family. It's not size based. With many column families, the flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by changing flushing and compaction to work on a per column family basis).
Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.
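To make the flush/compaction interaction described above more concrete, here is a toy simulation in Python. It is only a sketch and not HBase code: the `Region` and `Store` classes, the `simulate` helper, and the two threshold constants are all hypothetical names and illustrative values invented for this example. It assumes a flush is triggered by the region's total memstore size and a compaction by the number of store files in a family, as the text above describes.

```python
from dataclasses import dataclass, field

# Illustrative values only (assumptions for this sketch, not HBase settings).
FLUSH_THRESHOLD_MB = 128   # total memstore size in a region that triggers a flush
COMPACTION_MIN_FILES = 3   # store-file count in one family that triggers a compaction


@dataclass
class Store:
    """One column family inside a region: a memstore plus its flushed files."""
    memstore_mb: int = 0
    files_mb: list = field(default_factory=list)
    flushes: int = 0
    compactions: int = 0


class Region:
    def __init__(self, families):
        self.stores = {name: Store() for name in families}

    def put(self, family, size_mb):
        self.stores[family].memstore_mb += size_mb
        total = sum(s.memstore_mb for s in self.stores.values())
        if total >= FLUSH_THRESHOLD_MB:
            self.flush()

    def flush(self):
        # Region-level flush: every family's memstore is written out,
        # no matter how little data it holds.
        for store in self.stores.values():
            if store.memstore_mb > 0:
                store.files_mb.append(store.memstore_mb)
                store.memstore_mb = 0
                store.flushes += 1
            # Compaction is decided by file count, not by file size.
            if len(store.files_mb) >= COMPACTION_MIN_FILES:
                store.files_mb = [sum(store.files_mb)]
                store.compactions += 1


def simulate(rounds):
    """Write 1 MB to a small family and 127 MB to a big one in each round."""
    region = Region(["big", "small"])
    for _ in range(rounds):
        region.put("small", 1)
        region.put("big", 127)
    return region


if __name__ == "__main__":
    region = simulate(6)
    for name, store in region.stores.items():
        print(name, store.files_mb, store.flushes, store.compactions)
```

In this model, after six rounds the `small` family has been flushed six times and compacted twice even though it holds only 6 MB in total. All of that I/O was driven by the `big` family, which is the needless load the quoted passage warns about.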