Distributed Data Storage Systems
2010-12-14 15:24
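Systems of this kind are typified by Google's Bigtable (described in the paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al.). Because Bigtable is not open source, many open-source implementations have appeared, such as Hypertable (written in C++) and HBase (written in Java, built on top of Hadoop).
Because they do not support SQL, these systems are sometimes called NoSQL data stores.
So what is a distributed storage system?
First, the Bigtable definition: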
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.
http://labs.google.com/papers/bigtable.html
Now the HBase definition:
HBase is an open-source, distributed, column-oriented store modeled after the Google paper, "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.
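To make "column-oriented store modeled after Bigtable" a little more concrete: the Bigtable paper describes its data model as a sparse, sorted map indexed by row key, column key, and timestamp. The following is only a toy in-memory sketch of that idea. The class name `TinyTable` and its methods are hypothetical and are not part of the Bigtable, HBase, or Hypertable APIs, and nothing here is distributed or persistent.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a Bigtable-style map: (row, column, timestamp) -> value.

    Hypothetical illustration only; not an HBase or Hypertable API.
    """

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store one versioned cell; rows need not share the same columns."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        # .get() avoids creating empty entries in the defaultdicts on reads
        cells = self._rows.get(row, {}).get(column, [])
        for ts, value in cells:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield (row, newest cells) for row keys in [start_row, end_row), sorted."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                newest = {col: cells[0][1] for col, cells in self._rows[row].items()}
                yield row, newest


table = TinyTable()
table.put("com.cnn.www", "contents:", "<html>v1", timestamp=1)
table.put("com.cnn.www", "contents:", "<html>v2", timestamp=2)
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN", timestamp=1)
table.put("org.example", "contents:", "<html>ex", timestamp=1)

print(table.get("com.cnn.www", "contents:"))      # newest version of the cell
print(table.get("com.cnn.www", "contents:", 1))   # version as of timestamp 1
print(list(table.scan("com", "org")))             # range scan over sorted row keys
```

Keeping row keys sorted is what makes range scans cheap in this model. Real systems additionally split the key range across servers and persist it to disk, which this sketch leaves out.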
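Differences from traditional relational databases
The HBase leads, Jim Kellerman, Michael Stack, and Bryan Duxbury, sum it up as follows:
HBase is for users whose yearly Oracle license fees approach the gross national product (GNP) of a small country, or whose MySQL installation is close to collapse because its tables have a few BLOB columns and row counts in the millions. Anyone who has large amounts of structured or semi-structured data and is running up against the limits of a relational database management system (RDBMS) should take a look at HBase.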
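Differences from distributed caches
Some distributed cache systems, such as Tangosol Coherence, GemFire, JBoss Cache, and Memcached, are also distributed and scalable. Someone on Stack Overflow summed up the difference well (using Hypertable versus Memcached as the example):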
Hypertable is an implementation of concepts in Google's Bigtable. Namely a column-oriented DB which has properties of being highly denormalized which means it doesn't need joins.
Memcached is an in-memory caching layer which acts like a distributed hashtable, keeping your app from having to hit the actual DB.
Both lend themselves well to being distributed and work well with MapReduce style topologies but they serve different purposes. Memcached/DHT is going to serve to speed access to data in memory while Hypertable/Bigtable are actual mechanisms for permanent data storage on disk.
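The split between a volatile cache and a durable store can be sketched with the common cache-aside pattern. This is a minimal illustration under stated assumptions, not how Memcached or Hypertable are implemented: a plain `dict` stands in for the in-memory cache, an SQLite table stands in for the permanent store, and the class name `CacheAsideStore` and its methods are hypothetical.

```python
import sqlite3


class CacheAsideStore:
    """Hypothetical cache-aside wrapper: volatile cache in front of a durable store."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}                      # stand-in for an in-memory cache
        self.db = sqlite3.connect(db_path)   # stand-in for the permanent store
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )
        self.db_reads = 0                    # counts reads that reached the store

    def put(self, key, value):
        """Write to the durable store, then invalidate any cached copy."""
        self.db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.db.commit()
        self.cache.pop(key, None)

    def get(self, key):
        """Serve from the cache when possible; otherwise read the store and cache it."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]
        return row[0]

    def restart_cache(self):
        """Simulate a cache restart: cached entries vanish, stored data does not."""
        self.cache.clear()


store = CacheAsideStore()
store.put("user:1", "alice")
store.get("user:1")        # cache miss: goes to the store
store.get("user:1")        # cache hit: the store is not touched
store.restart_cache()
print(store.get("user:1"), store.db_reads)   # data survives the cache restart
```

Losing the cache only costs extra reads, while losing the store loses the data. That asymmetry is the distinction the quoted answer draws.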
Background:
MapReduce, functional programming, Map, Reduce
HBase Leads Discuss Hadoop, BigTable and Distributed Databases
http://www.infoq.com/cn/news/2008/05/hbase-interview
A Compendium of solutions for scaling a Data Store
http://bhavin.directi.com/tag/cassandra/
Writing Scalable Software in Java
http://www.slideshare.net/rbadaro/writing-scalable-software-in-java
YunTable: BigTable for the Cloud Era
http://www.tektalk.org/2010/10/09/yuntable-%E4%BA%91%E6%97%B6%E4%BB%A3%E7%9A%84bigtable/