HBase Tutorial: Theory and Practice of a Distributed Data Store (2)
2013-06-13 07:52
Non-Relational Databases
Originally, they did not support SQL.
(1). In practice, this distinction is becoming blurred.
(2). One difference is in the data model.
(3). Another difference is in the consistency model (ACID guarantees and transactions are generally sacrificed).
Consistency models and the CAP theorem
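Strict: all changes to data are atomic and appear to take effect instantaneously.
Sequential: changes to data are seen in the same order as they were applied.
Causal: causally related changes are seen in the same order.
Eventual: updates propagate through the system, and all replicas become consistent once the system reaches a steady state.
Weak: no guarantee that updates propagate or that clients see changes in the same order.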
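The eventual model can be illustrated with a minimal sketch. The class `EventuallyConsistentStore` and its methods are hypothetical names invented for this example, not part of HBase or any real library: writes land on a primary copy and reach the replicas only when `propagate()` runs, so a replica read can return stale data in between.

```python
class EventuallyConsistentStore:
    """Hypothetical toy store: writes hit a primary copy first and
    reach the replicas only when propagate() is called."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet applied to replicas, in write order

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        # May return stale data (or None) until propagate() has run.
        return self.replicas[index].get(key)

    def propagate(self):
        # Apply pending updates to every replica in the order they were written.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("row1", "v1")
stale = store.read_from_replica(0, "row1")  # update has not propagated yet
store.propagate()
fresh = store.read_from_replica(0, "row1")  # replicas have now converged
```

Here `stale` is `None` and `fresh` is `"v1"`. A strictly consistent store would never expose the intermediate state.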
Data model:
How the data is stored: key/value, semi-structured, column-oriented, …
Consistency model: the choice of model affects how fast the system can handle reads and writes.
Atomic read-modify-write
(1). Easy in a centralized system, difficult in a distributed one.
(2). Prevents race conditions in multi-threaded or shared-nothing designs.
(3). Can reduce client-side complexity.
(4). Supports multiple clients accessing data simultaneously.
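A minimal single-process sketch of atomic read-modify-write follows. `AtomicCell` is a hypothetical class written for this example, and a `threading.Lock` stands in for whatever mechanism a real store uses on the server side. The point is that the read, the check, and the write happen as one indivisible step, so clients need no locking of their own.

```python
import threading


class AtomicCell:
    """Hypothetical cell whose read-modify-write operations are atomic."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new_value):
        # Write only if the current value is still the one the caller saw.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new_value
            return True

    def increment(self, amount=1):
        # Read, add, and write back under one lock acquisition.
        with self._lock:
            self._value += amount
            return self._value


counter = AtomicCell(0)


def worker():
    for _ in range(1000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = counter.get()  # no increments are lost: 4 threads * 1000 each
```

The HBase client API offers operations in the same spirit, such as `checkAndPut` and `incrementColumnValue`. The sketch above only mirrors the idea and does not reproduce their signatures.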
Database Normalization
Schema design at scale
(1). A good methodology is to apply the DDI principle:
Denormalization
Duplication
Intelligent key design
Denormalization
Duplicate data in more than one table so that no further aggregation is required at read time.
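A small sketch of the idea, using plain dictionaries as stand-ins for tables. The table names, fields, and sample rows are made up for illustration. In the normalized layout a read has to combine two tables. In the denormalized layout the author name is copied into each message at write time, so a read is a single lookup.

```python
# Normalized layout: a read must combine two tables.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
messages = [
    {"id": "m1", "user_id": "u1", "text": "hello"},
    {"id": "m2", "user_id": "u2", "text": "hi"},
]


def read_normalized(message_id):
    msg = next(m for m in messages if m["id"] == message_id)
    author = users[msg["user_id"]]["name"]  # second lookup, i.e. a join
    return {"text": msg["text"], "author": author}


# Denormalized layout: the author name is duplicated at write time.
messages_denormalized = {}


def write_message(message_id, user_id, text):
    messages_denormalized[message_id] = {
        "text": text,
        "user_id": user_id,
        "author": users[user_id]["name"],  # duplicated data
    }


for m in messages:
    write_message(m["id"], m["user_id"], m["text"])


def read_denormalized(message_id):
    row = messages_denormalized[message_id]  # single lookup, no join
    return {"text": row["text"], "author": row["author"]}
```

Both read functions return the same result. The trade-off is that a change to a user's name now has to be written to every duplicated copy.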
What is BigTable?
BigTable is a distributed storage system for managing structured data, designed to scale to a very large size.
BigTable is a sparse, distributed, persistent, multi-dimensional sorted map.
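The "sparse, multi-dimensional sorted map" part can be sketched in memory as a map from (row key, column, timestamp) to a value, kept sorted by key. `SortedSparseMap` is a hypothetical class for illustration only: it ignores distribution and persistence, and storing the timestamp negated is just one simple way to make newer versions sort first.

```python
import bisect


class SortedSparseMap:
    """Hypothetical in-memory map: (row, column, timestamp) -> value.
    Only cells that were written are stored, which makes it sparse."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # same key -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row, stop_row):
        # Range scan over rows in [start_row, stop_row), in sorted key order.
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (stop_row,))
        return [
            (row, column, -neg_ts, self._values[(row, column, neg_ts)])
            for row, column, neg_ts in self._keys[lo:hi]
        ]


table = SortedSparseMap()
table.put("row2", "cf:b", 1, "x")
table.put("row1", "cf:a", 1, "old")
table.put("row1", "cf:a", 2, "new")
table.put("row3", "cf:z", 1, "y")

result = table.scan("row1", "row3")
```

`result` lists the cells of `row1` and `row2` in key order, with the newer version of `cf:a` ahead of the older one. Because keys are sorted, a scan over adjacent rows touches one contiguous slice of the map.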
What is HBase?
Essentially, it is an open-source version of BigTable.
The most basic unit in HBase is a column.
(1). Each column may have multiple versions, with each distinct value contained in a separate cell.
(2). One or more columns form a row, which is addressed uniquely by a row key.
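The relationship between row keys, columns, versions, and cells can be sketched as follows. The `Row` class, the row key `user#42`, and the column name `info:email` are hypothetical and only model the concepts. This is not the HBase client API.

```python
from collections import defaultdict


class Row:
    """Hypothetical row: a unique row key plus columns, where each
    column holds one cell per version (timestamp)."""

    def __init__(self, row_key):
        self.row_key = row_key
        self._columns = defaultdict(dict)  # column -> {timestamp: value}

    def put(self, column, timestamp, value):
        # Each (column, timestamp) pair identifies a separate cell.
        self._columns[column][timestamp] = value

    def get(self, column, max_versions=1):
        # Return up to max_versions cells, newest first.
        versions = self._columns.get(column, {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


table = {}  # row key -> Row
row = Row("user#42")
table[row.row_key] = row

row.put("info:email", 1, "a@old.example")
row.put("info:email", 2, "a@new.example")

latest = table["user#42"].get("info:email")
history = table["user#42"].get("info:email", max_versions=2)
missing = table["user#42"].get("info:phone")
```

`latest` holds only the newest cell, `history` holds both versions with the newest first, and `missing` is an empty list because that column was never written for this row.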