您的位置:首页 > 运维架构

ZooKeeper相关资料

2015-10-14 23:56 387 查看
ZooKeeper: enables coordination for distributed system

Similar to multithread programming, but shared nothing. Easier with a component provide share store, like ZooKeeper.

ZooKeeper manage data related to coordination, not for bulk storage

Master-Slave Key Problems:

1. Master Crashes

2. Slave Crashes

3. Communication Failure(Master and Slave cannot exchange message)

Master Failures

Store the state of the system at the time old primary master crashed.

Split-Brain Problem: more than one master running independently caused by false suspicion of master failed

Question: if there are 5 servers at beginning, when 3 separated from other 2, what will happen? will there be two separate master?(is this actually what split-brain means?) or just the group has the majority of 3?

Worker Failures:

Detect worker failure by master. If the computation has side effects, some recovery procedure might be necessary to clean up the state.

Communication Failure:

Reassigning a task may cause two workers executing the same task, which depends on application may unacceptable.

ephemeral state.

Cannot tell is a node is crashed or just be slow.

Summary of Master-Worker Tasks

1. Master Election

2. Crash Detection

3. Group Membership Management: figure out which workers are available,
group member is worker


4. Metadata Management: master and worker be able to store assignments and execution statuses in a reliable manner

ZooKeeper Basics

ZooKeeper not provide primitives, but expose file system like API comprised for a small set of calls.

Recipe to denote these implementation of primitives

API

create /path data

delete /path

exists /path

setData /path data

getData /path

getChildren /path

Modes

Persistent

Ephemeral: delete if client created it crashes or simply closed the connection. (Cannot have children)

Sequential znode: tasks-1, tasks-2 ....

All modes: persistent, ephemeral, persistent-sequential, ephemeral-sequential

Watches and Notifications

Clients register to watch notification instead of POLLING.

【IMPORTANT】One-Shot Notification, Example:

A want to watch on /tasks, but tasks has changed before A successfully set the watch, A may lost the notification, if the notification is important for A, so A get status of /tasks when setting the watch



【IMPORTANT】Notification Guarantee, notifications are delivered to a client before any other change is made to the znode.

Question: What if the Notification is lost?

Versions

use version to check valid update

ZooKeeper Architecture

Client Library: responsible for talking to servers

ZooKeeper Quorums

In Quorum mode, ZooKeeper replicates its data tree across all servers in the ensemble nodes.

quorum minimum number of legislators required to be present for a note. (How Quorum is be used? Solve the problem of delay to store data across all servers before return to client? Solve the partition problem?Solve the split-brain problem,抽屉原理)

当一个网络中被分隔成两部分以后,不管哪一部分被使用,都会有最新的更新

There are other quorums other than majority quorums

ZooKeeper Session

when a session ends for any reason, the ephemeral nodes created during session disappeared.(能有哪些原因?)

Moving a session to different server if has not heard from its current server for a some time. Moving session to a different server is transparently by ZooKeeper library.

Session offer order guarantee

ZooKeeper Lifecycle

Case: Waiting on CONNECTING During Network Partitions

ZooKeeper Quorums 

1. server need to know each other, each zookeeper server is configured with a list of servers, if quorum is reached, then the service is available.

2. client specify the host:port pair it tries to connect(we can limit the hosts client can reached to achieve simple location based balancing)

Can we enhance third-party framework or application to use ZooKeeper do coordination work?

Implementing a Primitive: Locks with ZooKeeper

To acquire a lock, create a ephemeral znode: /lock

Others watch changes for /lock

Implementation of a Master-Worker Example

Roles

Master / Worker / Client

Master

master相当于一个竞争资源,对应lock recipe,创建一个ephemeral node用来表示master

create ephemeral znode: /master

stat /master true

监听(watch)/master节点

Workers, Tasks, Assignments

persistent

create /workers ""

create /tasks "" 

create /assigns ""

ls /workers true

ls /tasks true

监听 /workers 子节点是否发生改变

The Work Role

create -e /workers/worker1.example.com "worker1.example.com:2224"

此时针对 ls /workers true 的watcher将会接受到通知

create -e /assign/worker1.example.com 

创建子节点接收任务分配

ls /assign/worker1.example.com true

The Client Role

create -s  /tasks/task- "cmd"

created /tasks/task-000000

-s 表明自增

ls /tasks/task-000000 true

监听task完成情况

当task创建完成以后,服务器监听到此事件,就去找当前的 tasks 和 workers 去完成任务分配

ls /tasks

=> [task-000000]

ls /workers

=> [worker1.example.com]

create /assign/worker1.example.com/task-000000 ""

此时worker将会接收到新分配任务的通知,执行任务

当worker完成任务以后,创建一个task的对应status node

/tasks/task-000000/status "done"

此时client会监听到task完成

IMPORTANT: ZooKeeper Internals

ZooKeeper Internals

in Ensemble, One leader handling all requests;  followers receive and vote for updates;observer not participate in decision process, they only learn what have been decided

Requests, Transactions, and Identifiers

Requests读写分离

ZooKeeper server process read requests locally (exists, getData, getChildren) .

ZooKeeper leader process write requests (create, delete, setData).

Question: 如果client和Server A建立了session,发起了一个write request,完成。再向Server A发起了一个read request,并不能保证Server A一定能读到最新的?

Transaction: one state update operation (write request)

check the version number

idempotent: apply transaction multi times get same result

zxid: ZooKeeper transaciton id

Leader Election

leader election notification: server identifier (sid) and most transaction it executes (zxid)

p158

Zab: ZooKeeper Atomic Broadcast protocol

References

<<ZooKeeper>>
http://blog.cloudera.com/blog/2013/02/how-to-use-apache-zookeeper-to-build-distributed-apps-and-why/ http://zookeeper.apache.org/doc/trunk/recipes.html http://www.ibm.com/developerworks/library/bd-zookeeper/ http://highscalability.com/blog/2008/7/15/zookeeper-a-reliable-scalable-distributed-coordination-syste.html https://engineering.pinterest.com/blog/zookeeper-resilience-pinterest
kafka
http://kafka.apache.org/documentation.html
paxos算法[ZooKeeper内部实现的算法模型]
http://baike.baidu.com/link?url=GZCf2VymTzeZDfGCRs2QgPpjLm6xJEX5TzvtWEQOULy77yEqO0nc6Gy6JOxJjIaBSPTCqv9E1fEKUx420AyX9q http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf http://www.ux.uis.no/~meling/papers/2013-paxostutorial-opodis.pdf https://www.youtube.com/watch?v=JEpsBg0AO6o 【非常好的解释,需要翻墙】
https://distributedthoughts.wordpress.com/2013/09/22/understanding-paxos-part-1/
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  zookeeper hadoop hdfs