Managing Multiple Resources in Hadoop 2 with YARN
2013-12-25 23:17
An overview of some of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling in the Fair Scheduler to node-level enforcement.
As Apache Hadoop becomes ubiquitous, it is becoming more common for users to run diverse sets of workloads on Hadoop, and these jobs are more likely to have different resource profiles. For example, a MapReduce distcp job or a Cloudera Impala query has a very different resource profile from an Apache Spark (incubating) job executing an iterative machine-learning algorithm with complex updates, which may wish to store the entire dataset in memory and use spurts of CPU to perform complex computation on it.
For that reason, the new YARN framework in Hadoop 2 allows workloads to share cluster resources dynamically between a variety of processing frameworks, including MapReduce, Impala, and Spark. YARN currently handles memory and CPU and will coordinate additional
resources like disk and network I/O in the future.
Accounting for memory, CPU, and other resources separately confers several advantages:
- It allows us to treat tenants on a Hadoop cluster more fairly by rationing the resources that are most utilized at a point in time.
- It makes resource configuration more straightforward, because a single resource does not need to be used as a proxy for others.
- It provides more predictable performance by not oversubscribing nodes, and protects higher-priority workloads with better isolation.
- Finally, it can increase cluster utilization, because all the above mean that resource needs and capacities can be configured less conservatively.
One of Cloudera’s top priorities in Cloudera Enterprise 5 is support for managing multiple resources in YARN.
Background
In Hadoop 1, a single dimension, the “slot”, represented resources on a cluster. Each node was configured with a number of slots, and each map or reduce task occupied a single slot regardless of how much memory or CPU it used.
This approach offered the benefit of simplicity but had a few disadvantages. Because of the coarse-grained abstraction, it was common for a node’s resources to be over- or under-subscribed. Initially, YARN improved this situation by switching to memory-based scheduling: in YARN, each node is configured with a set amount of memory, and applications (such as MapReduce) request containers for their tasks with configurable amounts of memory. More recently, YARN added CPU as a resource in the same manner: nodes are configured with a number of “virtual cores” (vcores), and applications give a vcore number when requesting a container. In almost all cases, a node’s virtual-core capacity should be set to the number of physical cores on the machine. CPU capacity is configured with the yarn.nodemanager.resource.cpu-vcores property, and the mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores properties can be used to change the CPU request for MapReduce tasks from the default of 1.
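As an illustration, the properties above could be set as follows. The values here are hypothetical (an 8-core node, with reduce tasks requesting 2 vcores); node capacity goes in yarn-site.xml and per-task requests in mapred-site.xml:

```xml
<!-- yarn-site.xml: node CPU capacity (illustrative: an 8-core machine) -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

<!-- mapred-site.xml: per-task CPU requests (the default for both is 1) -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>
```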
Dominant Resource Fairness
Perhaps the most difficult challenge when managing multiple resources is deciding how to share them fairly. With a single resource, moving toward a fair allocation across a set of agents is pretty straightforward: when there’s space to place a container, we give it to the agent that has the least total memory already allocated. The Hadoop Fair Scheduler has been taking this approach for years, allocating containers to the agent (pool or application) with the smallest current allocation. (A recent blog post describes how we extended this model to a hierarchy of pools.)
But what if you want to schedule both memory and CPU, and you need them in possibly different and changing proportions? If you have 6GB and three cores, and I have 4GB and two cores, it’s pretty clear that I should get the next container. What if you have 6GB
and three cores, but I have 4GB and four cores? Sure, you have a larger total number of units, but cores might be more valuable. To complicate things further, I might care about CPU more than you do.
To navigate this problem, YARN drew on recent research from Ghodsi et al. at UC Berkeley that presents a notion of fairness that works with multiple resources. The researchers chose this notion, called dominant resource fairness (DRF), over some other options because it provides several properties that are trivially satisfied in the single-resource case but more difficult in the multi-resource case. For example, it is strategy-proof, meaning that agents cannot increase the amount of resources they receive by “lying” about what they need, and it incentivizes sharing, meaning that agents with separate clusters of equal size can only stand to benefit by merging their clusters.
The basic idea is as follows: My share of a resource is the ratio between the amount of that resource allocated to me and the total capacity on the cluster of that resource. So if the cluster has 10GB and I have 4GB, then my share of memory is 40%. If the cluster
has 20 cores and I have 5 of them, my share of CPU is 25%. My dominant share is simply the max of resource shares, in this case 40%.
With single resource fairness, we try to equalize the shares of that resource across agents. When it’s time to allocate a container, we give it to the agent with the lowest share. With DRF, we try to equalize the dominant shares across agents, where the dominant
share can come from a different resource for each agent. If you have 1GB and 10 cores, I get the next container, because my 40% dominant share of memory is less than your 50% dominant share of CPU.
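The allocation rule can be sketched in a few lines of Python, using the numbers from the example above. This is only an illustration of the DRF idea, not the Fair Scheduler’s actual code:

```python
# DRF sketch: cluster capacities and per-agent allocations from the
# example above (10GB / 20 cores; "me" holds 4GB and 5 cores, "you"
# holds 1GB and 10 cores).

CLUSTER = {"memory_gb": 10, "vcores": 20}

def dominant_share(alloc, cluster=CLUSTER):
    """An agent's dominant share is its largest per-resource share."""
    return max(alloc[r] / cluster[r] for r in cluster)

def next_agent(allocations):
    """Give the next container to the agent with the smallest dominant share."""
    return min(allocations, key=lambda a: dominant_share(allocations[a]))

me = {"memory_gb": 4, "vcores": 5}    # shares: 40% memory, 25% CPU -> 40%
you = {"memory_gb": 1, "vcores": 10}  # shares: 10% memory, 50% CPU -> 50%

# "me" gets the next container, because 40% < 50%.
```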
We can use DRF when working with hierarchical queues similarly to how we do so with a single resource. We assign containers by starting at the root queue and traversing the queue tree. At each node, we explore the nodes below it in the order given by our fairness
algorithm (in order of smallest dominant share).
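The hierarchical traversal can be sketched the same way. The queue names and allocations below are hypothetical, not from the post; at each level we descend into the child queue with the smallest dominant share until we reach a leaf:

```python
# Hedged sketch of DRF over a queue hierarchy: starting at the root,
# repeatedly pick the child queue with the smallest dominant share.

CLUSTER = {"memory_gb": 10, "vcores": 20}

def dominant_share(alloc):
    return max(alloc[r] / CLUSTER[r] for r in CLUSTER)

def pick_leaf(queue):
    """queue: dict with "name", "alloc", and "children" (a list of queues)."""
    while queue["children"]:
        queue = min(queue["children"], key=lambda q: dominant_share(q["alloc"]))
    return queue["name"]

root = {
    "name": "root",
    "alloc": {"memory_gb": 5, "vcores": 15},
    "children": [
        {"name": "prod", "alloc": {"memory_gb": 4, "vcores": 5}, "children": []},
        {"name": "dev", "alloc": {"memory_gb": 1, "vcores": 10}, "children": []},
    ],
}

# "prod" (dominant share 40%) is chosen over "dev" (dominant share 50%).
```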
Enforcing CPU Allocations with CGroups
So, DRF gives us a way to schedule multiple resources. But what about enforcing the allocations? Because memory is for the most part inelastic, we enforce memory limits by killing container processes when they go over their memory allocation. While we could do the same for CPU, this approach is a little bit harsh, because the application doesn’t really have much say in the matter unless we expect it to insert a millisecond sleep after every five lines of code. What we really want is a way to control what proportion of CPU time is allotted to each process.
Luckily, the Linux kernel provides us exactly this in a feature called cgroups (control groups). With cgroups, you can place a YARN container process and all the threads it spawns in a control group. You then allocate a number of “CPU shares” to the process
and place it in the cgroups hierarchy next to the other YARN container processes. Available CPU cycles will then be allotted to these processes in proportion to the number of shares given to them. Thus, if the node is not fully scheduled or containers are
not using their full allotment, other containers will get to use the extra CPU.
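The proportional behavior can be illustrated with a little arithmetic (this is a model of how CPU shares divide cycles, not a real cgroups API): available cycles are split among the currently runnable containers in proportion to their share counts, so an idle container’s cycles flow to the others.

```python
# Model of cgroup CPU-share arithmetic: each runnable container gets
# a fraction of CPU equal to its share count over the sum of shares
# of all runnable containers. Idle containers get nothing, and their
# cycles are redistributed automatically.

def cpu_fractions(shares, active):
    """shares: container -> share count; active: set of runnable containers."""
    total = sum(shares[c] for c in active)
    return {c: (shares[c] / total if c in active else 0.0) for c in shares}

# Both runnable: "a" gets 2/3 of the CPU, "b" gets 1/3.
both = cpu_fractions({"a": 2048, "b": 1024}, {"a", "b"})

# "b" idle: "a" gets the whole CPU.
a_only = cpu_fractions({"a": 2048, "b": 1024}, {"a"})
```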
cgroups also provides similar controls for disk and network I/O that we will probably use in the future for managing these resources.
We implemented cgroups CPU enforcement in YARN-3. To turn on cgroups CPU enforcement from Cloudera Manager:
- Go to the configuration for the YARN service and check the boxes for “Use CGroups for Resource Management” and “Always use Linux Container Executor”.
- Go to the hosts configuration and check the box for “Enable CGroup-based Resource Management”.
If you are not using Cloudera Manager:
- Use the LinuxContainerExecutor. This requires setting yarn.nodemanager.container-executor.class to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor in all NodeManager configs and setting certain permissions, as described in the Hadoop documentation.
- Set yarn.nodemanager.linux-container-executor.resources-handler.class to org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.
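In yarn-site.xml terms, the two manual settings look like this (property names as in Hadoop’s yarn-default.xml; apply to every NodeManager):

```xml
<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```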
Want to Learn More?
There are a ton of details we didn’t cover regarding how multiple resources interact with Fair Scheduler features like per-queue minimums and maximums, preemption, and fair-share reporting. For more info, check out the Fair Scheduler documentation.
Sandy Ryza is a Software Engineer at Cloudera and a Hadoop Committer.
Ref: http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/