
[Erlang in Anger] (5.1.2) CPU

2014-11-09 13:12

Original article; please credit the source when reposting: 服务器非业余研究 http://blog.csdn.net/erlib. Author: Sunface. Contact email: cto@188.com


CPU

Unfortunately for Erlang developers, CPU is very hard to profile. There are a few reasons for this:
 • The VM does a lot of work unrelated to processes when it comes to scheduling, so a high amount of scheduling work and a high amount of work done by the Erlang processes are hard to tell apart.
 • The VM internally uses a model based on reductions, which represent an arbitrary number of work actions. Every function call, including BIFs, increments a process's reduction counter. After a given number of reductions (translator's note: 2000 by default at the time of writing - Sunface), the process gets descheduled and the scheduler moves on to run other processes.
 • To avoid going to sleep when work is low, the threads that control the Erlang schedulers busy-loop for a while. This ensures the lowest possible latency for sudden load spikes. The VM flag +sbwt none|very_short|short|medium|long|very_long can be used to change how long schedulers busy-wait before going to sleep.
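The reduction counter from the second point can be observed directly. The sketch below is a hypothetical module `reds` with a `measure/1` helper (illustrative names, not part of OTP or recon). It reads the calling process's cumulative counter with `erlang:process_info(self(), reductions)` before and after running a fun. The difference is only approximate, since it also counts the small amount of work done between the two reads.

```erlang
-module(reds).
-export([measure/1]).

%% Runs Fun in the calling process and returns {Result, Reductions},
%% where Reductions is how much the process's cumulative reduction
%% counter grew while Fun ran (plus a little bookkeeping overhead).
measure(Fun) when is_function(Fun, 0) ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    Result = Fun(),
    {reductions, After} = erlang:process_info(self(), reductions),
    {Result, After - Before}.
```

For example, `reds:measure(fun() -> lists:sum(lists:seq(1, 100000)) end)` returns the sum together with a reduction count; the exact count varies between OTP releases because the cost assigned to each operation is an internal VM detail.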
These factors combine to make it fairly hard to find a good absolute measure of how busy your CPU actually is running Erlang code. It is common for Erlang nodes in production to do a moderate amount of work and use a lot of CPU, yet still fit a lot of additional work into the remaining capacity when the workload gets higher.
The most accurate representation for this data is the scheduler wall time. It is an optional metric that needs to be turned on by hand on a node and polled at regular intervals. It reveals the percentage of time a scheduler has spent running processes and normal Erlang code, NIFs, BIFs, garbage collection, and so on, versus the time it has spent idling or trying to schedule processes.
The value here represents scheduler utilization rather than CPU utilization: the higher the ratio, the higher the workload.
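Without recon, the measurement can be taken directly with two BIFs from the reference manual: `erlang:system_flag(scheduler_wall_time, true)` enables the counters, and `erlang:statistics(scheduler_wall_time)` returns an unsorted list of `{SchedulerId, ActiveTime, TotalTime}` tuples. The sketch below takes two snapshots `Interval` milliseconds apart and divides the active-time delta by the total-time delta for each scheduler. The module name `sched_usage` and function `sample/1` are hypothetical. Restoring the previous flag value is an assumption meant to leave the node as it was found. Exactly which schedulers are reported (for instance, whether dirty schedulers appear) depends on the OTP release.

```erlang
-module(sched_usage).
-export([sample/1]).

%% Samples scheduler wall time over Interval milliseconds and returns
%% [{SchedulerId, Utilization}], where Utilization is in 0.0..1.0.
sample(Interval) when is_integer(Interval), Interval > 0 ->
    %% Enable the counters; the call returns the previous setting.
    Old = erlang:system_flag(scheduler_wall_time, true),
    %% The list is unsorted, so sort both snapshots by scheduler id.
    First = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(Interval),
    Last = lists:sort(erlang:statistics(scheduler_wall_time)),
    %% Put the flag back the way we found it.
    _ = erlang:system_flag(scheduler_wall_time, Old),
    %% Pair snapshots by scheduler id and divide active delta by total delta.
    [{Id, ratio(A1 - A0, T1 - T0)}
     || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(First, Last)].

%% Guard against a zero-length interval for a scheduler.
ratio(_Active, 0) -> 0.0;
ratio(Active, Total) -> Active / Total.
```

Calling `sched_usage:sample(1000)` should produce a list with the same shape as recon's output: one `{Id, Ratio}` pair per scheduler.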
While the basic usage is explained in the Erlang/OTP reference manual [13], the value can be obtained by calling recon:
-----------------------------------------------------------------------
1> recon:scheduler_usage(1000).
[{1,0.9919596133421669},
 {2,0.9369579039389054},
 {3,1.9294092120138725e-5},
 {4,1.2087551402238991e-5}]

-----------------------------------------------------------------------
The function recon:scheduler_usage(N) polls for N milliseconds (here, 1 second) and outputs the value of each scheduler. In this case, the VM has two very loaded schedulers (at 99.2% and 93.7% respectively) and two mostly unused ones at far below 1%. Yet a tool like htop would report something closer to this for each core:
-----------------------------------------------------------------------
1 [||||||||||||||||||||||||| 70.4%]
2 [||||||| 20.6%]
3 [|||||||||||||||||||||||||||||100.0%]
4 [|||||||||||||||| 40.2%]
-----------------------------------------------------------------------
The result is that a decent chunk of CPU time that would be mostly free for scheduling actual Erlang work (assuming the schedulers are busy waiting more than trying to select tasks to run) is reported as busy by the OS. (Translator's note: my understanding is that part of the CPU should really count as idle, but because the VM's schedulers are busy waiting for processes to run, the OS reports it as busy, so the CPU usage the system shows is higher than the real usage. - Sunface)
Another interesting possible behaviour is that the scheduler usage may show a higher rate (1.0) than what the OS reports. Schedulers waiting for OS resources are considered utilized because they cannot handle more work. If the OS itself is holding up on non-CPU tasks, Erlang's schedulers may still be unable to do more work and report a full ratio, while the OS reports CPU usage well below that.
These behaviours may be especially important to consider when doing capacity planning, and can be better indicators of headroom than CPU usage or load.
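As a rough capacity-planning aid, per-scheduler ratios such as the recon:scheduler_usage(1000) output shown earlier can be condensed into a couple of numbers. The sketch below is a hypothetical helper: the module `headroom` and the functions `mean_usage/1`, `headroom/1` and `busiest/1` are illustrative names. Averaging is only one possible aggregation. A mean can hide imbalance, so the sketch also reports the busiest scheduler.

```erlang
-module(headroom).
-export([mean_usage/1, headroom/1, busiest/1]).

%% Usage is a non-empty list of {SchedulerId, Ratio} tuples with
%% Ratio in 0.0..1.0, the shape returned by recon:scheduler_usage/1.

%% Average utilization across all schedulers.
mean_usage(Usage) when Usage =/= [] ->
    lists:sum([Ratio || {_Id, Ratio} <- Usage]) / length(Usage).

%% Fraction of scheduler time still available, on average.
headroom(Usage) ->
    1.0 - mean_usage(Usage).

%% Highest single-scheduler ratio; a value near 1.0 flags a hot
%% scheduler even when the average looks comfortable.
busiest(Usage) when Usage =/= [] ->
    lists:max([Ratio || {_Id, Ratio} <- Usage]).
```

With the four values from the recon example, the mean is about 0.48 and the headroom about 0.52. `busiest/1` returns the 0.992 of scheduler 1, which shows that two schedulers are nearly saturated even though the average looks comfortable.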
[13] http://www.erlang.org/doc/man/erlang.html#statistics_scheduler_wall_time