 [b]Original article; if you repost it, please credit the source: 服务器非业余研究 http://blog.csdn.net/erlib, author Sunface[/b]

Stack Buffers


Stack buffers are ideal when you want the amount of control offered by queue buffers, but you have an important requirement for low latency.
To use a stack as a buffer, you’ll need two processes, just like you would with queue buffers, but a list [20] will be used instead of a queue data structure.

The reason the stack buffer is particularly good for low latency is related to issues similar to buffer bloat [21]. If you get behind on a few messages being buffered in a queue, all the messages in the queue get to be slowed down and acquire milliseconds of wait time.
Eventually, they all get to be too old and the entire buffer needs to be discarded.


On the other hand, a stack will make it so only a restricted number of elements are kept waiting while the newer ones keep making it to the server to be processed in a timely manner.
Whenever you see the stack grow beyond a certain size or notice that an element in it is too old for your QoS requirements you can just drop the rest of the stack and keep going from there. PO Box also offers such a buffer implementation.
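As a rough illustration of the idea above, here is a minimal sketch of a stack buffer built on a plain list. The module name `stack_buf`, the `Max` limit, and the return shapes are assumptions made for this example; this is not the PO Box implementation. It only covers the size-based drop, not the age-based one.

```erlang
-module(stack_buf).
-export([new/1, push/2, pop/1, len/1]).

%% The length is tracked next to the list so the size check stays O(1).
-record(buf, {max, len = 0, items = []}).

%% Max is a hypothetical limit on how many messages may wait at once.
new(Max) when is_integer(Max), Max > 0 ->
    #buf{max = Max}.

%% Push the newest message on top of the stack. If the stack is already
%% full, drop everything that was waiting and keep only the new message.
push(Msg, #buf{max = Max, len = Len}) when Len >= Max ->
    {dropped, Len, #buf{max = Max, len = 1, items = [Msg]}};
push(Msg, Buf = #buf{len = Len, items = Items}) ->
    {ok, Buf#buf{len = Len + 1, items = [Msg | Items]}}.

%% Pop always returns the most recently pushed message (LIFO).
pop(Buf = #buf{items = []}) ->
    {empty, Buf};
pop(Buf = #buf{len = Len, items = [Msg | Rest]}) ->
    {ok, Msg, Buf#buf{len = Len - 1, items = Rest}}.

len(#buf{len = Len}) ->
    Len.
```

The buffer process would hold this structure in its state, call `push/2` for each incoming message, and call `pop/1` whenever the worker asks for more work.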

A major downside of stack buffers is that messages are not necessarily going to be processed in the order they were submitted — they’re nicer for independent tasks, but will ruin your day if you expect a sequence of events to be respected.


[20] Erlang lists are stacks. For all we care, they provide push and pop operations that take O(1) complexity and are very fast.
[21] http://queue.acm.org/detail.cfm?id=2071893


Time-Sensitive Buffers


If you need to react to old events before they are too old, then things become more complex: you can’t know about them without looking deep in the stack each time, and constantly dropping from the bottom of the stack gets to be inefficient.


An interesting approach could be done with buckets, where multiple stacks are used, with each of them containing a given time slice. When requests get too old for the QoS constraints, drop an entire bucket, but not the entire buffer.
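The bucket idea could be sketched roughly as follows. Everything here is an assumption for illustration: the module name `bucket_buf`, the `SliceMs` and `MaxAgeMs` parameters, and the choice to have the caller pass in a non-negative, non-decreasing millisecond timestamp. Because whole buckets are dropped, a kept bucket may still hold messages up to one slice older than `MaxAgeMs`.

```erlang
-module(bucket_buf).
-export([new/1, push/3, pop/1, expire/3]).

%% State is {SliceMs, Buckets}. Buckets are {SliceId, Stack} pairs kept
%% newest-first; SliceMs is the width of one time slice in milliseconds.
new(SliceMs) when is_integer(SliceMs), SliceMs > 0 ->
    {SliceMs, []}.

%% NowMs is supplied by the caller and is assumed to never go backwards.
push(Msg, NowMs, {SliceMs, Buckets}) ->
    Slice = NowMs div SliceMs,
    case Buckets of
        [{Slice, Stack} | Rest] ->
            %% Same time slice as the newest bucket: push onto its stack.
            {SliceMs, [{Slice, [Msg | Stack]} | Rest]};
        _ ->
            %% First message of a new time slice: open a new bucket.
            {SliceMs, [{Slice, [Msg]} | Buckets]}
    end.

%% Pop the newest message from the newest bucket.
pop({_SliceMs, []} = Buf) ->
    {empty, Buf};
pop({SliceMs, [{_Slice, [Msg]} | Rest]}) ->
    {ok, Msg, {SliceMs, Rest}};
pop({SliceMs, [{Slice, [Msg | Stack]} | Rest]}) ->
    {ok, Msg, {SliceMs, [{Slice, Stack} | Rest]}}.

%% Drop every bucket whose slice ended before NowMs - MaxAgeMs and
%% report how many messages were thrown away.
expire(NowMs, MaxAgeMs, {SliceMs, Buckets}) ->
    Oldest = (NowMs - MaxAgeMs) div SliceMs,
    {Keep, Drop} = lists:splitwith(fun({Slice, _}) -> Slice >= Oldest end,
                                   Buckets),
    Dropped = lists:sum([length(Stack) || {_, Stack} <- Drop]),
    {Dropped, {SliceMs, Keep}}.
```

Since buckets are ordered by time, expiring old requests only walks the bucket list, never the individual messages inside the buckets that are kept.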


It may sound counter-intuitive to make some requests a lot worse to benefit the majority (you’ll have great medians but poor 99th percentiles), but this happens in a state where you would drop messages anyway, and is preferable in cases where you do need low latency.

Dealing With Constant Overload


Being under constant overload may require a new solution. Whereas both queues and buffers will be great for cases where overload happens from time to time (even if it’s a rather prolonged period of time), they both work more reliably when you expect the input rate to eventually drop, letting you catch up.


You’ll mostly run into problems when you try to send so many messages that they can’t all make it to one process without overloading it.


Two approaches are generally good for this case:
 • Have many processes that act as buffers and load-balance through them (scale horizontally)
 • Use ETS tables as locks and counters (reduce the input)


ETS tables are generally able to handle a ton more requests per second than a process, but the operations they support are a lot more basic. A single read, or adding or removing from a counter atomically is as fancy as you should expect things to get for the general case.
ETS tables will be required for both approaches. Generally speaking, the first approach could work well with the regular process registry: you take N processes to divide up the load, give them all a known name, and pick one of them to send the message to. Given you’re pretty much going to assume you’ll be overloaded, randomly picking a process with an even distribution tends to be reliable: no state communication is required, work will be shared in a roughly equal manner, and it’s rather insensitive to failure.


In practice, though, we want to avoid atoms generated dynamically, so I tend to prefer to register workers in an ETS table with read_concurrency set to true.
It’s a bit more work, but it gives more flexibility when it comes to updating the number of workers later on.
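A minimal sketch of such an ETS-backed worker registry might look like this. The module name `worker_pool`, the integer keys, and the `size` entry are all assumptions made for the example, and the table is owned by whichever process calls `init/1`.

```erlang
-module(worker_pool).
-export([init/1, set_workers/2, pick/1, send/2]).

%% Create a named, public table tuned for many concurrent readers.
init(Table) when is_atom(Table) ->
    _ = ets:new(Table, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Store the workers under integer keys 1..N, plus the count under 'size'.
%% Inserting a list of objects into a set table is atomic, so readers see
%% either the old layout or the new one.
set_workers(Table, Pids) when Pids =/= [] ->
    N = length(Pids),
    ets:insert(Table, [{size, N} | lists:zip(lists:seq(1, N), Pids)]).

%% Pick a worker uniformly at random: no shared state beyond the table.
pick(Table) ->
    [{size, N}] = ets:lookup(Table, size),
    [{_Index, Pid}] = ets:lookup(Table, rand:uniform(N)),
    Pid.

send(Table, Msg) ->
    pick(Table) ! Msg,
    ok.
```

Changing the number of workers is a single call to `set_workers/2`. If the pool shrinks, stale entries above the new `size` simply stop being picked.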


An approach similar to this one is used in the lhttpc [22] library mentioned earlier, to split load balancers on a per-domain basis.


For the second approach, using counters and locks, the same basic structure still remains (pick one of many options, send it a message), but before actually sending a message, you must atomically update an ETS counter [23]. There is a known limit shared across all clients (either through their supervisor, or any other config or ETS value) and each request that can be made to a process needs to clear this limit first.
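The counter check could be sketched along these lines. The module name `ets_limit`, the `in_flight` key, and the `Limit` argument are assumptions for this example, and this is not how dispcount itself is written. The sketch increments first and backs out if the limit was exceeded.

```erlang
-module(ets_limit).
-export([init/1, acquire/2, release/1]).

%% Create the table and the shared counter, starting at zero.
init(Table) when is_atom(Table) ->
    _ = ets:new(Table, [named_table, public, set, {write_concurrency, true}]),
    true = ets:insert(Table, {in_flight, 0}),
    ok.

%% Atomically bump the counter before sending the request. If the new
%% value is over the limit, undo the increment and refuse right away,
%% so the caller never waits in a message queue to learn it was denied.
acquire(Table, Limit) ->
    case ets:update_counter(Table, in_flight, {2, 1}) of
        N when N =< Limit ->
            ok;
        _ ->
            ets:update_counter(Table, in_flight, {2, -1}),
            {error, overload}
    end.

%% Must be called once the request that acquired a slot is finished.
release(Table) ->
    ets:update_counter(Table, in_flight, {2, -1}),
    ok.
```

A client would call `acquire/2`, send its message only on `ok`, and call `release/1` when the reply arrives or the request is abandoned.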


This approach has been used in dispcount [24] to avoid message queues, and to guarantee low-latency responses to any message that won’t be handled so that you do not need to wait to know your request was denied. It is then up to the user of the library whether to give up as soon as possible, or to keep retrying with different workers.


[22] The lhttpc_lb module in this library implements it.
[23] By using ets:update_counter/3.
[24] https://github.com/ferd/dispcount


How Do You Drop


Most of the solutions outlined here work based on message quantity, but it’s also possible to try and do it based on message size, or expected complexity, if you can predict it. When using a queue or stack buffer, instead of counting entries, all you may need to do is count their size or assign them a given load as a limit.
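As a small illustration of limiting by size rather than by count, the sketch below tracks the total number of bytes waiting in a stack buffer of binaries. The module name `size_buf`, the `MaxBytes` limit, and the policy of rejecting the incoming message when the limit would be exceeded are all assumptions chosen for brevity.

```erlang
-module(size_buf).
-export([new/1, push/2, bytes/1]).

%% State is {MaxBytes, BytesUsed, Items}.
new(MaxBytes) when is_integer(MaxBytes), MaxBytes > 0 ->
    {MaxBytes, 0, []}.

%% Accept the message only if its size still fits under the byte limit.
push(Bin, {Max, Used, Items}) when is_binary(Bin) ->
    Size = byte_size(Bin),
    case Used + Size =< Max of
        true  -> {ok, {Max, Used + Size, [Bin | Items]}};
        false -> {dropped, {Max, Used, Items}}
    end.

bytes({_Max, Used, _Items}) ->
    Used.
```

The same shape works for an estimated cost instead of a byte size: replace `byte_size/1` with whatever function assigns a load to each message.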


I’ve found that in practice, dropping without regard to the specifics of the message works rather well, but each application has its share of unique compromises that can be acceptable or not [25].


There are also cases where the data is sent to you in a "fire and forget" manner — the entire system is part of an asynchronous pipeline — and it proves difficult to provide feedback to the end-user about why some requests were dropped or are missing.


If you can reserve a special type of message that accumulates dropped responses and tells the user "N messages were dropped for reason X", that can, on its own, make the compromise far more acceptable to the user.
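One way to accumulate such a summary is sketched below. The module name `drop_report`, the use of a map keyed by reason, and the exact wording of the generated message are assumptions for the example. This is not logplex's implementation.

```erlang
-module(drop_report).
-export([new/0, note/3, flush/1]).

%% Drops are accumulated in a map of Reason => Count.
new() ->
    #{}.

%% Record that N more messages were dropped for Reason.
note(Reason, N, Drops) when is_integer(N), N > 0 ->
    maps:update_with(Reason, fun(Old) -> Old + N end, N, Drops).

%% Turn the accumulated counts into one summary message per reason and
%% reset the accumulator. Sorting keeps the output order stable.
flush(Drops) ->
    Msgs = [iolist_to_binary(
              io_lib:format("~b messages were dropped for reason ~p",
                            [N, Reason]))
            || {Reason, N} <- lists:sort(maps:to_list(Drops))],
    {Msgs, new()}.
```

The buffer process would call `note/3` every time it sheds load, then call `flush/1` on a timer and forward the resulting messages down the same pipeline as regular data.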


This is the choice that was made with Heroku’s logplex log routing system, which can spit out L10 errors, alerting the user that a part of the system can’t deal with all the volume right now.


In the end, what is acceptable or not to deal with overload tends to depend on the humans that use the system. It is often easier to bend the requirements a bit than develop new technology, but sometimes it is just not avoidable.


[25] Old papers such as Hints for Computer System Design by Butler W. Lampson recommend dropping messages: "Shed load to control demand, rather than allowing the system to become overloaded." The paper also mentions that "A system cannot be expected to function well if the demand for any resource exceeds two-thirds of the capacity, unless the load can be characterized extremely well." It adds that "The only systems in which cleverness has worked are those with very well-known loads."

