Discarding Data


When nothing can slow down outside of your Erlang system and things can’t be scaled up, you must either drop data or crash (which drops data that was in flight, for most cases, but with more violence).


It’s a sad reality that nobody really wants to deal with. Programmers, software engineers, and computer scientists are trained to purge the useless data, and keep everything that’s useful. Success comes through optimization, not giving up.


However, a point can be reached where data comes in faster than it goes out, even if the Erlang system on its own is able to do everything fast enough. In some cases, it’s the component after it that blocks.


If you don’t have the option of limiting how much data you receive, you then have to drop messages to avoid crashing.


Random Drop


Randomly dropping messages is the easiest way to do such a thing, and might also be the most robust implementation, due to its simplicity.
The trick is to define some threshold value between 0.0 and 1.0 and to fetch a random number in that range:



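-module(drop).
-export([random/1]).

random(Rate) ->
    maybe_seed(),
    random:uniform() =< Rate.  %% true when the random number is at or below the threshold

maybe_seed() ->
    case get(random_seed) of
        %% no seed yet in the process dictionary: create one
        undefined -> random:seed(erlang:now());
        %% degenerate seed (three identical values): replace it
        {X,X,X} -> random:seed(erlang:now());
        %% a usable seed is already there: keep it
        _ -> ok
    end.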


If you aim to keep 95% of the messages you send, the check could be written as case drop:random(0.95) of true -> send(); false -> drop() end, or as the shorter drop:random(0.95) andalso send() if you don’t need to do anything specific when dropping a message.

case drop:random(0.95) of
    true -> send();
    false -> drop()
end.


drop:random(0.95) andalso send().


The maybe_seed() function checks whether a valid seed is present in the process dictionary and, if so, keeps using it. It seeds the generator only when no seed has been defined yet, or when the stored one is a crappy one, in order to avoid calling now() (a monotonic function that requires a global lock) too often.
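
On newer Erlang/OTP releases, both the random module and erlang:now/0 are deprecated, and the rand module seeds each process automatically on first use. A minimal sketch of the same drop check on top of rand could look like the following. The module name drop_rand and the helper maybe_send/3 are hypothetical and only serve the illustration.

```erlang
-module(drop_rand).
-export([random/1, maybe_send/3]).

%% Returns true when the message should be kept.
%% rand:uniform/0 yields a float X with 0.0 =< X < 1.0, so a Rate of 0.0
%% never keeps a message and a Rate of 1.0 always does.
random(Rate) when is_number(Rate), Rate >= 0, Rate =< 1 ->
    rand:uniform() < Rate.

%% Hypothetical helper: send Msg to Pid with probability Rate.
maybe_send(Rate, Pid, Msg) ->
    case random(Rate) of
        true -> Pid ! Msg, sent;
        false -> dropped
    end.
```

Because no explicit seeding is involved, there is no equivalent of maybe_seed() to maintain in this variant.
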
There is one ‘gotcha’ to this method, though: the random drop must ideally be done at the producer level rather than at the queue (the receiver) level.


The best way to avoid overloading a queue is to not send data its way in the first place. Because there are no bounded mailboxes in Erlang, dropping in the receiving process only guarantees that this process will be spinning wildly, trying to get rid of messages, and fighting the schedulers to do actual work.

On the other hand, dropping at the producer level is guaranteed to distribute the work equally across all processes.


This can give rise to interesting optimizations where the working process or a given monitor process [15] uses values in an ETS table or application:set_env/3 to dynamically increase and decrease the threshold to be used with the random number.

This allows control over how many messages are dropped based on overload, and the configuration data can be fetched by any process rather efficiently by using application:get_env/2.
Similar techniques could also be used to implement different drop ratios for different message priorities, rather than trying to sort it all out at the consumer level.
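
As a rough sketch of that idea, the module below stores one keep rate per message priority in the application environment and lets a monitor adjust it from the worker's mailbox length. The module name drop_env, the application name drop_demo, the key keep_rates, the priority atoms, and the 0.5 rate used under overload are all assumptions made for the example. It reads the value with application:get_env/3 (the variant that takes a default) rather than get_env/2.

```erlang
-module(drop_env).
-export([set_rate/2, rate/1, random/1, adjust/3]).

-define(APP, drop_demo).    % hypothetical application name
-define(KEY, keep_rates).   % hypothetical key holding #{Priority => Rate}

%% Meant to be called from a single monitor (or worker) process, since the
%% read-modify-write below is not atomic.
set_rate(Priority, Rate) when is_number(Rate), Rate >= 0, Rate =< 1 ->
    Rates = application:get_env(?APP, ?KEY, #{}),
    application:set_env(?APP, ?KEY, Rates#{Priority => Rate}).

%% Any producer can read the current rate; unknown priorities keep everything.
rate(Priority) ->
    maps:get(Priority, application:get_env(?APP, ?KEY, #{}), 1.0).

%% true means "send the message", false means "drop it".
random(Priority) ->
    rand:uniform() < rate(Priority).

%% One monitoring step: thin out Priority traffic while the worker's
%% mailbox holds MaxLen messages or more, and restore it otherwise.
adjust(Worker, Priority, MaxLen) ->
    case process_info(Worker, message_queue_len) of
        {message_queue_len, Len} when Len >= MaxLen -> set_rate(Priority, 0.5);
        {message_queue_len, _} -> set_rate(Priority, 1.0);
        undefined -> ok   % the worker has already exited
    end.
```

A producer would then guard its sends with something like drop_env:random(low) andalso send(), while a monitor calls drop_env:adjust(WorkerPid, low, 1000) on a timer.
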


[15] Any process tasked with checking the load of specific processes using heuristics such as process_info(Pid, message_queue_len) could be a monitor.


Queue Buffers


Queue buffers are a good alternative when you want more control over the messages you get rid of than with random drops, particularly when you expect overload to be coming in bursts rather than a constant stream in need of thinning.


Even though the regular mailbox for a process has the form of a queue, you’ll generally want to pull all the messages out of it as soon as possible. A queue buffer will need two processes to be safe:
• The regular process you’d work with (likely a gen_server);
• A new process that will do nothing but buffer the messages. Messages from the outside should go to this process.

To make things work, the buffer process only has to remove all the messages it can from its mailbox and put them in a queue data structure [16] it manages on its own.


Whenever the server is ready to do more work, it can ask the buffer process to send it a given number of messages that it can work on. The buffer process picks them from its queue, forwards them to the server, and goes back to accumulating data.


Whenever the queue grows beyond a certain size [17] and you receive a new message, you can then pop the oldest one and push the new one in there, dropping the oldest elements as you go. [18]


This should keep the total number of buffered messages at a fairly stable size and provide a good amount of resistance to overload, somewhat similar to the functional version of a ring buffer.
The PO Box [19] library implements such a queue buffer.
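
To make the moving parts concrete, here is a minimal sketch of such a buffer process. It is not how PO Box is implemented. The module name qbuf, its message formats, and the 5 second timeout are assumptions for the example. It keeps a counter next to the queue instead of calling queue:len/1, drops the oldest element when full, and hands messages back as a reply to the server's request.

```erlang
-module(qbuf).
-export([start/1, post/2, take/2]).

%% Start a buffer process that keeps at most Max messages.
%% It is neither linked nor supervised here, to keep the sketch short.
start(Max) when is_integer(Max), Max > 0 ->
    spawn(fun() -> loop(queue:new(), 0, Max) end).

%% Producers send their messages to the buffer, never to the server.
post(Buf, Msg) ->
    Buf ! {post, Msg},
    ok.

%% The server asks for up to N messages when it is ready for more work.
take(Buf, N) when is_integer(N), N >= 0 ->
    Ref = make_ref(),
    Buf ! {take, self(), Ref, N},
    receive
        {Ref, Msgs} -> Msgs
    after 5000 ->
        exit(timeout)
    end.

%% Len mirrors the queue length so it never has to be recomputed.
loop(Q, Len, Max) ->
    receive
        {post, Msg} when Len < Max ->
            loop(queue:in(Msg, Q), Len + 1, Max);
        {post, Msg} ->
            %% Full: drop the oldest element, then enqueue the new one.
            loop(queue:in(Msg, queue:drop(Q)), Len, Max);
        {take, From, Ref, N} ->
            Count = min(N, Len),
            {Out, Rest} = queue:split(Count, Q),
            From ! {Ref, queue:to_list(Out)},
            loop(Rest, Len - Count, Max)
    end.
```

With a buffer started as qbuf:start(3), posting the integers 1 through 5 from one process and then calling qbuf:take(Buf, 10) from that same process returns [3,4,5], since 1 and 2 were dropped as the oldest entries.
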

[16] The queue module in Erlang provides a purely functional queue data structure that can work fine for such a buffer.
[17] To calculate the length of a queue, it is preferable to use a counter that gets incremented and decremented on each message sent or received, rather than iterating over the queue every time. It takes slightly more memory, but will tend to distribute the load of counting more evenly, helping predictability and avoiding more sudden build-ups in the buffer’s mailbox.
[18] You can alternatively make a queue that pops the newest message and queues up the oldest ones if you feel previous data is more important to keep.
[19] Available at: https://github.com/ferd/pobox. The library has been used in production for a long time in large scale products at Heroku and is considered mature.

