Solving UDP packet loss on a server
2016-08-25 17:33
Severe UDP packet loss
While looking after a UDP-based service, I noticed that we were losing a significant number of inbound packets. The first place to look is netstat(8): the -s option prints statistics for each protocol, and adding -u restricts the output to UDP (or -t to TCP).
Example output of netstat -su:
    $ netstat -su
    Udp:
        2829651752 packets received
        27732564 packets to unknown port received.
        1629462811 packet receive errors
        179722143 packets sent
The output shows the total number of UDP packets received and sent, plus two extra counters. The second line counts UDP packets that were sent to a port with no listening socket, and the third line counts packets that the kernel dropped.
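To put those counters in proportion, here is a minimal Python sketch that parses the Udp section of the output above and estimates what fraction of inbound datagrams hit a receive error. The function names are my own, and the calculation assumes that "packets received" and "packet receive errors" count disjoint sets of datagrams, so treat the result as a rough indicator rather than an exact loss rate.

```python
import re

# The Udp section copied from the netstat -su output shown above.
SAMPLE = """\
Udp:
    2829651752 packets received
    27732564 packets to unknown port received.
    1629462811 packet receive errors
    179722143 packets sent
"""

def parse_udp_stats(text):
    """Return {label: count} for the counter lines under the 'Udp:' header."""
    stats = {}
    in_udp = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Section headers such as "Udp:" or "UdpLite:" end with a colon.
        if stripped.endswith(":") and not stripped[0].isdigit():
            in_udp = (stripped == "Udp:")
            continue
        match = re.match(r"(\d+)\s+(.*?)\.?$", stripped)
        if in_udp and match:
            stats[match.group(2)] = int(match.group(1))
    return stats

def receive_error_fraction(stats):
    """Errors as a share of (received + errors); assumes the two are disjoint."""
    received = stats["packets received"]
    errors = stats["packet receive errors"]
    total = received + errors
    return errors / total if total else 0.0

stats = parse_udp_stats(SAMPLE)
print(f"{receive_error_fraction(stats):.1%} of inbound datagrams hit a receive error")
```

On a live system you could feed it the text of `netstat -su` instead of the sample string.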
Each socket has two fixed-size buffers between the kernel and the application: one for receiving data and one for sending it. When the application fails to read from the receive buffer fast enough, the buffer fills up and further packets are discarded, incrementing the receive error counter.
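The effect is easy to reproduce on a loopback interface. The sketch below (my own illustration, not part of the original investigation) sends a burst of datagrams to a socket that nobody is reading from, then counts how many survived in the receive buffer. The burst size, payload size and buffer sizes are arbitrary choices, and the exact counts depend on the kernel and its limits, so no particular numbers should be expected.

```python
import socket

def count_delivered(rcvbuf_bytes, datagrams=2000, payload_size=1024):
    """Send a burst to an unread loopback socket, then count what was kept."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request a receive buffer size; the kernel may adjust or cap the value.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    receiver.settimeout(0.2)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    # Nothing reads while this loop runs, so the receive buffer can overflow.
    for _ in range(datagrams):
        sender.sendto(payload, receiver.getsockname())

    delivered = 0
    try:
        while True:
            receiver.recv(payload_size)
            delivered += 1
    except socket.timeout:
        pass  # buffer drained: everything still queued has been counted
    finally:
        sender.close()
        receiver.close()
    return delivered

for size in (4096, 1048576):
    kept = count_delivered(size)
    print(f"SO_RCVBUF request {size:>8}: {kept} of 2000 datagrams delivered")
```

The datagrams that are not delivered are the ones that would show up as packet receive errors in netstat.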
As no technical blog post is complete without a pretty graph, below is a graph generated using Munin, showing the UDP traffic flowing on one particular system:
![](http://m0dlx.com/~dominic/netstat_udp_graph.png)
In the above graph, the dominant line shows received packets and the turquoise line lower down shows packet receive errors.
On Linux, the buffer sizes are controlled by a group of sysctl parameters, with rmem* covering receive buffers and wmem* covering send buffers:
net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max
On a Debian Etch system, the maximum is 131071 bytes (about 128 kB) and the default is 122880 bytes (120 kB). I've shown them here using the sysctl(8) tool.

    $ sysctl net.core | grep '[rw]mem'
    net.core.wmem_max = 131071
    net.core.rmem_max = 131071
    net.core.wmem_default = 122880
    net.core.rmem_default = 122880
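The same values are exposed as files under /proc/sys/net/core on Linux, which is handy if you want to check them from a monitoring script. A minimal sketch, with a function name of my own choosing, that reports "unavailable" on systems where the files do not exist:

```python
from pathlib import Path

PARAMS = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")

def read_core_buffer_sizes(base="/proc/sys/net/core"):
    """Return {parameter: bytes}, or None for a parameter that cannot be read."""
    sizes = {}
    for name in PARAMS:
        try:
            sizes[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            sizes[name] = None  # not Linux, or the file is not readable
    return sizes

for name, value in read_core_buffer_sizes().items():
    shown = "unavailable" if value is None else f"{value} bytes ({value / 1024:.0f} KiB)"
    print(f"net.core.{name} = {shown}")
```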
Using sysctl, you can update the values of these parameters with the -w option:
    $ sudo sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=1048576
    net.core.rmem_max = 1048576
    net.core.rmem_default = 1048576
Newly created sockets in any application now get the larger buffers by default, which gives the application a little more headroom, provided it has no other bottlenecks limiting its throughput. It's also possible to raise only the maximum and then have the application set its own socket buffer size; see socket(7) for more info.
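For the second approach, the application requests a size with the SO_RCVBUF socket option. A minimal Python sketch follows; the 1048576 figure simply reuses the value from the sysctl example. Per socket(7), Linux doubles the requested value to allow for bookkeeping overhead and caps the request at net.core.rmem_max, so it is worth reading the option back to see what was actually granted.

```python
import socket

REQUESTED = 1048576  # bytes; reuses the figure from the sysctl example above

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Ask for a larger receive buffer on this one socket only.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)

# Read the option back: the kernel may have capped or adjusted the request.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {REQUESTED} bytes, kernel reports {granted} bytes")

sock.close()
```

If the reported size is far below the request, net.core.rmem_max is probably still too low.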
In our case, you can clearly see on the graph that the problem has been solved for a few days. We had to apply two changes:
- Increasing the socket buffer size, which was done through the application config (after increasing the net.core.rmem_max parameter and leaving rmem_default alone)
- Tweaking the application to increase its throughput, using more controlled buffering internally rather than relying on the kernel socket buffering
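The post does not describe how the application's internal buffering was implemented. One common shape, shown here purely as an illustrative sketch with hypothetical names, is a dedicated thread that does nothing but drain the socket into an in-process queue, leaving the slower processing to other code:

```python
import queue
import socket
import threading

def start_receiver(sock, backlog, stop):
    """Drain the socket as fast as possible; no slow work on this thread."""
    def loop():
        while not stop.is_set():
            try:
                data, _addr = sock.recvfrom(65535)
            except socket.timeout:
                continue  # wake up regularly to check the stop flag
            except OSError:
                break     # socket was closed
            try:
                backlog.put_nowait(data)
            except queue.Full:
                pass  # application-level drop; count it here if metrics are needed
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))  # loopback and an ephemeral port, for the demo only
sock.settimeout(0.1)
backlog = queue.Queue(maxsize=10000)  # arbitrary bound on the in-process backlog
stop = threading.Event()
receiver = start_receiver(sock, backlog, stop)

# Demo traffic: five datagrams sent to ourselves.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(5):
    sender.sendto(f"message {i}".encode(), sock.getsockname())

# The "slow" consumer side simply pulls from the queue at its own pace.
received = [backlog.get(timeout=1) for _ in range(5)]

stop.set()
receiver.join()
sender.close()
sock.close()
print(received)
```

The point of the design is that any drops now happen inside the application, where they can be counted and where the backlog limit is under your control.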
Only one packet has been lost since the changes were made, which is an acceptable error rate for this application given its throughput.
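To keep an eye on the loss after a change like this, you can compare the kernel's UDP counters at two points in time. On Linux they are available in /proc/net/snmp as a header row and a value row that both start with "Udp:". The sketch below uses two made-up samples rather than real data, and the helper names are my own.

```python
# Two illustrative samples in /proc/net/snmp format; the numbers are invented.
BEFORE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000 5 40 900 40 0
"""
AFTER = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1600 5 41 1400 41 0
"""

def parse_udp_counters(snmp_text):
    """Pair the 'Udp:' header row with the 'Udp:' value row."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Udp:")]
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, map(int, values)))

def new_receive_errors(before_text, after_text):
    """Receive errors that occurred between the two samples."""
    before = parse_udp_counters(before_text)
    after = parse_udp_counters(after_text)
    return after["InErrors"] - before["InErrors"]

print(new_receive_errors(BEFORE, AFTER), "receive error(s) between samples")
```

On a real system, read the file once before and once after the interval you care about and pass both texts to `new_receive_errors`.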