The event in requested index is outdated and cleared (the requested history has been cleared): problem and fix
2017-06-02 11:21
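Symptom
Under high etcd load, a client watching etcd receives a 401 error.
The error reads: ERROR: watch error 401: The event in requested index is outdated and cleared (the requested history has been cleared [15047837/15040498]) [15048836]
etcd then disconnects the client's watcher. Events whose index lies between the modifiedIndex last seen before the disconnect and the latestIndex at which the watch is re-established are lost, so changes go missing in production.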
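Root cause analysis
At the design level
etcd keeps only the most recent 1000 events across all etcd keys. When the index a watch starts from is no longer among those 1000 events, etcd returns the index-outdated error shown above.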
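The retention rule can be illustrated with a toy model. This is a minimal sketch, not etcd's actual implementation: the history type and its record and checkWatch methods are hypothetical names, and only the window size of 1000 and the 401 error text come from the description above. The bracketed pair in the simulated error is modeled on the log line in the symptom section, where the two numbers appear to be the oldest retained index and the requested index.

```go
package main

import (
	"errors"
	"fmt"
)

// errIndexCleared stands in for etcd's 401 "event index cleared" error.
var errIndexCleared = errors.New("401: the event in requested index is outdated and cleared")

// history is a hypothetical model of a bounded event history.
// It tracks only index bounds, not the events themselves.
type history struct {
	capacity uint64 // number of events retained (1000 in the post)
	oldest   uint64 // oldest index still retained
	latest   uint64 // newest index recorded
}

// record registers one new event and evicts the oldest one if the
// window is full. It returns the index assigned to the new event.
func (h *history) record() uint64 {
	h.latest++
	if h.oldest == 0 {
		h.oldest = 1
	}
	if h.latest-h.oldest+1 > h.capacity {
		h.oldest = h.latest - h.capacity + 1
	}
	return h.latest
}

// checkWatch reports whether a watch starting at waitIndex can still be
// served from the retained window.
func (h *history) checkWatch(waitIndex uint64) error {
	if waitIndex < h.oldest {
		return fmt.Errorf("%w [%d/%d]", errIndexCleared, h.oldest, waitIndex)
	}
	return nil
}

func main() {
	h := &history{capacity: 1000}
	for i := 0; i < 1500; i++ {
		h.record()
	}
	fmt.Println("watch from 400:", h.checkWatch(400)) // already evicted
	fmt.Println("watch from 501:", h.checkWatch(501)) // still retained
}
```

In this model a watcher is safe only while the index it asks for stays within capacity events of the newest one.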
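At the code level:
watchOnce in the go-etcd client re-watches from the current event's modifiedIndex + 1 each time it receives an event.
The watch events were not handled asynchronously, so the watch blocks until the handler returns. When cluster load is high or the handler is slow, modifiedIndex + 1 may no longer be among the latest 1000 indexes by the time the client re-watches, which produces the index-outdated error above.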
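How a blocking handler falls behind can be shown with a small deterministic simulation. It is a sketch under stated assumptions rather than a reproduction of go-etcd: simulateBlockingWatch and its parameters are hypothetical, time is modeled as "number of writes the cluster accepts while one handler call runs" (assumed to be at least 1), and the window size is passed in as capacity.

```go
package main

import "fmt"

// simulateBlockingWatch models a client that handles every event
// synchronously and then re-watches from modifiedIndex + 1.
// writesPerHandle is the number of new events the cluster records while
// one handler call is running. The function returns the index at which
// the re-watch is no longer inside the retained window, or 0 if that
// never happens before maxEvents events have been consumed.
func simulateBlockingWatch(capacity, writesPerHandle, maxEvents uint64) uint64 {
	var latest uint64 = 1 // newest index on the server
	var next uint64 = 1   // index the client will watch from
	for next <= maxEvents {
		// Oldest index the server still retains.
		var oldest uint64 = 1
		if latest > capacity {
			oldest = latest - capacity + 1
		}
		if next < oldest {
			return next // the server would answer with error 401 here
		}
		// The client receives event `next`; while its handler blocks,
		// the cluster keeps accepting writes.
		latest += writesPerHandle
		next++
	}
	return 0
}

func main() {
	// Handler keeps pace with the write rate: the watch never expires.
	fmt.Println("keeps up, failing index:", simulateBlockingWatch(1000, 1, 5000))
	// Handler is half as fast as the write rate: the client drifts out
	// of the 1000-event window.
	fmt.Println("too slow, failing index:", simulateBlockingWatch(1000, 2, 100000))
}
```

The gap between the server's newest index and the client's next index grows by writesPerHandle - 1 per event, so any handler slower than the write rate eventually leaves the window.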
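Optimization
Two ideas:
Make the event handler asynchronous so that it no longer blocks the etcd watch. This is expected to significantly reduce lost changes.
After an event is detected, fetch the data with a get rather than relying solely on the watch result. This does not by itself prevent the index from becoming outdated, but it can fully compensate for the change events that were lost.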
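Both ideas can be combined in one watch loop, sketched below. Everything here is an assumption made for illustration: the store interface is a hypothetical stand-in for an etcd client (it is not the go-etcd API), fakeStore exists only so the example runs without a cluster, and errStop is an artificial way to end the loop. The loop hands events to a channel instead of handling them inline, and on an index-cleared error it performs a get, resynchronizes from the returned state, and resumes watching after the index that the get reflects.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errIndexCleared = errors.New("401: event index cleared")
	errStop         = errors.New("no more events in fake store")
)

type event struct {
	Key, Value string
	Index      uint64
}

// store is a hypothetical minimal view of an etcd client.
type store interface {
	// Watch returns the event at waitIndex, or errIndexCleared if that
	// index has been evicted from the history.
	Watch(waitIndex uint64) (event, error)
	// Get returns the current state and the index that state reflects.
	Get() (map[string]string, uint64, error)
}

// watchLoop never runs the handler inline: events are only handed off to
// the events channel, and a 401 is compensated by a full Get.
func watchLoop(s store, start uint64, events chan<- event, resync func(map[string]string)) error {
	next := start
	for {
		ev, err := s.Watch(next)
		switch {
		case errors.Is(err, errIndexCleared):
			state, idx, gerr := s.Get()
			if gerr != nil {
				return gerr
			}
			resync(state) // rebuild local state from the snapshot
			next = idx + 1
		case err != nil:
			return err
		default:
			events <- ev // blocks only if the buffer is full
			next = ev.Index + 1
		}
	}
}

// fakeStore is an in-memory store used only to make the sketch runnable.
type fakeStore struct {
	oldest uint64  // index of log[0]
	log    []event // retained events, in index order
	state  map[string]string
}

func (f *fakeStore) Watch(waitIndex uint64) (event, error) {
	if waitIndex < f.oldest {
		return event{}, errIndexCleared
	}
	i := waitIndex - f.oldest
	if i >= uint64(len(f.log)) {
		return event{}, errStop
	}
	return f.log[i], nil
}

func (f *fakeStore) Get() (map[string]string, uint64, error) {
	return f.state, f.oldest + uint64(len(f.log)) - 1, nil
}

func main() {
	s := &fakeStore{
		oldest: 5,
		log:    []event{{"a", "1", 5}, {"a", "2", 6}, {"a", "3", 7}},
		state:  map[string]string{"a": "3"},
	}
	events := make(chan event, 16)
	done := make(chan struct{})
	go func() { // asynchronous handler
		for ev := range events {
			fmt.Println("handled", ev.Key, ev.Value, ev.Index)
		}
		close(done)
	}()
	err := watchLoop(s, 6, events, func(m map[string]string) { fmt.Println("resynced", m) })
	close(events)
	<-done
	fmt.Println("watch loop ended:", err)
}
```

A buffered channel only postpones blocking: if the handler stays slower than the event rate, the buffer fills and the loop stalls again, which is why the get-based resync is kept as the fallback.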