您的位置:首页 > 其它

srs之直播数据分析

2016-03-03 09:18 281 查看
本文转载
客服接到客户投诉,为毛直播看不了?为毛一直在卡?比较常见的答案是:农民工又挖断光纤了,黑客又攻击系统了。有毛线办法?看过FMS日志的都知道,里面没有任何有用的信息。俺们公司的大码农当年在某CDN时,就下决心再也不给农民工朋友摸黑,也不动用黑客攻击的大名。光纤挖断是有的,黑客攻击也是有的,问题是光纤没有断,黑客没有光顾时,卡和流不能看的概率也很大,服务器日志能给予原因吗?不能,这就是为何FMS日志,或其他不能告知原因的日志,球一样的原因了。CDN会用到多台服务器,或者说是服务器集群,再加上调度那就是多个集群之间的调用和依赖运行。SRS(Simple
Rtmp Sever)是流媒体服务器,因此我只考虑流媒体系统的日志应该怎么设计,才能发现问题所在?
先考虑常见系统的日志,FMS的日志是这样的:2015-03-2412:23:58
3409 (s)2641173 Accepted a connection from IP:192.168.1.141, referrer:http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.23,pageurl: http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935- 702111234525315439
3130 3448 normal livestream - - rtmp://192.168.1.185:1935/live/livestream rtmp://192.168.1.185:1935/live/livestream - flv - - 0 - 0 0 - - http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935 -1 -1.000000 -crtmpd的日志确实很详细,可惜啥也看不粗来,特别是几千人在线的服务器:/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/netio/epoll/iohandlermanager.cpp:120Handlers
count changed: 15->16 IOHT_TCP_CARRIER/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/netio/epoll/tcpacceptor.cpp:185Client
connected: 192.168.1.141:54823 -> 192.168.1.173:1935/home/winlin/tools/crtmpserver.20130514.794/sources/applications/appselector/src/rtmpappprotocolhandler.cpp:83Selected
application: flvplayback (live)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:246Protocol
CTCP(17) <-> TCP(18) <-> [IR(19)] unregistered fromapplication: appselector/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:257Stream
NR(5) with name `` registered to application `flvplayback` from protocolIR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:268Stream
NR(5) with name `` unregistered from application `flvplayback` fromprotocol IR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:257Stream
NR(6) with name `` registered to application `flvplayback` from protocolIR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/protocols/rtmp/basertmpappprotocolhandler.cpp:1043Play
request for stream name `livestream`. Start: -2000; length: -1000/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:268Stream
NR(6) with name `` unregistered from application `flvplayback` fromprotocol IR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:257Stream
ONR4R(7) with name `livestream` registered to application `flvplayback`from protocol IR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/common/src/utils/buffering/iobuffer.cpp:110Unable
to read data. Size advertised by network layer was 262144. Permanenterror: 2/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/netio/epoll/tcpcarrier.cpp:78Unable
to read data. 192.168.1.141:54823 -> 192.168.1.173:1935 CTCP(17)<-> [TCP(18)] <-> IR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/netio/epoll/iohandlermanager.cpp:130Handlers
count changed: 16->15 IOHT_TCP_CARRIER/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/protocols/protocolmanager.cpp:45Enqueue
for delete for protocol [IR(19)]/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:268Stream
ONR4R(7) with name `livestream` unregistered from application`flvplayback` from protocol IR(19)/home/winlin/tools/crtmpserver.20130514.794/sources/thelib/src/application/baseclientapplication.cpp:246Protocol
[IR(19)] unregistered from application: flvplaybacknginx-rtmp的日志已经有进步了,能按照连接区分:2015/03/2411:42:01
[info] 7992#0: *3 client connected '192.168.1.141'2015/03/2411:42:01 [info] 7992#0:
*3 connect: app='live' args='' flashver='MAC17,0,0,134'swf_url='http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.23'tc_url='rtmp://192.168.1.173:1935/live'page_url='http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935'acodecs=3575
vcodecs=252 object_encoding=3, client: 192.168.1.141, server:0.0.0.0:19352015/03/2411:42:01
[info] 7992#0: *3 createStream, client: 192.168.1.141, server:0.0.0.0:19352015/03/2411:42:01
[info] 7992#0: *3 play: name='livestream' args='' start=0 duration=0reset=0 silent=0, client: 192.168.1.141, server: 0.0.0.0:19352015/03/2411:42:20
[info] 7992#0: *3 disconnect, client: 192.168.1.141, server:0.0.0.0:19352015/03/2411:42:20
[info] 7992#0: *3 deleteStream, client: 192.168.1.141, server:0.0.0.0:1935可惜nginx日志只能知道这个连接的信息,而上下文都不清楚,连接也就这些信息。而SRS(simple
rtmp sever)的日志,能知道播放连接对应的推流连接,会在后续打印出连接的摘要信息,并且可以在集群内任意追溯。假设我知道流名称,或者不知道流名称,反正任意信息,譬如我知道播放的链接会打一个"type=Play"的标记出来,就从这一点开始。假设从源站开始:[winlin@dev6
srs]$ grep -ina"type=Play" objs/srs.origin.log31:[2014-08-0610:09:34.671][trace][22288][107]
client identified, type=Play,stream_name=livestream, duration=-1.00发现有个107的链接播放了源站信息,查看它的日志:[winlin@dev6
srs]$ grep -ina"[107]" objs/srs.origin.log27:[2014-08-0610:09:34.589][trace][22288][107]
RTMP client ip=127.0.0.128:[2014-08-0610:09:34.591][trace][22288][107] complex
handshake success29:[2014-08-0610:09:34.631][trace][22288][107] connect app,
tcUrl=rtmp://dev:1935/live,pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935,swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.23,schema=rtmp, vhost=__defaultVhost__, port=1935,
app=live, args=(obj)30:[2014-08-06 10:09:34.631][trace][22288][107]edge-srs ip=192.168.1.159,
version=0.9.190, pid=22314, id=10831:[2014-08-0610:09:34.671][trace][22288][107]
client identified, type=Play,stream_name=livestream, duration=-1.0032:[2014-08-0610:09:34.671][trace][22288][107]
out chunk size to 6000033:[2014-08-0610:09:34.671][trace][22288][107] source
url=__defaultVhost__/live/livestream,ip=127.0.0.1, cache=1, is_edge=0, source_id=105[105]34:[2014-08-0610:09:34.672][trace][22288][107]
dispatch cached gop success. count=307,duration=451535:[2014-08-0610:09:34.672][trace][22288][107]
create consumer, queue_size=30.00, tba=44100,tbv=2536:[2014-08-0610:09:34.672][trace][22288][107]
ignored. set buffer length to 100037:[2014-08-0610:09:34.673][trace][22288][107]
input chunk size to 6000040:[2014-08-0610:09:44.748][trace][22288][107] -> PLA
time=10007, msgs=0, okbps=464,0,0,ikbps=3,0,041:[2014-08-0610:09:47.805][warn][22288][107][104]
client disconnect peer. ret=1004可以看到源id是105,关键字是source_id=105,查这个源:[winlin@dev6
srs]$ grep --color -ina"[105]" objs/srs.origin.log16:[2014-08-0610:09:30.331][trace][22288][105]
RTMP client ip=127.0.0.117:[2014-08-0610:09:30.331][trace][22288][105] srand
initialized the random.18:[2014-08-0610:09:30.332][trace][22288][105] simple
handshake success.19:[2014-08-0610:09:30.373][trace][22288][105]connectapp,tcUrl=rtmp://127.0.0.1:1936/live?vhost=__defaultVhost__,pageUrl=,swfUrl=,schema=rtmp,
vhost=__defaultVhost__, port=1936, app=live, args=null21:[2014-08-0610:09:30.417][trace][22288][105]
client identified, type=publish(FMLEPublish),stream_name=livestream, duration=-1.0022:[2014-08-0610:09:30.417][trace][22288][105]
out chunk size to 6000023:[2014-08-06 10:09:30.418][trace][22288][105]source
url=__defaultVhost__/live/livestream, ip=127.0.0.1, cache=1, is_edge=0,source_id=-1[-1]24:[2014-08-0610:09:30.466][trace][22288][105]
got metadata, width=768, height=320, vcodec=7,acodec=1025:[2014-08-06 10:09:30.466][trace][22288][105]46B
video sh, codec(7, profile=100, level=32, 0x0, 0kbps, 0fps, 0s)26:[2014-08-0610:09:30.466][trace][22288][105]
4B audio sh, codec(10, profile=1, 2channels,0kbps, 44100HZ), flv(16bits, 2channels, 44100HZ)33:[2014-08-06
10:09:34.671][trace][22288][107]source url=__defaultVhost__/live/livestream, ip=127.0.0.1, cache=1, is_edge=0,source_id=105[105]38:[2014-08-0610:09:40.732][trace][22288][105]
<- CPB time=10100, okbps=3,0,0,ikbps=332,0,0可见这个就是ingest的连接,即编码器推流连接。已经查到了源头。同时可以看到107这个其实是srs(simple
rtmp sever)的回源链接,关键字是edge-srs:30:[2014-08-0610:09:34.631][trace][22288][107]
edge-srs ip=192.168.1.159, version=0.9.190,pid=22314, id=108可以去边缘服务器上查它的信息,id是108:[winlin@dev6
srs]$ grep --color -ina"[108]" objs/srs.log29:[2014-08-06 10:09:34.579][trace][22314][108]edge
pull connected, can_publish=1, url=rtmp://dev:1935/live/livestream,server=127.0.0.1:193630:[2014-08-0610:09:34.591][trace][22314][108]
complex handshake success.31:[2014-08-0610:09:34.671][trace][22314][108] connected,
version=0.9.190, ip=127.0.0.1,pid=22288, id=10732:[2014-08-0610:09:34.672][trace][22314][108]
out chunk size to 6000033:[2014-08-0610:09:34.672][trace][22314][108] ignore
the disabled transcode:34:[2014-08-0610:09:34.672][trace][22314][108] edge change
from 100 to state 101 (pull).35:[2014-08-0610:09:34.672][trace][22314][108] input
chunk size to 6000036:[2014-08-0610:09:34.672][trace][22314][108] got metadata,
width=768, height=320, vcodec=7,acodec=1037:[2014-08-0610:09:34.672][trace][22314][108]
46B video sh, codec(7, profile=100, level=32,0x0, 0kbps, 0fps, 0s)38:[2014-08-0610:09:34.672][trace][22314][108]
4B audio sh, codec(10, profile=1, 2channels,0kbps, 44100HZ), flv(16bits, 2channels, 44100HZ)39:[2014-08-0610:09:34.779][trace][22314][107]
update source_id=108[108]46:[2014-08-0610:09:36.853][trace][22314][110] source
url=__defaultVhost__/live/livestream,ip=192.168.1.179, cache=1, is_edge=1, source_id=108[108]50:[2014-08-06
10:09:44.949][trace][22314][108]<- EIG time=10293, okbps=3,0,0, ikbps=441,0,053:[2014-08-0610:09:47.805][warn][22314][108][4]
origin disconnected, retry. ret=1007这个边缘服务器上这个回源链接有两个客户端连接上,107和110,关键字是source_id=108:[winlin@dev6
srs]$ grep --color -ina"[107]" objs/srs.log18:[2014-08-0610:09:34.281][trace][22314][107]
RTMP client ip=192.168.1.17919:[2014-08-0610:09:34.282][trace][22314][107] srand
initialized the random.20:[2014-08-0610:09:34.291][trace][22314][107] complex
handshake success21:[2014-08-0610:09:34.291][trace][22314][107]connectapp,tcUrl=rtmp://dev:1935/live,pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935,swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.23,schema=rtmp,
vhost=__defaultVhost__, port=1935, app=live, args=null22:[2014-08-0610:09:34.532][trace][22314][107]
ignored. set buffer length to 80023:[2014-08-0610:09:34.568][trace][22314][107]
client identified, type=Play,stream_name=livestream, duration=-1.0024:[2014-08-0610:09:34.568][trace][22314][107]
out chunk size to 6000025:[2014-08-0610:09:34.568][trace][22314][107] source
url=__defaultVhost__/live/livestream,ip=192.168.1.179, cache=1, is_edge=1, source_id=-1[-1]26:[2014-08-06
10:09:34.579][trace][22314][107]dispatch cached gop success. count=0, duration=027:[2014-08-0610:09:34.579][trace][22314][107]
create consumer, queue_size=30.00, tba=0,tbv=028:[2014-08-0610:09:34.579][trace][22314][107]
ignored. set buffer length to 80039:[2014-08-0610:09:34.779][trace][22314][107]
update source_id=108[108]54:[2014-08-0610:09:47.805][trace][22314][107] cleanup
when unpublish55:[2014-08-0610:09:47.805][trace][22314][107] edge change from
101 to state 0 (init).56:[2014-08-06 10:09:47.805][warn][22314][107][9]client
disconnect peer. ret=1004107是触发回源的连接。查看110这个链接:[winlin@dev6
srs]$ grep --color -ina"\[110\]" objs/srs.log40:[2014-08-0610:09:36.609][trace][22314][110]
RTMP client ip=192.168.1.17941:[2014-08-0610:09:36.613][trace][22314][110] complex
handshake success42:[2014-08-0610:09:36.613][trace][22314][110]connectapp,tcUrl=rtmp://dev:1935/live,pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935,swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.23,schema=rtmp,
vhost=__defaultVhost__, port=1935, app=live, args=null43:[2014-08-0610:09:36.835][trace][22314][110]
ignored. set buffer length to 80044:[2014-08-0610:09:36.853][trace][22314][110]
client identified, type=Play,stream_name=livestream, duration=-1.0045:[2014-08-0610:09:36.853][trace][22314][110]
out chunk size to 6000046:[2014-08-0610:09:36.853][trace][22314][110] source
url=__defaultVhost__/live/livestream,ip=192.168.1.179, cache=1, is_edge=1, source_id=108[108]47:[2014-08-0610:09:36.853][trace][22314][110]
dispatch cached gop success. count=95,duration=157348:[2014-08-0610:09:36.853][trace][22314][110]
create consumer, queue_size=30.00, tba=44100,tbv=2549:[2014-08-0610:09:36.853][trace][22314][110]
ignored. set buffer length to 80051:[2014-08-0610:09:45.919][trace][22314][110]
-> PLA time=8759, msgs=21, okbps=461,0,0,ikbps=3,0,052:[2014-08-0610:09:46.247][warn][22314][110][104]
client disconnect peer. ret=1004可见110也是个flash播放连接。解决了排错的问题,如何解决卡的问题?用测速了,SRS(Simple
Rtmp Sever)可以支持linux客户端和flash客户端直接对SRS(Simple Rtmp Server)测速,测试到SRS(Simple Rtmp Sever)服务器的上行和下行速度。也就是说,页面推流前,就可以调用js接口来获取到服务器的速度,如果知道服务器地址,还可以自己选择服务器。
配合测速和日志系统,就能知道卡或者故障的用户分发的路径是什么,运维或者高级运维系统可以自己排插这个路径的问题所在。从这以后,挖断光纤的概率小很多了,而黑客攻击也只有真正被攻击时(这时候要用热备)才是公鸡了。集成了测速客户端的客户,自己就可以排除推流端的带宽过低的故障,这种问题大约占了90%的故障。而且更加致命的是,没有办法证明,除非能测速。泥马其他测速是测试到其他服务器的速度,譬如360测速,那顶多证明用户到其他服务器的下行没有问题,和流媒体服务商的服务器有球关系?而且还是上行!关于测速,参考:https://github.com/winlinvip/simple-rtmp-server/wiki/v1_CN_BandwidthTestTool

关于SRS可追溯日志,参考:https://github.com/winlinvip/simple-rtmp-server/wiki/v1_CN_SrsLog#tracable-log
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: