您的位置:首页 > 其它

BitTorrent协议规范

2015-07-03 14:44 363 查看
转自:http://blog.csdn.net/romandion/article/details/2909589



BitTorrent.org

Home
For Users
For Developers
Forums
Donate!

BEP:3
Title:The BitTorrent Protocol Specification
Version:11031
Last-Modified:2008-02-28 16:43:58 -0800 (Thu, 28 Feb 2008)
Author:Bram Cohen <bram at bittorrent.com>
Status:Final
Type:Standard
Created:10-Jan-2008
Post-History:
Contents

A BitTorrent file distribution
consists of these entities:
To start serving, a host
goes through the following steps:
To start downloading, a user does
the following:
The connectivity is as follows:
Metainfo files
are bencoded dictionaries with the following keys:
Tracker GET requests have the following
keys:
All
non-keepalive messages start with a single byte which gives their type.
The possible values are:
Copyright

BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the downloaders upload
to each other, making it possible for the file source to support very large numbers of downloaders with only a modest increase in its load.

A BitTorrent file distribution consists of these entities:

An ordinary web server
A static 'metainfo' file
A BitTorrent tracker
An 'original' downloader
The end user web browsers
The end user downloaders

There are ideally many end users for a single file.

To start serving, a host goes through the following steps:

Start running a tracker (or, more likely, have one running already).
Start running an ordinary web server, such as apache, or have one already.
Associate the extension .torrent with mimetype application/x-bittorrent on their web server (or have done so already).
Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker.
Put the metainfo file on the web server.
Link to the metainfo (.torrent) file from some other web page.
Start a downloader which already has the complete file (the 'origin').

To start downloading, a user does the following:

Install BitTorrent (or have done so already).
Surf the web.
Click on a link to a .torrent file.
Select where to save the file locally, or select a partial download to resume.
Wait for download to complete.
Tell downloader to exit (it keeps uploading until this happens).

The connectivity is as follows:

Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3ecorresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is
invalid. All encodings with a leading zero, such as i03e, are invalid, other than i0e, which of course corresponds to 0.
Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For examplel4:spam4:eggse corresponds to ['spam', 'eggs'].
Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'} and d4:spaml1:a1:bee corresponds
to {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

Metainfo files are bencoded dictionaries with the following keys:

announceThe URL of the tracker.info
This maps to a dictionary, with keys described below.
The name key maps to a string which is the suggested name to save the file (or directory) as. It is purely advisory.

piece length maps to the number of bytes in each piece the file is split into. For the purposes of transfer, files are split into fixed-size pieces which are all the same
length except for possibly the last one which may be truncated. piece length is almost always a power of two, most commonly 2 18 = 256 K (BitTorrent prior to version 3.2 uses
2 20 = 1 M as default).

pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.

There is also a key length or a key files, but not both or neither. If length is present then the download represents a single file, otherwise it represents
a set of files which go in a directory structure.

In the single file case, length maps to the length of the file in bytes.

For the purposes of the other keys, the multi-file case is treated as only having a single file by concatenating the files in the order they appear in the files list. The files list is the value files maps to, and is a list
of dictionaries containing the following keys:

length - The length of the file, in bytes.

path - A list of strings corresponding to subdirectory names, the last of which is the actual file name (a zero length list is an error case).

In the single file case, the name key is the name of a file, in the muliple file case, it's the name of a directory.

Tracker GET requests have the following keys:

info_hashThe 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file. This value will almost certainly have to be escaped.peer_idA string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the start of a new download. This value will also almost certainly have to be escaped.ipAn optional parameter giving the IP (or dns name) which this peer is at. Generally used for the origin if it's on the same machine as the tracker.portThe port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if that port is taken try 6882, then 6883, etc. and give up after 6889.uploadedThe total amount uploaded so far, encoded in base ten ascii.downloadedThe total amount downloaded so far, encoded in base ten ascii.leftThe number of bytes this peer still has to download, encoded in base ten ascii. Note that this can't be computed from downloaded and the file length since it might be a resume, and there's a chance that some of the downloaded data failed an integrity check
and had to be re-downloaded.eventThis is an optional key which maps to started, completed, or stopped (or empty, which is the same as not being present).
If not present, this is one of the announcements done at regular intervals. An announcement using started is sent when a download first begins, and one using completed is sent when the download
is complete. No completed is sent if the file was complete when started. Downloaders send an announcement using stopped when they cease downloading.
Tracker responses are bencoded dictionaries. If a tracker response has a key failure reason, then that maps to a human readable string which explains why the query failed,
and no other keys are required. Otherwise, it must have two keys: interval, which maps to the number of seconds the downloader should wait between regular rerequests, and peers. peers maps
to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port,
which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more peers.

If you want to make any extensions to metainfo files or tracker queries, please coordinate with Bram Cohen to make sure that all extensions are done compatibly.

BitTorrent's peer protocol operates over TCP. It performs efficiently without setting any socket options.

Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction.

The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers.

Connections contain two bits of state on either end: choked or not, and interested or not. Choking is a notification that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained later in this document.

Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept up to date at all times - whenever a downloader doesn't have something they currently would ask a peer for in unchoked, they must express
lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible for downloaders to know which peers will start downloading immediately if unchoked.

Connections start out choked and not interested.

When data is being transferred, downloaders should keep several piece requests queued up at once in order to get good TCP performance (this is called 'pipelining'.) On the other side, requests which can't be written out to the TCP buffer immediately should
be queued up in memory rather than kept in an application-level network buffer, so they can all be thrown out when a choke happens.

The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with character ninteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is a length prefix, put
there in the hope that other new protocols may do the same and thus be trivially distinguishable from each other.

All later integers sent in the protocol are encoded as four bytes big-endian.

After the fixed headers come eight reserved bytes, which are all zero in all current implementations. If you wish to extend the protocol using these bytes, please coordinate with Bram Cohen to make sure all extensions are done compatibly.

Next comes the 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. (This is the same value which is announced as info_hash to the tracker, only here it's raw instead of quoted here). If both
sides don't send the same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over a single port, they may wait for incoming connections to give a download hash first, and respond with the same one
if it's in their list.

After the download hash comes the 20-byte peer id which is reported in tracker requests and contained in peer lists in tracker responses. If the receiving side's peer id doesn't match the one the initiating side expects, it severs the connection.

That's it for handshaking, next comes an alternating stream of length prefixes and messages. Messages of length zero are keepalives, and ignored. Keepalives are generally sent once every two minutes, but note that timeouts can be done much more quickly when
data is expected.

All non-keepalive messages start with a single byte which gives their type.

The possible values are:

0 - choke
1 - unchoke
2 - interested
3 - not interested
4 - have
5 - bitfield
6 - request
7 - piece
8 - cancel

'choke', 'unchoke', 'interested', and 'not interested' have no payload.

'bitfield' is only ever sent as the first message. Its payload is a bitfield with each index that downloader has sent set to one and the rest set to zero. Downloaders which don't have anything yet may skip the 'bitfield' message. The first byte of the bitfield
corresponds to indices 0 - 7 from high bit to low bit, respectively. The next one 8-15, etc. Spare bits at the end are set to zero.

The 'have' message's payload is a single number, the index which that downloader just completed and checked the hash of.

'request' messages contain an index, begin, and length. The last two are byte offsets. Length is generally a power of two unless it gets truncated by the end of the file. All current implementations use 2 15 , and close connections which request an amount
greater than 2 17.

'cancel' messages have the same payload as request messages. They are generally only sent towards the end of a download, during what's called 'endgame mode'. When a download is almost complete, there's a tendency for the last few pieces to all be downloaded
off a single hosed modem line, taking a very long time. To make sure the last few pieces come in quickly, once requests for all pieces a given downloader doesn't have yet are currently pending, it sends requests for everything to everyone it's downloading
from. To keep this from becoming horribly inefficient, it sends cancels to everyone else every time a piece arrives.

'piece' messages contain an index, begin, and piece. Note that they are correlated with request messages implicitly. It's possible for an unexpected piece to arrive if choke and unchoke messages are sent in quick succession and/or transfer is going very
slowly.

Downloaders generally download pieces in random order, which does a reasonably good job of keeping them from having a strict subset or superset of the pieces of any of their peers.

Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.

The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work well both in a network consisting entirely of themselves and in a network consisting mostly of this one.

There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to peers who let it
download. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking.

The currently deployed choking algorithm avoids fibrillation by only changing who's choked once every ten seconds. It does reciprocation and number of uploads capping by unchoking the four peers which it has the best download rates from and are interested.
Peers which have a better upload rate but aren't interested get unchoked and if they become interested the worst uploader gets choked. If a downloader has a complete file, it uses its upload rate rather than its download rate to decide who to unchoke.

For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of it's upload rate (if interested, it counts as one of the four allowed downloaders.) Which peer is optimistically unchoked rotates every 30 seconds. To give them
a decent chance of getting a complete piece to upload, new connections are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation.

Copyright

This document has been placed in the public domain.
http://www.ppcn.net/n2933c38.aspx
BitTorrent 是一种分发文件的协议。它通过URL来识别内容,并且可以无缝的和web进行交互。它基于HTTP协议,它的优势是:如果有多个下载者并发的下载同一个文 件,那么,每个下载者也同时为其它下载者上传文件,这样,文件源可以支持大量的用户进行下载,而只带来适当的负载的增长。(译注:因为大量的负载被均衡到 整个系统中,所以提供源文件的机器的负载只有少量增长)

一个BT文件分布系统由下列实体组成:

一个普通的web服务器

一个静态的“元信息”文件

一个跟踪(tracker)服务器

终端用户的web浏览器

终端用户的bt下载器

理想的情况是多个终端用户在下载同一个文件。

要提供文件共享,那么一台主机需要执行以下步骤:

Ø运行一个 tracker服务器(或者,已经有一个tracker服务器在运行了也可以)

Ø运行一个web服务器,例如apache,或者已经有一个web服务器在运行了。

Ø在web服务器上,将文件扩展名.torrent 和MIME类型 application/x-bittorrent关联起来(或者已经关联了)

Ø根据 tracker服务器的 URL 和要共享的文件来创建一个“元信息”文件(.torrent)。

Ø将“元信息”文件发布到web服务器上

Ø在某个web页面上,添加一个到“元信息”文件的链接。

Ø运行一个已经拥有完整文件的下载者(被成为’origin’,或者’seed’,种子)

要开始下载文件,那么终端用户执行以下步骤:

Ø安装 BT(或者已经安装)

Ø访问提供 .torrent 文件的web服务器

Ø点击到 .torrent 文件的链接(译注:这时候,bt会弹出一个对话框)

Ø选择要把下载的文件保存到哪里?或者是一次断点续传

Ø等待下载的完成。

Ø结束bt程序的运行(如果不主动结束,那么bt会一直为其它人提供文件上传)

各个部分之间的连通性如下:

网站负责提供一个静态的文件,而把BT辅助程序(客户端)放在客户端机器上。

Trackers从所有下载者处接收信息,并返回给它们一个随机的peers的列表。这种交互是通过HTTP或HTTPS协议来完成的。

下载者周期性的向tracker登记,使得tracker能了解它们的进度;下载者之间通过直接连接进行数据的上传和下载。这种连接使用的是 BitTorrent 对等协议,它基于TCP。

Origin只负责上传,从不下载,因为它已经拥有了完整的文件。Origin是必须的。

元文件和tracker的响应都采用的是一种简单、有效、可扩展的格式,被称为bencoding,它可以包含字符串和整数。由于对不需要的字典关键字可以忽略,所以这种格式具有可扩展性,其它选项以后可以方便的加进来。

Bencoding格式如下:

对于字符串,首先是一个字符串的长度,然后是冒号,后面跟着实际的字符串,例如:4:spam,就是“ spam”

整数编码如下,以 ‘i’ 开始,然后10进制的整数值,最后以’e’结尾。例如,i3e表示3,I-3e表示-3。整数没有大小限制。I-0e是无效的。除了 i0e外,所以以0起始的整数都无效。I0e当然表示0。

列表编码如下,以’l’开始,接下来是列表值的编码(也采用bencoded编码),最后以’e’结束。例如:l4:spam4:eggse 表示 [‘spam’, ‘eggs’]。

字 典编码如下,以’d’开始,接下来是可选的keys和它对应的值,最户以’e’结束。例如:d3:cow3:moo4:spam4:eggse,表示 {‘cow’:’moo’,’spam’:’eggs’},而d4:spaml1:al:bee 表示 {‘spam’:[‘a’,’b’]}。键值必须是字符串,而且已经排序(并非是按照字母顺序排序,而是根据原始的字符串进行排序)。

元文件是采用bencoded编码的字典,包括以下关键字:

announce tracker的服务器

info 它实际上是一个字典,包括以下关键字:

Name:

一个字符串,在保存文件的时候,作为一个建议值。仅仅是个建议而已,你可以用别的名字保存文件。

Piece length:

为了更好的传输,文件被分隔成等长的片断,除了最后一个片断以外,这个值就是片断的大小。片断大小几乎一直都是2的幂,最常用的是 256k(BT的前一个版本3.2,用的是1M作为默认大小)

Pieces:

一个长度为20的整数倍的字符串。它将再被分隔为20字节长的字符串,每个子串都是相应片断的hash值。

此外,还有一个length或files的关键字,这两个关键字只能出现一个。如果是length,那么表示要下载的仅仅是单个文件,如果是files那么要下载的是一个目录中的多个文件。

如果是单个文件,那么length是该文件的长度。

为了能支持其它关键字,对于多个文件的情况,也把它当作一个文件来看,也就是按照文件出现的顺序,把每个文件的信息连接起来,形成一个字符串。每个文件的信息实际上也是一个字典,包括以下关键字:

Length:文件长度

Path:子目录名称的列表,列表最后一项是文件的实际名称。(不允许出现列表为空的情况)。

Name:在单文件情况下,name是文件的名称,而在多文件情况下,name是目录的名称。

Tracker查询。Trakcer通过HTTP的GET命令的参数来接收信息,而响应给对方(也就是下载者)的是经过bencoded编码的消 息。注意,尽管当前的tracker的实现需要一个web服务器,它实际上可以运行的更轻便一些,例如,作为apache的一个模块。

Tracker GET requests have the following keys:

发送给Tracker的GET请求,包含以下关键字:

Info_hash:

元文件中info部分的sha hash,20字节长。这个字符创几乎肯定需要被转义(译注:在URL中,有些字符不能出现,必须通过unicode进行编码)

Peer_id:

下载者的id,一个20字节长的字符串。每个下载者在开始一次新的下载之前,需要随机创建这个id。这个字符串通常也需要被转义。

Ip:

一个可选的参数,给出了peer的ip地址(或者dns名称?)。通常用在origin身上,如果它和tracker在同一个机器上。

Port:

peer所监听的端口。下载者通常在在 6881 端口上监听,如果该端口被占用,那么会一直尝试到 6889,如果都被占用,那么就放弃监听。

Uploaded:

已经上载的数据大小,十进制表示。

Downloaded:

已经下载的数据大小,十进制表示

Left:

该peer还有多少数据没有下载完,十进制表示。注意,这个值不能根据文件长度和已下载数据大小计算出来,因为很可能是断点续传,如果因为检查文件完整性失败而必须重新下载的时候,这也提供了一个机会。

Event:

一个可选的关键字,值是started、compted或者stopped之一(也可以为空,不做处理)。如果不出现该关键 字,。在一次下载刚开始的时候,该值被设置为started,在下载完成之后,设置为completed。如果下载者停止了下载,那么该值设置为 stopped。

Tracker的响应是用bencoded编码的字典。如果tracker的响应中有一个关键字failure reason,那么它对应的是一个字符串,用来解释查询失败的原因,其它关键字都不再需要了。否则,它必须有两个关键字:Interval:下载者在两次 发送请求之间的时间间隔。Peers:一个字典的列表,每个字典包括以下关键字:Peer id,Ip,Port,分别对应peer所选择的id、ip地址或者dns名称、端口号。注意,如果某些事件发生,或者需要更多的peers,那么下载者 可能不定期的发送请求,

(downloader 通过 HTTP 的GET 命令来向 tracker 发送查询请求,tracker 响应一个peers 的列表)

如果你想对元信息文件或者tracker查询进行扩展,那么需要同Bram Cohen协调,以确保所有的扩展都是兼容的。

BT对等协议基于TCP,它很有效率,并不需要设置任何socket选项。(译注:BT对等协议指的是peer与peer之间交换信息的协议)

对等的两个连接是对称的,消息在两个方向上同样的传递,数据也可以在任何一个方向上流动。

一旦某个peer下载完了一个片断,并且也检查了它的完整性,那么它就向它所有的peers宣布它拥有了这个片断。

连接的任何一端都包含两比特的状态信息:是否choked,是否感兴趣。Choking是通知对方,没有数据可以发送,除非unchoking发生。Choking的原因以及技术后文解释。

一旦一端状态变为interested,而另一端变为非choking,那么数据传输就开始了。(也就是说,一个peer,如果想从它的某个 peer那里得到数据,那么,它首先必须将它两之间的连接设置为 interested,其实就是发一个消息过去,而另一个peer,要检查它是否应该给这个家伙发送数据,如果它对这个家伙是 unchoke,那么就可以给它发数据,否则还是不能给它数据)Interested状态必须一直被设置――任何时候。要用点技巧才能比较好的实现这个目 的,但它使得下载者能够立刻知道哪些peers将开始下载。

对等协议由一个握手开始,后面是循环的消息流,每个消息的前面,都有一个数字来表示消息的长度。握手的过程首先是先发送19,然后发送“BitTorrent protocol”。19就是“BitTorrent protocol”的长度。

后续的所有的整数,都采用big-endian 来编码为4个字节

在协议名称之后,是8个保留的字节,这些字节当前都设置为0。

接下来对元文件中的 info 信息,通过 sha1 计算后得到的 hash值,20个字节长。接收消息方,也会对 info 进行一个 hash 运算,如果这两个结果不一样,那么说明对方要的文件,并不是自己所要提供的,所以切断连接。

接下来是20个字节的 peer id。

这就是握手过程

接下来就是以消息长度开始的消息流,这是可选的。长度为0 的消息,用于保持连接的活动状态,被忽略。通常每隔2分钟发送一个这样的消息。

其它类型的消息,都有一个字节长的消息类型,可能的值如下:

‘choke’, ‘unchoe’, ‘interested’, not interested’类型的消息不再含有其它数据了。

‘bitfield’永远也仅仅是第一个被发送的消息。它的数据实际是一个位图,如果downloader已经发送了某个片断,那么对应的位置1,否则置0。Downloaders如果一个片断也没有,可以忽略这个消息。(通过这个消息,能知道什么了?)

‘have’类型的消息,后面的数据是一个简单的数字,它是下载者刚刚下载完并检查过完整性的片断的索引。(由此,可以看到,peer通过这种消息,很快就相互了解了谁都有什么片断)

‘request’类型的消息,后面包含索引、开始位置和长度)长度是2的幂。当前的实现都用的是215 ,而关闭连接的时候,请求一个超过2 17的长度。(这种类型的消息,就是当一个peer希望另一个peer给它提供片断的时候,发出的请求)

‘cancel’类型的消息,它的数据和’request’消息一样。它们通常只在下载趋向完成的时候发送,也就是在‘结束模式“阶段发送。在一次 下载接近完成的时候,最后的几个片断需要很长时间才能下载完。为了确保最后几个片断尽快下载完,它向所有的peers发送下载请求。为了保证这不带来可怕 的低效,一旦某个片断下载完成,它就其它peers发送’cancel’消息。(意思就是说,我不要这个片断了,你要是准备好了,也不用给我发了,可以想 象,如果对方还是把数据发送过来了,那么这边必须忽略这些重复的数据)。

‘piece’类型的消息,后面保护索引号、开始位置和实际的数据。注意,这种类型的消息和 ‘request’消息之间有潜在的联系(译注:因为通常有了request消息之后,才会响应‘piece’消息)。如果choke和unchoke消 息发送的过于迅速,或者,传输速度变的很慢,那么可能会读到一些并不是所期望的片断。( 也就是说,有时候读到了一些片断,但这些片断并不是所想要的)

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: