
Differences Between Protocol Buffers, Avro, Thrift, and MessagePack

2015-06-26 10:43
Perhaps one of the first inescapable observations that a new Google developer (Noogler) makes once they dive into the code is that Protocol Buffers (PB) is the "language of data" at Google. Put simply, Protocol Buffers are used for serialization, RPC, and about everything in between.

 

Initially developed in the early 2000s as an optimized server request/response protocol (hence the name), they have become the de facto data persistence format and RPC protocol. Later, following a major (v2) rewrite in 2008, Protocol Buffers was open sourced by Google and now, through a number of third-party extensions, can be used across dozens of languages, including Ruby, of course.

 

But Protocol Buffers for everything? Well, it appears to work for Google, but more importantly I think this is a great example of where understanding the historical context in which each format was developed is just as instrumental as comparing features and benchmarking speed.

 

Protocol Buffers vs. Thrift

 

Let's take a step back and compare Protocol Buffers to the "competitors", of which there are plenty. Between PB, Thrift, Avro and MessagePack, which is the best? Truth of the matter is, they are all very good and each has its own strong points. Hence, the answer is as much a matter of personal choice as it is of understanding the historical context for each and correctly identifying your own, individual requirements.

 

When Protocol Buffers was first being developed (early 2000s), the preferred language at Google was C++ (nowadays, Java is on par). Hence it should not be surprising that PB is strongly typed, has a separate schema file, and also requires a compilation step to output the language-specific boilerplate to read and serialize messages. To achieve this, Google defined their own language (IDL) for specifying the proto files, and limited PB's design scope to efficient serialization of common types and attributes found in Java, C++ and Python. As a result, PB was designed to be layered over an (existing) RPC mechanism.
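To make the role of the schema concrete, here is a minimal Python sketch of how a single integer field can be laid out in the Protocol Buffers binary encoding: a tag carrying the field number and wire type, followed by the value as a base-128 varint. The function names and the `int32 id = 1;` schema entry are hypothetical, chosen only for illustration; real code would use the classes generated by the PB compiler rather than hand-written encoders.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field as a tag (field number + wire type 0) plus the value."""
    tag = (field_number << 3) | 0  # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)


# Hypothetical schema entry: `int32 id = 1;`
encoded = encode_int_field(1, 150)
print(encoded.hex())  # 089601
```

Note that the field name never appears in the output, only its number. That is one reason the encoding is compact, and also why both sides need the schema file to interpret the bytes.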

 

By comparison, Thrift, which was open sourced by Facebook in late 2007, looks and feels very similar to Protocol Buffers; in all likelihood, there was some design influence from PB there. However, unlike PB, Thrift makes RPC a first-class citizen: Thrift provides a variety of transport options (network, file, memory), and also tries to target many more languages.

 

Which is the "better" of the two? Both have been production tested at scale, so it really depends on your own situation. If you are primarily interested in the binary serialization, or if you already have an RPC mechanism, then Protocol Buffers is a great place to start. Conversely, if you don't yet have an RPC mechanism and are looking for one, then Thrift may be a good choice. (Word of warning: historically, Thrift has not been consistent in its feature support and performance across all the languages, so do some research.)

 

Protocol Buffers vs. Avro, MessagePack

 

While Thrift and PB differ primarily in their scope, Avro and MessagePack should really be compared in light of the more recent trends: the rising popularity of dynamic languages, and JSON over XML. As nearly every web developer knows, JSON is now ubiquitous and easy to parse, generate, and read, which explains its popularity. JSON also requires no schema, provides no type checking, and is a text-based format (typically UTF-8): in other words, easy to work with, but not very efficient when put on the wire.

 

MessagePack is effectively JSON, but with an efficient binary encoding. Like JSON, it has no type checking or schemas, which depending on your application can be either a pro or a con. But if you are already streaming JSON via an API or using it for storage, then MessagePack can be a drop-in replacement.
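The size difference is easy to see on a small record. The sketch below hand-encodes a tiny subset of the MessagePack format (nil, booleans, unsigned integers up to 32 bits, short strings, short lists and maps) and compares the result with compact JSON. The `pack` function and the sample record are hypothetical and only illustrative; in practice you would use an existing MessagePack library instead of writing an encoder.

```python
import json
import struct


def pack(obj) -> bytes:
    """Encode a small subset of MessagePack; anything else raises an error."""
    if obj is None:
        return b"\xc0"                                   # nil
    if obj is True:
        return b"\xc3"                                   # true
    if obj is False:
        return b"\xc2"                                   # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                          # positive fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + bytes([obj])                # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)      # uint 16
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)      # uint 32
        raise ValueError("integer out of range for this sketch")
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data      # fixstr
        if len(data) <= 0xFF:
            return b"\xd9" + bytes([len(data)]) + data   # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body           # fixmap
    raise TypeError(f"unsupported value in this sketch: {obj!r}")


# Hypothetical sample record.
record = {"id": 150, "name": "Ada", "active": True}
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(record)
print(len(as_json), len(as_msgpack))  # 37 23
```

The savings come from replacing quotes, colons, commas, and textual numbers with one-byte type markers. The keys are still sent with every record, which is the price of having no schema.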

 

Avro, on the other hand, is somewhat of a hybrid. In its scope and functionality it is close to PB and Thrift, but it was designed with dynamic languages in mind. Unlike PB and Thrift, the Avro schema accompanies the data itself (in the header of an Avro data file, for example), which eliminates the need for the extra compile stage. Additionally, the schema itself is just a JSON blob: no custom parser required! By enforcing a schema, Avro allows us to do data projections (read individual fields out of each record), perform type checking, and enforce the overall message structure.
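Because the schema is plain JSON, a dynamic language can load it at runtime and act on it directly. The following Python sketch illustrates the idea with a hypothetical `User` schema and two hypothetical helpers, `validate` and `project`. It uses only the standard `json` module and handles a few primitive types; it is not the Avro library's API and does not produce Avro's binary encoding.

```python
import json

# Hypothetical Avro-style record schema, written as plain JSON.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
"""

# Python types accepted for a few Avro primitive type names (subset only).
PRIMITIVES = {"int": int, "long": int, "double": float, "string": str, "boolean": bool}


def validate(record: dict, schema: dict) -> None:
    """Raise if the record's field names or value types do not match the schema."""
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    if set(record) != set(expected):
        raise ValueError(f"fields {sorted(record)} do not match schema {sorted(expected)}")
    for name, type_name in expected.items():
        value, py_type = record[name], PRIMITIVES[type_name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and py_type is not bool:
            raise TypeError(f"field {name!r} must be {type_name}")
        if not isinstance(value, py_type):
            raise TypeError(f"field {name!r} must be {type_name}")


def project(record: dict, schema: dict, wanted: list) -> dict:
    """Return only the requested fields, after checking that the schema defines them."""
    known = {field["name"] for field in schema["fields"]}
    unknown = set(wanted) - known
    if unknown:
        raise KeyError(f"not in schema: {sorted(unknown)}")
    return {name: record[name] for name in wanted}


schema = json.loads(SCHEMA_JSON)  # no custom parser, no compile step
user = {"id": 150, "name": "Ada", "active": True}
validate(user, schema)
print(project(user, schema, ["name"]))  # {'name': 'Ada'}
```

Everything here happens at runtime, which is the contrast with the PB and Thrift workflow of generating code from the schema ahead of time.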

 

"The Best" Serialization Format

 

Reflecting on the use of Protocol Buffers at Google and all of the above competitors, it is clear that there is no one definitive, "best" option. Rather, each solution makes perfect sense in the context it was developed, and hence the same logic should be applied to your own situation.

 

If you are looking for a battle-tested, strongly typed serialization format, then Protocol Buffers is a great choice. If you also need a variety of built-in RPC mechanisms, then Thrift is worth investigating. If you are already exchanging or working with JSON, then MessagePack is almost a drop-in optimization. And finally, if you like the strongly typed aspects, but want the flexibility of easy interoperability with dynamic languages, then Avro may be your best bet at this point in time.

 

Ilya Grigorik is a web performance engineer and developer advocate at Google, where his focus is on making the web fast and driving adoption of performance best practices.
