What is Facebook's architecture?
2012-02-14 19:33
From various readings and conversations I had, my understanding of Facebook's current architecture is:
The web front end is written in PHP. Facebook's HipHop [1] then converts it to C++ and compiles it with g++, providing a high-performance templating and web logic execution layer.
Business logic is exposed as services using Thrift [2]. Some of these services are implemented in PHP, C++ or Java depending on service requirements (some other languages are probably used...)
Services implemented in Java don't use a conventional enterprise application server but rather Facebook's custom application server. At first this can look like reinventing the wheel, but as these services are exposed and consumed only (or mostly) through Thrift, the overhead of Tomcat, or even Jetty, was probably too high with no significant added value for their needs.
Persistence is done using MySQL, Memcached [3], Facebook's Cassandra [4] and Hadoop's HBase [5]. Memcached is used as a cache for MySQL as well as a general-purpose cache. Facebook engineers admit that their use of Cassandra is currently decreasing, as they now prefer HBase for its simpler consistency model and its MapReduce ability.
Offline processing is done using Hadoop and Hive
Data such as logging, clicks and feeds transit through Scribe [6] and are aggregated and stored in HDFS using Scribe-HDFS [7], thus allowing extended analysis using MapReduce
BigPipe [8] is their custom technology to accelerate page rendering using a pipelining logic
Varnish Cache [9] is used for HTTP proxying. They preferred it for its high performance and efficiency [10].
The billions of photos posted by users are stored in Haystack, an ad-hoc storage solution developed by Facebook that brings low-level optimizations and append-only writes [11].
Facebook Messages uses its own architecture, which is notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence are encapsulated in a so-called 'Cell'. Each Cell handles a subset of users; new Cells can be added as popularity grows [12]. Persistence is achieved using HBase [13].
Facebook Messages' search engine is built with an inverted index stored in HBase [14]
Facebook Search Engine's implementation details are unknown as far as I know
The typeahead search uses a custom storage and retrieval logic [15]
Chat is based on an Epoll server developed in Erlang and accessed using Thrift [16]
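To make the "Memcached as a cache for MySQL" point above concrete, here is a minimal look-aside caching sketch. It is only an illustration of the general pattern, not Facebook's code: the class name `LookAsideCache`, the `db_reads` counter and the dictionary standing in for both Memcached and MySQL are all assumptions made for this example.

```python
from typing import Callable, Dict, Optional


class LookAsideCache:
    """Check the cache first; on a miss, read the database and fill the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}    # hypothetical stand-in for a Memcached client
        self._load_from_db = load_from_db   # hypothetical stand-in for a MySQL query
        self.db_reads = 0                   # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:              # cache hit: no database access
            return self._cache[key]
        self.db_reads += 1                  # cache miss: fall back to the database
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value        # populate the cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the database row changes, so the next read refetches it."""
        self._cache.pop(key, None)


# A plain dict plays the role of the MySQL table in this sketch.
fake_mysql = {"user:1": "Alice"}
cache = LookAsideCache(fake_mysql.get)

first = cache.get("user:1")    # miss: reads the database
second = cache.get("user:1")   # hit: served from the cache
print(first, second, cache.db_reads)
```

The second read never reaches the database, which is the whole point of the pattern; the price is that writers have to call something like `invalidate` whenever the underlying row changes.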
Some figures are known about the resources provisioned for these components:
Facebook is estimated to own more than 60,000 servers [17]. Their recent datacenter in Prineville, Oregon is based on entirely self-designed hardware [18] that was recently unveiled as the Open Compute Project [19].
300 TB of data is stored in Memcached processes [20]
Their Hadoop and Hive cluster is made up of 3,000 servers, each with 8 cores, 32 GB of RAM and 12 TB of disk, for a total of 24,000 cores, 96 TB of RAM and 36 PB of disk [20]
100 billion hits per day, 50 billion photos, 3 trillion objects cached, 130 TB of logs per day as of July 2010 [21]
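The Hadoop and Hive cluster totals quoted above can be checked with a few multiplications. This quick sketch assumes the quoted per-server figures apply to every one of the 3,000 servers and that the units are decimal (1 TB = 1,000 GB, 1 PB = 1,000 TB); the variable names are made up for the example.

```python
# Per-server figures as quoted for the Hadoop and Hive cluster.
SERVERS = 3000
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 32
DISK_TB_PER_SERVER = 12

# Cluster-wide totals, assuming decimal units (an assumption of this sketch).
total_cores = SERVERS * CORES_PER_SERVER
total_ram_tb = SERVERS * RAM_GB_PER_SERVER / 1000
total_disk_pb = SERVERS * DISK_TB_PER_SERVER / 1000

print(total_cores, total_ram_tb, total_disk_pb)
```

The results match the 24k cores, 96 TB of RAM and 36 PB of disk given in the answer.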
[1] HipHop for PHP: http://developers.facebook.com/b...
[2] Thrift: http://thrift.apache.org/
[3] Memcached: http://memcached.org/
[4] Cassandra: http://cassandra.apache.org/
[5] HBase: http://hbase.apache.org/
[6] Scribe: https://github.com/facebook/scribe
[7] Scribe-HDFS: http://hadoopblog.blogspot.com/2...
[8] BigPipe: http://www.facebook.com/notes/fa...
[9] Varnish Cache: http://www.varnish-cache.org/
[10] Facebook goes for Varnish: http://www.varnish-software.com/...
[11] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php...
[12] Scaling the Messages Application Back End: http://www.facebook.com/note.php...
[13] The Underlying Technology of Messages: https://www.facebook.com/note.ph...
[14] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/vi...
[15] Facebook's typeahead search architecture: http://www.facebook.com/video/vi...
[16] Facebook Chat: http://www.facebook.com/note.php...
[17] Who has the most Web Servers?: http://www.datacenterknowledge.c...
[18] Building Efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php...
[19] Open Compute Project: http://opencompute.org/
[20] Facebook's architecture presentation at Devoxx 2010: http://www.devoxx.com
[21] Scaling Facebook to 500 million users and beyond: http://www.facebook.com/note.php...