Facebook’s architecture(转)

标签: 未分类 | 发表时间:2011-07-06 16:33 | 作者:jm Adam

From various readings and conversations I had, my understanding of Facebook’s current architecture is:

* Web front-end written in PHP. Facebook’s HipHop [1] then converts it to C++ and compiles it using g++, thus providing a high performance templating and Web logic execution layer
* Business logic is exposed as services using Thrift [2]. Some of these services are implemented in PHP, C++ or Java depending on service requirements (some other languages are probably used…)
* Services implemented in Java don’t use any usual enterprise application server but rather use Facebook’s custom application server. At first this can look as wheel reinvented but as these services are exposed and consumed only (or mostly) using Thrift, the overhead of Tomcat, or even Jetty was probably too high with no significant added value for their need.
* Persistence is done using MySQL, Memcached [3], Facebook’s Cassandra [4], Hadoop’s HBase [5]. Memcached is used as a cache for MySQL as well as a general purpose cache. Facebook engineers admit that their use of Cassandra is currently decreasing as they now prefer HBase for its simpler consistency model and its MapReduce ability.
* Offline processing is done using Hadoop and Hive
* Data such as logging, clicks and feeds transit using Scribe [6] and are aggregating and stored in HDFS using Scribe-HDFS [7], thus allowing extended analysis using MapReduce
* BigPipe [8] is their custom technology to accelerate page rendering using a pipelining logic
* Varnish Cache [9] is used for HTTP proxying. They’ve prefered it for its high performance and efficiency [10].
* The storage of the billions of photos posted by the users is handled by Haystack, an ad-hoc storage solution developed by Facebook which brings low level optimizations and append-only writes [11].
* Facebook Messages is using its own architecture which is notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence is encapsulated in so-called ‘Cell’. Each Cell handles a part of users ; new Cells can be added as popularity grows [12]. Persistence is achieved using HBase [13].
* Facebook Messages’ search engine is built with an inverted index stored in HBase [14]
* Facebook Search Engine’s implementation details are unknown as far as I know
* The typeahead search uses a custom storage and retrieval logic [15]
* Chat is based on an Epoll server developed in Erlang and accessed using Thrift [16]

About the resources provisioned for each of these components, some information and numbers are known:

* Facebook is estimated to own more than 60,000 servers [17]. Their recent datacenter in Prineville, Oregon is based on entirely self-designed hardware [18] that was recently unveiled as Open Compute Project [19].
* 300 TB of data is stored in Memcached processes [20]
* Their Hadoop and Hive cluster is made of 3000 servers with 8 cores, 32 GB RAM, 12 TB disks that is a total of 24k cores, 96 TB RAM and 36 PB disks [20]
* 100 billion hits per day, 50 billion photos, 3 trillion objects cached, 130 TB of logs per day as of july 2010 [21]

[1] HipHop for PHP: http://developers.facebook.com/blog/post/358
[2] Thrift: http://thrift.apache.org/
[3] Memcached: http://memcached.org/
[4] Cassandra: http://cassandra.apache.org/
[5] HBase: http://hbase.apache.org/
[6] Scribe: https://github.com/facebook/scribe
[7] Scribe-HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
[8] BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
[9] Varnish Cache: http://www.varnish-cache.org/
[10] Facebook goes for Varnish: http://www.varnish-software.com/customers/facebook
[11] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php?note_id=76191543919
[12] Scaling the Messages Application Back End: http://www.facebook.com/note.php?note_id=10150148835363920
[13] The Underlying Technology of Messages: https://www.facebook.com/note.php?note_id=454991608919
[14] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/video.php?v=690851516105
[15] Facebook’s typeahead search architecture: http://www.facebook.com/video/video.php?v=432864835468
[16] Facebook Chat: http://www.facebook.com/note.php?note_id=14218138919
[17] Who has the most Web Servers?: http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/
[18] Building Efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php?note_id=10150144039563920
[19] Open Compute Project: http://opencompute.org/
[20] Facebook’s architecture presentation at Devoxx 2010: http://www.devoxx.com
[21] Scaling Facebook to 500 millions users and beyond: http://www.facebook.com/note.php?note_id=409881258919

相关 [facebook architecture] 推荐:

Facebook’s architecture(转)

- Adam - 淘宝JAVA中间件团队博客
Facebook’s HipHop [1] then converts it to C++ and compiles it using g++, thus providing a high performance templating and Web logic execution layer. Some of these services are implemented in PHP, C++ or Java depending on service requirements (some other languages are probably used…).

译|High-Performance Server Architecture

- - 掘金 架构
本文的目的是分享我多年来关于如何开发某种应用程序的一些想法,对于这种应用程序,术语“服务”只是一个无力的近似称呼. 更准确地说,将写的与一大类程序有关,这些程序旨每秒处理大量离散的消息或请求. 网络服务通常最适合此定义,但从某种意义上讲,实际上并非所有的程序都是服务. 但是,由于“高性能请求处理程序”是很糟糕的标题,为简单起见,倒不如叫“服务”万事大吉.

读《game engine architecture》有感

- 启鑫 - 博客园-首页原创精华区
最近在看一本叫做《game engine architecture》的书,这本书从很细,很具体的讲解现在游戏引擎的体系结构. 本书的亮点:1.讲解现代游戏引擎架构,拥有非常新的实例. 包括作者自己公司的引擎和商业引擎例如Unreal的实例. 代码少而思想多,往往一段话就可以让你了解某个部分的实现--(来自豆瓣上的点评).

论文《Architecture of a Database System》小结

- - 掘金 架构
原文: https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf. 本文主要讨论DBMS的体系结构,包括进程模型、并行架构、存储系统设计、事物系统实现、查询处理器和优化器架构,以及常见的共享组件和工具. 数据库系统是最早广泛部署的在线服务器系统之一,因此,它开创了不仅跨越数据管理,而且跨越应用程序、操作系统和网络服务的设计解决方案.

[Architecture] MVP, MVC, MVVM, 傻傻分不清楚~

- Amo - 點部落-小朱® 的技術隨手寫


- Discoverer - 60designwebpick
位于 LUGANO 湖湖畔山坡上的两层别墅住宅,由意大利的 JM ARCHITECTURE 事务所设计. 一个圆角多边形玻璃房在地面之上的层,包含生活区、厨房、餐厅及仓储空间. 卧室、浴室和车库在半地下较低的层,每一个层级都有与其密切关联的独立室外空间. 住宅的两个层级都被庭院所包围,在较高层级的玻璃房之中可以欣赏背靠山坡的景色,以及透过庭院俯瞰 LUGANO 湖.

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

- -


- Lorna - It Talks--上海魏武挥的博客
腾讯近日很低调地推出了一个名为“朋友”的网络服务(也是一个使用独立域名的网站),这是一个与时下社交网站,比如人人、开心等非常类似的产品. 与它们一样,目前这个“朋友”上也加载了一些应用,当然,一贯的,以腾讯自家出品为主. 而且,我个人以为,未来会有更多的腾讯在QQ这个客户端上的应用,逐步向这个网站迁移.


- 亦农 - 王建硕
今天的湾区阳光灿烂,280州际公路两边的绿色山坡和蔚蓝的白云,让人觉得自己是Windows XP桌面上的一个图标. 下午,2点,终于来到Facebook这个神奇的公司. 他们的新家在南加利福尼亚街的最里面,一幢两层的楼里. 他们刚刚从车位紧张的Palo Alto城里搬到这里,据说一层楼又要搬了. 我好像是他们再次搬地方前的最后一批访客.