Sunday, June 13, 2010

Facebook infrastructure and data

A very interesting presentation about Facebook by Aditya Agarwal, Director of Engineering:
  • 8 billion minutes spent on the site every day, 5 billion pieces of content shared per week, 3 billion photos uploaded per month
  • 80,000 applications use Facebook Connect
  • 500 million users
The stack is made up of several components. The front end is PHP, optimized with memcache and asynchronous communication, and now run through HipHop, a source code transformer that converts the PHP into C++. The service components are written in different languages and communicate using Thrift, Scribe and some in-house components. Memcache is an in-memory hash table used to cache MySQL data and application-generated data. MySQL is used as a (key, value) store with almost no relational model (clearly no local relational joins, since the "tables" are distributed ;-). All the software they have contributed to the community is here
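
To make the memcache-in-front-of-MySQL idea concrete, here is a minimal sketch of a partitioned key/value store with a read-through cache. It is not Facebook's code: a plain Python dict stands in for memcache, in-memory SQLite databases stand in for the MySQL shards, and the class name `ShardedKVStore`, the shard count and the CRC32-based routing are all assumptions made for illustration.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


class ShardedKVStore:
    """Key/value rows spread over several databases, with a cache in front."""

    def __init__(self, num_shards=NUM_SHARDS):
        # Each SQLite connection stands in for one MySQL shard.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        self.cache = {}    # stands in for memcache
        self.db_reads = 0  # counts reads that had to go to a shard

    def _shard_for(self, key):
        # Route a key to a shard with a stable hash; no joins across shards.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        db = self._shard_for(key)
        db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        db.commit()
        # Invalidate the cached copy so the next read sees the new value.
        self.cache.pop(key, None)

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the owning shard, then populate the cache.
        self.db_reads += 1
        row = self._shard_for(key).execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]
        return row[0]
```

Calling `store.get("user:1:name")` twice after a `put` reaches a shard only once; the second read is answered from the dict. Every lookup is by primary key on a single shard, which is why the relational features of the database matter so little in this kind of setup.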


  1. The parts of the talk on culture (especially rapid iteration, small teams, keeping developers in control of their own products) are also worthwhile. A good summary of that is at

  2. Why wouldn't Facebook use MySQL?

    They've been using it for some time and, if it is used as a heavily partitioned key/value store so that the working set is mostly in memory, MySQL is so fast that I very much doubt the considerable effort of switching away from it would be attractive. After all, there's no software project more dangerous and prone to failure than rewriting a lot of code without any clear purpose.

    Last time I checked, Amazon and Google still use MySQL in some of their systems. Why wouldn't Facebook?

    1. Since Timeline is more concerned with organizing data neatly than with shooting out updates in real time, MySQL is well suited to the app. Although the data is aggregated in the same location where it is stored (i.e. not over a network connection), that data is managed by MySQL, and not by an alternative such as a NoSQL store or Hadoop HBase.