Monday, January 12, 2009

Magy!: how many servers for a video search engine?

Magy! is the ultimate search engine. It won all the battles and survived. They have an ultra-social video search engine, which receives 500 new videos per second. Videos are user-contributed.
  • Can you estimate the disk space needed to store one year of videos?
  • Can you propose a server infrastructure for storing the videos?
  • What about caching?
Yet another back of the envelope computation.


  1. 1) First question
    Let's say that there are about 31.5 million seconds in a year, and you define 30 megabytes to be the maximum size of an uploaded video.
    Doing the math (500 videos/s × 31,536,000 s × 30 MB), this gives a total of 473,040 terabytes a year, or about 473 petabytes, to be kept on some kind of redundant storage.

    2) Second question.
    What kind of infrastructure can accommodate 473 petabytes of storage?
    In my opinion, that kind of infrastructure cannot be built yet. But, well, it depends on your budget, actually. :-)
    You would need a total of 1,000 EMC DMX-4 arrays expanded to maximum capacity if you use an enterprise storage solution, or about 100k standard Intel servers, each equipped with five 1-terabyte disks in a RAID configuration.
    Were you thinking of something in particular?

    3) Third question.
    Caching? Who needs caching when you have such a massive and dispersed base of data? Unless you meant it for the CDN...

  2. Err, did I win something? :-)

  3. 1) Correct, but you may want to estimate an average case as well. Not all the videos will reach 30 MB.

    2) Suppose that you want to use low-cost commodity servers and implement redundancy in software.

    3) Not all the videos will be accessed with the same frequency. A caching layer will greatly improve the QPS of the system.
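
For readers who want to replay the envelope math from the thread, here is a minimal Python sketch. The 500 videos per second and the 30 MB cap come from the post and the first comment; everything else is an assumption made for illustration: the 5 TB of usable space per server (five 1 TB disks, ignoring RAID overhead), the 3 software replicas, the 10 MB average video size, and the Zipf-like popularity model used for the cache estimate. The function names are hypothetical as well.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def yearly_storage_tb(videos_per_second, video_size_mb):
    """Raw storage for one year of uploads, in decimal terabytes."""
    total_mb = videos_per_second * SECONDS_PER_YEAR * video_size_mb
    return total_mb / 1_000_000  # 1 TB = 10^6 MB


def servers_needed(storage_tb, usable_tb_per_server, replicas=1):
    """Servers required to hold `replicas` full copies of the data."""
    return math.ceil(storage_tb * replicas / usable_tb_per_server)


def zipf_hit_fraction(n_items, cached_items, s=1.0):
    """Share of requests served from a cache holding the most popular items.

    ASSUMPTION: item popularity follows a Zipf-like law, i.e. the k-th
    most popular item is requested with weight 1 / k**s.
    """
    weights = [1.0 / k ** s for k in range(1, n_items + 1)]
    return sum(weights[:cached_items]) / sum(weights)


if __name__ == "__main__":
    worst_case = yearly_storage_tb(500, 30)   # 30 MB cap from the thread
    average_case = yearly_storage_tb(500, 10)  # ASSUMED 10 MB average size
    print("worst case  :", worst_case, "TB")
    print("average case:", average_case, "TB")
    # ASSUMED: 5 TB usable per commodity server, 3 software replicas.
    print("servers, 1 copy  :", servers_needed(worst_case, 5))
    print("servers, 3 copies:", servers_needed(worst_case, 5, replicas=3))
    # Toy catalogue of 1,000 items with the top 10% cached.
    print("cache hit share  :", round(zipf_hit_fraction(1000, 100), 3))
```

With the worst-case figures, one copy of the data lands close to the 100k servers mentioned in the first comment, and each extra software replica multiplies that count. The cache estimate only shows the shape of the argument: under a skewed popularity model, a cache holding a small share of the catalogue can serve a much larger share of the requests.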