Thursday, December 9, 2010

Google's Dremel

Greg pointed out this interesting G paper "Dremel: Interactive Analysis of Web-Scale Datasets" (PDF)"

"By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing."

The key idea is very simple, but the implementation is complex: "our goal is to store all values of a given field consecutively to improve retrieval efficiency". This is the so called columnar vs. record-oriented storage.

Then, SQL relational operations are implemented by using the following intuition: "Think of a nested record as a labeled tree, where each label corresponds to a field name. The selection operator prunes away the branches of the tree that do not satisfy the specified conditions." .. "Dremel uses a multi-level serving tree to execute queries (see Figure 7). A root server receives incoming queries, reads metadata from the tables, and routes the queries to
the next level in the serving tree."

Dremel scans quadrillions of records per month.

1 comment:

  1. Note how much time is spent in parsing the objects... any ideas to speed it up? :)