We are all familiar with Trees, and we assume that pointers are an required evil needed for encoding the structure of a tree. Anyway, why do you need to spend 64bit pointers, when you can represent the structure with just few bits? Those ideas are not just theory, they are used in real implementation of Large Scale Tree data Structures that needs to fit in memory.
Let's start from this example taken from the paper Representing Trees of Higher Degree (E. Demain is one of the authors and his studies about Khipu are fascinating, but this is another story...)
An ordinal tree T can be represented as a bit array. The degree of each tree (number of sons) is encoded with a sequence of 1 terminated by a 0. All the nodes are represented with a level degree visit of T (commas are used just for helping the reader but we don't need to represent them). This representation is known as Level Ordered Unary Degree Sequence (LOUDS). Using auxiliary data structures for Rank and Select operations, LOUDS supports in constant time, finding the parent, the ith child, and the degree of a node.
- The number of 0 in the binary array is the number of node in the three.
- Each node is son of another node and therefore has a 1 representing it (we need to find which 1 is the correct one)
- As a consequence, we need at least a space of 2n, plus something for navigation
Now, we need to introduce two operations which can be computed in constant time with additional 0(n) space (see the paper).
- Rank1(j) returns the # of 1 up and including position j. Rank0(j) is a similar operation for 0
- Select1(j) returns the position of of the j-th 1. Select0(j) is a similar operation for 0
- Compute the degree of a node in position p. This is the number of 1 up to the next 0 and can be computed as Select0(rank0(p)+1) - p
- Compute the parent of a three as in Select1(rank0(p)+1)
- Compute the ith children of a tree;
Another magic representation is based on balanced parenthesis and it is due to Munro and Raman. Suppose you visit the three depth first and open a parenthesis on the way down, and close the parenthesis on the way up. The ordinal tree is encoded again ad a bitvector but now we have the additional benefit that the nodes of a subtree are very close in the bitvector. This is a consequence of the depth first visit. It is possible to navigate the three with Rank & Select operations.
Benoit, Demain, Munro, Raman proposed a variant of LOUDS based on depth first named Depth First Unary Degree Sequence (DFUDS), which preserves locality and still allows tree navigation in terms of Rank & Select. In addition, they proposed further optimization for finding the ith children of a node, if the tree is a trie with k outdegree.
More information about this fascinating word is available in Succinct Trees in Practice, 2010
No comments:
Post a Comment