NPC 2016: Network and Parallel Computing pp 205-212

# On Determination of Balance Ratio for Some Tree Structures

• Daxin Zhu
• Tinran Wang
• Xiaodong Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9966)

## Abstract

In this paper, we studies the problem to find the maximal number of red nodes of a kind of balanced binary search tree. We have presented a dynamic programming formula for computing r(n), the maximal number of red nodes of a red-black tree with n keys. The first dynamic programming algorithm uses $$O(n^2\log n)$$ time and uses $$O(n\log n)$$ space. The basic algorithm is then improved to a more efficient O(n) time algorithm. The time complexity of the new algorithm is finally reduced to O(n) and the space is reduced to only $$O(\log n)$$.

## Keywords

Balanced Ratio Dynamic Programming Formula Complete Binary Search Tree Time Complexity Black Height
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## 1 Introduction

This paper studies the worst case balance ratio of the red-black tree structure. A red-black tree is a kind of self-balancing binary search tree. Each node of the binary tree has an extra bit, and that bit is often interpreted as the color (red or black) of the node. These color bits are used to ensure the tree remains approximately balanced during insertions and deletions. The data structure was originally presented by Rudolf Bayer in 1972 with its name ’symmetric binary B-tree’ [2]. Guibas and Sedgewick named the data structure red-black tree in 1978, [4]. They introduced the red/black color convention and the properties of a red-black tree at length. A simpler-to-code variant of red-black trees was presented in [1, 5]. This variant of red-black trees was called the variant AA-trees [8]. The left-leaning red-black tree [6] was introduced in 2008 by Sedgewick. It is a new version of red-black tree which eliminated a previously unspecified degree of freedom. Either 2–3 trees or 2–4 trees can also be made isometric to red-black trees for any sequence of operations [6].

## 2 The Basic Properties and Algorithms

A red-black tree of n keys is denoted by T in this paper. In a red-black tree T of n keys, r(n) and s(n), are defined as the maximal and the minimal number of red internal nodes respectively. It is readily seen that in this case of $$n=2^k-1$$, the number of red nodes of T achieves its maximum, if the node from the bottom to the top are colored alternately red and black. the number of red nodes of T achieves its minimum, if the node from the bottom to the top are all colored black.

In the special case of $$n=2^k-1$$, we can conclude,
\begin{aligned} r(n)&=r(2^{k} -1) =\displaystyle {\sum _{i=0}^{\left\lfloor (k-1)/2\right\rfloor }2^{k-2i-1}}\\&=2^{k-1} \displaystyle {\sum _{i=0}^{\left\lfloor (k-1)/2\right\rfloor }\frac{1}{4^{i}}}\\&=\displaystyle {\frac{2^{k-1} }{3} \left( 4-\frac{1}{4^{\left\lfloor (k-1)/2\right\rfloor }}\right) }\\&=\displaystyle {\frac{2^{k+1}-2^{k-1-2\left\lfloor (k-1)/2\right\rfloor }}{3}}\\&=\displaystyle {\frac{2^{k+1} -2^{(k-1)\,\,\texttt {mod}\,\,2}}{3}}\\&=\displaystyle {\frac{2^{k+1} -2+k\,\, \texttt {mod}\,\,2}{3}}\\&=\displaystyle {\frac{2(2^{k} -1)+k\,\,\texttt {mod}\,\,2}{3}}\\&=\displaystyle {\frac{2n+\log (n+1)\,\,\texttt {mod}\,\,2}{3}} \end{aligned}
The number of black nodes b(n) can then be,
\begin{aligned} b(n)&=n-r(n)\\&=\displaystyle {n-\frac{2n+\log (n+1)\,\,\texttt {mod}\,\,2}{3}}\\&=\displaystyle {\frac{n-\log (n+1)\,\,\texttt {mod}\,\,2}{3}} \end{aligned}
Therefore, in this case, the ratio of red nodes to black nodes is,
\begin{aligned} r(n)/b(n)=\frac{2n+\log (n+1)\,\,\texttt {mod}\,\,2}{n-\log (n+1)\,\,\texttt {mod}\,\,2} \end{aligned}
In the general cases, the maximal number of red nodes of a red-black tree with n keys can be denoted by $$\gamma (n,0)$$ if root is red, and by $$\gamma (n,1)$$ if root is black. We then have,
\begin{aligned} r(n)=\max \{\gamma (n,0),\gamma (n,1)\}. \end{aligned}
It can be proved by induction further that $$\gamma (n,0)\le \frac{2n+1}{3}$$ and $$\gamma (n,1)\le \frac{2n}{3}$$. Therefore,
\begin{aligned} \displaystyle {r(n)\le \max \left\{ \frac{2n+1}{3},\frac{2n}{3}\right\} =\frac{2n+1}{3}} \end{aligned}
Thus, for $$n\ge 7$$, we have
\begin{aligned} 0\le \frac{r(n)}{n-r(n)} \le \frac{\frac{2n+1}{3} }{n-\frac{2n+1}{3} } =\frac{2n+1}{n-1} \le 2.5 \end{aligned}
In the general cases, the maximal number of red nodes of a subtree of black-height j and size i can be denoted by a(ij, 0) if root is red and by a(ij, 1) if root is black. It follows from $$\frac{1}{2}\log n\le j\le 2\log n$$ that
\begin{aligned} \gamma (n,k)=\max \limits _{\frac{1}{2}\log n\le j\le 2\log n}a(n,j,k) \end{aligned}
(1)
Furthermore, we can denote for any $$1\le i\le n, \frac{1}{2}\log i\le j\le 2\log i$$ that
\begin{aligned} {\left\{ \begin{array}{ll} \alpha _1(i,j)=\max \limits _{0\le t\le i/2 }\{a(t,j,1)+a(i-t-1,j,1)\}&{}\\ \alpha _2(i,j)=\max \limits _{0\le t\le i/2 }\{a(t,j,0)+a(i-t-1,j,0)\}&{}\\ \alpha _3(i,j)=\max \limits _{0\le t\le i/2 }\{a(t,j,1)+a(i-t-1,j,0)\}&{}\\ \alpha _4(i,j)=\max \limits _{0\le t\le i/2 }\{a(t,j,0)+a(i-t-1,j,1)\}&{}\\ \end{array}\right. } \end{aligned}
(2)

### Theorem 1

a(ij, 0) and a(ij, 1) can be formulated for each $$1\le i\le n, \frac{1}{2}\log (i+1)\le j\le 2\log (i+1)$$, as follows.
\begin{aligned} \left\{ \begin{array}{l} a(i,j,0)=1+\alpha _1(i,j)\\ a(i,j,1)=\max \{\alpha _1(i,j-1),\alpha _2(i,j-1),\alpha _3(i,j-1)\}\\ \end{array} \right. \end{aligned}
(3)

### Proof

Let ij be two indices such that $$1\le i\le n, \frac{1}{2}\log (i+1)\le j\le 2\log (i+1)$$. T(ij, 0) is defined to be a red-black tree of i keys and black-height j, and its root red. T(ij, 1) is defined similarly if its root black. The number of red nodes in T(ij, 0) and T(ij, 1) can be denoted respectively by a(ij, 0) and a(ij, 1).

(1) We consider T(ij, 0) first. The two children of T(ij, 0) must be black, since its root red. The two subtrees L and R must both have a black-height of j. The subtrees T(tj, 1) and $$T(i-t-1,j,1)$$ which connected to a red node must be a red-black tree of black-height j and i keys. The number of red nodes is thus $$1+a(t,j,1)+a(i-t-1,j,1)$$. T(ij, 0) have a maximal number of red nodes, and thus,
\begin{aligned} a(i,j,0)\ge \max \limits _{0\le t\le i/2 }\{1+a(t,j,1)+a(i-t-1,j,1)\} \end{aligned}
(4)
On the other hand, If the number of red nodes of L and R are denoted by r(L) and r(R), and the sizes of L and R are t and $$i-t-1$$, then we can conclude that $$r(L)\le a(t,j,1)$$ and $$r(R)\le a(i-t-1,j,1)$$. Therefore we have,
\begin{aligned} a(i,j,0)\le 1+\max \limits _{0\le t\le i/2 }\{a(t,j,1)+a(i-t-1,j,1)\} \end{aligned}
(5)
It follows from (4) and (5) that,
\begin{aligned} a(i,j,0)=1+\max \limits _{0\le t\le i/2 }\{a(t,j,1)+a(i-t-1,j,1)\} \end{aligned}
(6)
(2) We now consider T(ij, 1). The two subtrees L and R must both have a black-heights $$j-1$$.
If both L and R have a black root, then $$T(t,j-1,1)$$ and $$T(i-t-1,j-1,1)$$ must have black-height j and i keys. The number of red nodes must be $$a(t,j-1,1)+a(i-t-1,j-1,1)$$. It follows that
\begin{aligned} a(i,j,1)\ge \max \limits _{0\le t\le i/2 }\{a(t,j-1,1)+a(i-t-1,j-1,1)\}=\alpha _1(i,j-1) \end{aligned}
(7)
The other three cases, can be discussed similarly.
\begin{aligned} a(i,j,1)\ge \max \limits _{0\le t\le i/2 }\{a(t,j-1,0)+a(i-t-1,j-1,0)\}=\alpha _2(i,j-1) \end{aligned}
(8)
\begin{aligned} a(i,j,1)\ge \max \limits _{0\le t\le i/2 }\{a(t,j-1,1)+a(i-t-1,j-1,0)\}=\alpha _3(i,j-1) \end{aligned}
(9)
\begin{aligned} a(i,j,1)\ge \max \limits _{0\le t\le i/2 }\{a(t,j-1,0)+a(i-t-1,j-1,1)\}=\alpha _4(i,j-1) \end{aligned}
(10)
Therefore, we can conclude,
\begin{aligned} a(i,j,1)\ge \max \{\alpha _1(i,j-1),\alpha _2(i,j-1),\alpha _3(i,j-1),\alpha _4(i,j-1)\} \end{aligned}
(11)
On the other hand, If the number of red nodes of L and R are denoted by r(L) and r(R), and the sizes of L and R are t and $$i-t-1$$, then we can conclude that $$r(L)\le a(t,j-1,1)$$ and $$r(R)\le a(i-t-1,j-1,1)$$. Therefore we have,
\begin{aligned} a(i,j,1)\le \max \limits _{0\le t\le i/2 }\{a(t,j-1,1)+a(i-t-1,j-1,1)\}=\alpha _1(i,j-1) \end{aligned}
(12)
The other three cases can be discussed similarly that
\begin{aligned} a(i,j,1)\le \max \limits _{0\le t\le i/2 }\{a(t,j-1,0)+a(i-t-1,j-1,0)\}=\alpha _2(i,j-1) \end{aligned}
(13)
\begin{aligned} a(i,j,1)\le \max \limits _{0\le t\le i/2 }\{a(t,j-1,1)+a(i-t-1,j-1,0)\}=\alpha _3(i,j-1) \end{aligned}
(14)
\begin{aligned} a(i,j,1)\le \max \limits _{0\le t\le i/2 }\{a(t,j-1,0)+a(i-t-1,j-1,1)\}=\alpha _4(i,j-1) \end{aligned}
(15)
It follows that
\begin{aligned} a(i,j,1)\le \max \{\alpha _1(i,j-1),\alpha _2(i,j-1),\alpha _3(i,j-1),\alpha _4(i,j-1)\} \end{aligned}
(16)
It follows from (11) and (16) that
\begin{aligned} a(i,j,1)=\max \{\alpha _1(i,j-1),\alpha _2(i,j-1),\alpha _3(i,j-1),\alpha _4(i,j-1)\} \end{aligned}
(17)
It is clear that $$\alpha _4(i,j)$$ achieves its maximum at $$t_0$$ $$\alpha _4(i,j)=a(t_0,j,0)+a(i-t_0-1,j,1), 0\le t_0\le i/2$$, then $$\alpha _3(i,j)$$ achieves its maximum at $$t_1=i-t_0-1$$, $$\alpha _3(i,j)=a(t_1,j,1)+a(i-t_1-1,j,0), 0\le t_1\le i/2$$. Thus, $$\alpha _3(i,j)=\alpha _4(i,j)$$, for each $$1\le i\le n, \frac{1}{2}\log (i+1)\le j\le 2\log (i+1)$$, and finally we have,
\begin{aligned} a(i,j,1)=\max \{\alpha _1(i,j-1),\alpha _2(i,j-1),\alpha _3(i,j-1)\} \end{aligned}
(18)
The proof is complete.

## 3 The Improvement of Time Complexity

Some special pictures of the maximal red-black trees are listed in Fig. 2. It can be observed from these pictures of the maximal red-black trees that some properties of r(n) are useful.

(1) The maximal red-black tree with r(n) red nodes of n keys as shown above can be realized by a complete binary search tree.

(2) In the maximal red-black tree of n keys, the nodes along the left spine are colored red, black, $$\cdots$$, alternatively from the bottom to the top. In such a red-black tree, its black-height must be $$\frac{1}{2}\log n$$.

The dynamic programming formula in Theorem 1 we can be improved further from above observations. The second loop for j can be reduced to $$j=\frac{1}{2}\log i$$ to $$1+\frac{1}{2}\log i$$, since the black-height of i keys must be $$1+\frac{1}{2}\log i$$. The time complexity of the algorithm can thus be reduced immediately to $$O(n^2)$$.

It can be seen from observation (1) that the subtree of a T is a complete binary tree. If T has a size n, then its left subtree must have a size of
$$left(n)=2^{\lfloor \log n\rfloor -1}-1+\min \{2^{\lfloor \log n\rfloor -1},n-2^{\lfloor \log n\rfloor }+1\}$$
and its right subtree must have a size of
$$right(n)=n-left(n)-1$$
It follows that the range $$0\le t\le i/2$$ can be reduced to $$t=left(i)$$. The time complexity of the algorithm can thus be reduced further to O(n).

Another efficient algorithm for r(n) need only $$O(\log n)$$ space can be built as follows.

### Theorem 2

In a red-black tree T of n keys, let r(n) be the maximal number of red nodes in T. The values of $$d(1)=r(n)$$ can be formulated as follows.
\begin{aligned} d(m)=\left\{ \begin{array}{ll} h(m) &{} h(m)\le 1\\ 1+d(4m)+d(4m+1)+d(4m+2)+d(4m+3) &{} h(m)\mod 2=1\\ d(2m)+d(2m+1) &{} h(m)\mod 2=0\\ \end{array} \right. \end{aligned}
(19)
where
\begin{aligned} h(m)=\left\{ \begin{array}{ll} 1+\lfloor \log n\rfloor -\lfloor \log m\rfloor &{} \frac{m}{2^{\lfloor \log n\rfloor -\lfloor \log m\rfloor }}\le n\\ \lfloor \log n\rfloor -\lfloor \log m\rfloor &{} \,\,\,\mathtt {otherwise}\\ \end{array} \right. \end{aligned}
(20)

### Proof

We can label the nodes of a maximal red-black tree like a heap. The root of the tree is labeled 1. The left child of a node i is labeled 2i and the right child is labeled $$2i+1$$. Let d(i) denote the maximal number of red nodes of T and h(i), denote the height at node i. Then we have $$r(n)=d(1)$$. It is easy to verify that if $$\frac{i}{2^{\lfloor \log n\rfloor -\lfloor \log i\rfloor }}>n$$, then we have $$h(i)=\lfloor \log n\rfloor -\lfloor \log i\rfloor$$, otherwise, $$h(i)=1+\lfloor \log n\rfloor -\lfloor \log i\rfloor$$.

It is obvious that if $$h(i)\le 1$$, then $$d(i)=h(i)$$.

Therefore, if h(i) is even then the left subtree rooted at node 2i and the right subtree rooted at node $$2i+1$$ are both black. If h(i) odd, the four nodes rooted at nodes 4i, $$4i+1$$, $$4i+2$$ and $$4i+3$$ can be all maximal red-black trees. Therefore, if $$h(i)>1$$, we have
$$d(i)=\left\{ \begin{array}{ll} 1+d(4i)+d(4i+1)+d(4i+2)+d(4i+3) &{} h(i) \,\,\,\mathtt {odd}\\ d(2i)+d(2i+1) &{} h(i) \,\,\,\mathtt {even}\\ \end{array} \right.$$
The proof is complete.
A new recursive algorithm for r(n) can be build as the following algorithm.

It can be seen that the time complexity of the algorithm is O(n), since each node is visited at most once in the algorithm. The space complexity of the algorithm is the stack space used in recursive calls. The depth of recursive is at most $$\log n$$, and the space complexity of the algorithm is thus $$O(\log n)$$.

In order to find r(n), the algorithm can be reformulated in a non-recursive form as follows.
By further improvement of the formula, we can reduce algorithm to a very simple algorithm as follows.

It is clear that the time complexities of above two algorithms are both $$O\log n$$.

If the number of n can fit into a computer word, then a(n) and o(n) can be computed in O(1) time.

## 4 Concluding Remarks

We have presented a dynamic programming formula for computing r(n), the maximal number of red nodes of a red-black tree with n keys. The time complexity of the new algorithm is finally reduced to O(n) and the space is reduced to only $$O(\log n)$$.

## Notes

### Acknowledgement

This work was supported by Intelligent Computing and Information Processing of Fujian University Laboratory and Data-Intensive Computing of Fujian Provincial Key Laboratory.

## References

1. 1.
Andersson, A.: Balanced search trees made simple. In: Dehne, F., Sack, J.-R., Santoro, N., Whitesides, S. (eds.) WADS 1993. LNCS, vol. 709, pp. 60–71. Springer, Heidelberg (1993). doi:
2. 2.
Bayer, R.: Symmetric binary B-trees, Data structure and maintenance algorithms. Acta Informatica 1(4), 290–306 (1972)
3. 3.
Goodrich, M.T., Tamassia, R.: Algorithm Design and Applications. Wiley, Hoboken (2015)
4. 4.
Guibas, L.J., Sedgewick, R.: A dichromatic framework for balanced trees. In: 19th FOCS, pp. 8–21 (1978)Google Scholar
5. 5.
Heejin, P., Kunsoo, P.: Parallel algorithms for redCblack trees. Theor. Comput. Sci. 262(1–2), 415–435 (2001)Google Scholar
6. 6.
Sedgewick, R.: Left-leaning RedCBlack Trees. http://www.cs.princeton.edu/rs/talks/LLRB/LLRB.pdf
7. 7.
8. 8.
Weiss, M.A.: Data Structures and Problem Solving Using C++, 2nd edn. Addison-Wesley, New York (2000)Google Scholar