Red–Black Trees: Structure, Height Guarantees, and Invariants

An introduction to red–black trees: conventions, invariants, case analysis, and their relation to 2–3–4 trees.

📊 Prefer slides? Check out the interactive presentation: Red–Black Trees Slides

Motivation

A binary search tree (BST) stores keys so that for each internal node $ x $:

BST operations have running time proportional to the height $ h $ of the tree:

In the worst case (for example, inserting already sorted keys into an empty tree), a BST degenerates into a chain and $ h = \Theta(n) $. To avoid this, we use balanced search trees.

A small comparison:

Data Structure Search Insert Delete Height Notes
Unbalanced BST $ O(\log n) $ avg, $ O(n) $ worst same same Up to $ n $ Can degrade to a linked list
AVL Tree $ O(\log n) $ $ O(\log n) $ $ O(\log n) $ Strictly balanced Stricter height invariants
Red–Black Tree $ O(\log n) $ $ O(\log n) $ $ O(\log n) $ $ \le 2\log_2(n+1) $ Relaxed balancing, few rotations
Treap Expected $ O(\log n) $ Expected $ O(\log n) $ Expected $ O(\log n) $ Expected $ O(\log n) $ Randomized priorities

Red–black trees guarantee logarithmic height via color-based invariants instead of strict height balancing.


Structural Conventions

To be precise, we fix conventions compatible with CLRS.

Internal Nodes and NIL Leaves

Height, Depth, Level

Let the tree be rooted at a node root.

The height of the tree is the height of the root.

Black-Height

For any node $ x $, the black-height $ bh(x) $ is defined as:

The number of black nodes on any simple path from $ x $ (exclusive) down to a NIL leaf (inclusive).

Thanks to the red–black properties, this number is well-defined (independent of which path to a NIL leaf you choose).


Definition of Red–Black Trees

A red–black tree is a binary search tree where each internal node is colored red or black and the following invariants hold:

  1. Every node is either red or black.
  2. The root is black.
  3. Every NIL leaf is black.
  4. If a node is red, then both its children are black. (No red–red edge.)
  5. For every node $ x $, every simple path from $ x $ to a NIL leaf has the same number of black nodes.

A simple example:

graph TD
    R["🔴 Black<br/>(Root)"]
    A["🔵 Red"]
    B["🔴 Black"]
    C["🔴 Black"]
    D["🔴 Black"]
    E["🔴 Black"]
    F["🔴 Black"]
    NIL1["⬛ NIL"]
    NIL2["⬛ NIL"]
    NIL3["⬛ NIL"]
    NIL4["⬛ NIL"]
    NIL5["⬛ NIL"]
    NIL6["⬛ NIL"]
    NIL7["⬛ NIL"]
    NIL8["⬛ NIL"]
    
    R --> A
    R --> B
    A --> C
    A --> D
    C --> NIL1
    C --> NIL2
    D --> NIL3
    D --> NIL4
    B --> E
    B --> F
    E --> NIL5
    E --> NIL6
    F --> NIL7
    F --> NIL8

Children of a Red Node

We now explicitly analyze all possible shapes of children for a red node.

Let R be a red internal node. Each child can be:

We consider all possibilities.

Case 1: Both children are NIL (allowed)

graph TD
    R["🔵 Red"]
    NIL1["⬛ NIL"]
    NIL2["⬛ NIL"]
    
    R --> NIL1
    R --> NIL2

This is allowed: a red internal node whose two children are NIL leaves.


Case 2: Both children are black internal nodes (allowed)

graph TD
    R["🔵 Red"]
    L["🔴 Black"]
    RR["🔴 Black"]
    NIL1["⬛ NIL"]
    NIL2["⬛ NIL"]
    NIL3["⬛ NIL"]
    NIL4["⬛ NIL"]
    
    R --> L
    R --> RR
    L --> NIL1
    L --> NIL2
    RR --> NIL3
    RR --> NIL4

This is also allowed. The subtrees below the two black children must, of course, satisfy the red–black invariants.


Case 3: One NIL child and one black internal child (forbidden)

Consider:

graph TD
    R["🔵 Red"]
    L["🔴 Black"]
    N["⬛ NIL"]
    NIL1["⬛ NIL"]
    NIL2["⬛ NIL"]
    
    R --> L
    R --> N
    L --> NIL1
    L --> NIL2

Thus the number of black nodes on paths from R to NIL leaves differs between the left and right sides, contradicting property 5.

Therefore, this case is not allowed.

The symmetric case (left NIL, right black internal) is equally forbidden.


Case 4: Any red child (forbidden)

graph TD
    R["🔵 Red"]
    C["🔵 Red"]
    
    R --> C

This directly violates property 4: a red node cannot have a red child.


Summary: children of a red node

A red node may have:

but may not have:

This structural fact will be referenced in several proofs.


Black-Height and Tree Size

We now formalize the relationship between black-height and the number of internal nodes.

Let $ S(k) $ be the minimum number of internal nodes in any red–black subtree whose root has black-height $ k $.

Derivation of the recurrence

\[S(0) = 0.\]

Thus:

\[S(k) = 1 + 2S(k-1).\]

Solving the recurrence

Unroll:

\[\begin{aligned} S(k) &= 1 + 2S(k-1) \\ &= 1 + 2(1 + 2S(k-2)) \\ &= 1 + 2 + 4S(k-2) \\ &= 1 + 2 + 4(1 + 2S(k-3)) \\ &= 1 + 2 + 4 + 8S(k-3) \\ &\dots \\ &= 1 + 2 + 4 + \dots + 2^{k-1} + 2^k S(0). \end{aligned}\]

Since $ S(0) = 0 $, the last term vanishes:

\[S(k) = 1 + 2 + 4 + \dots + 2^{k-1} = 2^k - 1.\]

Therefore, any subtree with black-height $ k $ has at least $ 2^k - 1 $ internal nodes.

If the whole tree has $ n $ internal nodes and root black-height $ bh(\text{root}) = k $, then:

\[n \ge 2^k - 1 \quad\Rightarrow\quad k \le \log_2(n+1).\]

This bounds the black-height in terms of the tree size.


Path Lengths and Height Bound

We now relate black-height to the actual height of the tree.

Consider any node $ x $ with black-height $ B $. Every path from $ x $ to a NIL leaf passes through exactly $ B $ black nodes (excluding $ x $, including NIL).

Shortest path from (x)

The shortest path uses only black nodes (no reds at all along the path):

graph TD
    X["🔴 x<br/>(Black)"]
    B1["🔴 B<br/>(Black)"]
    B2["🔴 B<br/>(Black)"]
    BN["⬛ NIL"]
    
    X --> B1
    B1 --> B2
    B2 --> BN

The number of internal nodes on such a path is $ \approx B $ (up to constant offsets depending on conventions). For our purposes, shortest length is proportional to $ B $.

Longest path from (x)

Because no red node may have a red child, between any two black nodes on a path there can be at most one red node. Thus the longest path alternates:

graph TD
    X["🔴 x<br/>(Black)"]
    B1["🔴 B<br/>(Black)"]
    R1["🔵 R<br/>(Red)"]
    B2["🔴 B<br/>(Black)"]
    R2["🔵 R<br/>(Red)"]
    B3["🔴 B<br/>(Black)"]
    N["⬛ NIL"]
    
    X --> B1
    B1 --> R1
    R1 --> B2
    B2 --> R2
    R2 --> B3
    B3 --> N

With $ B $ black nodes, there can be at most $ B $ red nodes interleaved. Hence:

\[L_{\max}(x) \le 2B,\quad L_{\min}(x) \ge B, \quad\Rightarrow\quad \frac{L_{\max}(x)}{L_{\min}(x)} \le 2.\]

Height bound for the tree

For the root, let black-height be $ k $. Then:

From the previous section:

\[k \le \log_2(n+1).\]

Combining:

\[h \le 2k \le 2\log_2(n+1).\]

Thus a red–black tree with $ n $ internal nodes has height at most $ 2\log_2(n+1) $.


Extreme Sizes and Red/Black Ratio

Minimum and maximum internal nodes for fixed black-height

For a fixed black-height $ k $:

\[n_{\min}(k) = 2^k - 1.\]

Ratio of red to black internal nodes

Let:

Each red node must have a black parent, and each black internal node can have at most two red children. Therefore:

\[R \le 2B \quad\Rightarrow\quad 0 \le \frac{R}{B} \le 2.\]

Absorbing Red Nodes and 2–3–4 Trees

A useful viewpoint is to “absorb” each red node into its black parent:

Consider this fragment:

graph TD
    B["🔴 Black"]
    R1["🔵 Red"]
    R2["🔵 Red"]
    C1["🔴 B"]
    C2["🔴 B"]
    C3["🔴 B"]
    C4["🔴 B"]
    
    B --> R1
    B --> R2
    R1 --> C1
    R1 --> C2
    R2 --> C3
    R2 --> C4

After absorption:

graph TD
    B["🔴 Black"]
    C1["🔴 B"]
    C2["🔴 B"]
    C3["🔴 B"]
    C4["🔴 B"]
    
    B --> C1
    B --> C2
    B --> C3
    B --> C4

In general, after absorbing all red nodes:

The resulting structure is exactly a 2–3–4 tree (a B-tree of order 4).


Relaxed Red–Black Trees

A relaxed red–black tree satisfies properties 1, 3, 4, 5, but allows the root to be red (property 2 may fail).

Suppose the root is red. If we simply recolor the root red $ \to $ black and make no other changes:

Therefore, recoloring the root is sufficient to turn a relaxed red–black tree into a proper red–black tree.


Cheat Sheet

Conventions

Concept Definition
Internal node Real, key-bearing node
NIL leaf Black sentinel child for every missing pointer; counted in black-height, not in height
Depth(x) Number of edges from root to $ x $
Level(x) depth(x) + 1
Height(x) Number of internal nodes on longest path from $ x $ to an internal leaf
Black-height Black nodes from $ x $ (exclusive) to NIL (inclusive)

Red–Black Invariants

Property Statement
1 Every node is red or black
2 Root is black
3 All NIL leaves are black
4 Red node has two black children
5 All paths from a node to NIL have the same number of black nodes

Structural Facts

Topic Result
Children of a red node Either two NIL children or two black internal children
Forbidden for red node Exactly one non-NIL child; any red child
Min nodes for black-height $ k $ $ 2^k - 1 $
Max nodes for black-height $ k $ $ 4^k - 1 $
Height bound $ h \le 2\log_2(n+1) $
Longest vs shortest path $ L_{\max} \le 2L_{\min} $
Red/black internal node ratio $ 0 \le R/B \le 2 $
Absorbing red nodes Internal node degrees in $ {2,3,4} $
Corresponding structure 2–3–4 tree
Relaxed root recoloring Recolor root red→black yields a valid red–black tree