
Sedgewick & Wayne — Algorithms 4th Edition: Under the Hood

Source: Algorithms, 4th Edition — Robert Sedgewick & Kevin Wayne, Princeton University (2011, 969 pp.)
Focus: Internal mechanics of every major data structure and algorithm — memory layout, pointer arithmetic, amortization proofs, cache behavior, and the mathematical invariants that make these algorithms correct.


1. Union-Find: Weighted Quick-Union with Path Compression

1.1 Memory Layout of the id[] / sz[] Arrays

Union-Find maintains two integer arrays of size N. In the JVM, each int[] costs 24 + 4N bytes: a 16-byte object header, a 4-byte length field, 4 bytes of padding, then 4N data bytes (total object size rounded up to a multiple of 8).

int[] id  →  [ heap header (16B) | N (4B) | pad(4B) | id[0] | id[1] | ... | id[N-1] ]
int[] sz  →  [ heap header (16B) | N (4B) | pad(4B) | sz[0] | sz[1] | ... | sz[N-1] ]

Every find(p) follows the id chain until id[i] == i (root). Weighted quick-union keeps trees flat by always attaching the smaller tree under the larger tree's root.

1.2 Path Compression State Machine

stateDiagram-v2
    [*] --> TraverseToRoot : find(p) called
    TraverseToRoot --> RecordPath : collect nodes along path
    RecordPath --> CompressPath : set each node's parent = root
    CompressPath --> ReturnRoot : id[node] = root
    ReturnRoot --> [*]

1.3 Tree Depth Analysis: Ackermann Inverse

Without path compression, weighted union gives tree depth ≤ lg N. With path compression, amortized cost per operation is O(α(N)) where α is the inverse Ackermann function — effectively ≤ 4 for any N that can be stored on Earth.

flowchart TD
    A["find(p): p=7"] --> B["id[7]=3 — follow"]
    B --> C["id[3]=1 — follow"]
    C --> D["id[1]=1 — ROOT found"]
    D --> E["Path compression:\nid[7]=1, id[3]=1"]
    E --> F["Next find(7): one hop"]

1.4 Union by Size (Weighted Union) — Structural Invariant

union(p, q):
  rootP = find(p)
  rootQ = find(q)
  if rootP == rootQ: return   // already connected
  if sz[rootP] < sz[rootQ]:   // attach smaller under larger
      id[rootP] = rootQ
      sz[rootQ] += sz[rootP]
  else:
      id[rootQ] = rootP
      sz[rootP] += sz[rootQ]

This guarantees: any tree of height h has at least 2^h nodes → depth ≤ lg N.
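
The pseudocode above, translated into a minimal Java sketch. Class and field names mirror the section (id[]/sz[]); this version adds path halving in find() — a one-line variant of full path compression — which the book discusses alongside the weighted scheme:

```java
// Minimal weighted quick-union sketch with path halving.
// id[i] = parent of i (root when id[i] == i); sz[r] = tree size at root r.
public class WeightedUF {
    private final int[] id;
    private final int[] sz;

    public WeightedUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    public int find(int p) {
        while (id[p] != p) {
            id[p] = id[id[p]];   // path halving: point p one level higher
            p = id[p];
        }
        return p;
    }

    public boolean connected(int p, int q) { return find(p) == find(q); }

    public void union(int p, int q) {
        int rp = find(p), rq = find(q);
        if (rp == rq) return;                       // already connected
        if (sz[rp] < sz[rq]) { id[rp] = rq; sz[rq] += sz[rp]; }  // smaller under larger
        else                 { id[rq] = rp; sz[rp] += sz[rq]; }
    }
}
```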


2. Sorting: Internal Memory and Cache Mechanics

2.1 Mergesort — Bottom-Up Iterative Memory Flow

Top-down mergesort suffers from O(log N) recursive call stack frames. Bottom-up mergesort eliminates recursion entirely, merging runs of increasing size 1→2→4→8→...

sequenceDiagram
    participant Stack as Call Stack
    participant Aux as aux[] (heap, N ints)
    participant Arr as a[] (input array)

    Stack->>Arr: pass 0..N-1 reference
    Note over Stack,Aux: Bottom-up: no recursion stack frames
    loop sz = 1, 2, 4, 8, ...
        loop lo = 0; lo < N-sz; lo += 2*sz
            Arr->>Aux: copy a[lo..mid] into aux[lo..mid]
            Aux->>Arr: merge aux[lo..mid] + a[mid+1..hi] → a[lo..hi]
        end
    end

Key memory invariant: aux[] is allocated once (N elements) and never reallocated. Each merge pass copies from a[] into aux[], then merges back into a[] — 2N reads + 2N writes per pass × lg N passes ≈ 4N lg N total memory operations.
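
The pass structure above as a compact Java sketch (int[] for brevity; the book's version sorts Comparable[]):

```java
// Bottom-up mergesort: no recursion, aux[] allocated exactly once,
// runs of length 1, 2, 4, ... merged in place.
public class BottomUpMerge {
    public static void sort(int[] a) {
        int n = a.length;
        int[] aux = new int[n];                      // single allocation, reused every pass
        for (int sz = 1; sz < n; sz *= 2)            // pass over runs of length sz
            for (int lo = 0; lo < n - sz; lo += 2 * sz)
                merge(a, aux, lo, lo + sz - 1, Math.min(lo + 2 * sz - 1, n - 1));
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);   // copy, then merge back
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)         a[k] = aux[j++];   // left run exhausted
            else if (j > hi)          a[k] = aux[i++];   // right run exhausted
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }
}
```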

2.2 Mergesort vs Quicksort — Cache Line Behavior

block-beta
  columns 3
  block:merge["Mergesort\n(sequential access)"]:1
    A["Read aux[lo..hi]"]
    B["Write a[lo..hi]"]
  end
  block:quick["Quicksort\n(scattered access)"]:1
    C["Partition: random swaps"]
    D["Recursive: random subranges"]
  end
  block:shell["Shellsort\n(strided access)"]:1
    E["h-sort: stride-h reads"]
    F["Decreasing h → cache-friendly"]
  end

Cache lines are typically 64 bytes = 16 ints. Mergesort's sequential scan always fills cache lines fully. Quicksort's partition phase scans sequentially (good) but recursive calls on random subranges cause TLB misses on large N.

2.3 Quicksort Partition: 3-Way (Dijkstra Dutch Flag)

Standard 2-way partition degrades to O(N²) on equal keys. Sedgewick's 3-way partition maintains three regions:

flowchart LR
    subgraph Array["a[lo..hi]"]
        direction LR
        A["a[lo..lt-1]\n< pivot"] --- B["a[lt..gt]\n= pivot"] --- C["a[gt+1..hi]\n> pivot"]
    end
    D["lt pointer"] --> B
    E["gt pointer"] --> B
    F["i pointer: scans lt→gt"] --> B

Pointer invariant after each comparison:

- a[i] < pivot → swap(lt, i), lt++, i++
- a[i] == pivot → i++ (no swap)
- a[i] > pivot → swap(i, gt), gt-- (i stays)

This reduces the sort of N equal keys from O(N²) to O(N).
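
A runnable Java sketch of the lt/i/gt invariant above (int[] instead of the book's Comparable[]):

```java
// Dijkstra 3-way partition: a[lo..lt-1] < pivot, a[lt..gt] == pivot,
// a[gt+1..hi] > pivot; i scans the unexamined region.
public class Quick3Way {
    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        int pivot = a[lo];
        while (i <= gt) {
            if      (a[i] < pivot) swap(a, lt++, i++);  // grow "< pivot" region
            else if (a[i] > pivot) swap(a, i, gt--);    // grow "> pivot" region; i stays
            else                   i++;                 // equal key: no swap
        }
        sort(a, lo, lt - 1);   // a[lt..gt] == pivot is never touched again
        sort(a, gt + 1, hi);
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

On an all-equal input the loop makes one linear pass and both recursive calls get empty ranges — the O(N) behavior claimed above.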

2.4 Heapsort — Binary Heap Memory Map

The heap stores a complete binary tree in a flat array with 1-based indexing. For node at index k: left child = 2k, right child = 2k+1, parent = k/2.

flowchart TB
    subgraph Heap["int[] heap (1-indexed)"]
        R["[1] = ROOT (max)"]
        L["[2] = left child"]
        Ri["[3] = right child"]
        LL["[4]"] --- LR["[5]"] --- RL["[6]"] --- RR["[7]"]
    end
    R --> L
    R --> Ri
    L --> LL
    L --> LR
    Ri --> RL
    Ri --> RR

Sink operation (sink(k, N)): compare a[k] with max of a[2k], a[2k+1]; swap down if necessary. Repeat until heap order restored. Cost = 2 lg N compares per delMax().

Heapsort phases:
1. Heapify: build the max-heap bottom-up in O(N) time (not O(N log N)), sinking nodes from N/2 down to 1
2. Sortdown: repeatedly extract the max, shrinking the heap boundary by 1
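
Both phases in a compact Java sketch, with 1-based heap indices k mapped onto a 0-based int[] (node k lives at a[k-1]; the book's version uses Comparable[] with less/exch helpers):

```java
// Heapsort: heapify bottom-up, then sortdown by repeated exchange-and-sink.
public class Heap {
    public static void sort(int[] a) {
        int n = a.length;
        for (int k = n / 2; k >= 1; k--)   // heapify: sink internal nodes, O(N) total
            sink(a, k, n);
        while (n > 1) {                    // sortdown: max goes to the end
            swap(a, 1, n--);
            sink(a, 1, n);
        }
    }

    private static void sink(int[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && a[j - 1] < a[j]) j++;   // pick the larger child
            if (a[k - 1] >= a[j - 1]) break;     // heap order restored
            swap(a, k, j);
            k = j;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i - 1]; a[i - 1] = a[j - 1]; a[j - 1] = t;
    }
}
```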


3. Symbol Tables: BSTs and Red-Black Trees

3.1 BST Node Memory Layout

Each Node in a BST holds: Key key, Value val, Node left, Node right, int N (subtree size). In a 64-bit JVM with compressed OOPs:

Node object (56 bytes):
  ┌─────────────────────────────────────────┐
  │ object header (16 bytes)                │
  │ key reference (8 bytes → heap object)   │
  │ val reference (8 bytes → heap object)   │
  │ left reference (8 bytes → Node or null) │
  │ right reference (8 bytes → Node or null)│
  │ N (int, 4 bytes)                        │
  │ padding (4 bytes)                       │
  └─────────────────────────────────────────┘

For N nodes: heap usage = 56N bytes (keys/values excluded).

3.2 BST Search — Recursive Call Stack Depth

sequenceDiagram
    participant Client
    participant BST
    participant Stack as JVM Call Stack

    Client->>BST: get("S")
    BST->>Stack: frame: get(root, "S")
    BST->>Stack: frame: get(root.right, "S") [cmp > 0]
    BST->>Stack: frame: get(node, "S") [cmp < 0 → left]
    BST->>Stack: frame: get(node, "S") [found]
    Stack-->>Client: return value
    Note over Stack: depth = tree height h
    Note over Stack: worst case: h = N (sorted insertion)
    Note over Stack: avg case: h ≈ 2.99 lg N (random)

3.3 Red-Black BST: 2-3 Tree Encoding

Sedgewick's red-black BST encodes a 2-3 tree where red links represent 3-nodes (a node with two keys). The invariant: no two consecutive red links, all paths from root to null have same number of black links.

flowchart TD
    subgraph TwoThree["2-3 Tree representation"]
        A["(E, S) — 3-node"] --> B["(A,C)"]
        A --> C["(H,M)"]
        A --> D["(X)"]
    end
    subgraph RBTree["Red-Black BST encoding"]
        S["S (black)"] -->|red left link| E2["E (red)"]
        S --> X2["X (black)"]
        E2 --> AC["... left subtree"]
        E2 --> HM["... right subtree"]
    end

3.4 Rotation and Color-Flip Operations

Left rotation (rotateLeft(h)): transforms a right-leaning red link into left-leaning:

sequenceDiagram
    participant h as Node h (black)
    participant x as Node x (red, h.right)

    Note over h,x: Before: h → x (red right link)
    h->>x: x.left = h
    x->>h: h becomes subtree under x
    Note over h,x: After: x replaces h, h.right = old x.left
    Note over h: set h.color = RED
    Note over x: set x.color = h's old color

Color flip: when both children are red (temporary 4-node), flip colors — children become black, parent becomes red (passes "extra" link up the tree).

stateDiagram-v2
    [*] --> Insert : add new key (always RED node)
    Insert --> CheckRight : right child red?
    CheckRight --> RotateLeft : YES → rotateLeft
    CheckRight --> CheckLeftLeft : NO
    RotateLeft --> CheckLeftLeft
    CheckLeftLeft --> RotateRight : left + left.left both RED → rotateRight
    CheckLeftLeft --> CheckBothRed : NO
    RotateRight --> CheckBothRed
    CheckBothRed --> FlipColors : both children RED → flipColors
    CheckBothRed --> Done : NO
    FlipColors --> Done
    Done --> [*]

After all rotations, tree height ≤ 2 lg N. Average path length ≈ 1.00 lg N (empirically validated in Sedgewick's experiments on FrequencyCounter).
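
The three local transformations as a stripped-down Java sketch (keys and values omitted for brevity; method names follow the book's rotateLeft/rotateRight/flipColors):

```java
// Local red-black transformations on a minimal Node (links + color only).
public class RB {
    static final boolean RED = true, BLACK = false;

    static class Node {
        Node left, right;
        boolean color = RED;   // new nodes are always red
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Right-leaning red link becomes left-leaning; x replaces h.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;      // h inherits x's old left subtree
        x.left = h;
        x.color = h.color;     // x takes h's old color
        h.color = RED;         // the rotated link stays red
        return x;
    }

    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Temporary 4-node: children turn black, the red link passes up to h.
    static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }
}
```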


4. Hash Tables: Collision Resolution Internals

4.1 Separate Chaining Memory Structure

block-beta
  columns 1
  block:table["SequentialSearchST[] st (array of M linked lists)"]:1
    H0["st[0]: null"]
    H1["st[1]: → Node{key1,val1} → Node{key2,val2} → null"]
    H2["st[2]: → Node{key3,val3} → null"]
    HM["st[M-1]: null"]
  end

Each Node in the chain: Key key + Value val + Node next reference = ~40 bytes on 64-bit JVM.

Hash function pipeline for a string key k:

h = 0
for each char c in k:
    h = 31 * h + c               // Horner's method, radix 31
h = (h & 0x7fffffff) % M         // strip sign bit, map to [0, M)

Load factor α = N/M: separate chaining maintains α ≈ 1–10 for O(1) average search.
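
The pipeline as runnable Java. The per-character loop reproduces Java's String.hashCode; stripping the sign bit once at the end before the modulus matches the book's modular hashing idiom:

```java
// Horner-style string hash: radix-31 accumulation, then mask and mod.
public class StringHash {
    static int hash(String key, int m) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = 31 * h + key.charAt(i);   // Horner's method
        return (h & 0x7fffffff) % m;      // non-negative index in [0, M)
    }
}
```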

4.2 Linear Probing — Cluster Growth

Linear probing stores key-value pairs in a flat array of size M > N (in the book's implementation, M is a power of two that doubles on resize). Probe sequence for key k: h(k), h(k)+1, h(k)+2, ... wrapping at M.

flowchart LR
    K["hash(key) = 5"] --> S5["st[5] occupied?"]
    S5 -->|YES| S6["st[6] occupied?"]
    S6 -->|YES| S7["st[7] occupied?"]
    S7 -->|NO| Insert["Insert at st[7]"]

Primary clustering: runs of consecutive occupied slots grow, increasing average probe length. When load α > 0.5, average cluster length ≈ 1/(1-α) — at α = 0.9, about 10 probes per lookup.

Resize trigger: when N/M > 0.5, double M and rehash all N keys — an O(N) operation that amortizes to O(1) per insert.

sequenceDiagram
    participant Insert
    participant Table as keys[]/vals[] (size M)
    participant NewTable as keys[]/vals[] (size 2M)

    Insert->>Table: put(k, v) → N/M > 0.5
    Table->>NewTable: allocate new arrays of size 2M
    loop for each existing key
        Table->>NewTable: rehash and insert
    end
    Note over Table: old arrays eligible for GC
    Note over NewTable: becomes current table
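
A compact sketch combining the probe loop and the resize trigger (String keys and int values for brevity; the book's version is generic and also shrinks on delete):

```java
// Linear-probing table: parallel keys[]/vals[] arrays, load factor kept ≤ 1/2
// by doubling. A search miss terminates at the first empty slot.
public class LinearProbingHashST {
    private int n;                       // number of key-value pairs
    private int m = 16;                  // table size (power of two)
    private String[] keys = new String[m];
    private int[] vals = new int[m];

    private int hash(String key) { return (key.hashCode() & 0x7fffffff) % m; }

    public void put(String key, int val) {
        if (n >= m / 2) resize(2 * m);            // keep α ≤ 1/2
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) { vals[i] = val; return; }
        keys[i] = key; vals[i] = val; n++;
    }

    public Integer get(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) return vals[i];
        return null;                              // empty slot ends the probe
    }

    private void resize(int cap) {
        String[] ok = keys; int[] ov = vals;
        m = cap; n = 0;
        keys = new String[m]; vals = new int[m];
        for (int i = 0; i < ok.length; i++)       // rehash every existing key
            if (ok[i] != null) put(ok[i], ov[i]);
    }
}
```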

5. Graph Algorithms: Adjacency-List Memory and DFS/BFS Stack Mechanics

5.1 Adjacency-List Representation

block-beta
  columns 2
  block:V["V-element Bag[] array"]:1
    V0["adj[0]: → 2 → 1 → 5 → null"]
    V1["adj[1]: → 0 → 2 → null"]
    V2["adj[2]: → 0 → 1 → 3 → null"]
    VN["..."]
  end
  block:Mem["Memory cost"]:1
    M1["Bag array: 16 + 8V bytes"]
    M2["Each Node: 32 bytes (key+next)"]
    M3["Total edges: 2E nodes (undirected)"]
    M4["= 16 + 8V + 64E bytes"]
  end

Sparse graphs (E ≈ V) use O(V) memory. Dense graphs (E ≈ V²) use O(V²).

5.2 DFS Recursive Call Stack vs Explicit Stack

Recursive DFS — JVM call stack depth = longest DFS path (up to V):

sequenceDiagram
    participant JVM as JVM Stack
    participant DFS

    DFS->>JVM: dfs(0) → frame pushed
    DFS->>JVM: dfs(2) → frame pushed
    DFS->>JVM: dfs(1) → frame pushed
    DFS->>JVM: dfs(5) → frame pushed (all neighbors marked)
    JVM-->>DFS: return (5 popped)
    JVM-->>DFS: return (1 popped)
    JVM-->>DFS: return (2 popped)
    Note over JVM: StackOverflowError if V > ~8000 (default stack size)

Iterative DFS — explicit Stack<Integer> on the heap avoids stack overflow:

flowchart TD
    A["Push source vertex s"] --> B["Stack empty?"]
    B -->|NO| C["v = stack.pop()"]
    C --> D["already marked[v]?"]
    D -->|YES| B
    D -->|NO| E["mark[v] = true"]
    E --> F["push all adj[v] onto stack"]
    F --> B
    B -->|YES| G["DFS complete"]
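
The flowchart above as Java, with int[][] adjacency arrays standing in for the book's Bag<Integer> lists:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Iterative DFS: explicit heap-allocated stack instead of JVM call frames.
public class IterativeDFS {
    public static boolean[] dfs(int[][] adj, int s) {
        boolean[] marked = new boolean[adj.length];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(s);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (marked[v]) continue;     // a vertex can be pushed more than once
            marked[v] = true;
            for (int w : adj[v])         // push all unmarked neighbors
                if (!marked[w]) stack.push(w);
        }
        return marked;
    }
}
```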

5.3 BFS — Queue-Based Level Traversal

sequenceDiagram
    participant Q as Queue<Integer>
    participant D as distTo[]
    participant E as edgeTo[]

    Note over Q,E: enqueue source s, distTo[s]=0
    loop while Q not empty
        Q->>Q: v = dequeue()
        loop for each w in adj[v]
            Note over D: if distTo[w] == ∞:
            Note over D: distTo[w] = distTo[v]+1
            Note over E: edgeTo[w] = v
            Q->>Q: enqueue(w)
        end
    end

BFS guarantees shortest paths in unweighted graphs because it explores vertices in non-decreasing distance order — each distance d is fully explored before d+1.
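
The queue-driven loop above, filling distTo[] (edgeTo[] omitted here for brevity):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// BFS shortest paths in an unweighted graph: first visit = shortest distance.
public class BFSPaths {
    public static int[] distances(int[][] adj, int s) {
        int[] distTo = new int[adj.length];
        Arrays.fill(distTo, Integer.MAX_VALUE);   // "infinity" marks unvisited
        Queue<Integer> q = new ArrayDeque<>();
        distTo[s] = 0;
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();
            for (int w : adj[v])
                if (distTo[w] == Integer.MAX_VALUE) {
                    distTo[w] = distTo[v] + 1;    // one level deeper than v
                    q.add(w);
                }
        }
        return distTo;
    }
}
```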

5.4 Topological Sort — DFS Finish-Order Reversal

stateDiagram-v2
    [*] --> White : vertex unvisited
    White --> Gray : DFS enters vertex
    Gray --> Black : DFS exits vertex (push to stack)
    Black --> [*]

    note right of Black : Reverse postorder\n= topological order

Cycle detection: if DFS encounters a gray vertex (back edge), a directed cycle exists → topological sort undefined.
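
Reverse postorder in Java (cycle detection omitted; this sketch assumes the input is a DAG, as the section requires):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Topological sort: push each vertex as DFS finishes it ("turns black");
// popping the stack yields reverse postorder = topological order.
public class Topological {
    public static int[] order(int[][] adj) {
        boolean[] marked = new boolean[adj.length];
        Deque<Integer> post = new ArrayDeque<>();
        for (int v = 0; v < adj.length; v++)
            if (!marked[v]) dfs(adj, v, marked, post);
        int[] order = new int[adj.length];
        for (int i = 0; i < order.length; i++) order[i] = post.pop();
        return order;
    }

    private static void dfs(int[][] adj, int v, boolean[] marked, Deque<Integer> post) {
        marked[v] = true;
        for (int w : adj[v])
            if (!marked[w]) dfs(adj, w, marked, post);
        post.push(v);   // finish time: vertex exits gray → black
    }
}
```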

5.5 Kosaraju-Sharir SCC Algorithm — Two-Pass DFS

sequenceDiagram
    participant G as Original Graph G
    participant GR as Reverse Graph G^R
    participant Stack as Finish-Order Stack
    participant SCC as SCC Labels[]

    Note over GR,Stack: Pass 1: DFS on G^R
    GR->>Stack: push vertices in DFS finish order

    Note over G,SCC: Pass 2: DFS on G in stack order
    loop while Stack not empty
        Stack->>G: v = pop()
        G->>SCC: DFS from v, label all reachable as same SCC
    end

Why it works: vertices reachable from v in G^R = vertices that can reach v in G. The finish-order of G^R gives the reverse topological order of the DAG of SCCs.


6. Minimum Spanning Trees: Prim's and Kruskal's

6.1 Lazy Prim — Priority Queue State

sequenceDiagram
    participant T as MST tree vertices
    participant PQ as MinPQ<Edge> (lazy)
    participant MST as mst[] (result)

    Note over T,PQ: Start: add vertex 0, enqueue all adj edges
    loop while PQ not empty and MST.size < V-1
        PQ->>PQ: e = delMin() [min weight crossing edge]
        Note over PQ: if both endpoints in T: skip (stale)
        Note over T: else: add new vertex w to T
        Note over MST: add edge e to MST
        Note over PQ: enqueue all adj[w] edges crossing T boundary
    end

Lazy vs Eager Prim: lazy keeps all edges (even stale) on PQ → O(E log E) space/time. Eager Prim replaces stale entries with decreaseKey() → O(E log V) time using indexed priority queue.

6.2 Indexed Priority Queue — Key Data Structure

Eager Prim requires decreaseKey(v, weight) — update the priority of vertex v already on the queue. Standard binary heap cannot do this in O(log V) without knowing v's position.

Indexed MinPQ internals:

pq[]   : heap-ordered array of vertex indices (pq[i] = vertex at heap position i)
qp[]   : inverse: qp[v] = i means vertex v is at heap position i
keys[] : keys[v] = current best edge weight to reach v

flowchart LR
    A["decreaseKey(v, w)"] --> B["keys[v] = w"]
    B --> C["swim(qp[v])"] 
    C --> D["update pq[] and qp[]\nfor swapped positions"]

6.3 Kruskal's — Union-Find Integration

sequenceDiagram
    participant PQ as MinPQ<Edge> (all E edges)
    participant UF as UnionFind (V components)
    participant MST as Queue<Edge> (result)

    Note over PQ: Initialize: all E edges enqueued
    loop while PQ not empty and MST.size < V-1
        PQ->>PQ: e = delMin()
        Note over UF: v = e.either(), w = e.other(v)
        UF->>UF: uf.connected(v, w)?
        Note over MST: if YES → skip (would create cycle)
        Note over UF: if NO → uf.union(v, w)
        MST->>MST: enqueue e
    end

Space: O(E) for PQ + O(V) for UF. Time: O(E log E) dominated by PQ operations.
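
A minimal Java sketch of the loop above. A sorted edge array of {weight, v, w} triples stands in for MinPQ<Edge>, and a weighted union-find with path halving is inlined:

```java
import java.util.Arrays;

// Kruskal: take edges cheapest-first, skip any edge whose endpoints are
// already connected (it would close a cycle).
public class Kruskal {
    public static int mstWeight(int vertices, int[][] edges) {
        Arrays.sort(edges, (a, b) -> a[0] - b[0]);   // cheapest edge first
        int[] parent = new int[vertices];
        for (int i = 0; i < vertices; i++) parent[i] = i;
        int total = 0, taken = 0;
        for (int[] e : edges) {
            if (taken == vertices - 1) break;        // MST complete
            int rv = find(parent, e[1]), rw = find(parent, e[2]);
            if (rv == rw) continue;                  // would create a cycle
            parent[rv] = rw;                         // union the two components
            total += e[0];
            taken++;
        }
        return total;
    }

    private static int find(int[] parent, int p) {
        while (parent[p] != p) { parent[p] = parent[parent[p]]; p = parent[p]; }
        return p;
    }
}
```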


7. Shortest Paths: Dijkstra and Bellman-Ford

7.1 Dijkstra — Relaxation-Based State Machine

stateDiagram-v2
    [*] --> Init : distTo[s]=0, all others=∞
    Init --> ExtractMin : dequeue min-dist vertex v
    ExtractMin --> Relax : for each edge v→w, weight w_vw
    Relax --> UpdateDist : if distTo[v] + w_vw < distTo[w]
    UpdateDist --> UpdatePQ : update PQ entry for w
    UpdatePQ --> ExtractMin : continue
    ExtractMin --> [*] : PQ empty

Relaxation invariant: after processing vertex v, distTo[v] is the true shortest path from s to v (assuming no negative edges).

flowchart TD
    A["distTo[s] = 0\nenqueue s with priority 0"] --> B["v = PQ.delMin()"]
    B --> C["for each v→w edge (weight e)"]
    C --> D["distTo[v] + e < distTo[w]?"]
    D -->|YES| E["distTo[w] = distTo[v] + e\nedgeTo[w] = v\nPQ.decreaseKey(w, distTo[w])"]
    D -->|NO| F["skip"]
    E --> B
    F --> B
    B --> G["PQ empty → done"]
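
A runnable sketch of the loop above. Since java.util.PriorityQueue has no decreaseKey(), this is the lazy variant: duplicate entries are enqueued and stale ones skipped on removal (cf. lazy Prim). Edges v→w of weight x are stored as adj[v] entries {w, x}:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Lazy Dijkstra: relax every edge out of the current closest vertex,
// enqueueing a fresh (vertex, dist) pair on every improvement.
public class Dijkstra {
    public static int[] shortest(int[][][] adj, int s) {
        int[] distTo = new int[adj.length];
        Arrays.fill(distTo, Integer.MAX_VALUE);
        distTo[s] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[1] - b[1]);
        pq.add(new int[]{s, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.remove();
            int v = top[0];
            if (top[1] > distTo[v]) continue;        // stale entry: skip
            for (int[] e : adj[v]) {
                int w = e[0], wt = e[1];
                if (distTo[v] + wt < distTo[w]) {    // relax v→w
                    distTo[w] = distTo[v] + wt;
                    pq.add(new int[]{w, distTo[w]});
                }
            }
        }
        return distTo;
    }
}
```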

7.2 Bellman-Ford — Negative Edge Handling

Dijkstra fails with negative edge weights. Bellman-Ford makes V-1 passes, relaxing all E edges in each pass:

sequenceDiagram
    participant Edges as All E edges
    participant Dist as distTo[]
    participant Edge as edgeTo[]

    loop pass = 1 to V-1
        loop for each edge v→w, weight e
            Edges->>Dist: if distTo[v] + e < distTo[w]:
            Dist->>Dist: distTo[w] = distTo[v] + e
            Dist->>Edge: edgeTo[w] = v
        end
    end
    Note over Edges,Edge: Pass V: if any update → negative cycle detected

Queue-based optimization: only relax edges from vertices whose distTo changed in the previous pass → O(EV) worst case but O(E) typical (≈ E/V relaxations per pass).
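
The pass structure in Java (edges as {v, w, weight} triples; an early exit fires when a full pass changes nothing, and a pass-V update signals a negative cycle):

```java
import java.util.Arrays;

// Bellman-Ford: V-1 relaxation passes over the edge list, then one extra
// pass for negative-cycle detection (returns null if one is reachable).
public class BellmanFord {
    public static int[] shortest(int vertices, int[][] edges, int s) {
        int[] distTo = new int[vertices];
        Arrays.fill(distTo, Integer.MAX_VALUE);
        distTo[s] = 0;
        for (int pass = 1; pass <= vertices; pass++) {
            boolean changed = false;
            for (int[] e : edges) {
                int v = e[0], w = e[1], wt = e[2];
                if (distTo[v] != Integer.MAX_VALUE && distTo[v] + wt < distTo[w]) {
                    distTo[w] = distTo[v] + wt;   // relax v→w
                    changed = true;
                }
            }
            if (!changed) break;                  // early exit: all settled
            if (pass == vertices) return null;    // pass V still relaxing → cycle
        }
        return distTo;
    }
}
```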


8. String Algorithms: Tries and KMP

8.1 R-Way Trie — Node Memory Layout

Each trie node holds an array of R references (R = alphabet size, typically 256 for extended ASCII):

TrieNode:
  val: Object (null if no key ends here)
  next: Node[R]  (R × 8 bytes references = 256×8 = 2048 bytes per node!)

Memory problem: for ASCII, each empty node wastes 2048 bytes. A trie with N keys has O(N log_R N) nodes → O(N log_R N × R) bytes total — unacceptably large for large R.

Ternary Search Tries (TST) solve this: each node has 3 links (left/mid/right) and one character, not R links:

flowchart TD
    subgraph TST["TST Node Structure (~56 bytes incl. 16B header)"]
        C["char c (2B + 6B pad)"]
        L["left link (8B)"]
        M["mid link (8B)"]
        R["right link (8B)"]
        V["val ref (8B)"]
    end

8.2 KMP — Failure Function Construction

Knuth-Morris-Pratt builds a deterministic finite automaton (DFA) from the pattern before searching. The DFA dfa[c][j] = next state when at state j and reading character c.

sequenceDiagram
    participant Pat as Pattern "ABABAC"
    participant DFA as dfa[256][M] array
    participant X as restart state X

    Note over DFA: Build phase O(RM):
    DFA->>DFA: dfa[pat[0]][0] = 1 (initial)
    loop j = 1 to M-1
        loop c = 0 to R-1
            DFA->>DFA: dfa[c][j] = dfa[c][X]  (mismatch: copy restart state)
        end
        DFA->>DFA: dfa[pat[j]][j] = j+1  (match: advance)
        DFA->>X: X = dfa[pat[j]][X]  (update restart state)
    end

Search phase O(N): scan text left-to-right, transition DFA states. No backtracking — text pointer never moves backward.
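
Both phases in Java, following the DFA construction above (R = 256 for extended ASCII):

```java
// KMP: build dfa[c][j] from the pattern, then stream the text through it.
// Returns the offset of the first match, or -1; the text pointer i only
// ever moves forward.
public class KMP {
    public static int search(String pat, String txt) {
        int R = 256, m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;                 // match on first char
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];             // mismatch: copy restart state
            dfa[pat.charAt(j)][j] = j + 1;         // match: advance
            x = dfa[pat.charAt(j)][x];             // update restart state
        }
        int j = 0;
        for (int i = 0; i < txt.length(); i++) {
            j = dfa[txt.charAt(i)][j];
            if (j == m) return i - m + 1;          // offset of the match
        }
        return -1;                                 // not found
    }
}
```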


9. Algorithm Complexity Summary

block-beta
  columns 4
  block:h1["Algorithm"]:1
  block:h2["Time (worst)"]:1
  block:h3["Time (avg)"]:1
  block:h4["Space"]:1

  block:r1["Union-Find (weighted+compress)"]:1
  block:r2["O(N α(N))"]:1
  block:r3["O(N α(N))"]:1
  block:r4["O(N)"]:1

  block:r5["Mergesort"]:1
  block:r6["O(N log N)"]:1
  block:r7["O(N log N)"]:1
  block:r8["O(N) aux"]:1

  block:r9["Quicksort (3-way)"]:1
  block:r10["O(N²)"]:1
  block:r11["O(N log N)"]:1
  block:r12["O(log N) stack"]:1

  block:r13["Red-Black BST"]:1
  block:r14["O(log N)"]:1
  block:r15["O(log N)"]:1
  block:r16["O(N)"]:1

  block:r17["Hash (linear probe)"]:1
  block:r18["O(N)"]:1
  block:r19["O(1) amort."]:1
  block:r20["O(N)"]:1

  block:r21["Dijkstra (indexed PQ)"]:1
  block:r22["O(E log V)"]:1
  block:r23["O(E log V)"]:1
  block:r24["O(V)"]:1

  block:r25["Bellman-Ford"]:1
  block:r26["O(VE)"]:1
  block:r27["O(E)"]:1
  block:r28["O(V)"]:1

  block:r29["Prim's MST (eager)"]:1
  block:r30["O(E log V)"]:1
  block:r31["O(E log V)"]:1
  block:r32["O(V)"]:1

10. Object Memory Overhead — JVM Internals

Sedgewick explicitly analyzes memory, which requires understanding the JVM object model:

block-beta
  columns 2
  block:prim["Primitive Arrays"]:1
    I["int[N]: 24 + 4N bytes"]
    D["double[N]: 24 + 8N bytes"]
    C["char[N]: 24 + 2N bytes"]
  end
  block:obj["Object Arrays"]:1
    OA["Object[N]: 16 + 8N bytes\n(+ each object's own size)"]
    STR["String: 40 bytes\n(+ char[] backing array)"]
    NODE["BST Node: ~56 bytes"]
  end

String substring O(1) sharing: in the Java versions the book targets (before Java 7u6), genome.substring(6, 9) creates a new 40-byte String object but reuses the same char[] backing array — only the offset and count fields differ. Substring creation is therefore O(1) time and O(1) space, which underpins the book's O(N) KMP and O(N log N) suffix sort analyses. (Later JVMs copy the characters, making substring O(n).)


11. Data Flow: From Client API to Memory Operations

flowchart TD
    Client["Client Code:\nbst.put('S', 42)"] --> API["BST.put(key, val)"]
    API --> Compare["key.compareTo(node.key)"]
    Compare -->|lt| Left["recurse left subtree"]
    Compare -->|gt| Right["recurse right subtree"]
    Compare -->|eq| Update["node.val = val\nnode.N unchanged"]
    Left --> NewNode["if null: allocate Node\n56 bytes on JVM heap"]
    Right --> NewNode
    NewNode --> UpdateN["update N counts\nback up call stack"]
    UpdateN --> RB_Fix["Red-Black: check colors\nrotate/flip if needed"]
    RB_Fix --> GC["unreferenced nodes\neligible for GC"]

Every put() on a Red-Black BST:
1. Descends the tree (O(log N) pointer dereferences, each potentially a cache miss)
2. Allocates a new 56-byte Node on the JVM heap (if the key is new)
3. Unwinds the call stack, updating N counts and performing ≤3 rotations
4. Total: ≤ 2 lg N + 3 pointer operations, with predictable memory access patterns


Document synthesized from Sedgewick & Wayne, Algorithms 4th Ed., Princeton University, covering Union-Find (Ch. 1.5), Sorting (Ch. 2), Symbol Tables (Ch. 3), Graphs (Ch. 4), and Strings (Ch. 5).