CMU 15-850 Advanced Algorithms: Under the Hood¶
Source: Advanced Algorithms — Notes for CMU 15-850 (Fall 2020), Anupam Gupta, Carnegie Mellon University (285 pp.)
Focus: Mathematical internals of advanced algorithmic machinery — how Ackermann's inverse emerges from data structure amortization, how randomized algorithms exploit polynomial sparsity, how convex optimization underpins modern graph algorithms, and how matroid theory generalizes greedy correctness proofs.
1. Minimum Spanning Trees: From O(m log n) to O(m α(n))¶
1.1 The Cut and Cycle Rules — Correctness Foundation¶
All MST algorithms are instances of the blue rule (cut rule) and red rule (cycle rule). The key invariant: color an edge blue if it is the min-weight edge crossing any cut; color it red if it is the max-weight edge on any cycle.
stateDiagram-v2
[*] --> Uncolored : all edges start uncolored
Uncolored --> Blue : Cut Rule — min-weight crossing edge
Uncolored --> Red : Cycle Rule — max-weight on cycle
Blue --> MST : blue edges form spanning tree
Red --> Excluded : red edges excluded from MST
MST --> [*]
Excluded --> [*]
Cut Rule proof sketch (assuming distinct edge weights): If the min-weight edge e across cut (S, S̄) is absent from some spanning tree T, then T ∪ {e} contains a cycle C. Since C crosses the cut an even number of times, there is another edge e' ∈ C crossing the cut, and w(e') > w(e) by minimality and distinctness. Swapping e' for e gives a lighter spanning tree — contradiction.
1.2 Three Classical Algorithms — Data Structure Dependence¶
block-beta
columns 3
block:k["Kruskal's"]:1
K1["Sort all edges: O(m log m)"]
K2["Iterate sorted edges"]
K3["Union-Find: O(m α(m))"]
K4["Total: O(m log n)"]
end
block:p["Jarník/Prim"]:1
P1["Priority queue per vertex"]
P2["extractMin + decreaseKey"]
P3["Binary heap: O(m log n)"]
P4["Fibonacci heap: O(m + n log n)"]
end
block:b["Borůvka"]:1
B1["Each vertex picks lightest adj edge"]
B2["Contract blue components"]
B3["≤ log₂ n rounds (halves nodes)"]
B4["Total: O(m log n), no fancy DS"]
end
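The Borůvka column above can be made concrete in a few lines. A minimal Python sketch (assuming distinct edge weights — the usual tie-breaking assumption — and tracking components by plain relabeling, no fancy data structure):

```python
def boruvka_mst(n, edges):
    """Borůvka's algorithm. edges: list of (weight, u, v) with distinct
    weights; vertices are 0..n-1. Returns (total weight, MST edge list)."""
    comp = list(range(n))            # comp[v] = label of v's blue component
    mst, total, num_comp = [], 0, n
    while num_comp > 1:
        # each component picks its lightest outgoing edge (the cut rule)
        cheapest = {}
        for w, u, v in edges:
            if comp[u] == comp[v]:
                continue             # internal edge, skip
            for c in (comp[u], comp[v]):
                if c not in cheapest or w < cheapest[c][0]:
                    cheapest[c] = (w, u, v)
        # add the selected edges, contracting blue components by relabeling
        for w, u, v in set(cheapest.values()):
            if comp[u] != comp[v]:
                mst.append((u, v))
                total += w
                old, new = comp[u], comp[v]
                comp = [new if c == old else c for c in comp]
                num_comp -= 1
    return total, mst
```

Each round merges every component with at least one other, so the component count at least halves and the loop runs at most log₂ n times, matching the third column.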
1.3 Fredman-Tarjan: O(m log* n) with Fibonacci Heaps¶
The key insight: don't grow the priority queue too large before contracting. Fredman and Tarjan run Prim's algorithm but stop each "phase" when the heap exceeds a threshold t — then contract all found MST components and restart.
sequenceDiagram
participant G as Graph G
participant PQ as Fibonacci Heap (≤t entries)
participant Blue as Blue Components
loop phase = 1, 2, ...
Note over PQ: Start Prim from any unprocessed vertex
loop until |PQ| > t or no edges remain
PQ->>Blue: extractMin() → add vertex to current component
G->>PQ: decreaseKey for new neighbors
end
Blue->>G: Contract all completed components
Note over G: ≤ 2m/t components survive the phase
end
Why log* n phases? Each phase uses threshold t = 2^(2m/k), where k is the number of components entering the phase. Every tree grown in a phase has more than t incident edges when it stops, so at most 2m/t components survive, and the next threshold is at least 2^t — tower-function growth. The number of phases before a single component remains is therefore at most log* n — the number of times you can take log₂ before reaching 1.
1.4 The Ackermann Function and Its Inverse¶
The Ackermann function A(k, n) is defined by double recursion; one standard variant sets A(1, n) = 2n, A(k, 1) = 2, and A(k, n) = A(k−1, A(k, n−1)) for k, n ≥ 2.
Growth rates:
- A(1, n) = 2n (linear)
- A(2, n) = 2^n (exponential)
- A(3, n) = 2^(2^(2^...)) (tower of n 2s)
- A(4, n) grows unimaginably fast
The inverse Ackermann function α(n) = smallest k such that A(k, k) ≥ n. For any n conceivable in practice, α(n) ≤ 4.
flowchart LR
A["n = 10^80 (atoms in universe)"] --> B["α(n) = 4"]
C["n = 10^(10^10)"] --> D["α(n) = 4"]
E["n = A(4, 3) ≈ tower of 65536 2s"] --> F["α(n) = 4"]
G["Any conceivable n"] --> H["α(n) ≤ 4"]
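The α(n) ≤ 4 claim can be verified mechanically. A small sketch using the A(1, n) = 2n variant defined above, with values capped so the double recursion terminates even when intermediate results are astronomical:

```python
def ackermann(k, n, cap=10**200):
    """A(k, n) with A(1, n) = 2n, A(k, 1) = 2, A(k, n) = A(k-1, A(k, n-1)),
    truncated at `cap` so the recursion terminates on huge values."""
    if k == 1:
        return min(2 * n, cap)
    x = 2                            # A(k, 1) = 2
    for _ in range(n - 1):
        x = ackermann(k - 1, x, cap)
        if x >= cap:
            return cap               # already astronomically large
    return x

def inverse_ackermann(n):
    """alpha(n) = smallest k with A(k, k) >= n."""
    k = 1
    while ackermann(k, k, cap=max(n, 4)) < n:
        k += 1
    return k
```

Since A(1,1) = 2, A(2,2) = 4, A(3,3) = 16, and A(4,4) dwarfs 10^80, `inverse_ackermann(10**80)` returns 4, as in the flowchart.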
Union-Find amortization: Path compression + union by rank give each operation amortized O(α(n)) cost. The proof assigns each node a rank; path compression only ever repoints a node to an ancestor of strictly higher rank, so successive finds through a node climb rapidly through the rank scale, amortizing future traversals.
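A minimal union-find sketch with both optimizations the analysis needs — union by rank and path compression:

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n          # rank = upper bound on subtree height

    def find(self, x):
        # path compression: point every node on the path at the root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False             # already in the same component
        # union by rank: hang the shallower tree under the deeper one
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

This is exactly the structure Kruskal's algorithm consults m times, giving the O(m α(m)) term in the table above.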
1.5 Randomized Linear-Time MST (Karger-Klein-Tarjan)¶
Uses the Cycle Rule in a clever way: randomly sample a subgraph F (each edge included independently with probability ½), recursively find its MST T_F, then use T_F to eliminate heavy edges from the original graph.
sequenceDiagram
participant G as G (m edges)
participant F as Random subgraph F (≈m/2 edges)
participant TF as MST(F) via recursion
participant GF as G filtered by TF
G->>F: include each edge with prob 1/2
F->>TF: recursively find MST(F)
TF->>GF: for each edge e in G: if e is the heaviest edge on the cycle it closes in T_F, remove e
Note over GF: Expected O(n) F-light edges remain
GF->>GF: recursively find MST(GF)
Note over GF,TF: merge results
F-heavy edge elimination: An edge e is F-heavy if it is the heaviest edge on the path between its endpoints in T_F. By cycle rule, F-heavy edges cannot be in the MST of G → discard them. Expected number of F-light edges = O(n) → recursion on much smaller graph.
2. Matroids: Generalizing Greedy Correctness¶
2.1 Matroid Definition¶
A matroid M = (E, I) consists of a ground set E and a family of independent sets I satisfying:
1. ∅ ∈ I
2. (Hereditary) If A ∈ I and B ⊆ A, then B ∈ I
3. (Augmentation) If A, B ∈ I and |A| < |B|, then ∃ e ∈ B \ A such that A ∪ {e} ∈ I
flowchart TD
A["Graphic Matroid:\nE = edges, I = forests"] --> C["Greedy finds MST"]
B["Linear Matroid:\nE = vectors, I = linearly independent sets"] --> C
D["Partition Matroid:\nE partitioned into groups,\nI = ≤k elements per group"] --> C
C --> E["THEOREM: Greedy algorithm finds\nmax-weight basis of any matroid"]
2.2 Why Greedy Works on Matroids — Exchange Argument¶
sequenceDiagram
participant G as Greedy solution (sorted by weight, descending)
participant OPT as Optimal solution
Note over G,OPT: Suppose G ≠ OPT
G->>OPT: Find first element e in G not in OPT
OPT->>OPT: OPT ∪ {e} contains a circuit C (minimal dependent set)
G->>OPT: ∃ f in C ∩ OPT with w(f) ≤ w(e)
Note over OPT: OPT' = OPT + {e} - {f} is also independent
Note over OPT: w(OPT') ≥ w(OPT) — contradicts optimality of OPT
Kruskal's correctness is exactly this matroid greedy theorem applied to the graphic matroid — adding the lightest edge that doesn't form a cycle is the greedy step on the independent sets (forests) of the graphic matroid.
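Since the theorem only consumes an independence oracle, the greedy can be written once for every matroid. A minimal sketch, instantiated with the graphic matroid (independence = acyclicity, checked by a throwaway union-find); the triangle example below is illustrative:

```python
def matroid_greedy(elements, weight, independent):
    """Max-weight basis of a matroid, given an independence oracle."""
    basis = []
    for e in sorted(elements, key=weight, reverse=True):
        if independent(basis + [e]):        # oracle call
            basis.append(e)
    return basis

def acyclic(edge_set):
    """Independence oracle for the graphic matroid: is the edge set a forest?"""
    parent = {}
    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v, _ in edge_set:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # edge closes a cycle
        parent[ru] = rv
    return True
```

On the triangle {(a,b,3), (b,c,2), (a,c,1)} the greedy keeps the two heaviest edges and rejects the cycle-closing third; sorting ascending instead yields minimum spanning trees, i.e., Kruskal.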
3. Algebraic Matching: Schwartz-Zippel and Polynomial Identity Testing¶
3.1 Schwartz-Zippel Lemma¶
For a non-zero polynomial p(x₁,...,xₙ) of total degree ≤ d, if we sample each xᵢ independently and uniformly from a finite set S, then Pr[p(x₁,...,xₙ) = 0] ≤ d/|S|.
This is the engine of randomized algebraic algorithms: if we evaluate the polynomial at random points and get 0, it's probably the zero polynomial (i.e., no matching exists).
flowchart TD
A["Polynomial p(x₁...xₙ) with degree d"] --> B["Sample R₁...Rₙ from S with |S| ≥ dn²"]
B --> C["Evaluate p(R₁...Rₙ)"]
C -->|"= 0"| D["Probably zero polynomial\n(false negative prob ≤ 1/n²)"]
C -->|"≠ 0"| E["Definitely non-zero polynomial\n(no false positives)"]
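The flowchart can be exercised directly. A minimal sketch treating the polynomial as a black box over the integers; `probably_zero`, `identity`, and `broken` are hypothetical names, and the sampling domain stands in for |S|:

```python
import random

def probably_zero(p, nvars, trials=20, domain=10**18):
    """Schwartz-Zippel test: p is a black-box polynomial over the integers.
    Returns True if p is (probably) the zero polynomial; any nonzero
    evaluation is definitive proof that p is not identically zero."""
    rng = random.Random(0)
    for _ in range(trials):
        point = [rng.randrange(domain) for _ in range(nvars)]
        if p(*point) != 0:
            return False        # no false positives
    return True                 # false-negative prob <= (d/domain)^trials

identity = lambda x, y: (x + y)**2 - x*x - 2*x*y - y*y   # identically zero
broken   = lambda x, y: (x + y)**2 - x*x - y*y           # equals 2xy
```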
3.2 Perfect Matching via Edmonds Matrix¶
Edmonds matrix for bipartite graph G=(L,R,E): n×n matrix where E_ij = x_ij if edge (i,j) ∈ E, else 0.
The determinant det(E(G)) = Σ_σ sgn(σ) Πᵢ E_{i,σ(i)} is a polynomial in the variables x_ij. Each permutation σ corresponds to a potential perfect matching, and its term is non-zero iff all edges of that matching exist; distinct permutations contribute distinct monomials, so no cancellation occurs. Therefore:
- det(E(G)) ≠ 0 ↔ G has a perfect matching
sequenceDiagram
participant G as Bipartite graph G
participant E as Edmonds Matrix
participant R as Random Evaluation
G->>E: Build n×n matrix with variable x_ij for each edge
E->>R: Sample each x_ij uniformly from S ⊆ F with |S| ≥ n³
R->>R: Compute det(sampled matrix): O(n³) by Gaussian elimination, O(n^ω) via fast matrix multiplication
alt det = 0
R-->>G: No perfect matching (w.h.p.)
else det ≠ 0
R-->>G: Perfect matching exists
end
Time: O(n^ω) where ω ≈ 2.37 (matrix multiplication exponent). This beats combinatorial O(m√n) algorithms for dense graphs.
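A hedged sketch of the whole pipeline over F_p with p = 2³¹ − 1; note it uses O(n³) Gaussian elimination for the determinant rather than the O(n^ω) fast-matrix-multiplication route the text cites:

```python
import random

def has_perfect_matching(n, edges, seed=0):
    """Randomized bipartite matching test via the Edmonds matrix.
    edges: pairs (i, j) meaning left vertex i -- right vertex j.
    Works over the prime field F_p; one-sided error <= n/p."""
    p = (1 << 31) - 1                    # a Mersenne prime
    rng = random.Random(seed)
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = rng.randrange(1, p)    # random value for variable x_ij
    # Gaussian elimination mod p; det = 0 iff we hit an all-zero column
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return False                 # determinant is 0
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p) # Fermat inverse mod p
        for r in range(col + 1, n):
            f = M[r][col] * inv % p
            for c in range(col, n):
                M[r][c] = (M[r][c] - f * M[col][c]) % p
    return True                          # nonzero determinant: matching exists
```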
4. Dimension Reduction: Johnson-Lindenstrauss Lemma¶
4.1 The JL Lemma — High-Dimensional Geometry¶
For n points in ℝ^d, we can project them to ℝ^k (with k = O(ε⁻² log n)) such that all pairwise distances are preserved within factor (1 ± ε):
flowchart LR
A["n points in ℝ^d\n(d potentially huge)"] --> B["Random projection matrix A\n(k×d, entries ~ N(0,1)/√k)"]
B --> C["n points in ℝ^k\n(k = O(ε⁻² log n))"]
C --> D["∀ u,v: (1-ε)‖u-v‖ ≤ ‖Au-Av‖ ≤ (1+ε)‖u-v‖"]
Why this works: each of the k coordinates of A(u−v) is an independent Gaussian with variance ‖u−v‖²/k, so the squared distance ‖Au−Av‖² is a sum of k independent terms with total mean ‖u−v‖². By concentration of measure, this sum stays within (1±ε)‖u−v‖² except with exponentially small probability.
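A minimal sketch of the projection itself, in pure Python with illustrative parameters (d = 1000, k = 256; a real implementation would use a linear-algebra library):

```python
import math
import random

def jl_transform(points, k, seed=0):
    """Project points (lists of length d) to dimension k using a random
    Gaussian matrix with entries N(0, 1)/sqrt(k)."""
    rng = random.Random(seed)
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in A]
            for p in points]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Projecting a handful of random points from d = 1000 down to k = 256 typically distorts pairwise distances by only a few percent, consistent with the ±ε guarantee.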
4.2 Concentration of Measure — The Core Tool¶
For independent random variables X₁,...,Xₙ, each bounded in [a,b], Hoeffding's inequality bounds the deviation of S = Σᵢ Xᵢ: Pr[|S − E[S]| ≥ t] ≤ 2·exp(−2t²/(n(b−a)²)).
sequenceDiagram
participant X as Random Variables X₁...Xₙ
participant S as Sum S = ΣXᵢ
participant C as Concentration
X->>S: Sum of independent bounded RVs
S->>C: |S - E[S]| > t
C-->>X: Probability ≤ 2exp(-2t²/n(b-a)²)
Note over C: Exponential decay → union bound over O(n²) pairs → works
Application to JL: The squared norm of a projected Gaussian vector concentrates around its expectation with exp(-Ω(ε²k)) failure probability. Choosing k = O(ε⁻² log n) makes the union bound over all O(n²) pairs succeed with high probability.
5. Streaming Algorithms: O(log n) Space for Moments¶
5.1 AMS Sketch — Second Moment in Polylog Space¶
Given a data stream of updates to a frequency vector f ∈ ℝⁿ, estimate F₂ = Σ fᵢ² (second moment) using O(log n) space.
Key insight: Maintain a sketch Z = Σᵢ sᵢ fᵢ where sᵢ ∈ {+1, -1} are 4-wise independent random signs.
flowchart TD
A["Stream: update (i, Δ)"] --> B["Z += sᵢ × Δ"]
B --> C["Estimate: Ê = Z²"]
C --> D["E[Z²] = Σᵢ fᵢ² = F₂\n(cross terms cancel by 4-wise independence)"]
D --> E["Var[Z²] ≤ 2F₂²\n(bounded by 4-wise independence)"]
Space: O(log n) bits for Z plus O(log n) bits to specify the 4-wise independent hash. Averaging O(1/ε²) independent sketches reduces the variance enough for a (1±ε) estimate with constant probability; taking the median of O(log(1/δ)) such averages boosts the success probability to 1−δ.
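A sketch of a single AMS counter. The sign hash here is a random cubic polynomial mod a prime, used as a stand-in for an exactly 4-wise independent family — good enough to see E[Z²] ≈ F₂ empirically:

```python
import random

class AMSSketch:
    """One AMS counter Z = sum_i s_i * f_i with pseudo-random signs derived
    from a random degree-3 polynomial mod a prime (a stand-in for an
    exactly 4-wise independent family)."""
    P = (1 << 31) - 1

    def __init__(self, seed):
        rng = random.Random(seed)
        self.coef = [rng.randrange(self.P) for _ in range(4)]
        self.z = 0

    def sign(self, i):
        a, b, c, d = self.coef
        h = (((a * i + b) * i + c) * i + d) % self.P
        return 1 if h % 2 == 0 else -1

    def update(self, i, delta):      # stream update (i, delta)
        self.z += self.sign(i) * delta

    def estimate(self):              # E[Z^2] = F_2
        return self.z * self.z
```

Averaging a few hundred independent counters over the stream f = {0: 3, 1: −2, 2: 5} lands close to F₂ = 38, since the cross terms cancel in expectation.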
5.2 Matrix View of AMS¶
The sketch Z = Σᵢ sᵢ fᵢ can be written as Z = S·f where S is a row of random ±1 signs. Multiple sketches form the count sketch matrix:
block-beta
columns 2
block:S["Sketch Matrix S\n(k rows × n columns)\nEach row: random ±1 signs"]:1
R1["row 1: s₁ s₂ ... sₙ"]
R2["row 2: s₁' s₂' ... sₙ'"]
RK["row k: ..."]
end
block:Res["Result"]:1
Z["Z = S·f ∈ ℝᵏ (k sketches)"]
Est["Estimate F₂ via median-of-means of the Zᵢ²"]
App["Applications:\n- Approximate matrix multiply\n- Distinct elements\n- Heavy hitter detection"]
end
6. Online Learning: Multiplicative Weights Update¶
6.1 The Expert Problem — Regret Minimization¶
At each round, choose a distribution over n "experts", observe the outcome, and update weights based on expert performance. Goal: total loss close to the best single expert in hindsight.
stateDiagram-v2
[*] --> InitWeights : w_i = 1 for all experts i
InitWeights --> SelectExpert : Sample expert proportional to weights
SelectExpert --> Observe : See loss vector ℓ ∈ [0,1]^n
Observe --> UpdateWeights : w_i ← w_i × (1-ε)^ℓᵢ
UpdateWeights --> Normalize : W ← ΣW_i
Normalize --> SelectExpert
Note: after T rounds, regret ≤ ε·T + ln(n)/ε
Regret bound: Choosing ε = √(ln(n)/T) gives regret ≤ 2√(T ln n). Total loss ≤ OPT + 2√(T ln n).
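The regret bound can be checked by simulation. A minimal sketch of the exponential-update variant (w_i ← w_i·exp(−εℓᵢ), equivalent up to constants to the (1−ε)^ℓ rule above):

```python
import math

def hedge(loss_rows, eps):
    """Run Hedge (exponential-weights variant) on a T x n matrix of losses
    in [0,1]; return the learner's total expected loss."""
    n = len(loss_rows[0])
    w = [1.0] * n
    total = 0.0
    for losses in loss_rows:
        W = sum(w)
        # expected loss of sampling an expert proportional to its weight
        total += sum(w[i] * losses[i] for i in range(n)) / W
        # multiplicative update: penalize lossy experts
        w = [w[i] * math.exp(-eps * losses[i]) for i in range(n)]
    return total
```

On any loss sequence, the returned total stays within εT + ln(n)/ε of the best single expert in hindsight, as the potential argument of Section 12 guarantees.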
6.2 Using MWU to Solve Linear Programs Approximately¶
Any LP {min c·x : Ax ≥ b, x ≥ 0} can be solved approximately using MWU as a "black box solver":
sequenceDiagram
participant MWU as Multiplicative Weights
participant Oracle as Feasibility Oracle
participant LP as Linear Program
Note over MWU: Initialize uniform distribution over constraints
loop T = O(log m / ε²) rounds
MWU->>Oracle: "Is this convex combination of constraints feasible?"
Oracle-->>MWU: Return violated constraint (or feasible solution)
MWU->>MWU: Increase weight of violated constraint
end
Note over MWU: Output average of all iterates → (1+ε)-approx solution
Key: This converts a separation oracle into an approximate optimization algorithm. Any problem where you can check feasibility efficiently can be approximately optimized via MWU.
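A minimal instance of this oracle pattern: approximating the value of a zero-sum game (itself an LP), where MWU plays the rows and the "oracle" is the column player's best response. The function name and payoff matrix are illustrative, not from the notes:

```python
import math

def solve_zero_sum(A, T=2000):
    """Approximate the value of max_x min_y x^T A y (payoffs in [0,1])
    via multiplicative weights on the rows; the column player's best
    response plays the role of the oracle."""
    m, n = len(A), len(A[0])
    eps = math.sqrt(math.log(m) / T)
    w = [1.0] * m
    avg_payoff = 0.0
    for _ in range(T):
        W = sum(w)
        x = [wi / W for wi in w]
        # oracle: the column minimizing the row player's expected payoff
        col = min(range(n), key=lambda j: sum(x[i] * A[i][j] for i in range(m)))
        avg_payoff += sum(x[i] * A[i][col] for i in range(m)) / T
        # MWU update: boost rows that did well against the oracle's reply
        w = [w[i] * math.exp(eps * A[i][col]) for i in range(m)]
    return avg_payoff
```

For the identity payoff matrix (value ½), the average payoff converges to ½ up to the O(√(ln m / T)) regret term — the same convergence rate quoted for the LP loop above.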
7. Convex Optimization: Interior-Point Methods¶
7.1 Barrier Functions and the Central Path¶
For LP {min c·x : Ax ≤ b}, interior-point methods add a barrier function B(x) = -Σ log(bᵢ - aᵢ·x) that blows up at the boundary:
flowchart TD
A["Objective: min c·x"] --> B["Penalized: min c·x + (1/t)·B(x)"]
B --> C["Central path x*(t): solution as t → ∞"]
C --> D["As t grows, x*(t) → optimal solution"]
D --> E["Newton step to move along path"]
E --> F["O(√m · log(1/ε)) Newton steps total (m constraints)"]
7.2 Newton's Method — Quadratic Convergence¶
Near a minimum, the barrier function approximates a quadratic f(x+Δ) ≈ f(x) + ∇f·Δ + ½ Δᵀ∇²f·Δ. The Newton step Δ = -(∇²f)⁻¹∇f is the exact minimizer of this quadratic approximation.
sequenceDiagram
participant x as Current Point x
participant Grad as Gradient ∇f(x)
participant Hess as Hessian ∇²f(x)
participant New as New Point
x->>Grad: compute gradient (O(n) for LP)
x->>Hess: compute Hessian (O(n²) for LP)
Hess->>New: Solve system: Δ = -(∇²f)⁻¹∇f (O(n^ω) via matrix inversion)
New->>New: x ← x + Δ
Note over New: Quadratic convergence: the error is squared at each step
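The quadratic convergence noted above is easy to watch in one dimension. A sketch on the hypothetical objective f(x) = x² − ln x (a quadratic plus a log-barrier-style term, minimized at x* = 1/√2):

```python
import math

def newton_minimize(grad, hess, x0, steps):
    """Newton's method for 1-D minimization: x <- x - f'(x)/f''(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - grad(xs[-1]) / hess(xs[-1]))
    return xs

# toy objective f(x) = x^2 - ln(x), minimized at x* = 1/sqrt(2)
grad = lambda x: 2 * x - 1 / x       # f'(x)
hess = lambda x: 2 + 1 / x ** 2      # f''(x), the 1-D Hessian
```

Starting from x₀ = 1, the error drops roughly as 0.29 → 0.04 → 10⁻³ → 10⁻⁶ → 10⁻¹² — each step squares it, as the diagram's note says.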
Self-concordance: barrier functions B(x) satisfying |∇³B[Δ,Δ,Δ]| ≤ 2(∇²B[Δ,Δ])^(3/2) keep Newton's method inside its quadratic-convergence region along the central path; with a ν-self-concordant barrier, path following takes O(√ν · log(1/ε)) Newton steps.
8. Graph Sparsification: Preserving Cut Structure¶
8.1 Spectral Sparsification¶
For graph G = (V, E) with Laplacian L_G, a spectral sparsifier H ⊆ G (with reweighted edges) satisfies (1−ε)·xᵀL_G·x ≤ xᵀL_H·x ≤ (1+ε)·xᵀL_G·x for every x ∈ ℝⁿ.
This means H preserves all cuts simultaneously to within (1±ε) — not just specific cuts.
flowchart LR
G["G: m edges"] --> W["Weight each edge by\neffective resistance r_e"]
W --> S["Sample each edge\nwith prob p_e ∝ w_e × r_e"]
S --> H["H: O(n log n / ε²) edges"]
H --> P["H is (1±ε) spectral sparsifier of G"]
Effective resistance of edge e = (u,v): r_e = (χ_e)^T L_G^† χ_e where χ_e is the edge incidence vector and L_G^† is the pseudoinverse of the Laplacian. Edges with high effective resistance are "bottleneck" edges — they must be in the sparsifier.
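Effective resistances of a toy graph can be computed exactly from this formula by grounding one vertex and solving the reduced Laplacian. A sketch using dense Gaussian elimination — fine at toy sizes; the fast solvers of Section 9 are what make this scale:

```python
def effective_resistance(n, edges, u, v):
    """r_uv for a unit-weight graph: ground vertex n-1, solve the reduced
    Laplacian L' phi = chi_uv, and return phi_u - phi_v."""
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1.0; L[b][b] += 1.0
        L[a][b] -= 1.0; L[b][a] -= 1.0
    m = n - 1                                  # drop the grounded row/column
    A = [L[i][:m] + [0.0] for i in range(m)]   # augmented matrix [L' | chi]
    if u < m: A[u][m] += 1.0
    if v < m: A[v][m] -= 1.0
    # forward elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    # back substitution for the potentials
    phi = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = A[r][m] - sum(A[r][c] * phi[c] for c in range(r + 1, m))
        phi[r] = s / A[r][r]
    pu = phi[u] if u < m else 0.0
    pv = phi[v] if v < m else 0.0
    return pu - pv
```

On a triangle each edge has r_e = 2/3 (a unit resistor in parallel with two in series), and the resistances sum to n − 1 = 2, consistent with Foster's theorem.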
8.2 Concentration Proof — Matrix Bernstein¶
Matrix Bernstein inequality: for independent, zero-mean, symmetric n×n random matrices Xᵢ with ‖Xᵢ‖ ≤ R and variance σ² = ‖Σᵢ E[Xᵢ²]‖, Pr[‖Σᵢ Xᵢ‖ ≥ t] ≤ 2n·exp(−t²/(2σ² + 2Rt/3)).
This is the matrix analog of scalar Bernstein — used to prove the sampled graph's Laplacian concentrates around the original's.
9. Near-Linear SSSP: Beyond Dijkstra¶
9.1 Low-Stretch Spanning Trees¶
A spanning tree T of G has stretch str_T(u,v) = d_T(u,v)/d_G(u,v). Average stretch = (Σ_{(u,v)∈E} str_T(u,v))/m.
sequenceDiagram
participant G as Graph G (n,m)
participant T as Spanning Tree T
participant LSST as Low-Stretch Tree (avg stretch O(log² n))
G->>T: Construct ball-growing decomposition
T->>LSST: Bartal-style ball growing: O(log² n) avg stretch
LSST->>Solver: Use T as preconditioner for Laplacian system
Note over Solver: Σ edges: d_T(u,v)/d_G(u,v) ≤ O(m log² n)
Why this enables faster SSSP: With a low-stretch tree, the SDD (symmetric diagonally dominant) Laplacian system Lx = b can be solved in O(m log n) time using iterative methods where the tree provides an approximate inverse.
9.2 Electric Flow Framework¶
Electrical flows minimize Σ_e rₑ fₑ² subject to flow conservation. The solution satisfies Ohm's law: fₑ = (φᵤ - φᵥ)/rₑ where φ is the vertex potential vector:
flowchart TD
B["Flow demand vector b"] --> L["Solve Laplacian: Lφ = b"]
L --> F["Compute edge flows: f_e = (φ_u - φ_v)/r_e"]
F --> S["Electric flow = min energy flow"]
S --> MST2["Building block for max-flow, min-cut, and shortest-path algorithms"]
The Laplacian solve Lφ = b can be done in Õ(m) time (m · polylog n) using Spielman-Teng solvers — the near-linear primitive that powers these flow-based shortest-path and max-flow algorithms.
10. Matroid Intersection and Arborescences¶
10.1 Matroid Intersection — Two Matroids¶
Given two matroids M₁ = (E, I₁) and M₂ = (E, I₂), find the maximum-weight set in I₁ ∩ I₂.
Application — Arborescence (directed MST): Finding a minimum-cost arborescence (directed spanning tree) in a digraph reduces to matroid intersection:
- Matroid 1: graphic matroid of the underlying undirected graph
- Matroid 2: partition matroid (at most one incoming edge per non-root vertex)
sequenceDiagram
participant M1 as Matroid M₁ (graphic)
participant M2 as Matroid M₂ (partition: 1 in-edge per vertex)
participant ALG as Intersection Algorithm
ALG->>M1: Find augmenting path in exchange graph
ALG->>M2: Check if augmenting path is valid in M₂
loop path found
ALG->>ALG: Update independent set along path
Note over ALG: Augment by 1 element each iteration
end
Note over ALG: O(r · (oracle time)) total
Note over ALG: r = rank of final solution ≤ n-1
10.2 Chu-Liu/Edmonds Algorithm — Contraction-Based¶
stateDiagram-v2
[*] --> FindCheapest : For each non-root vertex v:\npick cheapest incoming edge
FindCheapest --> CheckCycle : Union-Find: does selection form a directed cycle?
CheckCycle --> ReturnTree : NO → spanning arborescence found
CheckCycle --> Contract : YES → contract cycle to single super-node
Contract --> ReweightEdges : Adjust weights of edges entering super-node
ReweightEdges --> FindCheapest : Recurse on contracted graph
ReturnTree --> [*]
Reweighting rule: For edge (u, C) entering contracted cycle C via vertex v ∈ C, new weight = w(u,v) - w(cheapest_in(v)). This accounts for the fact that adding (u,v) to the arborescence removes the cycle's edge into v.
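The contract-and-reweight loop above can be sketched compactly. A plain O(mn) Python version (not the sophisticated O(m + n log n) implementations); the test digraph is an illustrative example:

```python
import math

def min_arborescence(n, root, edges):
    """Chu-Liu/Edmonds. edges: (u, v, w) directed u -> v. Returns the
    minimum total weight of an arborescence rooted at `root`, or None
    if none exists."""
    total = 0
    while True:
        # 1. cheapest incoming edge for each non-root vertex
        inw, pre = [math.inf] * n, [-1] * n
        for u, v, w in edges:
            if u != v and v != root and w < inw[v]:
                inw[v], pre[v] = w, u
        if any(inw[v] == math.inf for v in range(n) if v != root):
            return None                    # some vertex is unreachable
        inw[root] = 0
        total += sum(inw[v] for v in range(n) if v != root)
        # 2. look for a directed cycle among the chosen edges
        cid, vis, cycles = [-1] * n, [-1] * n, 0
        for s in range(n):
            v = s
            while v != root and vis[v] == -1:
                vis[v] = s
                v = pre[v]
            if v != root and vis[v] == s:  # walked back into this pass: cycle
                cid[v] = cycles
                u = pre[v]
                while u != v:
                    cid[u] = cycles
                    u = pre[u]
                cycles += 1
        if cycles == 0:
            return total                   # the selection is an arborescence
        # 3. contract cycles; reweight entering edges by w - inw[v]
        for v in range(n):
            if cid[v] == -1:
                cid[v] = cycles
                cycles += 1
        edges = [(cid[u], cid[v], w - inw[v])
                 for u, v, w in edges if cid[u] != cid[v]]
        n, root = cycles, cid[root]
```

Step 3 is exactly the reweighting rule above: w(u,v) is reduced by the cost of the cycle edge into v that adding (u,v) would displace.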
11. Complexity Landscape of Graph Problems¶
flowchart TD
subgraph Linear["O(m) or near-linear"]
MST_rand["MST (randomized KKT)"]
BFS_dfs["BFS/DFS"]
SSSP_unw["SSSP unweighted"]
end
subgraph NearLinear["O(m log* n) or O(m log n)"]
MST_det["MST (deterministic)"]
Prim["Prim + Fibonacci heap: O(m + n log n)"]
end
subgraph MatMul["O(n^ω)"]
Match_alg["Perfect Matching (algebraic)"]
APSP_dense["APSP dense graphs"]
end
subgraph Harder["O(mn) or harder"]
BF["Bellman-Ford: O(mn)"]
BlossoM["Blossom: O(n³)"]
end
MST_rand --> MST_det
Prim --> Match_alg
12. Online Optimization: Regret Bound Derivation¶
12.1 Hedge Algorithm — Weight Update Mechanics¶
sequenceDiagram
participant Learner
participant Weights as w_i (n experts)
participant Loss as Loss vector ℓ ∈ [0,1]^n
Note over Weights: Initialize w_i = 1 for all i
loop t = 1 to T
Learner->>Weights: Sample expert i with prob w_i / Σw_j
Learner->>Loss: Observe ℓ (all expert losses)
Weights->>Weights: Update: w_i ← w_i × exp(-ε ℓᵢ)
end
Note over Learner: Total loss ≤ (1+ε) × OPT + ln(n)/ε
Note over Learner: Setting ε = √(ln(n)/T) → regret = O(√(T ln n))
Potential function analysis: Let Φ_t = Σᵢ wᵢ^(t), so Φ₀ = n. After T rounds:
- Upper bound: since e^(−εx) ≤ 1 − ε(1−ε/2)·x for x ∈ [0,1], each round multiplies the potential by at most exp(−ε(1−ε/2)·⟨pₜ, ℓₜ⟩), giving Φ_T ≤ n · exp(−ε(1−ε/2) · LOSS_LEARNER)
- Lower bound: Φ_T ≥ exp(−ε · LOSS_OPT) (the best single expert's weight alone)
- Combining, taking logs, and using 1/(1−ε/2) ≤ 1+ε for ε ≤ 1: LOSS_LEARNER ≤ (1+ε)·(LOSS_OPT + ln(n)/ε)
Document synthesized from CMU 15-850 Advanced Algorithms course notes (Fall 2020), Prof. Anupam Gupta — covering MST algorithms (Ch. 1), algebraic matching (Ch. 8), graph sparsification, online learning (Ch. 14-15), convex optimization (Ch. 17-20), and dimension reduction (Ch. 10-13).