C++ Programming Language — Under the Hood¶
Source: The C++ Programming Language, 3rd Edition — Bjarne Stroustrup (AT&T Labs)
Overview¶
C++ is not simply "C with classes." It is a multi-paradigm language whose runtime behavior emerges from a carefully designed set of memory layouts, virtual dispatch mechanisms, template instantiation engines, and exception propagation protocols. This document maps the internal machinery: how objects occupy memory, how virtual calls route through vtables, how templates generate code at compile time, and how exceptions unwind the call stack.
1. Object Memory Layout¶
1.1 Plain Object Layout¶
A C++ object's memory is a contiguous block determined at compile time. Member variables are laid out in declaration order, subject to alignment padding.
struct Foo {
char a; // offset 0, size 1
// 3 bytes padding
int b; // offset 4, size 4
double c; // offset 8, size 8
}; // total: 16 bytes
block-beta
columns 4
block:foo["Foo object (16 bytes)"]:4
a["a (1B)"] pad["pad (3B)"] b["b (4B)"] c["c (8B)"]
end
style a fill:#4a9eff,color:#fff
style pad fill:#555,color:#aaa
style b fill:#4a9eff,color:#fff
style c fill:#4a9eff,color:#fff
1.2 Class with Virtual Functions — vptr Injection¶
When a class declares any virtual function, the compiler injects a hidden pointer (vptr) into every instance, pointing to the class's virtual function table (vtable).
block-beta
columns 1
block:obj["Object Instance (on stack/heap)"]:1
vptr["vptr → vtable of MyClass"]
d1["data member 1"]
d2["data member 2"]
end
block:vtbl["vtable (read-only, per-class)"]:1
f0["[0] → MyClass::virtualFn1()"]
f1["[1] → MyClass::virtualFn2()"]
f2["[2] → type_info* (RTTI)"]
end
vptr --> f0
Key mechanics:
- vptr is set by the constructor, pointing to the class's own vtable
- If a derived class overrides a virtual function, the derived vtable replaces the slot
- A virtual call obj->fn() compiles to: *(obj->vptr[N])(obj) — load vptr, index into table, indirect call
1.3 Virtual Dispatch — Call Path¶
sequenceDiagram
participant Caller
participant CPU
participant vptr
participant vtable
participant Implementation
Caller->>CPU: p->virtualFn()
CPU->>vptr: load p->vptr (offset 0)
vptr->>vtable: index [N]
vtable->>Implementation: indirect jump to overriding function
Implementation-->>Caller: return value
Cost: 1 memory load (vptr) + 1 memory load (vtable slot) + 1 indirect call. This is why virtual cannot be inlined by the compiler in the general case.
2. Inheritance and Memory Layout¶
2.1 Single Inheritance¶
block-beta
columns 1
block:derived["Derived Object"]:1
vptr_d["vptr → Derived vtable"]
base_data["Base::x (inherited)"]
base_data2["Base::y (inherited)"]
derived_data["Derived::z (own)"]
end
The base subobject sits at offset 0. static_cast<Base*>(derived_ptr) is a no-op (same address).
2.2 Multiple Inheritance — Pointer Adjustment¶
When a class inherits from multiple bases, the second base subobject starts at a non-zero offset. Casting to the second base requires pointer adjustment.
block-beta
columns 1
block:obj["Diamond Object layout"]:1
vptr_a["vptr₁ → A vtable"]
a_data["A data"]
vptr_b["vptr₂ → B vtable"]
b_data["B data"]
c_data["C own data"]
end
note1["cast to A* = no adjustment (offset 0)"]
note2["cast to B* = +sizeof(A subobject) adjustment"]
flowchart LR
p["C* p"] -->|"static_cast<A*>"| A["A* (offset 0, same addr)"]
p -->|"static_cast<B*>"| B["B* (offset += sizeof A subobj)"]
A -->|"virtual call"| VA["A vtable slot"]
B -->|"virtual call"| VB["B vtable slot"]
2.3 Virtual Inheritance — Shared Base¶
Virtual inheritance (virtual public Base) introduces an indirection so only one Base subobject exists regardless of how many paths lead to it.
flowchart TB
D["class D: virtual public A, virtual public B"]
A["A (virtual base)"] --> shared["Shared Base subobject"]
B["B (virtual base)"] --> shared
D --> A
D --> B
D -->|"vbptr (virtual base pointer)"| shared
The compiler adds a vbptr (virtual base table pointer) to locate the shared base at runtime.
3. Templates — Compile-Time Code Generation¶
3.1 Template Instantiation Pipeline¶
Templates are not compiled until instantiated. The compiler generates a fresh, fully typed copy of the template body for each unique parameter combination.
flowchart LR
src["template<class T> class String { ... }"]
inst1["String<char> cs;"]
inst2["String<wchar_t> ws;"]
inst3["String<unsigned char> us;"]
src -->|"instantiate for char"| gen1["class String__char { ... }"]
src -->|"instantiate for wchar_t"| gen2["class String__wchar_t { ... }"]
src -->|"instantiate for unsigned char"| gen3["class String__uchar { ... }"]
gen1 --> obj1["compiled object code"]
gen2 --> obj2["compiled object code"]
gen3 --> obj3["compiled object code"]
Stroustrup's design intent: "A class generated from a class template is a perfectly ordinary class. Thus, use of a template does not imply any run-time mechanisms beyond what is used for an equivalent 'hand-written' class."
3.2 Function Template Deduction and Specialization¶
flowchart TD
call["sort(vec)"]
deduct["Compiler deduces T = int from argument type"]
check["Specialization for T=int* ? → Check"]
gen["Instantiate sort<int> if no specialization"]
special["Use sort<int*> specialization if exists"]
call --> deduct --> check
check -->|"no"| gen
check -->|"yes"| special
3.3 Template vs. Virtual — Polymorphism Choice¶
| Axis | Templates (compile-time) | Virtual (runtime) |
|---|---|---|
| Type resolution | At compile time | At runtime via vtable |
| Inlining | Yes (full visibility) | No (indirect call) |
| Code size | Can bloat (one copy per type) | Shared implementation |
| Requires inheritance | No | Yes (hierarchy) |
| Dynamic dispatch | No | Yes |
stateDiagram-v2
[*] --> NeedPolymorphism
NeedPolymorphism --> CompileTimeKnown: Types known at compile time?
NeedPolymorphism --> RuntimeKnown: Types only known at runtime?
CompileTimeKnown --> Templates: Use templates
RuntimeKnown --> VirtualFunctions: Use virtual + inheritance
Templates --> Inlined: Compiler can inline
VirtualFunctions --> VtableDispatch: Runtime vtable lookup
4. Exception Handling — Stack Unwinding Engine¶
4.1 Exception Flow¶
When throw executes, the C++ runtime performs stack unwinding: it walks up the call stack, calling destructors for all objects with automatic storage duration, until it finds a matching catch handler.
sequenceDiagram
participant main
participant f1
participant f2
participant f3
participant Runtime
main->>f1: call
f1->>f2: call
f2->>f3: call
f3->>Runtime: throw MyException
Runtime->>f3: unwind: ~LocalA(), ~LocalB()
Runtime->>f2: unwind: ~LocalC()
Runtime->>f1: unwind: ~LocalD()
Runtime->>main: match catch(MyException&)
main-->>main: handler executes
4.2 RTTI — type_info and dynamic_cast¶
Every polymorphic class (one with at least one virtual function) has a type_info object stored as part of its vtable layout. dynamic_cast<T*>(p) uses this at runtime.
flowchart LR
ptr["Base* p (pointing to Derived)"]
dyn["dynamic_cast<Derived*>(p)"]
check["Query p->vptr[-1] → type_info"]
match{"type_info matches Derived?"}
ok["Return adjusted pointer"]
fail["Return nullptr (ptr cast) or throw bad_cast (ref cast)"]
ptr --> dyn --> check --> match
match -->|yes| ok
match -->|no| fail
4.3 Exception Table (EH Tables)¶
Modern C++ compilers (with zero-cost exceptions) generate static exception handler tables alongside the code. No runtime overhead exists on the non-throwing path.
block-beta
columns 2
block:code["Code Section"]:1
fn["function body bytecode"]
fn2["more code"]
end
block:eh["EH Table (read-only)"]:1
range["PC range [start, end)"]
handler["→ catch handler offset"]
cleanup["→ cleanup (dtor) list"]
end
fn --> range
5. Memory Management — new/delete Internals¶
5.1 new Expression Decomposition¶
new Foo(args) is not atomic — the compiler decomposes it:
flowchart TD
expr["new Foo(args)"]
alloc["operator new(sizeof(Foo))\n→ calls malloc or custom allocator"]
construct["placement new: Foo::Foo(args) called in-place"]
ret["return typed pointer to initialized object"]
expr --> alloc --> construct --> ret
alloc -->|failure| bad_alloc["throw std::bad_alloc"]
5.2 delete Decomposition¶
flowchart TD
del["delete p"]
dtor["p->~Foo() called\n(virtual destructor if polymorphic)"]
free["operator delete(p)\n→ calls free or custom deallocator"]
null["pointer becomes dangling (not auto-nulled)"]
del --> dtor --> free --> null
Critical: The destructor must be virtual if you intend to delete through a base pointer. Without virtual ~Base(), only Base::~Base() runs — derived data leaks.
5.3 Reference Counting (Srep Pattern)¶
Stroustrup's String uses a shared Srep struct with a reference count to implement copy-on-write:
stateDiagram-v2
[*] --> Allocated: new Srep(sz, p)
Allocated --> Shared: String s2 = s1 (copy ctor: Srep::n++)
Shared --> Shared: Another copy (n++)
Shared --> CopyOnWrite: mutation needed (get_own_copy)
CopyOnWrite --> Allocated: n>1: n--, allocate new Srep
Shared --> Freed: n-- reaches 0 → delete Srep
Allocated --> Freed: last String destroyed
block-beta
columns 1
block:s1["String s1"]:1
rep1["rep → Srep{n=2, sz=5, s=→'hello'}"]
end
block:s2["String s2 (copy of s1)"]:1
rep2["rep → same Srep (n=2)"]
end
block:srep["Srep on heap"]:1
n["n = 2"]
sz["sz = 5"]
sdata["s = 'hello\0'"]
end
rep1 --> n
rep2 --> n
6. Namespaces and Linkage¶
6.1 Name Mangling¶
C++ functions are mangled to encode their full signature, enabling overloading. The linker sees mangled names, not source names.
flowchart LR
src1["void f(int)"] --> mangled1["_Z1fi"]
src2["void f(double)"] --> mangled2["_Z1fd"]
src3["void f(int, int)"] --> mangled3["_Z1fii"]
src4["namespace N::void f(int)"] --> mangled4["_ZN1N1fEi"]
6.2 Include Guard — Compilation Unit Isolation¶
sequenceDiagram
participant TU as Translation Unit (.cpp)
participant PP as Preprocessor
participant H as header.h
TU->>PP: #include "header.h"
PP->>H: open file
H-->>PP: #ifndef HEADER_H → not defined → process
PP->>TU: inject declarations
TU->>PP: #include "header.h" (second time)
PP->>H: open file
H-->>PP: #ifndef HEADER_H → already defined → skip
7. Standard Library Internals¶
7.1 std::vector — Amortized Growth¶
flowchart TD
push["push_back(x) called"]
check{"size == capacity?"}
append["place x at data[size++]"]
realloc["allocate new buffer: capacity *= 2"]
copy["copy/move all elements to new buffer"]
free["free old buffer"]
newappend["place x at new buffer[size++]"]
push --> check
check -->|no| append
check -->|yes| realloc --> copy --> free --> newappend
Amortized cost: O(1) per push_back. The doubling strategy ensures total reallocation work ≤ 2N for N insertions.
7.2 Iterator Architecture¶
Iterators form an abstraction layer between algorithms and containers. They're value types (not polymorphic), so the compiler can inline all operations.
flowchart LR
algo["std::sort(v.begin(), v.end())"]
begin["vector::iterator begin() = &data[0]"]
end["vector::iterator end() = &data[size]"]
deref["operator*() → *ptr"]
inc["operator++() → ++ptr"]
cmp["operator!=() → ptr != other.ptr"]
algo --> begin
algo --> end
begin --> deref
begin --> inc
begin --> cmp
7.3 std::map — Red-Black Tree Internal Layout¶
graph TD
root["root node (key=50)"]
l1["key=25"]
r1["key=75"]
l2["key=12"]
l3["key=37"]
r2["key=62"]
r3["key=87"]
root --> l1
root --> r1
l1 --> l2
l1 --> l3
r1 --> r2
r1 --> r3
style root fill:#cc0000,color:#fff
style l1 fill:#cc0000,color:#fff
style r1 fill:#000,color:#fff
style l2 fill:#000,color:#fff
style l3 fill:#cc0000,color:#fff
style r2 fill:#cc0000,color:#fff
style r3 fill:#000,color:#fff
Each map::find() traverses O(log N) nodes. Each node heap-allocates {color, left*, right*, parent*, key, value}.
8. Operator Overloading — Internal Call Resolution¶
8.1 Smart Pointer operator->¶
The overloaded -> must return a raw pointer or another object with -> defined. The compiler chains calls automatically.
sequenceDiagram
participant Code
participant SmartPtr
participant RawPtr
participant Object
Code->>SmartPtr: p->member
SmartPtr->>SmartPtr: operator->() called
SmartPtr-->>RawPtr: returns raw T*
RawPtr->>Object: .member accessed (compiler applies second ->)
8.2 Prefix vs. Postfix ++¶
flowchart LR
prefix["++p → operator++()"]
postfix["p++ → operator++(int)"]
prefix --> pp["increment, return *this (reference)"]
postfix --> copy["save copy = *this"]
postfix --> inc2["increment *this"]
postfix --> ret["return copy (by value)"]
The int dummy argument in operator++(int) distinguishes postfix from prefix at the type system level. The postfix version is inherently more expensive: it must copy the object before incrementing.
9. Compile-Time vs. Runtime Polymorphism — Decision Flow¶
flowchart TD
start["Need to work with multiple types?"]
known["Types fully known at compile time?"]
inlining["Performance critical / inlining needed?"]
hierarchy["Natural type hierarchy?"]
start --> known
known -->|yes| inlining
known -->|no| hierarchy
inlining -->|yes| templates["Use templates (compile-time polymorphism)"]
inlining -->|no| either["Either works — choose for clarity"]
hierarchy -->|yes| virtual["Use virtual functions (runtime polymorphism)"]
hierarchy -->|no| templates2["Use templates or function overloading"]
10. Data Flow Summary — Object Lifetime¶
stateDiagram-v2
[*] --> Uninitialized: memory allocated (stack frame / malloc)
Uninitialized --> Constructed: constructor runs\nvptr set, members initialized
Constructed --> InUse: object used, methods called
InUse --> Copying: copy ctor or copy assignment\n(Srep refcount++ for COW types)
Copying --> InUse: copy complete
InUse --> Destructing: out of scope / delete called
Destructing --> Destroyed: dtor runs\nmembers destroyed in reverse order\nbase dtor called last
Destroyed --> [*]: memory returned to allocator
Key Invariants¶
| Mechanism | Where it lives | When resolved |
|---|---|---|
| vptr | Object instance (offset 0) | Constructor execution |
| vtable | Read-only data segment, per-class | Compile/link time |
| type_info | vtable[-1] slot | Compile/link time |
| Template code | Instantiated in each TU | Compile time |
| Exception tables | Read-only section alongside code | Link time |
| Mangled names | Symbol table | Link time |
| Reference count | Heap-allocated rep object | Runtime |
C++'s power comes from the combination: zero-overhead abstractions at compile time (templates, inlining, static dispatch) layered with opt-in runtime mechanisms (virtual, RTTI, exceptions) that impose cost only when explicitly invoked.