Programming Languages — Under the Hood: Runtimes, Type Systems, and Compilation Internals¶
Focus: Not syntax or API usage — but how language runtimes schedule goroutines, how compilers monomorphize generics, how CPS transforms suspend coroutines, and how type inference propagates constraints across polymorphic call sites.
1. Go Runtime: Goroutine Scheduler Internals¶
Go's runtime implements M:N green-thread scheduling: many goroutines multiplexed over a smaller set of OS threads using a work-stealing scheduler (the GMP model: G = goroutine, M = OS thread, P = logical processor, with GOMAXPROCS Ps).
flowchart TD
subgraph "GMP Scheduler Model"
G1[Goroutine G] -->|assigned to| P1[Processor P]
G2[Goroutine G] -->|assigned to| P1
P1 -->|runs on| M1[OS Thread M]
P2[Processor P] -->|runs on| M2[OS Thread M]
GQ[Global Run Queue] -->|steal| P2
P1 -->|local run queue 256| LRQ1[Local RunQ]
P2 -->|local run queue 256| LRQ2[Local RunQ]
LRQ1 -->|work steal| LRQ2
end
M1 -->|blocks on syscall| M3[New OS Thread M]
P1 -->|handoff P| M3
Goroutine Stack Growth¶
Goroutines start with a 2KB stack (vs a typical OS thread default of 1–8MB). When a function call would overflow the stack guard, the runtime performs a stack copy:
sequenceDiagram
participant G as Goroutine
participant RT as Runtime
participant GC as Stack Allocator
G->>RT: function call exceeds stack guard
RT->>RT: morestack() triggered
RT->>GC: allocate 2× stack
GC-->>RT: new stack pointer
RT->>RT: copy all frames to new stack
RT->>RT: adjust pointers into old stack frames
RT-->>G: resume on new stack
Stack frames are not pinned to a fixed address: every pointer into the stack is rewritten during the copy. This is why values whose addresses escape their frame are moved to the heap by escape analysis, and why cgo code must not retain Go pointers into goroutine stacks.
Goroutine State Machine¶
stateDiagram-v2
[*] --> Runnable: go func()
Runnable --> Running: P picks up G
Running --> Runnable: preempted (10ms async signal)
Running --> Waiting: channel block / syscall / mutex
Waiting --> Runnable: channel send / syscall return
Running --> Dead: function returns
Running --> Syscall: syscall enter
Syscall --> Runnable: syscall exit (P reacquired)
Syscall --> Waiting: syscall blocks (P handed off)
Work Stealing Algorithm¶
flowchart LR
subgraph "Steal Decision"
P_idle[Idle P] -->|1. check local| LRQ[Local RunQ empty?]
LRQ -->|yes| GRQ[Check Global RunQ]
GRQ -->|empty| Netpoll[Check netpoller epoll/kqueue]
Netpoll -->|empty| Steal[Steal from random P]
Steal -->|take half| VictimQ[Victim P RunQ]
end
Stealing takes half of the victim's local run queue — reduces contention while ensuring fairness. The global run queue is checked every 61st scheduling tick to prevent starvation.
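The steal policy above can be sketched in a few lines of Python (a toy model, not the runtime's actual `findrunnable`; the names and structures here are illustrative):

```python
from collections import deque

STEAL_DIVISOR = 2       # take half of the victim's queue
GLOBAL_CHECK_TICK = 61  # consult the global queue every 61st schedule

def find_runnable(p, ps, global_q, tick):
    """Toy scheduling order: fairness check -> local -> global -> steal."""
    # Fairness: periodically favor the global queue so it cannot starve.
    if tick % GLOBAL_CHECK_TICK == 0 and global_q:
        return global_q.popleft()
    if p["runq"]:
        return p["runq"].popleft()
    if global_q:
        return global_q.popleft()
    # Steal half of some other P's local run queue.
    for victim in ps:
        if victim is not p and victim["runq"]:
            n = (len(victim["runq"]) + 1) // STEAL_DIVISOR
            stolen = [victim["runq"].popleft() for _ in range(n)]
            g = stolen.pop(0)          # run one goroutine immediately...
            p["runq"].extend(stolen)   # ...queue the rest locally
            return g
    return None  # nothing runnable: park the M / poll the netpoller
```

Taking half (rather than one G) amortizes the synchronization cost of a steal and rebalances the queues in a single operation.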
Channel Internals: hchan Structure¶
block-beta
columns 4
block:hchan:4
qcount["qcount\n(len)"]
dataqsiz["dataqsiz\n(cap)"]
buf["buf *\n(ring buffer)"]
elemsize["elemsize"]
closed["closed uint32"]
sendx["sendx (write idx)"]
recvx["recvx (read idx)"]
recvq["recvq\n(waiting receivers)"]
sendq["sendq\n(waiting senders)"]
lock["mutex"]
end
Direct send optimization: If a receiver is blocked in recvq, the sender copies data directly to receiver's stack (bypassing the buffer), then wakes the receiver — zero extra copy.
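A minimal Python model of the send path, assuming a heavily simplified hchan with no locking or goroutine parking (waiting receivers are plain callbacks):

```python
from collections import deque

class Chan:
    """Toy model of runtime.hchan: ring buffer plus waiter queues."""
    def __init__(self, cap):
        self.buf = [None] * cap          # ring buffer (dataqsiz = cap)
        self.qcount = 0                  # elements currently buffered
        self.sendx = self.recvx = 0      # write / read indices
        self.recvq = deque()             # blocked receivers (callbacks)
        self.sendq = deque()             # blocked senders

    def send(self, val):
        if self.recvq:
            # Direct send: copy straight to the waiting receiver,
            # bypassing the buffer entirely, then wake it.
            receiver = self.recvq.popleft()
            receiver(val)
            return "direct"
        if self.qcount < len(self.buf):
            self.buf[self.sendx] = val
            self.sendx = (self.sendx + 1) % len(self.buf)
            self.qcount += 1
            return "buffered"
        return "blocked"  # real runtime: enqueue on sendq and park the G
```

Note the ordering: the direct-send check comes first, so an unbuffered channel with a parked receiver never touches `buf` at all.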
2. Go Garbage Collector: Tri-Color Concurrent Mark¶
Go uses a concurrent tri-color mark-and-sweep GC with a write barrier to maintain invariants while the mutator runs:
flowchart TD
subgraph "Tri-Color Invariant"
White[White: not yet visited] -->|scan| Gray[Gray: discovered, children unscanned]
Gray -->|scan children| Black[Black: fully scanned]
Black -->|write barrier: new ptr| Gray2[Re-gray if needed]
end
subgraph "GC Phases"
P1[STW mark setup + enable write barrier] --> P2[Concurrent mark: scan roots + heap]
P2 --> P3[STW mark termination]
P3 --> P4[Concurrent sweep: return white spans to allocator]
end
Dijkstra insertion write barrier: on a pointer write *slot = ptr, if ptr is white, shade it gray. This ensures a black object never ends up pointing at a white one without a gray intermediary. Since Go 1.8 the runtime uses a hybrid barrier (Yuasa deletion + Dijkstra insertion), which avoids re-scanning goroutine stacks during mark termination.
GOGC=100 means GC triggers when live heap doubles. GC target: goal = live * (1 + GOGC/100).
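The tri-color invariant and insertion barrier can be simulated in Python (a toy marker, not Go's actual GC; the `Heap` class and its fields are illustrative):

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Heap:
    """Toy tri-color marker with a Dijkstra-style insertion barrier."""
    def __init__(self, edges):
        self.edges = edges                      # obj -> referenced objs
        self.color = {o: WHITE for o in edges}  # all objects start white

    def write_barrier(self, slot_obj, new_ptr):
        """Mutator stores new_ptr into a field of slot_obj mid-mark."""
        self.edges[slot_obj].append(new_ptr)
        if self.color[new_ptr] == WHITE:
            self.color[new_ptr] = GRAY  # shade: no hidden black->white edge

    def mark(self, roots):
        gray = list(roots)
        for r in roots:
            self.color[r] = GRAY
        while gray:
            obj = gray.pop()
            for child in self.edges[obj]:       # scan children
                if self.color[child] == WHITE:
                    self.color[child] = GRAY
                    gray.append(child)
            self.color[obj] = BLACK             # fully scanned
        # Sweep would reclaim everything still white.
        return {o for o, c in self.color.items() if c == WHITE}
```

Without the barrier, a mutator could store a white pointer into an already-black object, and that object would be swept while still reachable.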
3. Rust: Ownership, Borrow Checker, and Zero-Cost Abstractions¶
Ownership as Type System State¶
stateDiagram-v2
[*] --> Owned: let x = T.new()
Owned --> Moved: let y = x (move semantics)
Moved --> [*]: y drops (drop called)
Owned --> BorrowedShared: &x (multiple allowed)
Owned --> BorrowedMut: &mut x (exclusive)
BorrowedShared --> Owned: borrow expires (NLL)
BorrowedMut --> Owned: borrow expires (NLL)
Owned --> [*]: scope end (drop)
NLL (Non-Lexical Lifetimes): Borrows end at last use, not scope close. The borrow checker operates over the MIR (Mid-level IR) control flow graph, not AST.
Monomorphization vs Dynamic Dispatch¶
flowchart LR
subgraph "Generic fn<T: Trait>"
GFn["fn process<T: Display>(x: T)"]
GFn -->|monomorphize| F1["fn process_i32(x: i32)"]
GFn -->|monomorphize| F2["fn process_String(x: String)"]
GFn -->|monomorphize| F3["fn process_Vec_u8(x: Vec<u8>)"]
end
subgraph "dyn Trait (fat pointer)"
DFn["fn process(x: &dyn Display)"]
DFn --> FP["fat pointer: (data_ptr, vtable_ptr)"]
FP --> VT["vtable: [drop_fn, size, align, display_fn, ...]"]
end
Monomorphization: zero runtime cost and inlinable, at the price of code bloat (one copy per instantiation). dyn Trait: one copy of the code, indirection through a vtable, and no inlining across the call.
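The trade-off can be mimicked in Python, treating a per-type function table as the monomorphized output and a dict as the vtable half of a fat pointer (purely illustrative names, not Rust semantics):

```python
# "Monomorphization": one specialized function per concrete type,
# selected ahead of time (here: a dict built up front).
def process_int(x):
    return f"int:{x}"

def process_str(x):
    return f"str:{x}"

SPECIALIZATIONS = {int: process_int, str: process_str}

def process_mono(x):
    # Direct call to the specialized body: trivially inlinable.
    return SPECIALIZATIONS[type(x)](x)

# "dyn Trait": a fat pointer pairs the data with a vtable; every call
# goes through one extra indirection, but the generic code exists once.
def fat_pointer(data, vtable):
    return (data, vtable)

DISPLAY_VTABLE_INT = {"display": lambda x: f"int:{x}"}

def process_dyn(fat):
    data, vtable = fat
    return vtable["display"](data)  # indirect call via the vtable
```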
Memory Layout: Stack vs Heap¶
block-beta
columns 2
block:stack:1
columns 1
s_label["STACK"]
s1["&str: (ptr=0x..., len=5)"]
s2["Vec header: (ptr, len, cap)"]
s3["Box<T>: ptr only"]
s4["i32: 4 bytes inline"]
end
block:heap:1
columns 1
h_label["HEAP"]
h1["str data: 'hello'"]
h2["Vec backing array: [1,2,3,...]"]
h3["Boxed T value"]
h4["Arc<T>: {strong_count, weak_count, T}"]
end
String = heap-allocated, growable UTF-8. &str = fat pointer (ptr + len) into string data anywhere (heap, stack, or static). Vec<T> = heap buffer with a (ptr, len, cap) header held inline, e.g. on the stack.
LLVM IR Pipeline¶
flowchart TD
Rust[Rust source] --> HIR[HIR: type checking]
HIR --> THIR[THIR: pattern matching]
THIR --> MIR[MIR: borrow checking + optimization]
MIR --> LLVMIR[LLVM IR: codegen]
LLVMIR --> Passes[LLVM optimization passes: inlining, LICM, vectorization]
Passes --> MachineCode[Target machine code]
MIR is the key stage: it's a control flow graph with explicit drops — every variable drop is made explicit, allowing the borrow checker to verify safety without understanding complex control flow.
4. Scala: Type System, JVM Compilation, and Implicits¶
Variance and Higher-Kinded Types¶
flowchart TD
subgraph "Variance"
INV["Invariant: F[A]\nno subtyping between F[Cat] and F[Animal]"]
COV["Covariant: F[+A]\nF[Cat] <: F[Animal] if Cat <: Animal\n(List[+A], Option[+A])"]
CONTRA["Contravariant: F[-A]\nF[Animal] <: F[Cat]\n(Function1[-A, +B])"]
end
subgraph "Higher-Kinded"
HK["type F[_] — type constructor\nFunctor[F[_]]: map[A,B](fa: F[A])(f: A=>B): F[B]"]
HK --> List_inst["instance Functor[List]"]
HK --> Option_inst["instance Functor[Option]"]
end
Variance rules follow the Liskov Substitution Principle: covariant positions (return types, read-only containers) allow +A; contravariant positions (function parameters) require -A.
Implicit Resolution Algorithm¶
sequenceDiagram
participant Compiler
participant Local as Local Scope
participant Import as Explicit Imports
participant Companion as Companion Objects
Compiler->>Local: search implicit val/def in scope
Local-->>Compiler: not found
Compiler->>Import: search imported implicits
Import-->>Compiler: not found
Compiler->>Companion: search companion of type A and B (implicit scope)
Companion-->>Compiler: found: Ordering[Int] in Int companion
Compiler->>Compiler: insert implicit argument at call site
Implicit search is deterministic but can diverge if implicit chains form cycles. Scala 3 (Dotty) replaced implicits with given/using for clarity and better error messages.
Scala JVM Bytecode: Traits and Mixins¶
flowchart TD
subgraph "Scala Trait → JVM"
T["trait Foo { def bar: Int; def baz = bar + 1 }"]
T --> Interface["interface Foo { int bar(); default int baz() }"]
T --> StaticImpl["Foo$.baz$impl(Foo self)"]
Interface -->|class mixin| Class["class C extends Foo: bar=42"]
Class -->|baz delegates| StaticImpl
end
Traits with concrete methods compile to Java interfaces with default methods (on JVM 8+). For diamond inheritance and trait linearization, the compiler generates mixin forwarder methods in the inheriting class.
5. Kotlin: Coroutines as CPS Transformation¶
Continuation-Passing Style Transformation¶
flowchart TD
subgraph "Source Code"
S1["suspend fun fetchUser(id: Int): User {\n val data = httpGet(url) // suspend point\n return parse(data)\n}"]
end
subgraph "Compiled State Machine"
S2["fun fetchUser(id: Int, cont: Continuation<User>): Any {\n val sm = cont as? SM ?: SM(cont)\n when(sm.label) {\n 0: { sm.label=1; return httpGet(url, sm) }\n 1: { val data = sm.result; return parse(data) }\n }\n}"]
end
S1 -->|CPS transform| S2
Each suspend call site becomes a state machine label. The Continuation object holds local variables across suspension. On resume, execution jumps to the correct when branch.
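A hand-written Python approximation of what the Kotlin compiler emits (the `PENDING` sentinel and `pending_calls` reactor are stand-ins, not real kotlinx APIs):

```python
PENDING = object()  # sentinel: "suspended, will resume later"

class Continuation:
    """What the compiler synthesizes: label + spilled locals."""
    def __init__(self):
        self.label = 0       # which suspend point to resume after
        self.result = None   # value delivered on resume

pending_calls = []  # stand-in for an event loop / reactor

def http_get(url, cont):
    pending_calls.append(cont)  # completes later, off this "thread"
    return PENDING

def fetch_user(url, cont):
    """CPS-transformed suspend fun: a when(label) state machine."""
    if cont.label == 0:
        cont.label = 1
        # Suspend: hand our continuation to the async call and unwind.
        if http_get(url, cont) is PENDING:
            return PENDING   # caller's stack frame is gone after this
    # label == 1: re-entered on resume, locals restored from cont
    return "parsed:" + cont.result

def complete(cont, value, url):
    cont.result = value
    return fetch_user(url, cont)  # jump back in at label 1
```

The key point: suspension is just an early return with the live state parked in the `Continuation` object, and resumption is an ordinary call that branches on `label`.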
Coroutine Continuation Object Memory Layout¶
block-beta
columns 1
block:cont:1
columns 2
label1["label: Int\n(current state)"]
result1["result: Any?\n(resumed value)"]
locals1["captured locals\n(vars live across suspend)"]
parent1["completion: Continuation\n(caller's continuation)"]
ctx1["context: CoroutineContext\n(Dispatcher, Job, CoroutineId)"]
end
A heap-allocated continuation object replaces the stack frame at each suspension. These are stackless coroutines: no dedicated growable stack per coroutine (unlike Go goroutines, which keep one).
Dispatcher and Thread Mapping¶
flowchart LR
subgraph "Dispatchers"
D_Default["Dispatchers.Default\nShared thread pool (CPU count)"]
D_IO["Dispatchers.IO\nElastic pool up to 64 threads\n(blocking IO)"]
D_Main["Dispatchers.Main\nUI thread (Android Looper)"]
D_Unconf["Dispatchers.Unconfined\nCaller thread until first suspend"]
end
subgraph "Scheduling"
Resume["Continuation.resume()"] --> Dispatch["dispatcher.dispatch(context, runnable)"]
Dispatch --> ThreadPool[Thread executes block]
ThreadPool -->|hits suspend| Park[Thread released back to pool]
end
withContext(Dispatchers.IO) suspends current coroutine, dispatches to IO thread pool, resumes original dispatcher on completion — no thread blocking in the caller.
Structured Concurrency and Job Tree¶
flowchart TD
Scope[CoroutineScope] -->|launch| Job1[Job: fetchUser]
Scope -->|launch| Job2[Job: fetchPosts]
Job1 -->|launch| Job1a[Job: parseUser]
Job2 -->|fails| Cancel[CancellationException propagates up]
Cancel -->|cancels siblings| Job1
Cancel -->|cancels children| Job1a
Cancel -->|notifies parent| Scope
Cancellation is cooperative — coroutines must check isActive or call suspend functions that are cancellation-aware. A parent Job failure cancels all children (structured concurrency).
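The Job-tree cancellation above can be modeled with a toy Python class (default non-supervisor behavior only; real Jobs also track completion state and the originating exception):

```python
class Job:
    """Toy Job tree: failure cancels the whole tree via the root."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.cancelled = False
        if parent:
            parent.children.append(self)

    def cancel(self):
        if self.cancelled:
            return
        self.cancelled = True
        for child in self.children:   # cancellation flows down the tree
            child.cancel()

    def fail(self):
        # A failed child notifies its parent; with default (non-supervisor)
        # Jobs the failure propagates to the root, which cancels everything,
        # so siblings and their children are cancelled too.
        root = self
        while root.parent:
            root = root.parent
        root.cancel()
```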
6. JVM Internals: HotSpot JIT Compilation¶
flowchart TD
Source[Java/Kotlin/Scala source] --> Bytecode[JVM Bytecode .class]
Bytecode --> Interpreter[Interpreter: first execution]
Interpreter -->|profiling counters| C1[C1 Compiler: light optimization\n~1500 invocations]
C1 -->|profile-guided| C2[C2 Compiler: aggressive optimization\n~15000 invocations]
C2 --> NativeCode[Optimized native code]
NativeCode -->|deoptimize on wrong speculation| Interpreter
Speculative optimizations in C2:
- Inlining: virtual calls are devirtualized if the profile has seen only one implementation
- Escape analysis: an object is scalar-replaced or stack-allocated if it does not escape the method
- Loop unrolling + vectorization: SIMD intrinsics for array operations
- Null check elimination: redundant null checks removed once the profile confirms non-null
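The idea behind speculative devirtualization plus a deoptimization guard can be sketched as a toy monomorphic inline cache in Python (not HotSpot's actual mechanism; class and field names are illustrative):

```python
class CallSite:
    """Toy monomorphic inline cache with a deoptimization guard."""
    def __init__(self):
        self.expected_type = None
        self.compiled_target = None
        self.deopt_count = 0

    def invoke(self, receiver):
        cls = type(receiver)
        if self.compiled_target is None:
            # "Interpreter" tier: profile the receiver type, then compile
            # a direct (devirtualized) call to that type's method.
            self.expected_type = cls
            self.compiled_target = cls.run
        elif cls is not self.expected_type:
            # Guard failed: the speculation was wrong. Deoptimize and
            # fall back to a generic virtual call.
            self.deopt_count += 1
            return receiver.run()
        return self.compiled_target(receiver)

class A:
    def run(self):
        return "A"

class B:
    def run(self):
        return "B"
```

HotSpot does the same thing at a lower level: the compiled code embeds a type check, and a failed check transfers control back to the interpreter with the optimized frame reconstructed.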
JVM Memory Layout¶
block-beta
columns 3
block:heap:2
columns 2
eden["Eden (Young Gen)\nNew object allocation\nBump pointer alloc"]
s0["Survivor S0"]
s1["Survivor S1"]
old["Old Gen (Tenured)\nObjects surviving N GCs\nG1/ZGC concurrent collect"]
end
block:nonheap:1
columns 1
meta["Metaspace\nClass metadata\nMethod bytecode\nJIT compiled code"]
stack["Thread Stacks\nStack frames\nLocal vars"]
end
7. Go vs Rust vs JVM: Runtime Comparison¶
flowchart LR
subgraph "Memory Management"
Go_MM["Go: Concurrent GC\ntri-color mark-sweep\n~1ms STW pauses"]
Rust_MM["Rust: Compile-time\nownership + drop\nzero runtime overhead"]
JVM_MM["JVM: Generational GC\nG1/ZGC/Shenandoah\nconfigurable pauses"]
end
subgraph "Concurrency"
Go_C["Go: goroutines\nM:N scheduling\nwork-stealing GMP"]
Rust_C["Rust: async/await\nFuture polling model\ntokio/async-std runtimes"]
JVM_C["JVM: OS threads\nProject Loom virtual threads\n(Java 21+)"]
end
subgraph "Type System"
Go_T["Go: structural interfaces\nno generics variance\ntype parameters (1.18+)"]
Rust_T["Rust: traits + lifetimes\nhigher-ranked trait bounds\nno GC needed via ownership"]
JVM_T["JVM: nominal typing\ntype erasure (Java generics)\nreified only via Kotlin inline fns"]
end
Async/Await vs Goroutines: Core Difference¶
sequenceDiagram
participant RustFuture as Rust Future (Stackless)
participant GoRoutine as Go Goroutine (Stackful)
Note over RustFuture: poll() returns Poll::Pending
RustFuture->>RustFuture: stores state in Future struct (heap)
RustFuture->>RustFuture: registers Waker with reactor
Note over RustFuture: Thread returns to event loop
Note over GoRoutine: goroutine blocks on channel/syscall
GoRoutine->>GoRoutine: goroutine stack preserved (2KB–nMB)
GoRoutine->>GoRoutine: P handed to another goroutine
Note over GoRoutine: OS thread may park or handle another G
Rust futures are poll-driven and stackless: suspension stores state in the future itself, with no per-suspension allocation unless the future is explicitly boxed. Go goroutines keep their continuation on a real stack: simpler to write (the code looks synchronous), at a higher per-goroutine memory baseline.
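The poll contract can be sketched in Python (a toy future and single-task executor; `waker` here is a plain callback, not a real `std::task::Waker`, and the reactor is simulated inline):

```python
PENDING, READY = "Pending", "Ready"

class HttpFuture:
    """Toy poll-driven future: all suspension state lives in the future."""
    def __init__(self):
        self.response = None
        self.waker = None

    def poll(self, waker):
        if self.response is None:
            self.waker = waker        # register: "wake me when ready"
            return (PENDING, None)    # thread returns to the executor
        return (READY, self.response)

def block_on(future):
    """Minimal executor: poll until Ready, re-polling only when woken."""
    woken = [True]
    def waker():
        woken[0] = True
    while True:
        if woken[0]:
            woken[0] = False
            state, value = future.poll(waker)
            if state is READY:
                return value
        # Reactor stand-in: the I/O completes, then wakes the task.
        future.response = "200 OK"
        future.waker()
```

Note that no stack is saved anywhere: between polls the only state is the future's own fields, which is exactly why Rust futures are compiled into per-await state machines.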
8. Functional Language Runtimes: Haskell GHC and OCaml¶
GHC: Lazy Evaluation and Thunk Mechanics¶
flowchart TD
subgraph "Thunk Lifecycle"
T_created["Thunk created: closure ptr + env"] -->|first force| T_eval["Evaluate: enter closure"]
T_eval -->|result| T_value["Update thunk to Value (WHNF)"]
T_value -->|subsequent force| T_value
end
subgraph "Heap Object Layout"
HO["Info Table Ptr | Payload..."]
IT["Info Table: entry code | GC info | arity | srt"]
HO --> IT
end
WHNF (Weak Head Normal Form): evaluation stops at outermost constructor — Just _ is WHNF even if _ is a thunk. Full NF evaluation requires deepseq.
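Thunk force-and-update is easy to model in Python (a memoizing closure wrapper; real GHC thunks are overwritten in place via their info-table pointer rather than flagged):

```python
class Thunk:
    """Lazy value: evaluate on first force, memoize (update) the result."""
    def __init__(self, compute):
        self.compute = compute   # closure: code pointer + environment
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:          # first demand: enter the closure
            self.value = self.compute()
            self.forced = True       # "update": thunk becomes a value
            self.compute = None      # drop the env so GC can reclaim it
        return self.value
```

Every subsequent force returns the cached value without re-entering the closure, which is what makes laziness safe for pure code (the computation runs at most once).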
GHC Runtime System (RTS) Scheduler¶
flowchart LR
HEC1["HEC 1 (OS Thread)\nHaskell Execution Context"] --> RunQ1[Spark Queue]
HEC2["HEC 2 (OS Thread)"] --> RunQ2[Spark Queue]
RunQ1 -->|work steal| RunQ2
HEC1 -->|STM transaction| TLog[Transaction Log]
TLog -->|commit: validate| STMHeap[STM Heap vars]
TLog -->|conflict: retry| TLog
GHC's par/seq spark pool enables speculative parallelism. The STM (Software Transactional Memory) runtime uses optimistic concurrency: each transaction logs reads/writes, validates atomically on commit, retries on conflict.
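The log-validate-commit cycle can be sketched as a toy versioned STM in Python (real GHC STM validates per-TVar under locks and supports blocking `retry`; the version counters here are an illustrative simplification):

```python
class TVar:
    """Transactional variable with a version stamp for validation."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    """Optimistic STM: log reads/writes, validate versions at commit."""
    def __init__(self):
        self.read_log = {}    # tvar -> version observed at read time
        self.write_log = {}   # tvar -> pending new value

    def read(self, tvar):
        if tvar in self.write_log:        # read-your-own-writes
            return self.write_log[tvar]
        self.read_log[tvar] = tvar.version
        return tvar.value

    def write(self, tvar, value):
        self.write_log[tvar] = value      # buffered, not yet visible

    def commit(self):
        # Validate: every tvar we read must be unchanged since we read it.
        for tvar, seen in self.read_log.items():
            if tvar.version != seen:
                return False              # conflict: caller should retry
        for tvar, value in self.write_log.items():
            tvar.value = value
            tvar.version += 1
        return True
```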
9. Language Feature: Pattern Matching Compilation¶
ML-family compilers typically lower pattern matching to decision trees; a dense switch on constructor tags compiles to an O(1) jump table:
flowchart TD
Match["match expr with\n| (0, _) -> A\n| (_, 0) -> B\n| (x, y) -> C"] -->|compile| DT["Decision Tree"]
DT --> T1["test expr.0 == 0?"]
T1 -->|yes| A["return A"]
T1 -->|no| T2["test expr.1 == 0?"]
T2 -->|yes| B["return B"]
T2 -->|no| C["return C(x, y)"]
Compiler selects between switch dispatch (integer tags) and if-chain based on constructor density. Rust match on enums compiles to a jump table indexed by discriminant tag.
Algebraic Data Type Memory Layout (Rust/Haskell)¶
block-beta
columns 2
block:rust_enum:1
columns 1
r_label["Rust: enum Option<T>"]
r_none["None: discriminant=0, no payload"]
r_some["Some(T): discriminant=1, T inline"]
r_opt["niche optimization: &T None=0x0"]
end
block:haskell_adt:1
columns 1
h_label["Haskell: data Maybe a"]
h_nothing["Nothing: info_ptr → Nothing_info, tag=0"]
h_just["Just a: info_ptr → Just_info, a=thunk_ptr"]
end
Rust applies niche optimization: Option<&T> uses null pointer as None — same size as &T, no discriminant needed.
10. Type Inference: Algorithm W / HM Unification¶
sequenceDiagram
participant TC as Type Checker
participant Env as Type Environment
participant Uni as Unifier
TC->>TC: generate fresh type var α for unknown
TC->>Env: lookup variable → τ
TC->>TC: instantiate polymorphic type (fresh vars)
TC->>Uni: add constraint: τ1 = τ2
Uni->>Uni: unify: walk structure recursively
Uni->>Uni: occurs check: α ≠ F(α) (prevents infinite types)
Uni-->>TC: substitution σ
TC->>TC: generalize: ∀α. τ (if α free in τ, not in env)
TC-->>TC: principal type derived
Unification is the core operation: unify(List α, List Int) → α := Int. Unification failure = type error. Occurs check prevents α = List α (infinite type).
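Algorithm W's core unifier, with the occurs check, fits in a short Python sketch (types are tuples like `("List", t)`; this omits generalization and let-polymorphism):

```python
class Var:
    """Type variable (alpha); identity is the variable itself."""
    def __init__(self, name):
        self.name = name

def resolve(t, subst):
    """Chase substitution links until t is not a bound variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    """Does var appear inside t? Prevents infinite types like a = List a."""
    t = resolve(t, subst)
    if t is var:
        return True
    if isinstance(t, tuple):  # constructor: ("List", arg), ("->", a, b)
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        if occurs(a, b, subst):
            raise TypeError("infinite type")     # occurs check fails
        return {**subst, a: b}                   # bind alpha := b
    if isinstance(b, Var):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):           # walk structure
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} and {b}")
```

This reproduces the examples in the text: unifying `("List", alpha)` with `("List", "Int")` binds alpha to Int, while unifying alpha with `("List", alpha)` is rejected by the occurs check.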
Rank-2 Polymorphism (Rust/Haskell HRTB)¶
flowchart TD
R1["Rank-1: ∀a. a → a\nType var instantiated at call site by caller"]
R2["Rank-2: (∀a. a → a) → Int\nCallee receives a polymorphic function\nMust work for ANY a, not a specific one"]
R1 -->|generalized by| R2
R2 -->|Rust syntax| HRTB["for<'a> Fn(&'a T) -> &'a U\nHigher-Ranked Trait Bound"]
Rust uses HRTBs to express that a closure must work for any lifetime, not just one specific inferred lifetime.
11. Cross-Language: Interoperability and FFI Mechanics¶
sequenceDiagram
participant Rust
participant CRuntime as C ABI (cdecl/SysV)
participant Python as Python (CPython)
Rust->>CRuntime: #[no_mangle] extern "C" fn foo()
CRuntime->>Python: ctypes.CDLL → dlopen + dlsym
Python->>Python: convert Python int → c_int (boxing)
Python->>CRuntime: call via function pointer
CRuntime->>Rust: stack frame in C calling convention
Rust-->>CRuntime: return value
CRuntime-->>Python: unbox to Python int
PyO3 (Rust↔Python): GIL must be held when calling Python API. pyo3::Python<'py> token is a compile-time proof the GIL is held — safe Rust prevents GIL bugs at type level.
JNI (Java↔C/Rust): JNIEnv pointer passed to native; every Java object reference is a handle (JNI local/global ref) — not a direct heap pointer. GC can move objects; JNI refs pin them.
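The ctypes leg of the diagram can be reproduced directly, assuming a POSIX system where `ctypes.CDLL(None)` exposes the libc already linked into the process:

```python
import ctypes

# dlopen with a NULL handle: search symbols already loaded in-process.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes marshals arguments correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(s: str) -> int:
    # Boxing step from the diagram: Python str -> UTF-8 bytes -> char*,
    # then the call goes through a raw function pointer (C ABI).
    return libc.strlen(s.encode("utf-8"))
```

Setting `argtypes`/`restype` matters: without them ctypes guesses, which silently truncates 64-bit return values on some platforms.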
Summary: Language Runtime Internals Map¶
mindmap
root((Language Runtimes))
Go
GMP scheduler M:N
Work-stealing local runq
Growable goroutine stacks
Tri-color concurrent GC
hchan direct send optimization
Rust
Ownership = compile-time GC
MIR borrow checker CFG
Monomorphization zero-cost
LLVM backend optimization
Async stackless Future polling
Kotlin/JVM
CPS transform suspend fns
Continuation state machine
Structured concurrency Job tree
HotSpot C1/C2 JIT tiers
G1/ZGC generational GC
Scala/JVM
Variance +A/-A type system
Implicit resolution scope chain
Trait → default method bytecode
Higher-kinded type parameters
Haskell/GHC
Lazy thunk force + update
WHNF evaluation strategy
STM optimistic concurrency
Spark pool parallel HEC