Miscellaneous CS Topics — Under the Hood: Graphics, Game Engines, IoT, SRE, and DevOps Internals¶
Focus: Internal mechanics — not how to use tools, but how game engines implement ECS, how GPUs execute shader pipelines, how Terraform tracks state, and how SRE error budgets drive automated decisions.
1. Real-Time Graphics: GPU Rendering Pipeline Internals¶
Rasterization Pipeline¶
flowchart TD
subgraph "GPU Rendering Pipeline"
IA["Input Assembler\nvertex/index buffer fetch"] --> VS["Vertex Shader\nper-vertex: MVP transform\nclip space position"]
VS --> TC["Tessellation Control\npatch subdivision factor"]
TC --> TE["Tessellation Evaluation\nnew vertex positions"]
TE --> GS["Geometry Shader (optional)\nper-primitive emission"]
GS --> RS["Rasterizer\ntriangle -> fragment coverage\nbary coords, interpolation"]
RS --> FS["Fragment Shader\nper-pixel color computation\ntexture sampling"]
FS --> OM["Output Merger\ndepth test + stencil test\nblend: src*srcAlpha + dst*(1-srcAlpha)"]
end
Depth Buffer and Early-Z¶
sequenceDiagram
participant Fragment
participant EarlyZ as Early-Z Test (before FS)
participant FS as Fragment Shader
participant Depth as Depth Buffer
participant Color as Color Buffer
Fragment->>EarlyZ: screen (x,y,z) from rasterizer
EarlyZ->>Depth: compare z vs depth[x][y]
alt z < depth[x][y]
EarlyZ->>FS: execute shader (expensive)
FS-->>Color: write color
FS-->>Depth: write z
else z >= depth[x][y]
EarlyZ->>EarlyZ: discard (save bandwidth)
end
Early-Z rejects fragments before shader execution — a massive perf win. It is disabled when the shader writes depth (e.g., gl_FragDepth) or calls discard, since the final depth is then unknown until after shading.
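A minimal sketch of the depth-test decision in pure Python (the shader cost is stood in for by a counter; fragments are hypothetical (x, y, z, color) tuples):

```python
# Early-Z sketch: test depth BEFORE "shading"; rejected fragments never pay shader cost.
def render(fragments, width, height):
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, frag_color in fragments:
        if z < depth[y][x]:           # early-Z: compare z vs depth[y][x]
            depth[y][x] = z           # pass: write depth
            color[y][x] = frag_color  # (fragment shader would run here)
            shaded += 1
        # else: fragment discarded, shader never executes
    return color, shaded

# Nearer fragment (z=0.5) drawn first, so the farther one is rejected:
color, shaded = render([(0, 0, 0.5, "blue"), (0, 0, 0.9, "red")], 1, 1)
```

Drawing front-to-back maximizes early-Z rejections, which is why engines often sort opaque geometry by depth.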
Deferred Rendering vs Forward Rendering¶
flowchart LR
subgraph "Forward: O(objects x lights)"
FR_Geo["Render each object\nwith ALL lights in FS"] --> FR_Out["Final color buffer"]
end
subgraph "Deferred: O(objects + lights)"
DR_Geo["Geometry Pass\nwrite G-buffer: albedo, normal, depth"] --> DR_Light["Lighting Pass\nper-light: read G-buffer\ncompute only lit fragments"]
DR_Light --> DR_Out["Final color buffer"]
end
G-buffer layout (MRT — Multiple Render Targets):

- RT0: Albedo (RGB) + Roughness (A)
- RT1: World Normal (RGB) + Metalness (A)
- RT2: Depth (32-bit float)
- RT3: Emissive (RGB) + AO (A)
Ray Tracing: BVH Traversal¶
flowchart TD
Ray["Ray origin + direction"] --> BVH_Root["BVH Root AABB intersection test"]
BVH_Root -->|hit| BVH_L["Left Child AABB"]
BVH_Root -->|miss| Discard["No intersection"]
BVH_L -->|hit| Leaf["Leaf: triangle list"]
Leaf --> MollerTrumbore["Moller-Trumbore algorithm\nbary coords u,v,t\nray = O + tD"]
MollerTrumbore -->|t > 0| Hit["Record hit: (t, u, v, tri_id)"]
NVIDIA RTX uses RT Cores for hardware BVH traversal — a fixed-function unit offloading tree walks from shader processors. BVH build is O(N log N) via Surface Area Heuristic (SAH).
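The Möller-Trumbore test named in the diagram can be written compactly; a pure-Python sketch with vectors as 3-tuples (returns (t, u, v) on hit, None on miss):

```python
# Möller-Trumbore ray/triangle intersection: solve O + tD = (1-u-v)v0 + u*v1 + v*v2.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def sub(a, b):
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2])

def moller_trumbore(origin, d, v0, v1, v2, eps=1e-9):
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to triangle plane
        return None
    inv = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv            # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = cross(t_vec, e1)
    v = dot(d, q) * inv                # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv               # distance along the ray
    return (t, u, v) if t > eps else None
```

The barycentric coordinates (u, v) from the hit record drive attribute interpolation, exactly as in the rasterizer path.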
2. Game Engine Architecture: ECS and Physics¶
Entity-Component-System Data Layout¶
flowchart TD
subgraph "Traditional OOP (Array of Structs)"
AOS["Entity[]\n[{pos,vel,health,render,...},\n {pos,vel,health,render,...}]\nPoor cache locality for System traversal"]
end
subgraph "ECS (Struct of Arrays / Archetypes)"
SOA["Archetype: (Position, Velocity)\nPositions: [p0,p1,p2,...]\nVelocities: [v0,v1,v2,...]\nContiguous memory = cache lines filled"]
end
AOS -->|ECS migration| SOA
Archetype = unique set of component types. Entities with identical component sets share an archetype table. Moving a component to/from an entity = archetype change (memcpy row to new table).
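A struct-of-arrays archetype sketch (class and system names are illustrative, not any particular engine's API):

```python
# SoA archetype: one table per component set; each component is a contiguous column.
class Archetype:
    def __init__(self, *component_names):
        self.columns = {name: [] for name in component_names}
        self.entities = []                      # row index -> entity id

    def add(self, entity, **components):
        self.entities.append(entity)
        for name, value in components.items():
            self.columns[name].append(value)

# A "movement system" touches only the two columns it needs:
def movement_system(arch, dt):
    pos, vel = arch.columns["position"], arch.columns["velocity"]
    for i in range(len(pos)):                   # linear scan over contiguous data
        pos[i] = (pos[i][0] + vel[i][0] * dt, pos[i][1] + vel[i][1] * dt)

arch = Archetype("position", "velocity")
arch.add(1, position=(0.0, 0.0), velocity=(1.0, 2.0))
movement_system(arch, 0.5)
```

The payoff in a real engine is that the system's loop streams through packed arrays, so each cache line fetched carries only data the system actually uses.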
Physics Engine: Broad Phase → Narrow Phase¶
flowchart LR
World[All Colliders] -->|Broad Phase| AABB_Prune["AABB sweep & prune\nSort by x-axis min/max\nO(N log N + pairs)"]
AABB_Prune -->|candidate pairs| Narrow["Narrow Phase\nGJK algorithm: convex shapes\nEPA: penetration depth"]
Narrow -->|contacts| Solver["Constraint Solver\nSequential Impulse (SI)\nIterative: N substeps × M iters"]
Solver -->|impulses| Integration["Semi-implicit Euler\nv += a·dt\nx += v·dt"]
GJK (Gilbert-Johnson-Keerthi): detects convex shape overlap by building a simplex in Minkowski difference space. If origin is inside Minkowski difference, shapes overlap.
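GJK's core primitive is the support function; a 2D sketch for convex point sets (full GJK then iterates simplex updates toward the origin, which is omitted here):

```python
# Support point: the farthest vertex of a convex point set in direction d.
def support(points, d):
    return max(points, key=lambda p: p[0]*d[0] + p[1]*d[1])

# Support of the Minkowski difference A - B: farthest point of A in d
# minus farthest point of B in -d. GJK only ever samples the difference this way.
def minkowski_support(shape_a, shape_b, d):
    a = support(shape_a, d)
    b = support(shape_b, (-d[0], -d[1]))
    return (a[0] - b[0], a[1] - b[1])

square_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_b = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
```

Because the difference is never materialized, GJK stays cheap even for shapes with many vertices: each iteration costs one support query per shape.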
Game Loop Timing¶
stateDiagram-v2
[*] --> Input: frame start
Input --> Update: process events
Update --> FixedUpdate: accumulate physics dt
FixedUpdate --> FixedUpdate: step if accum >= fixedDt
FixedUpdate --> Render: remaining time
Render --> Present: swap buffers (vsync/triple buffer)
Present --> [*]: wait for next frame
Fixed timestep (typically 50Hz) decouples physics stability from rendering. Interpolation alpha = accum / fixedDt blends last two physics states for smooth rendering at any FPS.
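The accumulator loop described above, as a small sketch (the step callback stands in for the physics update):

```python
# Fixed-timestep accumulator: physics advances in fixedDt increments regardless of
# frame time; the leftover fraction is the render interpolation alpha.
FIXED_DT = 1 / 50              # 50 Hz physics

def advance(accum, frame_dt, step):
    accum += frame_dt
    steps = 0
    while accum >= FIXED_DT:   # catch up: possibly several physics steps per frame
        step()
        accum -= FIXED_DT
        steps += 1
    alpha = accum / FIXED_DT   # blend factor between last two physics states
    return accum, steps, alpha
```

A 50ms frame at 50Hz physics runs two fixed steps and leaves alpha = 0.5, so the renderer draws the midpoint between the last two physics states.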
3. IoT Architecture: Edge Computing and Protocol Internals¶
MQTT Protocol Internals¶
sequenceDiagram
participant Device as IoT Device (Publisher)
participant Broker as MQTT Broker
participant App as Application (Subscriber)
Device->>Broker: CONNECT (clientId, keepAlive=60s, cleanSession=false)
Broker-->>Device: CONNACK (returnCode=0)
Device->>Broker: PUBLISH (topic="sensor/temp", payload=23.4, QoS=1, packetId=1)
Broker-->>Device: PUBACK (packetId=1)
Broker->>App: PUBLISH (topic="sensor/temp", payload=23.4)
App-->>Broker: PUBACK
Note over Broker: QoS 2: PUBLISH→PUBREC→PUBREL→PUBCOMP (exactly-once)
QoS levels:

- QoS 0: at-most-once (fire and forget, no ACK)
- QoS 1: at-least-once (PUBACK, may duplicate)
- QoS 2: exactly-once (4-message handshake, store-and-forward)
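The QoS 1 contract can be sketched from the sender's side (class and method names are illustrative; real clients also track retry timers and the DUP flag on the wire):

```python
# QoS 1 at-least-once: the sender keeps the message until PUBACK arrives,
# retransmitting with dup=True on timeout — so receivers may see duplicates.
class QoS1Sender:
    def __init__(self):
        self.inflight = {}                     # packetId -> payload awaiting PUBACK

    def publish(self, packet_id, payload):
        self.inflight[packet_id] = payload
        return ("PUBLISH", packet_id, payload, False)   # dup=False on first send

    def on_timeout(self, packet_id):
        # no PUBACK yet: resend the stored copy, flagged as a duplicate
        return ("PUBLISH", packet_id, self.inflight[packet_id], True)

    def on_puback(self, packet_id):
        self.inflight.pop(packet_id, None)     # delivery confirmed, drop the copy
```

QoS 2 extends the same state machine with two more stored states (PUBREC received, PUBREL sent) to eliminate duplicates at the cost of the 4-message handshake.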
Edge Computing: Data Pipeline¶
flowchart TD
Sensor["Sensors\n(temp, pressure, vibration)"] -->|BLE/Zigbee/LoRa| Gateway["Edge Gateway\nProtocol translation\nLocal filtering/aggregation"]
Gateway -->|MQTT/AMQP| EdgeBroker["Edge Message Broker\n(Mosquitto/EMQX)"]
EdgeBroker -->|stream processing| EdgeCompute["Edge Compute (k3s/microk8s)\nAnomaly detection\nPre-aggregation"]
EdgeCompute -->|batched uploads| Cloud["Cloud Platform\nS3/GCS/Azure Blob\nTime-series DB (InfluxDB)"]
EdgeCompute -->|low-latency control| Actuator["Actuators\n(motors, valves)"]
Local loop latency: edge-to-actuator < 10ms. Cloud round-trip: 50–200ms — too slow for real-time control.
TLS on Constrained Devices¶
flowchart LR
MCU["MCU 32-bit\n(Cortex-M4, 256KB RAM)"] -->|DTLS 1.3| Gateway
DTLS["DTLS (TLS over UDP)\nECDH Curve25519 (32-byte key)\nChaCha20-Poly1305 (no AES hw)"]
MCU --> DTLS
PSK["Pre-Shared Key mode\nno certificate chain\nsaves 2KB RAM"]
DTLS --> PSK
Constrained devices use DTLS (Datagram TLS) with PSK to avoid certificate parsing overhead. ChaCha20 is preferred on devices without AES hardware acceleration.
4. Infrastructure as Code: Terraform State Machine Internals¶
Terraform Plan/Apply Lifecycle¶
sequenceDiagram
participant User
participant TF as Terraform CLI
participant State as State Backend (S3+DynamoDB lock)
participant Provider as AWS Provider Plugin
participant API as AWS API
User->>TF: terraform plan
TF->>State: acquire DynamoDB lock (put-item if absent)
TF->>State: read current state.json
TF->>Provider: refresh: describe existing resources
Provider->>API: DescribeInstances, ListBuckets...
API-->>Provider: actual state
TF->>TF: diff: desired (HCL) vs actual state
TF-->>User: show planned changes (+/-/~)
User->>TF: terraform apply
TF->>Provider: create/update/delete resources
Provider->>API: CreateInstance, PutBucketPolicy...
TF->>State: write new state.json
TF->>State: release DynamoDB lock
Dependency Graph and Parallelism¶
flowchart TD
VPC["aws_vpc.main"] --> Subnet["aws_subnet.public"]
VPC --> IGW["aws_internet_gateway.main"]
Subnet --> EC2["aws_instance.web"]
IGW --> Route["aws_route.internet"]
Route --> EC2
EC2 --> EIP["aws_eip.web"]
subgraph "Parallel execution"
Subnet -.->|no dep| IGW
end
Terraform builds a DAG of resources and applies them in parallel where dependencies allow. The depends_on meta-argument forces explicit edges when implicit dependencies aren't captured by attribute references.
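The wave-based scheduling this enables can be sketched with Kahn's algorithm (resource names mirror the diagram; the grouping into waves is illustrative, Terraform itself walks the DAG with a bounded worker pool):

```python
# Group a dependency DAG into waves: every node in a wave has all its
# prerequisites satisfied by earlier waves, so a wave can apply in parallel.
def parallel_waves(deps):                 # deps: node -> set of prerequisite nodes
    remaining = {n: set(d) for n, d in deps.items()}
    waves = []
    while remaining:
        ready = [n for n, d in remaining.items() if not d]
        if not ready:
            raise ValueError("cycle in graph")
        waves.append(sorted(ready))
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d -= set(ready)               # mark satisfied prerequisites
    return waves

deps = {
    "aws_vpc.main": set(),
    "aws_subnet.public": {"aws_vpc.main"},
    "aws_internet_gateway.main": {"aws_vpc.main"},
    "aws_route.internet": {"aws_internet_gateway.main"},
    "aws_instance.web": {"aws_subnet.public", "aws_route.internet"},
    "aws_eip.web": {"aws_instance.web"},
}
```

Here the subnet and internet gateway land in the same wave, matching the "no dep" parallel edge in the diagram.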
State File Structure¶
block-beta
columns 1
block:state:1
columns 2
version["version: 4"]
serial["serial: 42 (monotonic)"]
lineage["lineage: UUID (immutable)"]
resources["resources: [\n { type, name, provider,\n instances: [{id, attributes}] }\n]"]
outputs["outputs: {key: {value, type}}"]
end
serial increments on every apply — used for optimistic locking. If remote serial > local serial, apply is rejected (someone else modified first).
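A sketch of that optimistic check (function shape is illustrative; the real backend does this against the remote object plus the DynamoDB lock):

```python
# Optimistic lock on the state serial: a writer may only replace remote state
# whose serial it originally read; a newer serial means a concurrent apply won.
def try_write_state(remote, new_state, read_serial):
    if remote["serial"] > read_serial:
        raise RuntimeError("state serial advanced remotely; run plan again")
    new_state["serial"] = read_serial + 1     # monotonic increment on success
    remote.clear()
    remote.update(new_state)
    return remote["serial"]
```

The lineage UUID guards against a different failure mode: it rejects writes from a state file that was re-initialized, even if its serial happens to be higher.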
5. Ansible: Push-Based Configuration Internals¶
Module Execution Mechanism¶
sequenceDiagram
participant Control as Control Node
participant SSH
participant Target as Target Node (Python)
Control->>Control: parse playbook YAML → task list
Control->>SSH: connect (multiplexed SSH ControlMaster)
Control->>Target: sftp upload: /tmp/ansible_tmp/command_module.py
Control->>Target: python /tmp/ansible_tmp/command_module.py '{"cmd":"..."}'
Target->>Target: execute module, collect facts
Target-->>Control: JSON result: {changed:bool, stdout:str, rc:int}
Control->>Target: rm -rf /tmp/ansible_tmp/
Ansible is agentless — module files are copied to the target over SFTP, executed with the target's Python, and cleaned up per task. This means Python must be available on targets (except the raw module, which runs commands over SSH directly).
Idempotency and Check Mode¶
stateDiagram-v2
[*] --> CheckState: task execution
CheckState --> AlreadyDesired: state matches desired
AlreadyDesired --> ReturnChanged_False: changed=false
CheckState --> NeedChange: state differs
NeedChange --> ApplyChange: check_mode=false
NeedChange --> ReportOnly: check_mode=true (--check)
ApplyChange --> ReturnChanged_True: changed=true
Modules must implement idempotent get_state → compare → set_state logic. --check mode runs get_state + compare only, dry-running the entire playbook.
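A minimal module-shaped sketch of that pattern (function and return keys are illustrative, not Ansible's module API):

```python
# Idempotent get_state -> compare -> set_state; check_mode reports the would-be
# change without applying it.
def ensure_line(lines, wanted, check_mode=False):
    present = wanted in lines              # get_state + compare
    if present:
        return {"changed": False}          # already in desired state: no-op
    if check_mode:
        return {"changed": True, "applied": False}   # dry run: report only
    lines.append(wanted)                   # set_state
    return {"changed": True, "applied": True}
```

Running the same task twice yields changed=true then changed=false, which is exactly the signal Ansible's recap counts on.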
6. SRE: Error Budgets, SLOs, and Alerting Internals¶
SLO Error Budget Mechanics¶
flowchart TD
SLO["SLO: 99.9% availability\n= 43.8 min downtime/month allowed"] --> EB["Error Budget = 1 - SLO\n= 0.1% = 43.8 min/month"]
Request["Request outcomes"] -->|success| Good["Good events"]
Request -->|error/timeout| Bad["Bad events"]
Bad --> Burn["Budget burn rate\n= actual error rate / error budget rate"]
Burn -->|rate > 1| Depleting["Budget depleting faster than allowed"]
Burn -->|rate < 1| Accumulate["Budget accumulating"]
subgraph "Multi-window alert"
MWA1["1h window: burn rate > 14.4\n(consumes 2% budget in 1h)"] -->|AND| MWA2["5min window: burn rate > 14.4\nConfirms ongoing burn"]
MWA2 --> Page["Page on-call"]
end
Multi-window alerting (Google's approach): the long window (1h) shows the burn is large enough to matter; the short window (5min) confirms it is still happening, so the alert resets quickly once the incident resolves. Pairing several window/burn-rate tiers catches both fast and slow burns without false positives.
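The burn-rate arithmetic from the diagram, as a sketch for a 99.9% SLO over a 30-day window:

```python
# Burn rate = observed error ratio / budgeted error ratio. At burn rate 14.4,
# one hour consumes 14.4/720 = 2% of a 30-day budget.
SLO = 0.999
BUDGET = 1 - SLO                              # 0.1% of requests may fail

def burn_rate(errors, total):
    return (errors / total) / BUDGET

def budget_spent_fraction(rate, window_hours, period_hours=30 * 24):
    return rate * window_hours / period_hours

rate = burn_rate(errors=144, total=10_000)    # 1.44% observed error rate
```

At burn rate 1.0 the budget lasts exactly the SLO period; anything above 1.0 exhausts it early, which is what the 14.4 threshold in the diagram is calibrated against.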
Prometheus Alerting Rule Evaluation¶
sequenceDiagram
participant TSDB as Prometheus TSDB
participant Engine as Query Engine
participant AM as Alertmanager
participant PD as PagerDuty
loop every evaluation_interval (15s)
Engine->>TSDB: evaluate PromQL: rate(http_errors[5m]) / rate(http_requests[5m]) > 0.01
TSDB-->>Engine: time series result
alt condition true for pending_period
Engine->>AM: send alert (labels, annotations, startsAt)
AM->>AM: group + route by label matchers
AM->>AM: inhibit if higher-severity firing
AM->>AM: silence check
AM->>PD: POST /integration/events (dedup_key = fingerprint)
end
end
Fingerprinting: alert identity = hash of all label key/value pairs. Alertmanager uses fingerprint for deduplication and grouping.
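A sketch of label fingerprinting (the separators and hash are illustrative; Alertmanager uses its own FNV-based scheme, but the invariant is the same):

```python
# Identity = hash over sorted label key/value pairs: label order never matters,
# and annotations never affect deduplication or grouping.
import hashlib

def fingerprint(labels):
    canonical = "\x00".join(f"{k}\x1f{v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

Any label change (including a templated one) therefore produces a new alert identity, which is why dynamic values belong in annotations, not labels.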
Distributed Tracing: OpenTelemetry Context Propagation¶
sequenceDiagram
participant Client
participant ServiceA
participant ServiceB
participant Collector as OTel Collector
Client->>ServiceA: HTTP GET /api\nW3C TraceContext header:\ntraceparent: 00-{traceId}-{spanId}-01
ServiceA->>ServiceA: extract trace context\ncreate child span (spanId_A)
ServiceA->>ServiceB: HTTP POST /internal\ntraceparent: 00-{traceId}-{spanId_A}-01
ServiceB->>ServiceB: create child span (spanId_B)
ServiceB-->>ServiceA: response
ServiceA-->>Client: response
ServiceA->>Collector: export spans (OTLP gRPC)
ServiceB->>Collector: export spans (OTLP gRPC)
Collector->>Collector: reconstruct trace tree from parentSpanId links
traceId (128-bit) spans the entire request tree. spanId (64-bit) identifies one service's work unit. parentSpanId links child to parent, enabling tree reconstruction.
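Parsing the W3C traceparent header shown in the diagram is straightforward; a sketch:

```python
# traceparent: version-traceid-spanid-flags, all hex-encoded.
def parse_traceparent(header):
    version, trace_id, span_id, flags = header.split("-")
    assert len(trace_id) == 32 and len(span_id) == 16   # 128-bit / 64-bit hex
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,                       # becomes parentSpanId downstream
        "sampled": int(flags, 16) & 0x01 == 1,           # bit 0 = sampled flag
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

Each hop rewrites only the span-id field with its own new span before forwarding, which is how the parent links in the trace tree accumulate.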
7. Testing Methodologies: Property-Based and Mutation Testing Internals¶
Property-Based Testing (QuickCheck / Hypothesis)¶
flowchart TD
Property["Property: for all lists xs, sort(sort(xs)) == sort(xs)"] --> Generator["Generator: arbitrary List<Int>\nrandom size, random elements"]
Generator -->|100 samples| Runner["Run property for each"]
Runner -->|failure found| Shrink["Shrink: find minimal counterexample\nbinary search on size + element values"]
Shrink -->|minimal case| Report["Report: failing case [5, 3, 1]"]
Runner -->|all pass| Pass["Property holds for 100 samples"]
Shrinking is the key innovation — QuickCheck doesn't just report the random failure, it minimizes it to the smallest counterexample, making bugs debuggable.
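A greedy shrinking sketch (real shrinkers like Hypothesis's are far more sophisticated; this only tries dropping elements and zeroing values, keeping any smaller input that still fails):

```python
# Shrink a failing list: repeatedly try a smaller candidate; accept it if the
# property still fails, stop when no candidate fails anymore.
def shrink(failing, fails):
    improved = True
    while improved:
        improved = False
        for i in range(len(failing)):              # try removing each element
            candidate = failing[:i] + failing[i + 1:]
            if fails(candidate):
                failing, improved = candidate, True
                break
        else:
            for i, x in enumerate(failing):        # try zeroing each element
                if x != 0:
                    candidate = failing[:i] + [0] + failing[i + 1:]
                    if fails(candidate):
                        failing, improved = candidate, True
                        break
    return failing

# Failing "property": the list contains an element greater than 3.
fails = lambda xs: any(x > 3 for x in xs)
minimal = shrink([9, 1, 7, 2], fails)
```

The key invariant: every accepted candidate still fails, so the final result is a genuine (locally minimal) counterexample, not just a smaller input.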
Mutation Testing Internals¶
flowchart TD
Source["Source code AST"] -->|apply mutant| Mutant["Mutant: e.g., + → -, > → >="]
Mutant -->|run test suite| Result{Tests pass?}
Result -->|still pass| Survived["Survived mutant = test gap\n(test suite doesn't detect this change)"]
Result -->|fail| Killed["Killed mutant = good test\n(test caught the regression)"]
Survived --> Coverage["Mutation score = killed / (killed + survived)"]
Mutation testing reveals false confidence — 100% line coverage can still have high survived-mutation rate if assertions don't verify exact values.
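The survived/killed distinction can be demonstrated end-to-end with Python's ast module (a single hand-rolled mutant; real tools like mutmut generate many operators):

```python
# Apply one mutation (+ -> -) at the AST level, re-run a "test suite", classify.
import ast, textwrap

class AddToSub(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()               # the mutation: + becomes -
        return node

def run_mutant(src, test):
    tree = AddToSub().visit(ast.parse(textwrap.dedent(src)))
    ast.fix_missing_locations(tree)
    ns = {}
    exec(compile(tree, "<mutant>", "exec"), ns)
    try:
        test(ns)
        return "survived"                     # tests pass: the suite missed the change
    except AssertionError:
        return "killed"                       # tests fail: the suite caught it

src = "def add(a, b):\n    return a + b\n"

def weak_test(ns):
    ns["add"](0, 0)                # calls the function but asserts nothing

def strong_test(ns):
    assert ns["add"](2, 3) == 5    # exact-value assertion kills the mutant
```

weak_test covers 100% of add's lines yet lets the mutant survive, which is precisely the false confidence the paragraph above describes.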
8. Computer Graphics: Shader Compilation and SPIR-V¶
GLSL → SPIR-V → Native ISA¶
flowchart TD
GLSL["GLSL / HLSL source"] -->|glslangValidator| SPIRV["SPIR-V binary\n(vendor-neutral IR)"]
SPIRV -->|driver compilation| ISA["GPU ISA\n(PTX→SASS for NVIDIA\nGCN for AMD\nEU ISA for Intel)"]
SPIRV -->|optimization passes| SPIRV_Opt["spirv-opt: dead code elim\nvector width optimization"]
SPIRV --> Vulkan["Vulkan VkShaderModule"]
SPIRV --> Metal["SPIR-V Cross → MSL (Metal)"]
SPIRV --> WebGPU["Naga → WGSL (WebGPU)"]
SPIR-V is a register-based SSA IR with explicit type system for GPU programs. It separates frontend language from backend driver compilation — enables one shader to target multiple platforms.
Compute Shader Thread Hierarchy (CUDA/Vulkan Compute)¶
block-beta
columns 1
block:hierarchy:1
columns 1
g["Grid: 3D array of thread groups"]
b["Thread Group (workgroup): 64–256 threads\nshared local memory (LDS) ~48KB"]
w["Warp (SIMD32/SIMD64): 32/64 threads\nexecute in lockstep (SIMT)"]
t["Thread (lane): executes shader\nregisters: 32–255 per thread"]
end
Warp divergence: if threads in a warp take different branches (if (threadId % 2)), both paths execute sequentially with inactive lanes masked — worst case 50% utilization.
9. Database Internals: Query Optimizer and Execution Engine¶
Query Optimization: Cost-Based Optimizer¶
flowchart TD
SQL["SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id WHERE c.country='US'"] --> Parse["Parse -> AST"]
Parse --> Bind["Bind -> resolve table/column refs"]
Bind --> Transform["Logical Plan: Filter -> Join -> Scan"]
Transform --> Enumerate["Plan Enumeration\nDP: enumerate join orderings\nO(3^N) with pruning"]
Enumerate --> CostModel["Cost Model\nI/O cost: #pages x seq_cost\nCPU cost: #rows x cpu_cost\nStats: table cardinality, column NDV, histograms"]
CostModel --> BestPlan["Best Physical Plan\n(NestLoop vs HashJoin vs MergeJoin)\n(SeqScan vs IndexScan)"]
Statistics drive cost estimation: pg_statistic histograms, most common values (MCV), and null fractions. Stale stats → wrong cardinality estimates → wrong plan.
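A toy cost model in the shape of the diagram's formula (constants loosely follow PostgreSQL's defaults of seq_page_cost=1.0, random_page_cost=4.0, cpu_tuple_cost=0.01; the "one random page per matching row" assumption is a deliberate simplification):

```python
# Choose SeqScan vs IndexScan from page/row counts and a selectivity estimate.
SEQ_PAGE_COST, RANDOM_PAGE_COST, CPU_TUPLE_COST = 1.0, 4.0, 0.01

def seq_scan_cost(pages, rows):
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # crude model: one random page fetch per matching row
    return matching_rows * RANDOM_PAGE_COST + matching_rows * CPU_TUPLE_COST

def choose_scan(pages, rows, selectivity):
    matching = rows * selectivity
    seq, idx = seq_scan_cost(pages, rows), index_scan_cost(matching)
    return ("IndexScan", idx) if idx < seq else ("SeqScan", seq)
```

Even this toy model shows why stale statistics hurt: overestimate selectivity from 0.1% to 50% and the same query flips from an index scan to a full scan.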
Volcano/Iterator Execution Model¶
sequenceDiagram
participant Parent as HashAggregate
participant Child as HashJoin
participant L as SeqScan (orders)
participant R as IndexScan (customers)
Parent->>Child: next()
Child->>L: next() [build phase: drain build side fully]
L-->>Child: tuple
Child->>R: next()
R-->>Child: tuple
Child->>Child: probe hash table
Child-->>Parent: matched tuple
Parent->>Parent: update hash group aggregate
Pull model (Volcano): each operator calls next() on its children — simple but has high function call overhead. Vectorized execution (DuckDB, ClickHouse) processes batches of 1024 rows per next() call.
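Python generators map almost one-to-one onto Volcano iterators; a sketch of the example query's plan (tuples as dicts, keys mirroring the SQL above):

```python
# Each operator is a generator: iterating it is the next() call pulling one tuple.
def seq_scan(table):
    for row in table:
        yield row

def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row

def hash_join(build_child, probe_child, build_key, probe_key):
    table = {}
    for row in build_child:                 # build phase: drain one side fully
        table.setdefault(build_key(row), []).append(row)
    for row in probe_child:                 # probe phase: one lookup per tuple
        for match in table.get(probe_key(row), []):
            yield {**match, **row}

customers = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}]
orders = [{"cust_id": 1, "total": 50}, {"cust_id": 2, "total": 70}]

plan = hash_join(
    filter_op(seq_scan(customers), lambda c: c["country"] == "US"),
    seq_scan(orders),
    lambda c: c["id"],
    lambda o: o["cust_id"],
)
```

Note the filter was pushed below the join's build side, so the hash table only holds US customers — a small instance of predicate pushdown.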
10. Agile and Software Engineering Process Internals¶
Git Branching Strategies: Internal Mechanics¶
flowchart LR
subgraph "Trunk-Based Development"
Main["main branch"] -->|short-lived| FB["feature branch\n(< 2 days)"]
FB -->|PR + merge| Main
Main -->|tag| Release["Release tag"]
end
subgraph "Gitflow"
GF_Main["main"] ---|sync| GF_Dev["develop"]
GF_Dev --> GF_Feature["feature/x"]
GF_Dev --> GF_Release["release/1.2"]
GF_Release --> GF_Main
GF_Main --> GF_Hotfix["hotfix/y"]
GF_Hotfix --> GF_Main
GF_Hotfix --> GF_Dev
end
CI/CD Pipeline Execution Graph¶
flowchart TD
Push["git push"] -->|webhook| CI["CI Server\n(Jenkins/GitLab CI/GitHub Actions)"]
CI --> Checkout["checkout + cache restore"]
Checkout --> Parallel{parallel jobs}
Parallel --> Lint["lint + static analysis"]
Parallel --> UnitTest["unit tests"]
Parallel --> Build["build artifact"]
Lint & UnitTest --> IntTest["integration tests\n(docker-compose up)"]
IntTest & Build --> Package["package Docker image\ndocker build + push registry"]
Package -->|tag=main| StagingDeploy["deploy to staging\nkubectl rollout"]
StagingDeploy --> E2E["e2e tests (Playwright)"]
E2E -->|manual approval| ProdDeploy["deploy to production\nblue/green or canary"]
11. Operating System Scheduling: Real-Time and Multi-Core¶
Multi-Core Cache Coherence (MESI Protocol)¶
stateDiagram-v2
[*] --> Invalid: cache line not present
Invalid --> Exclusive: CPU reads, no other has it
Exclusive --> Modified: CPU writes (no bus traffic)
Modified --> Shared: another CPU reads (write-back + share)
Exclusive --> Shared: another CPU reads
Shared --> Modified: CPU writes (invalidate others)
Shared --> Invalid: another CPU writes
Modified --> Invalid: another CPU writes (must evict)
False sharing: two variables on the same cache line (64B) modified by different CPUs → constant MESI state transitions → serialized execution. Fix: __cacheline_aligned / @Contended annotation.
NUMA Memory Access Patterns¶
flowchart LR
subgraph "NUMA Node 0"
CPU0["CPU 0–15\nL3 Cache 30MB"]
RAM0["Local DRAM\n64GB\n~50ns latency"]
end
subgraph "NUMA Node 1"
CPU1["CPU 16–31\nL3 Cache 30MB"]
RAM1["Remote DRAM\n64GB\n~100ns latency (QPI/xGMI)"]
end
CPU0 -->|local| RAM0
CPU0 -->|remote 2× slower| RAM1
CPU1 -->|local| RAM1
numactl --membind=0 pins allocations to local NUMA node. Java GC overhead increases 40–60% with cross-NUMA allocations — JVM's NUMA-aware allocator (-XX:+UseNUMAInterleaving) mitigates this.
Summary: Miscellaneous CS Internals Map¶
mindmap
root((Misc CS Internals))
Graphics
Rasterization pipeline stages
G-buffer deferred rendering
BVH ray traversal hardware
SPIR-V cross-platform IR
Warp divergence SIMT
Game Engines
ECS archetype memory layout
GJK Minkowski difference
Fixed timestep physics loop
Interpolation rendering
IoT
MQTT QoS levels 0/1/2
DTLS constrained devices
Edge compute local latency
DevOps/IaC
Terraform DAG parallelism
State serial optimistic lock
Ansible agentless SSH module
SRE
Error budget burn rate
Multi-window alerting
OTel trace context propagation
Testing
QuickCheck shrinking
Mutation testing survivors
Database
Cost-based optimizer statistics
Volcano pull model
Vectorized batch execution
OS/Multi-core
MESI cache coherence protocol
False sharing cache lines
NUMA topology memory latency