
Understanding fork() System Call and Memory Management in Unix/Linux

When a Unix/Linux process creates a new process using the fork() system call, a complex but elegant memory management mechanism takes place. This explanation will demonstrate how parent and child processes interact with memory during and after a fork() operation.

The fork() System Call Explained

The fork() system call creates a new process (called the child) by duplicating the calling process (the parent). After fork() completes, two processes continue to run the same program, but with different return values from fork():

  • In the parent process: fork() returns the PID of the newly created child
  • In the child process: fork() returns 0
  • On failure: fork() returns -1 in the parent, no child is created, and errno is set to indicate the error

A key characteristic of fork() is that the child process does not start execution from the beginning of the program. Instead, both processes continue execution from the point immediately after the fork() call.

Memory Mechanism During fork()

The memory handling during fork() is what makes this system call particularly interesting. Let's visualize this process:

Before fork()

Parent Process Memory Space
+---------------------------+
|      Program Code         |
+---------------------------+
|      Global Variables     |
+---------------------------+
|         Heap              |
|                           |
+---------------------------+
|           |               |
|   Stack   | (grows down)  |
|           v               |
+---------------------------+
|    PC → at fork() call    |
+---------------------------+

The fork() Call

When fork() is called, the operating system creates a new entry in its process table and gives the child a logical copy of the parent's address space:

                fork()
+------------------+-------------------+
|                  |                   |
| Parent Process   |   Child Process   |
|                  |                   |
+------------------+-------------------+

After fork() - Logical View

After fork() completes, both processes have their own memory spaces with identical content:

Parent Process Memory Space      Child Process Memory Space
+---------------------------+    +---------------------------+
|      Program Code         |    |      Program Code         |
+---------------------------+    +---------------------------+
|      Global Variables     |    |      Global Variables     |
+---------------------------+    +---------------------------+
|         Heap              |    |         Heap              |
|                           |    |                           |
+---------------------------+    +---------------------------+
|           |               |    |           |               |
|   Stack   | (grows down)  |    |   Stack   | (grows down)  |
|           v               |    |           v               |
+---------------------------+    +---------------------------+
| PC → after fork() call    |    | PC → after fork() call    |
| fork() returns child PID  |    | fork() returns 0          |
+---------------------------+    +---------------------------+

Actual Memory Implementation (Copy-on-Write)

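Modern operating systems don't actually create a complete copy of the parent's memory when fork() is called. Instead, they use a technique called "copy-on-write" (COW) to optimize memory usage. The kernel typically copies only the parent's page tables, so both processes map the same physical pages, and it marks the shared writable pages read-only so that a later write can be detected: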

                    Physical Memory Pages
                    +------------------+
                    |    Page 1        |
                    +------------------+
                    |    Page 2        |
                    +------------------+
                    |    Page 3        |
                    +------------------+
                    |    Page 4        |
                    +------------------+
                           ▲   ▲
                           │   │
                           │   │
                           │   │
                       ┌───┘   └───┐
                       │           │
                       │           │
    Parent Process     │           │     Child Process
    Page Table         │           │     Page Table
+------------------+   │           │   +------------------+
| Virtual → Physical|  │           │   | Virtual → Physical|
+------------------+   │           │   +------------------+
| Page 1 → Page 1   |──┘           └──| Page 1 → Page 1   |
+------------------+                  +------------------+
| Page 2 → Page 2   |─────────────────| Page 2 → Page 2   |
+------------------+                  +------------------+
| Page 3 → Page 3   |─────────────────| Page 3 → Page 3   |
+------------------+                  +------------------+
| Page 4 → Page 4   |─────────────────| Page 4 → Page 4   |
+------------------+                  +------------------+

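A copy is made only when one of the processes writes to a shared page. The write triggers a page fault, and the kernel gives the writing process its own private copy of that page (here, the child writes to Page 2):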

                    Physical Memory Pages
                    +------------------+
                    |    Page 1        |
                    +------------------+
                    |    Page 2        |        +------------------+
                    +------------------+        |    Page 2'       | (Copy created)
                    |    Page 3        |        +------------------+
                    +------------------+
                    |    Page 4        |
                    +------------------+
                           ▲   ▲             ▲
                           │   │             │
                           │   │             │
                           │   │             │
                       ┌───┘   └───┐         │
                       │           │         │
                       │           │         │
    Parent Process     │           │     Child Process
    Page Table         │           │     Page Table
+------------------+   │           │   +------------------+
| Virtual → Physical|  │           │   | Virtual → Physical|
+------------------+   │           │   +------------------+
| Page 1 → Page 1   |──┘           └──| Page 1 → Page 1   |
+------------------+                  +------------------+
| Page 2 → Page 2   |─────────────────| Page 2 → Page 2'  |──┘
+------------------+                  +------------------+
| Page 3 → Page 3   |─────────────────| Page 3 → Page 3   |
+------------------+                  +------------------+
| Page 4 → Page 4   |─────────────────| Page 4 → Page 4   |
+------------------+                  +------------------+

Practical Example

To illustrate this further, consider this simple code example:

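#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 5;  // Local variable on the stack

    // %p expects a void *, so the address is cast explicitly
    printf("Before fork: x = %d (address: %p)\n", x, (void *)&x);
    fflush(stdout);  // Flush so buffered output is not duplicated in the child

    pid_t pid = fork();

    if (pid < 0) {
        // Fork failed; no child was created
        perror("fork");
        return 1;
    } else if (pid == 0) {
        // Child process
        printf("Child: x = %d (address: %p)\n", x, (void *)&x);
        x = 10;  // Child modifies its own copy of x
        printf("Child after change: x = %d (address: %p)\n", x, (void *)&x);
    } else {
        // Parent process
        printf("Parent: x = %d (address: %p)\n", x, (void *)&x);
        x = 20;  // Parent modifies its own copy of x
        printf("Parent after change: x = %d (address: %p)\n", x, (void *)&x);
        wait(NULL);  // Reap the child so it does not linger as a zombie
    }

    return 0;
}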

In this example, both parent and child initially see the same value of x (5), and both print the same address for x. The addresses match because they are virtual addresses, and the child's address space is laid out exactly like the parent's. Right after fork(), the two virtual pages are typically still backed by the same physical page. The first write to that page by either process triggers copy-on-write, which gives the writer a separate physical copy, so each process keeps its own independent value (10 for the child, 20 for the parent) at the same virtual address.

Key Insights About fork() and Memory

  1. fork() creates an almost exact duplicate of the parent process; the child gets its own PID and a different parent PID, among a few other differences
  2. Both processes continue execution from the point after fork()
  3. Modern systems use copy-on-write to optimize memory usage, only creating copies of memory pages when they are modified
  4. This mechanism allows for efficient process creation while maintaining memory isolation between processes

This memory management strategy makes fork() both powerful and efficient, enabling the creation of new processes without excessive memory overhead.

