Chapter 81: Memory Management in Linux: Virtual Memory & Paging Concepts

Chapter Objectives

By the end of this chapter, you will be able to:

  • Understand the fundamental reasons for using virtual memory in modern operating systems.
  • Explain the process of address translation from virtual to physical memory, including the roles of the Memory Management Unit (MMU) and page tables.
  • Implement C programs that interact with the memory subsystem and use standard Linux utilities to inspect their memory layout.
  • Analyze the virtual address space of a running process, identifying the stack, heap, and other memory segments.
  • Configure and monitor system memory, and debug common memory-related issues like page faults and out-of-memory conditions.
  • Describe the function of the Translation Lookaside Buffer (TLB) and its importance for memory access performance.

Introduction

Memory management is one of the most fundamental and critical responsibilities of an operating system kernel. In the early days of computing, programs accessed physical memory directly—a simple but fragile approach. This method offered no protection, allowing a single faulty program to corrupt the memory of other programs or even the operating system itself, leading to system-wide crashes. Furthermore, it forced developers to manage the finite amount of physical RAM manually, a complex and error-prone task. To overcome these limitations, modern operating systems, including Embedded Linux, employ a sophisticated abstraction known as virtual memory. This chapter delves into this foundational concept, exploring how the Linux kernel, in partnership with hardware, creates a private, linear address space for every process, providing memory protection, simplifying programming, and enabling features that allow a system to run programs larger than the available physical RAM. Understanding this mechanism is not merely an academic exercise; it is essential for writing efficient, stable, and secure embedded applications. On a resource-constrained device like a Raspberry Pi, knowing how to monitor and manage memory can be the difference between a reliable product and one that fails unpredictably in the field.

Technical Background

The Need for Abstraction: From Physical to Virtual Addressing

Imagine a library where every book has a permanent, fixed shelf location. If two librarians, working independently, decide to place two different books on the same shelf, chaos ensues. One book will be lost or damaged. This is analogous to early computer systems where programs used physical addresses. Each byte of RAM had a unique, hardware-defined address, and programs read from and wrote to these addresses directly. This created several significant problems. First, as in our library analogy, there was no protection. A buggy or malicious program could overwrite memory belonging to another program or, even worse, the operating system kernel itself. Second, it made multitasking difficult. If you wanted to run multiple programs, you had to load them into different, non-overlapping sections of physical RAM. The programmer had to know in advance where the program would be loaded, a process known as static relocation, which was inflexible and cumbersome.

To solve these profound issues, computer architects and operating system designers developed the concept of virtual memory. The core idea is to decouple the memory addresses used by a program from the actual physical addresses in the RAM chips. Each process is given its own private, contiguous address space, which we call the virtual address space. For a 64-bit system like the Raspberry Pi 5, this address space is enormous—2^64 bytes, a theoretical range far larger than any physical memory available today. From the program’s perspective, it has exclusive access to this vast expanse of memory. It can place its code, variables, stack, and dynamically allocated data anywhere it likes within this space, unaware of other programs running on the system. This illusion is powerfully liberating for the programmer.

The magic of translating these virtual addresses into their real, physical counterparts is handled by a collaboration between the operating system and a specialized piece of hardware called the Memory Management Unit (MMU). The MMU is typically part of the CPU itself. When a process attempts to access a memory location—for instance, by executing an instruction like MOV RAX, [0x400500] (x86 syntax, shown purely for illustration)—the virtual address 0x400500 is sent to the MMU. The MMU’s job is to look up this virtual address and find the corresponding physical address in RAM. If a valid mapping exists, the MMU translates the address, and the memory access proceeds. If no valid mapping exists, the MMU triggers a hardware exception, known as a page fault, signaling the operating system to intervene.

This architecture elegantly solves the problems of the physical addressing model. Protection is achieved because one process’s virtual address 0x400500 will be mapped to a different physical address than another process’s 0x400500. A process is physically incapable of generating a physical address outside the set of RAM locations assigned to it by the kernel. Relocation becomes trivial; the OS can load a program into any available physical memory because the program only ever sees its own consistent virtual addresses.

Paging: The Mechanism of Virtual Memory

The most common technique for implementing virtual memory is paging. Instead of mapping individual bytes, which would require an impossibly large amount of tracking information, the MMU and the kernel divide both virtual and physical memory into fixed-size blocks. A block of virtual memory is called a page, and a block of physical memory is called a frame. Both pages and frames are the same size, typically 4 KiB on most architectures, including the ARM cores in the Raspberry Pi. (Note that the 64-bit Raspberry Pi OS kernel for the Raspberry Pi 5 is built with 16 KiB pages by default; you can check the page size on any system with getconf PAGE_SIZE.)

The operating system maintains a set of data structures called page tables for each process. A page table is essentially a map that stores the correspondence between a process’s virtual pages and the physical frames in RAM. When a process is created, the kernel allocates a page table for it. When the process needs to access a virtual address, the CPU’s MMU uses the page table to find out which physical frame holds the data.

Let’s trace this process. A virtual address generated by the CPU is split into two parts: a virtual page number (VPN) and an offset. The offset indicates the location of the desired byte within the page (e.g., for a 4 KiB page, the offset is 12 bits, since 2^12 = 4096). The VPN is used as an index into the page table. The page table entry (PTE) found at that index contains the physical frame number (PFN) where the page is stored in RAM. The PTE also contains several important control bits, such as:

  • Present/Valid bit: Indicates whether this page is currently in physical memory.
  • Read/Write bit: Specifies whether the page can be written to or is read-only.
  • User/Supervisor bit: Determines if the page can be accessed by user-level processes or only by the kernel.
  • Dirty bit: Set by the hardware when a write to the page occurs. This is useful for knowing if the page needs to be saved back to disk.
  • Accessed bit: Set by the hardware when the page is read or written. The OS can use this to determine which pages are actively being used.

Mermaid
flowchart TD
    subgraph CPU
        A[Virtual Address<br>e.g., 0x400500]
    end

    subgraph MMU
        B{Split Address}
        C[VPN: Virtual<br>Page Number]
        D[Offset]
        E{Check TLB for VPN}
        F[TLB Hit]
        G[TLB Miss]
        H(Page Table Walk)
        I[PTE: Page<br>Table Entry]
        J[PFN: Physical<br>Frame Number]
        K{Combine PFN + Offset}
    end

    subgraph System RAM
        L[Page Tables]
        M[Physical Memory Frame]
    end

    A --> B
    B --> C
    B --> D
    C --> E
    E -- Yes --> F
    E -- No --> G
    G --> H
    H --> L
    L --> I
    I --> J
    J --> K
    F --> J
    K --> M
    D --> K

    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff
    classDef system fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff
    classDef check fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff

    class A,M primary
    class B,H,K process
    class E decision
    class F,J success
    class G,I,C,D check
    class L system

Because a single-level page table for a 64-bit address space would be astronomically large, modern systems use multi-level page tables. In this scheme, the virtual page number is further subdivided. For example, in a four-level paging architecture (as used by x86-64 and conceptually similar on ARM), the VPN is split into four parts, each serving as an index into a different level of the page table hierarchy. This creates a tree-like structure. A top-level directory is used to find a page middle directory, which points to a page table, which finally contains the PTE with the physical frame number. This hierarchical approach saves a tremendous amount of space, as entire branches of the tree for unused portions of the address space do not need to be allocated at all.

The Page Fault: Not Always an Error

The term “fault” often has a negative connotation, but a page fault is a normal and essential part of how virtual memory works. It is simply a signal from the MMU to the kernel that it needs help. A page fault occurs when the MMU attempts to translate a virtual address but finds that the corresponding page table entry is marked as invalid (i.e., the present bit is clear).

This can happen for several reasons. A common, non-error case is demand paging. When you start a program, the kernel doesn’t load the entire executable file into memory at once. That would be slow and wasteful, as most programs don’t use all their code immediately. Instead, the kernel sets up the process’s page tables but marks all pages as not present. The first time the process tries to execute code in a particular page, the MMU triggers a page fault. The kernel’s page fault handler then inspects the faulting address, determines that this is a legitimate access to a page that simply hasn’t been loaded yet, finds the page’s content in the executable file on disk, allocates a physical frame, loads the data into it, updates the page table entry to point to the new frame and set the present bit, and finally resumes the process. From the process’s perspective, the instruction simply took a little longer to execute.

Another reason for a page fault is swapping. If the system runs low on physical memory, the kernel may decide to move an inactive page from RAM to a special area on the disk called the swap space. The page table entry is then marked as not present. If the process later tries to access that page, a page fault occurs. The kernel’s handler sees that the page exists in the swap space, brings it back into a physical frame (potentially swapping another page out to make room), updates the page table, and resumes the process. This mechanism allows the system to run more programs than can fit into physical RAM, though it comes at a performance cost due to the slowness of disk I/O.

Of course, a page fault can also indicate a genuine error. If a program tries to access a virtual address that is not part of any valid memory region (e.g., dereferencing a NULL pointer or accessing an out-of-bounds array element), the page fault handler will find no valid source for the data. In this case, it will terminate the process by sending it a segmentation fault signal (SIGSEGV).

Mermaid
flowchart TD
    A[MMU detects invalid PTE<br>for a virtual address]
    B(Triggers Page Fault Exception)
    C{Kernel Page Fault Handler Takes Over}
    D{Is the address<br>in a valid memory region<br>for this process?}

    subgraph "Valid Access (Not an Error)"
        E{Is the page in swap space?}
        F["Find page on disk<br>(swap partition)"]
        G[Allocate a free physical frame]
        H[Swap page from disk into frame]
        I["Update Page Table Entry (PTE)<br>with new frame number, set Present bit"]
        J[Resume Process Execution]
    end

    subgraph "Invalid Access (Error)"
        K[Send SIGSEGV signal<br>to the process]
        L["Process Terminates<br>'Segmentation fault'"]
    end
    
    M{"Is this the first access?<br>(Demand Paging)"}
    N[Find page in executable file on disk]

    A --> B
    B --> C
    C --> D
    D -- No --> K
    D -- Yes --> E
    E -- Yes --> F
    E -- No --> M
    M -- Yes --> N
    M -- No --> G
    F --> G
    N --> G
    G --> H
    H --> I
    I --> J
    K --> L

    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff
    classDef system fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff
    classDef check fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff
    
    class A,B check
    class C,L,J system
    class D,E,M decision
    class F,G,H,I,K,N process

Optimizing Performance: The Translation Lookaside Buffer (TLB)

The process of walking a multi-level page table for every single memory access would be prohibitively slow. Each level of the page table is itself in memory, so a four-level walk could require four separate memory reads just to find the physical address, before the actual desired data can be fetched. Given that modern CPUs execute billions of instructions per second, many of which access memory, this overhead is unacceptable.

To solve this, MMUs include a small, very fast hardware cache called the Translation Lookaside Buffer (TLB). The TLB stores recently used virtual-to-physical address mappings. When the MMU gets a virtual address, it first checks the TLB. If it finds a matching entry (a TLB hit), the physical address is retrieved directly from the TLB in a single clock cycle, and the slow page table walk is avoided. If the entry is not in the TLB (a TLB miss), the hardware or the OS must perform the full page table walk. The resulting translation is then stored in the TLB, likely evicting another entry.

Because programs exhibit locality of reference—they tend to access the same memory locations (temporal locality) or nearby locations (spatial locality) repeatedly—the TLB is extremely effective. TLB hit rates are often above 99%, meaning the performance penalty of page table walks is paid only rarely. The TLB is a critical component that makes the virtual memory abstraction practical from a performance standpoint.

Practical Examples

Theory provides the foundation, but true understanding comes from seeing these concepts in action. In this section, we will use the Raspberry Pi 5 to explore the Linux memory model. We will use standard command-line tools and simple C programs to observe how a process’s virtual address space is structured and managed by the kernel.

Inspecting the System’s Memory

First, let’s get a high-level view of the memory on our Raspberry Pi. The free command provides a quick summary of physical memory and swap usage.

Bash
# Connect to your Raspberry Pi 5 via SSH
ssh pi@<raspberrypi_ip>

# Run the free command with the -h (human-readable) flag
free -h

You will see output similar to this:

Plaintext
               total        used        free      shared  buff/cache   available
Mem:           4.0Gi       1.0Gi       1.7Gi        75Mi       1.4Gi       3.0Gi
Swap:          511Mi          0B       511Mi

Let’s break down this output:

  • total: The total amount of physical RAM installed.
  • used: Memory currently in use by processes.
  • free: Memory that is completely unused.
  • shared: Memory used by tmpfs (a temporary file system in RAM).
  • buff/cache: This is a key value. Linux uses free RAM to cache data from the disk (page cache) and for buffers. This significantly speeds up file I/O. This memory is not “used” in the sense that it’s locked by a process; it can be instantly reclaimed by the kernel if an application needs it.
  • available: An estimate of how much memory is available for starting new applications without swapping. It is not a simple sum of free + buff/cache; the kernel computes it from free memory plus the portion of the cache and buffers that can actually be reclaimed. This is often the most useful number to look at.

Tip: Don’t be alarmed if the free memory value is low on a running system. A good operating system uses idle memory for caching to improve performance. The available column gives a much better picture of the system’s memory health.

Exploring a Process’s Virtual Address Space

Now, let’s dive into a single process. Every process in Linux has a virtual directory under /proc/[pid], where [pid] is the process ID. This directory contains a wealth of information, including several pseudo-files related to memory.

Let’s write a simple C program that allocates some memory and then keeps running so we can inspect it.

memory_explorer.c

C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int global_var = 100; // In .data segment
int global_uninit_var; // In .bss segment

int main() {
    printf("Welcome to the Memory Explorer!\n");
    printf("My process ID is: %d\n", getpid());

    // Allocate some memory on the heap
    void *heap_ptr = malloc(1024 * 1024); // Allocate 1 MB
    if (heap_ptr == NULL) {
        perror("malloc failed");
        return 1;
    }
    printf("Allocated 1MB on the heap at address: %p\n", heap_ptr);

    // The program will now sleep, allowing us to inspect its memory
    // from another terminal.
    printf("Sleeping for 5 minutes... Find my PID and inspect me!\n");
    sleep(300);

    free(heap_ptr);
    printf("Program finished.\n");
    return 0;
}

Build and Run Steps:

  1. Save the code above as memory_explorer.c on your Raspberry Pi.
  2. Compile it using GCC:
    gcc memory_explorer.c -o memory_explorer
  3. Run the program:
    ./memory_explorer

The program will print its process ID (PID). Note this PID. Now, open a second SSH terminal to your Raspberry Pi. In this new terminal, we will use the PID to inspect the running program. Let’s assume the PID is 24501.

Using /proc/[pid]/maps

The maps file shows the memory mappings for the process. It reveals how the virtual address space is laid out.

Bash
# In the second terminal
cat /proc/24501/maps

The output will be detailed, but let’s look at the key parts:

Plaintext
# Address Range      Perms  Offset   Dev   Inode   Pathname
00400000-00401000    r-xp   00000000 b3:02  12345   /home/pi/memory_explorer  # Text (code) segment
00401000-00402000    r--p   00001000 b3:02  12345   /home/pi/memory_explorer  # Read-only data
00402000-00403000    rw-p   00002000 b3:02  12345   /home/pi/memory_explorer  # .data and .bss segments
00503000-00604000    rw-p   00000000 00:00  0       [heap]                    # The heap
7f...-7f...          rw-p   00000000 00:00  0       [stack]                   # The stack
... (many more lines for shared libraries)

Let’s decipher this:

  • First column: The range of virtual addresses for this segment.
  • Second column (Perms): Permissions: r (read), w (write), x (execute), p (private). The r-xp permission for the code segment means it can be read and executed, but not written to—a crucial security feature.
  • Pathname: The file backing this memory mapping. Our executable file backs the code and data segments. The heap and stack are anonymous (not backed by a file).

You can clearly see the different segments of our program laid out in virtual memory: the executable code (.text), the initialized global variables (.data), the heap where our malloc‘d memory lives, and the stack used for local variables and function calls.

Using pmap

The pmap (process map) command provides a more summarized and sometimes more readable view of the same information.

Bash
# In the second terminal
pmap -x 24501

The -x flag gives extended details. The output will show the address, size, permissions, and mapping for each segment. You will see the 1024K (1MB) allocation we made with malloc clearly listed as part of the [heap] mapping.

Observing a Page Fault

We can’t easily trigger a demand paging fault on cue, as the kernel handles it transparently. However, we can easily trigger the “error” type of page fault: a segmentation fault.

segfault_demo.c

C
#include <stdio.h>

int main() {
    // Create a null pointer.
    int *ptr = NULL;

    printf("About to cause a segmentation fault...\n");

    // Attempt to write to the memory location pointed to by ptr.
    // Address 0 is never a valid user-space address to write to.
    // The MMU will detect this and trigger a page fault.
    // The kernel's fault handler will see it's an invalid access
    // and send a SIGSEGV signal to the process.
    *ptr = 42;

    // This line will never be reached.
    printf("This will not be printed.\n");

    return 0;
}

Build and Run:

Bash
gcc segfault_demo.c -o segfault_demo
./segfault_demo

Expected Output:

Plaintext
About to cause a segmentation fault...
Segmentation fault

The “Segmentation fault” message is the shell’s way of telling you that the process was terminated by a SIGSEGV signal. This happened because our code tried to access an invalid virtual address (NULL). The MMU could not find a valid page table entry for this address, triggered a page fault, and the kernel determined the access was illegal, leading it to terminate the program. This is a perfect example of the memory protection provided by the virtual memory system.

Common Mistakes & Troubleshooting

Navigating memory management can be tricky. Here are some common pitfalls and how to handle them.

  • Confusing Virtual (VSZ) vs. Resident (RSS) Memory
    Symptom: A process appears to use huge amounts of memory (e.g., in top or ps), causing concern about RAM usage.
    Solution: Focus on the RSS (Resident Set Size), not VSZ. RSS is the actual physical RAM the process occupies. VSZ is the total size of the virtual address space, which is often much larger and not indicative of a memory problem.
  • Application Disappears Randomly
    Symptom: A long-running application suddenly terminates without any error messages or crash logs.
    Solution: Check the kernel log with dmesg | grep -i “out of memory”. The system was likely critically low on memory, and the OOM (Out-Of-Memory) Killer terminated your process to save the system. The solution is to reduce your application’s memory footprint.
  • Memory Leak
    Symptom: The system becomes progressively slower over hours or days. The available memory in free -h continually decreases until the OOM Killer is invoked.
    Solution: For C/C++ applications, use tools like Valgrind during development to detect memory that is allocated but never freed. Regularly monitor the RSS of your process to spot gradual increases over time.
  • Stack Overflow
    Symptom: The program crashes immediately with a “Segmentation fault” message, especially when entering a specific function.
    Solution: This is often caused by very large local variables (e.g., int big_array[5000000];) or infinite recursion. Move large data structures from the stack to the heap by allocating them with malloc() instead.
  • System “Thrashing”
    Symptom: The system is extremely unresponsive, the disk activity light is constantly on, but CPU usage may not be at 100%.
    Solution: The system is spending all its time swapping memory pages between RAM and disk. Use vmstat 1 and watch the ‘si’ (swap-in) and ‘so’ (swap-out) columns. High values confirm thrashing. The only solutions are to add more RAM or reduce the system’s memory load.

Exercises

  1. Exploring Process Memory with /proc:
    • Objective: To become familiar with the information available in the /proc filesystem.
    • Steps:
      1. Run any long-running command, like top or sleep 100.
      2. Find its PID using pgrep top or pgrep sleep.
      3. Navigate to its /proc/[pid] directory.
      4. Examine the contents of the maps, smaps, and status files.
    • Verification: In the status file, find the VmRSS (Resident Set Size) line. Compare this value to the RSS column for that process in the top command. They should be very similar. In the smaps file, observe the detailed breakdown of memory usage for each mapping, including its RSS.
  2. The Heap vs. The Stack:
    • Objective: To practically observe the difference between heap and stack allocations.
    • Steps:
      1. Write a C program that declares a large array as a local variable inside main() (e.g., int stack_array[2000000];). Compile and run it. What happens? (Hint: likely a segmentation fault due to stack overflow).
      2. Modify the program to allocate the same amount of memory dynamically on the heap using malloc() (e.g., int *heap_array = malloc(sizeof(int) * 2000000);).
      3. Run the modified program in the background (./my_program &).
      4. Use pmap -x [pid] to inspect its memory map.
    • Verification: In the first case, the program should crash. In the second case, the program should run successfully, and the pmap output should show a large allocation in the [heap] segment.
  3. Monitoring Memory Pressure:
    • Objective: To simulate a low-memory situation and observe the system’s response.
    • Steps:
      1. Open two terminals to your Raspberry Pi.
      2. In the first terminal, run the command vmstat 1. This will print a new line of system statistics every second. Pay attention to the swpd (swapped), free, buff, cache, and si/so (swap in/swap out) columns.
      3. In the second terminal, run the memory_explorer program from our earlier example, but modify it to allocate a very large amount of memory in a loop (e.g., allocate 100MB every second).
    • Verification: As the memory-hungry program runs, watch the vmstat output. You will see the free memory decrease, then the cache will shrink as the kernel reclaims it. Eventually, if you push it far enough, you may see the so (swap out) column become non-zero as the system starts swapping. Finally, the OOM killer might terminate your program. Check dmesg afterward to confirm.

Summary

This chapter provided a deep dive into the critical mechanisms of virtual memory management in Linux. We have moved from the theoretical underpinnings to practical, hands-on exploration.

  • Virtual Memory is an Abstraction: It provides each process with a private, large, and linear address space, independent of physical RAM. This is crucial for memory protection, ease of programming, and multitasking.
  • The MMU is the Hardware Translator: The Memory Management Unit translates virtual addresses to physical addresses using page tables.
  • Paging is the Core Mechanism: Memory is divided into fixed-size pages (virtual) and frames (physical). Page tables, managed by the kernel, map pages to frames.
  • Page Faults are Signals to the Kernel: They are not always errors. They are used to implement demand paging (loading from disk) and swapping, enabling the system to run programs larger than physical RAM.
  • The TLB Makes it Fast: The Translation Lookaside Buffer is a hardware cache for address translations that mitigates the performance overhead of page table walks.
  • Linux Provides Powerful Tools: Utilities like free, pmap, vmstat, and the /proc filesystem allow developers to inspect and monitor the memory behavior of the system and individual processes.
  • Understanding Memory is Key to Stability: In embedded systems, correctly managing memory, avoiding leaks, and understanding the OOM killer are essential for building reliable, long-running applications.

Further Reading

  1. Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet & Marco Cesati. (Specifically, Chapter 8: Memory Management). While slightly dated, its explanation of the core concepts is exceptional.
  2. Linux Kernel Documentation: The proc Filesystem. Available within the kernel source tree or online at https://www.kernel.org/doc/html/latest/filesystems/proc.html. The authoritative source for what these files contain.
  3. What Every Programmer Should Know About Memory by Ulrich Drepper. A comprehensive and deep paper covering everything from RAM hardware to CPU caches and OS memory handling. https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
  4. The ARM Architecture Reference Manual for ARMv8-A. For those wanting to go deeper, this official document from ARM details the specifics of the MMU, page table entry formats, and TLB management for the architecture used in the Raspberry Pi 5.
  5. “How The Kernel Manages Your Memory” – An article on LWN.net. LWN.net frequently has high-quality, in-depth articles on kernel internals. Search their archives for memory management topics.
  6. Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. (Specifically, the chapters on Virtualization). An excellent and very readable academic textbook on OS concepts.
