Chapter 81: Memory Management in Linux: Virtual Memory & Paging Concepts
Chapter Objectives
By the end of this chapter, you will be able to:
- Understand the fundamental reasons for using virtual memory in modern operating systems.
- Explain the process of address translation from virtual to physical memory, including the roles of the Memory Management Unit (MMU) and page tables.
- Implement C programs that interact with the memory subsystem and use standard Linux utilities to inspect their memory layout.
- Analyze the virtual address space of a running process, identifying the stack, heap, and other memory segments.
- Configure and monitor system memory, and debug common memory-related issues like page faults and out-of-memory conditions.
- Describe the function of the Translation Lookaside Buffer (TLB) and its importance for memory access performance.
Introduction
Memory management is one of the most fundamental and critical responsibilities of an operating system kernel. In the early days of computing, programs accessed physical memory directly—a simple but fragile approach. This method offered no protection, allowing a single faulty program to corrupt the memory of other programs or even the operating system itself, leading to system-wide crashes. Furthermore, it forced developers to manage the finite amount of physical RAM manually, a complex and error-prone task. To overcome these limitations, modern operating systems, including Embedded Linux, employ a sophisticated abstraction known as virtual memory. This chapter delves into this foundational concept, exploring how the Linux kernel, in partnership with hardware, creates a private, linear address space for every process, providing memory protection, simplifying programming, and enabling features that allow a system to run programs larger than the available physical RAM. Understanding this mechanism is not merely an academic exercise; it is essential for writing efficient, stable, and secure embedded applications. On a resource-constrained device like a Raspberry Pi, knowing how to monitor and manage memory can be the difference between a reliable product and one that fails unpredictably in the field.
Technical Background
The Need for Abstraction: From Physical to Virtual Addressing
Imagine a library where every book has a permanent, fixed shelf location. If two librarians, working independently, decide to place two different books on the same shelf, chaos ensues. One book will be lost or damaged. This is analogous to early computer systems where programs used physical addresses. Each byte of RAM had a unique, hardware-defined address, and programs read from and wrote to these addresses directly. This created several significant problems. First, as in our library analogy, there was no protection. A buggy or malicious program could overwrite memory belonging to another program or, even worse, the operating system kernel itself. Second, it made multitasking difficult. If you wanted to run multiple programs, you had to load them into different, non-overlapping sections of physical RAM. The programmer had to know in advance where the program would be loaded, a process known as static relocation, which was inflexible and cumbersome.
To solve these profound issues, computer architects and operating system designers developed the concept of virtual memory. The core idea is to decouple the memory addresses used by a program from the actual physical addresses in the RAM chips. Each process is given its own private, contiguous address space, which we call the virtual address space. For a 64-bit system like the Raspberry Pi 5, this address space is enormous—2^64 bytes in theory (current ARM64 Linux kernels actually use 48 of those bits, which is still 256 TiB per process), far larger than any physical memory available today. From the program’s perspective, it has exclusive access to this vast expanse of memory. It can place its code, variables, stack, and dynamically allocated data anywhere it likes within this space, unaware of other programs running on the system. This illusion is powerfully liberating for the programmer.
The magic of translating these virtual addresses into their real, physical counterparts is handled by a collaboration between the operating system and a specialized piece of hardware called the Memory Management Unit (MMU). The MMU is typically part of the CPU itself. When a process attempts to access a memory location—for instance, by executing an instruction like `MOV RAX, [0x400500]`—the virtual address `0x400500` is sent to the MMU. The MMU’s job is to look up this virtual address and find the corresponding physical address in RAM. If a valid mapping exists, the MMU translates the address, and the memory access proceeds. If no valid mapping exists, the MMU triggers a hardware exception, known as a page fault, signaling the operating system to intervene.

This architecture elegantly solves the problems of the physical addressing model. Protection is achieved because one process’s virtual address `0x400500` will be mapped to a different physical address than another process’s `0x400500`. A process is physically incapable of generating a physical address outside the set of RAM locations assigned to it by the kernel. Relocation becomes trivial; the OS can load a program into any available physical memory because the program only ever sees its own consistent virtual addresses.
Paging: The Mechanism of Virtual Memory
The most common technique for implementing virtual memory is paging. Instead of mapping individual bytes, which would require an impossibly large amount of tracking information, the MMU and the kernel divide both virtual and physical memory into fixed-size blocks. A block of virtual memory is called a page, and a block of physical memory is called a frame. Both pages and frames are the same size, typically 4 KiB on most architectures, including the ARM cores in the Raspberry Pi.
The operating system maintains a set of data structures called page tables for each process. A page table is essentially a map that stores the correspondence between a process’s virtual pages and the physical frames in RAM. When a process is created, the kernel allocates a page table for it. When the process needs to access a virtual address, the CPU’s MMU uses the page table to find out which physical frame holds the data.
Let’s trace this process. A virtual address generated by the CPU is split into two parts: a virtual page number (VPN) and an offset. The offset indicates the location of the desired byte within the page (e.g., for a 4 KiB page, the offset is 12 bits, as 2^12 = 4096). The VPN is used as an index into the page table. The page table entry (PTE) found at that index contains the physical frame number (PFN) where the page is stored in RAM. The PTE also contains several important control bits, such as:
- Present/Valid bit: Indicates whether this page is currently in physical memory.
- Read/Write bit: Specifies whether the page can be written to or is read-only.
- User/Supervisor bit: Determines if the page can be accessed by user-level processes or only by the kernel.
- Dirty bit: Set by the hardware when a write to the page occurs. This is useful for knowing if the page needs to be saved back to disk.
- Accessed bit: Set by the hardware when the page is read or written. The OS can use this to determine which pages are actively being used.
```mermaid
flowchart TD
    subgraph CPU
        A[Virtual Address<br>e.g., 0x400500]
    end
    subgraph MMU
        B{Split Address}
        C[VPN: Virtual<br>Page Number]
        D[Offset]
        E{Check TLB for VPN}
        F[TLB Hit]
        G[TLB Miss]
        H(Page Table Walk)
        I[PTE: Page<br>Table Entry]
        J[PFN: Physical<br>Frame Number]
        K{Combine PFN + Offset}
    end
    subgraph "System RAM"
        L[Page Tables]
        M[Physical Memory Frame]
    end

    A --> B
    B --> C
    B --> D
    C --> E
    E -- Yes --> F
    E -- No --> G
    G --> H
    H --> L
    L --> I
    I --> J
    J --> K
    F --> J
    K --> M
    D --> K

    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff
    classDef system fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff
    classDef check fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff

    class A,M primary
    class B,H,K process
    class E decision
    class F,J success
    class G,I,C,D check
    class L system
```
Because a single-level page table for a 64-bit address space would be astronomically large, modern systems use multi-level page tables. In this scheme, the virtual page number is further subdivided. For example, in a four-level paging architecture (as used by x86-64 and conceptually similar on ARM), the VPN is split into four parts, each serving as an index into a different level of the page table hierarchy. This creates a tree-like structure. A top-level directory is used to find a page middle directory, which points to a page table, which finally contains the PTE with the physical frame number. This hierarchical approach saves a tremendous amount of space, as entire branches of the tree for unused portions of the address space do not need to be allocated at all.
The Page Fault: Not Always an Error
The term “fault” often has a negative connotation, but a page fault is a normal and essential part of how virtual memory works. It is simply a signal from the MMU to the kernel that it needs help. A page fault occurs when the MMU attempts to translate a virtual address but finds that the corresponding page table entry is marked as invalid (i.e., the present bit is clear).
This can happen for several reasons. A common, non-error case is demand paging. When you start a program, the kernel doesn’t load the entire executable file into memory at once. That would be slow and wasteful, as most programs don’t use all their code immediately. Instead, the kernel sets up the process’s page tables but marks all pages as not present. The first time the process tries to execute code in a particular page, the MMU triggers a page fault. The kernel’s page fault handler then inspects the faulting address, determines that this is a legitimate access to a page that simply hasn’t been loaded yet, finds the page’s content in the executable file on disk, allocates a physical frame, loads the data into it, updates the page table entry to point to the new frame and set the present bit, and finally resumes the process. From the process’s perspective, the instruction simply took a little longer to execute.
Another reason for a page fault is swapping. If the system runs low on physical memory, the kernel may decide to move an inactive page from RAM to a special area on the disk called the swap space. The page table entry is then marked as not present. If the process later tries to access that page, a page fault occurs. The kernel’s handler sees that the page exists in the swap space, brings it back into a physical frame (potentially swapping another page out to make room), updates the page table, and resumes the process. This mechanism allows the system to run more programs than can fit into physical RAM, though it comes at a performance cost due to the slowness of disk I/O.
Of course, a page fault can also indicate a genuine error. If a program tries to access a virtual address that is not part of any valid memory region (e.g., dereferencing a `NULL` pointer or accessing an out-of-bounds array element), the page fault handler will find no valid source for the data. In this case, it will terminate the process by sending it a segmentation fault signal (`SIGSEGV`).
```mermaid
flowchart TD
    A[MMU detects invalid PTE<br>for a virtual address]
    B(Triggers Page Fault Exception)
    C{Kernel Page Fault Handler Takes Over}
    D{Is the address<br>in a valid memory region<br>for this process?}

    subgraph "Valid Access (Not an Error)"
        E{Is the page in swap space?}
        F["Find page on disk<br>(swap partition)"]
        G[Allocate a free physical frame]
        H[Swap page from disk into frame]
        I["Update Page Table Entry (PTE)<br>with new frame number, set Present bit"]
        J[Resume Process Execution]
    end
    subgraph "Invalid Access (Error)"
        K[Send SIGSEGV signal<br>to the process]
        L["Process Terminates<br>'Segmentation fault'"]
    end
    M{"Is this the first access?<br>(Demand Paging)"}
    N[Find page in executable file on disk]

    A --> B
    B --> C
    C --> D
    D -- No --> K
    D -- Yes --> E
    E -- Yes --> F
    E -- No --> M
    M -- Yes --> N
    M -- No --> G
    F --> G
    N --> G
    G --> H
    H --> I
    I --> J
    K --> L

    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff
    classDef system fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff
    classDef check fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff

    class A,B check
    class C,L,J system
    class D,E,M decision
    class F,G,H,I,K,N process
```
Optimizing Performance: The Translation Lookaside Buffer (TLB)
The process of walking a multi-level page table for every single memory access would be prohibitively slow. Each level of the page table is itself in memory, so a four-level walk could require four separate memory reads just to find the physical address, before the actual desired data can be fetched. Given that modern CPUs execute billions of instructions per second, many of which access memory, this overhead is unacceptable.
To solve this, MMUs include a small, very fast hardware cache called the Translation Lookaside Buffer (TLB). The TLB stores recently used virtual-to-physical address mappings. When the MMU gets a virtual address, it first checks the TLB. If it finds a matching entry (a TLB hit), the physical address is retrieved directly from the TLB in a single clock cycle, and the slow page table walk is avoided. If the entry is not in the TLB (a TLB miss), the hardware or the OS must perform the full page table walk. The resulting translation is then stored in the TLB, likely evicting another entry.
Because programs exhibit locality of reference—they tend to access the same memory locations (temporal locality) or nearby locations (spatial locality) repeatedly—the TLB is extremely effective. TLB hit rates are often above 99%, meaning the performance penalty of page table walks is paid only rarely. The TLB is a critical component that makes the virtual memory abstraction practical from a performance standpoint.
Practical Examples
Theory provides the foundation, but true understanding comes from seeing these concepts in action. In this section, we will use the Raspberry Pi 5 to explore the Linux memory model. We will use standard command-line tools and simple C programs to observe how a process’s virtual address space is structured and managed by the kernel.
Inspecting the System’s Memory
First, let’s get a high-level view of the memory on our Raspberry Pi. The `free` command provides a quick summary of physical memory and swap usage.

```bash
# Connect to your Raspberry Pi 5 via SSH
ssh pi@<raspberrypi_ip>

# Run the free command with the -h (human-readable) flag
free -h
```

You will see output similar to this:

```
               total        used        free      shared  buff/cache   available
Mem:           4.0Gi       1.0Gi       1.7Gi        75Mi       1.4Gi       3.0Gi
Swap:          511Mi          0B       511Mi
```
Let’s break down this output:
- total: The total amount of physical RAM installed.
- used: Memory currently in use by processes.
- free: Memory that is completely unused.
- shared: Memory used by `tmpfs` (a temporary file system in RAM).
- buff/cache: This is a key value. Linux uses free RAM to cache data from the disk (page cache) and for buffers. This significantly speeds up file I/O. This memory is not “used” in the sense that it’s locked by a process; it can be reclaimed by the kernel if an application needs it.
- available: The kernel’s estimate of how much memory is available for starting new applications without swapping—roughly `free` plus the reclaimable portion of `buff/cache`. This is often the most useful number to look at.
Tip: Don’t be alarmed if the `free` memory value is low on a running system. A good operating system uses idle memory for caching to improve performance. The `available` column gives a much better picture of the system’s memory health.
Exploring a Process’s Virtual Address Space
Now, let’s dive into a single process. Every process in Linux has a virtual directory under `/proc/[pid]`, where `[pid]` is the process ID. This directory contains a wealth of information, including several pseudo-files related to memory.
Let’s write a simple C program that allocates some memory and then keeps running so we can inspect it.
memory_explorer.c

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int global_var = 100;   // In .data segment
int global_uninit_var;  // In .bss segment

int main() {
    printf("Welcome to the Memory Explorer!\n");
    printf("My process ID is: %d\n", getpid());

    // Allocate some memory on the heap
    void *heap_ptr = malloc(1024 * 1024); // Allocate 1 MB
    if (heap_ptr == NULL) {
        perror("malloc failed");
        return 1;
    }
    printf("Allocated 1MB on the heap at address: %p\n", heap_ptr);

    // The program will now sleep, allowing us to inspect its memory
    // from another terminal.
    printf("Sleeping for 5 minutes... Find my PID and inspect me!\n");
    sleep(300);

    free(heap_ptr);
    printf("Program finished.\n");
    return 0;
}
```
Build and Run Steps:
- Save the code above as `memory_explorer.c` on your Raspberry Pi.
- Compile it using GCC: `gcc memory_explorer.c -o memory_explorer`
- Run the program: `./memory_explorer`

The program will print its process ID (PID). Note this PID. Now, open a second SSH terminal to your Raspberry Pi. In this new terminal, we will use the PID to inspect the running program. Let’s assume the PID is `24501`.
Using `/proc/[pid]/maps`

The maps file shows the memory mappings for the process. It reveals how the virtual address space is laid out.

```bash
# In the second terminal
cat /proc/24501/maps
```

The output will be detailed, and the exact addresses will differ on your system: modern executables are position-independent, and the kernel randomizes the layout (ASLR). Still, the structure is the same, so let’s look at the key parts:

```
# Address Range     Perms Offset   Dev   Inode Pathname
00400000-00401000   r-xp  00000000 b3:02 12345 /home/pi/memory_explorer  # Text (code) segment
00401000-00402000   r--p  00001000 b3:02 12345 /home/pi/memory_explorer  # Read-only data
00402000-00403000   rw-p  00002000 b3:02 12345 /home/pi/memory_explorer  # .data and .bss segments
00503000-00604000   rw-p  00000000 00:00 0     [heap]                    # The heap
7f...-7f...         rw-p  00000000 00:00 0     [stack]                   # The stack
... (many more lines for shared libraries)
```
Let’s decipher this:
- First column: The range of virtual addresses for this segment.
- Second column (Perms): Permissions: `r` (read), `w` (write), `x` (execute), `p` (private). The `r-xp` permission for the code segment means it can be read and executed, but not written to—a crucial security feature.
- Pathname: The file backing this memory mapping. Our executable file backs the code and data segments. The heap and stack are anonymous (not backed by a file).

You can clearly see the different segments of our program laid out in virtual memory: the executable code (`.text`), the initialized global variables (`.data`), the heap where our `malloc`’d memory lives, and the stack used for local variables and function calls.
Using pmap

The pmap (process map) command provides a more summarized and sometimes more readable view of the same information.

```bash
# In the second terminal
pmap -x 24501
```

The `-x` flag gives extended details. The output will show the address, size, permissions, and mapping for each segment. You will see the 1024K (1MB) allocation we made with `malloc` clearly listed—either within the `[heap]` mapping or as a separate anonymous mapping, since glibc serves large requests (above roughly 128 KiB by default) with `mmap` rather than by growing the heap.
Observing a Page Fault
We can’t easily trigger a demand paging fault on cue, as the kernel handles it transparently. However, we can easily trigger the “error” type of page fault: a segmentation fault.
segfault_demo.c

```c
#include <stdio.h>

int main() {
    // Create a null pointer.
    int *ptr = NULL;

    printf("About to cause a segmentation fault...\n");

    // Attempt to write to the memory location pointed to by ptr.
    // Address 0 is never a valid user-space address to write to.
    // The MMU will detect this and trigger a page fault.
    // The kernel's fault handler will see it's an invalid access
    // and send a SIGSEGV signal to the process.
    *ptr = 42;

    // This line will never be reached.
    printf("This will not be printed.\n");
    return 0;
}
```
Build and Run:

```bash
gcc segfault_demo.c -o segfault_demo
./segfault_demo
```

Expected Output:

```
About to cause a segmentation fault...
Segmentation fault
```
The “Segmentation fault” message is the shell’s way of telling you that the process was terminated by a `SIGSEGV` signal. This happened because our code tried to access an invalid virtual address (`NULL`). The MMU could not find a valid page table entry for this address, triggered a page fault, and the kernel determined the access was illegal, leading it to terminate the program. This is a perfect example of the memory protection provided by the virtual memory system.
Common Mistakes & Troubleshooting
Navigating memory management can be tricky. Here are some common pitfalls and how to handle them.
- Dereferencing `NULL` or uninitialized pointers: the most common cause of segmentation faults. Initialize pointers, and always check the return value of `malloc` before using it.
- Declaring very large local variables: the stack is small (typically a few megabytes per thread), so large buffers belong on the heap via `malloc`, not on the stack.
- Memory leaks: memory allocated with `malloc` but never passed to `free` accumulates over time; on a long-running embedded system this eventually causes swapping or an out-of-memory condition. Watch a process’s `VmRSS` over time to spot leaks.
- Misreading `free` output: low “free” memory is normal on Linux because the kernel uses idle RAM for caching. Judge memory health by the “available” column instead.
- Ignoring the OOM killer: if a process disappears unexpectedly under memory pressure, check `dmesg` for “Out of memory” messages before suspecting an ordinary crash.
Exercises
- Exploring Process Memory with `/proc`:
  - Objective: To become familiar with the information available in the `/proc` filesystem.
  - Steps:
    - Run any long-running command, like `top` or `sleep 100`.
    - Find its PID using `pgrep top` or `pgrep sleep`.
    - Navigate to its `/proc/[pid]` directory.
    - Examine the contents of the `maps`, `smaps`, and `status` files.
  - Verification: In the `status` file, find the `VmRSS` (Resident Set Size) line. Compare this value to the RSS column for that process in the `top` command. They should be very similar. In the `smaps` file, observe the detailed breakdown of memory usage for each mapping, including its RSS.
- The Heap vs. The Stack:
  - Objective: To practically observe the difference between heap and stack allocations.
  - Steps:
    - Write a C program that declares a large array as a local variable inside `main()` (e.g., `int stack_array[2000000];`) and writes to its elements. Compile and run it. What happens? (Hint: likely a segmentation fault due to stack overflow.)
    - Modify the program to allocate the same amount of memory dynamically on the heap using `malloc()` (e.g., `int *heap_array = malloc(sizeof(int) * 2000000);`).
    - Run the modified program in the background (`./my_program &`).
    - Use `pmap -x [pid]` to inspect its memory map.
  - Verification: In the first case, the program should crash. In the second case, the program should run successfully, and the `pmap` output should show a large allocation in the `[heap]` segment or as a separate anonymous mapping (glibc serves large requests with `mmap`).
- Monitoring Memory Pressure:
  - Objective: To simulate a low-memory situation and observe the system’s response.
  - Steps:
    - Open two terminals to your Raspberry Pi.
    - In the first terminal, run the command `vmstat 1`. This will print a new line of system statistics every second. Pay attention to the `swpd` (swapped), `free`, `buff`, `cache`, and `si`/`so` (swap in/swap out) columns.
    - In the second terminal, run the `memory_explorer` program from our earlier example, but modify it to allocate a very large amount of memory in a loop (e.g., allocate 100MB every second).
  - Verification: As the memory-hungry program runs, watch the `vmstat` output. You will see the `free` memory decrease, then the `cache` will shrink as the kernel reclaims it. Eventually, if you push it far enough, you may see the `so` (swap out) column become non-zero as the system starts swapping. Finally, the OOM killer might terminate your program. Check `dmesg` afterward to confirm.
Summary
This chapter provided a deep dive into the critical mechanisms of virtual memory management in Linux. We have moved from the theoretical underpinnings to practical, hands-on exploration.
- Virtual Memory is an Abstraction: It provides each process with a private, large, and linear address space, independent of physical RAM. This is crucial for memory protection, ease of programming, and multitasking.
- The MMU is the Hardware Translator: The Memory Management Unit translates virtual addresses to physical addresses using page tables.
- Paging is the Core Mechanism: Memory is divided into fixed-size pages (virtual) and frames (physical). Page tables, managed by the kernel, map pages to frames.
- Page Faults are Signals to the Kernel: They are not always errors. They are used to implement demand paging (loading from disk) and swapping, enabling the system to run programs larger than physical RAM.
- The TLB Makes it Fast: The Translation Lookaside Buffer is a hardware cache for address translations that mitigates the performance overhead of page table walks.
- Linux Provides Powerful Tools: Utilities like `free`, `pmap`, `vmstat`, and the `/proc` filesystem allow developers to inspect and monitor the memory behavior of the system and individual processes.
- Understanding Memory is Key to Stability: In embedded systems, correctly managing memory, avoiding leaks, and understanding the OOM killer are essential for building reliable, long-running applications.
Further Reading
- Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet & Marco Cesati. (Specifically, Chapter 8: Memory Management). While slightly dated, its explanation of the core concepts is exceptional.
- Linux Kernel Documentation: The `proc` Filesystem. Available within the kernel source tree or online at https://www.kernel.org/doc/html/latest/filesystems/proc.html. The authoritative source for what these files contain.
. The authoritative source for what these files contain. - What Every Programmer Should Know About Memory by Ulrich Drepper. A comprehensive and deep paper covering everything from RAM hardware to CPU caches and OS memory handling. https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
- The ARM Architecture Reference Manual for ARMv8-A. For those wanting to go deeper, this official document from ARM details the specifics of the MMU, page table entry formats, and TLB management for the architecture used in the Raspberry Pi 5.
- “How The Kernel Manages Your Memory” – An article on LWN.net. LWN.net frequently has high-quality, in-depth articles on kernel internals. Search their archives for memory management topics.
- Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. (Specifically, the chapters on Virtualization). An excellent and very readable academic textbook on OS concepts.