Chapter 54: Advanced File I/O: Memory-Mapped Files (mmap, munmap)

Chapter Objectives

Upon completing this chapter, you will be able to:

  • Understand the fundamental concepts of virtual memory and how memory-mapped I/O functions within the Linux kernel.
  • Explain the differences between traditional file I/O (read/write) and memory-mapped I/O (mmap), and identify scenarios where mmap is the superior choice.
  • Implement C programs on a Raspberry Pi 5 that use mmap to map files into a process’s address space for both reading and writing.
  • Configure memory mappings using different protection (PROT_*) and visibility (MAP_*) flags to control access and sharing behavior.
  • Debug common issues related to memory-mapped files, such as segmentation faults, bus errors, and synchronization problems.
  • Apply memory-mapped techniques to solve practical data-sharing and persistence problems in embedded systems.

Introduction

Performance is paramount in embedded Linux. Systems are often constrained by processing power and memory, yet are required to handle large volumes of data efficiently. Traditional file I/O, which relies on the read() and write() system calls, has served as the bedrock of data handling for decades. This model, however, introduces a layer of overhead. Data must be copied from the kernel’s page cache into a user-space buffer for a read(), and back again for a write(). For large files or performance-critical applications, this constant data movement between kernel and user space can become a significant bottleneck.

This chapter introduces a more elegant and powerful alternative: memory-mapped I/O. Using the mmap() system call, a process can ask the kernel to map a file directly into its virtual address space. Once mapped, the file can be accessed just like an array in memory, using simple pointer arithmetic. The tedious cycle of read(), write(), and lseek() is replaced by direct memory access. This is not merely a convenience; it is a fundamental shift in how we interact with data. The kernel handles the loading of file pages into physical memory on demand, a process known as demand paging. This lazy loading mechanism, combined with the elimination of explicit data copies, can yield dramatic performance improvements.
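
To make the contrast concrete, here is a minimal sketch that sums the bytes of a file twice: once with a read() loop and once through a mapping. The function names sum_with_read and sum_with_mmap are hypothetical helpers invented for this illustration, and error handling is kept deliberately short.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of a file using read() and a user-space buffer.
 * Returns -1 on error. (Hypothetical helper for illustration.) */
long sum_with_read(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* Each read() copied these bytes from the page cache into buf. */
        for (ssize_t i = 0; i < n; i++) sum += buf[i];
    }
    close(fd);
    return (n < 0) ? -1 : sum;
}

/* Sum every byte of a file by mapping it and indexing it like an array.
 * Returns -1 on error. (Hypothetical helper for illustration.) */
long sum_with_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* a zero-length mmap() fails */

    const unsigned char *p =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED) return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++) sum += p[i];  /* plain memory access */

    munmap((void *)p, (size_t)st.st_size);
    return sum;
}
```

Both functions should return the same total for the same file. The difference is that the mmap version contains no buffer of its own and no explicit copy step.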

sequenceDiagram
    actor User as User Application
    participant Kernel
    participant Disk

    %% Traditional Read Path
    rect rgb(240, 240, 240)
        Note over User,Disk: Traditional I/O: read()
        User->>Kernel: read(fd, buffer, count)
        activate Kernel
        Note over Kernel,Disk: Kernel copies data from<br/>Disk to Page Cache
        Kernel->>Disk: Request data block
        Disk->>Kernel: Return data block
        Note over User,Kernel: Double Copy:<br/>1. Disk to Kernel Cache<br/>2. Kernel Cache to User Buffer
        Kernel->>User: Copy data to user buffer
        deactivate Kernel
    end

    %% mmap Path
    rect rgb(230, 245, 255)
        Note over User,Disk: Memory-Mapped I/O: mmap()
        User->>Kernel: mmap(fd, ...)
        activate Kernel
        Note over Kernel: Kernel sets up page table entries<br/>No data is copied yet
        Kernel->>User: Return pointer to mapped region
        deactivate Kernel

        User->>User: Access *ptr
        Note over User,Disk: Page Fault on first access<br/>Kernel handles loading from disk directly<br/>into a shared frame. No extra copy to user space
        activate Kernel
        Kernel->>Disk: Request data block (on-demand)
        Disk->>Kernel: Return data block to page cache
        deactivate Kernel
    end

Real-world applications of mmap are widespread and impactful. High-performance databases use it to manage large data files, dynamic linkers use it to load shared libraries into memory, and scientific applications use it to process massive datasets that would otherwise not fit in physical RAM. In this chapter, you will move beyond the theory and gain hands-on experience. Using your Raspberry Pi 5, you will learn to map files, manipulate their contents directly in memory, and understand the subtle but crucial differences between shared and private mappings. By the end, you will have a powerful new tool in your system programming arsenal, enabling you to build more efficient and sophisticated embedded applications.

Technical Background

To truly appreciate the power and elegance of mmap, one must first have a solid understanding of the virtual memory system that underpins modern operating systems like Linux. Every process running on the system operates within its own private, virtual address space, a conceptual sandbox that isolates it from other processes and the kernel itself. This address space is a contiguous range of memory addresses, typically from zero up to a very large number, that the process can use. However, these virtual addresses are not the same as the physical addresses of the RAM chips in the computer.

The magic of translation is handled by a collaboration between the operating system’s kernel and the CPU’s Memory Management Unit (MMU). The MMU is a piece of hardware that translates virtual addresses generated by the CPU into physical addresses in RAM. The kernel maintains a set of tables, called page tables, for each process. These tables store the mappings between the process’s virtual pages (chunks of virtual memory) and the physical frames (chunks of physical RAM) they correspond to. When a process accesses a memory location, the MMU uses these page tables to find the correct physical location.

This architecture is what makes mmap possible. The mmap() system call is essentially a request to the kernel to create a new mapping in the calling process’s virtual address space. Instead of backing a virtual page with an anonymous frame of physical RAM (as is the case for normal program memory allocated by malloc), mmap backs it with a specific portion of a file on disk.

When a process first calls mmap to map a file, the kernel does not immediately read the entire file into memory. It simply records the new mapping in its virtual memory structures (a virtual memory area, or VMA) and leaves the corresponding page table entries empty. The actual loading of data is deferred until the process attempts to access a memory address within the mapped region. The first time this happens, the MMU will find no valid physical memory mapping for that virtual address and will trigger a hardware exception called a page fault.

This fault is not an error in the traditional sense. It is a signal to the kernel that it needs to intervene. The kernel’s page fault handler inspects the address that caused the fault, determines that it belongs to a memory-mapped region, and identifies which part of the file corresponds to the requested page. It then allocates a physical frame of RAM, reads the relevant data from the file on disk into that frame, and updates the process’s page table to map the virtual page to the newly loaded physical frame. Finally, it resumes the process, which can now access the memory location as if it had been in RAM all along. This entire process is transparent to the application and is known as demand paging.
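
One way to observe demand paging from user space is to count page faults around the first touch of a mapping. The sketch below assumes the fault counters that Linux reports through getrusage(); faults_when_touching is a hypothetical helper name. The exact count varies from run to run, and it can be lower than the number of pages touched because the kernel may map several neighbouring pages while handling a single fault.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Total page faults (minor + major) taken by this process so far. */
static long faults_so_far(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == -1) return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Map `path` read-only, read one byte from every page, and return how many
 * page faults that caused. Returns -1 on error or for an empty file.
 * (Hypothetical helper for illustration.) */
long faults_when_touching(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;

    /* mmap() only sets up the mapping; no file data is loaded here. */
    const volatile unsigned char *p =
        mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long before = faults_so_far();
    unsigned char sink = 0;
    for (size_t off = 0; off < len; off += page) {
        sink ^= p[off];   /* the first access to an unmapped page faults */
    }
    long after = faults_so_far();
    (void)sink;

    munmap((void *)p, len);
    return (before == -1 || after == -1) ? -1 : after - before;
}
```

Calling the function on a file that spans many pages and printing the result gives a rough feel for how much work the kernel defers until the data is actually used.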

flowchart TD
    subgraph Process Execution
        A["Start: Process calls mmap()"]
        B{Access memory in<br>mapped region?}
        C[Pointer dereference: `*ptr`]
    end

    subgraph Kernel & MMU Interaction
        D{Page in<br>Physical RAM?}
        E[MMU triggers<br><b>Page Fault</b>]
        F[Kernel Page Fault Handler]
        G[Find VMA for address]
        H[Locate corresponding<br>block in file]
        I[Read file block<br>from disk into a<br>free RAM page]
        J[Update Page Table:<br>Map virtual page to<br>new physical RAM page]
    end

    subgraph Result
        K[Access Granted:<br>Data is returned to process]
        L[Resume Process Execution]
    end

    A --> B;
    B -- No --> B;
    B -- Yes --> C;
    C --> D;
    D -- Yes --> K;
    D -- No --> E;
    E --> F;
    F --> G;
    G --> H;
    H --> I;
    I --> J;
    J --> L;
    K --> L;
    L --> B;

    %% Styling
    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff;
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff;
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff;
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff;
    classDef check fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff;
    classDef kernel fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff;

    class A primary;
    class B,D decision;
    class C,I,J,L process;
    class E check;
    class F,G,H kernel;
    class K success;

The mmap System Call

The mmap system call is the core of this mechanism. Its prototype, found in <sys/mman.h>, is as follows:

C
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Let’s break down these arguments in detail.

  • void *addr: This argument is a hint to the kernel about where to place the mapping in the virtual address space. In modern practice, you should almost always pass NULL for this argument. This lets the kernel choose a suitable, available address, which is far more portable and reliable than trying to manage the address layout yourself.
  • size_t length: This specifies the number of bytes to be mapped, starting from offset. It does not need to be the entire file, allowing you to map just a specific segment of a larger file.
  • int prot: This argument controls the memory protection of the mapping and is crucial for security and correctness. It is a bitmask created by OR-ing together several flags:
    • PROT_READ: The pages can be read.
    • PROT_WRITE: The pages can be written. Attempting to write to a mapping without this flag will result in a segmentation fault.
    • PROT_EXEC: The pages can be executed. This is used by dynamic loaders for shared libraries but should be used with extreme caution in general applications due to security implications (e.g., enabling buffer overflow attacks).
    • PROT_NONE: The pages cannot be accessed at all.
  • int flags: This argument specifies the type of mapping and other options. The most important choice is between MAP_SHARED and MAP_PRIVATE.
    • MAP_SHARED: This is the key to sharing data. If a process writes to a region mapped with MAP_SHARED, the modification is carried back to the underlying file on disk. Furthermore, this change becomes visible to any other process that has also mapped the same file with MAP_SHARED. This is a highly efficient mechanism for inter-process communication (IPC).
    • MAP_PRIVATE: This creates a copy-on-write (COW) mapping. When the process reads from the mapping, it sees the file’s contents. However, the first time the process attempts to write to the mapping, the kernel intercepts the action. It creates a private copy of the modified page in RAM, and the process’s page table is updated to point to this new private copy. All subsequent writes go to this private copy. The original file on disk is never changed, and these changes are not visible to any other process. This is useful when you want to work with a file’s data as a starting template without modifying the original.

| Feature | MAP_SHARED | MAP_PRIVATE |
| --- | --- | --- |
| Modification Behavior | Writes to the memory region are propagated to the underlying file and become visible to other processes sharing the same mapping. | Writes create a private “copy-on-write” (COW) page. Changes are visible only to the process that made them and do not affect the original file. |
| Use Case | High-performance Inter-Process Communication (IPC), persistent data stores (databases), direct file editing. | Working with a file’s data as a template without modifying the original, loading shared libraries (allows relocation without altering the library file). |
| Disk I/O | Modifications will eventually be written to disk. Use msync() for explicit synchronization. | The original file on disk is never modified by the mapping. |
| Performance | Extremely efficient for sharing data between processes, as it avoids kernel-to-user copies. | Very efficient for reading. A performance cost is incurred on the first write to a page, as the kernel must create a private copy. |
| Analogy | Editing a shared Google Doc. Everyone sees the changes in real-time. | Making a personal copy of a Google Doc. Your edits don’t affect the original document. |

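  • int fd: This is the file descriptor of the open file you wish to map. The file must be opened with permissions compatible with the prot and flags arguments. The descriptor must always be open for reading, and a writable shared mapping (PROT_WRITE with MAP_SHARED) additionally requires write access (e.g., O_RDWR); otherwise mmap fails with EACCES. A MAP_PRIVATE mapping may be writable even if the file was opened with O_RDONLY, because its writes never reach the file.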
  • off_t offset: This is the starting offset in the file from where the mapping should begin. A critical requirement is that this offset must be a multiple of the system’s page size. The page size is a fundamental unit of memory management. On many systems it is 4096 bytes (4 KiB), but you should never hard-code it: the default 64-bit Raspberry Pi OS kernel for the Raspberry Pi 5, for example, uses 16 KiB pages. Always retrieve the value programmatically using sysconf(_SC_PAGESIZE).
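
The sharing behavior of MAP_SHARED can be sketched with a parent and a child process. This example assumes that a child created with fork() inherits the parent's mapping, which is the simplest way to get two processes looking at the same file pages. The function name share_int_via_file is hypothetical, and waitpid() stands in for the real synchronization that a production design would need.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes `value` through a MAP_SHARED mapping; the parent then
 * reads it back through the same mapping and returns what it saw.
 * Returns -1 on error, so avoid -1 as a test value.
 * (Hypothetical helper for illustration.) */
int share_int_via_file(const char *path, int value) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;

    /* The file must be large enough to back the bytes we intend to touch. */
    if (ftruncate(fd, sizeof(int)) == -1) { close(fd); return -1; }

    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) { munmap(shared, sizeof(int)); return -1; }
    if (pid == 0) {
        *shared = value;   /* child: an ordinary store into shared memory */
        _exit(0);
    }

    /* Parent: wait for the child so that its write is known to be finished. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        munmap(shared, sizeof(int));
        return -1;
    }

    int seen = *shared;    /* the child's write is visible here */
    munmap(shared, sizeof(int));
    return seen;
}
```

If the mapping were created with MAP_PRIVATE instead, the child's store would land in its own copy-on-write page and the parent would still read 0.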

As a summary:

| Argument | Type | Description | Common Value / Notes |
| --- | --- | --- | --- |
| addr | void * | A hint to the kernel for the starting virtual address of the mapping. | Almost always pass NULL to let the kernel choose a suitable, available address. This enhances portability. |
| length | size_t | The number of bytes to map from the file into memory. | Typically the file size obtained via fstat(). Can be smaller if mapping only a portion of the file. |
| prot | int | Desired memory protection for the mapped region. This is a bitmask. | Combine flags like PROT_READ, PROT_WRITE, and PROT_EXEC using the bitwise OR operator. Must be compatible with the file’s open permissions. |
| flags | int | Specifies the type of mapping and various options. | The most critical choice is between MAP_SHARED (changes affect the file) and MAP_PRIVATE (copy-on-write). |
| fd | int | The file descriptor of the file to be mapped. | Must be a valid file descriptor from a successful open() call. For a writable shared map, the file must be opened with write permissions (e.g., O_RDWR). |
| offset | off_t | The offset in the file where the mapping should begin. | Crucial: This value must be a multiple of the system’s page size (use sysconf(_SC_PAGESIZE) to get this value). An invalid offset is a common cause of EINVAL errors. |

Upon success, mmap returns a pointer to the start of the mapped memory region. On failure, it returns MAP_FAILED, which is a macro for (void *) -1, and errno is set to indicate the error.
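
The alignment rule and the MAP_FAILED check can be combined in one small helper. The following is a sketch, not a definitive implementation: struct file_window and map_window are hypothetical names. The idea is to round the requested offset down to a page boundary, map slightly more than was asked for, and hand back a pointer that is adjusted to the byte the caller actually wanted.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Describes a read-only view of part of a file. (Hypothetical type.) */
struct file_window {
    void       *base;     /* address returned by mmap(); pass to munmap() */
    size_t      map_len;  /* length given to mmap(); pass to munmap()     */
    const char *data;     /* points at the byte at the requested offset   */
};

/* Map `length` bytes of `path` starting at an arbitrary `offset`.
 * Returns 0 on success, -1 on failure with errno set. */
int map_window(const char *path, off_t offset, size_t length,
               struct file_window *w) {
    if (offset < 0 || length == 0) { errno = EINVAL; return -1; }

    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* round down to a page boundary */
    size_t delta = (size_t)(offset - aligned);  /* bytes we had to step back */

    void *base = mmap(NULL, length + delta, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED) {       /* compare against MAP_FAILED, never NULL */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);

    w->base = base;
    w->map_len = length + delta;
    w->data = (const char *)base + delta;
    return 0;
}
```

The caller reads through w->data and later releases the view with munmap(w->base, w->map_len). Keeping base and map_len separate from data matters because munmap() needs the page-aligned address, not the adjusted pointer.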

Cleaning Up with munmap and msync

A mapping created with mmap persists until the process terminates or until it is explicitly removed with the munmap() system call. It is essential to clean up mappings to release the virtual address space they occupy. The prototype is simple:

C
int munmap(void *addr, size_t length);

Here, addr is the starting address returned by mmap, and length is the size of the mapping. A common source of bugs is a mismatch between the length provided to mmap and munmap. It’s safest to always unmap the exact same size that was originally mapped.
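
One simple way to avoid a length mismatch is to store the address and the length together and let a single function do the unmapping. The sketch below illustrates that idea; struct mapping, mapping_open, and mapping_close are hypothetical names chosen for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Keeps the two values munmap() needs in one place. (Hypothetical type.) */
struct mapping {
    void  *addr;  /* start address returned by mmap(), or NULL if unmapped */
    size_t len;   /* exact length that was passed to mmap()                */
};

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
int mapping_open(const char *path, struct mapping *m) {
    m->addr = NULL;
    m->len = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) { close(fd); return -1; }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (addr == MAP_FAILED) return -1;

    m->addr = addr;
    m->len = (size_t)st.st_size;
    return 0;
}

/* Unmap with the recorded length. Safe to call on an already closed mapping. */
int mapping_close(struct mapping *m) {
    if (m->addr == NULL) return 0;
    int rc = munmap(m->addr, m->len);
    m->addr = NULL;   /* guards against a second munmap() of the same range */
    m->len = 0;
    return rc;
}
```

Resetting the fields in mapping_close makes an accidental second call harmless, which is convenient in error-handling paths that clean up unconditionally.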

For MAP_SHARED mappings, modifications made to the memory region are not guaranteed to be written to the underlying file immediately. The kernel may buffer these changes in memory for efficiency. To explicitly control when data is written to disk, you can use the msync() system call:

C
int msync(void *addr, size_t length, int flags);

The flags argument controls the synchronization behavior:

| Flag | Behavior | Use Case |
| --- | --- | --- |
| MS_SYNC | Synchronous Write: The system call blocks (waits) until all the modified data in the specified memory range has been physically written to the underlying disk. | Maximum Data Integrity. Use this when you need to be certain data is saved before proceeding (e.g., committing a database transaction, saving a critical configuration file). |
| MS_ASYNC | Asynchronous Write: The system call initiates the write operation but returns immediately, without waiting for the I/O to complete. The kernel will write the data to disk at its convenience. | Higher Performance. Use this when responsiveness is more important than immediate on-disk persistence. Good for autosave features where losing a few seconds of data in a crash is acceptable. |
| MS_INVALIDATE | Asks the kernel to invalidate other processes’ cached copies of the same file data. When those processes next access the data, they will be forced to re-read it from storage. | Advanced/Niche. Rarely used in typical applications. Primarily relevant for clustered or distributed file systems to ensure cache coherency across multiple machines. |

Using msync is critical in applications where data integrity is paramount, such as a database system. After a successful msync call with MS_SYNC, the modified data in the given range has been written to the persistent medium, protecting it against a subsequent system crash or power failure. MS_ASYNC gives no such guarantee at the moment it returns.
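
msync() does not have to cover the whole mapping, but its address argument must be page-aligned. The sketch below shows one way to flush only the bytes that were just changed. flush_range and update_record are hypothetical names, and the file is assumed to be a simple array of bytes that already has its final size.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Flush bytes [off, off + len) of a MAP_SHARED mapping that starts at `base`.
 * `base` must be the page-aligned address returned by mmap(). */
int flush_range(void *base, size_t off, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);   /* msync() needs an aligned address */
    return msync((char *)base + start, (off - start) + len, MS_SYNC);
}

/* Overwrite `n` bytes at offset `off` of an existing file through a shared
 * mapping, then block until they are on disk. Returns 0 on success, -1 on
 * failure or if the range does not fit inside the file. */
int update_record(const char *path, size_t off, const void *bytes, size_t n) {
    int fd = open(path, O_RDWR);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || n == 0 || n > (size_t)st.st_size ||
        off > (size_t)st.st_size - n) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    char *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED) return -1;

    memcpy(base + off, bytes, n);          /* modify the file through memory */
    int rc = flush_range(base, off, n);    /* MS_SYNC: wait for the write    */

    if (munmap(base, size) == -1) rc = -1;
    return rc;
}
```

Flushing a narrow range keeps the blocking MS_SYNC call short when only a small record in a large file has changed.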

Practical Examples

The following examples are designed to be compiled and run on a Raspberry Pi 5 running Raspberry Pi OS or a similar Linux distribution. You will need the standard C development tools (gcc, make).

Example 1: Basic File Editing with mmap

This first example demonstrates the fundamental use of mmap to read from and write to a file. We will create a text file, map it into memory, modify its contents via a pointer, and then verify that the underlying file has changed. This showcases the power of MAP_SHARED.

Build and Configuration Steps

1. Create a test file. On your Raspberry Pi’s terminal, create a simple text file that we will manipulate.

Bash
echo "Hello Embedded Linux World!" > mmap_test.txt

2. Create the C source file. Using a text editor like nano or vim, create a file named mmap_editor.c.

Bash
nano mmap_editor.c

Code Snippet

Copy and paste the following C code into mmap_editor.c. The comments explain each step of the process.

C
// mmap_editor.c
// A simple program to demonstrate file editing using mmap.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>      // For open(), O_RDWR
#include <unistd.h>     // For close()
#include <sys/mman.h>   // For mmap(), munmap()
#include <sys/stat.h>   // For fstat()

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *filepath = argv[1];
    const char *new_text = "Greetings from Raspberry Pi 5!";

    // 1. Open the file for reading and writing.
    int fd = open(filepath, O_RDWR);
    if (fd == -1) {
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }

    // 2. Get file statistics to determine its size.
    struct stat file_stat;
    if (fstat(fd, &file_stat) == -1) {
        perror("Error getting file size");
        close(fd);
        exit(EXIT_FAILURE);
    }
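    off_t file_size = file_stat.st_size;
    printf("Original file size: %ld bytes\n", (long)file_size);

    // mmap() rejects a zero-length mapping with EINVAL, so refuse empty files.
    if (file_size == 0) {
        fprintf(stderr, "File is empty; nothing to map.\n");
        close(fd);
        exit(EXIT_FAILURE);
    }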

    // 3. Map the file into memory.
    //    - addr = NULL: Let the kernel choose the address.
    //    - length = file_size: Map the entire file.
    //    - prot = PROT_READ | PROT_WRITE: We want to read and write.
    //    - flags = MAP_SHARED: Changes should be written back to the file.
    //    - fd: The file descriptor of our open file.
    //    - offset = 0: Start the mapping from the beginning of the file.
    char *mapped_region = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapped_region == MAP_FAILED) {
        perror("Error mapping file");
        close(fd);
        exit(EXIT_FAILURE);
    }

    // The file descriptor is no longer needed after mmap, so we can close it.
    // The mapping will remain active.
    close(fd); 

    // 4. Interact with the file as if it were a character array in memory.
    //    The mapping is not NUL-terminated, so bound the print by the file size.
    printf("Original file content: %.*s\n", (int)file_size, mapped_region);

    // Overwrite the beginning of the file with our new text.
    // We must be careful not to write past the end of the mapped region (file_size).
    size_t new_text_len = strlen(new_text);
    if (new_text_len > (size_t)file_size) {
        fprintf(stderr, "New text is larger than the file. Truncating.\n");
        new_text_len = (size_t)file_size;
    }
    
    // Use memcpy to safely copy the data.
    memcpy(mapped_region, new_text, new_text_len);
    printf("Modified the mapped memory.\n");

    // 5. Synchronize the changes back to the disk.
    //    This ensures the data is persistently stored.
    if (msync(mapped_region, file_size, MS_SYNC) == -1) {
        perror("Error syncing file to disk");
    } else {
        printf("msync complete. Changes should be on disk.\n");
    }

    // 6. Unmap the memory region.
    //    This is crucial to release the resources.
    if (munmap(mapped_region, file_size) == -1) {
        perror("Error unmapping file");
        // Continue to exit, but report the error.
    }

    printf("Program finished successfully.\n");
    return 0;
}

Build, Flash, and Boot Procedures

This example doesn’t involve flashing a device, as we are running it directly on the Raspberry Pi’s OS.

1. Compile the code. Use gcc to compile the program. The -o flag specifies the name of the output executable.

Bash
gcc mmap_editor.c -o mmap_editor

2. Run the program. Execute the compiled program, passing the mmap_test.txt file as an argument.

Bash
./mmap_editor mmap_test.txt

Expected Output

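The program will print messages indicating its progress. The file is 28 bytes long (27 characters plus the newline that echo appends), so the 30-character replacement string does not fit and the program cuts it to 28 bytes. The blank line in the output comes from the trailing newline in the original content:

Plaintext
Original file size: 28 bytes
Original file content: Hello Embedded Linux World!

New text is larger than the file. Truncating.
Modified the mapped memory.
msync complete. Changes should be on disk.
Program finished successfully.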

Now, check the contents of the file to verify the change:

Bash
cat mmap_test.txt

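The output is the first 28 bytes of the new text. The final "5!" is missing because a mapping cannot grow the file, and the shell prompt follows on the same line because the file’s trailing newline was overwritten (the line below ends with a space):

Plaintext
Greetings from Raspberry Pi 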

This confirms that by simply modifying a memory region through a pointer, we have successfully edited the underlying file on the disk, thanks to the MAP_SHARED flag.

Example 2: MAP_PRIVATE vs. MAP_SHARED

This example clearly illustrates the fundamental difference between MAP_PRIVATE and MAP_SHARED. We will map the same file twice, once with each flag, modify both mappings, and observe the effect on the original file.

Build and Configuration Steps

1. Create a new test file. This example uses its own file, compare.txt, so that it always starts from known content.

Bash
echo "Original Data for Comparison" > compare.txt

2. Create the C source file. Create a new file named mmap_compare.c.

Bash
nano mmap_compare.c

Code Snippet

Copy the following code into mmap_compare.c.

C
// mmap_compare.c
// Demonstrates the difference between MAP_SHARED and MAP_PRIVATE.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

void map_and_modify(const char* filepath, int flags, const char* modification) {
    printf("\n--- Testing with %s ---\n", (flags == MAP_SHARED) ? "MAP_SHARED" : "MAP_PRIVATE");

    int fd = open(filepath, O_RDWR);
    if (fd == -1) {
        perror("open");
        return;
    }

    struct stat file_stat;
    if (fstat(fd, &file_stat) == -1) {
        perror("fstat");
        close(fd);
        return;
    }
    off_t file_size = file_stat.st_size;

    char *map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return;
    }
    close(fd);

    printf("Original content in mapping: '%.*s'\n", (int)file_size, map);

    // Modify the memory. Overwrite only the leading bytes and never write
    // past the end of the mapping. (strncpy would also zero-fill the rest
    // of the region, wiping out the remaining file content.)
    size_t mod_len = strlen(modification);
    if (mod_len > (size_t)file_size) {
        mod_len = (size_t)file_size;
    }
    memcpy(map, modification, mod_len);
    printf("Content after modification:  '%.*s'\n", (int)file_size, map);
    
    // For MAP_SHARED, we sync to ensure the change is written back.
    if (flags == MAP_SHARED) {
        if (msync(map, file_size, MS_SYNC) == -1) {
            perror("msync");
        }
    }

    if (munmap(map, file_size) == -1) {
        perror("munmap");
    }
    
    printf("--- Test Finished ---\n");
}

int main() {
    const char *filename = "compare.txt";
    const char *shared_mod = "SHARED_WRITE";
    const char *private_mod = "PRIVATE_WRITE";

    // Test with MAP_PRIVATE first
    map_and_modify(filename, MAP_PRIVATE, private_mod);

    // Check the file's content after the private mapping test
    printf("\nContent of '%s' after MAP_PRIVATE test:\n", filename);
    fflush(stdout);  // keep our output ahead of cat's when stdout is redirected
    system("cat compare.txt");

    // Now test with MAP_SHARED
    map_and_modify(filename, MAP_SHARED, shared_mod);

    // Check the file's content after the shared mapping test
    printf("\nContent of '%s' after MAP_SHARED test:\n", filename);
    fflush(stdout);
    system("cat compare.txt");

    return 0;
}

Build and Run

1. Compile the code.

Bash
gcc mmap_compare.c -o mmap_compare

2. Run the executable.

Bash
./mmap_compare

Expected Output

Apart from a leading blank line, the output will look like this. The closing quote lands on its own line because the file’s content ends with a newline:

Plaintext
--- Testing with MAP_PRIVATE ---
Original content in mapping: 'Original Data for Comparison
'
Content after modification:  'PRIVATE_WRITE for Comparison
'
--- Test Finished ---

Content of 'compare.txt' after MAP_PRIVATE test:
Original Data for Comparison

--- Testing with MAP_SHARED ---
Original content in mapping: 'Original Data for Comparison
'
Content after modification:  'SHARED_WRITEa for Comparison
'
--- Test Finished ---

Content of 'compare.txt' after MAP_SHARED test:
SHARED_WRITEa for Comparison

As you can see, the modification made to the MAP_PRIVATE mapping was discarded when the mapping was unmapped; the original file remained unchanged. This is the copy-on-write mechanism in action. Conversely, the modification to the MAP_SHARED mapping was successfully propagated back to the file, permanently altering its content.

Common Mistakes & Troubleshooting

While mmap is powerful, its direct memory access nature means that errors can have more severe consequences than with traditional I/O. Here are some common pitfalls and how to avoid them.

| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
| --- | --- | --- |
| Segmentation Fault (SIGSEGV) | Process crashes immediately upon trying to access mapped memory. | 1. Writing to a read-only map: Ensure you used PROT_WRITE in mmap() and opened the file with O_RDWR. 2. Accessing out of bounds: Verify pointer arithmetic. Do not access memory beyond the length specified in mmap(). Use bounded copies such as memcpy with an explicit, checked length instead of unsafe functions like strcpy. |
| Bus Error (SIGBUS) | Process crashes. Often happens when accessing memory that should be valid, but isn’t. | This typically means the file backing the mapping is smaller than the mapped region, so the access falls on a page beyond the end of the file. This can happen if another process truncates the file after you’ve mapped it. Solution: Ensure the file size is stable during mapping. In complex scenarios, you might need a signal handler to catch SIGBUS and react gracefully. |
| Invalid Argument (EINVAL) | mmap() call fails immediately, returning MAP_FAILED and setting errno to EINVAL. | The most common cause is an offset that is not a multiple of the system’s page size. A length of zero triggers the same error. Solution: Retrieve the page size with sysconf(_SC_PAGESIZE). Ensure your offset is aligned to this boundary (e.g., offset % page_size == 0). |
| Resource Leak | A long-running process (like a daemon) slowly consumes more and more virtual memory over time, eventually leading to errors or crashes. | You are calling mmap() but failing to call munmap() on all possible code paths (e.g., in an error-handling branch). Solution: Treat mmap/munmap like malloc/free. Ensure every successful mmap() has a corresponding munmap() call before the process exits or loses the pointer. |
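
The first two failures in the table are easy to reproduce safely by letting a child process take the crash. The sketch below assumes Linux behavior for regular files, where touching a whole page beyond the end of the file raises SIGBUS. The function names are hypothetical and the scratch file at `path` is overwritten.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Perform a faulty access in a child process and return the number of the
 * signal that killed it (0 if it exited normally, -1 if fork/wait failed). */
static int signal_from_child(const char *path, int past_eof) {
    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd == -1 || ftruncate(fd, 1) == -1) _exit(2);  /* 1-byte file */

        if (past_eof) {
            /* Two pages are mapped, but the file backs only the first one. */
            volatile char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) _exit(2);
            p[page] = 'x';          /* no file data behind this page: SIGBUS */
        } else {
            /* The mapping lacks PROT_WRITE. */
            volatile char *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) _exit(2);
            p[0] = 'x';             /* protection violation: SIGSEGV */
        }
        _exit(0);                   /* only reached if no signal was raised */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

int signal_for_readonly_write(const char *path)  { return signal_from_child(path, 0); }
int signal_for_access_past_eof(const char *path) { return signal_from_child(path, 1); }
```

Comparing the return values with SIGSEGV and SIGBUS shows which mistake produces which signal, which is often the quickest clue when a real program crashes.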

Exercises

  1. Modify the Editor: Take the mmap_editor.c program and modify it to append text to the end of the file instead of overwriting the beginning. This will require you to use ftruncate() to increase the file’s size before you map it. The new size should accommodate the original content plus the new text.
  2. Implement File Copy with mmap: Write a new program called mmap_copy.c that copies a source file to a destination file. It should take two command-line arguments: source_path and destination_path. The logic should be:
    • Open the source file and map it into memory with PROT_READ.
    • Create or truncate the destination file, open it, and use ftruncate() to set its size to be the same as the source file.
    • Map the destination file into memory with PROT_READ | PROT_WRITE.
    • Use a single memcpy() call to copy the data from the source mapping to the destination mapping.
    • Clean up all mappings and file descriptors.
  3. Page-Alignment Calculator: Write a small utility that takes a file offset as a command-line argument. The program should print the system’s page size, the original offset, and the calculated page-aligned offset required for mmap. This reinforces the understanding of the page-alignment requirement. Use sysconf(_SC_PAGESIZE) to get the page size.
  4. Shared Counter: Write two separate programs. The first, mmap_init.c, should create a file, truncate it to the size of a single long int, map it, write the value 0 to it, and then unmap. The second program, mmap_increment.c, should map the same file, read the long int, print it, increment it, write it back, and unmap. Run mmap_init once, and then run mmap_increment multiple times in a row. Observe how the value persists and is shared between separate invocations of the program. This simulates a simple form of persistent, shared state.
  5. Exploring MAP_PRIVATE: Modify the mmap_compare.c program. After modifying the MAP_PRIVATE mapping, instead of immediately unmapping, add a sleep(30) call. While the program is sleeping, open another terminal and inspect the contents of the compare.txt file using cat. Then find the process ID (PID) of your sleeping program and inspect its virtual memory map using cat /proc/<PID>/maps. Look for the compare.txt entry whose permission field ends in p (private). The maps file lists whole mappings rather than individual pages, so also look at /proc/<PID>/smaps, where the page copied by your write is reported as Private_Dirty memory for that mapping. This provides a deeper, practical look into what the kernel is doing behind the scenes.

Summary

  • Memory-mapped I/O is a high-performance alternative to traditional read/write system calls, eliminating memory copies between the kernel and user space.
  • The mmap() system call requests the kernel to map a file directly into a process’s virtual address space.
  • The kernel uses demand paging to load file data into physical RAM only when it is actually accessed by the program, triggered by a page fault.
  • MAP_SHARED creates a mapping where modifications are written back to the underlying file and are visible to other processes that have also mapped the file.
  • MAP_PRIVATE creates a copy-on-write (COW) mapping, where modifications are made to a private copy in memory and do not affect the original file.
  • The offset argument to mmap() must be a multiple of the system’s page size.
  • munmap() is essential for releasing the mapped region and avoiding resource leaks in long-running applications.
  • msync() provides explicit control over synchronizing changes in a MAP_SHARED mapping with the persistent storage.
  • Common errors include segmentation faults from access violations and bus errors from accessing parts of a file that no longer exist.

Further Reading

  1. Linux man-pages: The authoritative source. Read them carefully on your system:
    • man 2 mmap
    • man 2 munmap
    • man 2 msync
  2. The Linux Programming Interface by Michael Kerrisk. Chapter 49 provides an exhaustive and excellent treatment of memory mappings.
  3. Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen A. Rago. A classic text with deep insights into UNIX/Linux system calls.
  4. LWN.net: An excellent source for in-depth articles on kernel internals. Search for articles related to the memory management subsystem and mmap.
  5. POSIX.1-2017 Standard: The official standard defining mmap and related functions. Available from The Open Group website.
  6. “Anatomy of a Program in Memory” by Gustavo Duarte. A classic article explaining process memory layout, which provides crucial context for understanding virtual address space.
  7. Raspberry Pi Documentation: While not specific to mmap, the official hardware documentation can provide context for the underlying architecture (e.g., MMU capabilities). https://www.raspberrypi.com/documentation/
