Chapter 50: File I/O System Calls: lseek(), stat(), fstat(), lstat()

Chapter Objectives

Upon completing this chapter, you will be able to:

  • Understand and manipulate the file offset of an open file descriptor using the lseek() system call.
  • Retrieve detailed metadata about files, such as size, permissions, and timestamps, using the stat(), fstat(), and lstat() system calls.
  • Differentiate between the stat, fstat, and lstat functions, particularly in their handling of symbolic links.
  • Implement robust C programs that use file positioning and metadata to perform advanced file operations on an embedded Linux system.
  • Debug common issues related to file I/O, such as incorrect file positioning and permission errors.
  • Apply these system calls to solve practical problems in embedded applications, like managing data logs or verifying file integrity.

Introduction

We have established how to create, read, and write files using fundamental system calls like open(), read(), and write() in our journey through Linux system programming. These operations form the bedrock of all I/O, but they treat files as simple, sequential streams of bytes. In the world of embedded systems, this is often not enough. Consider a data logger on a remote environmental sensor. It might write thousands of data points to a single file every day. If you need to retrieve a specific record from the middle of the day, reading the entire file sequentially would be incredibly inefficient, wasting precious CPU cycles and power—two resources that are often scarce in embedded devices. Likewise, how does a system know if it has permission to write to a log file, or when a configuration file was last modified?

This is where the concepts of file positioning and metadata become critical. This chapter introduces the essential system calls that allow a program to move beyond simple sequential access and to query the underlying properties of a file. We will explore the lseek() system call, which acts like a cursor, allowing you to move the read/write position within a file to any desired location. This capability is the foundation for implementing record-based access, building simple databases, and efficiently parsing complex binary file formats. We will then delve into the stat() family of functions (stat(), fstat(), and lstat()), which serve as the system’s inquiry desk for files. These calls allow you to ask the kernel for a file’s “biography”—its size, ownership, permissions, modification times, and more. For an embedded system, this information is vital for tasks ranging from file system management and log rotation to security verification and over-the-air update mechanisms. By mastering these tools, you will unlock a more powerful and efficient way to interact with the file system, a skill indispensable for any embedded Linux developer.

Technical Background

At the heart of how the Linux kernel manages file I/O is a simple yet powerful abstraction: the file offset. Every time a process opens a file, the kernel maintains a pointer, often called the “current file offset” or “read/write pointer,” which indicates the location for the next read() or write() operation. When you read or write n bytes, this offset automatically advances by n bytes. This is why repeated calls to read() work their way through a file sequentially. The lseek() system call gives you direct control over this pointer, allowing you to move it to an arbitrary position within the file, thereby breaking the bonds of purely sequential access.

The lseek() System Call: Navigating Within a File

The lseek() system call is your primary tool for file positioning. Its function prototype, found in <unistd.h>, is deceptively simple:

C
off_t lseek(int fd, off_t offset, int whence);

Let’s dissect its parameters. The first, fd, is the file descriptor returned by a successful open() call, identifying the file you wish to manipulate. The second parameter, offset, is a value of type off_t (a signed integer type defined to be large enough to hold file offsets) that specifies the distance to move. The third parameter, whence, is the crucial one; it defines the reference point from which the offset is measured. The POSIX standard defines three possible values for whence:

| `whence` Value | Description | Common Use Case |
| --- | --- | --- |
| SEEK_SET | The file offset is set to offset bytes from the beginning of the file. | Absolute positioning. lseek(fd, 0, SEEK_SET); rewinds to the start. |
| SEEK_CUR | The file offset is set to its current position plus offset bytes. | Relative positioning. Skipping forward or backward a known number of bytes. |
| SEEK_END | The file offset is set to the end of the file plus offset bytes. | Positioning relative to the end. lseek(fd, 0, SEEK_END); returns the file size. |

The return value of lseek() is the new file offset in bytes from the beginning of the file upon success. If an error occurs, it returns (off_t)-1 and sets errno to indicate the specific error. A common error is ESPIPE, which occurs if you try to use lseek() on a file descriptor that does not support seeking, such as a pipe, FIFO, or socket. These are true streams, not files stored on a block device, and the concept of a “position” within them is meaningless.

flowchart TD
    A["Call lseek(fd, offset, whence)"];
    B{"Is return value == (off_t)-1?"};
    C[/"<b>Error:</b><br>Call perror('lseek')<br>Handle error (e.g., exit)"/];
    D[("<b>Success:</b><br>The returned value is the<br>new file offset (in bytes).<br>Proceed with read/write.")]

    A --> B;
    B -- Yes --> C;
    B -- No --> D;

    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff;
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff;
    classDef error fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff;
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff;

    class A process;
    class B decision;
    class C error;
    class D success;

One useful trick with lseek() is determining the size of a file without reading it. By calling lseek(fd, 0, SEEK_END), you position the offset at the end of the file, and the return value is the size of the file in bytes. This avoids reading any data, but it also moves the offset, so save and restore the previous position if you intend to continue reading or writing from where you were. When the file is already open, fstat() reports the same size in st_size without disturbing the offset.

It’s also important to understand that the file offset is a property of the open file description in the kernel, not the process itself. If a process forks, the parent and child share the same open file description and thus the same file offset. A change to the offset by the parent will be seen by the child, and vice versa. This can be a source of subtle bugs if not handled with care.

The stat() Family: Uncovering File Metadata

While lseek() lets you navigate the contents of a file, the stat() family of system calls lets you inspect its properties. These properties, collectively known as metadata, are stored in a data structure called an inode (index node) on most Linux file systems like ext4. The inode contains nearly everything about a file except for its name and its actual data content. The file’s name is stored in the directory entry, which then points to the corresponding inode.

The three related system calls for retrieving this information are stat(), fstat(), and lstat(). They all populate the same data structure, struct stat, but differ in how they identify the target file.

The struct stat is defined in <sys/stat.h> and is the cornerstone of file metadata retrieval. While the exact fields can vary slightly across UNIX-like systems, the POSIX standard guarantees the presence of several key members. On a modern Linux system, the structure looks something like this:

C
struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* Inode number */
    mode_t    st_mode;    /* File type and mode */
    nlink_t   st_nlink;   /* Number of hard links */
    uid_t     st_uid;     /* User ID of owner */
    gid_t     st_gid;     /* Group ID of owner */
    dev_t     st_rdev;    /* Device ID (if special file) */
    off_t     st_size;    /* Total size, in bytes */
    blksize_t st_blksize; /* Block size for filesystem I/O */
    blkcnt_t  st_blocks;  /* Number of 512B blocks allocated */
    struct timespec st_atim;  /* Time of last access */
    struct timespec st_mtim;  /* Time of last modification */
    struct timespec st_ctim;  /* Time of last status change */
};

Let’s explore the most important fields in the context of embedded systems:

  • st_mode: This is a bitmask that holds two crucial pieces of information: the file type and the file permissions. A set of macros is provided to test the file type. For example, S_ISREG() checks if it’s a regular file, S_ISDIR() for a directory, S_ISLNK() for a symbolic link, and S_ISCHR() or S_ISBLK() for character or block special files (device files), which are extremely common in embedded Linux. The lower bits of st_mode contain the familiar read, write, and execute permissions for the owner, group, and others (e.g., S_IRUSR, S_IWUSR, S_IXUSR).
  • st_uid and st_gid: These fields identify the user and group that own the file. In an embedded context, this is critical for security. System configuration files should be owned by root, and daemons should run with the minimum necessary privileges, which often involves setting up specific users and groups.
  • st_size: This gives the size of the file in bytes. For a regular file, this is the amount of data it contains. For a symbolic link, it’s the length of the pathname it points to. For a directory, the size is implementation-dependent but is typically a multiple of the block size.
  • st_mtim: This is a timespec structure holding the time of the file’s last modification. This is invaluable for checking if a configuration file has been updated, if a data log is fresh, or for implementing caching mechanisms. The timespec struct itself has two members: tv_sec (seconds since the Unix Epoch) and tv_nsec (nanoseconds).
  • st_nlink: This field counts the number of hard links to the file. A file is only truly deleted from the file system when its link count drops to zero and no process has it open.

`st_mode` File Type Macros

| Macro | Returns True If File Is A… |
| --- | --- |
| S_ISREG(mode) | Regular File |
| S_ISDIR(mode) | Directory |
| S_ISLNK(mode) | Symbolic Link |
| S_ISCHR(mode) | Character Special File (e.g., /dev/tty) |
| S_ISBLK(mode) | Block Special File (e.g., /dev/sda) |
| S_ISFIFO(mode) | FIFO (Named Pipe) |
| S_ISSOCK(mode) | Socket |

The Three Flavors of stat

Now, let’s look at the three functions that populate this structure.

  1. int stat(const char *pathname, struct stat *statbuf); The stat() call takes a file path (e.g., /etc/config.txt) as input. It retrieves the metadata for the file at that path and fills the statbuf structure. If the pathname refers to a symbolic link, stat() will “follow” the link and return the information for the file the link points to, not the link itself.
  2. int fstat(int fd, struct stat *statbuf); The fstat() call operates on an already open file descriptor (fd). This is often more efficient than stat() if you are already working with the file, as the kernel does not have to look up the path name again. fstat() never resolves a path, so there is no link for it to follow: it reports on whatever file the descriptor refers to. Because open() follows symbolic links by default, that is normally the link’s target. The exception is a descriptor obtained on Linux with O_PATH | O_NOFOLLOW, which refers to the symbolic link itself; fstat() on such a descriptor returns the link’s own metadata.
  3. int lstat(const char *pathname, struct stat *statbuf); The lstat() call is the special case. It also takes a file path as input, just like stat(). However, if pathname is a symbolic link, lstat() does not follow it. Instead, it returns information about the symbolic link file itself. This is the key difference and the primary reason for lstat()’s existence. If you need to know the size of the link file, its owner, or simply to confirm that a given path is a symbolic link, lstat() is the tool you must use.
graph TD
    subgraph User Input
        A[/"<b>Path:</b><br><i>/path/to/link</i>"/]
        B[/"<b>File Descriptor:</b><br><i>fd = open(/path/to/link, ...)</i>"/]
    end

    subgraph Filesystem
        C(<b>Symbolic Link</b><br><i>mylink</i>)
        D((<b>Target File</b><br><i>real_file.txt</i>))
        C -- "points to" --> D
    end

    subgraph System Calls
        subgraph "stat()"
            direction LR
            stat_in(Path) --> stat_call{"stat()"}
        end
        subgraph "fstat()"
            direction LR
            fstat_in(fd) --> fstat_call{"fstat()"}
        end
        subgraph "lstat()"
            direction LR
            lstat_in(Path) --> lstat_call{"lstat()"}
        end
    end

    subgraph "Result: struct stat for..."
        E[("Target File's<br>Metadata")]
        F[("Symbolic Link's<br>Metadata")]
    end

    A --> stat_in
    B --> fstat_in
    A --> lstat_in

    stat_call -- "Follows link" --> E
    fstat_call -- "open() already followed link" --> E
    lstat_call -- "<b>Does NOT</b> follow link" --> F

    classDef primary fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff;
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff;
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff;
    classDef special fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff;
    classDef system fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff;

    class A,B primary;
    class C,D system;
    class stat_call,fstat_call,lstat_call process;
    class E success;
    class F special;
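
The difference between the three calls can be checked with a short experiment. The sketch below is illustrative only: struct link_report and compare_stat_calls() are hypothetical names, and the scratch directory under /tmp is an assumption. It creates a 6-byte file and a symbolic link to it, then queries the link with stat(), lstat(), and fstat() on a descriptor from a plain open().

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Results of querying one symlink three different ways. */
struct link_report {
    int stat_is_link;   /* 1 if stat() saw a symlink  */
    int lstat_is_link;  /* 1 if lstat() saw a symlink */
    int fstat_is_link;  /* 1 if fstat() saw a symlink */
    off_t target_size;  /* st_size reported by stat()  */
    off_t link_size;    /* st_size reported by lstat() */
};

/* Hypothetical demo: returns 0 on success, -1 on failure. */
int compare_stat_calls(struct link_report *r)
{
    char dir[] = "/tmp/stat_demo_XXXXXX"; /* assumed scratch location */
    char target[64], linkpath[64];
    struct stat sb;
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof(target), "%s/target.txt", dir);
    snprintf(linkpath, sizeof(linkpath), "%s/link", dir);

    fd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        goto cleanup;
    if (write(fd, "hello\n", 6) != 6) {
        close(fd);
        goto cleanup;
    }
    close(fd);

    if (symlink("target.txt", linkpath) == -1)
        goto cleanup;

    if (stat(linkpath, &sb) == -1)      /* follows the link */
        goto cleanup;
    r->stat_is_link = S_ISLNK(sb.st_mode) != 0;
    r->target_size = sb.st_size;

    if (lstat(linkpath, &sb) == -1)     /* inspects the link itself */
        goto cleanup;
    r->lstat_is_link = S_ISLNK(sb.st_mode) != 0;
    r->link_size = sb.st_size;

    fd = open(linkpath, O_RDONLY);      /* open() follows the link */
    if (fd == -1)
        goto cleanup;
    if (fstat(fd, &sb) == 0) {
        r->fstat_is_link = S_ISLNK(sb.st_mode) != 0;
        rc = 0;
    }
    close(fd);

cleanup:
    unlink(linkpath);
    unlink(target);
    rmdir(dir);
    return rc;
}
```

Only lstat() reports a symbolic link here. Its st_size is 10, the length of the string "target.txt" stored in the link, while stat() reports the 6 bytes of the target file.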

In summary, lseek() provides the means to control the position of I/O operations within a file, enabling random access patterns that are essential for performance in many embedded applications. The stat family of calls provides the complementary ability to inspect a file’s metadata, which is fundamental for file management, security, and system integrity. Together, they represent a significant step up from basic sequential I/O, giving the developer fine-grained control over how the system interacts with the underlying file system.

Practical Examples

Theory provides the foundation, but true understanding comes from hands-on implementation. In this section, we will use the Raspberry Pi 5 to explore practical applications of lseek() and the stat() family. We will write C programs that you can compile and run directly on your device to see these system calls in action.

Tip: All examples can be compiled on your Raspberry Pi 5 using gcc. For example, to compile a file named my_program.c, you would use the command: gcc -o my_program my_program.c.

Example 1: Using lseek() to Read a Specific Record

Imagine a data logging application that writes fixed-size records to a file. Each record represents a sensor reading and has a defined structure. Using lseek(), we can directly access any record without reading the preceding ones.

Scenario: A temperature sensor writes 16-byte records to temperatures.dat. We want to write a program to fetch the Nth record from this file.

File Structure (temperatures.dat):

This will be a binary file. We’ll first create a helper program to generate some sample data.

Data Generation Code (generate_data.c):

This program creates the temperatures.dat file with 10 sample records.

C
// generate_data.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

// Our fixed-size record structure: 16 bytes where long is 64 bits,
// as on 64-bit Raspberry Pi OS (12 bytes where long is 32 bits)
struct sensor_record {
    long timestamp;
    float temperature;
    char status_flags;
    char reserved[3]; // Padding for alignment
};

int main() {
    const char *filename = "temperatures.dat";
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    printf("Generating sample data file: %s\n", filename);
    printf("Record size: %zu bytes\n", sizeof(struct sensor_record));

    for (int i = 0; i < 10; i++) {
        struct sensor_record rec = {0}; // Zero every field, including the reserved bytes
        rec.timestamp = time(NULL) + (i * 10); // Timestamps 10s apart
        rec.temperature = 20.0f + (i * 1.5f);  // Temp increases
        rec.status_flags = 0x01;

        if (write(fd, &rec, sizeof(struct sensor_record)) != sizeof(struct sensor_record)) {
            perror("write");
            close(fd);
            return 1;
        }
        printf("Wrote record %d\n", i);
    }

    printf("Data generation complete.\n");
    close(fd);
    return 0;
}

Record Reading Code (read_record.c):

This program takes a record number as a command-line argument and uses lseek() to read and display it.

C
// read_record.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

// The same record structure
struct sensor_record {
    long timestamp;
    float temperature;
    char status_flags;
    char reserved[3];
};

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <record_number>\n", argv[0]);
        return 1;
    }

    int record_num = atoi(argv[1]);
    if (record_num < 0) {
        fprintf(stderr, "Record number must be non-negative.\n");
        return 1;
    }

    const char *filename = "temperatures.dat";
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    // Calculate the offset
    off_t offset = record_num * sizeof(struct sensor_record);

    // Use lseek to position the file offset
    off_t new_pos = lseek(fd, offset, SEEK_SET);
    if (new_pos == (off_t)-1) {
        perror("lseek");
        close(fd);
        return 1;
    }

    // Read the record from the calculated position
    struct sensor_record rec;
    ssize_t bytes_read = read(fd, &rec, sizeof(struct sensor_record));
    if (bytes_read == -1) {
        perror("read");
        close(fd);
        return 1;
    }

    if (bytes_read == 0) {
        fprintf(stderr, "Error: Reached end of file. Record %d does not exist.\n", record_num);
    } else if ((size_t)bytes_read < sizeof(struct sensor_record)) {
        fprintf(stderr, "Warning: Read a partial record. File may be corrupt.\n");
    } else {
        char time_buf[100];
        time_t ts = (time_t)rec.timestamp; // localtime() expects a time_t pointer
        strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S", localtime(&ts));
        printf("--- Record %d ---\n", record_num);
        printf("Position: %lld bytes\n", (long long)new_pos); // off_t width varies by platform
        printf("Timestamp:   %s\n", time_buf);
        printf("Temperature: %.2f C\n", rec.temperature);
        printf("Status Flags: 0x%02X\n", (unsigned int)(unsigned char)rec.status_flags);
    }

    close(fd);
    return 0;
}

flowchart TD
    Start([Start Program]);
    Input["Get Record Number (N)<br>from Command Line"];
    OpenFile["open('temperatures.dat', O_RDONLY)"];
    CheckOpen{File Opened Successfully?};
    
    CalcOffset["Calculate Offset:<br><i>offset = N * sizeof(record)</i>"];
    Seek["lseek(fd, offset, SEEK_SET)"];
    CheckSeek{Seek Successful?};

    Read["read(fd, &buffer, sizeof(record))"];
    CheckRead{Bytes Read > 0?};
    
    Display["Display Record Data:<br>Timestamp, Temperature, etc."];
    End([End Program]);
    
    ErrorOpen[/"Display 'open' error"/];
    ErrorSeek[/"Display 'lseek' error"/];
    ErrorRead[/"Display 'read' error or EOF"/];

    Start --> Input;
    Input --> OpenFile;
    OpenFile --> CheckOpen;
    CheckOpen -- Yes --> CalcOffset;
    CheckOpen -- No --> ErrorOpen --> End;
    
    CalcOffset --> Seek;
    Seek --> CheckSeek;
    CheckSeek -- Yes --> Read;
    CheckSeek -- No --> ErrorSeek --> End;
    
    Read --> CheckRead;
    CheckRead -- Yes --> Display --> End;
    CheckRead -- No --> ErrorRead --> End;

    classDef start-end fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff;
    classDef process fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff;
    classDef decision fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff;
    classDef error fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff;
    classDef success fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff;

    class Start,End start-end;
    class Input,OpenFile,CalcOffset,Seek,Read,Display process;
    class CheckOpen,CheckSeek,CheckRead decision;
    class ErrorOpen,ErrorSeek,ErrorRead error;

Build and Run Steps:

1. Compile the data generator:

Bash
gcc -o generate_data generate_data.c

2. Run the generator:

Bash
./generate_data


This will create the temperatures.dat file in your current directory.

3. Compile the record reader:

Bash
gcc -o read_record read_record.c

4. Run the reader to fetch a specific record (e.g., record 5):

Bash
./read_record 5

Expected Output:

Plaintext
--- Record 5 ---
Position: 80 bytes
Timestamp:   2025-07-22 03:00:50
Temperature: 27.50 C
Status Flags: 0x01

This output demonstrates that lseek() successfully moved the file offset to 5 * 16 = 80 bytes before the read() call, allowing us to access the desired record directly.

Example 2: A Simple stat Implementation

Let’s build a utility that mimics the basic functionality of the stat command-line tool. This program will take a file path as an argument and print out its key metadata. This example highlights how to use the stat structure and the file type macros.

File Information Code (simple_stat.c):

C
// simple_stat.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

// Helper function to describe file type
const char* get_file_type(mode_t mode) {
    if (S_ISREG(mode)) return "Regular File";
    if (S_ISDIR(mode)) return "Directory";
    if (S_ISLNK(mode)) return "Symbolic Link";
    if (S_ISCHR(mode)) return "Character Device";
    if (S_ISBLK(mode)) return "Block Device";
    if (S_ISFIFO(mode)) return "FIFO/Pipe";
    if (S_ISSOCK(mode)) return "Socket";
    return "Unknown Type";
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file_or_directory>\n", argv[0]);
        return 1;
    }

    const char *path = argv[1];
    struct stat file_stat;

    // Use lstat to get info. This won't follow symlinks.
    if (lstat(path, &file_stat) == -1) {
        perror("lstat");
        return 1;
    }

    printf("  File: %s\n", path);
    printf("  Size: %lld Bytes\n", (long long)file_stat.st_size);
    printf("  Type: %s\n", get_file_type(file_stat.st_mode));
    printf(" Inode: %llu\n", (unsigned long long)file_stat.st_ino);
    printf(" Links: %lu\n", (unsigned long)file_stat.st_nlink);

    // Print permissions
    char perms[11];
    perms[0] = S_ISDIR(file_stat.st_mode) ? 'd' :
               S_ISLNK(file_stat.st_mode) ? 'l' : '-';
    perms[1] = (file_stat.st_mode & S_IRUSR) ? 'r' : '-';
    perms[2] = (file_stat.st_mode & S_IWUSR) ? 'w' : '-';
    perms[3] = (file_stat.st_mode & S_IXUSR) ? 'x' : '-';
    perms[4] = (file_stat.st_mode & S_IRGRP) ? 'r' : '-';
    perms[5] = (file_stat.st_mode & S_IWGRP) ? 'w' : '-';
    perms[6] = (file_stat.st_mode & S_IXGRP) ? 'x' : '-';
    perms[7] = (file_stat.st_mode & S_IROTH) ? 'r' : '-';
    perms[8] = (file_stat.st_mode & S_IWOTH) ? 'w' : '-';
    perms[9] = (file_stat.st_mode & S_IXOTH) ? 'x' : '-';
    perms[10] = '\0';
    printf("Access: (%04o/%s)\n", file_stat.st_mode & 07777, perms);

    // Print Owner/Group IDs
    printf("   Uid: %u\n", (unsigned int)file_stat.st_uid);
    printf("   Gid: %u\n", (unsigned int)file_stat.st_gid);

    // Print timestamps
    char time_buf[100];
    strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S %z", localtime(&file_stat.st_atim.tv_sec));
    printf("Access: %s\n", time_buf);
    strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S %z", localtime(&file_stat.st_mtim.tv_sec));
    printf("Modify: %s\n", time_buf);
    strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S %z", localtime(&file_stat.st_ctim.tv_sec));
    printf("Change: %s\n", time_buf);

    return 0;
}

Build and Run Steps:

  1. Compile the program:
    gcc -o simple_stat simple_stat.c
  2. Create a test file and a symbolic link:
    echo "hello world" > testfile.txt
    ln -s testfile.txt mylink
  3. Run simple_stat on the regular file:
    ./simple_stat testfile.txt
  4. Run simple_stat on the symbolic link:
    ./simple_stat mylink
  5. Run simple_stat on a directory:
    ./simple_stat /etc
  6. Run simple_stat on a device file:
    ./simple_stat /dev/tty1

Expected Output for ./simple_stat mylink:

Plaintext
  File: mylink
  Size: 12 Bytes
  Type: Symbolic Link
 Inode: 123457
 Links: 1
Access: (0777/lrwxrwxrwx)
   Uid: 1000
   Gid: 1000
Access: 2025-07-22 03:05:10 +0300
Modify: 2025-07-22 03:05:10 +0300
Change: 2025-07-22 03:05:10 +0300

Notice that because we used lstat(), the type is correctly identified as “Symbolic Link” and the size is 12 bytes—the length of the string “testfile.txt”. If we had used stat(), it would have reported the details of testfile.txt instead. This example clearly illustrates the critical difference between the two calls.

Common Mistakes & Troubleshooting

When working with file positioning and metadata, developers often encounter a few common pitfalls. Understanding these ahead of time can save hours of debugging.

| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
| --- | --- | --- |
| Using stat() on a symlink when you need info about the link itself. | Program behaves unexpectedly; e.g., a file-cleanup script refuses to delete a link because it thinks the link is a root-owned file. | Use lstat() to get metadata about the symlink file. Use stat() only when you need metadata for the file the link points to. |
| Ignoring system call return values. | Program crashes with a segmentation fault, or operates on garbage data. read() might return 0 unexpectedly. | Always check for a -1 return value. If found, call perror("function_name") to print a meaningful error message based on errno. |
| Off-by-one errors in lseek() offset calculation. | Reading the wrong record from a data file, or reading past the end of the file, resulting in a short read. | Double-check your math. For N zero-indexed records of size S, the offset for record i is i * S. Test with a small, known data file. |
| lseek() past EOF followed by write(). | File size suddenly becomes huge. ls -l shows a large size, but du -sh shows small disk usage. A “hole” that reads back as null bytes is created. | This creates a sparse file. If unintentional, validate offsets before seeking. To get the current file size, use lseek(fd, 0, SEEK_END) or fstat(). |
| Using lseek() on non-seekable files. | lseek() returns -1 and errno is set to ESPIPE. | You cannot seek on pipes, FIFOs, or sockets. They are data streams, not files on disk. Your application logic must treat them sequentially. |
| Incorrectly checking permissions with st_mode. | A check for write permissions passes, but a subsequent write() call fails with “Permission Denied”. | Testing a few bits of st_mode is not enough: the kernel also decides which of the owner, group, or other classes applies to the process, and mechanisms such as read-only mounts can still deny access. access(path, W_OK) performs the full check, but against the process’s real user and group IDs rather than the effective ones, and the answer can be out of date by the time you act on it. The most reliable approach is to attempt the open() or write() and handle EACCES. |

Exercises

These exercises are designed to reinforce the concepts of file positioning and metadata retrieval. Attempt them on your Raspberry Pi 5.

  1. File Size Calculator:
    • Objective: Write a C program named filesize that takes a filename as a command-line argument and prints its size in bytes.
    • Guidance: You must implement this in two ways within the same program:
      1. Using lseek() with SEEK_END on an open file descriptor.
      2. Using stat() and retrieving the st_size member.
    • Verification: The output from both methods should be identical. Compare your program’s output with the ls -l command.
  2. Log Appender:
    • Objective: Create a program logappend that takes a string as an argument and appends it as a new line to a file named app.log.
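    • Guidance: Open the file using the O_RDWR | O_CREAT flags. Use lseek() to position the file offset at the end of the file before every write(). This is sufficient when your program is the only writer. It does not protect against concurrent writers, because another process can extend the file between your lseek() and your write(); when several processes share a log, open it with O_APPEND, which makes the kernel perform the seek to the end and the write as one atomic step.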
    • Verification: Run the program multiple times with different messages. The app.log file should contain all messages in the correct order.
  3. Find Largest File:
    • Objective: Write a program findlarge that takes a directory path as an argument and recursively finds the largest regular file within that directory and its subdirectories.
    • Guidance: You will need to use functions for directory traversal (e.g., opendir(), readdir(), closedir()). For each entry, use lstat() to check if it’s a regular file or a directory. If it’s a file, get its size. If it’s a directory, make a recursive call. Keep track of the path and size of the largest file found so far.
    • Verification: Run your program on /usr/include and compare the result with what you might find using shell commands like find /usr/include -type f -printf "%s %p\n" | sort -nr | head -1.
  4. Symbolic Link Inspector:
    • Objective: Write a tool linkstat that takes a path as an argument. The program should identify if the path is a symbolic link. If it is, it should print information about the link itself (using lstat()) and then information about the file the link points to (using stat()). If the path is not a symbolic link, it should just print its stat() information.
    • Guidance: First, call lstat() on the path. Check the st_mode field with S_ISLNK(). If it’s a link, print the lstat data. Then, call stat() on the same path to get the target’s data. If it’s not a link, just call stat() and print the results.
    • Verification: Create a symbolic link and run your tool on it. Then run it on a regular file and a directory to see the different outputs.
  5. File Hole Puncher:
    • Objective: Create a program punchhole that creates a sparse file. It should take a filename and a number, offset, as arguments. The program should create a file, write 1KB of data at the beginning, then use lseek() to jump forward by offset bytes, and write another 1KB of data.
    • Guidance: Use lseek(fd, offset, SEEK_CUR) to create the gap. After creating the file, use ls -lh to see its apparent size and du -h to see its actual disk usage.
    • Verification: The apparent size reported by ls should be roughly 2KB + offset. The actual disk usage reported by du should be much smaller, typically just a few blocks (e.g., 8K), because the “hole” does not consume disk space.

Summary

  • File Offset: The Linux kernel maintains a file offset for each open file description, indicating the position for the next read or write. Because a fork() duplicates descriptors that refer to the same open file description, this offset is shared between parent and child processes.
  • lseek() System Call: Provides direct control over the file offset, allowing for random access to file contents. It uses SEEK_SET for absolute, SEEK_CUR for relative, and SEEK_END for end-of-file positioning.
  • File Metadata: File properties like size, permissions, and timestamps are stored in inodes and can be retrieved using the stat() family of system calls.
  • struct stat: This structure is the container for all file metadata returned by the stat() calls. Key fields include st_mode (type and permissions), st_size (size in bytes), st_uid (owner), and st_mtim (modification time).
  • stat(), fstat(), and lstat(): These three functions retrieve file metadata. stat() uses a path and follows symbolic links. fstat() uses an open file descriptor and reports on whatever file that descriptor refers to, which is normally the link’s target because open() follows links. lstat() uses a path but does not follow symbolic links, providing information about the link itself.
  • Practical Applications: These system calls are fundamental for building efficient and robust embedded applications, including data loggers, configuration managers, file system utilities, and security-monitoring tools.

Further Reading

  1. Linux man-pages: The official documentation is the most authoritative source. https://man7.org/linux/man-pages/
    • man 2 lseek
    • man 2 stat
    • man 2 fstat
    • man 2 lstat
  2. The Linux Programming Interface by Michael Kerrisk. Chapters 4 and 5 provide an exhaustive treatment of file I/O, and Chapter 15 covers file attributes in great detail.
  3. Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen A. Rago. A classic text that provides deep insights into the behavior of these system calls across various UNIX-like systems.
  4. POSIX.1-2017 Standard: The official standard defining the behavior of these functions. You can find the specifications for lseek() and stat() on the Open Group’s website.
  5. Raspberry Pi Documentation: While not specific to these system calls, the official hardware and software documentation can provide context for how the underlying file systems are used on the device. https://www.raspberrypi.com/documentation/
  6. “How to use lseek” – GeeksforGeeks: A good tutorial with simple examples that can serve as a quick reference. https://www.geeksforgeeks.org/cpp/lseek-in-c-to-read-the-alternate-nth-byte-and-write-it-in-another-file/
  7. LWN.net: An excellent source for deep dives into kernel-level implementation details and the history behind certain system call behaviors. Searching the archives for lseek or stat can yield fascinating articles.
