Chapter 227: Critical Sections and Interrupt Handling

Chapter Objectives

By the end of this chapter, you will be able to:

  • Define what a race condition is and why it’s a problem in embedded systems.
  • Understand the concept of a critical section and the need for atomic operations.
  • Use FreeRTOS mutexes to protect shared resources between tasks.
  • Use ESP-IDF spinlocks for fast, inter-core synchronization.
  • Correctly implement critical sections by disabling interrupts for short, time-sensitive operations.
  • Design and implement safe and efficient Interrupt Service Routines (ISRs).
  • Apply the “ISR-gives, task-takes” pattern for deferred interrupt processing.
  • Differentiate between synchronization needs on single-core vs. dual-core ESP32 variants.

Introduction

In a multi-threaded environment like the one provided by FreeRTOS on the ESP32, multiple tasks run concurrently, and the scheduler can switch between them at any time. Furthermore, hardware interrupts can preempt task code execution unexpectedly. This concurrency is powerful, but it introduces a significant challenge: how do we safely access data or peripherals shared between multiple tasks or between a task and an interrupt?

If two tasks attempt to modify the same variable simultaneously, the result can be unpredictable and lead to data corruption. This scenario is known as a race condition. Preventing race conditions is fundamental to building reliable and stable embedded systems. This chapter introduces the core concepts and tools for managing shared resources: critical sections, mutexes, spinlocks, and proper interrupt handling techniques. Mastering these is not optional; it is essential for any serious ESP-IDF developer.

Theory

What is a Race Condition?

A race condition occurs when the behavior of a system depends on the unpredictable sequence or timing of uncontrollable events. In our context, it typically happens when two or more threads (tasks or ISRs) access a shared resource without any mechanism to ensure the operations happen in the correct order.

Analogy: The Shared Bank Account

Imagine you and a family member share a bank account with $100. You both decide to withdraw $50 at the exact same time from different ATMs. The sequence of operations might look like this:

  1. Your ATM: Reads the balance ($100).
  2. Scheduler Switch: The CPU switches to the other task before yours completes.
  3. Other ATM: Reads the balance ($100).
  4. Other ATM: Calculates new balance ($100 – $50 = $50).
  5. Other ATM: Writes the new balance ($50) to the account.
  6. Scheduler Switch: The CPU switches back to your task.
  7. Your ATM: Calculates its new balance from the outdated value it read earlier ($100 – $50 = $50).
  8. Your ATM: Writes the new balance ($50) to the account.

The final balance is $50, but $100 was withdrawn. The bank has lost $50. This happened because the “read-modify-write” operation was not atomic. An atomic operation is one that is performed as a single, indivisible unit.
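
The interleaving above can be replayed deterministically in plain C. This is only an illustrative sketch: the function names are hypothetical, and the "scheduler switch" is simulated by ordering the statements by hand rather than by a real RTOS.

```c
/* Replays the eight steps above: both ATMs read the balance
 * before either one writes its result back. */
int withdraw_interleaved(int balance, int amount)
{
    int your_copy  = balance;        /* step 1: your ATM reads the balance    */
    int other_copy = balance;        /* step 3: other ATM reads the balance   */
    balance = other_copy - amount;   /* steps 4-5: other ATM writes back      */
    balance = your_copy - amount;    /* steps 7-8: your ATM overwrites it     */
    return balance;                  /* one withdrawal has been lost          */
}

/* The same two withdrawals, but each read-modify-write completes
 * before the next one starts (what a lock would enforce). */
int withdraw_serialized(int balance, int amount)
{
    balance = balance - amount;      /* first withdrawal, start to finish  */
    balance = balance - amount;      /* second withdrawal, start to finish */
    return balance;
}
```

With a starting balance of 100 and an amount of 50, withdraw_interleaved() returns 50 while withdraw_serialized() returns 0, which is the discrepancy described in the analogy.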

Critical Sections

A critical section is a piece of code that accesses a shared resource (like a global variable, a peripheral, or a communication buffer) and must not be executed by more than one task at a time. To prevent race conditions, we must protect these critical sections. ESP-IDF and FreeRTOS provide several mechanisms to do this.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans'}}}%%
sequenceDiagram
    participant Task_A as Task A
    participant Mutex
    participant Task_B as Task B
    
    autonumber
    
    Task_A->>Mutex: Take Mutex
    activate Mutex
    Note right of Task_A: Mutex is available
    Mutex-->>Task_A: Granted
    deactivate Mutex

    activate Task_A
    Task_A->>Task_A: Enter Critical Section<br>(Access Shared Resource)
    
    par
        Task_B->>Mutex: Take Mutex
        activate Mutex
        Note left of Task_B: Mutex is held by Task A
        Mutex-->>Task_B: Denied (Task B blocks/sleeps)
        deactivate Mutex
    and
        Task_A->>Task_A: ...continues work...
    end
    
    Task_A->>Task_A: Exit Critical Section
    deactivate Task_A
    
    Task_A->>Mutex: Give Mutex
    activate Mutex
    
    Note over Mutex, Task_B: Mutex is now available. Scheduler wakes Task B.
    
    Mutex-->>Task_B: Granted
    deactivate Mutex
    
    activate Task_B
    Task_B->>Task_B: Enter Critical Section
    Task_B->>Task_B: Exit Critical Section
    deactivate Task_B
    
    Task_B->>Mutex: Give Mutex

1. Mutexes (Mutual Exclusion)

A mutex is the most common tool for protecting critical sections between tasks. It acts like a key or a “token.” A task must “take” the mutex before it can enter the critical section. If another task tries to take the same mutex while it’s already held, the second task will be blocked by the FreeRTOS scheduler (put into a sleeping state) until the first task “gives” the mutex back.

Key Characteristics:

  • Task-level Protection: Designed for synchronizing tasks, not for ISRs.
  • Blocking: Tasks that wait for a mutex do not consume CPU time; they are put to sleep by the scheduler. This is very efficient.
  • Ownership: A mutex is “owned” by the task that takes it. The same task must be the one to give it back.
  • Recursive Mutexes: A special type of mutex that can be “taken” multiple times by the same owner task. A corresponding number of “gives” must be performed to release it.

2. Spinlocks

A spinlock is another locking mechanism, but it works very differently from a mutex. When a task tries to acquire a spinlock that is already held, it does not go to sleep. Instead, it enters a tight loop (“spins”), repeatedly checking if the lock is available.

Key Characteristics:

  • Busy-Waiting (No Sleep): The waiting task is never put to sleep by the scheduler. Its CPU core runs at 100% utilization, constantly checking the lock. This wastes power and CPU cycles.
  • Very Fast Acquisition/Release: When there is no contention, acquiring and releasing a spinlock is extremely fast because it involves no context switching.
  • ISR Safe: Spinlocks are lightweight enough to be used inside ISRs.
  • Multi-Core Synchronization: They are the primary tool for protecting resources shared between two CPU cores on dual-core ESP32s. Simply disabling interrupts on Core 0 does not prevent Core 1 from accessing the shared data. A spinlock ensures only one core can enter the critical section at a time.

Warning: Spinlocks should only be used to protect very short, fast critical sections. Holding a spinlock for a long time can starve other tasks and even trigger the task watchdog timer, causing a system reboot. If the critical section involves I/O or any delay, a mutex is the correct choice.
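
To make the busy-waiting idea concrete, here is a minimal spinlock sketch in portable C11. It is not ESP-IDF's spinlock implementation. The demo_spinlock_t type and the demo_spin_* functions are hypothetical names used only for illustration, and unlike the ESP-IDF critical section macros this sketch does nothing about interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration type: one atomic flag, set while the lock is held. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. Returns true if the lock was free and now belongs to the caller. */
bool demo_spin_try_lock(demo_spinlock_t *lock)
{
    /* test_and_set returns the previous state: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

/* Busy-wait ("spin") until the lock is acquired. The caller never sleeps. */
void demo_spin_lock(demo_spinlock_t *lock)
{
    while (!demo_spin_try_lock(lock)) {
        /* Keep polling; this loop is where the CPU time is burned. */
    }
}

/* Release the lock so that a spinning waiter can get in. */
void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The empty polling loop in demo_spin_lock() is why a spinlock must only guard a few instructions: every cycle another core spends in that loop is a cycle it cannot spend on useful work.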

3. Disabling Interrupts (Critical Sections)

The most forceful way to protect a critical section is to disable interrupts. FreeRTOS provides macros for this: portENTER_CRITICAL() and portEXIT_CRITICAL(). In ESP-IDF, both macros take a pointer to a spinlock (a portMUX_TYPE variable) as their argument.

When portENTER_CRITICAL(&my_spinlock) is called, it disables interrupts on the calling core up to configMAX_SYSCALL_INTERRUPT_PRIORITY. On a dual-core system, it also acquires the spinlock to provide inter-core protection. This guarantees that the code between the ENTER and EXIT calls will not be preempted by any FreeRTOS-managed interrupt or by another task on that core, and that the other core cannot enter a section guarded by the same spinlock.

| Feature | Mutexes | Spinlocks | Disabling Interrupts |
| --- | --- | --- | --- |
| Primary Use | Protecting shared resources between tasks. | Protecting against inter-core access; short ISR/task protection. | Protecting short critical sections from task preemption and interrupts. |
| Waiting Mechanism | Blocking (Sleeps). Scheduler puts task to sleep. Efficient. | Busy-Waiting (Spins). CPU core is at 100%, wasting power. | N/A (System Halts). Prevents scheduler and interrupts from running. |
| ISR Safe? | No. Cannot call xSemaphoreTake() from an ISR. | Yes. Lightweight enough for ISR use. | Yes. Designed for this. |
| Inter-Core Safe (Dual-Core ESP32)? | Yes. A mutex is core-agnostic. | Yes. This is their main purpose on dual-core chips. | Yes. portENTER_CRITICAL on ESP32 uses a spinlock internally. |
| Recommended Duration | Can be held for longer durations (e.g., I/O operations). | Very short. A few lines of code at most. | Extremely short. Must be as brief as possible. |
| Typical API | xSemaphoreTake(), xSemaphoreGive() | spinlock_acquire(), spinlock_release() (ESP-IDF) | portENTER_CRITICAL(), portEXIT_CRITICAL() (FreeRTOS) |

Key Characteristics:

  • The Ultimate Protection: Stops nearly everything, ensuring true atomicity on a single core.
  • Increases Interrupt Latency: While interrupts are disabled, the system cannot respond to external events. This can be disastrous for real-time applications if used improperly.
  • Must be Brief: Like spinlocks, critical sections protected this way must be extremely short.
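
As a sketch of how such a critical section might look in practice, the code below guards a two-field record with taskENTER_CRITICAL()/taskEXIT_CRITICAL() and a portMUX_TYPE spinlock, so a reader never sees one field updated without the other. The sample_t type and the sample_publish()/sample_read() functions are hypothetical. The #else branch contains host-only stand-ins (an assumption, not part of ESP-IDF) so the logic can be compiled and checked off-target; they provide no real protection.

```c
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-only stand-ins (assumption) so the sketch compiles off-target.
 * They do NOT disable interrupts or lock anything. */
typedef int portMUX_TYPE;
#define portMUX_INITIALIZER_UNLOCKED 0
#define taskENTER_CRITICAL(mux) ((void)(mux))
#define taskEXIT_CRITICAL(mux)  ((void)(mux))
#endif

/* Hypothetical shared record: both fields must always change together. */
typedef struct {
    int32_t  value;     /* most recent reading         */
    uint32_t sequence;  /* incremented on every update */
} sample_t;

static sample_t s_latest;
static portMUX_TYPE s_latest_lock = portMUX_INITIALIZER_UNLOCKED;

/* Writer (task context): update both fields as one indivisible step. */
void sample_publish(int32_t value)
{
    taskENTER_CRITICAL(&s_latest_lock);
    s_latest.value = value;
    s_latest.sequence++;
    taskEXIT_CRITICAL(&s_latest_lock);
}

/* Reader (task context): copy the record out under the lock,
 * then do any slow work (logging, formatting) on the copy. */
sample_t sample_read(void)
{
    sample_t copy;
    taskENTER_CRITICAL(&s_latest_lock);
    copy = s_latest;
    taskEXIT_CRITICAL(&s_latest_lock);
    return copy;
}
```

Note that the protected region contains only plain assignments. Anything slow, such as printing the sample, happens after the copy is taken and the critical section has been left. Code running inside an ISR would use the portENTER_CRITICAL_ISR()/portEXIT_CRITICAL_ISR() variants instead.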

Interrupt Handling in ESP-IDF

Interrupts signal that a hardware peripheral needs attention. An Interrupt Service Routine (ISR) is the function that runs when a specific interrupt occurs.

ISR Design Philosophy: Keep it Short and Fast

An ISR preempts all task code. If it runs for too long, it starves the rest of the system. The best practice is to do the absolute minimum work possible in the ISR and then defer the longer processing to a regular task.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans'}}}%%
sequenceDiagram
    participant Hardware
    participant ISR
    participant Semaphore
    participant Handler_Task as Handler Task

    Handler_Task->>Semaphore: xSemaphoreTake(portMAX_DELAY)
    note right of Handler_Task: Task blocks, waiting for signal
    
    Hardware-->>ISR: GPIO Event (e.g., Button Press)
    activate ISR
    
    note over ISR: ISR runs immediately! <br> KEEP IT SHORT.
    
    ISR->>ISR: Read minimal data if needed
    
    ISR->>Semaphore: xSemaphoreGiveFromISR()
    
    activate Semaphore
    note over Semaphore, Handler_Task: Semaphore is given. <br> Scheduler unblocks Handler Task.
    deactivate Semaphore
    
    ISR-->>Hardware: End of Interrupt
    deactivate ISR
    
    activate Handler_Task
    Handler_Task->>Handler_Task: Wakes up and runs
    Handler_Task->>Handler_Task: Perform long processing here...<br>(e.g., calculations, logging, network I/O)
    deactivate Handler_Task
    
    Handler_Task->>Semaphore: Loop back to wait for next signal

The standard pattern is:

  1. ISR: An event occurs (e.g., a GPIO pin changes state, UART receives a byte). The ISR runs, perhaps reads a single value from the hardware, and then “gives” a semaphore or sends a message to a queue. The entire ISR should take only microseconds.
  2. Handler Task: A dedicated, high-priority task is waiting (“pending”) on that same semaphore or queue.
  3. Unblocking: When the ISR gives the semaphore, the handler task immediately unblocks and runs. It can then perform the complex processing (e.g., parsing the data, logging to a file, updating a display) without delaying other interrupts.

FreeRTOS functions that are safe to call from an ISR have the ...FromISR suffix (e.g., xSemaphoreGiveFromISR, xQueueSendFromISR). Never call a standard FreeRTOS API function from an ISR.

Practical Examples

Example 1: Fixing a Race Condition with a Mutex

This code first demonstrates a race condition by having two tasks increment a shared counter. Then, it shows the fix using a mutex.

Code
C
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/semphr.h" // For mutexes
#include "esp_log.h"

static const char *TAG = "MUTEX_EXAMPLE";

// Shared resource
volatile long shared_counter = 0;

// Mutex handle
SemaphoreHandle_t xCounterMutex;

#define ITERATIONS 100000

// Set this to 1 to enable mutex protection, 0 to see the race condition
#define USE_MUTEX 1

/**
 * @brief A task that increments the shared counter.
 */
static void incrementer_task(void *pvParameters)
{
    int task_num = (int)pvParameters;
    ESP_LOGI(TAG, "Incrementer task %d started", task_num);

    for (int i = 0; i < ITERATIONS; i++)
    {
#if USE_MUTEX
        // Take the mutex, waiting up to 100ms if it's not available
        if (xSemaphoreTake(xCounterMutex, pdMS_TO_TICKS(100)) == pdTRUE)
        {
#endif
            // --- CRITICAL SECTION START ---
            shared_counter++;
            // --- CRITICAL SECTION END ---

#if USE_MUTEX
            // Give the mutex back
            xSemaphoreGive(xCounterMutex);
        }
        else
        {
            ESP_LOGE(TAG, "Task %d failed to take mutex", task_num);
        }
#endif
    }

    ESP_LOGI(TAG, "Incrementer task %d finished.", task_num);
    vTaskDelete(NULL);
}

void app_main(void)
{
#if USE_MUTEX
    ESP_LOGI(TAG, "Running WITH mutex protection.");
#else
    ESP_LOGI(TAG, "Running WITHOUT mutex protection (expecting race condition).");
#endif

    // Create the mutex before creating tasks that use it
    xCounterMutex = xSemaphoreCreateMutex();
    if (xCounterMutex == NULL) {
        ESP_LOGE(TAG, "Failed to create mutex.");
        return;
    }

    // Reset counter
    shared_counter = 0;

    // Create two tasks to increment the counter
    xTaskCreate(incrementer_task, "inc_task_1", 2048, (void *)1, 5, NULL);
    xTaskCreate(incrementer_task, "inc_task_2", 2048, (void *)2, 5, NULL);

    // Wait for tasks to finish (in a real app, you'd use a better sync mechanism)
    vTaskDelay(pdMS_TO_TICKS(5000));

    // The expected final value is 2 * ITERATIONS
    long expected_value = 2 * ITERATIONS;
    ESP_LOGI(TAG, "Expected final count: %ld", expected_value);
    ESP_LOGI(TAG, "Actual final count:   %ld", shared_counter);
    
    if (shared_counter != expected_value) {
        ESP_LOGE(TAG, "RACE CONDITION DETECTED! The final count is incorrect.");
    } else {
        ESP_LOGI(TAG, "Success! The final count is correct.");
    }
}

Build and Run Steps
  1. Create a new project and copy the code into main/main.c.
  2. First, run with USE_MUTEX set to 0. Build, flash, and monitor. You will likely see that the Actual final count is less than the Expected final count. This is the race condition in action.
  3. Now, change USE_MUTEX to 1. Re-build, flash, and monitor. The final count will now correctly be 200,000 because the mutex ensured that only one task at a time could execute the shared_counter++ read-modify-write, making it effectively atomic.

Example 2: Deferred GPIO Interrupt Handling

This example shows the standard “ISR-gives, task-takes” pattern. A button press on a GPIO triggers an ISR, which gives a semaphore to a handler task.

Code
C
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/semphr.h"
#include "driver/gpio.h"
#include "esp_log.h"

static const char *TAG = "ISR_EXAMPLE";

#define BUTTON_GPIO GPIO_NUM_0 // GPIO0 is the BOOT button on many devkits

// Semaphore to signal from ISR to task
SemaphoreHandle_t xIsrSemaphore = NULL;

/**
 * @brief The ISR handler function.
 * This function is called on every falling edge of the button GPIO.
 */
static void IRAM_ATTR gpio_isr_handler(void* arg)
{
    // Set to pdTRUE by xSemaphoreGiveFromISR() if the give unblocks a higher-priority task.
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    // Give the semaphore. xSemaphoreGiveFromISR is the correct version for ISRs.
    xSemaphoreGiveFromISR(xIsrSemaphore, &xHigherPriorityTaskWoken);

    // If xHigherPriorityTaskWoken is now set to pdTRUE, we should yield.
    // This will cause a context switch to the woken task.
    if (xHigherPriorityTaskWoken == pdTRUE) {
        portYIELD_FROM_ISR();
    }
}

/**
 * @brief The task that waits for the semaphore and processes the event.
 */
static void button_handler_task(void* arg)
{
    ESP_LOGI(TAG, "Button handler task started. Waiting for button presses...");
    while (1) {
        // Wait indefinitely for the semaphore to be given by the ISR.
        if (xSemaphoreTake(xIsrSemaphore, portMAX_DELAY) == pdTRUE) {
            ESP_LOGI(TAG, "Button pressed! GPIO interrupt occurred.");
            // In a real application, you would do your processing here.
        }
    }
}

void app_main(void)
{
    // Create the binary semaphore.
    xIsrSemaphore = xSemaphoreCreateBinary();
    if (xIsrSemaphore == NULL) {
        ESP_LOGE(TAG, "Failed to create semaphore");
        return;
    }

    // Create the handler task
    xTaskCreate(button_handler_task, "button_handler", 2048, NULL, 10, NULL);

    // Configure the GPIO pin
    gpio_config_t io_conf = {0}; // Zero-initialize so any field not set below has a defined value
    io_conf.intr_type = GPIO_INTR_NEGEDGE; // Interrupt on falling edge
    io_conf.pin_bit_mask = (1ULL << BUTTON_GPIO);
    io_conf.mode = GPIO_MODE_INPUT;
    io_conf.pull_up_en = GPIO_PULLUP_ENABLE;
    io_conf.pull_down_en = GPIO_PULLDOWN_DISABLE;
    gpio_config(&io_conf);
    
    // Install the GPIO ISR service
    gpio_install_isr_service(0);
    // Hook the ISR handler for our specific GPIO pin
    gpio_isr_handler_add(BUTTON_GPIO, gpio_isr_handler, NULL);

    ESP_LOGI(TAG, "ISR example configured. Press the BOOT button (GPIO0).");
}

Build and Run Steps
  1. Set up a new project with this code.
  2. Build, flash, and monitor.
  3. Press the “BOOT” button (or whichever button is connected to GPIO0) on your ESP32 board.
  4. Observe the Output: Each time you press the button, the message "Button pressed! GPIO interrupt occurred." will appear almost instantly. This demonstrates the low-latency communication from the hardware interrupt to the handler task.

Variant Notes

The concepts of critical sections and interrupt handling are universal, but their implementation details vary slightly with the CPU architecture.

  • Dual-Core (ESP32, ESP32-S3) vs. Single-Core (ESP32-S2, C3, C6, H2): This is the most significant difference. On a single-core chip, portENTER_CRITICAL only needs to disable interrupts to protect a resource. On a dual-core chip, disabling interrupts on Core 0 does not stop code on Core 1 from accessing a shared resource.
    • Because of this, ESP-IDF’s critical section implementation for dual-core systems uses spinlocks. portENTER_CRITICAL acquires a spinlock and disables interrupts. This ensures both inter-task (on the same core) and inter-core (between cores) safety.
    • This makes spinlocks a first-class citizen in dual-core development. A mutex protects a resource regardless of which cores the competing tasks run on, so it remains the default choice between tasks. For very short sections, or when an ISR is involved, a spinlock-based critical section is often the more direct solution.
  • Xtensa (ESP32, S2, S3) vs. RISC-V (C3, C6, H2): The underlying interrupt controller hardware is different. Xtensa has a more complex, multi-level interrupt system. The RISC-V chips use a simpler, priority-based interrupt controller whose exact design varies by chip. As a developer, you rarely interact with this directly, as the ESP-IDF hardware abstraction layer (HAL) and interrupt driver provide a consistent API (gpio_install_isr_service, etc.) across all variants.

Common Mistakes & Troubleshooting Tips

| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
| --- | --- | --- |
| Using a Mutex in an ISR | Guru Meditation Error or crash. The error message may indicate an invalid FreeRTOS API call from an ISR context. | An ISR cannot block. Use the “FromISR” versions of API calls. Don’t use: xSemaphoreTake(). Use: xSemaphoreGiveFromISR(). Follow the “ISR-gives, task-takes” deferred processing pattern. |
| Long-Running ISR | System feels unstable, other interrupts are missed, and a watchdog reboots the system (an Interrupt wdt timeout panic or a Task watchdog got triggered error). | Keep ISRs extremely short. Move all logic to a handler task. Avoid in ISR: printf(), loops, delays, complex calculations. Do in ISR: read a value, give a semaphore, and exit. |
| Forgetting `volatile` Keyword | Code in a task does not see the updated value of a variable that is changed by an ISR. The compiler has optimized away the memory read. | Any variable shared between a task and an ISR (or between two different threads) must be declared with the volatile keyword. Incorrect: long shared_counter; Correct: volatile long shared_counter; Note that volatile only forces the memory access to happen; it does not make a read-modify-write atomic. |
| Deadlock (Deadly Embrace) | Two or more tasks freeze and the system becomes unresponsive. | A deadlock happens when tasks wait for each other’s locks. Rule: in all tasks, acquire multiple mutexes in the exact same order. If Task A locks Mutex1 then Mutex2, Task B must also lock Mutex1 then Mutex2. |
| Holding Spinlock / Critical Section Too Long | Watchdog-triggered reboot (an Interrupt wdt timeout panic or a Task watchdog got triggered error). Interrupts were disabled for too long, preventing the watchdog from being reset. | Code inside a spinlock or portENTER_CRITICAL() must be minimal. Only protect a few lines of simple C assignments or checks. Never include loops, I/O functions (printf), or delays inside. |
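
The lock-ordering rule for avoiding deadlock can also be enforced mechanically. One common approach, shown here as an assumption rather than something this chapter prescribes, is to always acquire two locks in ascending address order, so every task ends up using the same order no matter how it names them. The lock_pair_t type and lock_pair_ordered() function are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical helper type: the two locks in the order they should be taken. */
typedef struct {
    void *first;   /* take this lock first   */
    void *second;  /* take this lock second  */
} lock_pair_t;

/* Return the two lock handles sorted by address. Any two callers passing
 * the same pair of handles get the same order, whichever way round they
 * pass the arguments. */
lock_pair_t lock_pair_ordered(void *a, void *b)
{
    lock_pair_t pair;
    if ((uintptr_t)a <= (uintptr_t)b) {
        pair.first = a;
        pair.second = b;
    } else {
        pair.first = b;
        pair.second = a;
    }
    return pair;
}
```

On ESP-IDF, a task would pass its two SemaphoreHandle_t values (which are pointers), call xSemaphoreTake() on first and then on second, and give them back in the reverse order.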

Exercises

  1. Shared Structure Protection: Create a struct that contains a counter, a status flag, and a character buffer. Create two tasks. Task 1 will modify all members of the struct, and Task 2 will read them and print them to the console. Use a single mutex to protect the entire struct from race conditions.
  2. Spinlock Performance Test: Write a test that increments a shared counter 1,000,000 times within a loop. First, protect the increment operation using a mutex. Use esp_timer_get_time() to measure how long it takes. Second, repeat the test but protect the increment with a spinlock (portENTER_CRITICAL/portEXIT_CRITICAL). Compare the execution times. Why is the spinlock version so much faster for this specific case?
  3. Create a Deadlock: Write an application with two tasks and two mutexes, mutexA and mutexB.
    • Task 1: Takes mutexA, delays for 10ms, then tries to take mutexB.
    • Task 2: Takes mutexB, delays for 10ms, then tries to take mutexA.
    • Run the code and observe how the system hangs. This is a practical demonstration of what to avoid.
  4. Pulse Counter with Interrupts: Configure a GPIO pin as an input. Write an ISR that increments a volatile counter on every rising edge. Create a separate task that, once per second, enters a critical section, reads the value of the counter, prints it, and resets the counter to zero. This simulates a basic frequency counter.

Summary

  • A race condition occurs when multiple threads access a shared resource without protection, leading to unpredictable outcomes.
  • A critical section is a code block that must be executed atomically.
  • Mutexes are the standard way to protect shared resources between tasks. They are efficient because waiting tasks sleep.
  • Spinlocks are for protecting very short critical sections, especially between CPU cores or between an ISR and a task. They work by busy-waiting.
  • Disabling interrupts (portENTER_CRITICAL) is the strongest form of protection but must be used for the shortest possible duration to maintain system responsiveness.
  • ISRs must be short, fast, and non-blocking. Use the “ISR-gives, task-takes” pattern with semaphores or queues to defer long processing to a handler task.
  • Synchronization on dual-core systems requires inter-core locking (spinlocks), as disabling interrupts on one core does not affect the other.
