Chapter 228: SMP and Dual-Core Programming
Chapter Objectives
By the end of this chapter, you will be able to:
- Understand the concept of Symmetric Multiprocessing (SMP).
- Explain how FreeRTOS is configured to run on dual-core ESP32 variants.
- Identify which ESP32 variants support dual-core operation.
- Write code that creates tasks that can run on either available CPU core.
- Understand and prevent race conditions using SMP-safe synchronization primitives like spinlocks.
- Recognize the difference between spinlocks and mutexes in an SMP context.
- Debug common issues related to multi-core programming.
Introduction
Many models in the ESP32 family are equipped with two identical processor cores, transforming them from simple microcontrollers into powerful platforms capable of true parallel processing. This capability is known as Symmetric Multiprocessing (SMP). Leveraging both cores effectively can dramatically boost the performance of your application, allowing for the simultaneous execution of demanding tasks, such as running a communication stack on one core while performing intensive data processing on the other.
In previous chapters, we treated the underlying processor as a single unit where tasks were scheduled concurrently. In this chapter, we will dive deeper into the dual-core architecture. You will learn how the ESP-IDF FreeRTOS implementation manages tasks across two cores and, more importantly, how to write code that is safe and efficient in a parallel-processing environment. Mastering SMP is key to unlocking the full potential of your dual-core ESP32.
Theory
What is Symmetric Multiprocessing (SMP)?
Symmetric Multiprocessing is a computer architecture where two or more identical processors are connected to a single, shared main memory and are controlled by a single operating system instance. The “symmetric” part of the name comes from the fact that all processors are equal; they can run the operating system kernel and any task in the system.
Think of it like a restaurant kitchen with two identical, highly skilled chefs (the cores). A single head chef (the FreeRTOS scheduler) hands out cooking orders (tasks). If both chefs are free, the head chef can give one order to the first chef and a second order to the other, and they can work on them simultaneously. Both chefs have access to the same pantry and equipment (shared memory and peripherals).
In the context of the ESP32, this means you have two Xtensa LX6 or LX7 cores (depending on the variant) that can execute tasks in parallel. ESP-IDF configures FreeRTOS to run in an SMP configuration, enabling the scheduler to dispatch tasks to either core.
FreeRTOS SMP Scheduler
The FreeRTOS kernel used in ESP-IDF is specifically adapted for SMP. While the core FreeRTOS API remains largely the same, its internal behavior is different:
- Task Scheduling: When a task is ready to run, the scheduler can place it on any available core. If both cores are free and two high-priority tasks are ready, they will run in parallel, one on each core. If one core is busy with a high-priority task, a lower-priority task can run on the other core. This significantly improves throughput.
- System Tasks: By default, ESP-IDF handles the core affinity of critical system tasks. For example, the Wi-Fi and Bluetooth stacks are typically pinned to a specific core (usually Core 0, the “protocol core” or PRO_CPU) to ensure their real-time requirements are met. Your application tasks usually run on the other core (Core 1, the “application core” or APP_CPU), but they are free to run on Core 0 if it is available.
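When a task must always run on a particular core, ESP-IDF provides `xTaskCreatePinnedToCore()`, which takes the target core ID as its final argument. A brief sketch (the `blink_task` function and the stack-size/priority values are illustrative placeholders, not part of any real project):

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

// Hypothetical task used for illustration only.
static void blink_task(void *pvParameters)
{
    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(500));
    }
}

void create_pinned_tasks(void)
{
    // The last argument selects the core: 0 = PRO_CPU, 1 = APP_CPU.
    xTaskCreatePinnedToCore(blink_task, "blink0", 2048, NULL, 5, NULL, 0);
    xTaskCreatePinnedToCore(blink_task, "blink1", 2048, NULL, 5, NULL, 1);

    // tskNO_AFFINITY creates an unpinned task that may run on either core,
    // which is also what plain xTaskCreate() does.
    xTaskCreatePinnedToCore(blink_task, "blink_any", 2048, NULL, 5, NULL, tskNO_AFFINITY);
}
```

Pinning trades scheduling flexibility for predictability; it is worthwhile mainly for tasks with hard real-time or core-specific hardware requirements.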
```mermaid
graph TD
    subgraph "FreeRTOS SMP Scheduler"
        A[Task Ready to Run] --> B{Is Core 0 Free?};
        B -->|Yes| C{Is Core 1 Free?};
        B -->|No| D[Schedule on Core 1];
        C -->|Yes| E["Schedule on Higher-Priority Core <br> or first available (e.g., Core 0)"];
        C -->|No| F[Schedule on Core 0];
    end

    %% Styling
    classDef start fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef decision fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef process fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    class A start;
    class B,C decision;
    class D,E,F process;
```
Race Conditions and the Need for New Synchronization Tools
The greatest challenge in SMP programming is managing access to shared resources. A “shared resource” could be a global variable, a piece of hardware like a UART peripheral, or a data structure in memory.
When two tasks running on different cores try to access and modify the same resource at the same time, a race condition can occur. For example, consider a simple counter:
- Task A (on Core 0) reads the value of a global counter, which is 5.
- Task B (on Core 1) reads the value of the same counter, which is also 5.
- Task A increments its local copy to 6 and writes it back to the global counter. The counter is now 6.
- Task B increments its local copy to 6 and writes it back to the global counter. The counter is still 6.
Two increments occurred, but the final value is 6 instead of the correct value of 7. This is a classic race condition.
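The four-step interleaving above can be replayed deterministically in plain, host-side C (no RTOS needed) by splitting the `++` into the load/modify/store steps the CPU actually performs:

```c
#include <stdint.h>

// Replays the interleaving from the text: both "cores" read the counter
// before either writes back, so one increment is lost.
// shared_counter++ is really three steps: load, add 1, store.
uint32_t simulate_lost_update(void)
{
    uint32_t shared_counter = 5;

    uint32_t core0_copy = shared_counter; // 1. Core 0 reads 5
    uint32_t core1_copy = shared_counter; // 2. Core 1 reads 5

    shared_counter = core0_copy + 1;      // 3. Core 0 writes back 6
    shared_counter = core1_copy + 1;      // 4. Core 1 writes back 6

    return shared_counter;                // 6, not the expected 7
}
```

On real hardware the interleaving is nondeterministic, which is what makes race conditions so hard to reproduce; this function simply forces the worst case.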
```mermaid
sequenceDiagram
    actor Core0 as Core 0
    actor Core1 as Core 1
    participant SharedRAM as Shared Counter <br> (Value: 5)
    par Tasks Run in Parallel
        Core0->>SharedRAM: 1. Read value (gets 5)
    and
        Core1->>SharedRAM: 2. Read value (gets 5)
    end
    Core0-->>Core0: 3. Increment local value (5 -> 6)
    Core0->>SharedRAM: 4. Write back 6
    note right of Core0: Counter is now 6
    Core1-->>Core1: 5. Increment local value (5 -> 6)
    Core1->>SharedRAM: 6. Write back 6
    note right of Core1: Counter is still 6! <br> An increment was lost.
    Note over Core0,SharedRAM: Race Condition Outcome: Final Value 6 <b>(Incorrect)</b>
    Note over Core0,SharedRAM: Expected Outcome with synchronization: Final Value 7
```
In a single-core system, we used mutexes to prevent this: a mutex ensures that only one task can access a resource at a time. In an SMP system, however, mutexes alone are not always sufficient, especially where interrupts are involved. An ISR is not allowed to block, so it can never wait on a mutex, and simply disabling interrupts, which works as a critical section on a single core, does nothing to stop code running on the other core. We need a primitive that excludes both the other core and interrupt handlers.
Spinlocks: The SMP Solution for Critical Sections
To handle low-level critical sections that must be safe across both cores and from within ISRs, ESP-IDF provides spinlocks.
A spinlock is a lightweight synchronization primitive. When a task tries to acquire a spinlock that is already held, it doesn’t block (go to sleep) like with a mutex. Instead, it “spins” in a tight loop, repeatedly checking if the lock is available. This is called busy-waiting.
Why use a spinlock instead of a mutex?
- Speed: Acquiring and releasing a spinlock is extremely fast because it doesn’t involve a context switch, which is a heavyweight operation.
- ISR Safety: They can be used inside Interrupt Service Routines (ISRs), whereas mutexes cannot. An ISR cannot block, and spinning is not considered blocking.
The downside is that busy-waiting wastes CPU cycles. Therefore, spinlocks should only be used to protect very short, critical sections of code where the lock will be held for a minimal amount of time.
Warning: Holding a spinlock for a long time is a critical performance error. While one core holds the lock, the other core might be spinning uselessly, unable to do any productive work. Always keep the code inside a spinlock as short and fast as possible.
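Data shared between a task and an ISR is the canonical spinlock use case. The following is a sketch under assumed names (`gpio_edge_isr` and `edge_count` are illustrative); note that in ISR context ESP-IDF uses the `_ISR` variants of the critical-section macros:

```c
#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_attr.h"

static portMUX_TYPE isr_lock = portMUX_INITIALIZER_UNLOCKED;
static volatile uint32_t edge_count = 0; // shared between a task and an ISR

// Hypothetical ISR, e.g. registered for a GPIO edge interrupt.
static void IRAM_ATTR gpio_edge_isr(void *arg)
{
    portENTER_CRITICAL_ISR(&isr_lock); // ISR context: use the _ISR variant
    edge_count++;
    portEXIT_CRITICAL_ISR(&isr_lock);
}

// Called from task context: atomically read and reset the count.
static uint32_t read_and_clear_edge_count(void)
{
    portENTER_CRITICAL(&isr_lock);     // task context: plain variant
    uint32_t n = edge_count;
    edge_count = 0;
    portEXIT_CRITICAL(&isr_lock);
    return n;
}
```

Both sides hold the lock only for a couple of instructions, which is exactly the kind of critical section a spinlock is meant for.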
| Feature | Spinlock | Mutex |
| --- | --- | --- |
| Waiting Behavior | Busy-waits (spins), consuming 100% CPU while waiting. | Blocks (sleeps), yielding the CPU to other tasks. |
| ISR Context Safe | Yes, can be used safely inside ISRs. | No, cannot be used inside ISRs. |
| Acquire/Release Speed | Extremely fast, minimal overhead. | Slower; a contended take involves the scheduler and a context switch. |
| Typical Use Case | Protecting very short, low-level critical sections (e.g., a single variable increment). | Protecting longer operations or complex data structures where blocking is acceptable. |
| Primary Risk | Holding the lock for too long, causing performance degradation as the other core spins uselessly. | Potential for priority inversion and deadlocks if not used carefully. |
| ESP-IDF API | `portENTER_CRITICAL` / `portEXIT_CRITICAL` | `xSemaphoreTake` / `xSemaphoreGive` |
In ESP-IDF, you use a spinlock via the `portMUX_TYPE` type and its associated macros:

- `portMUX_TYPE myLock = portMUX_INITIALIZER_UNLOCKED;`: declares and initializes a spinlock.
- `portENTER_CRITICAL(&myLock);`: acquires the spinlock.
- `portEXIT_CRITICAL(&myLock);`: releases the spinlock.
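Conceptually, a spinlock is little more than an atomic test-and-set in a loop. The portable C11 sketch below shows that idea; it is not ESP-IDF's actual implementation, which additionally disables interrupts on the local core and records which core holds the lock:

```c
#include <stdatomic.h>

// Minimal spinlock built on a C11 atomic_flag (illustrative sketch).
typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void spinlock_acquire(spinlock_t *lock)
{
    // Atomically set the flag; if it was already set, another core holds
    // the lock, so busy-wait ("spin") until it is released.
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
        // spin
    }
}

static inline void spinlock_release(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The acquire/release memory orderings ensure that writes made inside the critical section become visible to the core that acquires the lock next.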
Practical Examples
Let’s demonstrate these concepts with code.
Example 1: Observing Tasks on Different Cores
This example creates two simple tasks that do nothing but print which core they are running on.
Code
```c
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static const char *TAG = "SMP_DEMO";

// A simple task that prints its core ID in a loop
void core_display_task(void *pvParameters)
{
    // The parameter is the task's name
    char *task_name = (char *)pvParameters;
    while (1)
    {
        ESP_LOGI(task_name, "Hello! I am running on Core %d", xPortGetCoreID());
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

void app_main(void)
{
    ESP_LOGI(TAG, "Starting SMP Demo.");

    // app_main runs in the main task; its core affinity is set by
    // CONFIG_ESP_MAIN_TASK_AFFINITY (pinned to one core by default).
    ESP_LOGI(TAG, "app_main is running on Core %d", xPortGetCoreID());

    // Create two instances of the same task. The FreeRTOS scheduler
    // will automatically distribute them across the available cores.
    xTaskCreate(core_display_task, "Task_A", 2048, "Task_A", 5, NULL);
    xTaskCreate(core_display_task, "Task_B", 2048, "Task_B", 5, NULL);
}
```
Build and Flash Instructions
- Open VS Code with the ESP-IDF extension.
- Create a new project.
- Copy the code above into your `main.c` file.
- Build the project (click the "Build" button in the status bar).
- Flash the project to your ESP32 board (click the "Flash" button).
- Open the Monitor to view the serial output (click the "Monitor" button).
Observation
You should see output similar to this. In this run, `Task_A` consistently lands on one core and `Task_B` on the other; because the tasks are unpinned, the scheduler is free to place each one on whichever core is available. `app_main` itself also runs on one of the cores.
```
I (314) SMP_DEMO: Starting SMP Demo.
I (314) SMP_DEMO: app_main is running on Core 1
I (324) Task_A: Hello! I am running on Core 0
I (324) Task_B: Hello! I am running on Core 1
I (1324) Task_A: Hello! I am running on Core 0
I (1324) Task_B: Hello! I am running on Core 1
I (2324) Task_A: Hello! I am running on Core 0
I (2324) Task_B: Hello! I am running on Core 1
```
Example 2: Demonstrating a Race Condition and Fixing it with a Spinlock
Here, we will create two tasks that rapidly increment a shared counter. First, we’ll see the race condition in action, then we’ll fix it with a spinlock.
Code
```c
#include <stdio.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static const char *TAG = "RACE_CONDITION_DEMO";

// The shared resource - a global counter
volatile uint32_t shared_counter = 0;

// A spinlock to protect the shared resource (used only when USE_SPINLOCK is 1)
static portMUX_TYPE counter_lock = portMUX_INITIALIZER_UNLOCKED;

#define USE_SPINLOCK 1 // Set to 0 to see the race condition, 1 to see the fix

// Task that increments the counter a large number of times
void incrementer_task(void *pvParameters)
{
    int increments = 500000;
    char *task_name = (char *)pvParameters;

    for (int i = 0; i < increments; i++)
    {
#if USE_SPINLOCK
        portENTER_CRITICAL(&counter_lock);
#endif
        // This is the critical section
        shared_counter++;
#if USE_SPINLOCK
        portEXIT_CRITICAL(&counter_lock);
#endif
    }

    ESP_LOGI(task_name, "Finished incrementing.");
    vTaskDelete(NULL); // Delete self when done
}

void app_main(void)
{
    ESP_LOGI(TAG, "Starting Race Condition Demo on Core %d", xPortGetCoreID());

    // Create two tasks that will run in parallel on different cores
    xTaskCreate(incrementer_task, "Incrementer_1", 2048, "Incrementer_1", 10, NULL);
    xTaskCreate(incrementer_task, "Incrementer_2", 2048, "Incrementer_2", 10, NULL);

    // Wait for the tasks to finish.
    // In a real app, you'd use a more robust synchronization method like an event group.
    // For this demo, a simple delay is sufficient to allow the tasks to complete.
    vTaskDelay(pdMS_TO_TICKS(3000));

    // Total increments should be 500,000 * 2 = 1,000,000
    ESP_LOGI(TAG, "Expected final counter value: 1000000");
    ESP_LOGI(TAG, "Actual final counter value: %" PRIu32, shared_counter);

#if USE_SPINLOCK
    ESP_LOGI(TAG, "Test was run WITH spinlock protection.");
#else
    ESP_LOGW(TAG, "Test was run WITHOUT spinlock protection. Expect data corruption.");
#endif
}
```
Build and Flash Instructions
- Use the same project as before.
- Set `USE_SPINLOCK` to `0`.
- Build, Flash, and Monitor.
Observation (Without Spinlock)
The final value will be incorrect and will vary on each run. This is the race condition in action.
```
I (314) RACE_CONDITION_DEMO: Starting Race Condition Demo on Core 1
I (1104) Incrementer_1: Finished incrementing.
I (1104) Incrementer_2: Finished incrementing.
I (3324) RACE_CONDITION_DEMO: Expected final counter value: 1000000
I (3324) RACE_CONDITION_DEMO: Actual final counter value: 678123
W (3324) RACE_CONDITION_DEMO: Test was run WITHOUT spinlock protection. Expect data corruption.
```
Re-run With the Fix
- Change `USE_SPINLOCK` to `1`.
- Build, Flash, and Monitor again.
Observation (With Spinlock)
The final value is now correct every time because the spinlock ensures that the `shared_counter++` operation is atomic (indivisible).
```
I (314) RACE_CONDITION_DEMO: Starting Race Condition Demo on Core 1
I (1454) Incrementer_1: Finished incrementing.
I (1454) Incrementer_2: Finished incrementing.
I (3324) RACE_CONDITION_DEMO: Expected final counter value: 1000000
I (3324) RACE_CONDITION_DEMO: Actual final counter value: 1000000
I (3324) RACE_CONDITION_DEMO: Test was run WITH spinlock protection.
```
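As an aside, for a single shared counter a lock-free C11 atomic is an alternative worth knowing. The sketch below is portable host-side C (`add_increments` is an illustrative helper, not part of the demo above); the same `stdatomic.h` operations are available in the ESP-IDF toolchains:

```c
#include <stdatomic.h>
#include <stdint.h>

// The shared counter as a C11 atomic: each increment compiles to an atomic
// read-modify-write, so no explicit lock is needed for this one operation.
static atomic_uint_fast32_t atomic_counter = 0;

uint_fast32_t add_increments(int n)
{
    for (int i = 0; i < n; i++) {
        atomic_fetch_add(&atomic_counter, 1); // indivisible, even across cores
    }
    return (uint_fast32_t)atomic_load(&atomic_counter);
}
```

Atomics only help for single-variable updates; when a critical section must keep several related variables consistent together, a spinlock or mutex is still required.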
Variant Notes
The concepts of SMP are only relevant for ESP32 variants that have dual cores.
- Dual-Core (SMP Capable):
- ESP32: The original powerhouse with two Xtensa LX6 cores.
- ESP32-S3: A modern successor with two Xtensa LX7 cores, also featuring vector instructions for AI/ML acceleration.
- Single-Core (Not SMP Capable):
- ESP32-S2: Single Xtensa LX7 core.
- ESP32-C3: Single 32-bit RISC-V core.
- ESP32-C6: Single 32-bit RISC-V application core (it also has a separate low-power RISC-V core, which is not managed by the FreeRTOS scheduler).
- ESP32-H2: Single 32-bit RISC-V core.
On single-core variants, FreeRTOS runs in a standard uniprocessor configuration. The function `xPortGetCoreID()` will always return `0`. The spinlock macros (`portENTER_CRITICAL`, `portEXIT_CRITICAL`) still work, but they simply disable and re-enable interrupts, as there is no second core to compete for the lock.
Common Mistakes & Troubleshooting Tips
| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
| --- | --- | --- |
| Forgetting Protection for Shared Data | Random crashes, corrupted data, variables with nonsensical values, unpredictable behavior. | Review all global/static variables and hardware peripherals accessed by multiple tasks. Protect them with a `portMUX_TYPE` spinlock for short accesses or a mutex for longer, blockable operations. |
| Using a Mutex in an ISR | A Guru Meditation Error and system crash, often with an assertion failure because an ISR tried to block. | Never use mutex functions like `xSemaphoreTake` in an ISR. If a resource is shared between a task and an ISR, use a spinlock: `portENTER_CRITICAL_ISR` / `portEXIT_CRITICAL_ISR` in the ISR, and `portENTER_CRITICAL` / `portEXIT_CRITICAL` in the task. |
| Holding a Spinlock for Too Long | Poor performance, Task Watchdog Timer (TWDT) timeouts, system feels sluggish or hangs. One core sits at 100% usage while doing nothing productive. | Keep the code inside a `portENTER_CRITICAL` block as minimal as possible. Never place blocking calls (`vTaskDelay`, I/O) inside it. Refactor to use a mutex if the operation is long. |
| Incorrect Spinlock Usage (Mismatched Calls) | Deadlock. The system completely freezes because one task takes a lock and never releases it, causing other tasks to spin forever waiting for it. | Ensure every `portENTER_CRITICAL(&myLock)` has a corresponding `portEXIT_CRITICAL(&myLock)` on every code path, including early returns. Structure your code cleanly to make the critical section obvious. |
| Assuming Task-Core Affinity | A task that depends on a specific core's resources (e.g., certain timers or peripherals) fails intermittently when the scheduler moves it to the other core. | If a task must run on a specific core, create it using `xTaskCreatePinnedToCore()` instead of `xTaskCreate()`. Do not rely on the scheduler's default placement for core-specific code. |
Exercises
- Core Identifier: Write a program that creates four tasks, each with a different priority. Have each task loop and print its name, priority, and the core ID it is running on. Observe how the scheduler assigns tasks to cores.
- Shared Array Corruption: Create a global integer array of 10 elements, all initialized to 0. Create two tasks. Task A should iterate through the array, setting each element to `i * 10`. Task B should iterate through it, setting each element to `i * 20`. Run both simultaneously and print the final array. Observe the mangled result.
- Spinlock Fix for Array: Fix the previous exercise by using a single spinlock to protect the entire array during modification. Task A should acquire the lock, fill the array, and then release it. Task B will have to wait. Verify the final contents are consistent (they will reflect whichever task held the lock last).
- Performance Test: Write a task that performs a mathematically intensive calculation (e.g., calculating a large number of prime numbers). Create two instances of this task. First, run them on a dual-core ESP32 and measure the total time taken for both to complete. Then, modify the code to pin both tasks to a single core (`xTaskCreatePinnedToCore`) and measure the time again. Compare the results.
- Bank Account Simulation: Simulate a shared bank account. Create a global variable for the balance. Create a "deposit" task that adds a random amount to the balance in a loop. Create a "withdraw" task that removes a random amount. Run both tasks. Show how without protection, the final balance is nonsensical. Fix it using a spinlock.
Summary
- Symmetric Multiprocessing (SMP) allows an OS to manage two or more identical processors, executing tasks in parallel.
- The ESP32 and ESP32-S3 are dual-core and support SMP. Other variants like the S2, C3, C6, and H2 are single-core.
- The ESP-IDF FreeRTOS scheduler is SMP-aware and can distribute tasks across both available cores to maximize throughput.
- Accessing shared resources (like global variables) from tasks running on different cores simultaneously can cause race conditions, leading to data corruption.
- Spinlocks (`portMUX_TYPE`) are the primary mechanism for protecting short critical sections of code in an SMP environment. They are fast and safe to use in ISRs.
- Spinlocks work by busy-waiting, so they must be held for the shortest possible duration to avoid wasting CPU cycles.
- For longer critical sections where blocking is acceptable, a mutex is still the appropriate tool, but it cannot be used from an ISR.
Further Reading
- ESP-IDF FreeRTOS (SMP) Documentation: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_smp.html
- ESP-IDF Critical Sections Documentation: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/critical_sections.html
- FreeRTOS Task Management: https://www.freertos.org/taskman.html (Note: The official FreeRTOS documentation describes the uniprocessor version; refer to Espressif’s docs for SMP-specific details).