Chapter 251: ESP32 Original Series Architecture Details
Chapter Objectives
By the end of this chapter, you will be able to:
- Describe the dual-core Xtensa LX6 CPU architecture of the original ESP32.
- Understand the ESP32’s memory map, including IRAM, DRAM, and RTC memory.
- Explain the function of the system’s bus matrix and its impact on performance.
- Detail the role and operation of the Ultra Low-Power (ULP) coprocessor.
- Identify the key peripherals and hardware accelerators present in the ESP32.
- Compare the architecture of the original ESP32 with newer variants in the Espressif ecosystem.
Introduction
Welcome to the deep dive into the ESP32 family! Before we can appreciate the unique features of the newer S, C, and H-series variants, we must first build a solid foundation by understanding the chip that started a revolution in the IoT space: the original ESP32. Released in 2016, its combination of a powerful dual-core processor, extensive memory, rich peripheral set, and integrated Wi-Fi and Bluetooth connectivity was unprecedented for its price point.
This chapter will peel back the layers of abstraction and look directly at the silicon. Understanding the hardware architecture is not merely an academic exercise; it is a prerequisite for writing highly optimized, power-efficient, and robust applications. When you know how the CPUs access memory, why certain code runs faster from IRAM, and how the ULP coprocessor works, you unlock the full potential of the hardware.
Theory
The ESP32 is a sophisticated System-on-Chip (SoC) designed for high performance and connectivity. Let’s break down its core components.
1. CPU and Memory
At the heart of the ESP32 are two Tensilica Xtensa LX6 microprocessor cores, named PRO_CPU (Protocol CPU, Core 0) and APP_CPU (Application CPU, Core 1).
- Dual-Core Architecture: The two cores are largely identical and can operate at clock speeds up to 240 MHz. This symmetric multiprocessing (SMP) capability, managed by the ESP-IDF port of FreeRTOS, allows for true parallel execution. By default, ESP-IDF pins the Wi-Fi and Bluetooth stack tasks to Core 0 (“protocol”), which leaves Core 1 largely free for user application tasks. This division reduces the risk of demanding application code interfering with the timing-sensitive network stacks.
- Floating-Point Unit (FPU): Each core includes a hardware FPU, enabling efficient single-precision floating-point computations without software emulation.
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
graph TD
    subgraph "ESP32 On-Chip Memory (520 KB SRAM)"
        direction LR
        A["<b>CPU Core</b><br>(PRO_CPU or APP_CPU)"]
        subgraph Main SRAM
            direction TB
            IRAM(Instruction RAM<br><i>For executable code</i><br>Fastest Execution)
            DRAM(Data RAM<br><i>For variables, stacks, heaps</i><br>General Purpose Data Storage)
        end
        subgraph RTC Power Domain
            direction TB
            RTCMEM(RTC Memory<br><b>8KB FAST + 8KB SLOW</b><br><i>Retained in Deep Sleep</i><br>Used by ULP & for state saving)
        end
    end
    A -- Instruction Bus --> IRAM;
    A -- Data Bus --> DRAM;
    A -- Can Access --> RTCMEM;
    ULP["<b>ULP Coprocessor</b><br><i>(Runs in Deep Sleep)</i>"] -- Accesses --> RTCMEM;
    %% Styling %%
    classDef cpu fill:#FEF3C7,stroke:#D97706,stroke-width:2px,color:#92400E;
    classDef sram fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef rtc fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef ulp fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    class A cpu;
    class IRAM,DRAM sram;
    class RTCMEM,ULP rtc;
- Memory: The ESP32 features 520 KB of on-chip SRAM. This memory is not one contiguous block; it is a collection of smaller banks with different access properties.
  - DRAM (Data RAM): This memory is connected to the CPU’s data bus and is used for storing data, such as variables, heaps, and task stacks.
  - IRAM (Instruction RAM): This memory is connected to the CPU’s instruction bus. Code stored in external flash is executed through a cache, so a cache miss stalls the CPU while instructions are read from flash. Code placed in IRAM avoids those misses and remains executable while the cache is disabled (for example, during flash writes), which is why interrupt handlers are commonly placed there.
  - RTC Memory: A small amount of RAM (8 KB FAST, 8 KB SLOW) is located in the Real-Time Clock (RTC) power domain. This memory retains its contents during deep sleep. The ULP coprocessor runs its program from RTC SLOW memory, and the main CPUs can use RTC memory to store state before sleeping.
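The memory regions described above are usually selected from C through the attribute macros in esp_attr.h. The following is a minimal sketch, assuming an ESP-IDF build; the fallback #define lines exist only so the file also compiles with a desktop compiler, where the attributes do nothing. The names boot_count, s_gain_table, scale_sample, and record_boot are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_attr.h"   // provides IRAM_ATTR, DRAM_ATTR, RTC_DATA_ATTR
#else
// Host fallback (assumption for illustration only): attributes expand to nothing.
#define IRAM_ATTR
#define DRAM_ATTR
#define RTC_DATA_ATTR
#endif

// Kept in RTC memory: the value survives deep sleep and is
// re-initialised on resets that are not a deep-sleep wake-up.
static RTC_DATA_ATTR uint32_t boot_count = 0;

// Forced into internal data RAM so it stays readable while the flash cache is off.
static DRAM_ATTR const uint8_t s_gain_table[4] = {1, 2, 4, 8};

// Placed in IRAM: instruction fetches never wait on a flash cache miss.
uint32_t IRAM_ATTR scale_sample(uint32_t sample, size_t gain_index)
{
    assert(gain_index < 4);
    return sample * s_gain_table[gain_index];
}

// Call once per boot or wake-up; returns how many times it has been called.
uint32_t record_boot(void)
{
    return ++boot_count;
}
```

On a real board, calling record_boot() at the top of app_main and then entering deep sleep would let the counter keep growing across wake-ups, because the variable lives in the RTC power domain rather than in main SRAM.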
2. System Bus Architecture
The CPUs, memory, and peripherals are not directly connected. They communicate through a complex bus matrix. This matrix acts like an intelligent telephone exchange, routing requests from “masters” (like a CPU or a DMA controller) to “slaves” (like a block of SRAM or a peripheral’s registers).
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
flowchart TD
    subgraph "Masters (Initiate Access)"
        direction LR
        M1(<b>PRO_CPU</b><br>Core 0)
        M2(<b>APP_CPU</b><br>Core 1)
        M3(DMA<br>Controller)
    end
    subgraph "Slaves (Respond to Access)"
        direction LR
        S1(Internal RAM<br><i>DRAM / IRAM</i>)
        S2(Internal ROM)
        S3(Flash Controller)
        S4(Peripherals<br><i>UART, SPI, etc.</i>)
        S5(RTC Block)
    end
    BusMatrix{<B>Bus Matrix</B><br>Intelligent Arbiter}
    M1 -- Request --> BusMatrix
    M2 -- Request --> BusMatrix
    M3 -- Request --> BusMatrix
    BusMatrix -- "Grant & Route Access" --> S1
    BusMatrix --> S2
    BusMatrix --> S3
    BusMatrix --> S4
    BusMatrix --> S5
    %% Styling %%
    classDef master fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef slave fill:#EDE9FE,stroke:#5B21B6,stroke-width:1px,color:#5B21B6;
    classDef matrix fill:#FEF3C7,stroke:#D97706,stroke-width:2px,color:#92400E;
    class M1,M2,M3 master;
    class S1,S2,S3,S4,S5 slave;
    class BusMatrix matrix;
The ESP32’s matrix allows for simultaneous access as long as the master and slave paths do not conflict. For example, APP_CPU can execute a mathematical computation using its internal registers while PRO_CPU fetches data from the Wi-Fi MAC peripheral via the bus. However, if both CPUs attempt to access the same block of SRAM in the same cycle, the bus arbiter briefly stalls one of them. Understanding this is key to diagnosing advanced performance issues.
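The conflict rule can be illustrated with a toy model that runs on any desktop compiler. This is only a sketch under stated assumptions: the slave identifiers and the arbitrate function are hypothetical, and the fixed "lowest master index wins" priority is an assumption made for simplicity, not the documented behavior of the ESP32 arbiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical slave identifiers for this toy model (not real register names).
typedef enum {
    SLAVE_NONE = 0,      // master is idle this cycle
    SLAVE_SRAM_BANK0,
    SLAVE_SRAM_BANK1,
    SLAVE_PERIPH
} slave_t;

#define NUM_MASTERS 3    // index 0 = PRO_CPU, 1 = APP_CPU, 2 = DMA

// One arbitration round. requests[m] is the slave master m wants this cycle.
// granted[m] becomes true if master m may proceed, false if it is idle or stalled.
// Assumption: on a conflict, the master with the lower index wins.
// Returns the number of masters that were stalled.
size_t arbitrate(const slave_t requests[NUM_MASTERS], bool granted[NUM_MASTERS])
{
    assert(requests != NULL && granted != NULL);
    size_t stalled = 0;
    for (size_t m = 0; m < NUM_MASTERS; m++) {
        granted[m] = false;
        if (requests[m] == SLAVE_NONE) {
            continue;                       // nothing requested
        }
        bool conflict = false;
        for (size_t earlier = 0; earlier < m; earlier++) {
            if (requests[earlier] == requests[m]) {
                conflict = true;            // slave already claimed this cycle
                break;
            }
        }
        granted[m] = !conflict;
        if (conflict) {
            stalled++;
        }
    }
    return stalled;
}
```

When every master targets a different slave, nobody stalls; when PRO_CPU and APP_CPU both request SLAVE_SRAM_BANK0, the model grants PRO_CPU and stalls APP_CPU for that round. Spreading hot data across different banks is one way such stalls can be reduced.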
3. Peripherals and Hardware Accelerators
The original ESP32 is packed with a wide array of peripherals. These are not just connected to the GPIO pins; they are distinct hardware blocks within the SoC.
- Connectivity: Wi-Fi (802.11 b/g/n, 2.4 GHz) and Bluetooth v4.2 (Classic BR/EDR and BLE).
- Standard Interfaces: UART, SPI, I2C, I2S, RMT (Remote Control), SD/MMC Host.
- Analog: Two 12-bit SAR ADCs, two 8-bit DACs, Hall Effect sensor, and 10 capacitive touch GPIOs.
- Security: Hardware acceleration for AES, SHA, RSA, and a Random Number Generator (RNG). These are crucial for implementing features like Flash Encryption and Secure Boot efficiently.
4. The Ultra Low-Power (ULP) Coprocessor
One of the most innovative features of the ESP32 is its ULP coprocessor. This is a tiny, simple processor that can execute a program while the main dual-core CPUs are in a deep sleep state, consuming only microamperes of current.
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
graph TD
    A(Start: Main CPUs Running) --> B{Enter Deep Sleep?};
    B -- Yes --> C[<b>Main CPUs Power Down</b><br>State saved to RTC Memory];
    B -- No --> A;
    C --> D(<b>ULP Coprocessor Takes Over</b><br>Runs program from RTC Memory);
    subgraph "ULP Program Loop (μA current)"
        direction TB
        D --> E{Periodically<br>Check Sensor/GPIO};
        E --> F{Wake-up Condition Met?<br>e.g., moisture < threshold};
        F -- No --> E;
    end
    F -- Yes --> G[ULP issues wake-up signal];
    G --> H(<b>Main CPUs Power On</b><br>Restores state from RTC Memory);
    H --> I["Execute Main Task<br>e.g., Send Wi-Fi Alert"];
    I --> A;
    %% Styling %%
    classDef start fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef endo fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;
    classDef decision fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef process fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef check fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    class A,I start;
    class H endo;
    class B,F decision;
    class C,D,G process;
    class E check;
Its purpose is to periodically perform simple tasks, such as polling a sensor or checking a GPIO state, and decide whether to wake the main CPUs. For example, you could program the ULP to read an analog moisture sensor every 10 minutes. It wakes the power-hungry main system to activate Wi-Fi and send a notification only if the moisture level drops below a critical threshold. This enables long battery life in applications that require intermittent monitoring. The ULP code is written in a special assembly language and stored in RTC SLOW memory.
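To see why keeping the main CPUs asleep matters, it helps to work through the duty-cycle arithmetic. The sketch below is plain C that runs on a desktop compiler; the function names are hypothetical, and every current figure you pass in is an input you must take from the datasheet or your own measurements, not a value stated in this chapter.

```c
#include <assert.h>

// Average current in microamperes for a device that repeats a cycle of
// period_s seconds: awake for active_s seconds drawing active_ua,
// and asleep for the remainder drawing sleep_ua.
double average_current_ua(double sleep_ua, double active_ua,
                          double active_s, double period_s)
{
    assert(period_s > 0.0 && active_s >= 0.0 && active_s <= period_s);
    double duty = active_s / period_s;          // fraction of time awake
    return active_ua * duty + sleep_ua * (1.0 - duty);
}

// Rough battery life in hours for a capacity given in mAh.
// Ignores self-discharge, regulator losses, and temperature effects.
double battery_life_hours(double capacity_mah, double avg_current_ua)
{
    assert(avg_current_ua > 0.0);
    return capacity_mah * 1000.0 / avg_current_ua;   // mAh -> uAh, then divide
}
```

With placeholder inputs of 100 µA asleep and 100 mA awake for 6 seconds out of every 600, average_current_ua returns about 1099 µA. The awake time dominates the average even at a 1% duty cycle, which is why letting the ULP decide when a wake-up is really needed pays off.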
Practical Examples
Let’s demonstrate two architectural concepts: using both cores for parallel processing and accelerating code with IRAM.
Example 1: Dual-Core Processing in Action
This example runs two computationally intensive tasks and pins them to separate cores, demonstrating true parallelism.
1. Code: Create a new project in VS Code and replace the contents of main.c with the following.
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "esp_timer.h"

static const char *TAG = "DUAL_CORE_TEST";

// Handle of the main task, so the workers can signal completion
static TaskHandle_t s_main_task;

// A computationally intensive dummy task
static void intensive_task(void *pvParameters)
{
    int core_id = xPortGetCoreID();
    uint32_t task_num = (uint32_t)(uintptr_t)pvParameters;
    // volatile keeps the compiler from optimizing the loop away
    volatile uint64_t counter = 0;

    ESP_LOGI(TAG, "Task %" PRIu32 " started on Core %d", task_num, core_id);
    for (int i = 0; i < 200000000; i++) {
        counter++;
    }
    ESP_LOGI(TAG, "Task %" PRIu32 " finished on Core %d", task_num, core_id);

    // Tell app_main that this task is done, then delete ourselves
    xTaskNotifyGive(s_main_task);
    vTaskDelete(NULL);
}

void app_main(void)
{
    ESP_LOGI(TAG, "Starting dual-core performance test.");
    s_main_task = xTaskGetCurrentTaskHandle();

    // Get the start time
    int64_t start_time = esp_timer_get_time();

    // Create the Core 1 task first. app_main runs on Core 0 at a lower
    // priority, so a priority-5 task on Core 0 preempts it immediately;
    // creating that task first would delay the creation of the other one.
    xTaskCreatePinnedToCore(
        intensive_task,   // Task function
        "IntensiveTask2", // Task name
        4096,             // Stack size
        (void *)2,        // Task parameter (task number)
        5,                // Priority
        NULL,             // Task handle
        1                 // Core ID (APP_CPU)
    );

    // Create and pin Task 1 to Core 0
    xTaskCreatePinnedToCore(
        intensive_task,   // Task function
        "IntensiveTask1", // Task name
        4096,             // Stack size
        (void *)1,        // Task parameter (task number)
        5,                // Priority
        NULL,             // Task handle
        0                 // Core ID (PRO_CPU)
    );

    // Wait for one notification from each task
    ulTaskNotifyTake(pdFALSE, portMAX_DELAY);
    ulTaskNotifyTake(pdFALSE, portMAX_DELAY);

    int64_t elapsed_us = esp_timer_get_time() - start_time;
    ESP_LOGI(TAG, "Both tasks finished after %lld ms", (long long)(elapsed_us / 1000));
    // Try pinning both tasks to the same core and compare the elapsed time.
}
2. Build and Flash:
- Connect your ESP32 board.
- Open the VS Code command palette (Ctrl+Shift+P).
- Select “ESP-IDF: Build, Flash and Monitor”.
3. Observe:
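You will see logs showing that both tasks start almost simultaneously, each on its designated core; the order of the two “started” lines may vary. They will also finish at roughly the same time, because they were executing in parallel. If the loop runs longer than the task watchdog timeout (5 seconds by default), the monitor also prints task watchdog warnings, because the busy tasks keep the idle tasks from running. That is expected in this test.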
I (301) DUAL_CORE_TEST: Starting dual-core performance test.
I (311) DUAL_CORE_TEST: Task 1 started on Core 0
I (311) DUAL_CORE_TEST: Task 2 started on Core 1
...
I (4151) DUAL_CORE_TEST: Task 1 finished on Core 0
I (4161) DUAL_CORE_TEST: Task 2 finished on Core 1
Experiment: Modify the code to run both tasks on Core 1. Because the two tasks have the same priority, FreeRTOS time-slices between them: both start right away, but each takes roughly twice as long to finish, so the total execution time roughly doubles.
Example 2: Speed-Critical Code in IRAM
This example shows the performance benefit of placing a function in IRAM.
1. Code:
#include <stdio.h>
#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_attr.h"
#include "esp_log.h"
#include "esp_timer.h"

static const char *TAG = "IRAM_TEST";

#define ITERATIONS 10000000

// This function will be placed in IRAM.
// noinline stops the compiler from folding the call into the loop,
// which would hide any difference between the two placements.
void IRAM_ATTR __attribute__((noinline)) fast_function(void)
{
    __asm__ __volatile__ ("nop");
}

// This function will be executed from flash (through the cache)
void __attribute__((noinline)) slow_function(void)
{
    __asm__ __volatile__ ("nop");
}

void app_main(void)
{
    volatile int i;
    int64_t start_time, end_time;

    ESP_LOGI(TAG, "Starting performance comparison...");

    // --- Test 1: Function in Flash ---
    start_time = esp_timer_get_time();
    for (i = 0; i < ITERATIONS; i++) {
        slow_function();
    }
    end_time = esp_timer_get_time();
    ESP_LOGI(TAG, "Function in Flash took %lld microseconds.", (long long)(end_time - start_time));

    // --- Test 2: Function in IRAM ---
    start_time = esp_timer_get_time();
    for (i = 0; i < ITERATIONS; i++) {
        fast_function();
    }
    end_time = esp_timer_get_time();
    ESP_LOGI(TAG, "Function in IRAM took %lld microseconds.", (long long)(end_time - start_time));
}
2. Build and Flash: Use the standard “ESP-IDF: Build, Flash and Monitor” command.
3. Observe:
The monitor output shows how long each loop took. The function marked with IRAM_ATTR is fetched directly from the high-speed internal RAM, while the other function is fetched from the slower external flash memory through the cache. The exact numbers will vary with CPU frequency, compiler optimization level, and cache state: a small function that stays resident in the cache can run almost as fast as its IRAM copy, and the gap widens when cache misses occur.
I (301) IRAM_TEST: Starting performance comparison...
I (1021) IRAM_TEST: Function in Flash took 720123 microseconds.
I (1151) IRAM_TEST: Function in IRAM took 130056 microseconds.
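Raw microsecond totals are easier to compare once they are converted to a per-call cost. The helper below is a small sketch that runs on any desktop compiler; the function names are hypothetical, and cpu_mhz is an assumption you must supply from your own project configuration, since the log does not state the clock frequency.

```c
#include <assert.h>
#include <stdint.h>

// Average time per loop iteration in nanoseconds.
// Includes the loop overhead, not only the function call itself.
double ns_per_call(int64_t elapsed_us, int64_t iterations)
{
    assert(iterations > 0);
    return (double)elapsed_us * 1000.0 / (double)iterations;
}

// Approximate CPU cycles per loop iteration at the given clock (in MHz).
// One microsecond at f MHz corresponds to f cycles.
double cycles_per_call(int64_t elapsed_us, int64_t iterations, double cpu_mhz)
{
    assert(iterations > 0 && cpu_mhz > 0.0);
    return (double)elapsed_us * cpu_mhz / (double)iterations;
}
```

Applied to the sample log above, ns_per_call(720123, 10000000) gives roughly 72 ns per iteration for the flash version and ns_per_call(130056, 10000000) roughly 13 ns for the IRAM version. Converting to cycles makes it easier to judge how much of that time is loop overhead rather than instruction fetch.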
Variant Notes
The original ESP32 set the stage, but newer variants have evolved its architecture significantly.
Feature / Variant | ESP32 (Original) | ESP32-S2 | ESP32-S3 | ESP32-C3 | ESP32-C6 / H2 |
---|---|---|---|---|---|
CPU Core(s) | Dual-Core Xtensa LX6 | Single-Core Xtensa LX7 | Dual-Core Xtensa LX7 | Single-Core RISC-V | Single-Core RISC-V |
CPU Arch | Xtensa | Xtensa | Xtensa | RISC-V | RISC-V |
Bluetooth | Classic + BLE 4.2 | No | BLE 5.0 | BLE 5.0 | BLE 5.0 |
Wi-Fi | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 6 (C6 only; H2 has no Wi-Fi) |
802.15.4 (Thread/Zigbee) | No | No | No | No | Yes |
USB | No (UART bridge) | Yes (OTG) | Yes (OTG) | USB Serial/JTAG only (no OTG) | USB Serial/JTAG only (no OTG) |
AI Acceleration | No | No | Yes (Vector Instructions) | No | No |
Primary Focus | General Purpose IoT | Low-Power, HMI | AIoT, HMI | Cost-Effective IoT | Next-Gen Connectivity |
Common Mistakes & Troubleshooting Tips
Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
---|---|---|
Forgetting Core Affinity | Unpredictable behavior, race conditions, crashes when tasks share resources. | Be explicit. Use xTaskCreatePinnedToCore() for tasks needing deterministic placement. Use the debugger or logs (xPortGetCoreID()) to verify which core a task runs on. Pinning does not replace synchronization: protect shared data with mutexes or queues even when the tasks run on the same core. |
Blocking Calls on Core 0 | Wi-Fi/Bluetooth disconnects, hangs, or has high latency. System seems unstable under network load. | Keep Core 0 (PRO_CPU) free for network stacks. Run application logic on Core 1 (APP_CPU) by default. Offload heavy work from network event handlers to a dedicated app task. |
Inefficient Memory Usage | Build fails with IRAM section overflow errors. Slower than expected performance. | Profile first. Use IRAM_ATTR only for provably critical functions like ISRs or high-frequency loops. Use DRAM_ATTR for data that needs to be in RAM but isn’t executable code. |
Misunderstanding ULP Limitations | ULP code fails to compile or run. Trying to use standard C functions or complex logic. | Think of the ULP as a simple gatekeeper. Use its special assembly language for basic checks (e.g., read ADC, check GPIO) to decide when to wake the main CPUs. All complex logic belongs on the main system. |
Exercises
- ULP Wake-up: Read the ESP-IDF documentation for the ULP coprocessor. Write a simple application where the ESP32 goes into deep sleep. Program the ULP (using the provided assembly macros) to periodically check the state of GPIO0. If GPIO0 is pulled to ground, the ULP should wake the main CPUs, which will then print a “Woken up by ULP!” message to the console.
- Core Performance Analysis: Expand on the first practical example. Create three tasks: Task A (calculates prime numbers), Task B (computes a Fast Fourier Transform on a sample array), and Task C (waits on a queue for results). Run an experiment with the following configurations and measure the total time taken to complete both computations:
- A and B pinned to Core 1 (APP_CPU).
- A pinned to Core 0, B pinned to Core 1.
- A pinned to Core 1, B pinned to Core 0.
Log the results and explain why the dual-core configuration is faster and whether there is any performance difference between the two asymmetric configurations.
Summary
- The original ESP32 SoC is built around a powerful dual-core Xtensa LX6 architecture, enabling true parallel processing for application and network stacks.
- It contains 520 KB of SRAM, logically divided into instruction RAM (IRAM) for fast code execution and data RAM (DRAM) for variables.
- A bus matrix manages access between CPUs, memory, and peripherals, allowing for a high degree of concurrent operation.
- A unique Ultra Low-Power (ULP) coprocessor allows for sensor monitoring and simple tasks while the main CPUs are in deep sleep, enabling excellent battery life.
- The ESP32 includes a rich set of peripherals, including Wi-Fi, dual-mode Bluetooth, and hardware accelerators for security operations.
- While foundational, the ESP32’s architecture differs from newer variants, which may offer RISC-V cores, Wi-Fi 6, 802.15.4 radio support, or native USB OTG.
Further Reading
- ESP32 Technical Reference Manual: This is the authoritative source for every detail of the ESP32’s hardware. An essential document for any advanced developer.
- ESP-IDF Programming Guide – Application Level Tracing: A library for analyzing application performance, useful for identifying bottlenecks.
- ESP-IDF Programming Guide – ULP Coprocessor: Detailed guide on how to program the ULP coprocessor.