Chapter 251: ESP32 Original Series Architecture Details

Chapter Objectives

By the end of this chapter, you will be able to:

  • Describe the dual-core Xtensa LX6 CPU architecture of the original ESP32.
  • Understand the ESP32’s memory map, including IRAM, DRAM, and RTC memory.
  • Explain the function of the system’s bus matrix and its impact on performance.
  • Detail the role and operation of the Ultra Low-Power (ULP) coprocessor.
  • Identify the key peripherals and hardware accelerators present in the ESP32.
  • Compare the architecture of the original ESP32 with newer variants in the Espressif ecosystem.

Introduction

Welcome to the deep dive into the ESP32 family! Before we can appreciate the unique features of the newer S, C, and H-series variants, we must first build a solid foundation by understanding the chip that started a revolution in the IoT space: the original ESP32. Released in 2016, its combination of a powerful dual-core processor, extensive memory, rich peripheral set, and integrated Wi-Fi and Bluetooth connectivity was unprecedented for its price point.

This chapter will peel back the layers of abstraction and look directly at the silicon. Understanding the hardware architecture is not merely an academic exercise; it is a prerequisite for writing highly optimized, power-efficient, and robust applications. When you know how the CPUs access memory, why certain code runs faster from IRAM, and how the ULP coprocessor works, you unlock the full potential of the hardware.

Theory

The architecture of the ESP32 is a sophisticated System-on-Chip (SoC) designed for high performance and connectivity. Let’s break down its core components.

1. CPU and Memory

At the heart of the ESP32 are two Tensilica Xtensa LX6 microprocessor cores, named PRO_CPU (Protocol CPU, Core 0) and APP_CPU (Application CPU, Core 1).

  • Dual-Core Architecture: The two cores are largely identical and can operate at clock speeds up to 240 MHz. This symmetric multiprocessing (SMP) capability, managed by FreeRTOS, allows for true parallel execution. Typically, ESP-IDF assigns Core 0 to handle the Wi-Fi and Bluetooth stacks (“protocol”), while Core 1 is free for user application code. This division prevents demanding application code from interfering with the timing-sensitive network stacks.
  • Floating-Point Unit (FPU): Each core includes a hardware FPU, enabling efficient single-precision floating-point computations without software emulation.
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
graph TD
    subgraph "ESP32 On-Chip Memory (520 KB SRAM)"
        direction LR
        
        A["<b>CPU Core</b><br>(PRO_CPU or APP_CPU)"]
        
        subgraph Main SRAM
            direction TB
            IRAM(Instruction RAM<br><i>For executable code</i><br>Fastest Execution)
            DRAM(Data RAM<br><i>For variables, stacks, heaps</i><br>General Purpose Data Storage)
        end

        subgraph RTC Power Domain
             direction TB
             RTCMEM(RTC Memory<br><b>8KB FAST + 8KB SLOW</b><br><i>Retained in Deep Sleep</i><br>Used by ULP & for state saving)
        end
    end

    A -- Instruction Bus --> IRAM;
    A -- Data Bus --> DRAM;
    A -- Can Access --> RTCMEM;

    ULP["<b>ULP Coprocessor</b><br><i>(Runs in Deep Sleep)</i>"] -- Accesses --> RTCMEM;

    %% Styling %%
    classDef cpu fill:#FEF3C7,stroke:#D97706,stroke-width:2px,color:#92400E;
    classDef sram fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef rtc fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef ulp fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;

    class A cpu;
    class IRAM,DRAM sram;
    class RTCMEM,ULP rtc;
  • Memory: The ESP32 features 520 KB of on-chip SRAM. This memory is not one contiguous block; it’s a collection of smaller banks with different access properties.
    • DRAM (Data RAM): This memory is connected to the CPU’s data bus and is used for storing data, such as variables and the task stacks.
    • IRAM (Instruction RAM): This memory is connected to the CPU’s instruction bus. Placing code in IRAM allows for significantly faster execution compared to running it from external flash memory, as it bypasses the potential bottleneck of the flash cache.
    • RTC Memory: A small amount of RAM (8 KB FAST, 8 KB SLOW) is located in the Real-Time Clock (RTC) power domain. This memory retains its contents during deep sleep, allowing the ULP coprocessor to operate or for the main CPUs to store state before sleeping.
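In ESP-IDF, these memory regions map directly to placement attributes declared in esp_attr.h. A minimal sketch of how each attribute is used:

```c
#include <stdint.h>
#include "esp_attr.h"   // IRAM_ATTR, DRAM_ATTR, RTC_DATA_ATTR

// Placed in IRAM: fetched over the instruction bus, so it never stalls
// on the flash cache. Typical for ISRs and hot inner loops.
void IRAM_ATTR gpio_isr_handler(void *arg)
{
    // keep ISR work minimal
}

// Forced into internal DRAM (internal RAM is also DMA-capable on the
// original ESP32, unlike external PSRAM).
DRAM_ATTR static uint8_t dma_buffer[256];

// Placed in RTC memory: retains its value across deep sleep cycles.
RTC_DATA_ATTR static uint32_t boot_count = 0;
```

A variable such as boot_count can be incremented on every wake-up to count deep-sleep cycles, since ordinary DRAM contents are lost when the main power domain shuts down.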

2. System Bus Architecture

The CPUs, memory, and peripherals are not directly connected. They communicate through a complex bus matrix. This matrix acts like an intelligent telephone exchange, routing requests from “masters” (like a CPU or a DMA controller) to “slaves” (like a block of SRAM or a peripheral’s registers).

%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
flowchart TD
    subgraph "Masters (Initiate Access)"
        direction LR
        M1(<b>PRO_CPU</b><br>Core 0)
        M2(<b>APP_CPU</b><br>Core 1)
        M3(DMA<br>Controller)
    end

    subgraph "Slaves (Respond to Access)"
        direction LR
        S1(Internal RAM<br><i>DRAM / IRAM</i>)
        S2(Internal ROM)
        S3(Flash Controller)
        S4(Peripherals<br><i>UART, SPI, etc.</i>)
        S5(RTC Block)
    end

    BusMatrix{<B>Bus Matrix</B><br>Intelligent Arbiter}

    M1 -- Request --> BusMatrix
    M2 -- Request --> BusMatrix
    M3 -- Request --> BusMatrix

    BusMatrix -- "Grant & Route Access" --> S1
    BusMatrix --> S2
    BusMatrix --> S3
    BusMatrix --> S4
    BusMatrix --> S5
    
    %% Styling %%
    classDef master fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef slave fill:#EDE9FE,stroke:#5B21B6,stroke-width:1px,color:#5B21B6;
    classDef matrix fill:#FEF3C7,stroke:#D97706,stroke-width:2px,color:#92400E;
    
    class M1,M2,M3 master;
    class S1,S2,S3,S4,S5 slave;
    class BusMatrix matrix;

The ESP32’s matrix allows for simultaneous access as long as the master and slave paths do not conflict. For example, APP_CPU can execute a mathematical computation using its internal registers while PRO_CPU fetches data from the Wi-Fi MAC peripheral via the bus. However, if both CPUs attempt to write to the same block of SRAM at the exact same time, one will be momentarily stalled by the bus arbiter. Understanding this is key to diagnosing advanced performance issues.


3. Peripherals and Hardware Accelerators

The original ESP32 is packed with a wide array of peripherals. These are not just connected to the GPIO pins; they are distinct hardware blocks within the SoC.

  • Connectivity: Wi-Fi (802.11 b/g/n) and dual-mode Bluetooth v4.2 (Classic BR/EDR and BLE).
  • Standard Interfaces: UART, SPI, I2C, I2S, RMT (Remote Control), SD/MMC Host.
  • Analog: Two 12-bit SAR ADCs, two 8-bit DACs, a Hall effect sensor, and 10 capacitive touch GPIOs.
  • Security: Hardware acceleration for AES, SHA, RSA, and a hardware Random Number Generator (RNG). These are crucial for implementing features like Flash Encryption and Secure Boot efficiently.
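The hardware RNG is the easiest of these blocks to use directly. A short sketch using the ESP-IDF random API (the header is esp_random.h in IDF 5.x; older versions expose the same functions via esp_system.h):

```c
#include <stdint.h>
#include "esp_random.h"   // esp_random(), esp_fill_random()

void generate_key_material(void)
{
    // Single 32-bit value from the hardware RNG. Entropy quality is
    // best while Wi-Fi or Bluetooth is enabled, because RF noise
    // feeds the generator.
    uint32_t nonce = esp_random();
    (void)nonce;

    // Fill a buffer, e.g. as key material for an AES-256 operation.
    uint8_t key[32];
    esp_fill_random(key, sizeof(key));
}
```

The AES, SHA, and RSA accelerators are normally reached indirectly through mbedTLS, which ESP-IDF patches to dispatch to hardware automatically.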

4. The Ultra Low-Power (ULP) Coprocessor

One of the most innovative features of the ESP32 is its ULP coprocessor. This is a tiny, simple processor that can execute a program while the main dual-core CPUs are in a deep sleep state, consuming only microamperes of current.

%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Open Sans, sans-serif'}}}%%
graph TD
    A(Start: Main CPUs Running) --> B{Enter Deep Sleep?};
    B -- Yes --> C[<b>Main CPUs Power Down</b><br>State saved to RTC Memory];
    B -- No --> A;

    C --> D(<b>ULP Coprocessor Takes Over</b><br>Runs program from RTC Memory);
    
    subgraph "ULP Program Loop (μA current)"
      direction TB
      D --> E{Periodically<br>Check Sensor/GPIO};
      E --> F{Wake-up Condition Met?<br>e.g., moisture < threshold};
      F -- No --> E;
    end
    
    F -- Yes --> G[ULP issues wake-up signal];
    G --> H(<b>Main CPUs Power On</b><br>Restores state from RTC Memory);
    H --> I["Execute Main Task<br>e.g., Send Wi-Fi Alert"];
    I --> A;

    %% Styling %%
    classDef start fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef endo fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;
    classDef decision fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef process fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef check fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;

    class A,I start;
    class H endo;
    class B,F decision;
    class C,D,G process;
    class E check;

Its purpose is to periodically perform simple tasks—such as polling a sensor or checking a GPIO state—and decide whether to wake the main CPUs. For example, you could program the ULP to read an analog moisture sensor every 10 minutes. It will only wake the power-hungry main system to activate Wi-Fi and send a notification if the moisture level drops below a critical threshold. This enables long battery life in applications that require intermittent monitoring. The ULP code is written in a special assembly language and stored in the RTC memory.
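On the original ESP32 the ULP program can be assembled at runtime with the legacy ULP-FSM macro API. The sketch below assumes that API (header esp32/ulp.h) and that the ULP coprocessor is enabled in menuconfig; a real program would read a sensor and branch on the result before deciding to wake:

```c
#include "esp_err.h"
#include "esp_sleep.h"
#include "esp32/ulp.h"    // legacy ULP-FSM macro API

void start_ulp_monitor(void)
{
    // Trivial ULP program: signal a wake-up, then halt until the ULP
    // timer restarts it.
    const ulp_insn_t program[] = {
        I_WAKE(),          // request wake-up of the main CPUs
        I_HALT(),          // stop until the next ULP timer period
    };

    size_t load_addr = 0;
    size_t size = sizeof(program) / sizeof(ulp_insn_t);
    ESP_ERROR_CHECK(ulp_process_macros_and_load(load_addr, program, &size));

    // Run the ULP program every 10 seconds (period is in microseconds).
    ulp_set_wakeup_period(0, 10 * 1000 * 1000);

    ESP_ERROR_CHECK(esp_sleep_enable_ulp_wakeup());
    ESP_ERROR_CHECK(ulp_run(load_addr));
    esp_deep_sleep_start();   // main CPUs power down here
}
```

After wake-up, app_main runs again from the top; esp_sleep_get_wakeup_cause() can confirm that the ULP was the wake source.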

Practical Examples

Let’s demonstrate two architectural concepts: using both cores for parallel processing and accelerating code with IRAM.

Example 1: Dual-Core Processing in Action

This example runs two computationally intensive tasks and pins them to separate cores, demonstrating true parallelism.

1. Code: Create a new project in VS Code and replace the contents of main.c with the following.

C
#include <stdio.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "esp_timer.h"

static const char *TAG = "DUAL_CORE_TEST";

// A computationally intensive dummy task
void intensive_task(void *pvParameters)
{
    int core_id = xPortGetCoreID();
    uint32_t task_num = (uint32_t)(uintptr_t)pvParameters;
    volatile uint64_t counter = 0;

    ESP_LOGI(TAG, "Task %" PRIu32 " started on Core %d", task_num, core_id);

    // Busy loop. This starves the idle task on the core, so the task
    // watchdog may print warnings while it runs; they are harmless here.
    for (int i = 0; i < 200000000; i++) {
        counter++;
    }

    ESP_LOGI(TAG, "Task %" PRIu32 " finished on Core %d", task_num, core_id);
    vTaskDelete(NULL);
}

void app_main(void)
{
    ESP_LOGI(TAG, "Starting dual-core performance test.");

    // Get the start time
    int64_t start_time = esp_timer_get_time();

    // Create and pin Task 1 to Core 0
    xTaskCreatePinnedToCore(
        intensive_task,   // Task function
        "IntensiveTask1", // Task name
        4096,             // Stack size
        (void *)1,        // Task parameter (task number)
        5,                // Priority
        NULL,             // Task handle
        0                 // Core ID (PRO_CPU)
    );

    // Create and pin Task 2 to Core 1
    xTaskCreatePinnedToCore(
        intensive_task,   // Task function
        "IntensiveTask2", // Task name
        4096,             // Stack size
        (void *)2,        // Task parameter (task number)
        5,                // Priority
        NULL,             // Task handle
        1                 // Core ID (APP_CPU)
    );

    // Note: In a real app, we'd wait for tasks to finish.
    // Here, we just log the start and observe the finish logs.
    // We add a delay to let the tasks run.
    vTaskDelay(pdMS_TO_TICKS(10000));

    // The finish logs from the tasks will show they run in parallel.
    // Try changing both tasks to run on the same core and observe the time difference.
}

2. Build and Flash:

  1. Connect your ESP32 board.
  2. Open the VS Code command palette (Ctrl+Shift+P).
  3. Select “ESP-IDF: Build, Flash and Monitor”.

3. Observe:

You will see logs showing that both tasks start almost simultaneously, each on its designated core. They will also finish at roughly the same time, because they were executing in parallel.

Plaintext
I (301) DUAL_CORE_TEST: Starting dual-core performance test.
I (311) DUAL_CORE_TEST: Task 1 started on Core 0
I (311) DUAL_CORE_TEST: Task 2 started on Core 1
...
I (4151) DUAL_CORE_TEST: Task 1 finished on Core 0
I (4161) DUAL_CORE_TEST: Task 2 finished on Core 1

Experiment: Modify the code to run both tasks on Core 1. Because FreeRTOS time-slices tasks of equal priority, the two tasks will now interleave on the single core, each progressing at roughly half speed, and the total execution time will approximately double.

Example 2: Speed-Critical Code in IRAM

This example shows the performance benefit of placing a function in IRAM.

1. Code:

C
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_attr.h"
#include "esp_log.h"
#include "esp_timer.h"
#include "soc/soc.h"

static const char *TAG = "IRAM_TEST";
#define ITERATIONS 10000000

// This function will be placed in IRAM. 'noinline' keeps the call in
// place so the benchmark measures a real instruction fetch.
void IRAM_ATTR __attribute__((noinline)) fast_function(void) {
    __asm__ __volatile__ ("nop");
}

// This function will be executed from flash (through the cache)
void __attribute__((noinline)) slow_function(void) {
    __asm__ __volatile__ ("nop");
}


void app_main(void)
{
    volatile int i;
    int64_t start_time, end_time;

    ESP_LOGI(TAG, "Starting performance comparison...");

    // --- Test 1: Function in Flash ---
    start_time = esp_timer_get_time();
    for (i = 0; i < ITERATIONS; i++) {
        slow_function();
    }
    end_time = esp_timer_get_time();
    ESP_LOGI(TAG, "Function in Flash took %lld microseconds.", (end_time - start_time));


    // --- Test 2: Function in IRAM ---
    start_time = esp_timer_get_time();
    for (i = 0; i < ITERATIONS; i++) {
        fast_function();
    }
    end_time = esp_timer_get_time();
    ESP_LOGI(TAG, "Function in IRAM took %lld microseconds.", (end_time - start_time));
}

2. Build and Flash: Use the standard “ESP-IDF: Build, Flash and Monitor” command.

3. Observe:

The monitor output will show that the function marked with IRAM_ATTR executes faster. The exact numbers vary: flash-resident code runs through the cache, so a small, frequently called function may be served at near-IRAM speed, and the gap widens whenever the code is evicted from the cache or the cache is unavailable (for example, during flash write operations). This deterministic timing, independent of cache state, is the main reason ISRs and other latency-critical code are placed in IRAM.

Plaintext
I (301) IRAM_TEST: Starting performance comparison...
I (1021) IRAM_TEST: Function in Flash took 720123 microseconds.
I (1151) IRAM_TEST: Function in IRAM took 130056 microseconds.

Variant Notes

The original ESP32 set the stage, but newer variants have evolved its architecture significantly.

| Feature / Variant | ESP32 (Original) | ESP32-S2 | ESP32-S3 | ESP32-C3 | ESP32-C6 / H2 |
|---|---|---|---|---|---|
| CPU Core(s) | Dual-core Xtensa LX6 | Single-core Xtensa LX7 | Dual-core Xtensa LX7 | Single-core RISC-V | Single-core RISC-V |
| CPU Arch | Xtensa | Xtensa | Xtensa | RISC-V | RISC-V |
| Bluetooth | Classic + BLE 4.2 | No | BLE 5.0 | BLE 5.0 | BLE 5.0 |
| Wi-Fi | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 4 | Wi-Fi 6 (C6); none (H2) |
| 802.15.4 (Thread/Zigbee) | No | No | No | No | Yes |
| USB | No (UART bridge) | Yes (OTG) | Yes (OTG) | Serial/JTAG only | Serial/JTAG only |
| AI Acceleration | No | No | Yes (vector instructions) | No | No |
| Primary Focus | General-purpose IoT | Low-power, HMI | AIoT, HMI | Cost-effective IoT | Next-gen connectivity |

Common Mistakes & Troubleshooting Tips

| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
|---|---|---|
| Forgetting core affinity | Unpredictable behavior, race conditions, crashes when tasks share resources. | Be explicit. Use xTaskCreatePinnedToCore() for tasks needing deterministic placement. Use the debugger or logs (xPortGetCoreID()) to verify which core a task runs on. |
| Blocking calls on Core 0 | Wi-Fi/Bluetooth disconnects, hangs, or has high latency. System seems unstable under network load. | Keep Core 0 (PRO_CPU) free for network stacks. Run application logic on Core 1 (APP_CPU) by default. Offload heavy work from network event handlers to a dedicated app task. |
| Inefficient memory usage | Build fails with IRAM section overflow errors. Slower than expected performance. | Profile first. Use IRAM_ATTR only for provably critical functions like ISRs or high-frequency loops. Use DRAM_ATTR for data that needs to be in RAM but isn't executable code. |
| Misunderstanding ULP limitations | ULP code fails to compile or run. Trying to use standard C functions or complex logic. | Think of the ULP as a simple gatekeeper. Use its special assembly language for basic checks (e.g., read ADC, check GPIO) to decide when to wake the main CPUs. All complex logic belongs on the main system. |

Exercises

  1. ULP Wake-up: Read the ESP-IDF documentation for the ULP coprocessor. Write a simple application where the ESP32 goes into deep sleep. Program the ULP (using the provided assembly macros) to periodically check the state of GPIO0. If GPIO0 is pulled to ground, the ULP should wake the main CPUs, which will then print a “Woken up by ULP!” message to the console.
  2. Core Performance Analysis: Expand on the first practical example. Create three tasks: Task A (calculates prime numbers), Task B (computes a Fast Fourier Transform on a sample array), and Task C (waits on a queue for results). Run an experiment with the following configurations and measure the total time taken to complete both computations:
    • A and B pinned to Core 1 (APP_CPU).
    • A pinned to Core 0, B pinned to Core 1.
    • A pinned to Core 1, B pinned to Core 0.
  Log the results and explain why the dual-core configuration is faster, and whether there is any performance difference between the two asymmetric configurations.

Summary

  • The original ESP32 SoC is built around a powerful dual-core Xtensa LX6 architecture, enabling true parallel processing for application and network stacks.
  • It contains 520 KB of SRAM, logically divided into instruction RAM (IRAM) for fast code execution and data RAM (DRAM) for variables.
  • A bus matrix manages access between the CPUs, memory, and peripherals, allowing a high degree of concurrent operation.
  • A unique Ultra Low-Power (ULP) coprocessor allows for sensor monitoring and simple tasks while the main CPUs are in deep sleep, enabling excellent battery life.
  • The ESP32 includes a rich set of peripherals, including Wi-Fi, dual-mode Bluetooth, and hardware accelerators for security operations.
  • While foundational, the ESP32’s architecture differs from newer variants, which may offer RISC-V cores, Wi-Fi 6, 802.15.4 radio support, or native USB OTG.
