Chapter 250: Custom Exception and Crash Handlers

Chapter Objectives

By the end of this chapter, you will be able to:

Understand what CPU exceptions are and why they cause system crashes.
Describe the behavior of the default ESP-IDF panic handler.
Implement a custom handler function that executes when a crash occurs.
Extract critical information, like the program counter and cause code, from an exception frame.
Save crash data to persistent storage for post-mortem analysis.
Differentiate between exception handling on Xtensa and RISC-V based ESP32 variants.

Introduction

In an ideal world, our firmware would be perfect and never crash. In the real world, however, bugs happen. Unexpected inputs, hardware faults, or subtle race conditions can lead to unrecoverable errors that halt the system. When this occurs, the ESP-IDF’s default behavior is to print a “Guru Meditation Error” to the console, providing a wealth of diagnostic information for a developer connected via serial port.

But what happens when a device is deployed in the field, far away from a developer’s computer? A cryptic crash loop is of no use to the end user and provides no feedback to the engineering team. This is where custom exception handlers become indispensable. By registering our own handler, we can intercept the crash process to perform critical last rites: logging the cause to persistent memory for later retrieval, putting the system into a safe state, or even displaying a user-friendly error message before rebooting. This chapter will teach you how to tame system failures and build more resilient, field-ready devices.

Theory

1. What is a CPU Exception?

A CPU exception is a condition that disrupts the normal, sequential execution of instructions. It’s the hardware’s way of saying, “I can’t proceed.” Common causes include:

Illegal Instruction: The CPU tries to execute an invalid or privileged opcode.
Memory Access Violation: The program attempts to read from or write to a memory address it doesn’t have permission to access (e.g., writing to read-only memory or dereferencing a null pointer).
Divide by Zero: An integer division whose divisor is zero. Xtensa cores raise an exception for this; the RISC-V integer divide instructions do not trap and return a defined result instead.

When such an event occurs, the CPU automatically halts the current program flow, saves the state of its internal registers, most importantly the Program Counter (PC) and the reason for the exception, and jumps to a special, pre-defined address to execute an exception handler.

2. The ESP-IDF Panic Handler

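ESP-IDF provides a default exception handler, often called the panic handler. Its behavior is designed for developers during the debugging phase. When a crash occurs, it prints the following to the serial console:

A “Guru Meditation Error” line that names the crashing core and the exception cause.
A register dump for that core, including the PC.
A backtrace of the call stack that led to the fault.

After printing this information, the default handler, as described in this chapter, disables interrupts on the crashing core and enters an infinite loop. This preserves the state for a developer to inspect with a JTAG debugger but will eventually cause the watchdog timer (WDT) to fire and reboot the device. The post-crash behavior is configurable in menuconfig, so a given project may instead reboot immediately after printing.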

graph TD
    subgraph "ESP-IDF Default Panic Handler Output"
        direction TB
        A["<b>Guru Meditation Error: Core 0 panic'd (IllegalInstruction)</b>"]
        B["<b>Core 0 Register Dump</b><br>PC: 0x400d1234<br>PS: 0x00060030<br>A0: 0x800d14b8 ... etc."]
        C["<b>Backtrace</b><br>0x400d1234:0x3ffb1f30<br>0x400d12ab:0x3ffb1f50<br>0x400d5555:0x3ffb1f70 ..."]
        D["<b>Rebooting...</b><br>(via Watchdog Timer)"]

        A -- "Leads to Register Snapshot" --> B
        B -- "Used to generate" --> C
        C -- "After printing, system hangs until" --> D
    end

    subgraph "Key Information for Developers"
        P["<b>Program Counter (PC)</b><br>Address of the instruction<br>that caused the crash."]
        S["<b>Call Stack (Backtrace)</b><br>Sequence of function calls<br>leading to the error."]
        R["<b>Exception Cause</b><br>The specific type of error,<br>e.g., 'IllegalInstruction'."]
    end

    B --> P
    C --> S
    A --> R

    classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;
    classDef keyInfo fill:#FFFBEB,stroke:#F59E0B,stroke-width:2px,color:#B45309;
    class A,B,C,D default;
    class P,S,R keyInfo;

    style A fill:#FEE2E2,stroke:#DC2626,stroke-width:1.5px,color:#991B1B
    style B fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1E40AF
    style C fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1E40AF
    style D fill:#E5E7EB,stroke:#4B5563,stroke-width:1.5px,color:#1F2937

3. Intercepting the Panic: Custom Handlers

The ESP-IDF allows us to register our own function to be called at the very beginning of the panic handling process. This gives us a golden opportunity to execute our own logic before the default handler takes over.

The key function is esp_panic_handler_register:

void esp_panic_handler_register(esp_panic_handler_t handler);

esp_panic_handler_t is a function pointer type defined as void (*)(void *). The void * argument passed to our custom handler points to the exception frame, the data structure where the CPU's registers were saved at the moment of the crash.

Our custom handler’s primary responsibility is to quickly and robustly save the most critical information from this frame into a non-volatile storage location (like a flash partition or NVS) before the system reboots.

Warning: Code within a panic handler must be extremely simple and robust. The system state is unstable; many FreeRTOS features (like mutexes) or complex drivers may not be safe to use. The handler code itself must not cause another exception.
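To make the callback mechanism concrete, here is a minimal host-side sketch in plain C. It does not call ESP-IDF at all: fake_frame_t, register_panic_hook, record_frame, and simulate_panic are hypothetical names invented for illustration. The only thing it is meant to show is how a void (*)(void *) callback receives an opaque frame pointer and copies out a couple of fields without doing anything else.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a saved register frame (not an ESP-IDF type). */
typedef struct {
    uint32_t pc;
    uint32_t cause;
} fake_frame_t;

/* Same shape as the handler type described above: void (*)(void *). */
typedef void (*panic_hook_t)(void *frame);

static panic_hook_t s_hook = NULL;
static uint32_t s_last_pc = 0;
static uint32_t s_last_cause = 0;

/* Remember the callback so the simulated panic path can call it first. */
void register_panic_hook(panic_hook_t hook)
{
    s_hook = hook;
}

/* Example hook: copy the two fields we care about and return immediately. */
void record_frame(void *frame)
{
    const fake_frame_t *f = (const fake_frame_t *)frame;
    s_last_pc = f->pc;
    s_last_cause = f->cause;
}

/* Simulated panic entry: run the custom hook, if any, before default handling.
 * Returns 1 if a hook ran and 0 otherwise. */
int simulate_panic(fake_frame_t *frame)
{
    if (s_hook == NULL) {
        return 0;
    }
    s_hook(frame);
    return 1;
}

uint32_t last_pc(void)    { return s_last_pc; }
uint32_t last_cause(void) { return s_last_cause; }
```

Calling simulate_panic before any hook is registered does nothing; after register_panic_hook(record_frame), the same call leaves the frame's pc and cause available through last_pc() and last_cause(). Note that the hook touches no locks, no heap, and no I/O, which is the property the warning above asks for.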

Practical Examples

Let’s build a system that catches a crash, saves the core details, and reports them on the next boot.

1. Forcing a Crash

graph TD
    subgraph "Device Runtime"
        A(Start: app_main) --> B{Normal Operation};
        B --> C(Crash Occurs!);
    end

    subgraph "Panic Sequence"
        C -- "HW Exception" --> D[Custom Panic Handler Executes];
        D --> E{Find Crash Partition};
        E -- "Success" --> F["Populate Log Struct<br><i>(PC, Cause, etc.)</i>"];
        F --> G[Write Log to Flash];
        G --> H["Default Panic Handler<br><i>(Prints Guru Meditation Error)</i>"];
        H --> I(System Reboots);
    end
    
    subgraph "Next Boot Sequence"
        J(Start: app_main) --> K{Check for Crash Log};
        K -- "Log Found & Valid" --> L["Read & Report Crash<br><i>(via ESP_LOGE)</i>"];
        L --> M[Erase Crash Log];
        M --> N(Continue Normal Boot);
        K -- "No Log Found" --> N;
    end

    classDef start-node fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef process-node fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef decision-node fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef check-node fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef success-node fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;

    class A,J start-node;
    class C check-node;
    class B,D,F,G,H,I,L,M,N process-node;
    class E,K decision-node;

First, we need a reliable way to trigger an exception. Dereferencing a null pointer is a classic and effective method.

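#include <stddef.h>
#include "esp_log.h"

// Writing through a null pointer faults on both architectures:
// Xtensa reports StoreProhibited, RISC-V reports a store access fault.
void cause_a_crash(void) {
    ESP_LOGI("CRASH", "About to write to a null pointer...");
    // volatile keeps the compiler from optimizing the store away.
    volatile int *bad_pointer = NULL;
    *bad_pointer = 42;
}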

2. Creating a Crash Log Partition

We need a dedicated place in flash to store the crash log.

Create a file named partitions_crash.csv in your project root:

# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, , 24K,
phy_init, data, phy, , 4K,
factory, app, factory, , 2M,
crash_log, data, 0x42, , 4K,

We’ve added a 4KB crash_log partition with a custom subtype 0x42.

In menuconfig, go to Partition Table ---> and select Custom partition table CSV, ensuring the filename matches.

Name	Type	SubType	Size
nvs	data	nvs	24K
phy_init	data	phy	4K
factory	app	factory	2M
crash_log	data	0x42	4K
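The 4K size is not arbitrary: flash on these chips is erased in 4 KB sectors, so a data partition smaller than one sector could not be erased on its own. As a rough illustration, the host-side sketch below converts the size strings used in the CSV into bytes and checks sector alignment. The function names parse_partition_size and is_sector_aligned are hypothetical helpers, not part of ESP-IDF or its partition tool.

```c
#include <stdint.h>
#include <stdlib.h>

#define FLASH_SECTOR_SIZE 4096u

/* Convert "4K", "2M", "0x1000" or "4096" to bytes.
 * Returns 0 for an empty or malformed value. */
uint32_t parse_partition_size(const char *text)
{
    char *end = NULL;
    unsigned long value = strtoul(text, &end, 0);

    if (end == text) {
        return 0;                       /* no digits at all */
    }
    if (*end == 'K' || *end == 'k') {
        value *= 1024ul;
        end++;
    } else if (*end == 'M' || *end == 'm') {
        value *= 1024ul * 1024ul;
        end++;
    }
    if (*end != '\0') {
        return 0;                       /* trailing junk after the suffix */
    }
    return (uint32_t)value;
}

/* A partition can be erased as a unit only if it spans whole sectors. */
int is_sector_aligned(uint32_t size_bytes)
{
    return size_bytes != 0 && (size_bytes % FLASH_SECTOR_SIZE) == 0;
}
```

With these helpers, "4K" parses to 4096 bytes and passes the alignment check, while a 16-byte region (the size of the crash record used later in this chapter) does not.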

3. The Custom Handler and Reporting Logic

Now for the main application. We will define a structure for our crash log, implement the handler to write it, and add logic in app_main to read it on boot.

Modify your main/main.c:

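#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "esp_partition.h"
#include "esp_panic.h"
#include "esp_system.h"
#if CONFIG_IDF_TARGET_ARCH_XTENSA
// Declares XtensaExcFrame. The include path may differ between ESP-IDF versions.
#include <xtensa/xtensa_context.h>
#endif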

// Use a specific magic number to identify a valid crash log.
#define CRASH_LOG_MAGIC 0xBAD0C0DE

// Define a simple struct to hold the most important crash info.
typedef struct {
    uint32_t magic;
    uint32_t pc;      // Program Counter
    uint32_t cause;   // Exception Cause
    uint32_t vaddr;   // Associated Virtual Address
} crash_log_t;

static const char *TAG = "PANIC_HANDLER";

// Our custom handler function. It will be called on a crash.
void custom_panic_handler(void *frame) {
    // 1. Find the dedicated crash log partition.
    const esp_partition_t* partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x42, "crash_log");
    if (!partition) {
        // Cannot find partition, can't do anything else.
        return;
    }

    // 2. Populate the crash log structure.
    crash_log_t log;
    log.magic = CRASH_LOG_MAGIC;

    // This part is architecture-specific (Xtensa vs RISC-V).
    #if CONFIG_IDF_TARGET_ARCH_XTENSA
    XtensaExcFrame *xt_frame = (XtensaExcFrame *)frame;
    log.pc = xt_frame->pc;
    log.cause = xt_frame->exccause;
    log.vaddr = xt_frame->excvaddr;
    #else // Assuming RISC-V
    // Simplified example: the indices below are placeholders. Verify them against
    // the saved-register frame layout of your ESP-IDF version (RvExcFrame in
    // riscv/rvruntime-frames.h in recent releases) before relying on the values.
    uint32_t *regs = (uint32_t *)frame;
    log.pc = regs[1]; // mepc
    log.cause = regs[2]; // mcause
    log.vaddr = regs[3]; // mtval
    #endif

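    // 3. Erase before writing. The erase size must be a multiple of the 4 KB
    //    flash sector, so erase the whole (4 KB) partition, not sizeof(crash_log_t).
    esp_partition_erase_range(partition, 0, partition->size);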

    // 4. Write the crash log struct to the partition.
    esp_partition_write(partition, 0, &log, sizeof(crash_log_t));

    // NOTE: The default handler will run after this, printing the full log
    // and eventually rebooting the device.
}

// This function checks for a crash log on boot.
void check_and_report_crash(void) {
    const esp_partition_t* partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x42, "crash_log");
    if (!partition) {
        ESP_LOGE(TAG, "Crash log partition not found!");
        return;
    }

    crash_log_t log;
    esp_err_t err = esp_partition_read(partition, 0, &log, sizeof(crash_log_t));
    
    if (err == ESP_OK && log.magic == CRASH_LOG_MAGIC) {
        ESP_LOGE(TAG, "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");
        ESP_LOGE(TAG, "!! PREVIOUS BOOT CRASHED !!");
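        // Cast to unsigned so the arguments match %x regardless of how uint32_t is defined.
        ESP_LOGE(TAG, "!! Cause: 0x%08x, PC: 0x%08x, VAddr: 0x%08x", (unsigned)log.cause, (unsigned)log.pc, (unsigned)log.vaddr);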
        ESP_LOGE(TAG, "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");

        // Erase the log so we don't report it again. The erase size must be
        // sector-aligned, so erase the whole partition.
        esp_partition_erase_range(partition, 0, partition->size);
    } else {
        ESP_LOGI(TAG, "No previous crash log found. Booting normally.");
    }
}

// Function to reliably cause a crash.
void cause_a_crash(void) {
    ESP_LOGW(TAG, "Triggering a crash in 3 seconds by writing to a NULL pointer...");
    vTaskDelay(pdMS_TO_TICKS(3000));
    volatile int *bad_pointer = NULL;
    *bad_pointer = 42;
}

void app_main(void)
{
    // Check for a crash log from the previous boot.
    check_and_report_crash();
    
    // Register our custom handler.
    esp_panic_handler_register(custom_panic_handler);
    ESP_LOGI(TAG, "Custom panic handler registered.");

    // Now, let's cause a crash to test it.
    cause_a_crash();
}
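The save-then-report cycle above can be exercised on a development machine before touching hardware. The following is a minimal host-side sketch under stated assumptions: a static byte array stands in for the crash_log partition, and sim_erase, sim_save_crash, and sim_take_crash are hypothetical helpers that replace the esp_partition_* calls. It reuses the crash_log_t layout and magic number from the listing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CRASH_LOG_MAGIC 0xBAD0C0DEu
#define SIM_SECTOR_SIZE 4096u

typedef struct {
    uint32_t magic;
    uint32_t pc;
    uint32_t cause;
    uint32_t vaddr;
} crash_log_t;

/* Host-only stand-in for the 4 KB crash_log partition. */
static uint8_t s_sim_flash[SIM_SECTOR_SIZE];

/* Erasing NOR flash leaves every byte at 0xFF. */
void sim_erase(void)
{
    memset(s_sim_flash, 0xFF, sizeof s_sim_flash);
}

/* Handler side: erase the sector, then store the record at offset 0. */
void sim_save_crash(uint32_t pc, uint32_t cause, uint32_t vaddr)
{
    crash_log_t log = { CRASH_LOG_MAGIC, pc, cause, vaddr };
    sim_erase();
    memcpy(s_sim_flash, &log, sizeof log);
}

/* Boot side: accept the record only if the magic matches, then erase it
 * so the same crash is reported exactly once. */
bool sim_take_crash(crash_log_t *out)
{
    memcpy(out, s_sim_flash, sizeof *out);
    if (out->magic != CRASH_LOG_MAGIC) {
        return false;   /* erased (0xFFFFFFFF) or garbage: no valid log */
    }
    sim_erase();
    return true;
}
```

The magic number is what distinguishes "no crash" from "crash recorded": a freshly erased sector reads back as 0xFFFFFFFF in the magic field, so sim_take_crash returns false until sim_save_crash has run, and returns false again once the log has been consumed.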

4. Build, Flash, and Monitor

Build and Flash the project with the new partition table.
Monitor: Watch the serial output. You will see it crash.

First Boot (The Crash):

...
I (314) PANIC_HANDLER: Custom panic handler registered.
W (314) PANIC_HANDLER: Triggering a crash in 3 seconds by writing to a NULL pointer...
Guru Meditation Error: Core 0 panic'd (StoreProhibited). Exception was unhandled.
...
Core 0 register dump:
PC      : 0x400d14b4  PS      : 0x00060030  A0      : 0x800d12e8  A1      : 0x3ffb1f40
...

The device crashes and reboots. Our custom handler printed nothing itself: it ran at the start of the panic sequence, before the default handler produced this output, and saved the log.

Second Boot (The Report):

...
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
...
E (288) PANIC_HANDLER: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
E (288) PANIC_HANDLER: !! PREVIOUS BOOT CRASHED !!
E (298) PANIC_HANDLER: !! Cause: 0x0000001d, PC: 0x400d14b4, VAddr: 0x00000000
E (308) PANIC_HANDLER: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I (318) PANIC_HANDLER: Custom panic handler registered.
W (318) PANIC_HANDLER: Triggering a crash in 3 seconds...

Success! On the next boot, our check_and_report_crash function found the log, printed the critical details, and erased it.

Variant Notes

The concept of exception handling is universal, but the low-level details are specific to the CPU architecture.

Xtensa (ESP32, S2, S3): This architecture has a well-defined exception frame structure, XtensaExcFrame, available from <xtensa/xtensa_context.h>. The exception cause is stored in the exccause member, and the faulting memory address (if applicable) is in excvaddr. The cause codes are specific to Xtensa. For example, 29 (STORE_PROHIBITED) is the cause for writing to a bad address.
RISC-V (ESP32-C3, C6, H2): This architecture also has an exception frame, but its structure is less standardized in a single header. The key registers are mepc (Machine Exception Program Counter), mcause (Machine Cause), and mtval (Machine Trap Value). The cause codes are defined by the RISC-V specification (e.g., 7 is “Store/AMO access fault”).

Feature	Xtensa (ESP32, S2, S3)	RISC-V (ESP32-C3, C6, H2)
Exception Frame	Standardized XtensaExcFrame struct	Pointer to saved registers on the stack (no standard struct name in public API)
Program Counter (PC)	frame->pc	regs[1] (mepc register)
Exception Cause	frame->exccause	regs[2] (mcause register)
Faulting Address	frame->excvaddr	regs[3] (mtval register)
Example Cause Code	29 (StoreProhibited)	7 (Store/AMO access fault)
Header File	<xtensa/xtensa_context.h>	(Varies, handled by toolchain)

The example code uses #if preprocessor directives to handle both cases, demonstrating how to write portable handler code.
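One RISC-V detail is worth keeping in mind when storing mcause: on a 32-bit core, the top bit of the register flags an interrupt and the remaining bits carry the cause code, as defined by the RISC-V privileged specification. A minimal sketch of splitting the two follows; the helper names mcause_is_interrupt and mcause_code are hypothetical and not taken from ESP-IDF.

```c
#include <stdbool.h>
#include <stdint.h>

/* On RV32, bit 31 of mcause is the Interrupt flag. */
#define MCAUSE_INTERRUPT_BIT 0x80000000u

/* True if the trap was an interrupt rather than a synchronous exception. */
bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0;
}

/* The cause code with the Interrupt flag masked off. */
uint32_t mcause_code(uint32_t mcause)
{
    return mcause & ~MCAUSE_INTERRUPT_BIT;
}
```

For a store access fault, mcause is simply 7, so mcause_is_interrupt returns false and mcause_code returns 7. Masking before comparing against the exception codes avoids misreading an interrupt number as a fault.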

Common Mistakes & Troubleshooting Tips

Mistake / Issue	Symptom(s)	Troubleshooting / Solution
Handler is too complex: using complex APIs like printf or Wi-Fi functions inside the handler.	Device hangs in the handler, or a “double fault” crash occurs (a second crash inside the first handler).	Keep it simple. Only use low-level, robust functions like esp_partition_write. Avoid any functions that use mutexes, delays, or complex drivers.
Forgetting to erase the crash log: the startup code reads the log but never invalidates it.	The same crash is reported on every single boot, making it impossible to tell if a new crash has occurred.	Invalidate after reading. After successfully reading and reporting the log, immediately call esp_partition_erase_range or overwrite the magic number.
Ignoring watchdog timers (WDT): the handler code takes too long to execute.	The device reboots due to a WDT timeout before the handler finishes, resulting in a partial or completely missing crash log.	Be fast. The handler must execute very quickly. Avoid loops or long-running operations. If a delay is absolutely necessary (not recommended), you must feed the watchdog.
Misinterpreting the exception frame: using XtensaExcFrame on a RISC-V chip or vice versa.	The PC, cause, and vaddr values in the crash log are garbage or zero, providing no useful debug information.	Use conditional compilation. Wrap architecture-specific code in #if CONFIG_IDF_TARGET_ARCH_XTENSA … #else … #endif blocks to use the correct structs and offsets.
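The "overwrite the magic number" option in the table works because of how NOR flash behaves: programming can only change bits from 1 to 0, and only an erase sets them back to 1. The sketch below models a single 32-bit flash word on the host to show the consequence. nor_program and magic_is_valid are hypothetical helpers; whether esp_partition_write accepts a write over already-programmed data in your configuration (for example with flash encryption enabled) is an assumption you would need to verify.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRASH_LOG_MAGIC 0xBAD0C0DEu
#define ERASED_WORD     0xFFFFFFFFu

/* Model of programming one NOR flash word: bits can be cleared, never set,
 * so the result is the bitwise AND of the old and new values. */
uint32_t nor_program(uint32_t current, uint32_t value)
{
    return current & value;
}

/* A log is considered present only while the magic word is intact. */
bool magic_is_valid(uint32_t word)
{
    return word == CRASH_LOG_MAGIC;
}
```

Starting from an erased word, programming the magic yields a valid log. Programming 0 over it clears every bit, which invalidates the log without a sector erase. The flip side is that programming the magic again over the zeroed word still reads back as 0, so the handler must erase the sector before it writes a new record.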

Exercises

Save to NVS: Modify the example project. Instead of using a dedicated partition, save the pc and cause of the crash to the Non-Volatile Storage (NVS) library. This can be simpler if you only need to store a small amount of data. Remember that your handler will need to initialize NVS if it’s not already, adding complexity.
Decode the Cause: Enhance the check_and_report_crash function. Add a helper function that takes the cause code and prints a human-readable string (e.g., “Illegal Instruction”, “Load Prohibited”, “Store Prohibited”). You will need separate switch statements for Xtensa and RISC-V cause codes.
Graceful Failure Indicator: Implement a handler that, upon detecting a crash, configures a GPIO pin to blink an LED in an SOS pattern (...---...). This provides a physical, visual indicator that the device has failed, which can be invaluable for field diagnostics.

Cause Code (Decimal)	Xtensa Meaning	RISC-V Meaning
0	IllegalInstruction	Instruction address misaligned
1	Syscall	Instruction access fault
2	InstructionFetchError	Illegal instruction
5	Alloca	Load access fault
6	IntegerDivideByZero	Store/AMO address misaligned
7	PCValue	Store/AMO access fault
9	LoadStoreAlignment	Environment call from S-mode
20	InstrFetchProhibited	(Not a standard code)
28	LoadProhibited	(Not a standard code)
29	StoreProhibited	(Not a standard code)
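As a starting point for the Decode the Cause exercise, here is a minimal sketch of a lookup helper with one switch per architecture. The type cpu_arch_t and the function crash_cause_name are hypothetical names chosen for this example, and only a handful of codes are covered; extend the switches with the codes your product actually encounters.

```c
#include <stdint.h>

/* Hypothetical selector so the same helper can be tested on a host. */
typedef enum { CPU_ARCH_XTENSA, CPU_ARCH_RISCV } cpu_arch_t;

/* Subset of Xtensa EXCCAUSE values. */
static const char *xtensa_cause_name(uint32_t cause)
{
    switch (cause) {
    case 0:  return "IllegalInstruction";
    case 6:  return "IntegerDivideByZero";
    case 9:  return "LoadStoreAlignment";
    case 20: return "InstrFetchProhibited";
    case 28: return "LoadProhibited";
    case 29: return "StoreProhibited";
    default: return "Unknown";
    }
}

/* Subset of RISC-V mcause exception codes (Interrupt bit already masked off). */
static const char *riscv_cause_name(uint32_t cause)
{
    switch (cause) {
    case 1:  return "Instruction access fault";
    case 2:  return "Illegal instruction";
    case 5:  return "Load access fault";
    case 7:  return "Store/AMO access fault";
    default: return "Unknown";
    }
}

/* Pick the table that matches the architecture the log was recorded on. */
const char *crash_cause_name(cpu_arch_t arch, uint32_t cause)
{
    return (arch == CPU_ARCH_XTENSA) ? xtensa_cause_name(cause)
                                     : riscv_cause_name(cause);
}
```

On a real target the architecture would be fixed at compile time, so the arch parameter could be replaced by the same #if CONFIG_IDF_TARGET_ARCH_XTENSA test used in the handler. Keeping it as a runtime parameter here only makes the helper easy to check on a development machine.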

Summary

Custom exception handlers provide a mechanism to manage system crashes gracefully, which is essential for deployed devices.
Use esp_panic_handler_register() to register a function that is called at the start of the crash-handling sequence.
The handler receives a pointer to the exception frame, which contains the saved CPU state at the time of the crash.
A robust handler should be minimal, avoid complex APIs, and quickly save critical information (like PC and Cause) to persistent storage.
The startup code must check for a saved crash log, report it, and then erase it to prepare for the next boot.
The low-level details of exception handling differ between Xtensa and RISC-V cores, requiring architecture-specific code.
