Chapter 250: Custom Exception and Crash Handlers
Chapter Objectives
By the end of this chapter, you will be able to:
- Understand what CPU exceptions are and why they cause system crashes.
- Describe the behavior of the default ESP-IDF panic handler.
- Implement a custom handler function that executes when a crash occurs.
- Extract critical information, like the program counter and cause code, from an exception frame.
- Save crash data to persistent storage for post-mortem analysis.
- Differentiate between exception handling on Xtensa and RISC-V based ESP32 variants.
Introduction
In an ideal world, our firmware would be perfect and never crash. In the real world, however, bugs happen. Unexpected inputs, hardware faults, or subtle race conditions can lead to unrecoverable errors that halt the system. When this occurs, the ESP-IDF’s default behavior is to print a “Guru Meditation Error” to the console, providing a wealth of diagnostic information for a developer connected via serial port.
But what happens when a device is deployed in the field, far away from a developer’s computer? A cryptic crash loop is of no use to the end-user and provides no feedback to the engineering team. This is where custom exception handlers become indispensable. By registering our own handler, we can intercept the crash process to perform critical last-rites: logging the cause to persistent memory for later retrieval, putting the system into a safe state, or even displaying a user-friendly error message before rebooting. This chapter will teach you how to tame system failures and build more resilient, field-ready devices.
Theory
1. What is a CPU Exception?
A CPU exception is a condition that disrupts the normal, sequential execution of instructions. It’s the hardware’s way of saying, “I can’t proceed.” Common causes include:
- Illegal Instruction: The CPU tries to execute an invalid or privileged opcode.
- Memory Access Violation: The program attempts to read from or write to a memory address it doesn’t have permission to access (e.g., writing to read-only memory or accessing a null pointer).
- Divide by Zero: An attempt to perform a mathematical impossibility.
When such an event occurs, the CPU automatically halts the current program flow, saves the state of its internal registers (most importantly, the Program Counter, PC, and the reason for the exception), and jumps to a special, pre-defined address to execute an exception handler.
2. The ESP-IDF Panic Handler
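ESP-IDF provides a default exception handler, often called the panic handler. Its behavior is designed for developers during the debugging phase. When a crash occurs, it prints the exception cause (the "Guru Meditation Error" line), a register dump for the crashing core, and a backtrace of the call stack.
What happens after printing depends on the panic behavior selected in menuconfig. In the halt configuration, the handler disables interrupts on the crashing core and enters an infinite loop, which preserves the state for a developer to inspect with a JTAG debugger. In the reboot configurations, the device restarts once the report has been printed.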
graph TD
    subgraph "ESP-IDF Default Panic Handler Output"
        direction TB
        A["<b>Guru Meditation Error: Core 0 panic'd (IllegalInstruction)</b>"]
        B["<b>Core 0 Register Dump</b><br>PC: 0x400d1234<br>PS: 0x00060030<br>A0: 0x800d14b8 ... etc."]
        C["<b>Backtrace</b><br>0x400d1234:0x3ffb1f30<br>0x400d12ab:0x3ffb1f50<br>0x400d5555:0x3ffb1f70 ..."]
        D["<b>Rebooting...</b><br>(via Watchdog Timer)"]
        A -- "Leads to Register Snapshot" --> B
        B -- "Used to generate" --> C
        C -- "After printing, system hangs until" --> D
    end
    subgraph "Key Information for Developers"
        P["<b>Program Counter (PC)</b><br>Address of the instruction<br>that caused the crash."]
        S["<b>Call Stack (Backtrace)</b><br>Sequence of function calls<br>leading to the error."]
        R["<b>Exception Cause</b><br>The specific type of error,<br>e.g., 'IllegalInstruction'."]
    end
    B --> P
    C --> S
    A --> R
    classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;
    classDef keyInfo fill:#FFFBEB,stroke:#F59E0B,stroke-width:2px,color:#B45309;
    class A,B,C,D default;
    class P,S,R keyInfo;
    style A fill:#FEE2E2,stroke:#DC2626,stroke-width:1.5px,color:#991B1B
    style B fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1E40AF
    style C fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1E40AF
    style D fill:#E5E7EB,stroke:#4B5563,stroke-width:1.5px,color:#1F2937
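Each backtrace entry in the output above is a PC:SP pair. As a minimal sketch of how such an entry could be pulled apart on a host machine before passing the PC to a symbolizer such as addr2line, the helper below parses one token. The names backtrace_entry_t and parse_backtrace_entry are hypothetical and not part of ESP-IDF.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// One backtrace entry as printed by the panic handler: "0xPC:0xSP".
typedef struct {
    uint32_t pc;  // address of the instruction in this stack frame
    uint32_t sp;  // stack pointer value for this stack frame
} backtrace_entry_t;

// Hypothetical helper: parse a single "0xPC:0xSP" token.
// Returns true when both hexadecimal values were read.
bool parse_backtrace_entry(const char *token, backtrace_entry_t *out)
{
    unsigned int pc = 0;
    unsigned int sp = 0;
    if (sscanf(token, "0x%x:0x%x", &pc, &sp) != 2) {
        return false;
    }
    out->pc = (uint32_t)pc;
    out->sp = (uint32_t)sp;
    return true;
}
```

Only the PC half is needed to look up a source line; the SP half is mainly useful for spotting stack corruption or overflow.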
3. Intercepting the Panic: Custom Handlers
The ESP-IDF allows us to register our own function to be called at the very beginning of the panic handling process. This gives us a golden opportunity to execute our own logic before the default handler takes over.
The key function is esp_panic_handler_register:
void esp_panic_handler_register(esp_panic_handler_t handler);
esp_panic_handler_t is a function pointer type defined as void (*)(void *). The void * argument passed to our custom handler is a pointer to the exception frame: the data structure where the CPU saved all its registers. Check that this function exists in the ESP-IDF release you are targeting before relying on it, because the panic-handling internals differ between releases.
Our custom handler’s primary responsibility is to quickly and robustly save the most critical information from this frame into a non-volatile storage location (like a flash partition or NVS) before the system reboots.
Warning: Code within a panic handler must be extremely simple and robust. The system state is unstable; many FreeRTOS features (like mutexes) or complex drivers may not be safe to use. The handler code itself must not cause another exception.
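One way to reduce the risk of a second exception looping back into the handler is a re-entrancy guard. The following is a minimal sketch under stated assumptions: guarded_crash_hook and crash_save_fn_t are hypothetical names, and the save step is passed in as a callback so the idea can be tried on a host machine.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical callback type for the "save crash data" step.
typedef void (*crash_save_fn_t)(const void *frame);

// Set on first entry and never cleared: a crash handler runs at most once per boot.
static volatile bool s_handler_active = false;

// Returns true on the first entry (the save callback, if any, was invoked).
// Returns false if the handler was already entered, for example because the
// save step itself faulted and the panic path ran again.
bool guarded_crash_hook(const void *frame, crash_save_fn_t save)
{
    if (s_handler_active) {
        return false;
    }
    s_handler_active = true;
    if (save != NULL) {
        save(frame);
    }
    return true;
}
```

A plain flag like this is not atomic, so on a dual-core chip it only guards against re-entry on the same core; treat it as an illustration of the idea, not a complete solution.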
Practical Examples
Let’s build a system that catches a crash, saves the core details, and reports them on the next boot.
1. Forcing a Crash
graph TD
    subgraph "Device Runtime"
        A(Start: app_main) --> B{Normal Operation};
        B --> C(Crash Occurs!);
    end
    subgraph "Panic Sequence"
        C -- "HW Exception" --> D[Custom Panic Handler Executes];
        D --> E{Find Crash Partition};
        E -- "Success" --> F["Populate Log Struct<br><i>(PC, Cause, etc.)</i>"];
        F --> G[Write Log to Flash];
        G --> H["Default Panic Handler<br><i>(Prints Guru Meditation Error)</i>"];
        H --> I(System Reboots);
    end
    subgraph "Next Boot Sequence"
        J(Start: app_main) --> K{Check for Crash Log};
        K -- "Log Found & Valid" --> L["Read & Report Crash<br><i>(via ESP_LOGE)</i>"];
        L --> M[Erase Crash Log];
        M --> N(Continue Normal Boot);
        K -- "No Log Found" --> N;
    end
    classDef start-node fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef process-node fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef decision-node fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef check-node fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef success-node fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;
    class A,J start-node;
    class C check-node;
    class B,D,F,G,H,I,L,M,N process-node;
    class E,K decision-node;
First, we need a reliable way to trigger an exception. Dereferencing a null pointer is a classic and effective method.
// This function will cause a store-prohibited exception.
void cause_a_crash(void) {
    ESP_LOGI("CRASH", "About to write to a null pointer...");
    // Writing to address 0 is forbidden.
    volatile int *bad_pointer = NULL;
    *bad_pointer = 42;
}
2. Creating a Crash Log Partition
We need a dedicated place in flash to store the crash log.
- Create a file named partitions_crash.csv in your project root:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, , 24K,
phy_init, data, phy, , 4K,
factory, app, factory, , 2M,
crash_log, data, 0x42, , 4K,
We've added a 4 KB crash_log partition with a custom subtype 0x42.
- In menuconfig, go to Partition Table ---> and select Custom partition table CSV, ensuring the filename matches.
Name | Type | SubType | Offset | Size | Flags |
---|---|---|---|---|---|
nvs | data | nvs | | 24K | |
phy_init | data | phy | | 4K | |
factory | app | factory | | 2M | |
crash_log | data | 0x42 | | 4K | |
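Because the Offset column is left blank, the build tooling places each partition after the previous one. The sketch below reproduces that arithmetic for this table so you can see where crash_log ends up. It assumes the common default layout in which the first partition starts at 0x9000, data partitions are aligned to 4 KB, and app partitions are aligned to 64 KB. The helper names are hypothetical.

```c
#include <stdint.h>

// Assumptions: default layout with the partition table at 0x8000, so the
// first partition begins at 0x9000; app partitions start on 64 KB boundaries.
#define FIRST_PARTITION_OFFSET 0x9000u
#define DATA_ALIGNMENT         0x1000u
#define APP_ALIGNMENT          0x10000u

// Round value up to the next multiple of alignment (alignment is a power of two).
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1u) & ~(alignment - 1u);
}

// Walk the table from this chapter and return the start offset of crash_log.
uint32_t crash_log_offset(void)
{
    uint32_t offset = FIRST_PARTITION_OFFSET;
    offset = align_up(offset, DATA_ALIGNMENT) + 24u * 1024u;          // nvs ends at 0xF000
    offset = align_up(offset, DATA_ALIGNMENT) + 4u * 1024u;           // phy_init ends at 0x10000
    offset = align_up(offset, APP_ALIGNMENT) + 2u * 1024u * 1024u;    // factory ends at 0x210000
    return align_up(offset, DATA_ALIGNMENT);                          // crash_log starts here
}
```

Under these assumptions crash_log starts at 0x210000 and ends at 0x211000, which is past the 2 MB mark, so the configured flash size needs to be larger than 2 MB for this table to fit.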
3. The Custom Handler and Reporting Logic
Now for the main application. We will define a structure for our crash log, implement the handler to write it, and add logic in app_main to read it on boot.
Modify your main/main.c:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "esp_partition.h"
#include "esp_panic.h"
#include "esp_system.h"
// Exception frame definitions. Header locations can differ between ESP-IDF versions.
#if CONFIG_IDF_TARGET_ARCH_XTENSA
#include <xtensa/xtensa_context.h>   // XtExcFrame
#else
#include "riscv/rvruntime-frames.h"  // RvExcFrame
#endif
// Use a specific magic number to identify a valid crash log.
#define CRASH_LOG_MAGIC 0xBAD0C0DE
// Define a simple struct to hold the most important crash info.
typedef struct {
    uint32_t magic;
    uint32_t pc;    // Program Counter
    uint32_t cause; // Exception Cause
    uint32_t vaddr; // Associated Virtual Address
} crash_log_t;
static const char *TAG = "PANIC_HANDLER";
// Our custom handler function. It will be called on a crash.
void custom_panic_handler(void *frame) {
    // 1. Find the dedicated crash log partition.
    const esp_partition_t *partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x42, "crash_log");
    if (!partition) {
        // Cannot find partition, can't do anything else.
        return;
    }
    // 2. Populate the crash log structure.
    crash_log_t log;
    log.magic = CRASH_LOG_MAGIC;
    // This part is architecture-specific (Xtensa vs RISC-V).
#if CONFIG_IDF_TARGET_ARCH_XTENSA
    const XtExcFrame *xt_frame = (const XtExcFrame *)frame;
    log.pc = (uint32_t)xt_frame->pc;
    log.cause = (uint32_t)xt_frame->exccause;
    log.vaddr = (uint32_t)xt_frame->excvaddr;
#else // Assuming RISC-V
    // Use the named fields of the saved frame instead of raw word offsets,
    // which depend on the exact frame layout.
    const RvExcFrame *rv_frame = (const RvExcFrame *)frame;
    log.pc = (uint32_t)rv_frame->mepc;
    log.cause = (uint32_t)rv_frame->mcause;
    log.vaddr = (uint32_t)rv_frame->mtval;
#endif
    // 3. Erase before writing. Flash erases must cover whole 4 KB sectors,
    //    so erase the entire (4 KB) partition rather than sizeof(crash_log_t).
    if (esp_partition_erase_range(partition, 0, partition->size) != ESP_OK) {
        return;
    }
    // 4. Write the crash log struct to the partition.
    esp_partition_write(partition, 0, &log, sizeof(crash_log_t));
    // NOTE: The default handler will run after this, printing the full log
    // and eventually rebooting the device.
}
// This function checks for a crash log on boot.
void check_and_report_crash(void) {
    const esp_partition_t *partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x42, "crash_log");
    if (!partition) {
        ESP_LOGE(TAG, "Crash log partition not found!");
        return;
    }
    crash_log_t log;
    esp_err_t err = esp_partition_read(partition, 0, &log, sizeof(crash_log_t));
    if (err == ESP_OK && log.magic == CRASH_LOG_MAGIC) {
        ESP_LOGE(TAG, "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");
        ESP_LOGE(TAG, "!! PREVIOUS BOOT CRASHED !!");
        // Cast to unsigned so the %X format matches on every toolchain.
        ESP_LOGE(TAG, "!! Cause: 0x%08X, PC: 0x%08X, VAddr: 0x%08X",
                 (unsigned)log.cause, (unsigned)log.pc, (unsigned)log.vaddr);
        ESP_LOGE(TAG, "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");
        // Erase the log so we don't report it again. The erase size must be
        // sector-aligned, so erase the whole 4 KB partition.
        esp_partition_erase_range(partition, 0, partition->size);
    } else {
        ESP_LOGI(TAG, "No previous crash log found. Booting normally.");
    }
}
// Function to reliably cause a crash.
void cause_a_crash(void) {
    ESP_LOGW(TAG, "Triggering a crash in 3 seconds by writing to a NULL pointer...");
    vTaskDelay(pdMS_TO_TICKS(3000));
    volatile int *bad_pointer = NULL;
    *bad_pointer = 42;
}
void app_main(void)
{
    // Check for a crash log from the previous boot.
    check_and_report_crash();
    // Register our custom handler.
    esp_panic_handler_register(custom_panic_handler);
    ESP_LOGI(TAG, "Custom panic handler registered.");
    // Now, let's cause a crash to test it.
    cause_a_crash();
}
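A magic number alone cannot tell a complete log from one that was only partially written before the reset. One possible hardening, sketched here as an assumption and not as part of the project above, is to add a checksum field. The type checked_crash_log_t and the helper functions are hypothetical, and the XOR checksum is deliberately simple; a CRC would detect more kinds of corruption.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRASH_LOG_MAGIC 0xBAD0C0DEu

// Hypothetical variant of crash_log_t with a trailing checksum.
typedef struct {
    uint32_t magic;
    uint32_t pc;
    uint32_t cause;
    uint32_t vaddr;
    uint32_t checksum;  // XOR of the four words above
} checked_crash_log_t;

// XOR every payload word together. Cheap enough for a panic handler.
uint32_t crash_log_checksum(const checked_crash_log_t *log)
{
    return log->magic ^ log->pc ^ log->cause ^ log->vaddr;
}

// Call just before writing the log to flash.
void crash_log_seal(checked_crash_log_t *log)
{
    log->magic = CRASH_LOG_MAGIC;
    log->checksum = crash_log_checksum(log);
}

// Call on boot after reading the log back from flash.
bool crash_log_is_valid(const checked_crash_log_t *log)
{
    return log->magic == CRASH_LOG_MAGIC &&
           log->checksum == crash_log_checksum(log);
}
```

An erased sector reads back as all 0xFF bytes, so its magic field never matches and the log is correctly treated as absent.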
4. Build, Flash, and Monitor
- Build and Flash the project with the new partition table.
- Monitor: Watch the serial output. You will see it crash.
First Boot (The Crash):
...
I (314) PANIC_HANDLER: Custom panic handler registered.
W (314) PANIC_HANDLER: Triggering a crash in 3 seconds by writing to a NULL pointer...
Guru Meditation Error: Core 0 panic'd (StoreProhibited). Exception was unhandled.
...
Core 0 register dump:
PC : 0x400d14b4 PS : 0x00060030 A0 : 0x800d12e8 A1 : 0x3ffb1f40
...
The device crashes and reboots. Our custom handler ran silently in the background and saved the log.
Second Boot (The Report):
...
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
...
E (288) PANIC_HANDLER: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
E (288) PANIC_HANDLER: !! PREVIOUS BOOT CRASHED !!
E (298) PANIC_HANDLER: !! Cause: 0x0000001d, PC: 0x400d14b4, VAddr: 0x00000000
E (308) PANIC_HANDLER: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I (318) PANIC_HANDLER: Custom panic handler registered.
W (318) PANIC_HANDLER: Triggering a crash in 3 seconds...
Success! On the next boot, our check_and_report_crash function found the log, printed the critical details, and erased it.
Variant Notes
The concept of exception handling is universal, but the low-level details are specific to the CPU architecture.
- Xtensa (ESP32, S2, S3): This architecture has a well-defined exception frame structure, XtExcFrame, available from <xtensa/xtensa_context.h>. The exception cause is stored in the exccause member, and the faulting memory address (if applicable) is in excvaddr. The cause codes are specific to Xtensa. For example, 29 (STORE_PROHIBITED) is the cause for writing to a bad address.
- RISC-V (ESP32-C3, C6, H2): This architecture also has an exception frame. In ESP-IDF it is described by the RvExcFrame struct in riscv/rvruntime-frames.h, though the header location may vary between releases. The key registers are mepc (Machine Exception Program Counter), mcause (Machine Cause), and mtval (Machine Trap Value). The cause codes are defined by the RISC-V specification (e.g., 7 is "Store/AMO access fault").
Feature | Xtensa (ESP32, S2, S3) | RISC-V (ESP32-C3, C6, H2) |
---|---|---|
Exception Frame | XtExcFrame struct | RvExcFrame struct (saved registers on the stack) |
Program Counter (PC) | frame->pc | frame->mepc (mepc register) |
Exception Cause | frame->exccause | frame->mcause (mcause register) |
Faulting Address | frame->excvaddr | frame->mtval (mtval register) |
Example Cause Code | 29 (StoreProhibited) | 7 (Store/AMO access fault) |
Header File | <xtensa/xtensa_context.h> | riscv/rvruntime-frames.h (location may vary by release) |
The example code uses #if preprocessor directives to handle both cases, demonstrating how to write portable handler code.
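On RISC-V, mcause carries two pieces of information: the most significant bit says whether the trap was an interrupt, and the remaining bits hold the cause code. A minimal sketch for 32-bit cores follows; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// On RV32 the top bit of mcause is the Interrupt flag.
#define MCAUSE_INTERRUPT_BIT 0x80000000u

// True when the trap was an interrupt and not a synchronous exception.
bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0u;
}

// The exception (or interrupt) code with the Interrupt flag masked off.
uint32_t mcause_code(uint32_t mcause)
{
    return mcause & ~MCAUSE_INTERRUPT_BIT;
}
```

For a crash log, a value such as 7 with the interrupt flag clear corresponds to the "Store/AMO access fault" mentioned above.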
Common Mistakes & Troubleshooting Tips
Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
---|---|---|
Handler is too complex: using complex APIs like printf or Wi-Fi functions inside the handler. | Device hangs in the handler, or a "double fault" crash occurs (a second crash inside the first handler). | Keep it simple. Only use low-level, robust functions like esp_partition_write. Avoid any functions that use mutexes, delays, or complex drivers. |
Forgetting to erase the crash log: the startup code reads the log but never invalidates it. | The same crash is reported on every single boot, making it impossible to tell if a new crash has occurred. | Invalidate after reading. After successfully reading and reporting the log, immediately call esp_partition_erase_range or overwrite the magic number. |
Ignoring watchdog timers (WDT): the handler code takes too long to execute. | The device reboots due to a WDT timeout before the handler finishes, resulting in a partial or completely missing crash log. | Be fast. The handler must execute very quickly. Avoid loops or long-running operations. If a delay is absolutely necessary (not recommended), you must feed the watchdog. |
Misinterpreting the exception frame: using the Xtensa frame layout on a RISC-V chip or vice versa. | The PC, cause, and vaddr values in the crash log are garbage or zero, providing no useful debug information. | Use conditional compilation. Wrap architecture-specific code in #if CONFIG_IDF_TARGET_ARCH_XTENSA … #else … #endif blocks to use the correct structs and offsets. |
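The "overwrite the magic number" option in the table works because NOR flash writes can only change bits from 1 to 0. Writing zeros over the magic word therefore invalidates the log without an erase cycle. The sketch below simulates that behavior on a host with a small in-memory array. Every name in it is hypothetical, and it does not call any ESP-IDF API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRASH_LOG_MAGIC 0xBAD0C0DEu
#define SIM_FLASH_SIZE  16u

// Tiny in-memory stand-in for the start of the crash_log partition.
static uint8_t s_sim_flash[SIM_FLASH_SIZE];

// Erasing sets every bit to 1, as a real flash sector erase does.
void sim_flash_erase(void)
{
    memset(s_sim_flash, 0xFF, sizeof(s_sim_flash));
}

// A NOR-style write can only clear bits, modeled as AND with the old content.
void sim_flash_write(size_t offset, const void *data, size_t len)
{
    const uint8_t *src = (const uint8_t *)data;
    for (size_t i = 0; i < len && offset + i < SIM_FLASH_SIZE; i++) {
        s_sim_flash[offset + i] &= src[i];
    }
}

// Read back one 32-bit word (offset must leave room for four bytes).
uint32_t sim_flash_read_u32(size_t offset)
{
    uint32_t value;
    memcpy(&value, &s_sim_flash[offset], sizeof(value));
    return value;
}

// Invalidate the stored log by clearing every bit of the magic word.
void crash_log_invalidate(void)
{
    const uint32_t zero = 0u;
    sim_flash_write(0, &zero, sizeof(zero));
}
```

The sector still has to be erased before the next log can be written, so this trick only replaces the erase done at boot.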
Exercises
- Save to NVS: Modify the example project. Instead of using a dedicated partition, save the pc and cause of the crash to the Non-Volatile Storage (NVS) library. This can be simpler if you only need to store a small amount of data. Remember that your handler will need to initialize NVS if it's not already initialized, which adds complexity.
- Decode the Cause: Enhance the check_and_report_crash function. Add a helper function that takes the cause code and prints a human-readable string (e.g., "Illegal Instruction", "Load Prohibited", "Store Prohibited"). You will need separate switch statements for Xtensa and RISC-V cause codes.
- Graceful Failure Indicator: Implement a handler that, upon detecting a crash, configures a GPIO pin to blink an LED in an SOS pattern (...---...). This provides a physical, visual indicator that the device has failed, which can be invaluable for field diagnostics.
Cause Code (Decimal) | Xtensa Meaning | RISC-V Meaning |
---|---|---|
0 | IllegalInstruction | Instruction address misaligned |
1 | Syscall | Instruction access fault |
2 | InstructionFetchError | Illegal instruction |
5 | Alloca | Load access fault |
6 | IntegerDivideByZero | Store/AMO address misaligned |
7 | (Not commonly reported) | Store/AMO access fault |
9 | LoadStoreAlignment | Environment call from S-mode |
20 | InstFetchProhibited | (Not a standard code) |
28 | LoadProhibited | (Not a standard code) |
29 | StoreProhibited | (Not a standard code) |
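As a possible starting point for the "Decode the Cause" exercise, here is a minimal sketch of an Xtensa decoder covering a few of the codes most often seen in crash reports. The function name is hypothetical and the list is intentionally incomplete. A RISC-V version would need its own switch with the codes from the specification.

```c
#include <stdint.h>

// Hypothetical helper: map a subset of Xtensa EXCCAUSE values to names.
// Codes not listed here fall through to "Unknown".
const char *xtensa_cause_to_string(uint32_t cause)
{
    switch (cause) {
    case 0:  return "IllegalInstruction";
    case 1:  return "Syscall";
    case 6:  return "IntegerDivideByZero";
    case 9:  return "LoadStoreAlignment";
    case 20: return "InstFetchProhibited";
    case 28: return "LoadProhibited";
    case 29: return "StoreProhibited";
    default: return "Unknown";
    }
}
```

Returning string literals keeps the helper free of allocation and formatting, so it can be called from the boot-time report without extra buffers.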
Summary
- Custom exception handlers provide a mechanism to manage system crashes gracefully, which is essential for deployed devices.
- Use esp_panic_handler_register() to register a function that is called at the start of the crash-handling sequence.
- The handler receives a pointer to the exception frame, which contains the saved CPU state at the time of the crash.
- A robust handler should be minimal, avoid complex APIs, and quickly save critical information (like PC and Cause) to persistent storage.
- The startup code must check for a saved crash log, report it, and then erase it to prepare for the next boot.
- The low-level details of exception handling differ between Xtensa and RISC-V cores, requiring architecture-specific code.
Further Reading
- ESP-IDF Panic Handler Handling: https://docs.espressif.com/projects/esp-idf/en/v5.2.1/api-reference/system/panic_handler.html
- ESP-IDF Core Dump: A more advanced, built-in mechanism for saving a much larger snapshot of system state to flash. https://docs.espressif.com/projects/esp-idf/en/v5.2.1/api-guides/core_dump.html
- The RISC-V ISA Specification (Volume 2, Privileged Architecture): For official details on mcause values.
- Xtensa Instruction Set Architecture (ISA) Reference Manual: For official details on exccause values.