Chapter 243: Watchdog Timers Implementation

Chapter Objectives

By the end of this chapter, you will be able to:

  • Explain the purpose of a watchdog timer in ensuring system reliability.
  • Differentiate between the Interrupt Watchdog Timer (IWDT) and the Task Watchdog Timer (TWDT).
  • Configure the Task Watchdog Timer to monitor specific FreeRTOS tasks.
  • Correctly “feed” the watchdog to prevent system resets.
  • Analyze the system’s behavior when a watchdog timer triggers a panic.
  • Enable and disable watchdog monitoring for tasks at runtime.
  • Implement watchdog timers in your own applications to build more robust systems.

Introduction

In an ideal world, software would run flawlessly without ever freezing or getting stuck in an infinite loop. In the real world, however, complex embedded systems operating in unpredictable environments can encounter unforeseen states. A task might get blocked waiting for a resource that never becomes available, a sensor might fail to respond, or a logical error in the code could lead to a deadlock. When this happens in a deployed device, the system can become unresponsive, requiring a manual power cycle to recover—a solution that is often impractical or impossible.

This is where a watchdog timer becomes an indispensable tool. A watchdog is a hardware or software-based safety mechanism that acts as a fail-safe, automatically resetting the system if the main application becomes unresponsive. It is the electronic equivalent of a “dead man’s switch.” By correctly implementing watchdog timers, you can build self-recovering, highly reliable systems that can operate unattended for long periods.

This chapter will guide you through the implementation of the watchdog timers available in the ESP32 ecosystem, a fundamental skill for any professional embedded systems developer.

Theory

A watchdog timer (WDT) is essentially a hardware counter that counts down from an initial value to zero. The application software is responsible for periodically resetting this counter before it reaches zero. This act of resetting is commonly known as “kicking” or “feeding” the watchdog. If the software fails to feed the watchdog in time—perhaps because it has crashed or is stuck in a loop—the counter will reach zero, triggering a hardware reset or another predefined recovery action.

%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#DBEAFE', 'primaryTextColor': '#1E40AF', 'primaryBorderColor': '#2563EB', 'lineColor': '#2563EB', 'textColor': '#1F2937' }}}%%
graph TD
    subgraph Watchdog Mechanism
        A[Application Code in<br>Main Loop or Task] -->|Feeds/Kicks| B{"Watchdog Timer<br>(Hardware Counter)"};
        B -->|Counts down...| C{Timer == 0?};
        C -- No --> B;
        C -- Yes --> D[MCU Reset!];
    end

    style A fill:#DBEAFE,stroke:#2563EB,stroke-width:2px,color:#1E40AF
    style B fill:#FEF3C7,stroke:#D97706,stroke-width:2px,color:#92400E
    style C fill:#FEE2E2,stroke:#DC2626,stroke-width:2px,color:#991B1B
    style D fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46
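
The countdown-and-feed cycle in the diagram can be modeled in a few lines of plain C. This is a minimal host-side sketch for building intuition, not an ESP-IDF API: the `wdt_model_*` names are hypothetical, and a real watchdog counts in hardware and resets the MCU instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a countdown watchdog (not an ESP-IDF API). */
typedef struct {
    uint32_t reload;   /* value loaded into the counter on every feed */
    uint32_t counter;  /* ticks left before the watchdog expires */
} wdt_model_t;

/* Start the model with a full counter. */
void wdt_model_init(wdt_model_t *w, uint32_t reload_ticks)
{
    assert(w != NULL && reload_ticks > 0);
    w->reload = reload_ticks;
    w->counter = reload_ticks;
}

/* "Feed" or "kick" the watchdog: reload the counter. */
void wdt_model_feed(wdt_model_t *w)
{
    w->counter = w->reload;
}

/* Advance one tick. Returns true once the counter has reached zero,
 * which is the point where real hardware would reset the MCU. */
bool wdt_model_tick(wdt_model_t *w)
{
    if (w->counter > 0) {
        w->counter--;
    }
    return w->counter == 0;
}
```

With a reload value of 3, two ticks followed by a feed put the counter back at 3, so the model only reports expiry after three further ticks without a feed.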

ESP-IDF provides two main watchdog timers to protect against different kinds of software failures, plus a third one for low-power modes.

1. The Interrupt Watchdog Timer (IWDT)

The Interrupt Watchdog Timer is a hardware watchdog that guards against a specific type of failure: a stalled CPU. Its primary purpose is to ensure that FreeRTOS interrupts are not disabled for an extended period. If the CPU spends too much time inside a critical section or an Interrupt Service Routine (ISR) with interrupts disabled, other essential system functions (like the FreeRTOS tick) cannot run. The IWDT is designed to detect this scenario.

  • Trigger Condition: Triggers if interrupts are disabled for a period longer than its configured timeout.
  • Feeding: It is fed automatically by the FreeRTOS tick ISR on each CPU core. As long as the scheduler is running, the IWDT is implicitly fed.
  • Configuration: The IWDT is enabled by default in menuconfig (CONFIG_ESP_INT_WDT). Its timeout is also configured there. For most applications, you do not need to interact with the IWDT directly; it works silently in the background.

2. The Task Watchdog Timer (TWDT)

While the IWDT ensures the scheduler is running, it cannot detect if a specific application task has crashed or stalled. A high-priority task could get stuck in an infinite loop, starving all lower-priority tasks, but since the scheduler tick interrupt is still running, the IWDT would not be triggered.

The Task Watchdog Timer is designed to solve this problem. It is a more flexible watchdog that monitors the health of individual FreeRTOS tasks.

  • Trigger Condition: Triggers if a subscribed task fails to feed the TWDT within its configured timeout period.
  • Feeding: A task must explicitly “subscribe” to the TWDT and then periodically call an API function to feed it.
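  • Configuration: The TWDT is also enabled by default via menuconfig (CONFIG_ESP_TASK_WDT). You can configure its timeout period and choose whether the idle task on each core is subscribed automatically (CONFIG_ESP_TASK_WDT_CHECK_IDLE_TASK_CPU0/CPU1). With that option enabled, the TWDT triggers when an idle task is starved of CPU time for longer than the timeout, which usually means another task is hogging the core without yielding.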

How the Task Watchdog Works

  1. Initialization: The application first initializes the TWDT, setting a global timeout period (e.g., 5 seconds).
  2. Subscription: A task that needs to be monitored is “added” or “subscribed” to the watchdog timer.
  3. Feeding: The subscribed task must then call esp_task_wdt_reset() periodically within its main loop. This call must occur more frequently than the watchdog timeout period.
  4. Panic: If the task fails to call esp_task_wdt_reset() within the timeout period, the TWDT assumes the task is stalled. It logs which subscribed tasks missed the deadline and which task was running on each core. If the TWDT is configured to panic, it then triggers a system “panic” and resets the chip.
%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#DBEAFE', 'primaryTextColor': '#1E40AF', 'primaryBorderColor': '#2563EB', 'lineColor': '#2563EB', 'textColor': '#1F2937', 'actorBorder': '#2563EB' }}}%%
sequenceDiagram
    actor Main as app_main
    actor Task as Monitored Task

    Main->>Main: esp_task_wdt_init()
    note right of Main: Configure global timeout<br>and panic behavior.
    Main->>Task: xTaskCreate()
    
    activate Task
    Task->>Task: esp_task_wdt_add(NULL)
    note left of Task: Subscribe self to TWDT.

    loop Healthy Operation
        Task->>Task: Do work...
        Task->>Task: esp_task_wdt_reset()
        note left of Task: Feed the watchdog.
        Task-->>Task: vTaskDelay()
    end

    alt Task Stalls (e.g., infinite loop)
        Task->>Task: Do work...
        Task->>Task: <b>Task gets stuck!</b>
        note over Task: Fails to call esp_task_wdt_reset()
        critical System Panics
            Note over Main,Task: TWDT triggers a system reset!
        end
    end
    deactivate Task

This mechanism ensures that not just the system scheduler but the core application logic is executing as expected.
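
When several tasks are subscribed, they share one timeout: the timer is only restarted once every subscribed task has fed it, so a single silent task is enough to let it expire. The sketch below is a simplified host-side model of that bookkeeping, written under that assumption. The `twdt_model_*` names are hypothetical and are not part of ESP-IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TWDT_MODEL_MAX_TASKS 8u

/* Hypothetical model of a shared task watchdog (not the ESP-IDF API). */
typedef struct {
    uint32_t timeout_ms;  /* shared timeout */
    uint32_t elapsed_ms;  /* time since the timer was last restarted */
    uint8_t subscribed;   /* bit i set: task i is monitored */
    uint8_t has_fed;      /* bit i set: task i fed since the last restart */
} twdt_model_t;

void twdt_model_init(twdt_model_t *m, uint32_t timeout_ms)
{
    assert(timeout_ms > 0);
    m->timeout_ms = timeout_ms;
    m->elapsed_ms = 0;
    m->subscribed = 0;
    m->has_fed = 0;
}

/* Subscribe task `task_id` to the model. */
void twdt_model_add(twdt_model_t *m, unsigned task_id)
{
    assert(task_id < TWDT_MODEL_MAX_TASKS);
    m->subscribed |= (uint8_t)(1u << task_id);
}

/* Feed on behalf of one task. Returns false if the task is not subscribed.
 * The timer restarts only when every subscribed task has fed. */
bool twdt_model_reset(twdt_model_t *m, unsigned task_id)
{
    assert(task_id < TWDT_MODEL_MAX_TASKS);
    uint8_t bit = (uint8_t)(1u << task_id);
    if ((m->subscribed & bit) == 0) {
        return false;
    }
    m->has_fed |= bit;
    if (m->has_fed == m->subscribed) {
        m->elapsed_ms = 0;
        m->has_fed = 0;
    }
    return true;
}

/* Advance time. Returns a bitmask of the tasks that have not fed
 * if the timeout has expired, or 0 if the watchdog is still satisfied. */
uint8_t twdt_model_advance(twdt_model_t *m, uint32_t delta_ms)
{
    m->elapsed_ms += delta_ms;
    if (m->subscribed != 0 && m->elapsed_ms >= m->timeout_ms) {
        return (uint8_t)(m->subscribed & ~m->has_fed);
    }
    return 0;
}
```

In this model, a feed from task 0 alone leaves `elapsed_ms` untouched. Only after task 1 also feeds does the timer restart, and if task 1 then goes quiet for a full timeout, `twdt_model_advance()` reports bit 1 as the late task.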

3. RTC Watchdog (RWDT)

The RTC Watchdog is another hardware watchdog located in the RTC power domain. Its primary role is to monitor the system during deep sleep and to recover the main digital domain if the main code fails to run correctly after a wake-up. It is generally handled automatically by the sleep and bootloader components and is less commonly interacted with directly in application code.

| Feature | Interrupt Watchdog (IWDT) | Task Watchdog (TWDT) | RTC Watchdog (RWDT) |
| --- | --- | --- | --- |
| Purpose | Protects against stalled CPUs or disabled interrupts. | Protects against stalled or crashed application tasks. | Protects against failures during deep sleep and boot-up. |
| Trigger Condition | Interrupts are disabled for too long, preventing the scheduler from running. | A subscribed task fails to “feed” the watchdog within the timeout period. | The main system fails to boot or wake from sleep correctly. |
| Feeding Mechanism | Fed automatically by the FreeRTOS tick ISR. No user code needed. | Must be fed explicitly by application tasks via esp_task_wdt_reset(). | Handled automatically by the bootloader and deep sleep logic. |
| Primary Use Case | Ensuring the core OS scheduler remains responsive. It’s a low-level safeguard. | Monitoring the health and responsiveness of specific application logic. | Recovering from sleep-wake cycles or catastrophic boot failures. |

Practical Examples

Let’s put the Task Watchdog Timer into practice. We will create a simple task and monitor it with the TWDT.

Example 1: Basic Task Monitoring

In this example, we will configure the TWDT to watch a single task that correctly feeds it.

1. Create a New Project

Use the “ESP-IDF: New Project” command in VS Code to create a new project.

2. Configure the Watchdog

  1. Open the project configuration with the “ESP-IDF: menuconfig” command.
  2. Navigate to Component config -> ESP System Settings.
  3. Ensure Task Watchdog Timer is enabled ([*]).
  4. Set the Task Watchdog timeout period (seconds) to 5.
  5. Save the configuration and exit.

3. Write the Application Code

Replace the contents of main/main.c with the following:

C
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_task_wdt.h"
#include "esp_log.h"

static const char *TAG = "WDT_EXAMPLE";

void monitored_task(void *pvParameters)
{
    ESP_LOGI(TAG, "Monitored task started.");

    // Subscribe this task to the TWDT.
    // This is a one-time setup per task.
    ESP_ERROR_CHECK(esp_task_wdt_add(NULL));
    ESP_LOGI(TAG, "Subscribed to Task Watchdog Timer.");

    // This task must now feed the watchdog periodically.
    while (1) {
        ESP_LOGI(TAG, "Feeding the watchdog...");
        esp_task_wdt_reset(); // Feed the TWDT

        // Do some work...
        vTaskDelay(pdMS_TO_TICKS(2000)); // Delay for 2 seconds
    }
}

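void app_main(void)
{
    ESP_LOGI(TAG, "Initializing Task Watchdog Timer...");
    // 5-second timeout, panic on timeout, idle-task checks on every core.
    esp_task_wdt_config_t twdt_config = {
        .timeout_ms = 5000,
        // One bit per core. A mask naming a core the chip does not have is
        // rejected, so derive the mask from the core count.
        .idle_core_mask = (1 << portNUM_PROCESSORS) - 1,
        .trigger_panic = true,
    };
    esp_err_t err = esp_task_wdt_init(&twdt_config);
    if (err == ESP_ERR_INVALID_STATE) {
        // With the TWDT enabled in menuconfig it is already initialized at
        // startup, so apply these settings by reconfiguring it instead.
        err = esp_task_wdt_reconfigure(&twdt_config);
    }
    ESP_ERROR_CHECK(err);
    ESP_LOGI(TAG, "Task Watchdog Timer initialized.");

    // Create the task to be monitored
    xTaskCreate(monitored_task, "monitored_task", 4096, NULL, 5, NULL);
}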

Code Explanation:

  • esp_task_wdt_init(): In app_main, we initialize the TWDT with a configuration structure. We set a timeout of 5000 ms and specify that it should trigger a panic if it times out.
  • xTaskCreate(): We create our monitored_task.
  • esp_task_wdt_add(NULL): Inside the task, we subscribe it to the TWDT. Passing NULL as the handle subscribes the currently running task.
  • esp_task_wdt_reset(): This is the crucial “feed” call. Our task calls this every 2 seconds, which is well within the 5-second timeout, so the system runs indefinitely.
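
Whether a feed interval is "well within" the timeout can be checked with simple arithmetic. The helper below is a small sketch that applies the rule of thumb given later in this chapter (a timeout of at least twice the worst-case loop duration) and rounds up to whole seconds, since the menuconfig option is expressed in seconds. The function name is hypothetical and the factor of two is a guideline, not a hard requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest whole-second timeout that is at least
 * twice the worst-case duration of one loop iteration. */
uint32_t twdt_min_timeout_s(uint32_t worst_case_loop_ms)
{
    /* Guard against zero input and against overflow in the math below. */
    assert(worst_case_loop_ms > 0);
    assert(worst_case_loop_ms <= (UINT32_MAX - 999u) / 2u);

    uint32_t needed_ms = 2u * worst_case_loop_ms;
    return (needed_ms + 999u) / 1000u;  /* round up to whole seconds */
}
```

For the 2000 ms loop in this example the helper returns 4, so the configured 5-second timeout satisfies the guideline with a little room to spare.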

4. Build, Flash, and Monitor

Run the “Build, Flash, and Monitor” task in VS Code.

Observe the Output:

You will see the following messages repeat every 2 seconds without any resets:

Plaintext
I (278) WDT_EXAMPLE: Initializing Task Watchdog Timer...
I (288) WDT_EXAMPLE: Task Watchdog Timer initialized.
I (288) WDT_EXAMPLE: Monitored task started.
I (298) WDT_EXAMPLE: Subscribed to Task Watchdog Timer.
I (298) WDT_EXAMPLE: Feeding the watchdog...
I (2308) WDT_EXAMPLE: Feeding the watchdog...
I (4308) WDT_EXAMPLE: Feeding the watchdog...
...

Example 2: Simulating a Stalled Task

Now, let’s see what happens when a task fails to feed the watchdog.

1. Modify the Code

Modify the while(1) loop in the monitored_task function as follows:

C
// ... inside monitored_task function
int counter = 0;
while (1) {
    if (counter < 3) {
        ESP_LOGI(TAG, "Feeding the watchdog... (counter: %d)", counter);
        esp_task_wdt_reset(); // Feed the TWDT
    } else {
        ESP_LOGW(TAG, "Simulating a stalled task... NOT feeding the watchdog.");
        // We just keep delaying without feeding the watchdog
    }

    counter++;
    vTaskDelay(pdMS_TO_TICKS(2000));
}

2. Build, Flash, and Monitor

Flash and monitor the modified application.

Observe the Output:

The task will feed the watchdog a few times, but then it will stop. After 5 seconds of not being fed, the TWDT will trigger a panic. The output will look something like this:

Plaintext
...
I (298) WDT_EXAMPLE: Feeding the watchdog... (counter: 0)
I (2308) WDT_EXAMPLE: Feeding the watchdog... (counter: 1)
I (4308) WDT_EXAMPLE: Feeding the watchdog... (counter: 2)
W (6308) WDT_EXAMPLE: Simulating a stalled task... NOT feeding the watchdog.
W (8308) WDT_EXAMPLE: Simulating a stalled task... NOT feeding the watchdog.
E (11308) esp_task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (11308) esp_task_wdt:  - monitored_task (CPU 0)
E (11308) esp_task_wdt: Tasks currently running:
E (11308) esp_task_wdt: CPU 0: main
E (11308) esp_task_wdt: CPU 1: IDLE
E (11308) esp_task_wdt: Aborting.

abort() was called at PC 0x400e5a6c on core 0

Backtrace: 0x4008985c:0x3ffbba20 0x40089ac9:0x3ffbba40 0x400e5a6c:0x3ffbba60 0x400839dd:0x3ffbba80 0x4008778d:0x3ffbbaa0 0x400d3d3b:0x3ffbcae0 0x400d2346:0x3ffbcba0

ELF file SHA256: ...

Rebooting...

This output is extremely valuable for debugging. It tells you exactly which task (monitored_task) caused the timeout, allowing you to quickly pinpoint the source of the problem in your code. After the panic, the chip reboots, demonstrating the self-recovery mechanism.

Variant Notes

The IWDT and TWDT are available on all ESP32 variants, including the ESP32, ESP32-S2, ESP32-S3, ESP32-C3, ESP32-C6, and ESP32-H2, and the core API and behavior are the same. The main difference is the core count: single-core variants (such as the ESP32-S2 and ESP32-C3) have only CPU0 to monitor, whereas dual-core variants (ESP32, ESP32-S3) have CPU0 and CPU1. Be careful with the idle core mask: (1 << 0) | (1 << 1) names both cores and is only valid on dual-core chips. On a single-core chip, esp_task_wdt_init() rejects a mask that includes a core the chip does not have, so set only bit 0 or derive the mask from the core count.
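
One way to keep the idle core mask portable is to compute it from the number of cores instead of hard-coding the bits. The helper below is a minimal sketch with a hypothetical name. On target, the core count would typically come from the FreeRTOS configuration (for example the portNUM_PROCESSORS macro, whose exact name is an assumption here and may differ between ESP-IDF versions).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmask with one bit per core
 * (bit 0 = CPU0, bit 1 = CPU1). */
uint32_t idle_core_mask_for(unsigned num_cores)
{
    assert(num_cores >= 1 && num_cores <= 2);
    return (1u << num_cores) - 1u;
}
```

For one core this yields 0x1 (CPU0 only), and for two cores it yields 0x3 (CPU0 and CPU1).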

Common Mistakes & Troubleshooting Tips

| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
| --- | --- | --- |
| Feeding in Wrong Place: calling reset() before work is done. | Watchdog reset occurs despite the feed call being present in the task’s code. | Only call esp_task_wdt_reset() after a complete and successful cycle of the task’s main logic has finished. It signals “the work for this cycle is done”. |
| Task Blocks Indefinitely: waiting forever on a queue or semaphore. | System resets, and the panic log points to a task that is blocked (e.g., on xQueueReceive). | Use finite timeouts for all blocking calls (e.g., pdMS_TO_TICKS(1000) instead of portMAX_DELAY). If the call times out, ensure the task can still loop around to feed the watchdog. |
| Forgetting to Add Task: calling reset() without calling add() first. | The call to esp_task_wdt_reset() returns an ESP_ERR_NOT_FOUND error. The task is not actually being monitored. | Ensure esp_task_wdt_add() is called once successfully during the task’s setup phase before any feed calls are made. |
| Incorrect Timeout Value: timeout is too short for the task’s workload. | Occasional, sporadic watchdog resets, especially when the system is under heavy load or processing a large amount of data. | Analyze the task’s worst-case execution time. Set the TWDT timeout to a safe margin above this, typically at least twice the expected maximum loop duration. |
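
The first two rows can be combined into one loop shape: wait with a finite timeout, do the work, and feed only at the end of a cycle that did not fail. The sketch below shows that shape in portable C so it can be exercised off-target. Every name in it is hypothetical. On target, `wait_for_item` would wrap a call such as xQueueReceive() with a finite tick count, and `feed_watchdog` would wrap esp_task_wdt_reset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hooks for one loop iteration (hypothetical names, not an ESP-IDF API). */
typedef struct {
    bool (*wait_for_item)(uint32_t timeout_ms, int *item); /* false = timed out */
    bool (*process_item)(int item);                        /* false = work failed */
    void (*feed_watchdog)(void);
} cycle_hooks_t;

/* Run one iteration. Returns true if the watchdog was fed. */
bool run_one_cycle(const cycle_hooks_t *hooks, uint32_t wait_timeout_ms)
{
    assert(hooks != NULL);
    assert(wait_timeout_ms > 0);  /* never block forever */

    int item = 0;
    if (hooks->wait_for_item(wait_timeout_ms, &item)) {
        if (!hooks->process_item(item)) {
            return false;  /* work failed: withhold the feed */
        }
    }
    /* Reached after successful work, or after a wait that merely timed out. */
    hooks->feed_watchdog();
    return true;
}

/* Host-side stand-ins so the sketch can be tried without hardware. */
int g_feed_count = 0;
bool g_item_available = false;
bool g_work_succeeds = true;

bool stub_wait(uint32_t timeout_ms, int *item)
{
    (void)timeout_ms;
    *item = 42;
    return g_item_available;
}

bool stub_process(int item)
{
    (void)item;
    return g_work_succeeds;
}

void stub_feed(void)
{
    g_feed_count++;
}
```

Feeding after a wait that simply timed out is a design choice: it treats "no input yet" as healthy. If a long silence from the data source is itself a fault in your application, count consecutive timeouts and stop feeding once they exceed a limit.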

Exercises

  1. Monitor a Second Task: Modify the first example to create a second task. Subscribe both tasks to the TWDT. Make sure both tasks feed the watchdog correctly and run without causing a reset.
  2. Conditional Stall: Modify the second example. Introduce a new variable, simulate_stall, initialized to false. Add a GPIO input to your project. If the GPIO button is pressed, set simulate_stall to true, causing the task to stop feeding the watchdog and trigger a reset. This simulates an external event causing a software failure.
  3. Dynamic Watchdog Management: In the first example, after the monitored_task has fed the watchdog 10 times, make it call esp_task_wdt_delete(NULL) to unsubscribe itself from the TWDT. Log a message confirming that it is no longer being watched and continue the loop. Verify that the system no longer resets even if the task stops calling esp_task_wdt_reset().

Summary

  • Watchdogs are Essential: They provide a critical safety net to automatically recover a system from software freezes and stalls.
  • ESP-IDF has Two Main Watchdogs: The Interrupt Watchdog (IWDT) protects against stalled CPUs/interrupts, while the Task Watchdog (TWDT) protects against stalled application tasks.
  • TWDT requires a 3-Step Process: 1. Initialize the watchdog (esp_task_wdt_init). 2. Subscribe a task (esp_task_wdt_add). 3. Periodically feed from within the task (esp_task_wdt_reset).
  • Feeding is a Signal of Health: A task should only “feed” the watchdog after successfully completing a meaningful unit of work.
  • Panics Provide Clues: A watchdog panic provides invaluable debug information, identifying the exact task that failed, which is crucial for fixing the underlying bug.
  • Implementation is Consistent: The watchdog APIs and behavior are consistent across all modern ESP32 variants.
