Chapter 279: Runtime Performance Analysis

Chapter ObjectivesIn

By the end of this chapter, you will be able to:

Understand the principles and importance of runtime performance analysis (profiling).
Configure and use FreeRTOS runtime statistics to measure CPU usage per task.
Implement a high-resolution timer required for generating runtime statistics.
Use ESP-IDF‘s Application Level Tracing (apptrace) feature to log system events.
Analyze trace data to identify performance bottlenecks and measure execution times.
Apply profiling techniques to optimize your embedded applications.

Introduction

In the preceding chapters, we established a robust foundation for code quality. Unit testing validates the correctness of our logic in isolation, while static analysis finds potential bugs before the code is ever run. However, these methods do not tell us how our software behaves in a dynamic, real-world environment. How much CPU time does the Wi-Fi task consume? Is our custom algorithm a performance bottleneck? Is a critical task getting starved of CPU time?

Answering these questions is the domain of Runtime Performance Analysis, also known as profiling. This is the process of measuring and analyzing an application’s behavior while it is running on the target hardware. Profiling is essential for optimizing performance, meeting real-time deadlines, and debugging complex timing-related issues that are invisible to static tools.

In this chapter, we will explore the powerful profiling tools integrated into ESP-IDF, empowering you to move beyond simply writing correct code to writing efficient and performant code.

Theory

What is Runtime Performance Analysis?

Runtime analysis involves collecting data about a program’s execution, such as which functions are being called, how long they take to run, and how CPU resources are distributed among different tasks. It is a form of dynamic analysis that provides a macroscopic view of the system’s health and efficiency.

Imagine a chef managing a busy kitchen. They can have perfect recipes (correct code) and a clean, organized workspace (good architecture), but to find out why orders are slow, they must observe the kitchen in action (runtime analysis). They might discover one cook is overworked (a bottleneck task), or a specific preparation step is taking too long (an inefficient function).

The primary goals of profiling are:

Bottleneck Identification: Finding which tasks or functions are consuming the most CPU time. The 80/20 rule often applies: 80% of the CPU time is spent in 20% of the code. Profiling helps you find that 20%.
Performance Validation: Verifying that the system meets its timing requirements (e.g., a motor control loop must execute every 10 milliseconds).
Optimization: Providing concrete data to guide optimization efforts, ensuring you focus on the parts of the code that will yield the greatest performance gains.

Method 1: FreeRTOS Runtime Statistics

One of the simplest yet most effective profiling tools available is built directly into FreeRTOS. When enabled, the scheduler can track the amount of time each task spends in the “Running” state. This data provides a clear percentage breakdown of CPU utilization per task.

To make this work, FreeRTOS needs two things:

Configuration: You must enable CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS in menuconfig. This compiles the necessary statistics-gathering code into the kernel.
A High-Resolution Timer: The scheduler needs a fast-ticking clock to accurately measure the short time slices that tasks execute for. The standard FreeRTOS tick timer (typically 1ms) is too coarse. We must provide a timer with a frequency at least 10-100 times higher than the system tick rate. On the ESP32, we can easily use one of its built-in hardware timers for this purpose.

pie title CPU Utilization Analysis
    "Task A" : 50
    "Task B" : 30  
    "Task C" : 20

Method 2: Application Level Tracing (apptrace)

While runtime stats give you a great high-level view of task utilization, they don’t tell you what’s happening inside the tasks. For a more granular view, ESP-IDF provides the Application Level Tracing library.

This feature allows you to log events from your firmware to a buffer on the ESP32. A host-side tool can then read this buffer in real-time and save the trace data to a file. The key advantage is its low overhead; writing an event to the buffer is very fast and has minimal impact on the application’s timing.

After collecting the trace, you can process it offline to analyze the sequence of events, measure the time between them, and understand the flow of your program in detail. This is invaluable for debugging complex interactions between tasks, interrupts, and drivers.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans'}}}%%
graph TD
    subgraph "On ESP32 Device"
        A[<b>Start:</b> App Execution] --> B{Event Occurs};
        B --> C["Call <b>esp_trace_write()</b><br>e.g., at function start/end"];
        C --> D[("Trace data written<br>to internal RAM buffer")];
    end

    subgraph "On Host PC"
        E[Run <b>idf.py apptrace</b>] --> F{Connects to ESP32};
        F --> G["Continuously reads<br>trace buffer data"];
        G --> H["Saves data to<br><b>trace.log</b> and <b>trace.elf</b>"];
    end

    subgraph "Offline Analysis"
        I["Stop Tracing (Ctrl+C)"] --> J["Process data:<br><b>idf.py apptrace process trace.log</b>"];
        J --> K{Analysis};
        K --> L["Review human-readable<br>trace output"];
        K --> M["Measure time between events"];
        K --> N["Identify bottlenecks or<br>incorrect logic flow"];
        N --> O[<b>End:</b> Optimization Complete];
    end

    D --> G;

    %% Styling
    classDef startNode fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef processNode fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef decisionNode fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef checkNode fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef endNode fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;

    class A,E,I startNode;
    class C,G,H,J,L,M,N processNode;
    class B,F,K decisionNode;
    class D checkNode;
    class O endNode;

Feature	FreeRTOS Runtime Statistics	Application Level Tracing (apptrace)
Primary Goal	To measure the percentage of CPU time used by each task.	To log a detailed sequence of system events and measure time between them.
Granularity	High-level (per task). Shows “who” is using the CPU.	Low-level (per event). Shows “what” is happening and “when”.
Overhead	Very low. Requires a high-resolution timer.	Low, but higher than runtime stats. Involves writing to a memory buffer.
Use Case	Identifying CPU-hungry tasks (bottlenecks) and checking overall system load.	Debugging complex interactions, measuring function execution time, and tracing program flow.
Setup	Enable `CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS` and provide a timer function.	Enable `CONFIG_APPTRACE_ENABLE` and use `idf.py apptrace` tools.
Output	A text table showing task names, absolute time spent running, and CPU % usage.	A trace file that can be processed to show a timeline of events and their data.

Practical Example 1: Profiling with FreeRTOS Runtime Statistics

Let’s create an application with a few tasks and use the runtime statistics feature to see how they share the CPU.

Step 1: Enable Runtime Statistics in `menuconfig`

Run idf.py menuconfig.
Navigate to Component config -> FreeRTOS -> Kernel.
Enable ([*]) Enable FreeRTOS trace facility.
Enable ([*]) Enable generation of run time stats.
Save and exit.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans'}}}%%
graph TD
    subgraph "Development Phase"
        A[Start: ESP-IDF Project] --> B{Enable in menuconfig};
        B --> C[<b>CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS</b>];
        C --> D["Implement High-Res Timer<br><b>ulGetRunTimeCounterValue()</b><br><b>vConfigureTimerForRunTimeStats()</b>"];
        D --> E[Add Tasks to Application<br>e.g., heavy_task, light_task];
        E --> F["Create Monitoring Task<br>Calls <b>vTaskGetRunTimeStats()</b> periodically"];
    end

    subgraph "Execution & Analysis Phase"
        G[Build & Flash to ESP32] --> H[Run & Monitor Serial Output];
        H --> I{Analyze Stats Table};
        I --> J["Identify High-Usage Tasks<br>e.g., heavy_task: 93%"];
        J --> K["Identify Available CPU Time<br>e.g., IDLE0: 6%"];
        K --> L[Optimize Code Based on Data];
    end

    %% Styling
    classDef startNode fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6;
    classDef processNode fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF;
    classDef decisionNode fill:#FEF3C7,stroke:#D97706,stroke-width:1px,color:#92400E;
    classDef checkNode fill:#FEE2E2,stroke:#DC2626,stroke-width:1px,color:#991B1B;
    classDef endNode fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46;

    class A,C,D,E,F,G,H,J,K,L processNode;
    class B,I decisionNode;

Step 2: Implement the Main Application

Create a new project and replace the contents of main.c with the following code.

main/main.c

#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_timer.h"

// This function is required for FreeRTOS runtime stats
unsigned long ulGetRunTimeCounterValue(void)
{
    // esp_timer_get_time() returns time in microseconds since boot
    return (unsigned long)esp_timer_get_time();
}

// Configuration function for runtime stats
void vConfigureTimerForRunTimeStats(void)
{
    // No configuration needed for esp_timer, it's already running
    printf("Timer for runtime stats configured.\n");
}

// A task that performs some "heavy" computation (simulated)
void computationally_intensive_task(void *pvParameters)
{
    printf("Computationally intensive task started.\n");
    volatile unsigned long i; // volatile to prevent compiler optimization
    while (1) {
        for (i = 0; i < 2000000; i++);
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

// A task that performs some light work
void light_work_task(void *pvParameters)
{
    printf("Light work task started.\n");
    while (1) {
        // This task just delays, consuming very little CPU time
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

// A task to periodically print the runtime stats
void monitoring_task(void *pvParameters)
{
    printf("Monitoring task started.\n");
    const int buffer_size = 2048;
    char *stats_buffer = malloc(buffer_size);

    while (1) {
        vTaskDelay(pdMS_TO_TICKS(5000)); // Print stats every 5 seconds

        if (stats_buffer == NULL) {
            printf("Error: Could not allocate buffer for stats.\n");
            continue;
        }

        printf("\n--- Task Runtime Stats ---\n");
        vTaskGetRunTimeStats(stats_buffer);
        printf("%s\n", stats_buffer);
        printf("--------------------------\n");
    }
}

void app_main(void)
{
    xTaskCreate(&computationally_intensive_task, "heavy_task", 4096, NULL, 5, NULL);
    xTaskCreate(&light_work_task, "light_task", 2048, NULL, 5, NULL);
    xTaskCreate(&monitoring_task, "monitor_task", 4096, NULL, 3, NULL);
}

Warning: You MUST define vConfigureTimerForRunTimeStats and ulGetRunTimeCounterValue in your project when CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS is enabled. Our implementation uses the esp_timer, which is simple and accurate.

Step 3: Build, Flash, and Observe

Build and flash the project: idf.py flash.
Open the serial monitor: idf.py monitor.

After a few seconds, you will see the statistics table printed to the console.

Step 4: Interpret the Output

The output from the monitor will look similar to this:

--- Task Runtime Stats ---
Task Name       Abs Time      % Time
****************************************
heavy_task      4689711       93%
monitor_task    4532          <1%
IDLE0           304388        6%
Tmr Svc         0             <1%
light_task      27            <1%
esp_timer       20129         <1%
main            25            <1%
--------------------------

Task Name: The name assigned to the task at creation.
Abs Time: The total time in microseconds (from our timer) this task has spent in the Running state.
% Time: The percentage of total CPU time this task has consumed since the last stats printout.

As expected, heavy_task is consuming the vast majority of the CPU time (~93%), while light_task and other system tasks use very little. The IDLE0 task runs only when no other application task is ready to run, so its percentage indicates how much free CPU time is available.

Variant Notes

The methods for runtime analysis described are largely variant-independent.

FreeRTOS Runtime Stats: This is a core FreeRTOS feature. The implementation using esp_timer works consistently across all ESP32, ESP32-S2/S3, and ESP32-C/H series chips. The underlying hardware timer implementation is handled by the esp_timer component, abstracting any minor differences away from the user.
Application Level Tracing (apptrace): This feature is also part of the core ESP-IDF framework and is supported on all Xtensa and RISC-V based ESP32 variants. The commands and workflow are identical regardless of the target chip.

Common Mistakes & Troubleshooting Tips

Mistake / Issue	Symptom(s)	Troubleshooting / Solution
Timer Not Configured	Runtime stats from `vTaskGetRunTimeStats` are all 0 or do not update.	1. Define Required Functions: Ensure these two functions exist in your project: `unsigned long ulGetRunTimeCounterValue(void);` `void vConfigureTimerForRunTimeStats(void);` 2. Check Timer Return Value: The `ulGetRunTimeCounterValue` function must return a value that continuously increases at a high frequency (e.g., microseconds).
Stats Buffer Too Small	The output of `vTaskGetRunTimeStats` is cut off, garbled, or incomplete, especially with many active tasks.	Increase Buffer Size: In your monitoring task, increase the size of the character array passed to the function. Example: `char stats_buffer[2048];`
Mismatched ELF File	The `idf.py apptrace process` command fails with an error about an invalid ELF file or address not found.	Use the Correct ELF File: The `.elf` file used for processing must be the exact one that was flashed to the device when the trace was captured. Save the `.elf` file alongside its corresponding trace file.
Apptrace Buffer Overflow	The serial monitor shows repeated messages like `ESP_APPTRACE: rb overflow`.	1. Reduce Trace Frequency: Temporarily comment out some `esp_trace_write()` calls in high-frequency loops. 2. Increase Buffer Size: In menuconfig, navigate to `Component config -> Application Level Tracing` and increase the “Trace Buffer Size”.

Exercises

Profile a GPIO Interrupt: Write an application that toggles a GPIO in a task. Attach an interrupt to a different GPIO pin (connect the two pins with a wire). In the ISR, give a semaphore. Make a second task wait on this semaphore. Use runtime stats to measure the CPU load of the ISR-handling task.
Find a Bottleneck: Write a function that sorts an array using an inefficient algorithm like bubble sort. Create a task that repeatedly generates a random array and sorts it. Use the profiler to measure its CPU usage.
Optimize and Measure: Replace the bubble sort from the previous exercise with a more efficient algorithm (e.g., qsort from the standard library). Use the runtime profiler again to measure the new CPU usage and quantify the performance improvement.
Trace an Event with apptrace: Using the application from Exercise 2, enable apptrace. Add esp_trace_write calls at the beginning and end of your sort function. Use the idf.py apptrace tools to capture a trace and measure the exact time in microseconds your sort function takes to execute.

Summary

Runtime Performance Analysis (Profiling) is the act of measuring a program’s behavior while it runs on target hardware.
It is essential for finding bottlenecks, validating performance, and guiding optimization efforts.
FreeRTOS Runtime Statistics provide a high-level view of CPU utilization per task and require a high-resolution timer to function.
The vTaskGetRunTimeStats() function is used to retrieve and print the statistics.
Application Level Tracing (apptrace) offers a more granular, event-based view of your application’s execution flow.
Profiling tools and techniques are generally hardware-agnostic and work consistently across the ESP32 family of SoCs.

Chapter 279: Runtime Performance Analysis

Chapter ObjectivesIn

Introduction

Theory

What is Runtime Performance Analysis?

Method 1: FreeRTOS Runtime Statistics

Method 2: Application Level Tracing (apptrace)

Practical Example 1: Profiling with FreeRTOS Runtime Statistics

Step 1: Enable Runtime Statistics in `menuconfig`

Step 2: Implement the Main Application

Step 3: Build, Flash, and Observe

Step 4: Interpret the Output

Variant Notes

Common Mistakes & Troubleshooting Tips

Exercises

Summary

Further Reading

Leave a Comment Cancel Reply

Chapter 279: Runtime Performance Analysis

Chapter ObjectivesIn

Introduction

Theory

What is Runtime Performance Analysis?

Method 1: FreeRTOS Runtime Statistics

Method 2: Application Level Tracing (apptrace)

Practical Example 1: Profiling with FreeRTOS Runtime Statistics

Step 1: Enable Runtime Statistics in menuconfig

Step 2: Implement the Main Application

Step 3: Build, Flash, and Observe

Step 4: Interpret the Output

Variant Notes

Common Mistakes & Troubleshooting Tips

Exercises

Summary

Further Reading

Related Posts

Leave a Comment Cancel Reply

Step 1: Enable Runtime Statistics in `menuconfig`