Chapter 134: SPI DMA Transfers with ESP32

Chapter Objectives

After completing this chapter, you will be able to:

  • Understand the concept of Direct Memory Access (DMA) and its advantages for SPI communication.
  • Explain how DMA is utilized by ESP32’s SPI peripherals to offload the CPU.
  • Configure SPI buses in ESP-IDF to use DMA for data transfers.
  • Recognize the importance of buffer memory type and alignment for DMA operations.
  • Differentiate between DMA-driven and CPU-polled SPI transactions.
  • Implement high-throughput SPI communication using DMA.
  • Troubleshoot common issues related to SPI DMA transfers.
  • Appreciate the performance gains achieved by using DMA for SPI.

Introduction

In the preceding chapters, we’ve established how to use the Serial Peripheral Interface (SPI) for communicating with single and multiple devices. While effective, the methods discussed so far, particularly when relying heavily on CPU intervention for each byte transferred (polling), can become a bottleneck for high-speed or large-volume data exchanges. As data rates increase or the amount of data grows, the CPU might spend a significant portion of its time merely shuffling bytes to and from the SPI peripheral, leaving less capacity for other critical application tasks.

This is where Direct Memory Access (DMA) becomes invaluable. DMA allows peripherals, like the SPI controller, to transfer data directly to or from memory without constant CPU oversight. By offloading these data transfer tasks to the DMA controller, the CPU is freed to perform other computations or manage other peripherals, leading to more efficient and higher-performance embedded systems. This chapter will explore how to leverage DMA for SPI communication on ESP32 devices using the ESP-IDF, enabling faster and more efficient data handling.

Theory

What is DMA (Direct Memory Access)?

Direct Memory Access (DMA) is a feature of modern microcontrollers and computer systems that allows certain hardware subsystems (peripherals) to access main system memory (RAM) to read and/or write data independently of the Central Processing Unit (CPU).

Analogy: Imagine a busy office manager (the CPU) who needs to send and receive many packages (data) via a mailroom (the SPI peripheral).

  • Without DMA (CPU-polled/interrupt-driven): The manager has to personally carry each package to the mailroom, wait for outgoing packages to be sent, and personally pick up each incoming package. This takes up a lot of the manager’s time.
  • With DMA: The manager can instruct a dedicated courier service (the DMA controller) to handle the package transfers. The manager tells the courier where the packages are in storage (memory location) and how many there are. The courier then moves the packages between storage and the mailroom directly. The manager is free to do other important work and is only notified when the entire batch of packages has been sent or received.

Benefits of using DMA:

  1. CPU Offload: The CPU initiates the DMA transfer and can then perform other tasks while the DMA controller handles the data movement. This significantly reduces CPU load.
  2. Increased Throughput: DMA transfers are typically faster than CPU-mediated transfers for larger data blocks because the dedicated DMA hardware is optimized for this task.
  3. Lower Power Consumption: Since the CPU can be idle or in a lower power state during DMA operations, overall system power consumption can be reduced.
  4. Deterministic Transfers: DMA can provide more predictable data transfer times, as it’s less susceptible to CPU workload variations.

DMA in ESP32 SPI Peripherals

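ESP32-family microcontrollers include DMA hardware that can be linked to the SPI controllers: newer variants such as the ESP32-S3 and ESP32-C3 use a general-purpose DMA (GDMA) controller shared among peripherals, while the original ESP32 provides dedicated SPI DMA channels. When SPI DMA is enabled: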

  • For Transmission (MOSI): The CPU prepares a data buffer in memory. It then configures the SPI peripheral and its associated DMA channel to transfer a specified amount of data from this buffer to the SPI peripheral’s transmit FIFO (First-In, First-Out buffer). The DMA controller reads data from memory and writes it to the SPI TX FIFO as space becomes available. The SPI peripheral then clocks this data out onto the MOSI line.
  • For Reception (MISO): The CPU allocates a buffer in memory to store incoming data. It configures the SPI peripheral and DMA channel to transfer data from the SPI receive FIFO to this memory buffer. As data arrives on the MISO line and fills the SPI RX FIFO, the DMA controller reads it and writes it into the designated memory buffer.

[Figure: SPI DMA data flow. (1) The CPU configures the DMA and SPI; (2) the DMA reads from memory into the SPI TX FIFO for transmission; (3) the DMA writes from the SPI RX FIFO into memory for reception.]

The ESP-IDF spi_master driver abstracts the low-level details of DMA channel assignment and configuration, making it relatively straightforward to use.

ESP-IDF Configuration for SPI DMA

The primary way to enable DMA for an SPI bus in ESP-IDF is during the bus initialization phase using spi_bus_initialize(). The key setting is the dma_chan argument, which is passed directly to spi_bus_initialize() rather than stored in the spi_bus_config_t structure:

C
esp_err_t spi_bus_initialize(spi_host_device_t host_id,
                             const spi_bus_config_t *bus_config,
                             spi_dma_chan_t dma_chan);
  • dma_chan parameter: This specifies which DMA channel should be allocated for the SPI bus.
    • SPI_DMA_CH_AUTO: This is the highly recommended option. The driver will automatically find and allocate a free and suitable DMA channel for the specified SPI host. This avoids manual channel management and potential conflicts. If SPI_DMA_CH_AUTO is used and no DMA channel is available, the function will return an error.
    • SPI_DMA_DISABLED (or 0): This explicitly disables the use of DMA for this SPI bus. All transactions will be CPU-driven (data copied by the CPU to/from the SPI hardware buffer), and a single transaction is limited to the size of that buffer (64 bytes on most variants).
    • Specific channel numbers (e.g., 1 or 2, or an enum like SPI_DMA_CH1 if defined for the target): This allows for manual assignment of a DMA channel. This is an advanced option and generally not recommended unless you have a deep understanding of the ESP32’s DMA controller and are sure the chosen channel is available and appropriate for the SPI host. Incorrect manual assignment can lead to errors or conflicts.

max_transfer_sz in spi_bus_config_t:

This field in spi_bus_config_t specifies the maximum size, in bytes, of a single data transfer that the bus should be prepared to handle, particularly for DMA.

C
typedef struct {
    // ... other fields
    int max_transfer_sz;          ///< Maximum transfer size, in bytes. If 0, defaults to 4092 with DMA enabled in current ESP-IDF releases (older documentation states 4094).
    // ... other fields
} spi_bus_config_t;

Summary of SPI DMA configuration settings (setting, where it is set, description and relevance to DMA, common values / notes):

  • dma_chan (direct parameter of spi_bus_initialize(); not a field of spi_bus_config_t)
    • Description and relevance to DMA: Specifies the DMA channel to be used by the SPI bus. This is the primary setting to enable or disable DMA.
    • Common values / notes:
      • SPI_DMA_CH_AUTO: Recommended. The driver automatically allocates a suitable DMA channel. Fails if no channel is available.
      • SPI_DMA_DISABLED (or 0): Explicitly disables DMA. Transactions are CPU-driven and limited to the SPI hardware buffer size.
      • Specific channel (e.g., 1, 2): Advanced, manual assignment. Not generally recommended due to the risk of conflicts.
  • max_transfer_sz (field of the bus_config struct)
    • Description and relevance to DMA: Maximum size in bytes of a single transaction that the bus (and its DMA configuration) is prepared to handle. Affects how much DMA bookkeeping memory the driver allocates.
    • Common values / notes:
      • If 0, defaults to 4092 bytes with DMA enabled in current ESP-IDF releases (older documentation states 4094).
      • Should be at least as large as your largest single DMA transaction.
      • Setting it excessively large might waste DMA-capable memory if not needed.
      • A transaction longer than max_transfer_sz is rejected by the driver, so larger payloads must be split into several transactions by the application.
  • mosi_io_num, miso_io_num, sclk_io_num, etc. (fields of the bus_config struct)
    • Description and relevance to DMA: Standard SPI pin configurations. While not directly DMA settings, they define the bus that DMA will operate on.
    • Common values / notes: Must be correctly set for any SPI communication, regardless of DMA usage.

If you plan to send or receive large chunks of data in a single transaction using DMA, set max_transfer_sz to at least that size. The SPI master driver rejects a transaction whose length exceeds max_transfer_sz (typically with ESP_ERR_INVALID_ARG) rather than splitting it, so larger payloads must be divided into several transactions by the application. The default used when max_transfer_sz is 0 (4092 bytes with DMA enabled in current ESP-IDF releases; older documentation states 4094) is sufficient for many applications. The driver sizes its internal DMA bookkeeping from this value, so setting it excessively large without need consumes more memory.
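When a payload is larger than max_transfer_sz, the application has to cut it into pieces that each fit. The following is a minimal, host-runnable sketch of that bookkeeping. The type spi_chunk_t and the function plan_chunks() are hypothetical names invented for this illustration and are not part of ESP-IDF; on a real target, each chunk would become one spi_transaction_t.

```c
#include <stddef.h>

/* Hypothetical helper type: one slice of a larger payload. */
typedef struct {
    size_t offset;  /* byte offset of this chunk within the payload */
    size_t length;  /* chunk length in bytes (<= max_transfer_sz)   */
} spi_chunk_t;

/*
 * Split total_len bytes into chunks of at most max_transfer_sz bytes.
 * Writes up to max_chunks entries into out[] and returns the number of
 * chunks required. If the return value exceeds max_chunks, out[] was too
 * small and holds only the first max_chunks entries.
 * Returns 0 if max_transfer_sz is 0.
 */
size_t plan_chunks(size_t total_len, size_t max_transfer_sz,
                   spi_chunk_t *out, size_t max_chunks)
{
    if (max_transfer_sz == 0) {
        return 0;
    }
    size_t count = 0;
    for (size_t offset = 0; offset < total_len; offset += max_transfer_sz) {
        size_t remaining = total_len - offset;
        size_t length = remaining < max_transfer_sz ? remaining : max_transfer_sz;
        if (count < max_chunks) {
            out[count].offset = offset;
            out[count].length = length;
        }
        count++;
    }
    return count;
}
```

For a 10,000-byte payload and a limit of 4092 bytes, this yields three chunks of 4092, 4092, and 1816 bytes.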

Data Buffers for DMA

When using DMA, the memory buffers for transmission (tx_buffer) and reception (rx_buffer) in spi_transaction_t have specific requirements:

  • Memory Location and Capability:
    • DMA controllers can typically only access certain memory regions directly. On ESP32 devices, internal SRAM (Data RAM) is generally DMA-accessible.
    • To ensure a buffer is suitable for DMA, it’s best practice to allocate it using heap_caps_malloc() with the MALLOC_CAP_DMA flag. This guarantees the memory is placed in a DMA-capable region.
C
// Example: Allocate a DMA-capable buffer
uint8_t *my_tx_buffer = heap_caps_malloc(BUFFER_SIZE, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
if (my_tx_buffer == NULL) {
    // Handle allocation failure
}
// ... use buffer ...
heap_caps_free(my_tx_buffer); // Don't forget to free it
  • PSRAM: For ESP32 variants with PSRAM, direct DMA access to PSRAM by SPI peripherals can be limited or require specific configuration. Memory allocated with MALLOC_CAP_DMA is always DMA-capable; on variants where PSRAM is not DMA-accessible, this means the allocation comes from internal SRAM even when internal SRAM is scarce. If you specifically allocate in PSRAM and need DMA, the driver might internally use “bounce buffers” (temporary buffers in internal SRAM) to facilitate the DMA transfer, which adds overhead. ESP32-S3 has more capable DMA that can often access PSRAM directly. Always check the documentation for your specific ESP32 variant regarding PSRAM and DMA.
  • Buffer Alignment:
    • DMA transfers are often more efficient if data buffers are aligned to certain memory boundaries (e.g., 4-byte or 32-byte alignment). The MALLOC_CAP_DMA flag usually ensures sufficient alignment. If you are using statically allocated arrays or custom memory pools, you might need to manually ensure alignment (e.g., using __attribute__((aligned(4)))). The ESP-IDF SPI driver often handles minor misalignments with internal buffering, but optimal performance comes from properly aligned buffers.
  • Buffer Lifetime:
    • The data buffers provided for a DMA transaction must remain valid and unchanged for the entire duration of the DMA operation. This is especially critical for asynchronous (queued) transactions. If a buffer is allocated on the stack of a function, that function must not return (and thus deallocate the stack frame) until the DMA transaction using that buffer is fully completed. If the buffer is dynamically allocated, it must not be freed prematurely.

Summary of DMA buffer considerations (consideration, requirement / best practice, reasoning and impact):

  • Memory Location & Capability
    • Requirement / best practice: Buffers (tx_buffer, rx_buffer) must be in DMA-accessible memory. Allocate using heap_caps_malloc(size, MALLOC_CAP_DMA | MALLOC_CAP_8BIT).
    • Reasoning & impact: DMA controllers can only access specific memory regions (typically internal SRAM). MALLOC_CAP_DMA ensures this. Using non-DMA memory can lead to transfer failures or data corruption. PSRAM DMA access varies by ESP32 variant; the driver may use bounce buffers if direct access isn’t possible.
  • Buffer Alignment
    • Requirement / best practice: DMA transfers are more efficient with aligned buffers (e.g., 4-byte). MALLOC_CAP_DMA usually ensures sufficient alignment. For static arrays, use __attribute__((aligned(X))) if needed.
    • Reasoning & impact: Aligned access can speed up DMA operations and prevent hardware exceptions on some architectures. Misalignment might incur performance penalties or require the driver to use intermediate aligned buffers.
  • Buffer Lifetime & Validity
    • Requirement / best practice: Buffers must remain valid and unchanged for the entire duration of the DMA transaction. This is especially critical for asynchronous (queued) transfers.
    • Reasoning & impact: If a buffer is deallocated (e.g., stack variable goes out of scope, heap memory freed) or modified while DMA is active, it can lead to data corruption, system crashes (DMA accessing invalid memory), or unpredictable behavior.
  • Buffer Content (Transmit)
    • Requirement / best practice: The tx_buffer must contain the complete data to be sent before initiating the DMA transfer.
    • Reasoning & impact: DMA reads directly from this buffer. Any changes made after starting the transfer might not be reflected or could corrupt the ongoing transmission.
  • Buffer Size (Receive)
    • Requirement / best practice: The rx_buffer must be large enough to hold all expected incoming data.
    • Reasoning & impact: DMA writes directly into this buffer. Insufficient size will lead to buffer overflows, data corruption, and potential system instability.
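
The alignment requirement above can be checked before a buffer is handed to the driver. The sketch below is plain C11 that runs on any host. The helpers is_aligned() and round_up() are hypothetical names for this illustration, and the 4-byte figure is taken from the example alignment mentioned in this section rather than from a specific chip's datasheet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ptr is a multiple of alignment (alignment must be a power of two). */
bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Round len up to the next multiple of alignment (a power of two). */
size_t round_up(size_t len, size_t alignment)
{
    return (len + alignment - 1) & ~(alignment - 1);
}

/* Statically allocated buffer forced onto a 4-byte boundary (C11 _Alignas). */
_Alignas(4) uint8_t static_rx_buffer[64];
```

A check such as is_aligned(rx_buffer, 4) placed before queuing a transaction makes misaligned buffers visible early. On target, ESP-IDF also provides the DMA_ATTR macro in esp_attr.h, which, as far as we know, places a static buffer in internal RAM with word alignment; confirm its exact behavior in the documentation for your ESP-IDF version.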

Transaction Types and DMA

%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%%
sequenceDiagram
    participant App as Application (CPU)
    participant IDF as ESP-IDF SPI Driver
    participant DMA as DMA Controller
    participant SPI_HW as SPI Hardware

    App->>IDF: 1. Prepare `spi_transaction_t` <br>(tx/rx buffers)
    App->>IDF: 2. `spi_device_queue_trans<br>(handle, &t, ...)`
    IDF-->>App: 3. Returns <br>(e.g., ESP_OK if queued)
    Note over App: CPU is now free for other tasks
    App->>App: 4. Perform other application logic...
    
    activate IDF
    IDF->>DMA: 5. Configure DMA for transfer<br> (source, dest, size)
    IDF->>SPI_HW: 6. Configure SPI HW<br> (mode, speed from handle)
    deactivate IDF
    
    activate DMA
    Note over DMA, SPI_HW: DMA & <br>SPI HW work in parallel
    DMA->>SPI_HW: 7. Data from Memory to<br> SPI TX FIFO (for Tx)
    SPI_HW-->>SPI_HW: 8. SPI clocks data out<br> (MOSI) / in (MISO)
    SPI_HW->>DMA: 9. Data from SPI RX FIFO<br> to Memory (for Rx)
    deactivate DMA
    
    activate DMA #DarkSlateGray
    DMA->>IDF: 10. DMA Transfer Complete <br>(Interrupt)
    deactivate DMA
    
    activate IDF
    IDF->>IDF: 11. Mark transaction as complete
    deactivate IDF

    Note over App: Later...
    App->>IDF: 12. `spi_device_get_trans_result<br>(handle, &t_result, ...)`
    activate IDF
    alt Transaction Completed
        IDF-->>App: 13. Returns `t_result` (ESP_OK)
        App->>App: 14. Process received data<br> from `t_result->rx_buffer`
    else Transaction Pending (or Timeout)
        IDF-->>App: 13. Returns <br>(e.g., ESP_ERR_TIMEOUT or still pending)
    end
    deactivate IDF
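  • spi_device_polling_transmit(spi_device_handle_t handle, spi_transaction_t *trans_desc):
    • This function starts a transaction and then busy-waits for completion by polling the SPI hardware status, instead of sleeping until the completion interrupt.
    • “Polling” describes how completion is detected, not how the data is moved: on a bus initialized with DMA, the data is still transferred by DMA; on a bus initialized with SPI_DMA_DISABLED, the CPU copies it to and from the SPI hardware buffer.
    • Suitable for very short transactions where the overhead of queue handling and task switching might outweigh its benefits.
    • It’s a blocking call; the CPU is kept busy for the whole transfer, so it provides no CPU offload.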
  • spi_device_transmit(spi_device_handle_t handle, spi_transaction_t *trans_desc):
    • This function is a convenient wrapper around spi_device_queue_trans() and spi_device_get_trans_result().
    • It will use DMA if DMA is enabled on the bus (i.e., dma_chan was set to SPI_DMA_CH_AUTO or a specific channel during spi_bus_initialize()).
    • It behaves as a blocking call from the application’s perspective, but DMA handles the actual data movement in the background.
  • spi_device_queue_trans(spi_device_handle_t handle, spi_transaction_t *trans_desc, TickType_t ticks_to_wait):
    • This function queues an SPI transaction for execution.
    • If DMA is enabled, the transaction will be performed using DMA.
    • This is non-blocking (or can block for a specified timeout if the queue is full). The CPU can continue with other tasks after queuing the transaction.
  • spi_device_get_trans_result(spi_device_handle_t handle, spi_transaction_t **trans_desc, TickType_t ticks_to_wait):
    • This function is used to retrieve the result of a completed transaction that was previously queued using spi_device_queue_trans().
    • This is where the application synchronizes with the completion of the DMA transfer.

Summary of the ESP-IDF SPI transaction functions (function, DMA usage, behavior, typical use case):

  • spi_device_polling_transmit()
    • Uses DMA? Yes, if DMA is enabled on the bus; “polling” refers only to how completion is detected.
    • Behavior: Blocking. The CPU busy-waits on the hardware status until the transaction finishes.
    • Typical use case: Very short transactions where queue and interrupt overhead is undesirable. Simple, infrequent transfers.
  • spi_device_transmit()
    • Uses DMA? Yes, if DMA is enabled on the bus.
    • Behavior: Effectively blocking from the application's view. Internally queues the transaction and waits for completion. Uses DMA for the actual data movement if the bus is DMA-enabled.
    • Typical use case: Convenient for DMA-backed blocking transfers. Good for most general-purpose DMA transfers where simplicity is desired.
  • spi_device_queue_trans()
    • Uses DMA? Yes, if DMA is enabled on the bus.
    • Behavior: Non-blocking (or timed block if the queue is full). Queues the transaction for DMA execution. The CPU is freed after queuing.
    • Typical use case: High-performance applications requiring CPU offload. Allows the CPU to perform other tasks while the SPI DMA transfer occurs in parallel. Used with spi_device_get_trans_result().
  • spi_device_get_trans_result()
    • Uses DMA? N/A (manages DMA completion).
    • Behavior: Blocking. Waits for a previously queued (DMA) transaction to complete and retrieves its result.
    • Typical use case: Used in conjunction with spi_device_queue_trans() to synchronize and get the outcome of asynchronous DMA operations.

For leveraging the full power of DMA (especially CPU offload), the asynchronous pattern with spi_device_queue_trans and spi_device_get_trans_result is the most effective.

Practical Examples

Example 1: SPI Loopback with DMA Enabled

This example demonstrates a basic SPI loopback test (MOSI connected to MISO) with DMA enabled on the SPI bus.

Hardware Setup:

  • Connect the MOSI pin to the MISO pin on your ESP32 board.
  • (Refer to Chapter 132/133 for typical pin numbers like MOSI=23, MISO=19, SCLK=18, CS=5 on an ESP32 DevKitC).

Project Setup:

  1. Create/open an ESP-IDF project.
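  2. Ensure the components the example uses are listed in REQUIRES in main/CMakeLists.txt. In most ESP-IDF versions the SPI master driver is part of the driver component and logging is provided by the log component, for example:
     idf_component_register(SRCS "main.c" INCLUDE_DIRS "." REQUIRES driver log heap)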

Code (main/main.c):

C
#include <stdio.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "esp_heap_caps.h" // For heap_caps_malloc

static const char *TAG = "SPI_DMA_EXAMPLE";

#define SPI_HOST_ID      SPI2_HOST
#define PIN_NUM_MOSI     23
#define PIN_NUM_MISO     19 // Connect to MOSI for loopback
#define PIN_NUM_SCLK     18
#define PIN_NUM_CS       5

#define BUFFER_SIZE_BYTES 64

void app_main(void)
{
    esp_err_t ret;
    spi_device_handle_t spi_device;

    ESP_LOGI(TAG, "Initializing SPI bus with DMA...");

    spi_bus_config_t buscfg = {
        .mosi_io_num = PIN_NUM_MOSI,
        .miso_io_num = PIN_NUM_MISO,
        .sclk_io_num = PIN_NUM_SCLK,
        .quadwp_io_num = -1,
        .quadhd_io_num = -1,
        .max_transfer_sz = BUFFER_SIZE_BYTES + 4 // A bit larger than buffer for safety
    };

    // Initialize the SPI bus with DMA enabled (SPI_DMA_CH_AUTO)
    // The ESP-IDF driver will automatically select an available DMA channel.
    ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, SPI_DMA_CH_AUTO);
    ESP_ERROR_CHECK(ret);
    ESP_LOGI(TAG, "SPI bus initialized.");

    spi_device_interface_config_t devcfg = {
        .clock_speed_hz = 10 * 1000 * 1000, // 10 MHz
        .mode = 0,
        .spics_io_num = PIN_NUM_CS,
        .queue_size = 1 // Only one transaction in flight for this simple example
    };

    ret = spi_bus_add_device(SPI_HOST_ID, &devcfg, &spi_device);
    ESP_ERROR_CHECK(ret);
    ESP_LOGI(TAG, "SPI device added.");

    // Allocate DMA-capable memory for transmit and receive buffers
    uint8_t *tx_buffer = heap_caps_malloc(BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
    uint8_t *rx_buffer = heap_caps_malloc(BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);

    if (tx_buffer == NULL || rx_buffer == NULL) {
        ESP_LOGE(TAG, "Failed to allocate DMA buffers!");
        if(tx_buffer) heap_caps_free(tx_buffer);
        if(rx_buffer) heap_caps_free(rx_buffer);
        return;
    }
    ESP_LOGI(TAG, "DMA buffers allocated.");

    // Fill transmit buffer with some data
    for (int i = 0; i < BUFFER_SIZE_BYTES; i++) {
        tx_buffer[i] = i % 256;
    }
    memset(rx_buffer, 0xAA, BUFFER_SIZE_BYTES); // Pre-fill rx_buffer to see changes

    spi_transaction_t t;
    memset(&t, 0, sizeof(t));
    t.length = BUFFER_SIZE_BYTES * 8; // Length in bits
    t.tx_buffer = tx_buffer;
    t.rx_buffer = rx_buffer;

    ESP_LOGI(TAG, "Performing SPI transaction using spi_device_transmit (uses DMA if enabled)...");
    // spi_device_transmit will use DMA because the bus was initialized with DMA.
    ret = spi_device_transmit(spi_device, &t);
    ESP_ERROR_CHECK(ret);

    ESP_LOGI(TAG, "Transaction complete.");
    ESP_LOG_BUFFER_HEXDUMP(TAG, tx_buffer, 16, ESP_LOG_INFO); // Log first 16 bytes sent
    ESP_LOG_BUFFER_HEXDUMP(TAG, rx_buffer, 16, ESP_LOG_INFO); // Log first 16 bytes received

    // Verify loopback data
    if (memcmp(tx_buffer, rx_buffer, BUFFER_SIZE_BYTES) == 0) {
        ESP_LOGI(TAG, "Loopback successful! Sent and received data match.");
    } else {
        ESP_LOGE(TAG, "Loopback failed. Data mismatch.");
    }

    // Free DMA buffers
    heap_caps_free(tx_buffer);
    heap_caps_free(rx_buffer);
    ESP_LOGI(TAG, "DMA buffers freed.");

    // Optional: remove device and free bus
    // spi_bus_remove_device(spi_device);
    // spi_bus_free(SPI_HOST_ID);

    ESP_LOGI(TAG, "SPI DMA example finished.");
}

Build, Flash, and Observe:

  1. Build the project (Ctrl+E B).
  2. Flash it to your ESP32 (Ctrl+E F).
  3. Open the serial monitor (Ctrl+E M).

You should see logs indicating DMA buffer allocation, the transaction, and whether the loopback was successful. The key here is that spi_device_transmit utilized DMA because the bus was initialized with SPI_DMA_CH_AUTO.

Example 2: Comparing DMA vs. CPU-Polled (Conceptual Timing)

This example outlines how you might compare the performance. For accurate timing, you’d use esp_timer_get_time().

C
#include <stdio.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "esp_heap_caps.h"
#include "esp_timer.h" // For timing

static const char *TAG = "SPI_DMA_PERF_TEST";

#define SPI_HOST_ID      SPI2_HOST
#define PIN_NUM_MOSI     23
#define PIN_NUM_MISO     19 // Connect to MOSI for loopback
#define PIN_NUM_SCLK     18
#define PIN_NUM_CS       5

#define LARGE_BUFFER_SIZE_BYTES (4 * 1024) // 4KB
#define NUM_ITERATIONS 100

// Function to perform SPI transfer and measure time
void perform_spi_test(const char* test_name, spi_dma_chan_t dma_setting) {
    esp_err_t ret;
    spi_device_handle_t spi_device_handle; // Renamed to avoid conflict

    ESP_LOGI(TAG, "Starting test: %s", test_name);

    spi_bus_config_t buscfg = {
        .mosi_io_num = PIN_NUM_MOSI,
        .miso_io_num = PIN_NUM_MISO,
        .sclk_io_num = PIN_NUM_SCLK,
        .quadwp_io_num = -1,
        .quadhd_io_num = -1,
        .max_transfer_sz = LARGE_BUFFER_SIZE_BYTES + 16
    };

    ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, dma_setting);
    if (ret != ESP_OK) {
        ESP_LOGE(TAG, "Failed to initialize SPI bus for %s: %s", test_name, esp_err_to_name(ret));
        // Attempt to free bus if it was partially initialized by a previous failed test run
        spi_bus_free(SPI_HOST_ID); 
        // Re-attempt initialization (could be problematic if bus is stuck, but for test)
        ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, dma_setting);
        ESP_ERROR_CHECK(ret); // If it fails again, it will abort
    }

    spi_device_interface_config_t devcfg = {
        .clock_speed_hz = 20 * 1000 * 1000, // 20 MHz
        .mode = 0,
        .spics_io_num = PIN_NUM_CS,
        .queue_size = 1
    };
    ret = spi_bus_add_device(SPI_HOST_ID, &devcfg, &spi_device_handle);
    ESP_ERROR_CHECK(ret);

    uint8_t *tx_buffer = heap_caps_malloc(LARGE_BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
    uint8_t *rx_buffer = heap_caps_malloc(LARGE_BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
    if (!tx_buffer || !rx_buffer) {
        ESP_LOGE(TAG, "Failed to allocate DMA buffers for %s", test_name);
        goto cleanup;
    }
    for(int i=0; i<LARGE_BUFFER_SIZE_BYTES; i++) tx_buffer[i] = i;


    spi_transaction_t t;
    memset(&t, 0, sizeof(t));
    t.length = LARGE_BUFFER_SIZE_BYTES * 8;
    t.tx_buffer = tx_buffer;
    t.rx_buffer = rx_buffer; // For loopback

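    int64_t start_time, end_time;
    uint64_t total_duration = 0;

    // Without DMA, one transaction is limited to the SPI hardware buffer
    // (64 bytes on most variants), so the non-DMA test sends the buffer in
    // 64-byte chunks. With DMA, the whole buffer goes out in one transaction.
    const size_t chunk_bytes = (dma_setting == SPI_DMA_DISABLED) ? 64 : LARGE_BUFFER_SIZE_BYTES;

    ESP_LOGI(TAG, "Running %d iterations for %s...", NUM_ITERATIONS, test_name);
    for (int i = 0; i < NUM_ITERATIONS; i++) {
        start_time = esp_timer_get_time();
        for (size_t offset = 0; offset < LARGE_BUFFER_SIZE_BYTES; offset += chunk_bytes) {
            t.length = chunk_bytes * 8; // Length in bits
            t.tx_buffer = tx_buffer + offset;
            t.rx_buffer = rx_buffer + offset;
            if (dma_setting == SPI_DMA_DISABLED) {
                // No DMA: the CPU copies the data and busy-waits for completion
                ret = spi_device_polling_transmit(spi_device_handle, &t);
            } else {
                // DMA moves the data; the calling task blocks until completion
                ret = spi_device_transmit(spi_device_handle, &t);
            }
            ESP_ERROR_CHECK(ret);
        }
        end_time = esp_timer_get_time();
        total_duration += (end_time - start_time);
    }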

    ESP_LOGI(TAG, "%s: Transferred %d KB data %d times.", test_name, LARGE_BUFFER_SIZE_BYTES / 1024, NUM_ITERATIONS);
    ESP_LOGI(TAG, "%s: Average time per transaction: %.2f us", test_name, (float)total_duration / NUM_ITERATIONS);
    ESP_LOGI(TAG, "%s: Total time: %.2f ms", test_name, (float)total_duration / 1000.0);


cleanup:
    if(tx_buffer) heap_caps_free(tx_buffer);
    if(rx_buffer) heap_caps_free(rx_buffer);
    spi_bus_remove_device(spi_device_handle); // Remove device
    spi_bus_free(SPI_HOST_ID);                // Free bus
    ESP_LOGI(TAG, "Finished test: %s, bus freed.", test_name);
    vTaskDelay(pdMS_TO_TICKS(500)); // Delay to allow logs to print and bus to settle if needed
}

void app_main(void)
{
    // Test with DMA enabled
    perform_spi_test("DMA Enabled Test", SPI_DMA_CH_AUTO);

    // Test with DMA disabled (CPU Polling)
    perform_spi_test("DMA Disabled (CPU Polled) Test", SPI_DMA_DISABLED);

    ESP_LOGI(TAG, "All SPI performance tests finished.");
}

Observe:

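When you run this, you should typically observe that the “DMA Enabled Test” moves the large buffer in less total time than the “DMA Disabled (CPU Polled) Test”. With DMA, the whole buffer fits in one transaction. Without DMA, a single transaction is limited to the SPI hardware buffer (64 bytes on most variants), so the same data has to be sent as many small transactions, each paying its own setup overhead while the CPU copies the data. Keep in mind that elapsed time is only part of the picture: the larger benefit of DMA is that the CPU is free during the transfer, which this timing does not show.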

Caution: Re-initializing the SPI bus (spi_bus_initialize) repeatedly as done in this test function is generally not standard practice in a final application. An application would typically initialize the bus once. This structure is for isolated testing. Ensure spi_bus_free() is called correctly to allow re-initialization.
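
To judge the numbers from Example 2, it helps to compare them with the time the bytes need on the wire at the configured clock. The sketch below computes that lower bound and the effective throughput of a measurement. The helpers spi_wire_time_us() and throughput_mbit_s() are hypothetical names for this illustration, and the calculation assumes standard single-line SPI with no gaps between bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum time in microseconds needed to clock `bytes` bytes at `clock_hz`. */
double spi_wire_time_us(size_t bytes, uint32_t clock_hz)
{
    return (double)bytes * 8.0 * 1e6 / (double)clock_hz;
}

/* Effective throughput in Mbit/s for `bytes` bytes moved in `elapsed_us`.
 * Bits per microsecond is numerically equal to Mbit/s. */
double throughput_mbit_s(size_t bytes, double elapsed_us)
{
    return (double)bytes * 8.0 / elapsed_us;
}
```

For the 4 KB buffer at 20 MHz used in Example 2, the wire time is about 1638.4 µs. An average per-transfer time close to that value suggests the bus clock, rather than software overhead, is the limiting factor.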

Variant Notes

The general principles of SPI DMA apply across ESP32 variants, but there are nuances:

  • DMA Controller and Channels:
    • ESP32: Has two SPI DMA channels that are shared by the SPI controllers. SPI2 (HSPI) and SPI3 (VSPI) can be connected to these channels. SPI_DMA_CH_AUTO typically assigns channel 1 or 2.
    • ESP32-S2: Uses DMA dedicated to each SPI controller rather than a GDMA controller. SPI2 and SPI3 can use DMA.
    • ESP32-S3: Features a GDMA controller with more flexibility. SPI2 and SPI3 can use DMA.
    • ESP32-C3 / C6 / H2 (RISC-V based): Feature GDMA controllers. The general-purpose SPI controller (SPI2, often named FSPI or GPSPI) supports DMA.
    • The SPI_DMA_CH_AUTO setting in spi_bus_initialize is the best way to ensure correct DMA channel allocation compatible with the specific variant and current IDF version.
  • PSRAM and DMA Accessibility:
    • ESP32 (original): DMA access to external PSRAM by SPI peripherals is generally not direct. Bounce buffers in internal SRAM are typically used by the driver if PSRAM buffers are provided for DMA, incurring some overhead.
    • ESP32-S2: Similar limitations to original ESP32 regarding direct PSRAM DMA by SPI.
    • ESP32-S3: The GDMA on ESP32-S3 is more advanced and can directly access external PSRAM. This makes using PSRAM for large SPI DMA buffers more efficient.
    • Other variants (e.g., ESP32-C6 / H2): Whether external PSRAM is supported at all differs by chip, so check the specific variant’s TRM and ESP-IDF documentation. Generally, newer chips that support PSRAM tend to have better DMA capabilities with it.
    • Recommendation: Always use heap_caps_malloc(size, MALLOC_CAP_DMA | ...) for buffers intended for SPI DMA. This ensures the buffer is placed in memory that the SPI DMA can access (preferring internal SRAM if suitable, or PSRAM on S3 if configured and appropriate).
  • Maximum DMA Transfer Size: While max_transfer_sz is a software configuration, the underlying hardware DMA controllers have their own limits per DMA descriptor or block. The ESP-IDF driver manages these details by chaining several DMA descriptors for one transaction, up to max_transfer_sz. A transaction longer than max_transfer_sz is rejected, so the application must split it.
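
As an illustration of the per-descriptor limit, the sketch below estimates how many DMA descriptors one transaction needs. It assumes each descriptor covers at most 4092 bytes, which is the limit we understand current ESP-IDF releases to use on the ESP32 family. DMA_DESC_MAX_BYTES and dma_descriptors_needed() are hypothetical names for this example, so check the TRM of your variant for the actual figure.

```c
#include <stddef.h>

/* Assumed per-descriptor limit in bytes (an assumption, see the text above). */
#define DMA_DESC_MAX_BYTES 4092u

/* Number of descriptors needed to cover len bytes (ceiling division). */
size_t dma_descriptors_needed(size_t len)
{
    return (len + DMA_DESC_MAX_BYTES - 1) / DMA_DESC_MAX_BYTES;
}
```

Under that assumption, the 4 KB (4096-byte) buffer from Example 2 needs two descriptors, because it is four bytes larger than a single descriptor can cover.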

Tip: Always refer to the latest ESP-IDF documentation and the Technical Reference Manual (TRM) for your specific ESP32 variant for the most accurate details on DMA capabilities and configurations.

Common Mistakes & Troubleshooting Tips

Mistake / Issue Symptom(s) Troubleshooting / Solution
Using Non-DMA-Capable Buffers
  • Transaction fails with an error (e.g., ESP_ERR_INVALID_ARG).
  • Data corruption or garbage data received/transmitted.
  • System crash or Guru Meditation Error.
  • Works with spi_device_polling_transmit() but fails with DMA-enabled spi_device_transmit().
  • Allocate with MALLOC_CAP_DMA: Always allocate tx_buffer and rx_buffer using heap_caps_malloc(size, MALLOC_CAP_DMA | MALLOC_CAP_8BIT).
  • Check Buffer Origin: Ensure buffers are not in flash-mapped memory (e.g., string literals used directly) or PSRAM on older variants without proper driver handling (bounce buffers).
  • Verify Const Data: For transmit-only, const data must also reside in DMA-capable memory if pointed to by tx_buffer.
Buffer Deallocated or Modified Prematurely
  • Data corruption (DMA reads old/invalid data or writes to deallocated memory).
  • System crash, often at a later, unrelated time.
  • Works with blocking calls but fails with asynchronous spi_device_queue_trans().
  • Ensure Buffer Lifetime: For asynchronous transfers (spi_device_queue_trans()), the buffer must remain valid until spi_device_get_trans_result() confirms completion.
  • Stack vs. Heap: If using stack-allocated buffers, the function owning the stack must not return before the transaction completes. Heap-allocated buffers must not be free()‘d prematurely.
  • Data Integrity: Do not modify tx_buffer content after queuing the transaction until it’s done.
Expecting DMA with spi_device_polling_transmit()
  • CPU usage is unexpectedly high during SPI transfers.
  • Performance is not as good as expected for DMA.
  • Understand Function Roles: spi_device_polling_transmit() never uses DMA, it’s CPU-driven.
  • Use Correct Function for DMA: For DMA-backed transfers:
    • Use spi_device_transmit() for blocking DMA.
    • Use spi_device_queue_trans() / spi_device_get_trans_result() for non-blocking DMA.
DMA Channel Misconfiguration / Exhaustion
  • spi_bus_initialize() fails, possibly returning ESP_ERR_NOT_FOUND or ESP_ERR_NO_MEM if SPI_DMA_CH_AUTO is used and no channels are free.
  • Conflicts if manually assigning a DMA channel already in use.
  • Prefer SPI_DMA_CH_AUTO: This is the safest way to allocate DMA channels.
  • Check Return Codes: Always check the return value of spi_bus_initialize().
  • Resource Management: Be aware of how many SPI buses (and other peripherals) are using DMA. ESP32 variants have a limited number of DMA channels.
  • Free Bus Correctly: Ensure spi_bus_free() is called to release DMA channels when an SPI bus is no longer needed, allowing them to be reused.
Incorrect max_transfer_sz
  • A transaction longer than max_transfer_sz is rejected (the transmit or queue call returns ESP_ERR_INVALID_ARG); the driver does not split it automatically.
  • If left at 0, the default applies, so larger transactions fail until the value is raised.
  • If set extremely large without need, the driver allocates more DMA descriptors at bus initialization than necessary, which consumes extra memory.
  • Set Reasonably: Set max_transfer_sz in spi_bus_config_t to at least the size, in bytes, of the largest single transaction you expect to send via DMA. The default used when the field is 0 and DMA is enabled (4092 bytes in recent ESP-IDF releases) is often fine.
  • Understand the Limit: Within one transaction the driver chains as many DMA descriptors as needed, up to max_transfer_sz. Data beyond that limit must be split into several transactions by the application.
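An application can also keep every transaction within a chosen limit by splitting a large buffer itself. The following is a minimal sketch of that idea, not ESP-IDF code: `send_chunk_fn`, `send_in_chunks()`, and `count_bytes()` are hypothetical names, and the callback stands in for whatever actually sends one piece (on an ESP32 it would typically wrap spi_device_transmit()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: sends one chunk and returns 0 on success. */
typedef int (*send_chunk_fn)(const uint8_t *data, size_t len, void *ctx);

/* Sends `total` bytes from `buf` in pieces of at most `max_chunk` bytes.
 * Returns the number of chunks sent, or -1 on bad arguments or a failed chunk. */
int send_in_chunks(const uint8_t *buf, size_t total, size_t max_chunk,
                   send_chunk_fn send, void *ctx)
{
    if (buf == NULL || send == NULL || max_chunk == 0) {
        return -1;
    }

    int chunks = 0;
    size_t offset = 0;
    while (offset < total) {
        size_t len = total - offset;
        if (len > max_chunk) {
            len = max_chunk;          /* cap this piece at the limit */
        }
        if (send(buf + offset, len, ctx) != 0) {
            return -1;                /* stop at the first failure */
        }
        offset += len;
        chunks++;
    }
    return chunks;
}

/* Demo callback for host-side testing: only adds up the bytes it is given. */
int count_bytes(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

Each chunk is a separate transaction, so chip select is normally released between chunks. Whether the attached device tolerates that depends on its protocol, so check its datasheet before splitting a transfer this way.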
Forgetting to Free DMA Buffers
  • Memory leaks over time.
  • Eventual failure to allocate memory (heap_caps_malloc returns NULL).
  • Balance Allocations: Every heap_caps_malloc for a DMA buffer should have a corresponding heap_caps_free when the buffer is no longer needed (and after any DMA transaction using it has fully completed).
  • Track Buffer Lifecycles: Carefully manage the lifecycle of dynamically allocated DMA buffers.
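To catch an unbalanced allocation early during development, the allocator calls can be wrapped so that outstanding DMA buffers are counted. This is a minimal sketch that assumes single-task use: `dma_alloc_tracked()`, `dma_free_tracked()`, and `dma_outstanding()` are hypothetical helper names, and the fallback to `malloc()`/`free()` exists only so the logic can be exercised off-target.

```c
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#define DMA_ALLOC(n) heap_caps_malloc((n), MALLOC_CAP_DMA)
#define DMA_FREE(p)  heap_caps_free(p)
#else
#include <stdlib.h>   /* host build: the plain heap stands in for DMA memory */
#define DMA_ALLOC(n) malloc(n)
#define DMA_FREE(p)  free(p)
#endif

/* Number of tracked DMA buffers that have been allocated but not yet freed. */
static size_t s_outstanding = 0;

/* Allocates a DMA buffer and counts it if the allocation succeeded. */
void *dma_alloc_tracked(size_t n)
{
    void *p = DMA_ALLOC(n);
    if (p != NULL) {
        s_outstanding++;
    }
    return p;
}

/* Frees a tracked buffer; a NULL pointer is ignored and not counted. */
void dma_free_tracked(void *p)
{
    if (p != NULL) {
        DMA_FREE(p);
        s_outstanding--;
    }
}

/* Returns how many tracked buffers are still allocated. */
size_t dma_outstanding(void)
{
    return s_outstanding;
}
```

Logging `dma_outstanding()` at a quiet point in the application (for example, after a batch of transfers has completed) should give a stable number; a value that keeps growing points to a missing free. ESP-IDF's heap_caps_get_free_size(MALLOC_CAP_DMA) can be logged alongside it to watch the DMA-capable heap itself.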

Exercises

  1. DMA Buffer Placement Analysis:
    • Modify Example 1. Instead of heap_caps_malloc with MALLOC_CAP_DMA, try allocating the tx_buffer and rx_buffer in a few different ways:
      • As global static arrays: static uint8_t tx_buffer_static[BUFFER_SIZE_BYTES];
      • Using malloc() (standard C library malloc, which usually maps to heap_caps_malloc(size, MALLOC_CAP_DEFAULT)).
    • Observe if the DMA transfer still works correctly for each case.
    • Research and explain why heap_caps_malloc with MALLOC_CAP_DMA is the most robust approach for DMA buffers. (Hint: Consider where global static arrays are placed and the default heap capabilities).
  2. Asynchronous DMA with CPU Work:
    • Take Example 1 (SPI Loopback with DMA). Convert it to use spi_device_queue_trans() and spi_device_get_trans_result().
    • In the time between queuing the transaction and waiting for its result, implement a simple counter that increments and prints its value to the console rapidly (e.g., in a loop that runs for a fixed number of iterations or a short delay).
    • Observe how the counter continues to run while the SPI DMA transfer is presumably happening in the background. This demonstrates CPU offload.
  3. Investigating max_transfer_sz:
    • Using the DMA loopback setup from Example 1, try transferring a relatively large amount of data (e.g., 2048 bytes).
    • Experiment by setting max_transfer_sz in spi_bus_config_t to:
      • A value larger than your transaction size (e.g., 4096).
      • A value smaller than your transaction size (e.g., 512 bytes).
      • 0 (to use the default).
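    • Does the transfer still succeed in all cases? Check the return value of spi_device_transmit() each time; the driver rejects a transaction longer than max_transfer_sz with ESP_ERR_INVALID_ARG instead of splitting it.
    • For any case that fails, split the data into several transactions of at most max_transfer_sz bytes. Conceptually, why might many small transactions be slower than a few large ones for very large, continuous data streams? (Hint: consider the overhead of setting up each transaction versus letting one transaction chain several DMA descriptors.)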

Summary

  • DMA (Direct Memory Access) allows peripherals like SPI to transfer data to/from memory without continuous CPU intervention, freeing up the CPU for other tasks.
  • Using DMA for SPI can significantly improve data throughput and reduce CPU load, especially for large or high-speed transfers.
  • In ESP-IDF, DMA for an SPI bus is enabled by passing SPI_DMA_CH_AUTO (recommended) or a specific channel as the dma_chan argument of spi_bus_initialize().
  • Data buffers used in DMA transactions must be allocated in DMA-capable memory (use heap_caps_malloc with MALLOC_CAP_DMA) and must remain valid throughout the transaction.
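  • spi_device_transmit() uses DMA if enabled on the bus and blocks only the calling task while the interrupt-driven transfer runs. For non-blocking DMA, use spi_device_queue_trans() and spi_device_get_trans_result().
  • spi_device_polling_transmit() also uses DMA when it is enabled on the bus, but it busy-waits on the CPU until the transaction completes, so it does not free the CPU for other work.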
  • Different ESP32 variants have varying DMA capabilities, especially concerning PSRAM access; SPI_DMA_CH_AUTO and MALLOC_CAP_DMA help abstract these differences.

Further Reading

  • ESP-IDF SPI Master Driver Documentation
  • ESP-IDF Heap Memory Allocation
  • ESP32 Technical Reference Manual (TRM):
    • Consult the TRM for your specific ESP32 variant for detailed information on its DMA controller (GDMA on newer variants) and SPI peripheral hardware.

1 thought on “SPI DMA Transfers with ESP32”

  1. Hello!
    Example #2 won’t work if DMA is disabled, since the transaction will then be limited to SOC_SPI_MAXIMUM_BUFFER_SIZE = 64 bytes, and .max_transfer_sz is ignored. It’s more interesting to compare DMA modes with polling and interrupt.
