Chapter 134: SPI DMA Transfers with ESP32
Chapter Objectives
After completing this chapter, you will be able to:
- Understand the concept of Direct Memory Access (DMA) and its advantages for SPI communication.
- Explain how DMA is utilized by ESP32’s SPI peripherals to offload the CPU.
- Configure SPI buses in ESP-IDF to use DMA for data transfers.
- Recognize the importance of buffer memory type and alignment for DMA operations.
- Differentiate between DMA-driven and CPU-polled SPI transactions.
- Implement high-throughput SPI communication using DMA.
- Troubleshoot common issues related to SPI DMA transfers.
- Appreciate the performance gains achieved by using DMA for SPI.
Introduction
In the preceding chapters, we’ve established how to use the Serial Peripheral Interface (SPI) for communicating with single and multiple devices. While effective, the methods discussed so far, particularly when relying heavily on CPU intervention for each byte transferred (polling), can become a bottleneck for high-speed or large-volume data exchanges. As data rates increase or the amount of data grows, the CPU might spend a significant portion of its time merely shuffling bytes to and from the SPI peripheral, leaving less capacity for other critical application tasks.
This is where Direct Memory Access (DMA) becomes invaluable. DMA allows peripherals, like the SPI controller, to transfer data directly to or from memory without constant CPU oversight. By offloading these data transfer tasks to the DMA controller, the CPU is freed to perform other computations or manage other peripherals, leading to more efficient and higher-performance embedded systems. This chapter will explore how to leverage DMA for SPI communication on ESP32 devices using the ESP-IDF, enabling faster and more efficient data handling.
Theory
What is DMA (Direct Memory Access)?
Direct Memory Access (DMA) is a feature of modern microcontrollers and computer systems that allows certain hardware subsystems (peripherals) to access main system memory (RAM) to read and/or write data independently of the Central Processing Unit (CPU).
Analogy: Imagine a busy office manager (the CPU) who needs to send and receive many packages (data) via a mailroom (the SPI peripheral).
- Without DMA (CPU-polled/interrupt-driven): The manager has to personally carry each package to the mailroom, wait for outgoing packages to be sent, and personally pick up each incoming package. This takes up a lot of the manager’s time.
- With DMA: The manager can instruct a dedicated courier service (the DMA controller) to handle the package transfers. The manager tells the courier where the packages are in storage (memory location) and how many there are. The courier then moves the packages between storage and the mailroom directly. The manager is free to do other important work and is only notified when the entire batch of packages has been sent or received.
Benefits of using DMA:
- CPU Offload: The CPU initiates the DMA transfer and can then perform other tasks while the DMA controller handles the data movement. This significantly reduces CPU load.
- Increased Throughput: DMA transfers are typically faster than CPU-mediated transfers for larger data blocks because the dedicated DMA hardware is optimized for this task.
- Lower Power Consumption: Since the CPU can be idle or in a lower power state during DMA operations, overall system power consumption can be reduced.
- Deterministic Transfers: DMA can provide more predictable data transfer times, as it’s less susceptible to CPU workload variations.
DMA in ESP32 SPI Peripherals
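ESP32 family microcontrollers pair their SPI controllers with DMA hardware: the original ESP32 and the ESP32-S2 use DMA channels dedicated to SPI, while newer chips such as the ESP32-S3 and ESP32-C3 use a general-purpose DMA (GDMA) controller that can be linked with various peripherals. When SPI DMA is enabled: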
- For Transmission (MOSI): The CPU prepares a data buffer in memory. It then configures the SPI peripheral and its associated DMA channel to transfer a specified amount of data from this buffer to the SPI peripheral’s transmit FIFO (First-In, First-Out buffer). The DMA controller reads data from memory and writes it to the SPI TX FIFO as space becomes available. The SPI peripheral then clocks this data out onto the MOSI line.
- For Reception (MISO): The CPU allocates a buffer in memory to store incoming data. It configures the SPI peripheral and DMA channel to transfer data from the SPI receive FIFO to this memory buffer. As data arrives on the MISO line and fills the SPI RX FIFO, the DMA controller reads it and writes it into the designated memory buffer.
[Figure: the CPU configures the DMA and SPI peripherals; DMA reads from memory into the SPI TX FIFO for transmission and writes from the SPI RX FIFO into memory for reception.]
The ESP-IDF spi_master driver abstracts the low-level details of DMA channel assignment and configuration, making it relatively straightforward to use.
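The data path described above can be modeled on a host machine with two small FIFOs. This is only a conceptual sketch in plain C, not driver code: fifo_t, FIFO_DEPTH, and simulate_dma_loopback() are hypothetical names, the FIFO depth is arbitrary, and real hardware moves data concurrently rather than in a loop.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16 /* hypothetical depth for the model, not a hardware value */

typedef struct {
    uint8_t data[FIFO_DEPTH];
    size_t head;  /* index of the oldest byte */
    size_t count; /* number of bytes currently stored */
} fifo_t;

/* Returns 1 if the byte was stored, 0 if the FIFO is full. */
static int fifo_push(fifo_t *f, uint8_t byte)
{
    if (f->count == FIFO_DEPTH) {
        return 0;
    }
    f->data[(f->head + f->count) % FIFO_DEPTH] = byte;
    f->count++;
    return 1;
}

/* Returns 1 and writes *byte if a byte was available, 0 if the FIFO is empty. */
static int fifo_pop(fifo_t *f, uint8_t *byte)
{
    if (f->count == 0) {
        return 0;
    }
    *byte = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Models a MOSI-to-MISO loopback: memory -> TX FIFO -> wire -> RX FIFO -> memory. */
void simulate_dma_loopback(const uint8_t *tx_mem, uint8_t *rx_mem, size_t len)
{
    fifo_t tx_fifo = {0};
    fifo_t rx_fifo = {0};
    size_t fetched = 0; /* bytes copied from tx_mem into the TX FIFO */
    size_t stored = 0;  /* bytes copied from the RX FIFO into rx_mem */
    uint8_t byte;

    while (stored < len) {
        /* "DMA", TX side: copy from memory while the TX FIFO has space. */
        while (fetched < len && fifo_push(&tx_fifo, tx_mem[fetched])) {
            fetched++;
        }
        /* "SPI": shift one byte out on MOSI; the loopback wire returns it on MISO. */
        if (fifo_pop(&tx_fifo, &byte)) {
            fifo_push(&rx_fifo, byte);
        }
        /* "DMA", RX side: drain the RX FIFO into memory. */
        while (fifo_pop(&rx_fifo, &byte)) {
            rx_mem[stored++] = byte;
        }
    }
}
```

Note that the "CPU" in this model only supplies the two memory buffers and the length; every byte movement happens inside the loop that stands in for the DMA and SPI hardware.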
ESP-IDF Configuration for SPI DMA
The primary way to enable DMA for an SPI bus in ESP-IDF is during the bus initialization phase using spi_bus_initialize(). The key setting is dma_chan, which is passed as the third argument of the function rather than as a field of spi_bus_config_t:
esp_err_t spi_bus_initialize(spi_host_device_t host_id,
const spi_bus_config_t *bus_config,
spi_dma_chan_t dma_chan);
- dma_chan parameter: This specifies which DMA channel should be allocated for the SPI bus.
  - SPI_DMA_CH_AUTO: This is the highly recommended option. The driver will automatically find and allocate a free and suitable DMA channel for the specified SPI host. This avoids manual channel management and potential conflicts. If SPI_DMA_CH_AUTO is used and no DMA channel is available, the function will return an error.
  - SPI_DMA_DISABLED (or 0): This explicitly disables the use of DMA for this SPI bus. All transactions will be CPU-driven (data copied by the CPU to/from the SPI hardware buffer), and a single transaction is limited to the size of that buffer (SOC_SPI_MAXIMUM_BUFFER_SIZE, 64 bytes on ESP32).
  - Specific channel numbers (e.g., 1, 2, or an enum like SPI_DMA_CH1 if defined for the target): This allows for manual assignment of a DMA channel. This is an advanced option and generally not recommended unless you have a deep understanding of the ESP32’s DMA controller and are sure the chosen channel is available and appropriate for the SPI host. Incorrect manual assignment can lead to errors or conflicts.
max_transfer_sz in spi_bus_config_t:
This field in spi_bus_config_t specifies the maximum size, in bytes, of a single data transfer that the bus should be prepared to handle, particularly for DMA.
typedef struct {
// ... other fields
int max_transfer_sz; ///< Maximum transfer size, in bytes. Defaults to 4092 if 0 when DMA is enabled (older ESP-IDF releases documented 4094).
// ... other fields
} spi_bus_config_t;
| Field in spi_bus_config_t | Parameter for spi_bus_initialize() | Description & Relevance to DMA | Common Values / Notes |
|---|---|---|---|
| N/A (Direct Parameter) | dma_chan | Specifies the DMA channel to be used by the SPI bus. This is the primary setting to enable/disable DMA. | SPI_DMA_CH_AUTO (recommended), SPI_DMA_DISABLED, or a specific channel number. |
| max_transfer_sz | Part of bus_config struct | Maximum transfer size in bytes that the bus (and its DMA configuration) should be prepared to handle in a single transaction. Affects internal allocation for DMA. | 0 selects the driver default; otherwise set it to the largest transaction you expect. |
| mosi_io_num, miso_io_num, sclk_io_num, etc. | Part of bus_config struct | Standard SPI pin configurations. While not directly DMA settings, they define the bus that DMA will operate on. | Must be correctly set for any SPI communication, regardless of DMA usage. |
If you plan to send or receive large chunks of data in a single transaction using DMA, max_transfer_sz must be at least that large: the driver checks each transaction against this limit and rejects longer ones with an error (ESP_ERR_INVALID_ARG) instead of splitting them. Setting the field to 0 selects the driver default of roughly 4 KB (4092 bytes in recent ESP-IDF releases, 4094 in older documentation), which is sufficient for many applications. The driver sizes its internal DMA resources from this value, so setting it far larger than needed consumes memory for no benefit.
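To make the memory remark concrete, here is a rough host-side sketch of how the number of DMA descriptors grows with the transfer size. It assumes each descriptor can cover at most 4092 bytes; DMA_DESC_MAX_BYTES and dma_descriptors_needed() are illustrative names, not ESP-IDF identifiers, and the real per-descriptor limit should be checked against the TRM for your chip.

```c
#include <stddef.h>

/* Assumed per-descriptor payload limit in bytes; verify against your chip's TRM. */
#define DMA_DESC_MAX_BYTES 4092u

/* Number of descriptors needed to cover transfer_bytes (rounded up, minimum 1). */
size_t dma_descriptors_needed(size_t transfer_bytes)
{
    size_t n = (transfer_bytes + DMA_DESC_MAX_BYTES - 1) / DMA_DESC_MAX_BYTES;
    return n == 0 ? 1 : n;
}
```

Under that assumption, any limit up to 4092 bytes needs a single descriptor, while a 16 KB limit needs five, each of which occupies a little internal memory for as long as the bus stays initialized.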
Data Buffers for DMA
When using DMA, the memory buffers for transmission (tx_buffer) and reception (rx_buffer) in spi_transaction_t have specific requirements:
- Memory Location and Capability:
  - DMA controllers can typically only access certain memory regions directly. On ESP32 devices, internal SRAM (Data RAM) is generally DMA-accessible.
  - To ensure a buffer is suitable for DMA, it’s best practice to allocate it using heap_caps_malloc() with the MALLOC_CAP_DMA flag. This guarantees the memory is placed in a DMA-capable region.
// Example: Allocate a DMA-capable buffer
uint8_t *my_tx_buffer = heap_caps_malloc(BUFFER_SIZE, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
if (my_tx_buffer == NULL) {
// Handle allocation failure
}
// ... use buffer ...
heap_caps_free(my_tx_buffer); // Don't forget to free it
  - PSRAM: For ESP32 variants with PSRAM, direct DMA access to PSRAM by SPI peripherals can be limited or require specific configurations. On chips where PSRAM is not DMA-capable, MALLOC_CAP_DMA allocations come from internal SRAM only, so they can fail when internal SRAM is scarce even if PSRAM is free. If you specifically allocate in PSRAM and need DMA, the driver might internally use “bounce buffers” (temporary buffers in internal SRAM) to facilitate the DMA transfer, which adds overhead. ESP32-S3 has more capable DMA that can often access PSRAM directly. Always check the documentation for your specific ESP32 variant regarding PSRAM and DMA.
- Buffer Alignment:
  - DMA transfers are often more efficient if data buffers are aligned to certain memory boundaries (e.g., 4-byte or 32-byte alignment). The MALLOC_CAP_DMA flag usually ensures sufficient alignment. If you are using statically allocated arrays or custom memory pools, you might need to manually ensure alignment (e.g., using __attribute__((aligned(4)))). The ESP-IDF SPI driver often handles minor misalignments with internal buffering, but optimal performance comes from properly aligned buffers.
- Buffer Lifetime:
  - The data buffers provided for a DMA transaction must remain valid and unchanged for the entire duration of the DMA operation. This is especially critical for asynchronous (queued) transactions. If a buffer is allocated on the stack of a function, that function must not return (and thus deallocate the stack frame) until the DMA transaction using that buffer is fully completed. If the buffer is dynamically allocated, it must not be freed prematurely.
| Consideration | Requirement / Best Practice | Reasoning & Impact |
|---|---|---|
| Memory Location & Capability | Buffers (tx_buffer, rx_buffer) must be in DMA-accessible memory. | DMA controllers can only access specific memory regions (typically internal SRAM). MALLOC_CAP_DMA ensures this. Using non-DMA memory can lead to transfer failures or data corruption. PSRAM DMA access varies by ESP32 variant; driver may use bounce buffers if direct access isn’t possible. |
| Buffer Alignment | DMA transfers are more efficient with aligned buffers (e.g., 4-byte). | Aligned access can speed up DMA operations and prevent hardware exceptions on some architectures. Misalignment might incur performance penalties or require driver to use intermediate aligned buffers. |
| Buffer Lifetime & Validity | Buffers must remain valid and unchanged for the entire duration of the DMA transaction. | If a buffer is deallocated (e.g., stack variable goes out of scope, heap memory freed) or modified while DMA is active, the DMA controller may transmit stale or partly updated data, or write received data into memory that is now used for something else. |
| Buffer Content (Transmit) | The tx_buffer must contain the complete data to be sent before initiating the DMA transfer. | DMA reads directly from this buffer. Any changes made after starting the transfer might not be reflected or could corrupt the ongoing transmission. |
| Buffer Size (Receive) | The rx_buffer must be large enough to hold all expected incoming data. | DMA writes directly into this buffer. Insufficient size will lead to buffer overflows, data corruption, and potential system instability. |
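The alignment rows above can be checked with a few lines of portable C. This is a minimal sketch that assumes 4-byte (word) alignment is the requirement of interest; DMA_ALIGN_BYTES, is_dma_aligned(), dma_padded_len(), and frame_buf are hypothetical names chosen for illustration, and some chips or cache configurations may call for a larger alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment requirement in bytes (one 32-bit word). */
#define DMA_ALIGN_BYTES 4u

/* True when the buffer address is a multiple of DMA_ALIGN_BYTES. */
bool is_dma_aligned(const void *buf)
{
    return ((uintptr_t)buf % DMA_ALIGN_BYTES) == 0;
}

/* Round a length up to the next multiple of DMA_ALIGN_BYTES. */
size_t dma_padded_len(size_t len)
{
    return (len + DMA_ALIGN_BYTES - 1) & ~(size_t)(DMA_ALIGN_BYTES - 1);
}

/* Statically allocated buffer forced onto a 4-byte boundary (GCC/Clang syntax). */
uint8_t frame_buf[64] __attribute__((aligned(4)));
```

A check like is_dma_aligned(frame_buf) is cheap enough to run once at start-up for any static or pooled buffer. It only verifies alignment; whether the memory region itself is DMA-capable still depends on where the buffer was placed.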
Transaction Types and DMA
%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%%
sequenceDiagram
participant App as Application (CPU)
participant IDF as ESP-IDF SPI Driver
participant DMA as DMA Controller
participant SPI_HW as SPI Hardware
App->>IDF: 1. Prepare `spi_transaction_t` <br>(tx/rx buffers)
App->>IDF: 2. `spi_device_queue_trans<br>(handle, &t, ...)`
IDF-->>App: 3. Returns <br>(e.g., ESP_OK if queued)
Note over App: CPU is now free for other tasks
App->>App: 4. Perform other application logic...
activate IDF
IDF->>DMA: 5. Configure DMA for transfer<br> (source, dest, size)
IDF->>SPI_HW: 6. Configure SPI HW<br> (mode, speed from handle)
deactivate IDF
activate DMA
Note over DMA, SPI_HW: DMA & <br>SPI HW work in parallel
DMA->>SPI_HW: 7. Data from Memory to<br> SPI TX FIFO (for Tx)
SPI_HW-->>SPI_HW: 8. SPI clocks data out<br> (MOSI) / in (MISO)
SPI_HW->>DMA: 9. Data from SPI RX FIFO<br> to Memory (for Rx)
deactivate DMA
activate DMA #DarkSlateGray
DMA->>IDF: 10. DMA Transfer Complete <br>(Interrupt)
deactivate DMA
activate IDF
IDF->>IDF: 11. Mark transaction as complete
deactivate IDF
Note over App: Later...
App->>IDF: 12. `spi_device_get_trans_result<br>(handle, &t_result, ...)`
activate IDF
alt Transaction Completed
IDF-->>App: 13. Returns `t_result` (ESP_OK)
App->>App: 14. Process received data<br> from `t_result->rx_buffer`
else Transaction Pending (or Timeout)
IDF-->>App: 13. Returns <br>(e.g., ESP_ERR_TIMEOUT or still pending)
end
deactivate IDF
- spi_device_polling_transmit(spi_device_handle_t handle, spi_transaction_t *trans_desc):
  - This function runs the transaction in polling mode: the calling task busy-waits on the SPI hardware status until the transfer finishes, instead of sleeping until a completion interrupt.
  - “Polling” describes how completion is detected, not how the bytes are moved. If DMA is enabled on the bus, the data still travels by DMA; if DMA is disabled, the CPU copies data to and from the SPI hardware buffer.
  - Suitable for very short transactions where the overhead of queueing and a task switch might outweigh its benefits.
  - It’s a blocking call; the CPU is busy for the whole transfer, so it provides no CPU offload.
- spi_device_transmit(spi_device_handle_t handle, spi_transaction_t *trans_desc):
  - This function is a convenient wrapper around spi_device_queue_trans() and spi_device_get_trans_result().
  - It will use DMA if DMA is enabled on the bus (i.e., dma_chan was set to SPI_DMA_CH_AUTO or a specific channel during spi_bus_initialize()).
  - It behaves as a blocking call from the application’s perspective, but DMA handles the actual data movement in the background and other tasks can run while the caller waits.
- spi_device_queue_trans(spi_device_handle_t handle, spi_transaction_t *trans_desc, TickType_t ticks_to_wait):
  - This function queues an SPI transaction for execution.
  - If DMA is enabled, the transaction will be performed using DMA.
  - This is non-blocking (or can block for a specified timeout if the queue is full). The CPU can continue with other tasks after queuing the transaction.
- spi_device_get_trans_result(spi_device_handle_t handle, spi_transaction_t **trans_desc, TickType_t ticks_to_wait):
  - This function is used to retrieve the result of a completed transaction that was previously queued using spi_device_queue_trans().
  - This is where the application synchronizes with the completion of the DMA transfer.
| ESP-IDF SPI Function | Uses DMA? | Behavior | Typical Use Case |
|---|---|---|---|
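| spi_device_polling_transmit() | Yes (if DMA enabled on bus) | Blocking. The calling task busy-waits on the SPI hardware status until the transfer finishes; no queueing or interrupt wait, and no CPU offload. | Very short transactions where queueing and task-switch overhead is undesirable. Simple, infrequent transfers. |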
| spi_device_transmit() | Yes (if DMA enabled on bus) | Effectively blocking from application view. Internally queues transaction and waits for completion. Uses DMA for actual data movement if bus is DMA-enabled. | Convenient for DMA-backed blocking transfers. Good for most general-purpose DMA transfers where simplicity is desired. |
| spi_device_queue_trans() | Yes (if DMA enabled on bus) | Non-blocking (or timed block if queue full). Queues transaction for DMA execution. CPU is freed after queuing. | High-performance applications requiring CPU offload. Allows CPU to perform other tasks while SPI DMA transfer occurs in parallel. Used with spi_device_get_trans_result(). |
| spi_device_get_trans_result() | N/A (manages DMA completion) | Blocking. Waits for a previously queued (DMA) transaction to complete and retrieves its result. | Used in conjunction with spi_device_queue_trans() to synchronize and get the outcome of asynchronous DMA operations. |
For leveraging the full power of DMA (especially CPU offload), the asynchronous pattern with spi_device_queue_trans and spi_device_get_trans_result is the most effective.
Practical Examples
Example 1: SPI Loopback with DMA Enabled
This example demonstrates a basic SPI loopback test (MOSI connected to MISO) with DMA enabled on the SPI bus.
Hardware Setup:
- Connect the MOSI pin to the MISO pin on your ESP32 board.
- (Refer to Chapter 132/133 for typical pin numbers like MOSI=23, MISO=19, SCLK=18, CS=5 on an ESP32 DevKitC).
Project Setup:
- Create/open an ESP-IDF project.
- Ensure driver (the component that provides driver/spi_master.h), log, and heap are in REQUIRES in main/CMakeLists.txt: idf_component_register(SRCS "main.c" INCLUDE_DIRS "." REQUIRES driver log heap)
Code (main/main.c):
#include <stdio.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "esp_heap_caps.h" // For heap_caps_malloc
static const char *TAG = "SPI_DMA_EXAMPLE";
#define SPI_HOST_ID SPI2_HOST
#define PIN_NUM_MOSI 23
#define PIN_NUM_MISO 19 // Connect to MOSI for loopback
#define PIN_NUM_SCLK 18
#define PIN_NUM_CS 5
#define BUFFER_SIZE_BYTES 64
void app_main(void)
{
esp_err_t ret;
spi_device_handle_t spi_device;
ESP_LOGI(TAG, "Initializing SPI bus with DMA...");
spi_bus_config_t buscfg = {
.mosi_io_num = PIN_NUM_MOSI,
.miso_io_num = PIN_NUM_MISO,
.sclk_io_num = PIN_NUM_SCLK,
.quadwp_io_num = -1,
.quadhd_io_num = -1,
.max_transfer_sz = BUFFER_SIZE_BYTES + 4 // A bit larger than buffer for safety
};
// Initialize the SPI bus with DMA enabled (SPI_DMA_CH_AUTO)
// The ESP-IDF driver will automatically select an available DMA channel.
ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, SPI_DMA_CH_AUTO);
ESP_ERROR_CHECK(ret);
ESP_LOGI(TAG, "SPI bus initialized.");
spi_device_interface_config_t devcfg = {
.clock_speed_hz = 10 * 1000 * 1000, // 10 MHz
.mode = 0,
.spics_io_num = PIN_NUM_CS,
.queue_size = 1 // Only one transaction in flight for this simple example
};
ret = spi_bus_add_device(SPI_HOST_ID, &devcfg, &spi_device);
ESP_ERROR_CHECK(ret);
ESP_LOGI(TAG, "SPI device added.");
// Allocate DMA-capable memory for transmit and receive buffers
uint8_t *tx_buffer = heap_caps_malloc(BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
uint8_t *rx_buffer = heap_caps_malloc(BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
if (tx_buffer == NULL || rx_buffer == NULL) {
ESP_LOGE(TAG, "Failed to allocate DMA buffers!");
if(tx_buffer) heap_caps_free(tx_buffer);
if(rx_buffer) heap_caps_free(rx_buffer);
return;
}
ESP_LOGI(TAG, "DMA buffers allocated.");
// Fill transmit buffer with some data
for (int i = 0; i < BUFFER_SIZE_BYTES; i++) {
tx_buffer[i] = i % 256;
}
memset(rx_buffer, 0xAA, BUFFER_SIZE_BYTES); // Pre-fill rx_buffer to see changes
spi_transaction_t t;
memset(&t, 0, sizeof(t));
t.length = BUFFER_SIZE_BYTES * 8; // Length in bits
t.tx_buffer = tx_buffer;
t.rx_buffer = rx_buffer;
ESP_LOGI(TAG, "Performing SPI transaction using spi_device_transmit (uses DMA if enabled)...");
// spi_device_transmit will use DMA because the bus was initialized with DMA.
ret = spi_device_transmit(spi_device, &t);
ESP_ERROR_CHECK(ret);
ESP_LOGI(TAG, "Transaction complete.");
ESP_LOG_BUFFER_HEXDUMP(TAG, tx_buffer, 16, ESP_LOG_INFO); // Log first 16 bytes sent
ESP_LOG_BUFFER_HEXDUMP(TAG, rx_buffer, 16, ESP_LOG_INFO); // Log first 16 bytes received
// Verify loopback data
if (memcmp(tx_buffer, rx_buffer, BUFFER_SIZE_BYTES) == 0) {
ESP_LOGI(TAG, "Loopback successful! Sent and received data match.");
} else {
ESP_LOGE(TAG, "Loopback failed. Data mismatch.");
}
// Free DMA buffers
heap_caps_free(tx_buffer);
heap_caps_free(rx_buffer);
ESP_LOGI(TAG, "DMA buffers freed.");
// Optional: remove device and free bus
// spi_bus_remove_device(spi_device);
// spi_bus_free(SPI_HOST_ID);
ESP_LOGI(TAG, "SPI DMA example finished.");
}
Build, Flash, and Observe:
- Build the project (Ctrl+E B).
- Flash it to your ESP32 (Ctrl+E F).
- Open the serial monitor (Ctrl+E M).
You should see logs indicating DMA buffer allocation, the transaction, and whether the loopback was successful. The key here is that spi_device_transmit utilized DMA because the bus was initialized with SPI_DMA_CH_AUTO.
Example 2: Comparing DMA vs. CPU-Polled (Conceptual Timing)
This example outlines how you might compare the performance. For accurate timing, you’d use esp_timer_get_time().
#include <stdio.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "esp_heap_caps.h"
#include "esp_timer.h" // For timing
static const char *TAG = "SPI_DMA_PERF_TEST";
#define SPI_HOST_ID SPI2_HOST
#define PIN_NUM_MOSI 23
#define PIN_NUM_MISO 19 // Connect to MOSI for loopback
#define PIN_NUM_SCLK 18
#define PIN_NUM_CS 5
#define LARGE_BUFFER_SIZE_BYTES (4 * 1024) // 4KB
#define NUM_ITERATIONS 100
// Function to perform SPI transfer and measure time
void perform_spi_test(const char* test_name, spi_dma_chan_t dma_setting) {
esp_err_t ret;
spi_device_handle_t spi_device_handle; // Renamed to avoid conflict
ESP_LOGI(TAG, "Starting test: %s", test_name);
spi_bus_config_t buscfg = {
.mosi_io_num = PIN_NUM_MOSI,
.miso_io_num = PIN_NUM_MISO,
.sclk_io_num = PIN_NUM_SCLK,
.quadwp_io_num = -1,
.quadhd_io_num = -1,
.max_transfer_sz = LARGE_BUFFER_SIZE_BYTES + 16
};
ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, dma_setting);
if (ret != ESP_OK) {
ESP_LOGE(TAG, "Failed to initialize SPI bus for %s: %s", test_name, esp_err_to_name(ret));
// Attempt to free bus if it was partially initialized by a previous failed test run
spi_bus_free(SPI_HOST_ID);
// Re-attempt initialization (could be problematic if bus is stuck, but for test)
ret = spi_bus_initialize(SPI_HOST_ID, &buscfg, dma_setting);
ESP_ERROR_CHECK(ret); // If it fails again, it will abort
}
spi_device_interface_config_t devcfg = {
.clock_speed_hz = 20 * 1000 * 1000, // 20 MHz
.mode = 0,
.spics_io_num = PIN_NUM_CS,
.queue_size = 1
};
ret = spi_bus_add_device(SPI_HOST_ID, &devcfg, &spi_device_handle);
ESP_ERROR_CHECK(ret);
uint8_t *tx_buffer = heap_caps_malloc(LARGE_BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
uint8_t *rx_buffer = heap_caps_malloc(LARGE_BUFFER_SIZE_BYTES, MALLOC_CAP_DMA | MALLOC_CAP_8BIT);
if (!tx_buffer || !rx_buffer) {
ESP_LOGE(TAG, "Failed to allocate DMA buffers for %s", test_name);
goto cleanup;
}
for(int i=0; i<LARGE_BUFFER_SIZE_BYTES; i++) tx_buffer[i] = i;
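// Without DMA, one transaction is limited to the SPI hardware buffer
// (SOC_SPI_MAXIMUM_BUFFER_SIZE, 64 bytes on ESP32) and max_transfer_sz is ignored,
// so the CPU-driven test moves the buffer in 64-byte chunks.
// With DMA enabled, the whole buffer goes out as a single transaction.
const size_t chunk_bytes = (dma_setting == SPI_DMA_DISABLED) ? 64 : LARGE_BUFFER_SIZE_BYTES;
int64_t start_time, end_time;
uint64_t total_duration = 0;
ESP_LOGI(TAG, "Running %d iterations for %s...", NUM_ITERATIONS, test_name);
for (int i = 0; i < NUM_ITERATIONS; i++) {
    start_time = esp_timer_get_time();
    for (size_t offset = 0; offset < LARGE_BUFFER_SIZE_BYTES; offset += chunk_bytes) {
        spi_transaction_t t;
        memset(&t, 0, sizeof(t));
        t.length = chunk_bytes * 8; // Length in bits
        t.tx_buffer = tx_buffer + offset;
        t.rx_buffer = rx_buffer + offset; // For loopback
        if (dma_setting == SPI_DMA_DISABLED) {
            // No DMA: the CPU copies the data and busy-waits for completion
            ret = spi_device_polling_transmit(spi_device_handle, &t);
        } else {
            // DMA moves the data; the task waits for the completion interrupt
            ret = spi_device_transmit(spi_device_handle, &t);
        }
        ESP_ERROR_CHECK(ret);
    }
    end_time = esp_timer_get_time();
    total_duration += (end_time - start_time);
}
ESP_LOGI(TAG, "%s: Transferred %d KB data %d times.", test_name, LARGE_BUFFER_SIZE_BYTES / 1024, NUM_ITERATIONS);
ESP_LOGI(TAG, "%s: Average time per %d KB transfer: %.2f us", test_name, LARGE_BUFFER_SIZE_BYTES / 1024, (float)total_duration / NUM_ITERATIONS);
ESP_LOGI(TAG, "%s: Total time: %.2f ms", test_name, (float)total_duration / 1000.0);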
cleanup:
if(tx_buffer) heap_caps_free(tx_buffer);
if(rx_buffer) heap_caps_free(rx_buffer);
spi_bus_remove_device(spi_device_handle); // Remove device
spi_bus_free(SPI_HOST_ID); // Free bus
ESP_LOGI(TAG, "Finished test: %s, bus freed.", test_name);
vTaskDelay(pdMS_TO_TICKS(500)); // Delay to allow logs to print and bus to settle if needed
}
void app_main(void)
{
// Test with DMA enabled
perform_spi_test("DMA Enabled Test", SPI_DMA_CH_AUTO);
// Test with DMA disabled (CPU Polling)
perform_spi_test("DMA Disabled (CPU Polled) Test", SPI_DMA_DISABLED);
ESP_LOGI(TAG, "All SPI performance tests finished.");
}
Observe:
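When you run this, expect the “DMA Enabled Test” to complete each 4 KB transfer in less time, and with far less CPU involvement, than the “DMA Disabled (CPU Polled) Test”. Without DMA, the driver limits a single transaction to the SPI hardware buffer (SOC_SPI_MAXIMUM_BUFFER_SIZE, 64 bytes on ESP32) and ignores max_transfer_sz, so a 4 KB buffer has to be moved as many small transactions, each paying its own setup cost. The exact numbers depend on the chip, clock speed, and ESP-IDF version, so treat your own measurements as the reference.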
Caution: Re-initializing the SPI bus (spi_bus_initialize) repeatedly as done in this test function is generally not standard practice in a final application. An application would typically initialize the bus once. This structure is for isolated testing. Ensure spi_bus_free() is called correctly to allow re-initialization.
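When reading the measured times, it helps to know the floor set by the SPI clock itself. The following host-side sketch computes that ideal wire time, assuming one bit per clock on a single data line and no gaps between bytes; the function names are hypothetical helpers, not part of ESP-IDF.

```c
#include <stddef.h>
#include <stdint.h>

/* Ideal time on the wire, in microseconds, for `bytes` at `clock_hz` (1 bit per clock). */
double spi_wire_time_us(size_t bytes, uint32_t clock_hz)
{
    return (double)bytes * 8.0 * 1e6 / (double)clock_hz;
}

/* Fraction of a measured duration that was spent actually clocking bits. */
double spi_bus_utilization(size_t bytes, uint32_t clock_hz, double measured_us)
{
    return spi_wire_time_us(bytes, clock_hz) / measured_us;
}

/* Number of transactions needed when each one can carry at most `chunk` bytes. */
size_t transactions_needed(size_t total, size_t chunk)
{
    return (total + chunk - 1) / chunk;
}
```

For the 4 KB buffer at 20 MHz used above, spi_wire_time_us(4096, 20000000) gives 1638.4 µs, so no configuration can average less than that per transfer. Anything above it is driver and CPU overhead, and transactions_needed(4096, 64) shows that a 64-byte limit turns one transfer into 64 separate transactions.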
Variant Notes
The general principles of SPI DMA apply across ESP32 variants, but there are nuances:
- DMA Controller and Channels:
  - ESP32: Provides two DMA channels dedicated to SPI. SPI2 (HSPI) and SPI3 (VSPI) can be connected to them. SPI_DMA_CH_AUTO typically assigns channel 1 or 2.
  - ESP32-S2: Also uses DMA hardware dedicated to the SPI controllers rather than GDMA. SPI2 and SPI3 can use DMA.
  - ESP32-S3: Features a GDMA controller with more flexibility. SPI2 and SPI3 can use DMA.
  - ESP32-C3 / C6 / H2 (RISC-V based): Feature GDMA controllers. The general-purpose SPI controller (SPI2, often named FSPI or GPSPI) supports DMA.
  - The SPI_DMA_CH_AUTO setting in spi_bus_initialize is the best way to ensure correct DMA channel allocation compatible with the specific variant and current IDF version.
- PSRAM and DMA Accessibility:
  - ESP32 (original): DMA access to external PSRAM by SPI peripherals is generally not direct. Bounce buffers in internal SRAM are typically used by the driver if PSRAM buffers are provided for DMA, incurring some overhead.
  - ESP32-S2: Similar limitations to original ESP32 regarding direct PSRAM DMA by SPI.
  - ESP32-S3: The GDMA on ESP32-S3 is more advanced and can directly access external PSRAM. This makes using PSRAM for large SPI DMA buffers more efficient.
  - ESP32-C6 / H2: Check the specific variant’s TRM and ESP-IDF documentation for whether PSRAM is supported at all and how DMA interacts with it.
  - Recommendation: Always use heap_caps_malloc(size, MALLOC_CAP_DMA | ...) for buffers intended for SPI DMA. This ensures the buffer is placed in memory that the SPI DMA can access (preferring internal SRAM if suitable, or PSRAM on S3 if configured and appropriate).
- Maximum DMA Transfer Size: While max_transfer_sz is a software configuration, the underlying hardware DMA controllers have their own limits per DMA descriptor or block. The ESP-IDF driver manages these details by chaining as many DMA descriptors as one transaction needs, up to the configured max_transfer_sz.
Tip: Always refer to the latest ESP-IDF documentation and the Technical Reference Manual (TRM) for your specific ESP32 variant for the most accurate details on DMA capabilities and configurations.
Common Mistakes & Troubleshooting Tips
| Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
|---|---|---|
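| Using Non-DMA-Capable Buffers | Transfer failures or corrupted data; extra overhead if the driver falls back to bounce buffers. | Allocate tx_buffer and rx_buffer with heap_caps_malloc() and MALLOC_CAP_DMA. Check how your variant handles PSRAM. |
| Buffer Deallocated or Modified Prematurely | Corrupted transmit or receive data, most often with queued (asynchronous) transactions. | Keep buffers valid and unchanged until spi_device_get_trans_result() returns. Avoid stack buffers in functions that return before the transaction completes. |
| Expecting CPU Offload from spi_device_polling_transmit() | The calling task stays busy for the whole transfer, so CPU load does not drop. | Use spi_device_transmit(), or spi_device_queue_trans() with spi_device_get_trans_result(), on a bus initialized with DMA. |
| DMA Channel Misconfiguration / Exhaustion | spi_bus_initialize() returns an error. | Use SPI_DMA_CH_AUTO instead of a manual channel number, and release buses you no longer need with spi_bus_free(). |
| Incorrect max_transfer_sz | Transactions larger than the limit fail; an excessively large value consumes memory without benefit. | Set max_transfer_sz to the largest single transaction you expect, or leave it at 0 for the default. |
| Forgetting to Free DMA Buffers | DMA-capable heap shrinks over time until later allocations fail. | Call heap_caps_free() on every DMA buffer once its transaction has completed. |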
Exercises
- DMA Buffer Placement Analysis:
  - Modify Example 1. Instead of heap_caps_malloc with MALLOC_CAP_DMA, try allocating the tx_buffer and rx_buffer in a few different ways:
    - As global static arrays: static uint8_t tx_buffer_static[BUFFER_SIZE_BYTES];
    - Using malloc() (standard C library malloc, which usually maps to heap_caps_malloc(size, MALLOC_CAP_DEFAULT)).
  - Observe if the DMA transfer still works correctly for each case.
  - Research and explain why heap_caps_malloc with MALLOC_CAP_DMA is the most robust approach for DMA buffers. (Hint: Consider where global static arrays are placed and the default heap capabilities).
- Asynchronous DMA with CPU Work:
  - Take Example 1 (SPI Loopback with DMA). Convert it to use spi_device_queue_trans() and spi_device_get_trans_result().
  - In the time between queuing the transaction and waiting for its result, implement a simple counter that increments and prints its value to the console rapidly (e.g., in a loop that runs for a fixed number of iterations or a short delay).
  - Observe how the counter continues to run while the SPI DMA transfer is presumably happening in the background. This demonstrates CPU offload.
- Investigating max_transfer_sz:
  - Using the DMA loopback setup from Example 1, try transferring a relatively large amount of data (e.g., 2048 bytes).
  - Experiment by setting max_transfer_sz in spi_bus_config_t to:
    - A value larger than your transaction size (e.g., 4096).
    - A value smaller than your transaction size (e.g., 512 bytes).
    - 0 (to use the default).
  - Does the transfer still succeed in all cases? Check the return value of spi_device_transmit() each time and note which setting, if any, is rejected.
  - If a transaction is too large for the configured limit, the application has to split it into several smaller transactions. Conceptually, why might that impact performance for very large, continuous data streams? (Hint: overhead of managing multiple smaller DMA operations vs. fewer larger ones).
Summary
- DMA (Direct Memory Access) allows peripherals like SPI to transfer data to/from memory without continuous CPU intervention, freeing up the CPU for other tasks.
- Using DMA for SPI can significantly improve data throughput and reduce CPU load, especially for large or high-speed transfers.
- In ESP-IDF, DMA for an SPI bus is enabled by passing SPI_DMA_CH_AUTO (recommended) or a specific channel number as the dma_chan argument of spi_bus_initialize().
- Data buffers used in DMA transactions must be allocated in DMA-capable memory (use heap_caps_malloc with MALLOC_CAP_DMA) and must remain valid throughout the transaction.
- spi_device_transmit() uses DMA if enabled on the bus. For non-blocking DMA, use spi_device_queue_trans() and spi_device_get_trans_result().
- spi_device_polling_transmit() keeps the CPU busy-waiting until the transfer finishes, so it gives no CPU offload; whether the data moves by DMA depends on how the bus was initialized.
- Different ESP32 variants have varying DMA capabilities, especially concerning PSRAM access; SPI_DMA_CH_AUTO and MALLOC_CAP_DMA help abstract these differences.
Further Reading
- ESP-IDF SPI Master Driver Documentation:
  - ESP-IDF SPI Master API Reference (Adjust URL for your specific ESP32 variant if needed).
- ESP-IDF Heap Memory Allocation:
  - ESP-IDF Heap Memory Allocation Documentation (Details on MALLOC_CAP_DMA and memory types).
- ESP32 Technical Reference Manual (TRM):
  - Consult the TRM for your specific ESP32 variant for detailed information on the GDMA controller and SPI peripheral hardware. (e.g., ESP32 TRM).

