Chapter 155: CAN Error Handling and Recovery
Chapter Objectives
By the end of this chapter, you will be able to:
- Identify and understand the five basic types of CAN bus errors: Bit Error, Stuff Error, CRC Error, Form Error, and Acknowledgement (ACK) Error.
- Explain the purpose and structure of CAN Active and Passive Error Frames.
- Describe the three CAN controller error states: Error Active, Error Passive, and Bus-Off.
- Understand the role and behavior of the Transmit Error Counter (TEC) and Receive Error Counter (REC).
- Utilize TWAI alerts to detect various error conditions, including bus errors and state changes.
- Implement bus-off recovery procedures using the ESP-IDF TWAI driver functions.
- Retrieve and interpret TWAI status information, such as error counters and the current controller state.
- Appreciate the importance of fault confinement in maintaining CAN network stability.
1. Introduction
The Controller Area Network (CAN) protocol is renowned for its robustness and reliability, which are paramount in automotive and industrial control systems where communication failures can have serious consequences. A significant part of this robustness comes from its sophisticated built-in mechanisms for error detection, signaling, and fault confinement. Individual nodes can detect errors, inform other nodes, and even isolate themselves from the bus if they become a persistent source of problems.
In the previous chapters, we’ve explored how to configure the ESP32’s TWAI peripheral and exchange CAN messages. However, real-world communication is rarely perfect. Physical layer issues (like faulty wiring, improper termination, or electromagnetic interference), mismatched baud rates, or malfunctioning nodes can all introduce errors onto the bus.
This chapter focuses on how these errors are handled within the CAN protocol and how you, as an embedded systems developer using the ESP32, can leverage the TWAI driver’s features to monitor bus health, react to error conditions, and implement recovery strategies. Understanding these mechanisms is crucial for building truly resilient and dependable CAN-based applications.
2. Theory
CAN’s error handling is a multi-layered approach involving error detection by any node, error signaling to all nodes, and fault confinement to prevent a persistently faulty node from disrupting the entire network.
2.1. Types of CAN Errors
Any node participating in CAN communication actively monitors the bus for errors. There are five primary types of errors that can be detected:
- Bit Error:
- Detection: A transmitter monitors the bus level while sending. If the bit level it reads back from the bus is different from the bit it intended to send, a Bit Error is detected.
- Exception: This rule does not apply during the arbitration field (where a node intentionally stops transmitting if it sends a recessive bit but detects a dominant one) or during the ACK slot (where the transmitter sends a recessive bit and expects a dominant bit from a receiver).
- Analogy: Imagine speaking a word and hearing yourself say a different word due to noise or a problem with your microphone.
- Stuff Error (Bit Stuffing Error):
- Detection: To ensure enough signal transitions for synchronization, CAN uses bit stuffing: after five consecutive bits of the same polarity, the transmitter inserts a complementary bit. If a receiver detects six consecutive bits of the same polarity in a part of the frame that should be stuffed (SOF, Arbitration, Control, Data, CRC fields), a Stuff Error is detected.
- Analogy: Imagine a rule in a written language where no letter can be repeated more than five times consecutively without inserting a special symbol. Seeing six identical letters in a row would be a “stuff error.”
- CRC Error (Cyclic Redundancy Check Error):
- Detection: The transmitter calculates a 15-bit CRC checksum based on the content of the SOF, Arbitration, Control, and Data fields and appends it to the message. All receivers also calculate the CRC on the incoming message. If a receiver’s calculated CRC does not match the CRC received in the frame, a CRC Error is detected.
- Analogy: Similar to a checksum on a downloaded file. If the calculated checksum doesn’t match the provided one, the file is likely corrupted.
- Form Error (Format Error):
- Detection: Certain parts of a CAN frame have a fixed format: the CRC Delimiter, the ACK Delimiter, and the End of Frame field must all be recessive. If a node detects a dominant bit in one of these fixed-form fields where a recessive bit is expected, a Form Error is detected.
- Analogy: Finding a punctuation mark in the middle of a word where it doesn’t belong – it violates the expected structure.
- Acknowledgement (ACK) Error:
- Detection: After a transmitter successfully sends a message (up to the CRC delimiter), it transmits a recessive bit in the ACK Slot. At least one receiver that has correctly received the message must acknowledge it by transmitting a dominant bit in this ACK Slot. If the transmitter does not detect this dominant bit (i.e., the ACK Slot remains recessive), an ACK Error is detected.
- Analogy: Sending a registered letter and not receiving a signed confirmation of receipt. It implies no one (correctly) received it.
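Two of these checks are simple enough to model in software. The sketch below is plain C with no ESP-IDF dependencies; it assumes a frame is represented as an array of bits (0 = dominant, 1 = recessive) and implements a stuff-rule check and the classical CAN 15-bit CRC (generator polynomial 0x4599). The function names are hypothetical; the TWAI hardware performs these checks itself, so this is only meant to make the rules concrete.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the bit that completes a run of six identical bits
 * (a Stuff Error), or -1 if the stuff rule is never violated.
 * A stuff bit is simply the first bit of the next run, so counting runs on
 * the raw (stuffed) bit stream is enough to detect a violation. */
int find_stuff_error(const uint8_t *bits, size_t n_bits)
{
    int run = 0;
    uint8_t prev = 2; /* sentinel: no previous bit yet */
    for (size_t i = 0; i < n_bits; i++) {
        uint8_t b = (uint8_t)(bits[i] & 1u);
        run = (b == prev) ? run + 1 : 1;
        prev = b;
        if (run == 6) {
            return (int)i;
        }
    }
    return -1;
}

/* Classical CAN CRC-15 over the de-stuffed SOF, arbitration, control and
 * data bits, processed in transmission order (MSB first).
 * Generator: x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 (0x4599). */
uint16_t can_crc15(const uint8_t *bits, size_t n_bits)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < n_bits; i++) {
        unsigned crc_next = (bits[i] & 1u) ^ ((crc >> 14) & 1u);
        crc = (uint16_t)((crc << 1) & 0x7FFFu); /* shift, keep 15 bits */
        if (crc_next) {
            crc ^= 0x4599u;
        }
    }
    return crc;
}
```

A receiver runs the same CRC over the bits it received and compares the result with the transmitted CRC field; any mismatch is a CRC Error. A convenient property for testing: feeding the message bits followed by their own 15 CRC bits (MSB first) leaves the register at zero.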
2.2. Error Frames
When a node detects one of the errors mentioned above, it starts transmitting an Error Frame at the next bit (for a CRC Error, only after the ACK Delimiter), aborting the current data/remote frame if it was the transmitter. The Error Frame notifies all other nodes on the bus that an error has occurred, causing them to discard the potentially corrupted message.
An Error Frame consists of two parts:
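- Error Flag: Each node's Error Flag is 6 consecutive bits of the same polarity (either all dominant or all recessive), which deliberately violates the bit stuffing rule and makes it easy for every other node to recognize. Because other nodes react to the violation by sending their own flags, the superimposed flags seen on the bus can last from 6 to 12 bit times.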
- Active Error Flag: Composed of 6 consecutive dominant bits. Sent by nodes in the “Error Active” state.
- Passive Error Flag: Composed of 6 consecutive recessive bits. Sent by nodes in the “Error Passive” state. These are less disruptive as they won’t override dominant bits from other nodes transmitting an active error flag.
- Error Delimiter: Following the Error Flag, there are 8 recessive bits.
After an Error Frame is completed, nodes may attempt to retransmit the interrupted message (if they were the original transmitter and are allowed to) after the usual Interframe Space.
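The flag and delimiter layout, and the reason a Passive Error Flag cannot mask an Active one, can be seen in a small bit-level model. The sketch assumes 0 = dominant and 1 = recessive and models the bus as a wired-AND (any node driving dominant wins). `build_error_frame()` and `bus_level()` are illustrative names, not TWAI driver functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOMINANT  0u
#define RECESSIVE 1u
#define ERROR_FLAG_BITS      6
#define ERROR_DELIMITER_BITS 8
#define ERROR_FRAME_BITS     (ERROR_FLAG_BITS + ERROR_DELIMITER_BITS)

/* Fills out[] with the 14 bits a single node sends: 6 flag bits
 * (dominant if Error Active, recessive if Error Passive) followed by
 * the 8-bit recessive Error Delimiter. */
void build_error_frame(bool error_passive, uint8_t out[ERROR_FRAME_BITS])
{
    for (int i = 0; i < ERROR_FRAME_BITS; i++) {
        if (i < ERROR_FLAG_BITS) {
            out[i] = error_passive ? RECESSIVE : DOMINANT;
        } else {
            out[i] = RECESSIVE;
        }
    }
}

/* Wired-AND: the bus is recessive only if every node drives recessive. */
uint8_t bus_level(const uint8_t *node_bits, size_t n_nodes)
{
    uint8_t level = RECESSIVE;
    for (size_t i = 0; i < n_nodes; i++) {
        level &= node_bits[i];
    }
    return level;
}
```

Superimposing an Error Passive node's frame on an Error Active node's frame bit by bit yields the active frame: the recessive flag bits are simply overwritten by the dominant ones. In a real network the flags of different nodes can also start at different bit times, which is why the combined flag may stretch to 12 bits.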
2.3. Error Counters (TEC and REC)
To implement fault confinement, each CAN node maintains two internal error counters:
- Transmit Error Counter (TEC): Primarily reflects errors related to message transmission.
- Receive Error Counter (REC): Primarily reflects errors related to message reception.
The full rules in the CAN 2.0 specification include several special cases, but the main principles are:
Incrementing Counters:
- Receiver detects an error: REC increases by 1, except for a Bit Error detected while the receiver is itself sending an Active Error Flag or an Overload Flag.
- Receiver sees a dominant bit as the first bit after sending its own Error Flag: REC increases by 8.
- Transmitter sends an Error Flag (e.g., after a Bit Error during its own transmission or an ACK Error): TEC increases by 8. Two exceptions leave TEC unchanged: an Error Passive transmitter that detects an ACK Error and sees no dominant bit while sending its Passive Error Flag, and a Stuff Error during arbitration on a stuff bit that was sent recessive but read back as dominant.
- Bit Error while sending an Active Error Flag or an Overload Flag: TEC increases by 8 for a transmitter, REC increases by 8 for a receiver.
- Prolonged dominant level after an Error Flag: after the 14th consecutive dominant bit following an Active Error Flag (or the 8th following a Passive Error Flag), and after each further sequence of 8 consecutive dominant bits, every transmitter adds 8 to TEC and every receiver adds 8 to REC.
Decrementing Counters:
- Successful transmission of a message (ACK received and no error until End of Frame): TEC decreases by 1 (if TEC > 0).
- Successful reception of a message (no error up to the ACK slot and the ACK bit sent successfully): REC decreases by 1 if it is between 1 and 127. If REC is above 127, it is set to a value between 119 and 127.
These rules are designed so that a node that is consistently causing errors (e.g., due to a hardware fault) will see its error counters rise much faster than a node that is only occasionally affected by errors from the bus. A summary table:
Event / Condition | Effect on TEC (Transmit Error Counter) | Effect on REC (Receive Error Counter) |
---|---|---|
This node, as transmitter, sends an Error Flag (e.g., Bit Error during its own transmission, ACK Error) | +8, except no change for an Error Passive transmitter with an ACK Error that sees no dominant bit during its Passive Error Flag, or for a Stuff Error in arbitration on a recessive stuff bit read back as dominant | No change |
This node, as receiver, detects an error (e.g., CRC, Form, or Stuff Error in a frame it is receiving) | No change (the transmitter of the bad frame sees its own TEC rise) | +1 |
Receiver detects a dominant bit as the first bit after sending its Error Flag | No change | +8 |
Bit Error while sending an Active Error Flag or Overload Flag | +8 (if transmitter) | +8 (if receiver) |
Dominant level persists after an Error Flag (14 bits after an Active Error Flag, 8 bits after a Passive Error Flag, then every further 8 bits) | +8 (if transmitter) | +8 (if receiver) |
Successful transmission of a message by this node | -1 (if TEC > 0) | No change |
Successful reception of a message by this node (including sending the ACK) | No change | -1 if REC is 1 to 127; if REC > 127, set to a value between 119 and 127 |
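The common cases can be captured in a few functions. This is a simplified sketch under stated assumptions: it models only a transmitter error, a receiver error, and successful transmission and reception, ignores the flag-related special cases, and, where the specification allows REC to be set anywhere from 119 to 127 after a successful reception, it picks 127. `can_counters_t` and the function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t tec; /* Transmit Error Counter */
    uint16_t rec; /* Receive Error Counter */
} can_counters_t;

/* Node was transmitting and sent an Error Flag: TEC += 8. */
void on_tx_error(can_counters_t *c) { c->tec = (uint16_t)(c->tec + 8); }

/* Node was receiving and detected an error: REC += 1. */
void on_rx_error(can_counters_t *c) { c->rec = (uint16_t)(c->rec + 1); }

/* Frame transmitted and acknowledged without error: TEC -= 1 (floor 0). */
void on_tx_success(can_counters_t *c)
{
    if (c->tec > 0) {
        c->tec -= 1;
    }
}

/* Frame received correctly and ACK sent: REC -= 1 while it is 1..127.
 * Above 127 it drops back into the 119..127 band (this sketch uses 127). */
void on_rx_success(can_counters_t *c)
{
    if (c->rec > 127) {
        c->rec = 127;
    } else if (c->rec > 0) {
        c->rec -= 1;
    }
}
```

The 8:1 asymmetry is what makes a persistently faulty transmitter drift upward: one error followed by eight successful frames leaves TEC where it was, but one error followed by only seven successful frames adds 1 per cycle until the node becomes Error Passive.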
2.4. CAN Node Error States
Based on the values of TEC and REC, a CAN node operates in one of three error states:
- Error Active:
- Condition: TEC < 128 AND REC < 128.
- Behavior: The node participates normally in bus communication. When it detects an error, it transmits an Active Error Flag (6 dominant bits) as part of its Error Frame. This actively signals the error to all other nodes.
- Error Passive:
- Condition: TEC >= 128 OR REC >= 128 (but not Bus-Off).
- Behavior: The node still participates in communication but is considered “less healthy.”
- When it detects an error, it transmits a Passive Error Flag (6 recessive bits). Because recessive bits cannot override dominant ones, a Passive Error Flag does not disturb other nodes' traffic or their Active Error Flags and is therefore less disruptive.
- After transmitting a message, an Error Passive node must wait an additional “Suspend Transmission” time (8 recessive bits) after the Interframe Space before it can initiate a new transmission. This effectively lowers its bus bandwidth usage.
- Bus-Off:
- Condition: TEC >= 256. (REC alone never causes Bus-Off.)
- Behavior: The node is considered to be a major source of errors and is automatically disconnected from the bus by its own controller.
- It is not allowed to transmit any messages (including Data, Remote, or Error Frames). Its TX pin is forced to a permanent recessive state.
- It still monitors the bus level, which it needs in order to detect the recovery condition, but it does not take part in bus communication; do not rely on receiving frames while Bus-Off.
- To rejoin the bus, the node must complete a bus-off recovery sequence: it must observe 128 occurrences of 11 consecutive recessive bits (the occurrences do not need to be back to back), after which TEC and REC are reset to 0 and the node becomes Error Active again. In ESP-IDF, this sequence only starts after the application calls the TWAI driver's recovery function.
Figure: CAN error state diagram (fault confinement). Error Active -> Error Passive when TEC >= 128 or REC >= 128; Error Passive -> Error Active when TEC < 128 and REC < 128; Error Passive -> Bus-Off when TEC >= 256; Bus-Off -> Error Active after a successful recovery sequence (128 occurrences of 11 consecutive recessive bits, started in ESP-IDF by twai_initiate_recovery()), with TEC and REC reset to 0.
This state mechanism is a key part of CAN’s fault confinement, preventing a single malfunctioning node from continuously disrupting the entire network.
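The thresholds and the recovery condition translate directly into code. The sketch below, with hypothetical names, derives the fault-confinement state from TEC and REC and counts occurrences of 11 consecutive recessive bits while Bus-Off. It is a simplified model: a dominant bit only breaks the current run, while completed occurrences keep counting.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CAN_ERROR_ACTIVE, CAN_ERROR_PASSIVE, CAN_BUS_OFF } can_error_state_t;

/* Fault-confinement state derived from the two error counters. */
can_error_state_t can_error_state(uint16_t tec, uint16_t rec)
{
    if (tec >= 256) {
        return CAN_BUS_OFF; /* only TEC can cause Bus-Off */
    }
    if (tec >= 128 || rec >= 128) {
        return CAN_ERROR_PASSIVE;
    }
    return CAN_ERROR_ACTIVE;
}

/* Tracks the Bus-Off recovery condition: 128 occurrences of
 * 11 consecutive recessive bits (1 = recessive in this sketch). */
typedef struct {
    uint8_t recessive_run; /* current run of recessive bits, 0..10 */
    uint8_t occurrences;   /* completed runs of 11, 0..128 */
} busoff_recovery_t;

/* Feed one monitored bus bit; returns true once recovery is complete. */
bool busoff_recovery_feed(busoff_recovery_t *r, uint8_t bit)
{
    if (r->occurrences >= 128) {
        return true;
    }
    if (bit & 1u) {
        if (++r->recessive_run == 11) {
            r->recessive_run = 0;
            r->occurrences++;
        }
    } else {
        r->recessive_run = 0; /* a dominant bit breaks the current run only */
    }
    return r->occurrences >= 128;
}
```

On completion the controller resets TEC and REC to 0, so `can_error_state(0, 0)` returns `CAN_ERROR_ACTIVE`. On an idle bus at 125 kbit/s, 128 × 11 = 1408 bit times take roughly 11.3 ms.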
2.5. TWAI Driver Support for Error Handling
The ESP-IDF TWAI driver provides several mechanisms to monitor and manage error conditions:
- Alerts (alerts_enabled in twai_general_config_t): You can enable alerts to be notified of specific events and read them with twai_read_alerts(). Relevant error-related alerts include:
- TWAI_ALERT_BUS_ERROR: A bus error (Bit, Stuff, CRC, Form, or ACK) has occurred.
- TWAI_ALERT_ARB_LOST: The controller lost arbitration during a transmission. Losing arbitration is normal CAN behavior rather than an error, but frequent losses can point to a heavily loaded bus.
- TWAI_ALERT_ABOVE_ERR_WARN: TEC or REC has exceeded the error warning limit (96 by default). TWAI_ALERT_BELOW_ERR_WARN signals that both have dropped back below it.
- TWAI_ALERT_ERR_PASS: The controller has entered the Error Passive state (TWAI_ALERT_ERR_ACTIVE signals the return to Error Active).
- TWAI_ALERT_BUS_OFF: The controller has entered the Bus-Off state.
- TWAI_ALERT_BUS_RECOVERED: Bus-Off recovery has completed.
- TWAI_ALERT_TX_FAILED: The previous transmission failed. This mainly matters for single-shot transmissions, which are not retried.
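- Status Information (twai_get_status_info()): This function retrieves the current status of the TWAI controller.
esp_err_t twai_get_status_info(twai_status_info_t *status_info);
- The twai_status_info_t structure contains the fields below. Note that twai_state_t has no separate Error Active or Error Passive value: both are reported as TWAI_STATE_RUNNING, so use the error counters (or the TWAI_ALERT_ERR_PASS and TWAI_ALERT_ERR_ACTIVE alerts) to tell them apart.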
typedef struct {
    twai_state_t state;         /*!< Current state of TWAI controller (Stopped, Running, Bus-Off, Recovering) */
    uint32_t msgs_to_tx;        /*!< Number of messages queued for transmission */
    uint32_t msgs_to_rx;        /*!< Number of messages in RX queue waiting to be read */
    uint32_t tx_error_counter;  /*!< Current value of Transmit Error Counter */
    uint32_t rx_error_counter;  /*!< Current value of Receive Error Counter */
    uint32_t tx_failed_count;   /*!< Number of messages that failed to transmit */
    uint32_t rx_missed_count;   /*!< Number of messages that were lost due to a full RX queue */
    uint32_t arb_lost_count;    /*!< Number of times arbitration was lost */
    uint32_t bus_error_count;   /*!< Total number of bus errors encountered */
} twai_status_info_t;
Member | Type | Description |
---|---|---|
state | twai_state_t | Current state of the TWAI controller (e.g., TWAI_STATE_STOPPED, TWAI_STATE_RUNNING, TWAI_STATE_BUS_OFF, TWAI_STATE_RECOVERING). |
msgs_to_tx | uint32_t | Number of messages currently queued in the software transmit queue waiting for transmission. |
msgs_to_rx | uint32_t | Number of messages currently in the software receive queue waiting to be read by the application. |
tx_error_counter | uint32_t | Current value of the Transmit Error Counter (TEC). |
rx_error_counter | uint32_t | Current value of the Receive Error Counter (REC). |
tx_failed_count | uint32_t | Cumulative count of messages that failed to transmit (e.g., due to excessive retries leading to bus-off or other unrecoverable TX errors). |
rx_missed_count | uint32_t | Cumulative count of messages that were received by the hardware but lost due to a full RX queue (overflow). |
arb_lost_count | uint32_t | Cumulative count of times arbitration was lost during transmission attempts. |
bus_error_count | uint32_t | Cumulative count of general bus errors detected (Bit, Stuff, CRC, Form, ACK errors). |
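- Bus-Off Recovery:
- twai_initiate_recovery(): Starts the bus-off recovery process. It can only be called while the driver is in the Bus-Off state. The controller then enters the Recovering state and waits for 128 occurrences of 11 consecutive recessive bits on the bus. When recovery completes, the TWAI_ALERT_BUS_RECOVERED alert is raised and the driver is left in the Stopped state, so twai_start() must be called to resume communication.
- twai_start(): Does not recover from Bus-Off by itself; it only starts a driver that is in the Stopped state and returns ESP_ERR_INVALID_STATE otherwise. The alternative to twai_initiate_recovery() is to uninstall the driver with twai_driver_uninstall() (permitted in the Bus-Off state), then install and start it again.
- The ESP-IDF driver does not start bus-off recovery automatically, so the application must handle TWAI_ALERT_BUS_OFF explicitly to keep control over when the node rejoins the bus.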
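Put in order, an application's bus-off handling is a small state machine: detect Bus-Off, call twai_initiate_recovery(), wait for TWAI_ALERT_BUS_RECOVERED, then call twai_start(). The host-testable sketch below models only that decision logic with hypothetical enums and a hypothetical `next_action()` function; it assumes twai_start() succeeds. In a real handler task each returned action would be mapped to the corresponding driver call.

```c
/* Hypothetical application-side view of the driver state. */
typedef enum { APP_RUNNING, APP_RECOVERING } app_twai_state_t;

/* Hypothetical events, e.g. derived from twai_read_alerts() results. */
typedef enum { EV_NONE, EV_BUS_OFF_ALERT, EV_BUS_RECOVERED_ALERT } app_event_t;

/* Driver call the handler task should make next. */
typedef enum { ACT_NONE, ACT_INITIATE_RECOVERY, ACT_START_DRIVER } app_action_t;

/* Updates *state for the event and returns the action to take:
 *   ACT_INITIATE_RECOVERY -> twai_initiate_recovery()
 *   ACT_START_DRIVER      -> twai_start() */
app_action_t next_action(app_twai_state_t *state, app_event_t ev)
{
    if (*state == APP_RUNNING && ev == EV_BUS_OFF_ALERT) {
        /* Bus-Off: request recovery, then wait for 128 x 11 recessive bits */
        *state = APP_RECOVERING;
        return ACT_INITIATE_RECOVERY;
    }
    if (*state == APP_RECOVERING && ev == EV_BUS_RECOVERED_ALERT) {
        /* Driver is Stopped after recovery; restarting returns it to Running */
        *state = APP_RUNNING;
        return ACT_START_DRIVER;
    }
    return ACT_NONE; /* nothing to do for any other combination */
}
```

Keeping this logic separate from the driver calls makes it easy to unit-test on a host and to extend with policies such as a limit on recovery attempts.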
3. Practical Examples
Example 1: Monitoring TWAI Status and Error Counters
This example demonstrates how to periodically read and log the TWAI controller’s status, including its state and error counters.
Prerequisites:
- ESP-IDF v5.x project.
- TWAI driver configured (e.g., TWAI_MODE_NORMAL or TWAI_MODE_SELF_TEST) and started.
Code Snippet:
#include <stdio.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/twai.h"
#include "esp_log.h"

static const char *TAG = "TWAI_STATUS_MONITOR";

#define TWAI_TX_GPIO_NUM CONFIG_EXAMPLE_TWAI_TX_GPIO
#define TWAI_RX_GPIO_NUM CONFIG_EXAMPLE_TWAI_RX_GPIO
// Declare these options in your project's Kconfig.projbuild and set their
// values in sdkconfig.defaults (or via menuconfig), for example:
// CONFIG_EXAMPLE_TWAI_TX_GPIO=21
// CONFIG_EXAMPLE_TWAI_RX_GPIO=22

static void twai_status_monitor_task(void *pvParameters)
{
    twai_status_info_t status_info;
    const TickType_t xDelay = pdMS_TO_TICKS(1000); // Check status every 1 second

    while (1) {
        if (twai_get_status_info(&status_info) == ESP_OK) {
            const char *state_str;
            switch (status_info.state) {
                case TWAI_STATE_STOPPED:    state_str = "STOPPED";    break;
                case TWAI_STATE_RUNNING:    state_str = "RUNNING";    break;
                case TWAI_STATE_BUS_OFF:    state_str = "BUS-OFF";    break;
                case TWAI_STATE_RECOVERING: state_str = "RECOVERING"; break;
                default:                    state_str = "UNKNOWN";    break;
            }
            // PRIu32 keeps the format portable for uint32_t fields
            ESP_LOGI(TAG, "TWAI Status: State=%s, TXQ=%" PRIu32 ", RXQ=%" PRIu32 ", TEC=%" PRIu32 ", REC=%" PRIu32
                          ", TX_failed=%" PRIu32 ", RX_missed=%" PRIu32 ", Arb_lost=%" PRIu32 ", Bus_errs=%" PRIu32,
                     state_str,
                     status_info.msgs_to_tx,
                     status_info.msgs_to_rx,
                     status_info.tx_error_counter,
                     status_info.rx_error_counter,
                     status_info.tx_failed_count,
                     status_info.rx_missed_count,
                     status_info.arb_lost_count,
                     status_info.bus_error_count);
            if (status_info.state == TWAI_STATE_BUS_OFF) {
                ESP_LOGW(TAG, "Controller is BUS-OFF! Call twai_initiate_recovery() to rejoin the bus.");
            }
        } else {
            ESP_LOGE(TAG, "Failed to get TWAI status.");
        }
        vTaskDelay(xDelay);
    }
}

void app_main(void)
{
    ESP_LOGI(TAG, "TWAI Error Handling Example: Status Monitor");

    // Basic TWAI configuration (use SELF_TEST for simplicity if no external bus)
    twai_general_config_t g_config = TWAI_GENERAL_CONFIG_DEFAULT(TWAI_TX_GPIO_NUM, TWAI_RX_GPIO_NUM, TWAI_MODE_SELF_TEST);
    twai_timing_config_t t_config = TWAI_TIMING_CONFIG_125KBITS();
    twai_filter_config_t f_config = TWAI_FILTER_CONFIG_ACCEPT_ALL();

    ESP_LOGI(TAG, "Installing TWAI driver...");
    if (twai_driver_install(&g_config, &t_config, &f_config) != ESP_OK) {
        ESP_LOGE(TAG, "Failed to install TWAI driver.");
        return;
    }

    ESP_LOGI(TAG, "Starting TWAI driver...");
    if (twai_start() != ESP_OK) {
        ESP_LOGE(TAG, "Failed to start TWAI driver.");
        twai_driver_uninstall();
        return;
    }
    ESP_LOGI(TAG, "TWAI driver started.");

    xTaskCreate(twai_status_monitor_task, "twai_status_task", 4096, NULL, 5, NULL);

    // To observe TEC/REC changes, you need to induce errors.
    // In SELF_TEST mode without external factors, TEC/REC should ideally remain 0.
    // If you have an external bus, try disconnecting termination or another node to see errors.
}
Build and Run:
- Set the TX/RX GPIO options in KConfig.
- Build, flash, and monitor.
- Observe: The task periodically prints the TWAI status. In a healthy self-test loop or on a well-behaved bus, TEC and REC should be low or zero, and the state should be RUNNING.
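The `*_count` fields of twai_status_info_t are cumulative, so a single snapshot says little about current bus health. One option is to log how much each counter changed between consecutive samples. The sketch below uses a hypothetical `status_snapshot_t` that copies a subset of those fields, so it builds and tests without ESP-IDF; in Example 1 you could fill it from twai_get_status_info() on each iteration.

```c
#include <stdint.h>

/* Hypothetical copy of the cumulative counters in twai_status_info_t. */
typedef struct {
    uint32_t tx_failed_count;
    uint32_t rx_missed_count;
    uint32_t arb_lost_count;
    uint32_t bus_error_count;
} status_snapshot_t;

/* Per-interval increase of each cumulative counter. Unsigned subtraction
 * still gives the right answer if a counter wraps once between samples. */
status_snapshot_t status_delta(const status_snapshot_t *prev, const status_snapshot_t *now)
{
    status_snapshot_t d = {
        .tx_failed_count = now->tx_failed_count - prev->tx_failed_count,
        .rx_missed_count = now->rx_missed_count - prev->rx_missed_count,
        .arb_lost_count  = now->arb_lost_count  - prev->arb_lost_count,
        .bus_error_count = now->bus_error_count - prev->bus_error_count,
    };
    return d;
}
```

A bus_error_count that keeps rising from one interval to the next while TEC and REC are still low can be an early sign of a marginal physical layer.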
Example 2: Handling Bus-Off Alert and Initiating Recovery
Figure: Bus-Off detection and recovery flow. The application enables TWAI_ALERT_BUS_OFF in g_config.alerts_enabled, and an alert handler task blocks in twai_read_alerts(&alerts, timeout). If TWAI_ALERT_BUS_OFF is set, the task logs the event and calls twai_initiate_recovery(); on success the controller enters the RECOVERING state (visible through twai_get_status_info()) and is STOPPED once recovery completes, until twai_start() is called. If initiating recovery fails, the driver may need to be uninstalled and reinstalled. Other enabled alerts (e.g., ERR_PASS, BUS_ERROR) are handled separately, and an optional status-monitor task periodically logs TEC, REC, and state.
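This example enables the TWAI_ALERT_BUS_OFF alert. When the alert is triggered, it calls twai_initiate_recovery(). In ESP-IDF, a completed recovery leaves the driver in the Stopped state, signaled by TWAI_ALERT_BUS_RECOVERED, so twai_start() must be called before the node can transmit again.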
#include <stdio.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/twai.h"
#include "esp_err.h"
#include "esp_log.h"

static const char *TAG = "TWAI_BUS_OFF_RECOVERY";

#define TWAI_TX_GPIO_NUM CONFIG_EXAMPLE_TWAI_TX_GPIO
#define TWAI_RX_GPIO_NUM CONFIG_EXAMPLE_TWAI_RX_GPIO
// KConfig (same options as Example 1)
// CONFIG_EXAMPLE_TWAI_TX_GPIO=21
// CONFIG_EXAMPLE_TWAI_RX_GPIO=22

static void twai_alert_handler_task(void *pvParameters)
{
    uint32_t alerts_triggered;
    twai_status_info_t status_info;
    ESP_LOGI(TAG, "TWAI Alert Handler Task started.");

    while (1) {
        // Block until an alert is triggered (portMAX_DELAY is already in ticks)
        if (twai_read_alerts(&alerts_triggered, portMAX_DELAY) == ESP_OK) {
            ESP_LOGI(TAG, "Alerts triggered: 0x%08" PRIX32, alerts_triggered);

            if (alerts_triggered & TWAI_ALERT_BUS_OFF) {
                ESP_LOGE(TAG, "ALERT: Bus-Off event detected!");
                ESP_LOGI(TAG, "Attempting to initiate recovery...");
                if (twai_initiate_recovery() == ESP_OK) {
                    // Controller now waits for 128 occurrences of 11 recessive bits
                    ESP_LOGI(TAG, "Bus-Off recovery initiated.");
                } else {
                    ESP_LOGE(TAG, "Failed to initiate Bus-Off recovery.");
                }
            }
            if (alerts_triggered & TWAI_ALERT_BUS_RECOVERED) {
                // Recovery leaves the driver in the Stopped state: restart it
                ESP_LOGI(TAG, "ALERT: Bus recovered, restarting TWAI driver.");
                if (twai_start() != ESP_OK) {
                    ESP_LOGE(TAG, "Failed to restart TWAI driver after recovery.");
                }
            }
            if (alerts_triggered & TWAI_ALERT_ERR_PASS) {
                ESP_LOGW(TAG, "ALERT: Entered Error Passive state.");
            }
            if (alerts_triggered & TWAI_ALERT_ABOVE_ERR_WARN) {
                ESP_LOGW(TAG, "ALERT: TEC or REC above the error warning limit.");
            }
            if (alerts_triggered & TWAI_ALERT_BUS_ERROR) {
                ESP_LOGW(TAG, "ALERT: Bus error detected.");
                if (twai_get_status_info(&status_info) == ESP_OK) {
                    ESP_LOGW(TAG, "Current TEC: %" PRIu32 ", REC: %" PRIu32,
                             status_info.tx_error_counter, status_info.rx_error_counter);
                }
            }
            // Add more alert checks as needed
        }
    }
}

void app_main(void)
{
    ESP_LOGI(TAG, "TWAI Bus-Off Recovery Example");

    twai_general_config_t g_config = TWAI_GENERAL_CONFIG_DEFAULT(TWAI_TX_GPIO_NUM, TWAI_RX_GPIO_NUM, TWAI_MODE_NORMAL);
    // Enable alerts for Bus-Off, recovery completion, Error Passive, error warning, and general bus errors
    g_config.alerts_enabled = TWAI_ALERT_BUS_OFF | TWAI_ALERT_BUS_RECOVERED | TWAI_ALERT_ERR_PASS |
                              TWAI_ALERT_BUS_ERROR | TWAI_ALERT_ABOVE_ERR_WARN;
    twai_timing_config_t t_config = TWAI_TIMING_CONFIG_125KBITS();
    twai_filter_config_t f_config = TWAI_FILTER_CONFIG_ACCEPT_ALL();

    ESP_LOGI(TAG, "Installing TWAI driver with alerts enabled...");
    if (twai_driver_install(&g_config, &t_config, &f_config) != ESP_OK) {
        ESP_LOGE(TAG, "Failed to install TWAI driver.");
        return;
    }

    ESP_LOGI(TAG, "Starting TWAI driver...");
    if (twai_start() != ESP_OK) {
        ESP_LOGE(TAG, "Failed to start TWAI driver.");
        twai_driver_uninstall();
        return;
    }
    ESP_LOGI(TAG, "TWAI driver started.");

    xTaskCreate(twai_alert_handler_task, "twai_alert_task", 4096, NULL, 10, NULL);
    // Optional: copy twai_status_monitor_task() from Example 1 into this file, then start it:
    // xTaskCreate(twai_status_monitor_task, "twai_status_task", 4096, NULL, 5, NULL);

    ESP_LOGI(TAG, "System running. To test Bus-Off, you need severe and persistent transmit errors.");
    ESP_LOGI(TAG, "For example, short CAN_H to CAN_L on a test bus (never on a critical system!).");
    ESP_LOGI(TAG, "Transmitting with no other node to ACK only reaches Error Passive (TEC stops at 128).");

    // Example: transmit repeatedly with no acknowledging node (ACK errors).
    // Per the CAN fault confinement rules, TEC rises to 128 (Error Passive) and then stops,
    // so this alone should not reach Bus-Off; combine it with a bus fault that causes bit errors.
    // Use with caution and on a test bus only.
    /*
    vTaskDelay(pdMS_TO_TICKS(5000)); // Wait for tasks to start
    ESP_LOGI(TAG, "Transmitting repeatedly to trigger ACK errors...");
    twai_message_t dummy_msg = {
        .identifier = 0x7FF,     // Low priority ID, standard frame
        .data_length_code = 1,
        .data = {0x55},
    };
    for (int i = 0; i < 500; i++) { // Transmit many times
        esp_err_t tx_res = twai_transmit(&dummy_msg, pdMS_TO_TICKS(10));
        if (tx_res != ESP_OK && tx_res != ESP_ERR_TIMEOUT) {
            ESP_LOGE(TAG, "TX error during stress test: %s", esp_err_to_name(tx_res));
            twai_status_info_t current_status;
            if (twai_get_status_info(&current_status) == ESP_OK && current_status.state == TWAI_STATE_BUS_OFF) {
                ESP_LOGE(TAG, "Entered BUS-OFF during stress test. Recovery should be triggered by alert task.");
                break;
            }
        }
        vTaskDelay(pdMS_TO_TICKS(5)); // Small delay between transmissions
    }
    ESP_LOGI(TAG, "Finished transmission stress test.");
    */
}
Build and Run:
- This example is best tested on a physical CAN bus where you can induce errors.
- If you run the commented-out transmission loop in TWAI_MODE_NORMAL with no other node to acknowledge (or with faulty termination), the ESP32 will see ACK errors. Its TEC rises by 8 per failed frame until it reaches 128 and the node becomes Error Passive. From then on, the CAN rules no longer count these ACK errors, so TEC should plateau there rather than reach Bus-Off. To exercise the Bus-Off path, you also need bit errors, for example by shorting CAN_H to CAN_L on a test bus.
- Observe: Logs from twai_alert_handler_task indicating TWAI_ALERT_BUS_OFF and the recovery attempt. A status monitor task, if running, shows the state changing to BUS-OFF and then RECOVERING; when recovery completes the driver enters STOPPED, and it returns to RUNNING only after twai_start() is called.
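Why a lone node stops at Error Passive follows from one exception in the counter rules: an Error Passive transmitter that detects an ACK Error and sees no dominant bit while sending its Passive Error Flag does not increment TEC. The simulation below applies only that rule set (a simplified model with a hypothetical function name) to a series of unacknowledged frames.

```c
#include <stdbool.h>
#include <stdint.h>

/* TEC after n_frames transmissions that all end in an ACK Error on a bus
 * with no other active node (so no dominant bit appears during a Passive
 * Error Flag). Error Active: TEC += 8. Error Passive: TEC unchanged. */
uint16_t tec_after_unacked_frames(int n_frames)
{
    uint16_t tec = 0;
    for (int i = 0; i < n_frames; i++) {
        bool error_passive = tec >= 128;
        if (!error_passive) {
            tec = (uint16_t)(tec + 8);
        }
        /* Error Passive: ACK Error exception, TEC stays where it is */
    }
    return tec;
}
```

The node reaches Error Passive after 16 failed frames and then plateaus at TEC = 128, well short of the Bus-Off threshold of 256. Reaching Bus-Off in a test therefore needs errors outside this exception, such as the bit errors a shorted bus produces.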
4. Variant Notes
- Core Error Mechanisms: The fundamental CAN error types (Bit, Stuff, CRC, Form, ACK), error states (Active, Passive, Bus-Off), and TEC/REC behavior are defined by the CAN standard and are implemented consistently by the TWAI peripheral across the ESP32 variants listed here (ESP32, S2, S3, C3, C6, H2).
- Alerts and Status: The TWAI_ALERT_... flags and the twai_status_info_t structure provided by the ESP-IDF driver are also generally consistent across these variants for the core error-handling features.
- Bus-Off Recovery Implementation: twai_initiate_recovery() is the standard way to recover; uninstalling and reinstalling the driver is the fallback. The hardware recovery sequence itself (128 occurrences of 11 consecutive recessive bits) follows the CAN specification.
- Hardware Auto-Recovery: Some CAN controllers offer automatic bus-off recovery. The ESP-IDF TWAI driver does not start recovery on its own: the controller stays Bus-Off until the application calls twai_initiate_recovery() (or reinstalls the driver), which gives the application deterministic control over when the node rejoins the bus.
5. Common Mistakes & Troubleshooting Tips
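Mistake / Issue | Symptom(s) | Troubleshooting / Solution |
---|---|---|
Ignoring Error Counters and Status | Application is unaware of degrading bus health; the node may suddenly go Bus-Off without prior warning to the application. TEC/REC values are not monitored. | Periodically call twai_get_status_info() and enable alerts such as TWAI_ALERT_ABOVE_ERR_WARN, TWAI_ALERT_ERR_PASS, and TWAI_ALERT_BUS_OFF so rising error counts are logged before the node goes offline. |
No Bus-Off Recovery Implemented | Node enters the Bus-Off state and remains offline indefinitely. No attempt is made to rejoin the bus. | Handle TWAI_ALERT_BUS_OFF by calling twai_initiate_recovery(), then call twai_start() once TWAI_ALERT_BUS_RECOVERED is raised (or uninstall and reinstall the driver). |
Persistent Physical Layer Problems | Node repeatedly goes Bus-Off despite recovery attempts. High TEC/REC values. Frequent bus error alerts. | Check wiring, connectors, and transceiver power; verify a 120 Ω termination resistor at each end of the bus; limit automatic recovery attempts so a broken bus is not flooded with retries. |
Mismatched Baud Rates Between Nodes | Frequent Form, CRC, Bit, and ACK errors. Communication is unreliable or fails completely. Nodes may enter Error Passive or Bus-Off. | Make sure every node uses the same bit rate and compatible bit timing (e.g., the same TWAI_TIMING_CONFIG_... macro on all ESP32 nodes). |
Misinterpreting twai_transmit() Failures in Error States | Application keeps trying to transmit while the controller is Bus-Off, leading to repeated ESP_ERR_INVALID_STATE. | Treat ESP_ERR_INVALID_STATE as a sign the driver is not running: check the state with twai_get_status_info(), pause transmissions, and retry only after recovery and twai_start() have completed. |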
6. Exercises
- Enhanced Bus-Off Recovery:
- Modify Example 2 (twai_alert_handler_task). If twai_initiate_recovery() is called and the node subsequently goes Bus-Off again within a short period (e.g., 3 times in 1 minute), implement a strategy where the application stops trying to recover automatically for a longer duration (e.g., 5 minutes) and logs a critical “Persistent Bus Failure” message. This prevents rapid, continuous recovery attempts on a fundamentally broken bus.
- TEC/REC Threshold Warning:
- Using the twai_status_monitor_task from Example 1, add logic to issue a specific warning log if either TEC or REC exceeds a threshold (e.g., 64, which is halfway to the Error Passive limit of 128). This can serve as an early indicator of degrading bus quality. The TWAI_ALERT_ABOVE_ERR_WARN alert (default limit 96) can also be used for this.
- Simulate ACK Errors and Observe TEC:
- Set up your ESP32 in TWAI_MODE_NORMAL. Ensure no other CAN nodes are connected and acknowledging, or deliberately remove bus termination from one end (on a safe test bus only!).
- Write a task that attempts to transmit a CAN message repeatedly (e.g., 100 times with a small delay between each).
- In the twai_status_monitor_task, observe how the TEC increases due to ACK errors. Does the node transition to Error Passive? Can you make it go Bus-Off with ACK errors alone? (Hint: check the ACK Error exception for Error Passive transmitters in the TEC/REC rules. Be careful with stressing hardware if issues persist.)
- Important Safety Note: Modifying bus termination should only be done on an isolated test setup, not a production or critical CAN network.
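As a starting point for the first exercise, the policy part (count Bus-Off events in a sliding window and pause automatic recovery when the limit is reached) can be written and tested independently of the driver. This is one possible sketch, not the only design: the names, the millisecond timestamps (for example esp_timer_get_time() / 1000 in ESP-IDF), and the limits taken from the exercise (3 events in 60 s, 5-minute pause) are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUSOFF_LIMIT 3        /* Bus-Off events allowed inside the window */
#define WINDOW_MS    60000u   /* sliding window length */
#define BACKOFF_MS   300000u  /* pause after the limit is hit */

typedef struct {
    uint64_t events[BUSOFF_LIMIT]; /* timestamps of the most recent Bus-Off events */
    int count;                     /* valid entries in events[] */
    uint64_t paused_until_ms;      /* no automatic recovery before this time */
} recovery_policy_t;

/* Call on each Bus-Off event. Returns true if recovery should be initiated
 * now, false if the policy is (or just went) into its back-off period. */
bool policy_on_bus_off(recovery_policy_t *p, uint64_t now_ms)
{
    if (now_ms < p->paused_until_ms) {
        return false;
    }
    if (p->count == BUSOFF_LIMIT) { /* drop the oldest event */
        for (int i = 1; i < BUSOFF_LIMIT; i++) {
            p->events[i - 1] = p->events[i];
        }
        p->count--;
    }
    p->events[p->count++] = now_ms;
    if (p->count == BUSOFF_LIMIT && now_ms - p->events[0] <= WINDOW_MS) {
        p->paused_until_ms = now_ms + BACKOFF_MS; /* persistent failure: back off */
        p->count = 0;
        return false;
    }
    return true;
}
```

When it returns false, the handler would log the “Persistent Bus Failure” message and, because no new Bus-Off alert arrives while the node stays offline, call policy_on_bus_off() again itself once paused_until_ms has passed.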
7. Summary
- CAN incorporates robust error detection for Bit, Stuff, CRC, Form, and ACK errors.
- Detected errors lead to the transmission of Error Frames (Active or Passive) to alert other nodes.
- Transmit Error Counter (TEC) and Receive Error Counter (REC) track fault levels.
- Nodes transition between Error Active, Error Passive, and Bus-Off states based on TEC/REC values, implementing fault confinement.
- The ESP-IDF TWAI driver allows monitoring of these states and counters via twai_get_status_info().
- Error conditions can be detected using TWAI alerts (e.g., TWAI_ALERT_BUS_OFF, TWAI_ALERT_BUS_ERROR).
- Bus-Off recovery is critical and can be initiated using twai_initiate_recovery() (followed by twai_start() once recovery completes) or by reinstalling the driver.
- Persistent physical layer problems or configuration mismatches (like baud rates) are common sources of CAN errors.
8. Further Reading
- ESP-IDF TWAI API Reference:
- TWAI Driver Functions – Status and Alerts (covers twai_get_status_info, twai_read_alerts, twai_initiate_recovery).
- TWAI Data Types – twai_status_info_t, twai_state_t, alert flags.
- Bosch CAN Specification (Version 2.0 Part A/B):
- The original specification provides the definitive details on error detection, error signaling, and fault confinement. Search for “Bosch CAN Specification.”
- Application Notes on CAN Error Handling:
- Many microcontroller vendors (e.g., Microchip, NXP, TI) publish detailed application notes on CAN error handling principles and best practices.