Chapter 24: Advanced Shell Scripting: Regular Expressions (grep
, sed
)
Chapter Objectives
By the end of this chapter, you will be able to:
- Understand the fundamental concepts and syntax of regular expressions (regex) for pattern matching.
- Implement
grep
to search for complex patterns within text files and command output. - Utilize
sed
(Stream Editor) to perform sophisticated in-place text transformations and substitutions. - Construct complex regular expression patterns using metacharacters, quantifiers, and character classes.
- Debug common regular expression errors and optimize patterns for efficiency in embedded environments.
- Apply these tools to solve real-world text processing problems in embedded Linux, such as log file analysis and script automation.
Introduction
In the world of embedded Linux, you will constantly interact with text. From parsing kernel boot logs and analyzing system messages to configuring network services and automating build scripts, the ability to manipulate text efficiently is not a luxury—it is a fundamental skill. While simple string matching can solve some problems, it quickly falls short when faced with the variable and complex data streams inherent in embedded systems. This is where the true power of the Linux command line comes to life through regular expressions.
A regular expression, often shortened to regex, is a special sequence of characters that defines a search pattern. It is a language unto itself, designed for describing patterns in text with incredible precision and flexibility. Imagine needing to find all IP addresses in a log file, extract specific sensor readings that fall within a certain range, or automatically modify a configuration file during a system update. These tasks, which would be complex and error-prone with traditional methods, become straightforward with the right pattern.
This chapter introduces two of the most powerful text processing utilities in the Linux ecosystem: grep
(Global Regular Expression Print) and sed
(Stream Editor). We will explore how grep
uses regular expressions to filter text and how sed
uses them to transform it. On your Raspberry Pi 5, these tools are indispensable for debugging, automation, and system management. By mastering them, you will transition from simply using the shell to commanding it, equipped to handle any text processing challenge an embedded system can present.
Technical Background
The Genesis and Philosophy of Regular Expressions
The concept of a regular expression has its roots in theoretical computer science, specifically in automata theory and formal language theory, which study mathematical models of computation. In the 1950s, mathematician Stephen Cole Kleene formalized the description of a “regular language,” a class of languages that can be recognized by a finite automaton. The notation he developed to describe these languages was the direct ancestor of the regular expressions we use today.
This theoretical foundation found its first major practical application in the early Unix text editors developed at Bell Labs in the late 1960s and 1970s. Ken Thompson, one of the creators of Unix, implemented regular expressions in the ed
editor, allowing users to search for patterns far more complex than simple fixed strings. This utility was so powerful that it was later extracted into a standalone program named grep
, a name derived from the ed
command g/re/p
(globally search for a regular expression and print the matching lines).
timeline title The Journey of a Regular Expression 1950s : Theoretical Foundations : Stephen Kleene formalizes "regular languages" in automata theory. : The mathematical notation for these languages is the direct ancestor of modern regex. 1960s-70s : Unix & The First Implementation : Ken Thompson implements regex in the 'ed' text editor on Unix at Bell Labs. : This provides powerful pattern matching beyond simple string searching for the first time. 1974 : The Birth of 'grep' : The regex search functionality is extracted from 'ed' into a standalone tool. : It is named 'grep' from the 'ed' command g/re/p (globally search for a regular expression and print). Present Day : Ubiquitous Tool : Regex is a core component in countless tools (grep, sed, awk), editors (Vim, VS Code), and programming languages (Python, Perl, JavaScript). : It embodies the Unix philosophy: a small, specialized, and powerful tool for text processing.
The philosophy behind these tools is central to the Unix/Linux design paradigm: create small, specialized programs that do one thing well and can be combined to perform complex tasks. grep
doesn’t edit files; it only searches. sed
is not a full-featured editor; it processes streams of text. By chaining these utilities together with pipes (|
), a developer can construct sophisticated text-processing pipelines without writing a single line of compiled code. This approach is particularly valuable in embedded systems, where resources may be limited and the ability to quickly analyze system behavior via the shell is critical for debugging.
Anatomy of a Regular Expression
A regular expression is a sequence of characters, but some characters have special meanings. These metacharacters are the building blocks of a pattern. The two main categories of characters in a regex are literals (standard text characters like ‘a’, ‘b’, ‘1’, ‘2’) and metacharacters.
The most basic regex consists only of literals. For example, the pattern error
will match the literal string “error” wherever it appears. The real power, however, comes from using metacharacters to define the structure of the pattern, not just its literal content.
Anchors: Tying Matches to Positions
Sometimes, you don’t just want to know if a pattern exists, but where it exists. Anchors are metacharacters that don’t match any character but instead match a position before, after, or between characters.
- The caret (
^
) matches the beginning of a line. The pattern^ERROR
will only match the word “ERROR” if it is the very first thing on a line, which is common for system log entries. - The dollar sign (
$
) matches the end of a line. The patterncomplete$
will only match the word “complete” if it is the last thing on a line, useful for finding processes that have finished successfully.
Character Classes and Sets: Matching One of Many
Often, you need to match any character from a specific set. Character classes are used for this purpose.
- The dot (
.
) is the simplest character class. It is a wildcard that matches any single character (except a newline). The patternc.t
would match “cat”, “cot”, and “c_t”, but not “ct” or “coat”. - Bracket expressions (
[]
) allow you to define a custom set of characters. The pattern[aeiou]
will match any single lowercase vowel. For instance,gr[ae]y
will match both “gray” and “grey”. - You can also specify a range of characters within brackets.
[0-9]
matches any single digit, and[a-zA-Z]
matches any single uppercase or lowercase letter. - A caret (
^
) as the first character inside a bracket expression negates the set.[^0-9]
matches any character that is not a digit.
Tip: Many common character sets have predefined names, known as POSIX character classes. For example,
[[:digit:]]
is equivalent to[0-9]
,[[:alpha:]]
is equivalent to[a-zA-Z]
, and[[:space:]]
matches whitespace characters. These are often more portable and readable than explicit ranges.
Quantifiers: Specifying Repetition
Quantifiers control how many times a preceding character or group can occur. They turn a simple pattern into a variable one.
- The asterisk (
*
) matches the preceding element zero or more times. The patterna*b
will match “b”, “ab”, “aaab”, and so on. This is often used for optional or repeating characters. - The plus sign (
+
) matches the preceding element one or more times. It is similar to*
, but requires at least one occurrence.a+b
matches “ab” and “aaab”, but not “b”. - The question mark (
?
) makes the preceding element optional, matching it zero or one time. The patterncolou?r
matches both “color” and “colour”. - Brace expressions (
{}
) provide precise control over the number of repetitions.{n}
matches the preceding element exactly n times.[0-9]{3}
matches exactly three digits.{n,}
matches the preceding element at least n times.{n,m}
matches the preceding element at least n times but no more than m times.

Grouping and Capturing
Parentheses ()
serve two purposes in regular expressions. First, they group parts of a pattern together, allowing you to apply a quantifier to an entire sub-pattern. For example, (ab)+
will match “ab”, “abab”, “ababab”, and so on. Without the parentheses, ab+
would match “ab”, “abb”, “abbb”, etc., as the +
would only apply to the b
.
Second, and more powerfully, they capture the portion of the string that matched the sub-pattern. This captured text can be referenced later, a feature that is the cornerstone of the sed
command’s substitution capability. These are known as backreferences. The first captured group is referenced by \1
, the second by \2
, and so on.
The grep
Utility: A Regex-Powered Filter
grep
is the quintessential tool for searching text with regular expressions. In its simplest form, you provide a pattern and a file, and grep
prints every line from the file that contains a match for the pattern.
The basic syntax is grep [OPTIONS] PATTERN [FILE...]
.
While simple string searches are useful, grep
‘s power is unlocked with regex. For example, to find all lines in /var/log/syslog
that indicate a USB device was connected, you might notice that these lines often contain the string “usb” followed by a bus-device number like “1-1.2”. A regex can capture this pattern precisely:
grep "usb [0-9]-[0-9]\.[0-9]" /var/log/syslog
This command searches for the literal string “usb “, followed by a digit, a hyphen, another digit, a literal dot (which must be escaped with a backslash \.
because .
is a metacharacter), and a final digit.
grep
has several command-line options that are invaluable in embedded development:
The sed
Utility: The Stream Editor
While grep
is for finding text, sed
is for changing it. sed
reads text from a file or a pipe, applies a series of commands to each line, and prints the result to standard output. It does not modify the original file unless you explicitly tell it to with the -i
(in-place) option.
The most common sed
command is substitution, which uses a regex to find text and replaces it with something else. The syntax is s/REGEX/REPLACEMENT/FLAGS
.
Imagine you have a configuration file where you need to change a hostname from old-host
to new-pi-5
. A simple sed
command can do this:
sed 's/old-host/new-pi-5/' config.txt
This command will print the entire contents of config.txt
, but with the first occurrence of “old-host” on each line replaced. To replace all occurrences on a line, you add the g
(global) flag:
sed 's/old-host/new-pi-5/g' config.txt
The true power of sed
emerges when you combine its substitution command with captured groups from a regex. Suppose you have log entries formatted like [TIMESTAMP] LEVEL: Message
. You want to reformat them to LEVEL (TIMESTAMP): Message
. You can use capturing parentheses in the regex and backreferences in the replacement string.
graph TD A[Start] --> B["Input<br>(File or stdin)"]; B --> C{"Read one line into<br><i>pattern space</i>"}; C --> D{Apply sed script<br>e.g., 's/find/replace/'}; D --> E{"Condition Met?<br>(Pattern found, address matches, etc.)"}; E -- Yes --> F[Modify pattern space]; E -- No --> G; F --> G["Print pattern space to stdout<br>(unless suppressed by -n)"]; G --> H{"End of input?"}; H -- No --> C; H -- Yes --> I[End]; style A fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff style B fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style C fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style D fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff style E fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff style F fill:#eab308,stroke:#eab308,stroke-width:1px,color:#1f2937 style G fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style H fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff style I fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff
sed 's/^\[\([0-9.]*\)\] \([A-Z]*\): \(.*\)$/\2 (\1): \3/' app.log
Let’s break down this powerful command:
s/.../.../
: The substitution command.^\[\([0-9.]*\)\]
: This is the search pattern.^
: Matches the start of the line.\[
: Matches a literal opening bracket.\([0-9.]*\)
: This is the first capturing group (\1
). It matches and captures any sequence of digits and dots (the timestamp).\]
: Matches the closing bracket and a space.\([A-Z]*\)
: This is the second capturing group (\2
). It matches and captures a sequence of uppercase letters (the log level).:
: Matches the colon and space.\(.*\)
: This is the third capturing group (\3
). It matches and captures the rest of the line (the message).$
: Matches the end of the line.
\2 (\1): \3
: This is the replacement string.\2
: Inserts the content of the second captured group (the level).(\1)
: Inserts a space, an opening parenthesis, the content of the first captured group (the timestamp), and a closing parenthesis.: \3
: Inserts a colon, a space, and the content of the third captured group (the message).
sed
can do much more than substitution. It can delete lines (d
), print specific lines (p
), and execute commands based on address ranges (e.g., apply a command only to lines 5 through 10). This makes it a versatile tool for scripting automated changes to system files.
sequenceDiagram participant Stream as "Input Stream<br>(app.log)" participant SED as "sed Engine" participant Regex as "Regex Engine" participant Output as "Output Stream<br>(stdout)" rect rgba(248, 250, 252, 0.5) Note over Stream, SED: sed reads one line from the input stream. Stream->>SED: "[1625781600.123] ERROR: File not found" end rect rgba(248, 250, 252, 0.5) Note over SED, Regex: sed passes the line and pattern to the Regex Engine. SED->>Regex: Pattern: s/^\[\([0-9.]*\)\] \([A-Z]*\): \(.*\)$/... Regex-->>Regex: Match Attempt: Regex-->>Regex: 1. `^\[` matches "[". Regex-->>Regex: 2. `\([0-9.]*\)` matches and captures "1625781600.123" -> \1 Regex-->>Regex: 3. `\] ` matches "] ". Regex-->>Regex: 4. `\([A-Z]*\)` matches and captures "ERROR" -> \2 Regex-->>Regex: 5. `: ` matches ": ". Regex-->>Regex: 6. `\(.*\)$` matches and captures "File not found" -> \3 end rect rgba(248, 250, 252, 0.5) Note over Regex, SED: Regex Engine reports a successful match and returns the captured groups. Regex->>SED: Success!<br>Group \1: "1625781600.123"<br>Group \2: "ERROR"<br>Group \3: "File not found" end rect rgba(248, 250, 252, 0.5) Note over SED, Output: sed constructs the new string using the replacement pattern and backreferences. SED->>SED: Replacement: \2 (\1): \3 SED->>SED: Result: "ERROR (1625781600.123): File not found" SED->>Output: "ERROR (1625781600.123): File not found" end Note over Stream, Output: Process repeats for every line in the input stream.
Practical Examples
These examples assume you are logged into your Raspberry Pi 5 via SSH or using its terminal directly. The principles apply to any Linux system.
Example 1: Analyzing Kernel Boot Messages with grep
During boot, the Linux kernel prints thousands of messages to a buffer. You can view these with the dmesg
command. This firehose of information can be overwhelming, but grep
allows us to find exactly what we need.
Objective: Find all messages related to the network interface (eth0
) and extract the assigned IP address.
1. Initial Exploration:
First, let’s look at the raw output for eth0 to understand its structure.
dmesg | grep "eth0"
You might see output like this (timestamps and details will vary):
[ 1.234567] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 10.987654] bcmgenet fd580000.ethernet eth0: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 11.123123] bcmgenet fd580000.ethernet eth0: Link is Down
[ 13.456456] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
This is useful, but we want to find the IP address, which is usually assigned later by a service like dhcpcd
. Let’s search the system log instead.
2. Refining the Search with Extended Regex:
The system log (/var/log/syslog or accessible via journalctl) contains more detailed service information. We are looking for a line that says something like “eth0: leased 192.168.1.100 for 86400 seconds”. The IP address is a sequence of numbers and dots.
We will use grep -E
for extended regular expressions to make the pattern cleaner. An IP address consists of four numbers (from 1 to 3 digits each) separated by dots.
# Search syslog for lines about eth0 leasing an IP address
grep -E "eth0: leased [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /var/log/syslog
Explanation:
"eth0: leased "
is a literal string.[0-9]{1,3}
matches any digit from 0 to 9, repeated between 1 and 3 times.\.
matches a literal dot. We must escape it.
This command should return the specific line containing the IP address:
… dhcpcd[…]: eth0: leased 192.168.1.123 for 86400 seconds
3. Extracting the IP Address with -o:
We don’t need the whole line, just the IP address itself. The -o option is perfect for this. We can refine our regex to match only the IP address on the line we found.
# First, find the line, then pipe it to another grep to extract the IP
grep -E "eth0: leased" /var/log/syslog | grep -oE "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
This pipeline first selects the correct line and then the second grep
command extracts only the part of that line that matches the IP address pattern. The output will be clean:
192.168.1.123
This demonstrates how a precise regex combined with grep
options can turn a log file into a structured data source.
Example 2: Batch Renaming Files with sed
and mv
Imagine you have a set of sensor data files named with a YYYY-MM-DD
timestamp, like sensorlog_2025-07-08.txt
. You need to rename them to the DD-MM-YYYY
format, e.g., sensorlog_08-07-2025.txt
.
Objective: Rename files by reordering date components using sed
‘s capturing groups.
1. Create Sample Files:
First, let’s create some dummy files to work with.
touch sensorlog_2025-07-08.txt
touch sensorlog_2025-07-09.txt
touch sensorlog_2025-07-10.txt
ls
Output:
sensorlog_2025-07-08.txt sensorlog_2025-07-09.txt sensorlog_2025-07-10.txt
2. Construct the sed Command for Renaming:
We will loop through the files, use sed to generate the new filename, and then use mv to perform the rename. The key is the sed command that rearranges the date.
for filename in sensorlog_*.txt; do
# Use sed with capturing groups to generate the new name
# Regex: sensorlog_ (YYYY) - (MM) - (DD) .txt
# Capturing: (\1) (\2) (\3)
# Replacement: sensorlog_ \3 - \2 - \1 .txt
new_filename=$(echo "$filename" | sed -E 's/sensorlog_([0-9]{4})-([0-9]{2})-([0-9]{2})\.txt/sensorlog_\3-\2-\1.txt/')
# Echo the command first to ensure it's correct
echo mv "$filename" "$new_filename"
done
Explanation of the sed
command:
sed -E
: Use extended regex.'s/.../.../'
: The substitution command.sensorlog_
: Literal text.([0-9]{4})
: Capture group 1 (\1
): The 4-digit year.-
: Literal hyphen.([0-9]{2})
: Capture group 2 (\2
): The 2-digit month.-
: Literal hyphen.([0-9]{2})
: Capture group 3 (\3
): The 2-digit day.\.txt
: Literal “.txt”.sensorlog_\3-\2-\1.txt
: The replacement string, using backreferences to reorder the captured date parts.
graph TD subgraph "Batch Rename Script" A[Start] --> B{"Loop for each<br><i>sensorlog_*.txt</i> file"}; B -- "File found" --> C["Get current filename<br>e.g., sensorlog_2025-07-08.txt"]; C --> D{{"Generate new filename<br>using <i>sed</i> command"}}; style A fill:#1e3a8a,stroke:#1e3a8a,stroke-width:2px,color:#ffffff style B fill:#8b5cf6,stroke:#8b5cf6,stroke-width:1px,color:#ffffff style C fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style D fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style E fill:#f59e0b,stroke:#f59e0b,stroke-width:1px,color:#ffffff style F fill:#ef4444,stroke:#ef4444,stroke-width:1px,color:#ffffff style G fill:#0d9488,stroke:#0d9488,stroke-width:1px,color:#ffffff style H fill:#10b981,stroke:#10b981,stroke-width:2px,color:#ffffff D --> E{"Dry Run?<br>(using <i>echo</i>)"}; E -- "Yes (Recommended)" --> F["<b>Display Command:</b><br><i>echo mv old_name new_name</i>"]; E -- "No (Execute)" --> G["<b>Execute Command:</b><br><i>mv old_name new_name</i>"]; F --> B; G --> B; B -- "No more files" --> H[End]; end subgraph "sed Transformation" D1["<b>Input:</b><br>sensorlog_2025-07-08.txt"] D2["<b>Pattern:</b><br>s/sensorlog_([0-9]{4})-([0-9]{2})-([0-9]{2}).txt/.../"] D3["<b>Captures:</b><br>\\1 = '2025'<br>\\2 = '07'<br>\\3 = '08'"] D4["<b>Replacement:</b><br>sensorlog_\\3-\\2-\\1.txt"] D5["<b>Output:</b><br>sensorlog_08-07-2025.txt"] D1 --> D2 --> D3 --> D4 --> D5 end
3. Dry Run and Execution:
Running the script above will produce a “dry run” output, showing you the commands it would execute:
mv sensorlog_2025-07-08.txt sensorlog_08-07-2025.txt
mv sensorlog_2025-07-09.txt sensorlog_09-07-2025.txt
mv sensorlog_2025-07-10.txt sensorlog_10-07-2025.txt
Warning: Always perform a dry run using
echo
before executing commands that modify or delete files, especially when scripting. A small regex error could have unintended consequences.
Once you are confident the commands are correct, remove the echo
to perform the actual renaming.
for filename in sensorlog_*.txt; do
new_filename=$(echo "$filename" | sed -E 's/sensorlog_([0-9]{4})-([0-9]{2})-([0-9]{2})\.txt/sensorlog_\3-\2-\1.txt/')
mv "$filename" "$new_filename"
done
ls
Output:
sensorlog_08-07-2025.txt sensorlog_09-07-2025.txt sensorlog_10-07-2025.txt
Example 3: Modifying a Configuration File In-Place
Let’s say we have a simple application configuration file, app.conf
, and we need to update the SERVER_IP
and enable a DEBUG_MODE
setting for a test deployment.
Objective: Use sed
to modify configuration values directly in the file.
1. Create the Configuration File:
cat << EOF > app.conf
# Application Configuration
SERVER_IP=192.168.1.100
SERVER_PORT=8080
# Set to true to enable verbose logging
DEBUG_MODE=false
EOF
2. Modify the SERVER_IP:
We want to find the line starting with SERVER_IP= and replace whatever follows with the new IP address.
# Use the -i flag to edit the file in-place.
# The regex matches the start of the line, the key, and discards the old value.
sed -i -E 's/^SERVER_IP=.*/SERVER_IP=10.0.0.42/' app.conf
Explanation:
-i
: This is the crucial in-place edit flag. It tellssed
to write the changes back to the original file.^SERVER_IP=
: This anchor ensures we only match the line that begins withSERVER_IP=
, preventing accidental substitution if that string appeared elsewhere..*
: This matches the rest of the line (any character, zero or more times), effectively targeting the old value for replacement.
3. Modify the DEBUG_MODE:
Similarly, we can change the debug flag.
sed -i -E 's/^DEBUG_MODE=.*/DEBUG_MODE=true/' app.conf
4. Verify the Changes:
Now, view the file to confirm the modifications were successful.
cat app.conf
Output:
# Application Configuration
SERVER_IP=10.0.0.42
SERVER_PORT=8080
# Set to true to enable verbose logging
DEBUG_MODE=true
This technique is extremely common in deployment scripts for configuring an application on an embedded device without manual intervention.
Common Mistakes & Troubleshooting
Regular expressions are incredibly powerful, but their syntax is dense and can be unforgiving. Here are some common pitfalls and how to avoid them.
Exercises
- Filtering
dmesg
for USB Devices.- Objective: List all USB devices detected during boot, showing only the manufacturer and product name.
- Guidance:
- Run
dmesg | grep -i "usb"
. Look for lines containing “Product:” and “Manufacturer:”. - Create a
grep
pipeline. The firstgrep
should find lines with “Product:” or “Manufacturer:”. Use the|
operator within an ERE pattern for this (grep -E 'Product|Manufacturer'
). - Pipe the output of that command to a second
grep -oE
to extract just the text after “Product: ” or “Manufacturer: “. The pattern should match the remaining part of the line.
- Run
- Verification: The final output should be a clean list of names, without the “Product:” or “Manufacturer:” prefixes.
- Extracting Temperature Data.
- Objective: The Raspberry Pi 5’s temperature can be read from
/sys/class/thermal/thermal_zone0/temp
. The output is a number like45821
, which represents 45.821°C. Write a script that reads this file and prints “Current temperature: 45.8 C”. - Guidance:
- Use
cat
to read the file. - Pipe the output to a
sed
command. - Use capturing groups to capture the first two digits (
\1
) and the next three (\2
). The regex might look something like^([0-9]{2})([0-9]{3})$
. - The
sed
replacement string should format the output as"Current temperature: \1.\2 C"
. You can wrap this in anecho
statement.
- Use
- Verification: Running your script should produce a single, well-formatted line of text with the current CPU temperature.
- Objective: The Raspberry Pi 5’s temperature can be read from
- Commenting Out Debug Lines in a Script.
- Objective: You have a shell script with debug statements prefixed with
echo "DEBUG:"
. Create ased
command to comment out all these lines by adding a#
at the beginning. - Guidance:
- Create a sample script file (
test_script.sh
) with a few normalecho
lines and a fewecho "DEBUG:"
lines. - Use
sed -i
to perform an in-place edit. - Your regex should anchor to the beginning of the line (
^
) and look for the literal stringecho "DEBUG:"
. - The replacement string should use
&
, which is a specialsed
character that represents the entire matched pattern. The replacement will be"#&"
, which puts a#
before the original matched line.
- Create a sample script file (
- Verification: After running the command,
cat test_script.sh
should show the debug lines commented out.
- Objective: You have a shell script with debug statements prefixed with
- Validating MAC Addresses.
- Objective: Write a
grep
command that can validate if a given string is a valid MAC address (six pairs of hexadecimal characters separated by colons, e.g.,b8:27:eb:12:34:56
). - Guidance:
- Use
echo "some_string" | grep -E ...
to test your pattern. - A hexadecimal character can be represented by the character class
[0-9a-fA-F]
. - The pattern for one pair would be
[0-9a-fA-F]{2}
. The pattern for the full address would be five groups of a hex pair followed by a colon, and one final hex pair. - Use grouping
(...)
and a quantifier{5}
to repeat the first part of the pattern. The full regex will be quite long but very precise. Anchor it with^
and$
to ensure the entire line must match the pattern.
- Use
- Verification: Test your command with valid and invalid MAC addresses (e.g., with incorrect characters like ‘G’, wrong number of pairs, or wrong separators). The command should only produce output for the valid ones.
- Objective: Write a
Summary
- Regular Expressions (Regex) are a powerful language for defining text patterns, forming the basis of many advanced command-line tools.
grep
is the primary tool for searching text. It filters input line by line, printing only the lines that match a given regular expression.sed
(Stream Editor) is used for transforming text. Its most common function is substitution (s/find/replace/
), which can use captured groups from a regex for complex reformatting.- Metacharacters like
^
,$
,.
,*
,+
,?
,[]
, and()
provide the flexibility to define complex and variable patterns rather than just literal strings. - Capturing Groups
()
and Backreferences\1
,\2
are key for reordering and reusing parts of a matched string, especially insed
. - BRE vs. ERE: Using the
-E
flag withgrep
andsed
enables Extended Regular Expressions, which offer a cleaner and more powerful syntax. - Text processing is a critical skill in embedded Linux for log analysis, system configuration, and automation. Mastering
grep
andsed
is a significant step toward becoming a proficient embedded systems developer.
Further Reading
- The GNU Grep Manual: The official and most authoritative source for
grep
‘s features and options. - The GNU Sed Manual: The official documentation for the
sed
stream editor. - Regular-Expressions.info: An extremely comprehensive and well-regarded website covering all aspects of regular expressions across different flavors and tools.
- “The Art of Command Line” by Joshua Levy: A curated collection of notes and tips for using the command line effectively. It contains excellent sections on text processing.
- POSIX Basic and Extended Regular Expressions Standard: The official IEEE standard that defines the behavior of BRE and ERE, for those interested in the formal specifications.
- DigitalOcean Community Tutorials: A source of high-quality, practical tutorials on Linux tools, including many on
grep
andsed
. - Ryan’s Tutorials: A website with clear, visual explanations of many Linux commands and concepts.