Programming with Python | Chapter 15: Iterators and Generators
Chapter Objectives
- Understand Python’s iteration protocol involving the __iter__() and __next__() methods.
- Differentiate between iterables (objects that can produce an iterator) and iterators (objects that produce values one at a time).
- Understand how for loops work under the hood using iterators.
- Learn what generators are and how they simplify the creation of iterators.
- Use the yield keyword to create generator functions.
- Understand how generator functions maintain state between calls.
- Create concise generator expressions.
- Appreciate the memory efficiency benefits of using generators, especially for large sequences.
Introduction
We’ve frequently used for loops to iterate over collections like lists, tuples, strings, dictionaries, and sets. But how does the for loop actually work? Python relies on the iteration protocol, which involves special methods (__iter__ and __next__) that define how iteration should proceed. Objects that can be iterated over are called iterables, and the objects that actually produce the values during iteration are called iterators. This chapter delves into this protocol, explaining iterables and iterators. We will then introduce generators, a powerful and concise way to create iterators using functions with the yield keyword or through generator expressions. Generators are particularly useful for creating sequences lazily, meaning they produce values only when needed, making them highly memory-efficient for large datasets.
Theory & Explanation
The Iteration Protocol
At its core, iteration in Python relies on two special methods:
- __iter__(self): This method should be implemented by an iterable object (like a list, string, dictionary, etc.). When called (e.g., by a for loop or the iter() built-in function), it must return an iterator object.
- __next__(self): This method must be implemented by an iterator object. When called (e.g., by a for loop or the next() built-in function), it should return the next item in the sequence. When there are no more items, it must raise the StopIteration exception.
Iterables vs. Iterators
- Iterable: An object capable of returning its members one at a time. Examples include lists, tuples, strings, dictionaries, sets, files, and objects of any class that implements the __iter__() method (or __getitem__() for sequence-like behavior). An iterable essentially “knows” how to produce an iterator for itself.
- Iterator: An object representing a stream of data. It produces the next value in the stream when you call next() on it. It maintains its current state (which item is next). An iterator must implement the __next__() method and usually also implements __iter__(self), which simply returns self (as the iterator is its own iterator).
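The distinction can be checked directly in a few lines. The sketch below is illustrative only: the names numbers, first_pass, second_pass, and the Squares class are made up for this example. It shows that a list hands out independent iterators, that an iterator returns itself from iter(), and that a class defining only __getitem__() is still iterable.

```python
numbers = [10, 20, 30]          # a list is an iterable, not an iterator

first_pass = iter(numbers)      # each iter() call on a list returns a new iterator
second_pass = iter(numbers)
print(first_pass is second_pass)        # False: two independent iterators

print(next(first_pass))                 # 10
print(next(second_pass))                # 10 (unaffected by first_pass)

# An iterator is its own iterator: iter() hands back the same object.
print(iter(first_pass) is first_pass)   # True


class Squares:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)     # signals the end of the sequence
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
print(list(Squares()))                  # [0, 1, 4, 9]
```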
How for loops work:
[Flowchart: a for loop calls iter(my_iterable) to get an iterator object, then repeatedly calls next(iterator_object). While an item is available it is assigned to the loop variable and the loop body runs; when StopIteration is raised, the loop ends.]
When you write for item in my_iterable:, Python internally does something like this:
1. Calls iter(my_iterable) which, in turn, calls my_iterable.__iter__() to get an iterator object.
2. Enters a loop.
3. Calls next(iterator_object) which, in turn, calls iterator_object.__next__() to get the next item.
4. Assigns the returned item to the loop variable (item).
5. Executes the loop body.
6. Repeats steps 3-5 until iterator_object.__next__() raises a StopIteration exception.
7. The loop catches StopIteration and terminates gracefully.
my_list = [1, 2, 3]
# Get an iterator from the list (iterable)
my_iterator = iter(my_list) # Calls my_list.__iter__()
print(type(my_list)) # <class 'list'> (Iterable)
print(type(my_iterator)) # <class 'list_iterator'> (Iterator)
# Manually call next() on the iterator
print(next(my_iterator)) # Output: 1 (Calls my_iterator.__next__())
print(next(my_iterator)) # Output: 2
print(next(my_iterator)) # Output: 3
# Calling next() again will raise StopIteration
# print(next(my_iterator)) # Raises StopIteration exception
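To make the for loop mechanics concrete, here is a rough hand-written equivalent using while, next(), and try/except. It is a sketch of the idea rather than what the interpreter literally executes; the collected list and the multiplication by 10 are placeholders standing in for an arbitrary loop body.

```python
my_list = [1, 2, 3]

collected = []                      # hypothetical stand-in for the loop body's work
iterator_object = iter(my_list)     # get an iterator from the iterable
while True:
    try:
        item = next(iterator_object)    # fetch the next item and assign it
    except StopIteration:
        break                           # exhausted: leave the loop quietly
    collected.append(item * 10)         # the "loop body"

print(collected)  # [10, 20, 30]
```

Writing for item in my_list: collected.append(item * 10) produces the same result with none of the bookkeeping.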
Generators: A Simpler Way to Create Iterators
Manually creating a class with __iter__ and __next__ to implement an iterator can be cumbersome. Generators provide a much simpler syntax using functions and the yield keyword.
Generator Functions (yield):
A function becomes a generator function if it contains one or more yield statements.
- When a generator function is called, it doesn’t execute the function body immediately. Instead, it returns a generator object, which is a type of iterator.
- When next() is called on the generator object for the first time, the function executes from the beginning until it hits a yield statement.
- The value specified after yield is returned by next().
- Crucially, the function’s execution state (local variables, instruction pointer) is paused at the yield statement.
- Subsequent calls to next() resume execution immediately after the last yield statement, continuing until the next yield is encountered or the function terminates.
- If the function terminates (e.g., reaches the end or a return statement), the next() call that was running it raises StopIteration, as does every later next() call.
def count_up_to(max_val):
    """A generator function that yields numbers from 1 up to max_val."""
    print("Generator started...")
    i = 1
    while i <= max_val:
        print(f"Yielding {i}")
        yield i  # Pauses here, returns i, remembers state (value of i)
        i += 1
    print("Generator finished.")
    # Implicit StopIteration raised after this

# Call the generator function - returns a generator object
counter_gen = count_up_to(3)
print(f"Generator object created: {counter_gen}")

# Iterate using next()
print("Calling next() the first time:")
val1 = next(counter_gen)  # Executes until the first yield
print(f"Received: {val1}")

print("\nCalling next() the second time:")
val2 = next(counter_gen)  # Resumes after first yield, executes until second yield
print(f"Received: {val2}")

print("\nCalling next() the third time:")
val3 = next(counter_gen)  # Resumes after second yield, executes until third yield
print(f"Received: {val3}")

print("\nCalling next() the fourth time:")
try:
    next(counter_gen)  # Resumes after third yield, finishes function, raises StopIteration
except StopIteration:
    print("StopIteration caught, as expected.")

# Using a for loop (more common) - handles StopIteration automatically
print("\nUsing a for loop:")
for number in count_up_to(4):  # Each call to count_up_to() creates a new generator object
    print(f"For loop received: {number}")
Benefits of Generators:
- Memory Efficiency: Generators produce values one at a time and only when requested (lazy evaluation). They don’t store the entire sequence in memory, making them ideal for very large or potentially infinite sequences where creating a list would be impossible or inefficient.
- Simplicity: Writing a generator function with yield is often much simpler and more readable than creating a custom iterator class.
- Composability: Generators can be chained together easily to create data processing pipelines.
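The memory and composability points can be illustrated with a short sketch. The helper generators evens and scaled are hypothetical names invented for this example, and the sizes reported by sys.getsizeof() vary by Python version and platform, so only the comparison is shown rather than specific byte counts.

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]     # every square is stored up front
as_gen = (x * x for x in range(n))      # only the generator's state is stored

# getsizeof reports the container's own size, not the integers it refers to.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))   # True


def evens(numbers):
    """Pass through only the even numbers."""
    for value in numbers:
        if value % 2 == 0:
            yield value


def scaled(numbers, factor):
    """Multiply each incoming number by factor."""
    for value in numbers:
        yield value * factor


# Stages are chained; nothing runs until the final consumer pulls values.
pipeline = scaled(evens(range(10)), factor=3)
print(list(pipeline))   # [0, 6, 12, 18, 24]
```

Each stage holds only one value at a time, so the same pipeline shape would work on an input far too large to fit in a list.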
[Flowchart: calling my_gen_func(...) returns a generator object that is ready to start. Each next(generator_object) call starts or resumes the function, which runs until a yield, pauses its state, and returns the yielded value from next(). When the function ends or returns, StopIteration is raised and the generator is exhausted.]
Generator Expressions
Similar to list comprehensions, you can create simple generators on-the-fly using generator expressions. They look like list comprehensions but use parentheses () instead of square brackets [].
Syntax:
(expression for item in iterable if condition)
Generator expressions also produce values lazily, just like generator functions.
# List comprehension (builds the full list in memory)
squares_list = [x*x for x in range(5)]
print(f"List comprehension: {squares_list}") # Output: [0, 1, 4, 9, 16]
print(type(squares_list)) # <class 'list'>
# Generator expression (creates a generator object, values produced on demand)
squares_gen = (x*x for x in range(5))
print(f"Generator expression: {squares_gen}") # Output: <generator object <genexpr> at 0x...>
print(type(squares_gen)) # <class 'generator'>
# Iterate over the generator expression
print("Iterating over generator expression:")
for square in squares_gen:
    print(square)  # Output: 0, 1, 4, 9, 16 (one per line)

# Once consumed, a generator is exhausted
print("Trying to iterate again:")
for square in squares_gen:
    print(square)  # No output, generator is empty now
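Generator expressions are often passed straight to functions that consume an iterable, such as sum(), any(), or max(). The following sketch uses a made-up words list to show this; when the generator expression is the only argument, the extra pair of parentheses may be left out.

```python
words = ["iterator", "generator", "yield", "lazy"]

# The generator expression is the sole argument, so no extra parentheses are needed.
total_chars = sum(len(word) for word in words)
print(total_chars)      # 26

# any() stops pulling values as soon as it sees a true one.
has_long_word = any(len(word) > 8 for word in words)
print(has_long_word)    # True

longest = max(len(word) for word in words)
print(longest)          # 9
```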
Code Examples
Example 1: Custom Iterator Class (Fibonacci Sequence)
# fibonacci_iterator.py
class FibonacciIterator:
    """An iterator for the Fibonacci sequence up to a max value."""

    def __init__(self, max_value):
        self._max_value = max_value
        self._a = 0
        self._b = 1

    def __iter__(self):
        # The iterator object is itself iterable
        return self

    def __next__(self):
        # Calculate the next Fibonacci number
        fib = self._a
        if fib > self._max_value:
            # No more items to produce
            raise StopIteration
        # Update state for the *next* call
        self._a, self._b = self._b, self._a + self._b
        return fib  # Return the current number

# Using the custom iterator
print("Fibonacci sequence up to 50 (using custom iterator):")
fib_iter = FibonacciIterator(50)
for num in fib_iter:
    print(num, end=" ")  # Output: 0 1 1 2 3 5 8 13 21 34
print("\n")
Explanation:
- The FibonacciIterator class implements both __iter__ (returning self) and __next__.
- __init__ sets the maximum value and initializes the first two sequence numbers (_a, _b).
- __next__ calculates the current Fibonacci number (fib), checks if it exceeds the maximum (raising StopIteration if it does), updates the state (_a, _b) for the next call, and returns the current number.
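One consequence of __iter__ returning self is that a FibonacciIterator instance can be looped over only once. If repeated passes are needed, one possible approach (a sketch, not part of the example above) is a separate iterable class whose __iter__ creates a fresh iterator each time. The class name FibonacciUpTo is hypothetical, and this version writes __iter__ as a generator function for brevity.

```python
class FibonacciUpTo:
    """Hypothetical reusable iterable: each __iter__ call starts a fresh pass."""

    def __init__(self, max_value):
        self._max_value = max_value

    def __iter__(self):
        # Because this method contains yield, calling it returns a new
        # generator (an iterator) with its own a and b every time.
        a, b = 0, 1
        while a <= self._max_value:
            yield a
            a, b = b, a + b


fib = FibonacciUpTo(50)
print(list(fib))    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(list(fib))    # same output again: the object is not exhausted
```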
Example 2: Generator Function (Fibonacci Sequence)
# fibonacci_generator.py
def fibonacci_generator(max_value):
    """A generator function for the Fibonacci sequence up to a max value."""
    a, b = 0, 1
    print("Fibonacci generator started...")
    while a <= max_value:
        yield a  # Yield the current number, pause state
        a, b = b, a + b  # Update state for next iteration
    print("Fibonacci generator finished.")

# Using the generator function
print("Fibonacci sequence up to 50 (using generator):")
fib_gen = fibonacci_generator(50)  # Creates the generator object
for num in fib_gen:  # Iteration pulls values via yield
    print(num, end=" ")  # Output: 0 1 1 2 3 5 8 13 21 34
print("\n")
Explanation:
- This achieves the same result as Example 1 but with much less code.
- The fibonacci_generator function uses a while loop and yield a.
- The state (a, b) is automatically saved at each yield and restored when execution resumes.
- StopIteration is raised automatically when the while loop condition becomes false and the function ends.
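Because values are produced on demand, a generator does not need an upper bound at all. The sketch below assumes a hypothetical unbounded variant named fibonacci_forever and uses itertools.islice() from the standard library to take only the first ten values.

```python
from itertools import islice


def fibonacci_forever():
    """Hypothetical unbounded variant: yields Fibonacci numbers without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice stops after 10 values, so the infinite loop never runs away.
first_ten = list(islice(fibonacci_forever(), 10))
print(first_ten)    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Calling list(fibonacci_forever()) without a limit would never finish, so an unbounded generator should always be paired with something that stops consuming it.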
Example 3: Generator Expression for File Processing
# process_log_lazy.py
import os

# Create a dummy log file
log_filename = "app.log"
try:
    with open(log_filename, "w") as f:
        f.write("INFO: Application started\n")
        f.write("DEBUG: Connecting to database\n")
        f.write("ERROR: Failed to load resource X\n")
        f.write("INFO: User logged in\n")
        f.write("WARNING: Disk space low\n")
        f.write("ERROR: Connection timed out\n")
except IOError:
    print("Error creating dummy log file.")

# Use a generator expression to find error lines without loading the whole file
try:
    with open(log_filename, "r") as f:
        # Generator expression: creates a generator object
        # Processes lines only as requested by the loop
        error_lines_gen = (line.strip() for line in f if line.startswith("ERROR"))
        print("--- Error lines found (processed lazily) ---")
        for error_line in error_lines_gen:  # Pulls lines from the generator
            print(error_line)
except FileNotFoundError:
    print(f"Error: Log file '{log_filename}' not found.")
except IOError as e:
    print(f"Error reading log file: {e}")
finally:
    # Clean up dummy file
    if os.path.exists(log_filename):
        # os.remove(log_filename)  # Uncomment to remove after run
        pass
Explanation:
- A dummy log file is created.
- error_lines_gen = (line.strip() for line in f if line.startswith("ERROR")) creates a generator expression.
- This expression doesn’t read the whole file immediately. It yields lines starting with “ERROR” one by one, only when the for loop asks for the next item.
- This is very memory-efficient for large log files, as only one line (plus the generator’s internal state) needs to be in memory at a time during the filtering process.
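Laziness has a practical consequence here: the generator must be consumed while the file is still open. The sketch below illustrates this with a throwaway temporary file; the file contents and the names log_path, errors, and failed are invented for the example. Pulling a value inside the with block works, while pulling one after the block has closed the file raises ValueError.

```python
import os
import tempfile

# Write a small throwaway log file (the contents are made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO: started\nERROR: first failure\nERROR: second failure\n")
    log_path = tmp.name

with open(log_path, "r") as f:
    errors = (line.strip() for line in f if line.startswith("ERROR"))
    first_error = next(errors)      # fine: the file is still open here

print(first_error)                  # ERROR: first failure

# The with block has closed the file, so the generator can no longer read from it.
try:
    next(errors)
    failed = False
except ValueError as exc:
    failed = True
    print(f"Cannot continue: {exc}")

os.remove(log_path)                 # clean up the temporary file
```

If the results are needed after the file is closed, one option is to materialize them inside the with block, for example with list(errors), at the cost of holding them all in memory.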
Common Mistakes or Pitfalls
- Confusing Iterables and Iterators: Trying to call next() on an iterable directly (e.g., next(my_list)) instead of first getting an iterator using iter(my_list). This raises a TypeError, because a list is not an iterator.
- Exhausting Iterators/Generators: Forgetting that iterators and generators can typically only be iterated over once. After the first full iteration, they are empty. If you need to iterate multiple times, you usually need to get a new iterator (iter(iterable)) or recreate the generator object by calling the generator function again or re-evaluating the generator expression.
- yield vs. return in Generators: A return statement in a generator function ends the iteration by raising StopIteration. With return value, the value is not produced as an item of the sequence; it is attached to the StopIteration exception (as its value attribute). yield is used to produce values for the iteration sequence while pausing execution.
- Overusing Generators: While memory-efficient, if you do need the entire sequence readily available (e.g., to sort it, access elements by index randomly), creating a list might be more appropriate than using a generator.
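The first three pitfalls can be reproduced in a few lines. This is a minimal sketch; the generator two_then_done and the variable names are hypothetical and exist only to demonstrate the behavior.

```python
my_list = [1, 2, 3]

# Pitfall: a list is iterable but is not itself an iterator.
try:
    next(my_list)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Pitfall: a generator is empty after one full pass.
squares = (x * x for x in my_list)
print(list(squares))    # [1, 4, 9]
print(list(squares))    # []


def two_then_done():
    """Hypothetical generator that yields twice and then returns a value."""
    yield 1
    yield 2
    return "finished"   # not yielded; stored on the StopIteration exception


gen = two_then_done()
print(list(gen))        # [1, 2]  ("finished" is not part of the sequence)

gen = two_then_done()
next(gen)
next(gen)
try:
    next(gen)
except StopIteration as stop:
    return_value = stop.value
print(return_value)     # finished
```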
Chapter Summary
Concept | Key Method(s) / Syntax | Description |
---|---|---|
Iteration Protocol | __iter__(), __next__() | The mechanism Python uses for iteration. Requires an iterable object and an iterator object. |
Iterable | __iter__() (or __getitem__()) | An object that can produce an iterator (e.g., lists, strings, dicts, files). It knows how to start an iteration. |
Iterator | __next__(), __iter__() (returns self) | An object that produces the next value in a sequence when next() is called. Maintains state. Raises StopIteration when exhausted. |
for loop | for item in iterable: | Automatically uses the iteration protocol: calls iter() on the iterable, then repeatedly calls next() on the iterator until StopIteration occurs. |
Generator Function | def func(): ... yield value | A function containing yield. When called, returns a generator object (an iterator). Execution pauses at yield and resumes on the next next() call. |
yield Keyword | yield expression | Used in generator functions. Produces a value for the iterator and pauses the function’s execution, saving its state. |
Generator Expression | (expr for item in iterable if cond) | A concise, memory-efficient way to create a generator object using syntax similar to list comprehensions (but with parentheses). Evaluated lazily. |
Lazy Evaluation / Memory Efficiency | N/A (Benefit) | Generators produce items one at a time, only when requested. They don’t store the entire sequence in memory, making them ideal for large datasets. |
Iterator Exhaustion | N/A (Behavior) | Once an iterator (including generators) has been fully consumed (raised StopIteration), it cannot be reused. A new iterator must be created. |
- Iteration in Python uses the iteration protocol: __iter__() (on iterables) returns an iterator, and __next__() (on iterators) returns the next item or raises StopIteration.
- Iterables (e.g., lists, strings) can produce iterators. Iterators produce values sequentially.
- for loops automatically handle the iteration protocol.
- Generators are a simple way to create iterators.
- Generator functions use the yield keyword to produce values and pause execution state. Calling a generator function returns a generator object (an iterator).
- Generator expressions ((expr for item in iterable)) provide a concise syntax for creating generators, similar to list comprehensions but using parentheses.
- Generators offer significant memory efficiency by producing values lazily (on demand), making them suitable for large or infinite sequences.
Exercises & Mini Projects
Exercises
- Manual Iteration: Create a list numbers = [10, 20, 30]. Get an iterator from the list using iter(). Use next() three times to print each element. Try calling next() a fourth time and wrap it in a try...except StopIteration block to catch the expected exception.
- Simple Generator Function: Write a generator function even_numbers(limit) that yields even numbers starting from 0 up to (but not including) limit. Use a for loop to iterate through the generator created by even_numbers(10) and print each number.
- Generator Expression: Use a generator expression to create a generator that yields the squares of numbers from 1 to 10. Iterate through the generator using a for loop and print each square.
- Infinite Generator (Careful!): Write a generator function count_forever(start=0) that yields numbers starting from start and increments indefinitely (using while True). Use a for loop to print the first 10 numbers produced by this generator (use a counter and break in your loop to stop it).
- File Line Generator: Write a generator function read_lines_gen(filepath) that takes a file path, opens the file, and yields each line one by one (stripping whitespace). Use this generator in a for loop to print the lines of a text file you create. Make sure to handle FileNotFoundError.
Mini Project: Generating Random Walk Coordinates
Goal: Create a generator that yields coordinates for a simple 2D random walk.
Steps:
- Import random: You’ll need random.choice() or random.randint().
- Define the Generator random_walk_2d(steps):
  - This function should take the number of steps as an argument.
  - Initialize coordinates x, y = 0, 0.
  - Use a for loop to iterate steps times (e.g., for _ in range(steps):).
  - Inside the loop:
    - Yield the current coordinates (x, y).
    - Randomly choose a direction: e.g., generate a random integer from 0 to 3 (or use random.choice(['N', 'S', 'E', 'W'])).
    - Update x or y based on the chosen direction (e.g., if direction is 0/'N', increment y; if 1/'S', decrement y; if 2/'E', increment x; if 3/'W', decrement x).
- Use the Generator:
  - Call random_walk_2d(100) to create a generator object for a 100-step walk.
  - Use a for loop to iterate through the generator.
  - Inside the loop, print the (x, y) coordinates yielded at each step.
- (Optional Enhancement): Modify the generator to also yield the step number along with the coordinates, e.g., yield step_number, (x, y).