Programming with Python | Chapter 15: Iterators and Generators

Chapter Objectives

  • Understand Python’s iteration protocol involving the __iter__() and __next__() methods.
  • Differentiate between iterables (objects that can produce an iterator) and iterators (objects that produce values one at a time).
  • Understand how for loops work under the hood using iterators.
  • Learn what generators are and how they simplify the creation of iterators.
  • Use the yield keyword to create generator functions.
  • Understand how generator functions maintain state between calls.
  • Create concise generator expressions.
  • Appreciate the memory efficiency benefits of using generators, especially for large sequences.

Introduction

We’ve frequently used for loops to iterate over sequences like lists, tuples, strings, dictionaries, and sets. But how does the for loop actually work? Python relies on the iteration protocol, which involves special methods (__iter__ and __next__) that define how iteration should proceed. Objects that can be iterated over are called iterables, and the objects that actually produce the values during iteration are called iterators. This chapter delves into this protocol, explaining iterables and iterators. We will then introduce generators, a powerful and concise way to create iterators using functions with the yield keyword or through generator expressions. Generators are particularly useful for creating sequences lazily, meaning they produce values only when needed, making them highly memory-efficient for large datasets.

Theory & Explanation

The Iteration Protocol

At its core, iteration in Python relies on two special methods:

  1. __iter__(self): This method should be implemented by an iterable object (like a list, string, dictionary, etc.). When called (e.g., by a for loop or the iter() built-in function), it must return an iterator object.
  2. __next__(self): This method must be implemented by an iterator object. When called (e.g., by a for loop or the next() built-in function), it should return the next item in the sequence. When there are no more items, it must raise the StopIteration exception.

Iterables vs. Iterators

  • Iterable: An object capable of returning its members one at a time. Examples include lists, tuples, strings, dictionaries, sets, files, and objects of any class that implements the __iter__() method (or __getitem__() for sequence-like behavior). An iterable essentially “knows” how to produce an iterator for itself.
  • Iterator: An object representing a stream of data. It produces the next value in the stream when you call next() on it. It maintains its current state (which item is next). An iterator must implement the __next__() method and usually also implements __iter__(self) which simply returns self (as the iterator is its own iterator).
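
To see this distinction in code, you can check an object against the abstract base classes in collections.abc (a small sketch; collections.abc.Iterable and collections.abc.Iterator are part of the standard library):

Python
from collections.abc import Iterable, Iterator

my_list = [1, 2, 3]

# A list is an iterable but not an iterator...
print(isinstance(my_list, Iterable))   # True
print(isinstance(my_list, Iterator))   # False

# ...while the object returned by iter() is both
list_iter = iter(my_list)
print(isinstance(list_iter, Iterable)) # True (its __iter__ returns itself)
print(isinstance(list_iter, Iterator)) # True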

How for loops work:

flowchart TD
    A[Start: for item in my_iterable:] --> B{"Call iter(my_iterable)"};
    B --> C[Get iterator_object];
    C --> D{"Call next(iterator_object)"};
    D --> E{Item available?};
    E -- Yes --> F[Assign item to item variable];
    F --> G[Execute loop body];
    G --> D;
    E -- "No (StopIteration raised)" --> H[End Loop];

    style A fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px
    style H fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px
    style B fill:#F3F4F6,stroke:#6B7280
    style D fill:#F3F4F6,stroke:#6B7280
    style E fill:#FEF9C3,stroke:#CA8A04

When you write for item in my_iterable:, Python internally does something like this:

  1. Calls iter(my_iterable) which, in turn, calls my_iterable.__iter__() to get an iterator object.
  2. Enters a loop.
  3. Calls next(iterator_object) which, in turn, calls iterator_object.__next__() to get the next item.
  4. Assigns the returned item to the loop variable (item).
  5. Executes the loop body.
  6. Repeats steps 3-5 until iterator_object.__next__() raises a StopIteration exception.
  7. The loop catches StopIteration and terminates gracefully.
Python
my_list = [1, 2, 3]

# Get an iterator from the list (iterable)
my_iterator = iter(my_list) # Calls my_list.__iter__()

print(type(my_list))     # <class 'list'> (Iterable)
print(type(my_iterator)) # <class 'list_iterator'> (Iterator)

# Manually call next() on the iterator
print(next(my_iterator)) # Output: 1 (Calls my_iterator.__next__())
print(next(my_iterator)) # Output: 2
print(next(my_iterator)) # Output: 3

# Calling next() again will raise StopIteration
# print(next(my_iterator)) # Raises StopIteration exception
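
Putting steps 1-7 together, a for loop behaves roughly like the following while loop (a simplified sketch of what the interpreter does, not the literal CPython implementation):

Python
my_list = [1, 2, 3]

# Roughly what `for item in my_list: print(item)` does internally
iterator = iter(my_list)          # Step 1: get an iterator from the iterable
while True:
    try:
        item = next(iterator)     # Step 3: ask the iterator for the next item
    except StopIteration:         # Steps 6-7: iterator exhausted
        break                     # terminate the loop gracefully
    print(item)                   # Steps 4-5: bind the item and run the loop body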

Generators: A Simpler Way to Create Iterators

Manually creating a class with __iter__ and __next__ to implement an iterator can be cumbersome. Generators provide a much simpler syntax using functions and the yield keyword.

Generator Functions (yield):

A function becomes a generator function if it contains one or more yield statements.

  • When a generator function is called, it doesn’t execute the function body immediately. Instead, it returns a generator object, which is a type of iterator.
  • When next() is called on the generator object for the first time, the function executes from the beginning until it hits a yield statement.
  • The value specified after yield is returned by next().
  • Crucially, the function’s execution state (local variables, instruction pointer) is paused at the yield statement.
  • Subsequent calls to next() resume execution immediately after the last yield statement, continuing until the next yield is encountered or the function terminates.
  • If the function terminates (e.g., it reaches the end of its body or hits a return statement), the generator raises StopIteration from that next() call and from every subsequent one.
Python
def count_up_to(max_val):
    """A generator function that yields numbers from 1 up to max_val."""
    print("Generator started...")
    i = 1
    while i <= max_val:
        print(f"Yielding {i}")
        yield i # Pauses here, returns i, remembers state (value of i)
        i += 1
    print("Generator finished.")
    # Implicit StopIteration raised after this

# Call the generator function - returns a generator object
counter_gen = count_up_to(3)
print(f"Generator object created: {counter_gen}")

# Iterate using next()
print("Calling next() the first time:")
val1 = next(counter_gen) # Executes until the first yield
print(f"Received: {val1}")

print("\nCalling next() the second time:")
val2 = next(counter_gen) # Resumes after first yield, executes until second yield
print(f"Received: {val2}")

print("\nCalling next() the third time:")
val3 = next(counter_gen) # Resumes after second yield, executes until third yield
print(f"Received: {val3}")

print("\nCalling next() the fourth time:")
try:
    next(counter_gen) # Resumes after third yield, finishes function, raises StopIteration
except StopIteration:
    print("StopIteration caught, as expected.")

# Using a for loop (more common) - handles StopIteration automatically
print("\nUsing a for loop:")
for number in count_up_to(4): # Creates a new generator object implicitly
    print(f"For loop received: {number}")

Benefits of Generators:
  • Memory Efficiency: Generators produce values one at a time and only when requested (lazy evaluation). They don’t store the entire sequence in memory, making them ideal for very large or potentially infinite sequences where creating a list would be impossible or inefficient.
  • Simplicity: Writing a generator function with yield is often much simpler and more readable than creating a custom iterator class.
  • Composability: Generators can be chained together easily to create data processing pipelines.
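
As a small illustration of composability, the sketch below chains three generator functions (the stage names numbers, evens_only, and squared are made up for this example) so that each value flows through the whole pipeline one at a time, without any intermediate lists:

Python
def numbers(limit):
    """Yield integers from 0 up to (but not including) limit."""
    for n in range(limit):
        yield n

def evens_only(source):
    """Yield only the even values pulled from another iterator."""
    for n in source:
        if n % 2 == 0:
            yield n

def squared(source):
    """Yield the square of each value pulled from another iterator."""
    for n in source:
        yield n * n

# Chain the generators: each value travels through the whole pipeline lazily
pipeline = squared(evens_only(numbers(10)))
print(list(pipeline))  # [0, 4, 16, 36, 64]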

flowchart TD
    subgraph Generator Function Call
        A["Call my_gen_func(...)"] --> B["Return generator_object (Iterator)"];
        B --> C(State: Ready to start);
    end

    subgraph Iteration Loop
        D{"Call next(generator_object)"} --> E{Resume/Start function execution};
        E --> F{"Execute until yield value"};
        F --> G["Pause function state"];
        G --> H{"Return value from next()"};
        H --> I(State: Paused after yield);
        I -- Subsequent call --> D;

        F --> J{"Function ends <br>return</br>"};
        J --> K["Raise StopIteration"];
        K --> L(State: Exhausted);
    end

    C --> D;

    style A fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px
    style L fill:#FECACA,stroke:#DC2626,stroke-width:2px
    style F fill:#A7F3D0,stroke:#047857
    style J fill:#A7F3D0,stroke:#047857
    style G fill:#FEF9C3,stroke:#CA8A04
    style I fill:#FEF9C3,stroke:#CA8A04

Generator Expressions

Similar to list comprehensions, you can create simple generators on-the-fly using generator expressions. They look like list comprehensions but use parentheses () instead of square brackets [].

Syntax:

Python
(expression for item in iterable if condition)

Generator expressions also produce values lazily, just like generator functions.

Python
# List comprehension (builds the full list in memory)
squares_list = [x*x for x in range(5)]
print(f"List comprehension: {squares_list}") # Output: [0, 1, 4, 9, 16]
print(type(squares_list)) # <class 'list'>

# Generator expression (creates a generator object, values produced on demand)
squares_gen = (x*x for x in range(5))
print(f"Generator expression: {squares_gen}") # Output: <generator object <genexpr> at 0x...>
print(type(squares_gen)) # <class 'generator'>

# Iterate over the generator expression
print("Iterating over generator expression:")
for square in squares_gen:
    print(square) # Output: 0, 1, 4, 9, 16 (one per line)

# Once consumed, a generator is exhausted
print("Trying to iterate again:")
for square in squares_gen:
    print(square) # No output, generator is empty now
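
A common idiom is to pass a generator expression directly to a function that consumes an iterable, such as sum() or any(); when the generator expression is the only argument, the surrounding parentheses can be omitted. A short sketch:

Python
# The generator expression feeds sum() one value at a time;
# no intermediate list of squares is ever built.
total = sum(x * x for x in range(1, 6))
print(total)  # 55 (1 + 4 + 9 + 16 + 25)

# any() stops pulling values as soon as it finds a match (short-circuiting)
has_big_square = any(x * x > 20 for x in range(1, 6))
print(has_big_square)  # True (the first match is x = 5, whose square is 25)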

Memory usage: list vs. generator for a large sequence. A list comprehension such as [x for x in range(1_000_000)] is evaluated eagerly and stores all one million items in memory at once. The equivalent generator expression (x for x in range(1_000_000)) is evaluated lazily, producing items one by one only when requested, so its memory footprint stays minimal.
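
You can get a rough feel for this difference with sys.getsizeof(). The exact numbers vary by platform and Python version, and getsizeof() only measures the list or generator object itself (not the integers it refers to), but the contrast is still striking:

Python
import sys

squares_list = [x * x for x in range(1_000_000)]  # eager: all items exist now
squares_gen = (x * x for x in range(1_000_000))   # lazy: nothing computed yet

print(sys.getsizeof(squares_list))  # e.g. roughly 8 MB (platform-dependent)
print(sys.getsizeof(squares_gen))   # e.g. a few hundred bytes (platform-dependent)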

Code Examples

Example 1: Custom Iterator Class (Fibonacci Sequence)

Python
# fibonacci_iterator.py

class FibonacciIterator:
    """An iterator for the Fibonacci sequence up to a max value."""
    def __init__(self, max_value):
        self._max_value = max_value
        self._a = 0
        self._b = 1

    def __iter__(self):
        # The iterator object is itself iterable
        return self

    def __next__(self):
        # Calculate the next Fibonacci number
        fib = self._a
        if fib > self._max_value:
            # No more items to produce
            raise StopIteration

        # Update state for the *next* call
        self._a, self._b = self._b, self._a + self._b
        return fib # Return the current number

# Using the custom iterator
print("Fibonacci sequence up to 50 (using custom iterator):")
fib_iter = FibonacciIterator(50)
for num in fib_iter:
    print(num, end=" ") # Output: 0 1 1 2 3 5 8 13 21 34
print("\n")

Explanation:

  • The FibonacciIterator class implements both __iter__ (returning self) and __next__.
  • __init__ sets the maximum value and initializes the first two sequence numbers (_a, _b).
  • __next__ calculates the current Fibonacci number (fib), checks if it exceeds the maximum (raising StopIteration if it does), updates the state (_a, _b) for the next call, and returns the current number.

Example 2: Generator Function (Fibonacci Sequence)

Python
# fibonacci_generator.py

def fibonacci_generator(max_value):
    """A generator function for the Fibonacci sequence up to a max value."""
    a, b = 0, 1
    print("Fibonacci generator started...")
    while a <= max_value:
        yield a # Yield the current number, pause state
        a, b = b, a + b # Update state for next iteration
    print("Fibonacci generator finished.")

# Using the generator function
print("Fibonacci sequence up to 50 (using generator):")
fib_gen = fibonacci_generator(50) # Creates the generator object
for num in fib_gen: # Iteration pulls values via yield
    print(num, end=" ") # Output: 0 1 1 2 3 5 8 13 21 34
print("\n")

Explanation:

  • This achieves the same result as Example 1 but with much less code.
  • The fibonacci_generator function uses a while loop and yield a.
  • The state (a, b) is automatically saved and restored between yield calls.
  • StopIteration is raised automatically when the while loop condition becomes false and the function ends.
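
Because generators produce values only on demand, the same pattern also works for sequences with no upper bound. The sketch below drops the max_value check and uses itertools.islice() from the standard library to take just the first ten values:

Python
from itertools import islice

def fibonacci_forever():
    """Yield Fibonacci numbers indefinitely (an infinite generator)."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice() pulls only the first 10 values; the generator never runs past that
first_ten = list(islice(fibonacci_forever(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]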

Example 3: Generator Expression for File Processing

Python
# process_log_lazy.py
import os

# Create a dummy log file
log_filename = "app.log"
try:
    with open(log_filename, "w") as f:
        f.write("INFO: Application started\n")
        f.write("DEBUG: Connecting to database\n")
        f.write("ERROR: Failed to load resource X\n")
        f.write("INFO: User logged in\n")
        f.write("WARNING: Disk space low\n")
        f.write("ERROR: Connection timed out\n")
except IOError:
    print("Error creating dummy log file.")

# Use a generator expression to find error lines without loading the whole file
try:
    with open(log_filename, "r") as f:
        # Generator expression: creates a generator object
        # Processes lines only as requested by the loop
        error_lines_gen = (line.strip() for line in f if line.startswith("ERROR"))

        print("--- Error lines found (processed lazily) ---")
        for error_line in error_lines_gen: # Pulls lines from the generator
            print(error_line)

except FileNotFoundError:
    print(f"Error: Log file '{log_filename}' not found.")
except IOError as e:
    print(f"Error reading log file: {e}")
finally:
    # Clean up dummy file
    if os.path.exists(log_filename):
        # os.remove(log_filename) # Uncomment to remove after run
        pass

Explanation:

  • A dummy log file is created.
  • error_lines_gen = (line.strip() for line in f if line.startswith("ERROR")) creates a generator expression.
  • This expression doesn’t read the whole file immediately. It yields lines starting with “ERROR” one by one only when the for loop asks for the next item.
  • This is very memory-efficient for large log files, as only one line (plus the generator’s internal state) needs to be in memory at a time during the filtering process.

Common Mistakes or Pitfalls

  • Confusing Iterables and Iterators: Trying to call next() on an iterable directly (e.g., next(my_list)) instead of first getting an iterator using iter(my_list).
  • Exhausting Iterators/Generators: Forgetting that iterators and generators can typically only be iterated over once. After the first full iteration, they are empty. If you need to iterate multiple times, you usually need to get a new iterator (iter(iterable)) or recreate the generator object by calling the generator function again or re-evaluating the generator expression.
  • yield vs. return in Generators: A return statement in a generator function ends the iteration. The generator raises StopIteration (with any returned value attached to the exception’s value attribute) rather than yielding that value. Use yield to produce values for the iteration sequence while pausing execution (see the sketch after this list).
  • Overusing Generators: While memory-efficient, if you do need the entire sequence readily available (e.g., to sort it, access elements by index randomly), creating a list might be more appropriate than using a generator.
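
To make the yield vs. return pitfall concrete, here is a minimal sketch showing that a returned value is not yielded to the caller; it travels on the StopIteration exception instead:

Python
def short_gen():
    yield 1
    return "done"   # ends the generator; "done" is NOT yielded
    yield 2         # never reached

gen = short_gen()
print(next(gen))    # 1

try:
    next(gen)
except StopIteration as exc:
    print(exc.value)  # done  (the return value rides on the exception)

# A for loop simply stops at the return; it never sees "done"
for value in short_gen():
    print(value)      # only prints 1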

Chapter Summary

Concept | Key Method(s) / Syntax | Description
Iteration Protocol | __iter__(), __next__() | The mechanism Python uses for iteration. Requires an iterable object and an iterator object.
Iterable | __iter__() (or __getitem__()) | An object that can produce an iterator (e.g., lists, strings, dicts, files). It knows how to start an iteration.
Iterator | __next__(), __iter__() (returns self) | An object that produces the next value in a sequence when next() is called. Maintains state. Raises StopIteration when exhausted.
for loop | for item in iterable: | Automatically uses the iteration protocol: calls iter() on the iterable, then repeatedly calls next() on the iterator until StopIteration occurs.
Generator Function | def func(): ... yield value | A function containing yield. When called, returns a generator object (an iterator). Execution pauses at yield and resumes on the next next() call.
yield Keyword | yield expression | Used in generator functions. Produces a value for the iterator and pauses the function’s execution, saving its state.
Generator Expression | (expr for item in iterable if cond) | A concise, memory-efficient way to create a generator object using syntax similar to list comprehensions (but with parentheses). Evaluated lazily.
Lazy Evaluation / Memory Efficiency | N/A (benefit) | Generators produce items one at a time, only when requested. They don’t store the entire sequence in memory, making them ideal for large datasets.
Iterator Exhaustion | N/A (behavior) | Once an iterator (including a generator) has been fully consumed (raised StopIteration), it cannot be reused. A new iterator must be created.
  • Iteration in Python uses the iteration protocol: __iter__() (on iterables) returns an iterator, and __next__() (on iterators) returns the next item or raises StopIteration.
  • Iterables (e.g., lists, strings) can produce iterators. Iterators produce values sequentially.
  • for loops automatically handle the iteration protocol.
  • Generators are a simple way to create iterators.
  • Generator functions use the yield keyword to produce values and pause execution state. Calling a generator function returns a generator object (an iterator).
  • Generator expressions ((expr for item in iterable)) provide a concise syntax for creating generators, similar to list comprehensions but using parentheses.
  • Generators offer significant memory efficiency by producing values lazily (on demand), making them suitable for large or infinite sequences.

Exercises & Mini Projects

Exercises

  1. Manual Iteration: Create a list numbers = [10, 20, 30]. Get an iterator from the list using iter(). Use next() three times to print each element. Try calling next() a fourth time and wrap it in a try...except StopIteration block to catch the expected exception.
  2. Simple Generator Function: Write a generator function even_numbers(limit) that yields even numbers starting from 0 up to (but not including) limit. Use a for loop to iterate through the generator created by even_numbers(10) and print each number.
  3. Generator Expression: Use a generator expression to create a generator that yields the squares of numbers from 1 to 10. Iterate through the generator using a for loop and print each square.
  4. Infinite Generator (Careful!): Write a generator function count_forever(start=0) that yields numbers starting from start and increments indefinitely (using while True). Use a for loop to print the first 10 numbers produced by this generator (use a counter and break in your loop to stop it).
  5. File Line Generator: Write a generator function read_lines_gen(filepath) that takes a file path, opens the file, and yields each line one by one (stripping whitespace). Use this generator in a for loop to print the lines of a text file you create. Make sure to handle FileNotFoundError.

Mini Project: Generating Random Walk Coordinates

Goal: Create a generator that yields coordinates for a simple 2D random walk.

Steps:

  1. Import random: You’ll need random.choice() or random.randint().
  2. Define the Generator random_walk_2d(steps):
    • This function should take the number of steps as an argument.
    • Initialize coordinates x, y = 0, 0.
    • Use a for loop to iterate steps times (e.g., for _ in range(steps):).
    • Inside the loop:
      • Yield the current coordinates (x, y).
      • Randomly choose a direction: e.g., generate a random integer from 0 to 3 (or use random.choice(['N', 'S', 'E', 'W'])).
      • Update x or y based on the chosen direction (e.g., if direction is 0/'N', increment y; if 1/'S', decrement y; if 2/'E', increment x; if 3/'W', decrement x).
  3. Use the Generator:
    • Call random_walk_2d(100) to create a generator object for a 100-step walk.
    • Use a for loop to iterate through the generator.
    • Inside the loop, print the (x, y) coordinates yielded at each step.

(Optional Enhancement): Modify the generator to also yield the step number along with the coordinates, e.g., yield step_number, (x, y).
