Programming with Python | Chapter 22: Working with CSV and JSON Data

Chapter Objectives

  • Understand the structure and common uses of CSV (Comma-Separated Values) files.
  • Use Python’s built-in csv module to read and write CSV data.
    • Read CSV files using csv.reader.
    • Write CSV files using csv.writer.
    • Read CSV files into dictionaries using csv.DictReader.
    • Write data from dictionaries to CSV files using csv.DictWriter.
  • Understand the structure and common uses of JSON (JavaScript Object Notation) data.
  • Use Python’s built-in json module to work with JSON data.
    • Parse JSON strings into Python objects using json.loads().
    • Read JSON data from files into Python objects using json.load().
    • Serialize Python objects into JSON strings using json.dumps().
    • Write Python objects to JSON files using json.dump().
  • Recognize the differences and typical use cases for CSV and JSON.

Introduction

Applications often need to exchange data with other systems or store structured data in a simple text format. Two extremely common formats for this are CSV and JSON. CSV is a simple tabular format often used for spreadsheets and basic database exports, where data values are separated by commas. JSON is a human-readable format based on JavaScript object syntax, widely used for configuration files, data exchange between web servers and clients (APIs), and storing more complex nested data structures. Python’s standard library provides dedicated modules, csv and json, to make reading, writing, and manipulating data in these formats straightforward. This chapter covers the basics of working with both CSV and JSON files in Python.

Theory & Explanation

CSV (Comma-Separated Values)

CSV is a plain text format where data is organized into rows, and values within each row are separated by a delimiter, typically a comma. The first row often contains header names for the columns.

Example data.csv:

Plaintext
Name,Department,Salary
Alice,Engineering,80000
Bob,Sales,75000
Charlie,Engineering,90000

Key Characteristics:

  • Tabular data (rows and columns).
  • Plain text.
  • Simple structure.
  • Delimiter (usually a comma, but tabs, semicolons, and other characters are also used; see the sketch after this list).
  • Good for spreadsheet-like data.
  • Not inherently suited for nested or complex data structures.
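
The delimiter is configurable when reading or writing. A minimal sketch, assuming a hypothetical semicolon-separated file named data_semicolon.csv, passes delimiter=';' to csv.reader:

Python
import csv

# Hypothetical file data_semicolon.csv, e.g.:
# Name;Department;Salary
# Alice;Engineering;80000
try:
    with open('data_semicolon.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter=';')  # tell the reader which delimiter to expect
        for row in reader:
            print(row)
except FileNotFoundError:
    print("Error: data_semicolon.csv not found.")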

The csv Module:

Python’s csv module helps handle the nuances of CSV parsing and writing, such as quoting fields that contain commas or newlines.
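
For example, a field that contains a comma is wrapped in quotes when written and unwrapped again when read back. A minimal sketch using an in-memory buffer (io.StringIO), so no file is needed:

Python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Smith, John', 'Engineering'])  # the first field contains a comma

print(buffer.getvalue())  # "Smith, John",Engineering  (the field is quoted automatically)

buffer.seek(0)
reader = csv.reader(buffer)
print(next(reader))       # ['Smith, John', 'Engineering']  (quotes are handled on read)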

Reading with csv.reader: Reads CSV data row by row, where each row is returned as a list of strings.

Python
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # newline='' is important to prevent extra blank rows
        csv_reader = csv.reader(csvfile) # Create a reader object
        header = next(csv_reader) # Read the header row
        print(f"Header: {header}")
        for row in csv_reader: # Iterate over remaining rows
            # Each 'row' is a list of strings
            print(f"Row: {row}")
            # Example: Accessing data
            # name = row[0]
            # salary = int(row[2]) # Remember values are strings!
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Note: newline='' is recommended when opening CSV files to prevent the csv module from misinterpreting line endings.

Writing with csv.writer: Writes data (typically lists or tuples) to a CSV file.

Python
import csv

header = ['Name', 'Score']
data = [
    ['Alice', 95],
    ['Bob', 88],
    ['Charlie', 72]
]

try:
    with open('scores.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(header) # Write a single row (header)
        csv_writer.writerows(data) # Write multiple rows from the list of lists
    print("Data written to scores.csv")
except Exception as e:
    print(f"An error occurred: {e}")

Reading with csv.DictReader: Reads CSV data row by row, where each row is returned as a dictionary. The keys are taken from the header row (or specified manually). This is often more convenient than csv.reader.

Python
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # Assumes first row is header
        dict_reader = csv.DictReader(csvfile)
        print("\nReading with DictReader:")
        for row_dict in dict_reader:
            # Each 'row_dict' is a dictionary
            print(row_dict)
            # Example: Accessing data by header name
            # print(f"Name: {row_dict['Name']}, Salary: {row_dict['Salary']}")
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Mermaid
graph TD
    A[Start] --> B("Open CSV file <b>data.csv</b>");
    B --> C{"Create <i>csv.DictReader(file)</i>"};
    C --> D["Read Header Row (implicitly by DictReader)"];
    D --> E{"Loop: Iterate over <i>DictReader</i>"};
    E -- Row available --> F["Get row as Dictionary <i>row_dict</i>"];
    F --> G["Process <i>row_dict[Column]</i>"];
    G --> E;
    E -- No more rows --> H(Close file);
    H --> I[End];

    style F fill:#ccf,stroke:#333,stroke-width:1px
    style G fill:#cfc,stroke:#333,stroke-width:1px

Writing with csv.DictWriter: Writes data from a list of dictionaries to a CSV file. Requires specifying the header names (fieldnames).

Python
import csv

fieldnames = ['id', 'product_name', 'price'] # Must match dictionary keys
data = [
    {'id': 101, 'product_name': 'Laptop', 'price': 1200.50},
    {'id': 102, 'product_name': 'Mouse', 'price': 25.99},
    {'id': 103, 'product_name': 'Keyboard', 'price': 75.00}
]

try:
    with open('products.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader() # Write the header row based on fieldnames
        writer.writerows(data) # Write rows from the list of dictionaries
    print("\nData written to products.csv")
except Exception as e:
    print(f"An error occurred: {e}")

JSON (JavaScript Object Notation)

JSON is a lightweight data-interchange format inspired by JavaScript object literal syntax. It’s human-readable and easy for machines to parse and generate. It’s commonly used for configuration files and transmitting data over networks (APIs).

Example config.json:

JSON
{
  "server_name": "MainServer",
  "ip_address": "192.168.1.100",
  "port": 8080,
  "is_active": true,
  "allowed_users": [
    "admin",
    "guest",
    null
  ],
  "settings": {
    "theme": "dark",
    "max_connections": 100
  }
}

Key Characteristics:

  • Uses key-value pairs (like Python dictionaries).
  • Values can be strings, numbers, booleans (true/false), arrays (like Python lists), null (like Python None), or other JSON objects (nested structures).
  • Keys are always strings enclosed in double quotes.
  • Human-readable text format.
  • Excellent for representing nested or complex data structures.

Python json Module:

The json module handles serialization (converting Python objects to JSON strings/files) and deserialization (converting JSON strings/files to Python objects).

Mapping between JSON and Python:

  • JSON object (e.g., {"key": "value"}) → Python dict
  • JSON array (e.g., [1, "two", null]) → Python list
  • JSON string (e.g., "hello") → Python str
  • JSON number, integer (e.g., 123) → Python int
  • JSON number, real (e.g., 3.14, -1.5e2) → Python float
  • JSON true → Python True
  • JSON false → Python False
  • JSON null → Python None
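
As a quick illustration of this mapping (a minimal sketch with a made-up JSON string; json.loads() is covered in detail just below), parsing a JSON document yields the corresponding Python types:

Python
import json

# Made-up JSON document exercising each JSON type
raw = '{"name": "widget", "count": 3, "price": 4.5, "active": true, "tags": ["a", "b"], "notes": null}'

parsed = json.loads(raw)
print(type(parsed))            # <class 'dict'>
print(type(parsed['count']))   # <class 'int'>
print(type(parsed['price']))   # <class 'float'>
print(type(parsed['active']))  # <class 'bool'>
print(type(parsed['tags']))    # <class 'list'>
print(parsed['notes'])         # None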

Parsing JSON Strings (json.loads): Converts a JSON formatted string into a Python object (usually a dictionary or list). loads stands for “load string”.

Python
import json

json_string = '{"name": "Alice", "age": 30, "city": "New York"}'

try:
    python_dict = json.loads(json_string)
    print(f"Parsed from string: {python_dict}")
    print(f"Name: {python_dict['name']}")
    print(f"Type: {type(python_dict)}") # <class 'dict'>
except json.JSONDecodeError as e:
    print(f"Error decoding JSON string: {e}")

Reading JSON Files (json.load): Reads JSON data from a file-like object (e.g., a file opened for reading) and parses it into a Python object.

Python
import json

# Assume 'config.json' exists with the content shown earlier
try:
    with open('config.json', 'r', encoding='utf-8') as f:
        config_data = json.load(f) # Load from file object
        print(f"\nLoaded from file: {config_data}")
        print(f"Server Port: {config_data['port']}")
        print(f"Allowed Users: {config_data['allowed_users']}")
        print(f"Theme Setting: {config_data['settings']['theme']}")
except FileNotFoundError:
    print("Error: config.json not found.")
except json.JSONDecodeError as e:
    print(f"Error decoding JSON file: {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Serializing to JSON Strings (json.dumps): Converts a Python object (dictionaries, lists, strings, numbers, booleans, None) into a JSON formatted string. dumps stands for “dump string”.

Python
import json

python_data = {
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

# Convert Python dict to JSON string
# indent=2 makes the output nicely formatted (human-readable)
json_output_string = json.dumps(python_data, indent=2)
print("\nPython object serialized to JSON string:")
print(json_output_string)

Writing to JSON Files (json.dump): Writes a Python object to a file-like object (e.g., a file opened for writing) as JSON data.

Python
import json

python_data = { # Same data as above
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

output_filename = "output_data.json"
try:
    with open(output_filename, 'w', encoding='utf-8') as f:
        # Write Python object to file as JSON
        # indent=2 for pretty printing in the file
        json.dump(python_data, f, indent=2)
    print(f"\nPython object successfully written to '{output_filename}'")
except Exception as e:
    print(f"An error occurred writing JSON file: {e}")

Mermaid
graph TD
    J[Start] --> K["Have Python Object (e.g., list of dicts <b>data_list</b>)"];
    K --> L("Open output file <b>output.json</b> in write mode");
    L --> M{"Call <i>json.dump(data_list, file, indent=2)</i>"};
    M -- Serializes & Writes --> N("JSON data written to file");
    N --> O(Close file);
    O --> P[End];

    style K fill:#f9f,stroke:#333,stroke-width:1px
    style M fill:#ccf,stroke:#333,stroke-width:1px

CSV vs. JSON

  • CSV: Best for simple, flat, tabular data. Easy to view/edit in spreadsheet software. Less flexible for complex or nested data. File size can be smaller for simple tables.
  • JSON: Best for structured data, nested data, lists, and representing object state. Native format for web APIs. More flexible structure but can be slightly more verbose than CSV for simple tables.
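
To make the comparison concrete, here is a minimal sketch (made-up records, rendered in memory) showing the same two records in both formats:

Python
import csv
import io
import json

records = [
    {'name': 'Alice', 'department': 'Engineering'},
    {'name': 'Bob', 'department': 'Sales'}
]

# CSV: flat rows under a header line
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=['name', 'department'])
writer.writeheader()
writer.writerows(records)
print("As CSV:")
print(csv_buffer.getvalue())

# JSON: a list of objects with named fields
print("As JSON:")
print(json.dumps(records, indent=2))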

Code Examples

Example 1: Reading CSV with DictReader and Writing JSON

Python
# csv_to_json_converter.py
import csv
import json

csv_filepath = 'products.csv' # Assumes this exists from Theory section
json_filepath = 'products.json'

data_list = []

# Read from CSV using DictReader
try:
    with open(csv_filepath, mode='r', newline='', encoding='utf-8') as infile:
        reader = csv.DictReader(infile)
        print(f"Reading data from {csv_filepath}...")
        for row in reader:
            # Convert price back to float (it was read as string)
            try:
                row['price'] = float(row['price'])
                row['id'] = int(row['id']) # Convert id to int
            except ValueError:
                print(f"Warning: Could not convert price/id for row: {row}")
                # Handle error as needed, e.g., skip row or set default
            data_list.append(row)
    print(f"Read {len(data_list)} records.")

except FileNotFoundError:
    print(f"Error: CSV file '{csv_filepath}' not found.")
    exit() # Stop script if input file not found
except Exception as e:
    print(f"Error reading CSV: {e}")
    exit()

# Write the list of dictionaries to JSON
try:
    with open(json_filepath, mode='w', encoding='utf-8') as outfile:
        # Use indent for readability in the output JSON file
        json.dump(data_list, outfile, indent=4)
    print(f"Data successfully written to {json_filepath}")
except Exception as e:
    print(f"Error writing JSON: {e}")


Mermaid
graph TD
    subgraph Read CSV
        R1(Open CSV File) --> R2{"<i>csv.DictReader</i>"};
        R2 --> R3{Loop through rows};
        R3 -- Each row --> R4["Get row as <b>Dict</b>"];
        R4 --> R5("Append Dict to <i>data_list</i>");
        R3 -- Done --> R6(Close CSV File);
    end

    subgraph Write JSON
        W1(Open JSON File) --> W2{"<i>json.dump(data_list, file)</i>"};
        W2 --> W3(Close JSON File);
    end

    R5 --> W1; 

    style R4 fill:#ccf,stroke:#333,stroke-width:1px
    style W2 fill:#cfc,stroke:#333,stroke-width:1px

Explanation:

  • Reads data from products.csv using csv.DictReader, automatically creating dictionaries for each row based on the header.
  • Converts price and id strings read from CSV back into float and int respectively, handling potential ValueError.
  • Appends each row dictionary to data_list.
  • Writes the entire data_list (a list of dictionaries) to products.json using json.dump with indentation for pretty printing.

Example 2: Reading JSON and Writing CSV with DictWriter

Python
# json_to_csv_converter.py
import json
import csv

json_filepath = 'products.json' # Assumes this exists from Example 1
csv_filepath = 'products_from_json.csv'

# Read JSON data from file
try:
    with open(json_filepath, mode='r', encoding='utf-8') as infile:
        print(f"Reading data from {json_filepath}...")
        data_list = json.load(infile) # data_list is now a list of dicts
        print(f"Read {len(data_list)} records.")

        if not data_list:
            print("JSON file is empty. Nothing to write to CSV.")
            exit()

except FileNotFoundError:
    print(f"Error: JSON file '{json_filepath}' not found.")
    exit()
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
    exit()
except Exception as e:
    print(f"Error reading JSON: {e}")
    exit()


# Write data to CSV using DictWriter
try:
    # Dynamically get fieldnames from the keys of the first dictionary
    # Assumes all dictionaries in the list have the same keys
    fieldnames = data_list[0].keys()

    with open(csv_filepath, mode='w', newline='', encoding='utf-8') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader() # Write header row
        writer.writerows(data_list) # Write data from list of dicts
    print(f"Data successfully written to {csv_filepath}")

except Exception as e:
    print(f"Error writing CSV: {e}")


Explanation:

  • Reads the list of dictionaries from products.json using json.load.
  • Checks if the list is empty.
  • Dynamically determines the CSV header (fieldnames) by looking at the keys of the first dictionary in the list (this assumes all dictionaries have the same structure).
  • Uses csv.DictWriter with the determined fieldnames to write the header and then all the dictionaries from data_list to products_from_json.csv.

Common Mistakes or Pitfalls

  • CSV newline='': Forgetting newline='' when opening CSV files for use with the csv module can lead to extra blank rows being written or read incorrectly on some platforms.
  • CSV Data Types: Remembering that csv.reader and csv.DictReader read all fields as strings. You need to manually convert them to int, float, etc., if needed, often within a try...except ValueError.
  • JSON Keys: The JSON standard requires keys to be strings. Python dictionaries allow other hashable types as keys; json.dumps converts basic non-string keys (int, float, bool, None) to strings and raises a TypeError for other key types (unless skipkeys=True is passed).
  • JSON Data Types: Trying to serialize Python objects that don’t have a direct JSON equivalent (like custom class instances, sets, or dates) using json.dump/dumps will raise a TypeError unless you provide a custom serialization function (via the default argument) or convert them first; see the sketch after this list. Dates are often converted to ISO format strings, and sets are often converted to lists.
  • load/dump vs. loads/dumps: Using load/dump when you have a string (use loads/dumps) or vice-versa when you have a file object. Remember s stands for “string”.
  • JSON Decoding Errors: Invalid JSON syntax in a file or string will cause json.JSONDecodeError during parsing (load/loads).
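
A minimal sketch of handling such objects with the default argument (the record below is made up; the fallback function converts datetimes to ISO strings and sets to sorted lists):

Python
import json
from datetime import datetime

record = {
    'event': 'login',                       # plain string: serializable as-is
    'when': datetime(2024, 4, 12, 18, 20),  # datetime: not JSON-serializable by default
    'tags': {'auth', 'web'}                 # set: not JSON-serializable by default
}

def to_jsonable(obj):
    """Fallback called by json.dumps for objects it cannot serialize."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # e.g. '2024-04-12T18:20:00'
    if isinstance(obj, set):
        return sorted(obj)      # represent the set as a (sorted) list
    raise TypeError(f"Not JSON serializable: {type(obj)}")

print(json.dumps(record, default=to_jsonable, indent=2))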

Chapter Summary

  • csv.reader(file_obj) (csv module): reads CSV row by row, returning each row as a list of strings.
    • Example: reader = csv.reader(f); header = next(reader); then iterate with for row in reader
  • csv.writer(file_obj) (csv module): creates a writer object for writing lists/tuples to a CSV file.
    • Example: writer = csv.writer(f); writer.writerow(['Head1', 'Head2']); writer.writerows([['r1c1', 'r1c2'], ['r2c1', 'r2c2']])
  • csv.DictReader(file_obj) (csv module): reads CSV row by row, returning each row as a dictionary keyed by the header row.
    • Example: reader = csv.DictReader(f); then access row_dict['HeaderName'] inside the loop
  • csv.DictWriter(file_obj, fieldnames) (csv module): creates a writer object for writing dictionaries to a CSV file; requires fieldnames.
    • Example: writer = csv.DictWriter(f, fieldnames=['col1', 'col2']); writer.writeheader(); writer.writerows([{'col1': 'a', 'col2': 1}])
  • json.loads(json_string) (json module): parses (deserializes) a JSON formatted string into a Python object (dict, list, etc.); ‘s’ for string.
    • Example: data = json.loads('{"key": "value"}')
  • json.load(file_obj) (json module): reads and parses (deserializes) JSON data from a file-like object into a Python object.
    • Example: with open('data.json') as f: data = json.load(f)
  • json.dumps(python_obj, indent=None) (json module): serializes a Python object into a JSON formatted string; ‘s’ for string, indent for pretty printing.
    • Example: json_str = json.dumps({'key': 'val'}, indent=2)
  • json.dump(python_obj, file_obj, indent=None) (json module): serializes a Python object and writes it to a file-like object as JSON; indent for pretty printing.
    • Example: with open('out.json', 'w') as f: json.dump(data, f, indent=2)
  • CSV is a simple, text-based format for tabular data, using delimiters (usually commas). Python’s csv module helps read (reader, DictReader) and write (writer, DictWriter) CSV files. Remember newline='' and that values are read as strings.
  • JSON is a text-based format for structured data using key-value pairs and arrays, common in web APIs and configuration. Python’s json module handles serialization and deserialization.
    • json.loads(string): Parses JSON string to Python object.
    • json.load(file_obj): Parses JSON file object to Python object.
    • json.dumps(py_obj): Serializes Python object to JSON string. Use indent for pretty printing.
    • json.dump(py_obj, file_obj): Serializes Python object to JSON file object. Use indent.
  • Choose CSV for simple tabular data and JSON for nested/complex data structures or web data exchange.

Exercises & Mini Projects

Exercises

  1. Write Simple CSV: Create a list of lists representing student names and scores [['Name', 'Score'], ['Alice', 90], ['Bob', 85]]. Write this data to a file named report.csv using csv.writer.
  2. Read Simple CSV: Read the report.csv file created in Exercise 1 using csv.reader. Print the header and each data row.
  3. Write Dict to CSV: Create a list of dictionaries, where each dictionary represents a person with ‘name’ and ‘city’ keys. Write this list to people.csv using csv.DictWriter.
  4. Read CSV to Dict: Read the people.csv file created in Exercise 3 using csv.DictReader and print each dictionary row.
  5. Write JSON: Create a Python dictionary containing information about a movie (title, director, year, actors (list)). Write this dictionary to a file named movie.json using json.dump with an indent of 2.
  6. Read JSON: Read the movie.json file created in Exercise 5 using json.load. Print the director’s name and the list of actors from the loaded Python dictionary.

Mini Project: CSV Log Analyzer

Goal: Read a CSV log file (you’ll create a dummy one) and count the occurrences of different log levels (e.g., INFO, WARNING, ERROR).

Steps:

  1. Create Dummy Log CSV (system.log.csv):
    • Create a CSV file with columns like Timestamp, LogLevel, Message. Add several rows with different log levels (INFO, DEBUG, WARNING, ERROR). Example row: 2024-04-12 18:20:00,INFO,User logged in

    Timestamp,LogLevel,Message
    2024-04-12 18:20:00,INFO,User logged in
    2024-04-12 18:21:15,DEBUG,Checking database connection
    2024-04-12 18:22:05,WARNING,Low disk space detected
    2024-04-12 18:23:00,INFO,Data processed successfully
    2024-04-12 18:24:30,ERROR,Failed to write to file
    2024-04-12 18:25:00,INFO,User logged out
    2024-04-12 18:26:00,WARNING,Configuration value missing
  2. Write the Python Script (log_analyzer.py):
    • Import the csv module.
    • Initialize an empty dictionary log_level_counts = {} to store the counts.
    • Use try...except FileNotFoundError to handle the case where the log file doesn’t exist.
    • Inside the try block:
      • Open system.log.csv using with open(...) in read mode ('r', newline='').
      • Create a csv.DictReader for the file.
      • Loop through each row (dictionary) provided by the DictReader.
      • Get the LogLevel value from the current row dictionary.
      • Update the log_level_counts dictionary: If the level is already a key, increment its value; otherwise, add the level as a new key with a value of 1. (Hint: counts.get(key, 0) + 1 is a useful pattern).
    • After the loop (outside the with block but inside the try block, or after the try block if you handle the empty case):
      • Print the heading “Log Level Counts:”.
      • Iterate through the log_level_counts dictionary (.items()) and print each log level and its count.
  3. Run and Test: Execute log_analyzer.py. The output should show the counts for INFO, DEBUG, WARNING, and ERROR based on your dummy CSV file.
