Programming with Python | Chapter 22: Working with CSV and JSON Data
Chapter Objectives
- Understand the structure and common uses of CSV (Comma-Separated Values) files.
- Use Python’s built-in csv module to read and write CSV data.
  - Read CSV files using csv.reader.
  - Write CSV files using csv.writer.
  - Read CSV files into dictionaries using csv.DictReader.
  - Write data from dictionaries to CSV files using csv.DictWriter.
- Understand the structure and common uses of JSON (JavaScript Object Notation) data.
- Use Python’s built-in json module to work with JSON data.
  - Parse JSON strings into Python objects using json.loads().
  - Read JSON data from files into Python objects using json.load().
  - Serialize Python objects into JSON strings using json.dumps().
  - Write Python objects to JSON files using json.dump().
- Recognize the differences and typical use cases for CSV and JSON.
Introduction
Applications often need to exchange data with other systems or store structured data in a simple text format. Two extremely common formats for this are CSV and JSON. CSV is a simple tabular format often used for spreadsheets and basic database exports, where data values are separated by commas. JSON is a human-readable format based on JavaScript object syntax, widely used for configuration files, data exchange between web servers and clients (APIs), and storing more complex nested data structures. Python’s standard library provides dedicated modules, csv and json, to make reading, writing, and manipulating data in these formats straightforward. This chapter covers the basics of working with both CSV and JSON files in Python.
Theory & Explanation
CSV (Comma-Separated Values)
CSV is a plain text format where data is organized into rows, and values within each row are separated by a delimiter, typically a comma. The first row often contains header names for the columns.
Example data.csv:
Name,Department,Salary
Alice,Engineering,80000
Bob,Sales,75000
Charlie,Engineering,90000
Key Characteristics:
- Tabular data (rows and columns).
- Plain text.
- Simple structure.
- Delimiter (usually comma, but can be tab, semicolon, etc.).
- Good for spreadsheet-like data.
- Not inherently suited for nested or complex data structures.
The csv Module:
Python’s csv module helps handle the nuances of CSV parsing and writing, such as quoting fields that contain commas or newlines.
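Because the module handles quoting automatically, a field that itself contains the delimiter survives a round trip intact. A minimal sketch using an in-memory io.StringIO buffer (so no file on disk is needed; the sample data is made up):

```python
import csv
import io

# In-memory CSV text; the second field contains a comma, so it is quoted
raw = 'Name,Quote\nAlice,"Hello, world"\n'

# Reading: the quoted comma is kept inside the field, not treated as a delimiter
rows = list(csv.reader(io.StringIO(raw)))
print(rows)  # [['Name', 'Quote'], ['Alice', 'Hello, world']]

# Writing: csv.writer re-quotes the embedded comma automatically
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

The same mechanism covers embedded newlines and quote characters, which is why hand-rolled `line.split(',')` parsing is best avoided.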
Reading with csv.reader: Reads CSV data row by row, where each row is returned as a list of strings.
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # newline='' is important to prevent extra blank rows
        csv_reader = csv.reader(csvfile)  # Create a reader object
        header = next(csv_reader)  # Read the header row
        print(f"Header: {header}")
        for row in csv_reader:  # Iterate over remaining rows
            # Each 'row' is a list of strings
            print(f"Row: {row}")
            # Example: Accessing data
            # name = row[0]
            # salary = int(row[2])  # Remember values are strings!
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Note: newline='' is recommended when opening CSV files to prevent the csv module from misinterpreting line endings.
Writing with csv.writer: Writes data (typically lists or tuples) to a CSV file.
import csv

header = ['Name', 'Score']
data = [
    ['Alice', 95],
    ['Bob', 88],
    ['Charlie', 72]
]

try:
    with open('scores.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(header)  # Write a single row (header)
        csv_writer.writerows(data)  # Write multiple rows from the list of lists
    print("Data written to scores.csv")
except Exception as e:
    print(f"An error occurred: {e}")

Reading with csv.DictReader: Reads CSV data row by row, where each row is returned as a dictionary. The keys are taken from the header row (or specified manually). This is often more convenient than csv.reader.
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # Assumes first row is header
        dict_reader = csv.DictReader(csvfile)
        print("\nReading with DictReader:")
        for row_dict in dict_reader:
            # Each 'row_dict' is a dictionary
            print(row_dict)
            # Example: Accessing data by header name
            # print(f"Name: {row_dict['Name']}, Salary: {row_dict['Salary']}")
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")

[Flowchart: Start → open the CSV file data.csv → create csv.DictReader(file), which reads the header row implicitly → loop over the DictReader → each available row becomes a dictionary row_dict → process row_dict['Column'] → when no rows remain, close the file → End]
Writing with csv.DictWriter: Writes data from a list of dictionaries to a CSV file. Requires specifying the header names (fieldnames).
import csv

fieldnames = ['id', 'product_name', 'price']  # Must match dictionary keys
data = [
    {'id': 101, 'product_name': 'Laptop', 'price': 1200.50},
    {'id': 102, 'product_name': 'Mouse', 'price': 25.99},
    {'id': 103, 'product_name': 'Keyboard', 'price': 75.00}
]

try:
    with open('products.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()  # Write the header row based on fieldnames
        writer.writerows(data)  # Write rows from the list of dictionaries
    print("\nData written to products.csv")
except Exception as e:
    print(f"An error occurred: {e}")

JSON (JavaScript Object Notation)
JSON is a lightweight data-interchange format inspired by JavaScript object literal syntax. It’s human-readable and easy for machines to parse and generate. It’s commonly used for configuration files and transmitting data over networks (APIs).
Example config.json:
{
  "server_name": "MainServer",
  "ip_address": "192.168.1.100",
  "port": 8080,
  "is_active": true,
  "allowed_users": ["admin", "guest", null],
  "settings": {
    "theme": "dark",
    "max_connections": 100
  }
}
Key Characteristics:
- Uses key-value pairs (like Python dictionaries).
- Values can be strings, numbers, booleans (true/false), arrays (like Python lists), null (like Python None), or other JSON objects (nested structures).
- Keys are always strings enclosed in double quotes.
- Human-readable text format.
- Excellent for representing nested or complex data structures.
Python json Module:
The json module handles serialization (converting Python objects to JSON strings/files) and deserialization (converting JSON strings/files to Python objects).
Mapping between JSON and Python:
| JSON Type | Python Equivalent |
|---|---|
| object (e.g., {"key": "value"}) | dict |
| array (e.g., [1, "two", null]) | list |
| string (e.g., "hello") | str |
| number (integer, e.g., 123) | int |
| number (real/float, e.g., 3.14, -1.5e2) | float |
| true | True |
| false | False |
| null | None |
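The mapping above can be checked directly by parsing a small document containing one value of each JSON type (a sketch using json.loads, which is introduced next; the sample string is illustrative):

```python
import json

# One value of each JSON type in a single illustrative document
sample = ('{"obj": {"k": "v"}, "arr": [1, "two", null], '
          '"s": "hi", "i": 7, "f": 3.14, "t": true, "n": null}')
parsed = json.loads(sample)

print(type(parsed))         # <class 'dict'>  (object -> dict)
print(type(parsed['arr']))  # <class 'list'>  (array -> list)
print(type(parsed['i']), type(parsed['f']))  # int and float
print(parsed['t'], parsed['n'])  # True None  (true -> True, null -> None)
```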
Parsing JSON Strings (json.loads): Converts a JSON formatted string into a Python object (usually a dictionary or list). loads stands for “load string”.
import json

json_string = '{"name": "Alice", "age": 30, "city": "New York"}'

try:
    python_dict = json.loads(json_string)
    print(f"Parsed from string: {python_dict}")
    print(f"Name: {python_dict['name']}")
    print(f"Type: {type(python_dict)}")  # <class 'dict'>
except json.JSONDecodeError as e:
    print(f"Error decoding JSON string: {e}")

Reading JSON Files (json.load): Reads JSON data from a file-like object (e.g., a file opened for reading) and parses it into a Python object.
import json

# Assume 'config.json' exists with the content shown earlier
try:
    with open('config.json', 'r', encoding='utf-8') as f:
        config_data = json.load(f)  # Load from file object
    print(f"\nLoaded from file: {config_data}")
    print(f"Server Port: {config_data['port']}")
    print(f"Allowed Users: {config_data['allowed_users']}")
    print(f"Theme Setting: {config_data['settings']['theme']}")
except FileNotFoundError:
    print("Error: config.json not found.")
except json.JSONDecodeError as e:
    print(f"Error decoding JSON file: {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Serializing to JSON Strings (json.dumps): Converts a Python object (dictionaries, lists, strings, numbers, booleans, None) into a JSON formatted string. dumps stands for “dump string”.
import json

python_data = {
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

# Convert Python dict to JSON string
# indent=2 makes the output nicely formatted (human-readable)
json_output_string = json.dumps(python_data, indent=2)
print("\nPython object serialized to JSON string:")
print(json_output_string)

Writing to JSON Files (json.dump): Writes a Python object to a file-like object (e.g., a file opened for writing) as JSON data.
import json

python_data = {  # Same data as above
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

output_filename = "output_data.json"

try:
    with open(output_filename, 'w', encoding='utf-8') as f:
        # Write Python object to file as JSON
        # indent=2 for pretty printing in the file
        json.dump(python_data, f, indent=2)
    print(f"\nPython object successfully written to '{output_filename}'")
except Exception as e:
    print(f"An error occurred writing JSON file: {e}")

[Flowchart: Start → have a Python object (e.g., a list of dicts data_list) → open the output file output.json in write mode → call json.dump(data_list, file, indent=2), which serializes the object and writes the JSON data to the file → close the file → End]

CSV vs. JSON
- CSV: Best for simple, flat, tabular data. Easy to view/edit in spreadsheet software. Less flexible for complex or nested data. File size can be smaller for simple tables.
- JSON: Best for structured data, nested data, lists, and representing object state. Native format for web APIs. More flexible structure but can be slightly more verbose than CSV for simple tables.
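To make the contrast concrete, here is a small sketch with a nested record. JSON stores the nesting as-is, while CSV needs the record flattened into plain columns first (the field names and the semicolon-joining convention are just illustrative assumptions):

```python
import json

# A nested record: natural in JSON, awkward in flat CSV
record = {
    'name': 'Alice',
    'contacts': {'email': 'alice@example.com', 'phone': None},
    'tags': ['admin', 'engineering'],
}

# JSON preserves the structure exactly
print(json.dumps(record, indent=2))

# For CSV, nested parts must be flattened into plain columns first,
# e.g. by promoting sub-keys and joining lists with a chosen separator
flat_row = {
    'name': record['name'],
    'contact_email': record['contacts']['email'],
    'tags': ';'.join(record['tags']),
}
print(flat_row)  # {'name': 'Alice', 'contact_email': 'alice@example.com', 'tags': 'admin;engineering'}
```

Any flattening scheme like this is lossy or convention-dependent, which is exactly why nested data is usually stored as JSON.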
Code Examples
Example 1: Reading CSV with DictReader and Writing JSON
# csv_to_json_converter.py
import csv
import json

csv_filepath = 'products.csv'  # Assumes this exists from Theory section
json_filepath = 'products.json'

data_list = []

# Read from CSV using DictReader
try:
    with open(csv_filepath, mode='r', newline='', encoding='utf-8') as infile:
        reader = csv.DictReader(infile)
        print(f"Reading data from {csv_filepath}...")
        for row in reader:
            # Convert price back to float (it was read as string)
            try:
                row['price'] = float(row['price'])
                row['id'] = int(row['id'])  # Convert id to int
            except ValueError:
                print(f"Warning: Could not convert price/id for row: {row}")
                # Handle error as needed, e.g., skip row or set default
            data_list.append(row)
    print(f"Read {len(data_list)} records.")
except FileNotFoundError:
    print(f"Error: CSV file '{csv_filepath}' not found.")
    exit()  # Stop script if input file not found
except Exception as e:
    print(f"Error reading CSV: {e}")
    exit()

# Write the list of dictionaries to JSON
try:
    with open(json_filepath, mode='w', encoding='utf-8') as outfile:
        # Use indent for readability in the output JSON file
        json.dump(data_list, outfile, indent=4)
    print(f"Data successfully written to {json_filepath}")
except Exception as e:
    print(f"Error writing JSON: {e}")
[Flowchart: Read CSV — open the CSV file, create a csv.DictReader, loop through the rows, appending each row dictionary to data_list, then close the CSV file. Write JSON — open the JSON file, call json.dump(data_list, file), then close the JSON file]

Explanation:
- Reads data from products.csv using csv.DictReader, automatically creating dictionaries for each row based on the header.
- Converts the price and id strings read from CSV back into float and int respectively, handling potential ValueError.
- Appends each row dictionary to data_list.
- Writes the entire data_list (a list of dictionaries) to products.json using json.dump with indentation for pretty printing.
Example 2: Reading JSON and Writing CSV with DictWriter
# json_to_csv_converter.py
import json
import csv

json_filepath = 'products.json'  # Assumes this exists from Example 1
csv_filepath = 'products_from_json.csv'

# Read JSON data from file
try:
    with open(json_filepath, mode='r', encoding='utf-8') as infile:
        print(f"Reading data from {json_filepath}...")
        data_list = json.load(infile)  # data_list is now a list of dicts
    print(f"Read {len(data_list)} records.")
    if not data_list:
        print("JSON file is empty. Nothing to write to CSV.")
        exit()
except FileNotFoundError:
    print(f"Error: JSON file '{json_filepath}' not found.")
    exit()
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
    exit()
except Exception as e:
    print(f"Error reading JSON: {e}")
    exit()

# Write data to CSV using DictWriter
try:
    # Dynamically get fieldnames from the keys of the first dictionary
    # Assumes all dictionaries in the list have the same keys
    fieldnames = data_list[0].keys()
    with open(csv_filepath, mode='w', newline='', encoding='utf-8') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()  # Write header row
        writer.writerows(data_list)  # Write data from list of dicts
    print(f"Data successfully written to {csv_filepath}")
except Exception as e:
    print(f"Error writing CSV: {e}")
Explanation:
- Reads the list of dictionaries from products.json using json.load.
- Checks if the list is empty.
- Dynamically determines the CSV header (fieldnames) by looking at the keys of the first dictionary in the list (this assumes all dictionaries have the same structure).
- Uses csv.DictWriter with the determined fieldnames to write the header and then all the dictionaries from data_list to products_from_json.csv.
Common Mistakes or Pitfalls
- CSV newline='': Forgetting newline='' when opening CSV files for use with the csv module can lead to extra blank rows being written or read incorrectly on some platforms.
- CSV Data Types: csv.reader and csv.DictReader read all fields as strings. You need to manually convert them to int, float, etc., if needed, often within a try...except ValueError.
- JSON Keys: The JSON standard requires keys to be strings. While Python dictionaries allow other immutable types as keys, json.dumps will convert non-string keys to strings (or raise an error depending on options).
- JSON Data Types: Trying to serialize Python objects that don’t have a direct JSON equivalent (like custom class instances, sets, dates) using json.dump/dumps will raise a TypeError unless you provide a custom serialization function (using the default argument) or convert them first. Dates are often converted to ISO format strings. Sets are often converted to lists.
- load/dump vs. loads/dumps: Using load/dump when you have a string (use loads/dumps), or vice versa when you have a file object. Remember that s stands for “string”.
- JSON Decoding Errors: Invalid JSON syntax in a file or string will cause json.JSONDecodeError during parsing (load/loads).
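The TypeError pitfall can be worked around with the default argument: a fallback function that converts otherwise unserializable values into JSON-friendly ones. A sketch converting dates to ISO strings and sets to sorted lists (the event data here is made up):

```python
import json
from datetime import date

# A dict mixing JSON-friendly values with a date and a set
event = {'name': 'launch', 'when': date(2024, 4, 12), 'crew': {'alice', 'bob'}}

# Without a fallback, serialization fails
try:
    json.dumps(event)
except TypeError as e:
    print(f"Without default: {e}")

def to_jsonable(obj):
    """Fallback converter called for values json.dumps cannot handle."""
    if isinstance(obj, date):
        return obj.isoformat()  # dates -> ISO 8601 strings
    if isinstance(obj, set):
        return sorted(obj)      # sets -> (sorted) lists
    raise TypeError(f"Not serializable: {type(obj)}")

# With the fallback, the whole object serializes cleanly
print(json.dumps(event, default=to_jsonable, indent=2))
```

Note that the conversion is one-way: on json.load, the date comes back as a plain string and the set as a list, so the reader must restore the original types itself if needed.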
Chapter Summary
| Format | Module | Function/Class | Purpose | Key Syntax/Example |
|---|---|---|---|---|
| CSV | csv | csv.reader(file_obj) | Reads CSV row by row, returning each row as a list of strings. | reader = csv.reader(f); header = next(reader); for row in reader: print(row) |
| CSV | csv | csv.writer(file_obj) | Creates a writer object to write data (lists/tuples) to a CSV file. | writer = csv.writer(f); writer.writerow(['Head1', 'Head2']); writer.writerows([['r1c1', 'r1c2'], ['r2c1', 'r2c2']]) |
| CSV | csv | csv.DictReader(file_obj) | Reads CSV row by row, returning each row as a dictionary using the header row as keys. | reader = csv.DictReader(f); for row_dict in reader: print(row_dict['HeaderName']) |
| CSV | csv | csv.DictWriter(file_obj, fieldnames) | Creates a writer object to write data from dictionaries to a CSV file. Requires specifying fieldnames. | fields = ['col1', 'col2']; writer = csv.DictWriter(f, fieldnames=fields); writer.writeheader(); writer.writerows([{'col1': 'a', 'col2': 1}]) |
| JSON | json | json.loads(json_string) | Parses (deserializes) a JSON formatted string into a Python object (dict, list, etc.). ‘s’ for string. | data = json.loads('{"key": "value"}') |
| JSON | json | json.load(file_obj) | Reads and parses (deserializes) JSON data from a file-like object into a Python object. | with open('data.json') as f: data = json.load(f) |
| JSON | json | json.dumps(python_obj, indent=None) | Serializes a Python object into a JSON formatted string. ‘s’ for string. indent for pretty printing. | json_str = json.dumps({'key': 'val'}, indent=2) |
| JSON | json | json.dump(python_obj, file_obj, indent=None) | Serializes a Python object and writes it to a file-like object as JSON. indent for pretty printing. | with open('out.json', 'w') as f: json.dump(data, f, indent=2) |
- CSV is a simple, text-based format for tabular data, using delimiters (usually commas). Python’s csv module helps read (reader, DictReader) and write (writer, DictWriter) CSV files. Remember newline='' and that values are read as strings.
- JSON is a text-based format for structured data using key-value pairs and arrays, common in web APIs and configuration. Python’s json module handles serialization and deserialization: json.loads(string) parses a JSON string into a Python object; json.load(file_obj) parses a JSON file object into a Python object; json.dumps(py_obj) serializes a Python object to a JSON string; json.dump(py_obj, file_obj) serializes a Python object to a JSON file object. Use indent for pretty printing.
- Choose CSV for simple tabular data and JSON for nested/complex data structures or web data exchange.
Exercises & Mini Projects
Exercises
1. Write Simple CSV: Create a list of lists representing student names and scores: [['Name', 'Score'], ['Alice', 90], ['Bob', 85]]. Write this data to a file named report.csv using csv.writer.
2. Read Simple CSV: Read the report.csv file created in Exercise 1 using csv.reader. Print the header and each data row.
3. Write Dict to CSV: Create a list of dictionaries, where each dictionary represents a person with 'name' and 'city' keys. Write this list to people.csv using csv.DictWriter.
4. Read CSV to Dict: Read the people.csv file created in Exercise 3 using csv.DictReader and print each dictionary row.
5. Write JSON: Create a Python dictionary containing information about a movie (title, director, year, actors (list)). Write this dictionary to a file named movie.json using json.dump with an indent of 2.
6. Read JSON: Read the movie.json file created in Exercise 5 using json.load. Print the director’s name and the list of actors from the loaded Python dictionary.
Mini Project: CSV Log Analyzer
Goal: Read a CSV log file (you’ll create a dummy one) and count the occurrences of different log levels (e.g., INFO, WARNING, ERROR).
Steps:
1. Create Dummy Log CSV (system.log.csv):
   - Create a CSV file with columns like Timestamp,LogLevel,Message.
   - Add several rows with different log levels (INFO, DEBUG, WARNING, ERROR). For example:

     Timestamp,LogLevel,Message
     2024-04-12 18:20:00,INFO,User logged in
     2024-04-12 18:21:15,DEBUG,Checking database connection
     2024-04-12 18:22:05,WARNING,Low disk space detected
     2024-04-12 18:23:00,INFO,Data processed successfully
     2024-04-12 18:24:30,ERROR,Failed to write to file
     2024-04-12 18:25:00,INFO,User logged out
     2024-04-12 18:26:00,WARNING,Configuration value missing

2. Write the Python Script (log_analyzer.py):
   - Import the csv module.
   - Initialize an empty dictionary log_level_counts = {} to store the counts.
   - Use try...except FileNotFoundError to handle the case where the log file doesn’t exist.
   - Inside the try block:
     - Open system.log.csv using with open(...) in read mode ('r', newline='').
     - Create a csv.DictReader for the file.
     - Loop through each row (dictionary) provided by the DictReader.
     - Get the LogLevel value from the current row dictionary.
     - Update the log_level_counts dictionary: if the level is already a key, increment its value; otherwise, add the level as a new key with a value of 1. (Hint: counts.get(key, 0) + 1 is a useful pattern.)
   - After the loop (outside the with block but inside the try block):
     - Print the heading "Log Level Counts:".
     - Iterate through the log_level_counts dictionary (.items()) and print each log level and its count.

3. Run and Test: Execute log_analyzer.py. The output should show the counts for INFO, DEBUG, WARNING, and ERROR based on your dummy CSV file.