Programming with Python | Chapter 22: Working with CSV and JSON Data
Chapter Objectives
- Understand the structure and common uses of CSV (Comma-Separated Values) files.
- Use Python’s built-in `csv` module to read and write CSV data.
  - Read CSV files using `csv.reader`.
  - Write CSV files using `csv.writer`.
  - Read CSV files into dictionaries using `csv.DictReader`.
  - Write data from dictionaries to CSV files using `csv.DictWriter`.
- Understand the structure and common uses of JSON (JavaScript Object Notation) data.
- Use Python’s built-in `json` module to work with JSON data.
  - Parse JSON strings into Python objects using `json.loads()`.
  - Read JSON data from files into Python objects using `json.load()`.
  - Serialize Python objects into JSON strings using `json.dumps()`.
  - Write Python objects to JSON files using `json.dump()`.
- Recognize the differences and typical use cases for CSV and JSON.
Introduction
Applications often need to exchange data with other systems or store structured data in a simple text format. Two extremely common formats for this are CSV and JSON. CSV is a simple tabular format often used for spreadsheets and basic database exports, where data values are separated by commas. JSON is a human-readable format based on JavaScript object syntax, widely used for configuration files, data exchange between web servers and clients (APIs), and storing more complex nested data structures. Python’s standard library provides dedicated modules, `csv` and `json`, to make reading, writing, and manipulating data in these formats straightforward. This chapter covers the basics of working with both CSV and JSON files in Python.
Theory & Explanation
CSV (Comma-Separated Values)
CSV is a plain text format where data is organized into rows, and values within each row are separated by a delimiter, typically a comma. The first row often contains header names for the columns.
Example data.csv:
```
Name,Department,Salary
Alice,Engineering,80000
Bob,Sales,75000
Charlie,Engineering,90000
```
Key Characteristics:
- Tabular data (rows and columns).
- Plain text.
- Simple structure.
- Delimiter (usually comma, but can be tab, semicolon, etc.).
- Good for spreadsheet-like data.
- Not inherently suited for nested or complex data structures.
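Delimiters other than the comma are handled by the `delimiter` argument to `csv.reader`. A minimal sketch, using an in-memory `io.StringIO` buffer as a stand-in for a real file; the semicolon-delimited sample data is illustrative:

```python
import csv
import io

# A small semicolon-delimited sample, as found in many European exports
# (io.StringIO stands in for a real file here)
sample = "Name;Department;Salary\nAlice;Engineering;80000\n"

reader = csv.reader(io.StringIO(sample), delimiter=';')
header = next(reader)     # first row: the column names
first_row = next(reader)  # next row: a list of strings
print(header)             # ['Name', 'Department', 'Salary']
print(first_row)          # ['Alice', 'Engineering', '80000']
```

The same `delimiter` argument works for `csv.writer`, `csv.DictReader`, and `csv.DictWriter`.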
The `csv` Module:
Python’s `csv` module helps handle the nuances of CSV parsing and writing, such as quoting fields that contain commas or newlines.
Reading with `csv.reader`:
Reads CSV data row by row, where each row is returned as a list of strings.
```python
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # newline='' is important to prevent extra blank rows
        csv_reader = csv.reader(csvfile)  # Create a reader object
        header = next(csv_reader)  # Read the header row
        print(f"Header: {header}")
        for row in csv_reader:  # Iterate over remaining rows
            # Each 'row' is a list of strings
            print(f"Row: {row}")
            # Example: Accessing data
            # name = row[0]
            # salary = int(row[2])  # Remember values are strings!
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")
```
Note: `newline=''` is recommended when opening CSV files to prevent the `csv` module from misinterpreting line endings.
Writing with `csv.writer`:
Writes data (typically lists or tuples) to a CSV file.
```python
import csv

header = ['Name', 'Score']
data = [
    ['Alice', 95],
    ['Bob', 88],
    ['Charlie', 72]
]

try:
    with open('scores.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(header)  # Write a single row (header)
        csv_writer.writerows(data)   # Write multiple rows from the list of lists
    print("Data written to scores.csv")
except Exception as e:
    print(f"An error occurred: {e}")
```
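The writer's automatic quoting can be seen without touching the filesystem. A minimal sketch writing to an in-memory `io.StringIO` buffer; the field values are illustrative:

```python
import csv
import io

# csv.writer quotes any field that contains the delimiter (or a newline),
# so the output stays parseable
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Doe, Jane', 'Engineering'])  # first field contains a comma

print(buffer.getvalue())  # "Doe, Jane",Engineering
```

Note the default line terminator written by `csv.writer` is `\r\n`, which is why `newline=''` matters when opening real files.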
Reading with `csv.DictReader`:
Reads CSV data row by row, where each row is returned as a dictionary. The keys are taken from the header row (or specified manually). This is often more convenient than `csv.reader`.
```python
import csv

try:
    with open('data.csv', mode='r', newline='', encoding='utf-8') as csvfile:
        # Assumes first row is header
        dict_reader = csv.DictReader(csvfile)
        print("\nReading with DictReader:")
        for row_dict in dict_reader:
            # Each 'row_dict' is a dictionary
            print(row_dict)
            # Example: Accessing data by header name
            # print(f"Name: {row_dict['Name']}, Salary: {row_dict['Salary']}")
except FileNotFoundError:
    print("Error: data.csv not found.")
except Exception as e:
    print(f"An error occurred: {e}")
```
```mermaid
graph TD
    A[Start] --> B("Open CSV file <b>data.csv</b>");
    B --> C{"Create <i>csv.DictReader(file)</i>"};
    C --> D["Read Header Row (implicitly by DictReader)"];
    D --> E{"Loop: Iterate over <i>DictReader</i>"};
    E -- Row available --> F["Get row as Dictionary <i>row_dict</i>"];
    F --> G["Process <i>row_dict[Column]</i>"];
    G --> E;
    E -- No more rows --> H(Close file);
    H --> I[End];
    style F fill:#ccf,stroke:#333,stroke-width:1px
    style G fill:#cfc,stroke:#333,stroke-width:1px
```
Writing with `csv.DictWriter`:
Writes data from a list of dictionaries to a CSV file. Requires specifying the header names (`fieldnames`).
```python
import csv

fieldnames = ['id', 'product_name', 'price']  # Must match dictionary keys
data = [
    {'id': 101, 'product_name': 'Laptop', 'price': 1200.50},
    {'id': 102, 'product_name': 'Mouse', 'price': 25.99},
    {'id': 103, 'product_name': 'Keyboard', 'price': 75.00}
]

try:
    with open('products.csv', mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()    # Write the header row based on fieldnames
        writer.writerows(data)  # Write rows from the list of dictionaries
    print("\nData written to products.csv")
except Exception as e:
    print(f"An error occurred: {e}")
```
JSON (JavaScript Object Notation)
JSON is a lightweight data-interchange format inspired by JavaScript object literal syntax. It’s human-readable and easy for machines to parse and generate. It’s commonly used for configuration files and transmitting data over networks (APIs).
Example config.json:
```json
{
  "server_name": "MainServer",
  "ip_address": "192.168.1.100",
  "port": 8080,
  "is_active": true,
  "allowed_users": [
    "admin",
    "guest",
    null
  ],
  "settings": {
    "theme": "dark",
    "max_connections": 100
  }
}
```
Key Characteristics:
- Uses key-value pairs (like Python dictionaries).
- Values can be strings, numbers, booleans (`true`/`false`), arrays (like Python lists), `null` (like Python `None`), or other JSON objects (nested structures).
- Keys are always strings enclosed in double quotes.
- Human-readable text format.
- Excellent for representing nested or complex data structures.
Python `json` Module:
The `json` module handles serialization (converting Python objects to JSON strings/files) and deserialization (converting JSON strings/files to Python objects).
Mapping between JSON and Python:
| JSON Type | Python Equivalent |
|---|---|
| object (e.g., `{"key": "value"}`) | `dict` |
| array (e.g., `[1, "two", null]`) | `list` |
| string (e.g., `"hello"`) | `str` |
| number (integer, e.g., `123`) | `int` |
| number (real/float, e.g., `3.14`, `-1.5e2`) | `float` |
| `true` | `True` |
| `false` | `False` |
| `null` | `None` |
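The mapping in the table can be checked directly with `json.loads`. A small sketch that parses a sample string (the keys and values are arbitrary) and inspects the resulting Python types:

```python
import json

# Each JSON type in the string maps to its Python equivalent
data = json.loads('{"count": 3, "ratio": 0.5, "ok": true, "items": [1, "two", null]}')

print(type(data))           # <class 'dict'>   (JSON object)
print(type(data["count"]))  # <class 'int'>    (JSON integer)
print(type(data["ratio"]))  # <class 'float'>  (JSON real number)
print(data["ok"])           # True             (JSON true)
print(data["items"])        # [1, 'two', None] (JSON array with null)
```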
Parsing JSON Strings (`json.loads`): Converts a JSON formatted string into a Python object (usually a dictionary or list). `loads` stands for “load string”.
```python
import json

json_string = '{"name": "Alice", "age": 30, "city": "New York"}'

try:
    python_dict = json.loads(json_string)
    print(f"Parsed from string: {python_dict}")
    print(f"Name: {python_dict['name']}")
    print(f"Type: {type(python_dict)}")  # <class 'dict'>
except json.JSONDecodeError as e:
    print(f"Error decoding JSON string: {e}")
```
Reading JSON Files (`json.load`): Reads JSON data from a file-like object (e.g., a file opened for reading) and parses it into a Python object.
```python
import json

# Assume 'config.json' exists with the content shown earlier
try:
    with open('config.json', 'r', encoding='utf-8') as f:
        config_data = json.load(f)  # Load from file object
    print(f"\nLoaded from file: {config_data}")
    print(f"Server Port: {config_data['port']}")
    print(f"Allowed Users: {config_data['allowed_users']}")
    print(f"Theme Setting: {config_data['settings']['theme']}")
except FileNotFoundError:
    print("Error: config.json not found.")
except json.JSONDecodeError as e:
    print(f"Error decoding JSON file: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
```
Serializing to JSON Strings (`json.dumps`): Converts a Python object (dictionaries, lists, strings, numbers, booleans, `None`) into a JSON formatted string. `dumps` stands for “dump string”.
```python
import json

python_data = {
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

# Convert Python dict to JSON string
# indent=2 makes the output nicely formatted (human-readable)
json_output_string = json.dumps(python_data, indent=2)
print("\nPython object serialized to JSON string:")
print(json_output_string)
```
Writing to JSON Files (`json.dump`): Writes a Python object to a file-like object (e.g., a file opened for writing) as JSON data.
```python
import json

python_data = {  # Same data as above
    'id': 123,
    'name': 'Example Product',
    'tags': ['electronics', 'gadget'],
    'in_stock': True,
    'price': 49.99,
    'related_ids': None
}

output_filename = "output_data.json"

try:
    with open(output_filename, 'w', encoding='utf-8') as f:
        # Write Python object to file as JSON
        # indent=2 for pretty printing in the file
        json.dump(python_data, f, indent=2)
    print(f"\nPython object successfully written to '{output_filename}'")
except Exception as e:
    print(f"An error occurred writing JSON file: {e}")
```
```mermaid
graph TD
    J[Start] --> K["Have Python Object (e.g., list of dicts <b>data_list</b>)"];
    K --> L("Open output file <b>output.json</b> in write mode");
    L --> M{"Call <i>json.dump(data_list, file, indent=2)</i>"};
    M -- Serializes & Writes --> N("JSON data written to file");
    N --> O(Close file);
    O --> P[End];
    style K fill:#f9f,stroke:#333,stroke-width:1px
    style M fill:#ccf,stroke:#333,stroke-width:1px
```
CSV vs. JSON
- CSV: Best for simple, flat, tabular data. Easy to view/edit in spreadsheet software. Less flexible for complex or nested data. File size can be smaller for simple tables.
- JSON: Best for structured data, nested data, lists, and representing object state. Native format for web APIs. More flexible structure but can be slightly more verbose than CSV for simple tables.
Code Examples
Example 1: Reading CSV with `csv.DictReader` and Writing JSON
```python
# csv_to_json_converter.py
import csv
import json

csv_filepath = 'products.csv'  # Assumes this exists from Theory section
json_filepath = 'products.json'

data_list = []

# Read from CSV using DictReader
try:
    with open(csv_filepath, mode='r', newline='', encoding='utf-8') as infile:
        reader = csv.DictReader(infile)
        print(f"Reading data from {csv_filepath}...")
        for row in reader:
            # Convert price back to float (it was read as string)
            try:
                row['price'] = float(row['price'])
                row['id'] = int(row['id'])  # Convert id to int
            except ValueError:
                print(f"Warning: Could not convert price/id for row: {row}")
                # Handle error as needed, e.g., skip row or set default
            data_list.append(row)
    print(f"Read {len(data_list)} records.")
except FileNotFoundError:
    print(f"Error: CSV file '{csv_filepath}' not found.")
    exit()  # Stop script if input file not found
except Exception as e:
    print(f"Error reading CSV: {e}")
    exit()

# Write the list of dictionaries to JSON
try:
    with open(json_filepath, mode='w', encoding='utf-8') as outfile:
        # Use indent for readability in the output JSON file
        json.dump(data_list, outfile, indent=4)
    print(f"Data successfully written to {json_filepath}")
except Exception as e:
    print(f"Error writing JSON: {e}")
```
```mermaid
graph TD
    subgraph Read CSV
        R1(Open CSV File) --> R2{"<i>csv.DictReader</i>"};
        R2 --> R3{Loop through rows};
        R3 -- Each row --> R4["Get row as <b>Dict</b>"];
        R4 --> R5("Append Dict to <i>data_list</i>");
        R3 -- Done --> R6(Close CSV File);
    end
    subgraph Write JSON
        W1(Open JSON File) --> W2{"<i>json.dump(data_list, file)</i>"};
        W2 --> W3(Close JSON File);
    end
    R5 --> W1;
    style R4 fill:#ccf,stroke:#333,stroke-width:1px
    style W2 fill:#cfc,stroke:#333,stroke-width:1px
```
Explanation:
- Reads data from `products.csv` using `csv.DictReader`, automatically creating dictionaries for each row based on the header.
- Converts `price` and `id` strings read from CSV back into `float` and `int` respectively, handling potential `ValueError`.
- Appends each row dictionary to `data_list`.
- Writes the entire `data_list` (a list of dictionaries) to `products.json` using `json.dump` with indentation for pretty printing.
Example 2: Reading JSON and Writing CSV with `csv.DictWriter`
```python
# json_to_csv_converter.py
import json
import csv

json_filepath = 'products.json'  # Assumes this exists from Example 1
csv_filepath = 'products_from_json.csv'

# Read JSON data from file
try:
    with open(json_filepath, mode='r', encoding='utf-8') as infile:
        print(f"Reading data from {json_filepath}...")
        data_list = json.load(infile)  # data_list is now a list of dicts
    print(f"Read {len(data_list)} records.")
    if not data_list:
        print("JSON file is empty. Nothing to write to CSV.")
        exit()
except FileNotFoundError:
    print(f"Error: JSON file '{json_filepath}' not found.")
    exit()
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
    exit()
except Exception as e:
    print(f"Error reading JSON: {e}")
    exit()

# Write data to CSV using DictWriter
try:
    # Dynamically get fieldnames from the keys of the first dictionary
    # Assumes all dictionaries in the list have the same keys
    fieldnames = data_list[0].keys()
    with open(csv_filepath, mode='w', newline='', encoding='utf-8') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()         # Write header row
        writer.writerows(data_list)  # Write data from list of dicts
    print(f"Data successfully written to {csv_filepath}")
except Exception as e:
    print(f"Error writing CSV: {e}")
```
Explanation:
- Reads the list of dictionaries from `products.json` using `json.load`.
- Checks if the list is empty.
- Dynamically determines the CSV header (`fieldnames`) by looking at the keys of the first dictionary in the list (this assumes all dictionaries have the same structure).
- Uses `csv.DictWriter` with the determined `fieldnames` to write the header and then all the dictionaries from `data_list` to `products_from_json.csv`.
Common Mistakes or Pitfalls
CSV newline='':
Forgettingnewline=''
when opening CSV files for use with thecsv
module can lead to extra blank rows being written or read incorrectly on some platforms.- CSV Data Types: Remembering that
csv.reader
andcsv.DictReader
read all fields as strings. You need to manually convert them toint
,float
, etc., if needed, often within atry...except ValueError
. - JSON Keys: JSON standard requires keys to be strings. While Python dictionaries allow other immutable types as keys,
json.dumps
will convert non-string keys to strings (or raise an error depending on options). - JSON Data Types: Trying to serialize Python objects that don’t have a direct JSON equivalent (like custom class instances, sets, dates) using
json.dump
/dumps
will raise aTypeError
unless you provide a custom serialization function (using thedefault
argument) or convert them first. Dates are often converted to ISO format strings. Sets are often converted to lists. load
/dump
vs.loads
/dumps
: Usingload
/dump
when you have a string (useloads
/dumps
) or vice-versa when you have a file object. Remembers
stands for “string”.- JSON Decoding Errors: Invalid JSON syntax in a file or string will cause
json.JSONDecodeError
during parsing (load
/loads
).
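The `default` argument and the non-string-key conversion can both be shown in a few lines. A minimal sketch; the helper name `to_jsonable` and the sample record are illustrative:

```python
import json
from datetime import date

record = {
    'id': 7,
    'created': date(2024, 4, 12),  # no direct JSON equivalent
    'tags': {'alpha', 'beta'},     # sets are not JSON-serializable either
}

def to_jsonable(obj):
    """Fallback called by json.dumps for types it cannot serialize itself."""
    if isinstance(obj, date):
        return obj.isoformat()  # dates -> ISO format strings
    if isinstance(obj, set):
        return sorted(obj)      # sets -> lists
    raise TypeError(f"Cannot serialize {type(obj)}")

json_text = json.dumps(record, default=to_jsonable)
print(json_text)  # {"id": 7, "created": "2024-04-12", "tags": ["alpha", "beta"]}

# Non-string dictionary keys are silently converted to strings:
print(json.dumps({1: 'one'}))  # {"1": "one"}
```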
Chapter Summary
| Format | Module | Function/Class | Purpose | Key Syntax/Example |
|---|---|---|---|---|
| CSV | `csv` | `csv.reader(file_obj)` | Reads CSV row by row, returning each row as a list of strings. | `reader = csv.reader(f)`<br>`header = next(reader)`<br>`for row in reader: print(row)` |
| CSV | `csv` | `csv.writer(file_obj)` | Creates a writer object to write data (lists/tuples) to a CSV file. | `writer = csv.writer(f)`<br>`writer.writerow(['Head1', 'Head2'])`<br>`writer.writerows([['r1c1','r1c2'], ['r2c1','r2c2']])` |
| CSV | `csv` | `csv.DictReader(file_obj)` | Reads CSV row by row, returning each row as a dictionary using the header row as keys. | `reader = csv.DictReader(f)`<br>`for row_dict in reader: print(row_dict['HeaderName'])` |
| CSV | `csv` | `csv.DictWriter(file_obj, fieldnames)` | Creates a writer object to write data from dictionaries to a CSV file. Requires specifying `fieldnames`. | `fields = ['col1', 'col2']`<br>`writer = csv.DictWriter(f, fieldnames=fields)`<br>`writer.writeheader()`<br>`writer.writerows([{'col1': 'a', 'col2': 1}])` |
| JSON | `json` | `json.loads(json_string)` | Parses (deserializes) a JSON formatted string into a Python object (dict, list, etc.). `s` for string. | `data = json.loads('{"key": "value"}')` |
| JSON | `json` | `json.load(file_obj)` | Reads and parses (deserializes) JSON data from a file-like object into a Python object. | `with open('data.json') as f: data = json.load(f)` |
| JSON | `json` | `json.dumps(python_obj, indent=None)` | Serializes a Python object into a JSON formatted string. `s` for string; `indent` for pretty printing. | `json_str = json.dumps({'key': 'val'}, indent=2)` |
| JSON | `json` | `json.dump(python_obj, file_obj, indent=None)` | Serializes a Python object and writes it to a file-like object as JSON. `indent` for pretty printing. | `with open('out.json', 'w') as f: json.dump(data, f, indent=2)` |
- CSV is a simple, text-based format for tabular data, using delimiters (usually commas). Python’s `csv` module helps read (`reader`, `DictReader`) and write (`writer`, `DictWriter`) CSV files. Remember `newline=''` and that values are read as strings.
- JSON is a text-based format for structured data using key-value pairs and arrays, common in web APIs and configuration. Python’s `json` module handles serialization and deserialization:
  - `json.loads(string)`: Parses a JSON string into a Python object.
  - `json.load(file_obj)`: Parses a JSON file object into a Python object.
  - `json.dumps(py_obj)`: Serializes a Python object to a JSON string. Use `indent` for pretty printing.
  - `json.dump(py_obj, file_obj)`: Serializes a Python object to a JSON file object. Use `indent`.
- Choose CSV for simple tabular data and JSON for nested/complex data structures or web data exchange.
Exercises & Mini Projects
Exercises
- Write Simple CSV: Create a list of lists representing student names and scores: `[['Name', 'Score'], ['Alice', 90], ['Bob', 85]]`. Write this data to a file named `report.csv` using `csv.writer`.
- Read Simple CSV: Read the `report.csv` file created in Exercise 1 using `csv.reader`. Print the header and each data row.
- Write Dict to CSV: Create a list of dictionaries, where each dictionary represents a person with ‘name’ and ‘city’ keys. Write this list to `people.csv` using `csv.DictWriter`.
- Read CSV to Dict: Read the `people.csv` file created in Exercise 3 using `csv.DictReader` and print each dictionary row.
- Write JSON: Create a Python dictionary containing information about a movie (title, director, year, actors (list)). Write this dictionary to a file named `movie.json` using `json.dump` with an indent of 2.
- Read JSON: Read the `movie.json` file created in Exercise 5 using `json.load`. Print the director’s name and the list of actors from the loaded Python dictionary.
Mini Project: CSV Log Analyzer
Goal: Read a CSV log file (you’ll create a dummy one) and count the occurrences of different log levels (e.g., INFO, WARNING, ERROR).
Steps:
Create Dummy Log CSV (system.log.csv):
- Create a CSV file with columns like `Timestamp`, `LogLevel`, `Message`. Add several rows with different log levels (INFO, DEBUG, WARNING, ERROR). Example row: `2024-04-12 18:20:00,INFO,User logged in`

```
Timestamp,LogLevel,Message
2024-04-12 18:20:00,INFO,User logged in
2024-04-12 18:21:15,DEBUG,Checking database connection
2024-04-12 18:22:05,WARNING,Low disk space detected
2024-04-12 18:23:00,INFO,Data processed successfully
2024-04-12 18:24:30,ERROR,Failed to write to file
2024-04-12 18:25:00,INFO,User logged out
2024-04-12 18:26:00,WARNING,Configuration value missing
```
Write the Python Script (log_analyzer.py):
- Import the `csv` module.
- Initialize an empty dictionary `log_level_counts = {}` to store the counts.
- Use `try...except FileNotFoundError` to handle the case where the log file doesn’t exist.
- Inside the `try` block:
  - Open `system.log.csv` using `with open(...)` in read mode (`'r'`, `newline=''`).
  - Create a `csv.DictReader` for the file.
  - Loop through each `row` (dictionary) provided by the `DictReader`.
  - Get the `LogLevel` value from the current `row` dictionary.
  - Update the `log_level_counts` dictionary: if the level is already a key, increment its value; otherwise, add the level as a new key with a value of 1. (Hint: `counts.get(key, 0) + 1` is a useful pattern.)
- After the loop (outside the `with` block but inside the `try` block, or after the `try` block if you handle the empty case):
  - Print the heading “Log Level Counts:”.
  - Iterate through the `log_level_counts` dictionary (`.items()`) and print each log level and its count.
- Run and Test: Execute `log_analyzer.py`. The output should show the counts for INFO, DEBUG, WARNING, and ERROR based on your dummy CSV file.