Chapter 13: Exploring and Rewriting History
Chapter Objectives
By the end of this chapter, you will be able to:
- Use git reflog as a safety net to recover lost commits or branches.
- Employ advanced git log techniques for filtering and custom formatting of commit history.
- Search project history for specific changes using git log -S (pickaxe) and git log -G.
- Use git blame to identify who last modified specific lines in a file and in which commit.
- Summarize project contributions using git shortlog.
- Understand the purpose and severe implications of advanced history rewriting tools like git filter-repo.
- Appreciate the risks involved in rewriting shared history and when such actions might (cautiously) be considered.
Introduction
So far in your Git journey, you’ve learned to create commits, branch, merge, and even perform some history modifications like amending commits and rebasing branches. These skills are crucial for day-to-day development. However, Git’s power extends further, offering sophisticated tools to explore your project’s evolution in detail and, when absolutely necessary, to rewrite that history on a larger scale.
In this chapter, we’ll delve into techniques for deeply inspecting your repository’s past. We’ll start with git reflog, an essential safety net that records updates to the tips of branches and other references in your local repository, allowing you to recover from seemingly disastrous mistakes. We’ll then explore advanced git log options for pinpointing specific commits through powerful filtering and custom formatting. You’ll learn how to search for when a particular piece of code was introduced or changed, and how to trace the authorship of every line in your files using git blame. We’ll also look at git shortlog for summarizing contributions.
Finally, we will cautiously approach the topic of advanced history rewriting. While tools like git rebase -i (covered in Chapter 12) allow for localized history editing, sometimes more drastic changes are needed, such as removing a sensitive file from all commits or correcting author information throughout the project’s history. We’ll briefly discuss the older git filter-branch and its modern, recommended replacement, git filter-repo. These are powerful but dangerous tools, and their use, especially on shared history, comes with significant caveats that we will emphasize strongly.
Understanding these advanced capabilities will not only make you a more proficient Git user but also provide you with the tools to maintain a clean, understandable, and secure project history.
Theory
Your Safety Net: git reflog
Mistakes happen. You might accidentally delete a branch, perform a git reset --hard that discards commits you later realize you needed, or a rebase might go awry. Before you panic, remember git reflog.
The reflog (reference log) is a mechanism in Git that records when the tips of branches and other references (like HEAD) were updated in your local repository. Think of it as Git’s journal, noting every significant move you make. Each entry in the reflog has an index, like HEAD@{0}, HEAD@{1}, etc., where HEAD@{0} is the most recent state of HEAD.
How git reflog Works:
When you switch branches, commit, reset, or amend, Git updates the reflog for HEAD and the reflog for the affected branch(es). This log is stored locally in your .git directory (specifically in .git/logs/) and is not part of the pushed repository; it won’t be shared with collaborators when you push.
Why it’s a Safety Net:
Because the reflog tracks these movements, even if a commit is no longer reachable by any branch or tag (it’s “orphaned”), it might still be in the reflog. This allows you to find its SHA-1 hash and potentially recover it by creating a new branch from it, checking it out, or resetting to it.
Reflog Entries:
A typical reflog entry shows:
- The reflog pointer (e.g., HEAD@{index}).
- The SHA-1 hash of the commit the pointer referred to.
- The action that caused the update (e.g., commit, rebase, reset, checkout).
- A short description of the action or the commit message.
Expiration:
Reflog entries do expire. By default, reachable entries are kept for 90 days, and unreachable entries for 30 days. This is configurable but rarely needs changing for typical use.
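If you do want a longer recovery window, the two retention periods map to the configuration keys gc.reflogExpire and gc.reflogExpireUnreachable. The following is a minimal sketch that sets them in a throwaway repository; the 180-day and 60-day values are arbitrary choices for illustration, not recommendations.

```shell
set -e
# Work in a scratch repository so no real configuration is touched.
repo="$(mktemp -d)" && cd "$repo"
git init -q

# Unset keys fall back to Git's built-in defaults (90 and 30 days).
# The values below are illustrative assumptions.
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "60 days"

reachable="$(git config --get gc.reflogExpire)"
unreachable="$(git config --get gc.reflogExpireUnreachable)"
echo "reachable entries kept for:   $reachable"
echo "unreachable entries kept for: $unreachable"
```

Adding --global to the git config calls would apply the same settings to every repository for the current user.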
[Figure: Recovering from an accidental git reset --hard. (1) Initial state: main and HEAD point at C3 (C1: Initial Commit, C2: Add feature A, C3: Add feature B). (2) git reset --hard HEAD~2 moves main and HEAD to C1, leaving C2 and C3 orphaned. (3) git reflog shows HEAD@{0}: reset: moving to HEAD~2, HEAD@{1}: commit: C3, HEAD@{2}: commit: C2, HEAD@{3}: commit: C1; running git reset --hard HEAD@{1} (or git reset --hard with the C3 hash) uses the HEAD@{1} entry. (4) Recovered state: main and HEAD point at C3 again.]
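The reset-and-recover scenario can be reproduced end to end in a scratch repository. This is a minimal sketch: the file name notes.txt, the identity "Demo User", and the commit messages C1 to C3 are placeholders chosen for the example.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Build a three-commit history: C1 <- C2 <- C3
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "C$n"
done
before="$(git rev-parse HEAD)"               # the tip we are about to "lose"

git reset -q --hard HEAD~2                   # the accident: main now points at C1
after_reset="$(git log -1 --format=%s)"

git reflog -3                                # HEAD@{0} is the reset, HEAD@{1} is C3
git reset -q --hard 'HEAD@{1}'               # go back to where HEAD was before the reset
recovered="$(git rev-parse HEAD)"
echo "recovered tip: $(git log -1 --format=%s)"
```

Quoting 'HEAD@{1}' keeps shells that treat braces specially from altering the selector.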
Advanced git log Techniques
Chapter 4 introduced git log for viewing commit history. Now, let’s explore its more advanced capabilities for filtering and formatting.
Filtering Commits:
git log offers numerous options to narrow down the commits displayed:
- By Author/Committer:
  - git log --author="John Doe": Shows commits where the author field matches “John Doe”.
  - git log --committer="Jane Doe": Shows commits where the committer field matches “Jane Doe”. (The author is who wrote the patch; the committer is who applied it. They are often the same.)
- By Date/Time:
  - git log --since="2 weeks ago" or git log --since="2023-01-01"
  - git log --until="1 day ago" or git log --until="2023-12-31"
  - git log --after="2023-01-01" --before="2023-01-31"
- By Message Content:
  - git log --grep="Fixes bug #123": Shows commits whose messages match the specified pattern (case-sensitive by default; add -i for case-insensitive matching).
- By File/Path:
  - git log -- <path/to/file_or_directory>: Shows commits that affected the specified file or directory. The -- separates paths from branch names and other options.
- By Commit Range:
  - git log main..feature: Shows commits on the feature branch that are not on main.
  - git log <commit_hash1>..<commit_hash2>: Shows commits reachable from <commit_hash2> but not from <commit_hash1>.
  - git log <commit_hash>^..<commit_hash>: Shows only <commit_hash> itself, i.e., everything reachable from it but not from its first parent (for a merge commit, this also includes the commits the merge brought in).
- By Number of Commits:
  - git log -n 5 or git log -5: Shows the last 5 commits.
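These filters can be combined freely. The sketch below builds a small scratch history and applies several of them; the file names, the "Demo User" identity, and the commit messages are placeholders chosen for the example.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo one > file1.txt;   git add file1.txt; git commit -q -m "Add file1"
echo two > file2.txt;   git add file2.txt
git commit -q --author="Jane Doe <jane@example.com>" -m "Add file2"
echo more >> file1.txt; git add file1.txt; git commit -q -m "Update file1"
echo more >> file2.txt; git add file2.txt
git commit -q --author="Jane Doe <jane@example.com>" -m "Update file2"

# Author filter: count Jane's commits
jane_count="$(git log --author="Jane Doe" --oneline | wc -l | tr -d ' ')"
# Path filter: subjects of commits that touched file1.txt
file1_subjects="$(git log --format=%s -- file1.txt)"
# Range filter: the last two commits on main
last_two="$(git log --format=%s main~2..main)"
# Message filter, case-insensitive
update_count="$(git log -i --grep="update" --oneline | wc -l | tr -d ' ')"

echo "Jane: $jane_count commits; messages matching 'update': $update_count"
```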
Formatting Log Output:
The default git log format can be verbose. You can customize it extensively:
- Predefined Formats:
  - --oneline: Shows each commit as a single line (abbreviated SHA-1 and commit title).
  - --pretty=short, --pretty=medium (default), --pretty=full, --pretty=fuller: Built-in layouts with progressively more detail.
- Graph and Decoration:
  - --graph: Displays an ASCII art graph of the branch and merge history.
  - --decorate: Shows branch and tag names pointing to commits. Often used with --oneline --graph.
- Custom Formatting with --pretty=format:<string>: This is extremely powerful. The <string> can contain placeholders that Git replaces with information from the commit object. Common placeholders include:
Placeholder | Description |
---|---|
%H | Full commit hash (SHA-1) |
%h | Abbreviated commit hash |
%T | Full tree hash |
%t | Abbreviated tree hash |
%P | Full parent hashes (space separated) |
%p | Abbreviated parent hashes (space separated) |
%an | Author name |
%ae | Author email |
%ad | Author date (format respects --date= option) |
%ar | Author date, relative (e.g., “2 weeks ago”) |
%cn | Committer name |
%ce | Committer email |
%cd | Committer date (format respects --date= option) |
%cr | Committer date, relative |
%s | Subject (commit message title – first line) |
%b | Body (rest of the commit message) |
%N | Commit notes |
%d | Ref names (branches, tags) pointing to this commit, like --decorate |
%D | Ref names without the surrounding “ (” and “)” that %d adds |
%gD | Reflog selector, e.g., refs/stash@{1} |
%gs | Reflog subject |
%C(<color_name>) | Switch to specified color (e.g., red , green , blue , yellow , magenta , cyan , bold , ul , dim , reset ) |
%Creset | Reset color to default |
%n | Newline character |
%% | A raw ‘%’ character |
- Example: git log --pretty="format:%C(yellow)%h %C(reset)%ad %C(cyan)%s %C(bold red)%d %C(reset)[%an]"
  This would output: abbreviated hash (yellow), author date, subject (cyan), ref names (bold red), and author name.
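Custom formats are also handy for producing machine-readable output. As a small sketch, the following pins the commit date so the result is reproducible and prints pipe-separated fields; the "|" separator, the file name, and the "Demo User" identity are choices made for this example only.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "hello" > README
git add README
# Fix the author and committer dates so the formatted output is predictable.
GIT_AUTHOR_DATE="2024-05-15 10:00:00 +0000" \
GIT_COMMITTER_DATE="2024-05-15 10:00:00 +0000" \
  git commit -q -m "Add README"

# %h = short hash, %ad = author date (shaped by --date), %an = author, %s = subject
line="$(git log -1 --date=short --pretty='format:%h|%ad|%an|%s')"
echo "$line"                                  # <short-hash>|2024-05-15|Demo User|Add README

# Everything after the hash is stable across runs.
fields="$(echo "$line" | cut -d'|' -f2-)"
```

Single quotes around the format string keep the shell from interpreting anything inside it.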
Filter Category | Option / Example | Description |
---|---|---|
By Author/Committer | --author="John Doe" | Shows commits where the author field matches “John Doe”. |
By Author/Committer | --committer="Jane Doe" | Shows commits where the committer field matches “Jane Doe”. |
By Date/Time | --since="2 weeks ago" | Shows commits made in the last two weeks. Also accepts specific dates like --since="2023-01-01". |
By Date/Time | --until="1 day ago" | Shows commits made up until one day ago. Also accepts specific dates like --until="2023-12-31". |
By Date/Time | --after="YYYY-MM-DD" --before="YYYY-MM-DD" | Shows commits within a specific date range. |
By Message Content | --grep="Fixes bug #123" | Shows commits whose messages contain the specified string. Use -i for case-insensitivity. |
By File/Path | -- <path/to/file_or_dir> | Shows commits that affected the specified file or directory. The -- separates paths from other options/revisions. |
By Commit Range | main..feature | Shows commits on the feature branch that are not on the main branch. |
By Commit Range | <hash1>..<hash2> | Shows commits reachable from <hash2> but not from <hash1>. |
By Commit Range | <branch>~N..<branch> | Shows the last N commits on <branch> (on a linear history). E.g., main~3..main for the last 3. |
By Number of Commits | -n <number> or -<number> | Shows the last <number> commits (e.g., -5 for the last 5). |
By Code Changes (Pickaxe & Regex) | -S"string" | Shows commits that changed the number of occurrences of “string” (i.e., added or removed the string). |
By Code Changes (Pickaxe & Regex) | -G"regex" | Shows commits where the added/removed lines in the patch text match the given POSIX regular expression. |
By Merge/No-Merge | --merges / --no-merges | Shows only merge commits, or excludes merge commits, respectively. |
Searching History for Code Changes
Sometimes you need to find when a specific piece of code was introduced or modified, or which commits affected lines matching a certain pattern.
- git log -S"string" (Pickaxe Search): The -S option (often called the “pickaxe” because it helps you pick out commits) looks for commits that changed the number of occurrences of the specified string. This means it finds commits that either introduced or removed that string. It’s different from --grep, which searches commit messages; -S searches the actual changes. For example, git log -S"myFunctionName" would show commits where myFunctionName was added or deleted.
- git log -G"regex" (Regex Search in Diffs): The -G option looks for commits whose patch text contains added or removed lines matching the given POSIX regular expression. It is more general than -S because it doesn’t count occurrences but matches the pattern against the diff lines themselves, so it also reports commits that merely modify a line containing the pattern. For example, git log -G"user_id[[:space:]]*=[[:space:]]*[0-9]+" would find commits where lines matching user_id = <number> were added or removed. (Shorthands such as \s and \d are not part of POSIX regular expressions, so bracket expressions are the portable choice.)
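The difference between the two options shows up when a line containing the search term is edited without the term itself being added or removed. A minimal sketch, using a hypothetical app.conf file with a timeout setting:

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "timeout = 30" > app.conf
git add app.conf
git commit -q -m "Add timeout setting"        # "timeout" occurrences: 0 -> 1

echo "timeout = 60" > app.conf
git add app.conf
git commit -q -m "Raise timeout"              # "timeout" occurrences: 1 -> 1

# -S reports only commits where the number of occurrences changed.
pickaxe="$(git log -S"timeout" --format=%s)"
# -G reports every commit whose added or removed lines match the pattern.
regex="$(git log -G"timeout" --format=%s)"

printf -- '-S finds:\n%s\n\n-G finds:\n%s\n' "$pickaxe" "$regex"
```

Here -S lists only the commit that introduced the setting, while -G also lists the commit that changed its value.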
Who Changed What and When: git blame
git blame is a powerful tool for line-by-line annotation of a file. For each line, it shows:
- The SHA-1 of the commit that last modified the line.
- The author of that commit.
- The timestamp of that commit.
- The line number.
- The content of the line.
Usage:
git blame <filename>
This is invaluable for understanding the history of a specific piece of code, finding out who wrote it, when, and in what context (by looking up the commit).
Common git blame Options:
- -L <start>,<end> or -L :<funcname>: Restrict blame output to the specified line range or function.
- -e: Show author email instead of name.
- -w: Ignore whitespace changes when blaming.
- -M: Detect lines that were moved or copied within the same file, and attribute them to the commit that originally wrote them rather than the one that moved them.
- -C: In addition to -M, detect lines that were moved or copied from other files modified in the same commit. Given twice, it also looks at the commit that created the file; given three times, it looks at any commit. (More computationally expensive.)
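The effect of -w is easiest to see with a commit that only re-indents a line. The sketch below is illustrative: config.txt, "Demo User", and "Jane Doe" are made-up names, and --line-porcelain is used only to make the author easy to extract.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Commit 1 (Demo User) writes the line.
printf 'start\nvalue = 1\nend\n' > config.txt
git add config.txt
git commit -q -m "Add config"

# Commit 2 (Jane Doe) only changes the indentation of line 2.
printf 'start\n    value = 1\nend\n' > config.txt
git add config.txt
git commit -q --author="Jane Doe <jane@example.com>" -m "Reindent config"

# Plain blame attributes line 2 to the re-indenting commit...
plain="$(git blame --line-porcelain -L 2,2 config.txt | sed -n 's/^author //p')"
# ...while -w looks past the whitespace-only change to the original author.
ignore_ws="$(git blame -w --line-porcelain -L 2,2 config.txt | sed -n 's/^author //p')"

echo "without -w: $plain"
echo "with -w:    $ignore_ws"
```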
Command | Option | Description |
---|---|---|
git blame <file> | -L <start>,<end> | Restrict blame output to the specified line range (e.g., -L 10,20). |
git blame <file> | -L :<funcname> | Restrict blame output to lines within the function matching <funcname> (if supported by language). |
git blame <file> | -e or --show-email | Show author email addresses instead of names. |
git blame <file> | -w | Ignore whitespace changes when determining the commit that last modified a line. |
git blame <file> | -M / -C | Track lines moved or copied within the same file (-M) or from other files modified in the same commit (-C; repeat it to search more commits). Can be computationally expensive. |
git shortlog | -s or --summary | Suppress commit descriptions, showing only a count of commits per author. |
git shortlog | -n or --numbered | Sort output according to the number of commits per author in descending order (instead of alphabetically by author name). |
git shortlog | -e or --email | Show author email addresses in the output. |
git shortlog | (log options) | Can be combined with most git log filtering options (e.g., main..feature, --since). |

Summarizing Contributions: git shortlog
git shortlog summarizes the output of git log in a way that’s often useful for release notes or getting an overview of who contributed what. It groups commits by author and displays the first line of each commit message.
Common git shortlog Options:
- -s or --summary: Suppress commit descriptions, showing only a count of commits per author.
- -n or --numbered: Sort output according to the number of commits per author in descending order (instead of alphabetically).
- -e or --email: Show author email addresses.
- Can be combined with regular git log filtering options (e.g., git shortlog main..feature -sn).
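One practical detail when scripting: if no revision is given and standard input is not a terminal, git shortlog reads a log from standard input instead of walking history, so scripts should name a revision such as HEAD explicitly. A minimal sketch with placeholder authors and messages:

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

for n in 1 2 3; do
  echo "$n" >> log.txt
  git add log.txt
  git commit -q -m "Demo change $n"
done
echo "x" >> log.txt
git add log.txt
git commit -q --author="Jane Doe <jane@example.com>" -m "Jane's change"

# Pass HEAD explicitly so the command works in pipelines and CI jobs.
# The sed/tr step only normalizes the padded, tab-separated count column.
summary="$(git shortlog -sn HEAD | sed 's/^ *//' | tr '\t' ' ')"
echo "$summary"
```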
Advanced History Rewriting (Use With Extreme Caution!)
Sometimes, you might need to make sweeping changes to your project’s history. This is not something to be done lightly, as rewriting published history is dangerous and can cause significant problems for collaborators. Always back up your repository before attempting these operations.
Common Reasons for Large-Scale History Rewriting:
- Removing sensitive data (passwords, private keys) accidentally committed.
- Changing author email addresses or names across all commits.
- Removing a large file or directory that was mistakenly added to the repository’s entire history.
- Splitting a subdirectory into its own separate Git repository.
- Standardizing commit messages.
1. git filter-branch (The Old Way – Largely Superseded)
git filter-branch was Git’s original tool for complex history rewriting. It’s incredibly powerful but also notoriously slow, cumbersome to use correctly, and can be error-prone. While you might encounter it in older documentation, git filter-repo is now the recommended tool for most such tasks.
filter-branch works by walking through every commit in the selected history and applying the specified filter (e.g., a shell command) to it. Common filters include:
- --tree-filter: Checks out each commit and runs a command against the files in the working directory. Slowest.
- --index-filter: Modifies files in the staging area (index) without checking them out. Faster.
- --commit-filter: Replaces the step that creates each commit, so it can skip commits or change how they are written.
- --env-filter: Modifies environment variables affecting commit info (author/committer name, email, date).
- --msg-filter: Modifies commit messages.
- --subdirectory-filter: Turns a subdirectory into the root of the repository.
Warning: git filter-branch rewrites commit SHAs. If you’ve pushed the history you’re filtering, you’ll need to force push, which is highly disruptive. It also leaves your original refs backed up in refs/original/, which you’d typically clean up after verifying the rewrite. It is strongly recommended to use git filter-repo instead.
2. git filter-repo (The Modern, Recommended Tool)
git filter-repo is a third-party tool (not part of core Git, needs installation) designed as a faster, safer, and more user-friendly replacement for git filter-branch. It provides a cleaner interface for common history rewriting tasks.
Key Advantages of git filter-repo:
- Speed: Significantly faster than filter-branch.
- Safety: Includes more safeguards and generally has saner defaults. It typically requires you to work on a fresh clone.
- Simplicity: Command-line options are often more intuitive for common tasks.
Installation:
git filter-repo is typically installed as a Python script, often via pip:
pip install git-filter-repo
Common git filter-repo Use Cases:
- Removing files or paths:
  - git filter-repo --path secret.txt --invert-paths (removes secret.txt)
  - git filter-repo --path-glob '*.tmp' --invert-paths (removes all .tmp files)
- Changing author/committer info: You can use a mailmap file or callbacks. For example, to change an old email: git filter-repo --mailmap .mailmap, where .mailmap contains lines like: New Name <new@example.com> Old Name <old@example.com>
- Stripping blobs larger than a certain size: git filter-repo --strip-blobs-bigger-than 10M
- Extracting a subdirectory to become the new root: git filter-repo --subdirectory-filter old/sub/dir
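Before removing paths or stripping large blobs, it helps to know what is actually in the history. The sketch below uses only core Git plumbing (no filter-repo needed) to list the largest blob and the commits touching a path; big.bin and small.txt are hypothetical files created for the demonstration.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# A 200,000-byte file that is committed and later deleted: it is gone from
# the working tree but still stored in history.
dd if=/dev/zero of=big.bin bs=1000 count=200 2>/dev/null
echo "small" > small.txt
git add big.bin small.txt
git commit -q -m "Add big and small files"
git rm -q big.bin
git commit -q -m "Remove big.bin"

# Every object reachable from any ref, annotated with type, size, and path.
largest="$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k2,2nr \
  | head -n 1)"
echo "largest blob in history: $largest"

# Which commits touched that path?
touching="$(git log --all --format=%s -- big.bin)"
echo "$touching"
```

The sizes reported are uncompressed object sizes, which is a reasonable guide when choosing a threshold for --strip-blobs-bigger-than.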
Critical Warnings for git filter-repo (and any history rewriting):
- Backup Your Repository: Always perform these operations on a fresh clone or ensure you have a reliable backup.
- Rewrites History: Like filter-branch, it changes commit SHAs from the point of the first modified commit onwards.
- Shared History: DO NOT use on branches that have been pushed and are being used by collaborators unless you have coordinated with your entire team. Everyone will need to re-clone or perform complex recovery steps on their local repositories. The “Golden Rule of Rebasing” applies even more strongly here.
- Local Cleanup: After using filter-repo, Git might have old objects. Running git gc --prune=now --aggressive can help clean these up, but filter-repo often handles much of this.
filter-repo is a powerful tool for repository surgery. Treat it with respect and understand its consequences before use.
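One possible way to take the backup recommended above is a single-file bundle of every ref, which can later be cloned like a normal repository. This is a sketch of that approach, not the only option (a plain copy or a mirror clone works too); the file names and identity are placeholders.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "data" > file.txt
git add file.txt
git commit -q -m "Initial commit"
original_head="$(git rev-parse HEAD)"

# Write all refs and their history into one backup file.
backup="$(mktemp -d)/backup.bundle"
git bundle create "$backup" --all 2>/dev/null
git bundle verify "$backup"                   # fails if the bundle is unusable

# Restoring is just a clone from the bundle file.
restored="$(mktemp -d)/restored"
git clone -q "$backup" "$restored"
restored_head="$(git -C "$restored" rev-parse HEAD)"
echo "backup restores to $restored_head"
```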
Feature / Aspect | git filter-branch (Old) | git filter-repo (Modern) |
---|---|---|
Status | Largely superseded, complex, error-prone. Part of core Git. | Recommended replacement. Faster, safer, more user-friendly. Third-party tool (requires installation). |
Performance | Notoriously slow, especially on large repositories. | Significantly faster. |
Ease of Use | Cumbersome syntax, requires careful handling of shell quoting and filters. Easy to make mistakes. | More intuitive command-line options for common tasks. Clearer error messages. |
Safety | Can be dangerous; easy to corrupt repository if not used correctly. Backs up original refs to refs/original/. | More safeguards. Often requires working on a fresh clone. Better defaults. Handles cleanup more gracefully. |
Common Tasks | Removing files, changing author info, splitting subdirectories via various filters (--tree-filter, --index-filter, --env-filter, etc.). | Similar tasks, but with dedicated options like --path, --path-rename, --strip-blobs-bigger-than, --mailmap, --replace-text. |
History Rewriting | Yes, rewrites commit SHAs. | Yes, rewrites commit SHAs. |
Impact on Shared History | BOTH TOOLS ARE EXTREMELY DISRUPTIVE TO SHARED HISTORY. The “Golden Rule of Rebasing” applies even more strongly. Requires full team coordination if used on pushed branches. All collaborators will need to re-clone or reset their local copies. | Same as for git filter-branch. |
Recommendation | Avoid if possible. Use git filter-repo instead for new history rewriting tasks. | The preferred tool for most large-scale history rewriting needs. |
Practical Examples
Setup:
Let’s create a repository with a bit of history for our examples.
# Create and navigate to a new directory
mkdir git-history-lab
cd git-history-lab
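# Initialize a Git repository
git init
# The later steps assume the initial branch is named main, whatever init.defaultBranch is set to
git symbolic-ref HEAD refs/heads/main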
# Configure user (if not set globally for these examples)
# git config user.name "Your Name"
# git config user.email "youremail@example.com"
# Commit 1
echo "Initial content for file1.txt" > file1.txt
git add file1.txt
git commit -m "C1: Add file1.txt"
# Commit 2 (as a different author for variety)
echo "Feature A in file2.txt" > file2.txt
git add file2.txt
git commit --author="Jane Doe <jane@example.com>" -m "C2: Add file2.txt by Jane"
# Commit 3
echo "Update file1.txt with more data" >> file1.txt
git add file1.txt
git commit -m "C3: Update file1.txt"
# Let's add a specific string we can search for later
echo "SensitiveData_XYZ" >> file1.txt
git add file1.txt
git commit -m "C4: Add sensitive data to file1 (oops)"
# Commit 5
echo "Another update to file2.txt" >> file2.txt
git add file2.txt
git commit --author="Jane Doe <jane@example.com>" -m "C5: Jane updates file2.txt again"
# Commit 6
echo "Final touch on file1.txt" >> file1.txt
git add file1.txt
git commit -m "C6: Finalize file1"
# Create a branch and make a commit there
git checkout -b feature/new-stuff main~3 # main~3 is commit C3 ("C3" is only a label in the message, not a ref Git can resolve)
echo "Content for feature branch" > feature_file.txt
git add feature_file.txt
git commit -m "F1: Add feature_file.txt on feature branch"
git checkout main # Back to main branch
1. Using git reflog as a Safety Net
Let’s simulate losing a commit. Suppose we accidentally reset main too far back.
Currently, main is at C6.
# Check current HEAD
git log -n 1 --oneline main
# Expected: <hash_C6> C6: Finalize file1
# Oops! Accidentally reset main back by 3 commits
git reset --hard HEAD~3 # This will discard C6, C5, C4 from main
git log -n 1 --oneline main
# Expected: <hash_C3> C3: Update file1.txt
# Oh no! C4, C5, and C6 are "gone" from main!
Now, use git reflog to find the lost commits:
git reflog
Expected Output (hashes will vary; look at the most recent entries):
<hash_C3> HEAD@{0}: reset: moving to HEAD~3
<hash_C6> HEAD@{1}: checkout: moving from feature/new-stuff to main
<hash_F1> HEAD@{2}: commit: F1: Add feature_file.txt on feature branch
<hash_C3> HEAD@{3}: checkout: moving from main to feature/new-stuff
<hash_C6> HEAD@{4}: commit: C6: Finalize file1
<hash_C5> HEAD@{5}: commit: C5: Jane updates file2.txt again
<hash_C4> HEAD@{6}: commit: C4: Add sensitive data to file1 (oops)
... (older entries)
We can see that HEAD@{1} points at commit C6 (<hash_C6>): it records the checkout back to main just before the reset, and HEAD@{4} is the C6 commit itself. We want to restore main to this state.
# Restore main to the state of C6
git reset --hard <hash_C6> # Replace <hash_C6> with the actual hash from your reflog output
# Or, without copying the hash, use a reflog selector:
# git reset --hard HEAD@{1} # Where HEAD was just before the reset
# git reset --hard main@{1} # Where the main branch was just before the reset
git log -n 1 --oneline main
# Expected: <hash_C6> C6: Finalize file1
The commits C4, C5, and C6 are now back on the main branch.
Recovering a deleted branch:
# Let's say feature/new-stuff was important
git branch -D feature/new-stuff # Delete the branch
# Oh no, it's gone!
git reflog
# Look for the last commit on feature/new-stuff, e.g.,
# <hash_F1> HEAD@{...}: commit: F1: Add feature_file.txt on feature branch
# Or an entry like: checkout: moving from feature/new-stuff to main
# Recover the branch
git checkout -b feature/new-stuff-recovered <hash_F1> # Replace <hash_F1> with actual hash
git checkout main # Return to main; the remaining examples assume you are on main
2. Advanced git log Examples
# Show commits by Jane Doe
git log --author="Jane Doe" --oneline
# Expected output:
# <hash_C5> C5: Jane updates file2.txt again
# <hash_C2> C2: Add file2.txt by Jane
# Show commits on main since C3 (main~3), affecting file1.txt
git log main~3..main --oneline -- file1.txt
# Expected output:
# <hash_C6> C6: Finalize file1
# <hash_C4> C4: Add sensitive data to file1 (oops)
# Custom pretty format
git log -n 3 --pretty="format:%h %ar: %s [%an]" --graph
# Expected output (structure):
# * <hash_C6> X days ago: C6: Finalize file1 [Your Name]
# * <hash_C5> X days ago: C5: Jane updates file2.txt again [Jane Doe]
# * <hash_C4> X days ago: C4: Add sensitive data to file1 (oops) [Your Name]
# Find commits that introduced or removed "SensitiveData_XYZ"
git log -S"SensitiveData_XYZ" --oneline -p
# Expected: Should show commit C4 and its diff highlighting the addition.
# Find commits whose added or removed lines match "file2" (case-insensitive)
git log -G"file2" -i --oneline
# Expected: C2 and C5
3. Using git blame
# Blame file1.txt
git blame file1.txt
Expected Output (structure, hashes and dates will vary):
^<hash_C1> (Your Name 2024-05-15 10:00:00 +0300 1) Initial content for file1.txt
<hash_C3> (Your Name 2024-05-15 10:02:00 +0300 2) Update file1.txt with more data
<hash_C4> (Your Name 2024-05-15 10:03:00 +0300 3) SensitiveData_XYZ
<hash_C6> (Your Name 2024-05-15 10:05:00 +0300 4) Final touch on file1.txt
This shows who last changed each line and in which commit.
# Blame only lines 2-3 of file1.txt
git blame -L 2,3 file1.txt
# Expected output:
# <hash_C3> (Your Name 2024-05-15 10:02:00 +0300 2) Update file1.txt with more data
# <hash_C4> (Your Name 2024-05-15 10:03:00 +0300 3) SensitiveData_XYZ
4. Using git shortlog
# Summarize commit counts by author, sorted by count
git shortlog -sn
Expected Output:
4 Your Name
2 Jane Doe
# Show commit messages grouped by author (Jane Doe only)
git shortlog --author="Jane Doe"
Expected Output:
Jane Doe (2):
C2: Add file2.txt by Jane
C5: Jane updates file2.txt again
5. Advanced History Rewriting with git filter-repo (Illustrative)
CRITICAL WARNING: The following commands rewrite history. ALWAYS run them on a fresh clone of your repository that you can afford to mess up. Never run them on your primary working copy without a backup, and especially not on a repository that has been shared if you haven’t coordinated with all collaborators.
Scenario: Remove the accidentally committed SensitiveData_XYZ string from file1.txt throughout the entire history. (This is a simplified example; real sensitive data removal might require more complex patterns or blob filtering.)
First, you’d need to install git-filter-repo if you haven’t already:
pip install git-filter-repo
(or other method depending on your OS/Python setup).
On a fresh clone of git-history-lab:
# cd ../
# git clone git-history-lab git-history-lab-filtered
# cd git-history-lab-filtered
# Example: Remove a file named 'secret.txt' if it existed
# git filter-repo --path secret.txt --invert-paths --force
# Example: To remove the string "SensitiveData_XYZ" from all commits.
# Content changes are handled with --replace-text or, for full control, --blob-callback.
# Create a replacements file, e.g., replacements.txt, with one rule per line:
# SensitiveData_XYZ==>REDACTED_DATA
# Then run (this rewrites matching text in every file, not only file1.txt):
# git filter-repo --replace-text replacements.txt --force
# For this specific string, we could also target the commit that added it (C4)
# and use interactive rebase to edit that commit if it's simple enough.
# However, filter-repo is for repository-wide changes.
# The same replacement written as a Python blob callback:
# git filter-repo --blob-callback \
# 'blob.data = blob.data.replace(b"SensitiveData_XYZ", b"REDACTED")' \
# --force
After running such a command, git filter-repo will process all commits. Commit C4 (and any subsequent commit) would have new SHA-1 hashes. file1.txt in the history would no longer contain “SensitiveData_XYZ”.
Verify:
git log --oneline
# Observe changed SHAs for C4, C5, C6.
git show <new_hash_C4> # Check content of file1.txt in the new C4
The output of git show for the new C4 should not contain “SensitiveData_XYZ”.
Remember to clean up:
After verifying, you might push this to a new remote repository if this was a cleanup of a private repo, or if replacing a shared repo, all collaborators would need to re-clone or reset their local copies to the new history. This is a major disruptive operation.
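What "reset their local copies" looks like for a collaborator can be simulated locally. In this sketch, a bare repository stands in for the shared remote, and git commit --amend stands in for the history rewrite (it changes the commit hash just as filter-repo would); the names alice, bob, and origin.git are made up for the example. Note that git reset --hard discards any local commits and uncommitted changes, so real collaborators should save their work elsewhere first.

```shell
set -e
work="$(mktemp -d)" && cd "$work"

# A bare "server" repository whose default branch is main.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main

# Alice publishes a commit that contains the sensitive string.
git clone -q origin.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
git config commit.gpgsign false
echo "SensitiveData_XYZ" > file.txt
git add file.txt
git commit -q -m "Add file"
git push -q origin main
cd ..

# Bob clones the original history.
git clone -q origin.git bob

# Alice rewrites the commit and replaces the published branch.
cd alice
echo "REDACTED" > file.txt
git commit -q -a --amend -m "Add file"
git push -q --force-with-lease origin main
cd ..

# Bob discards his copy of the old history and adopts the rewritten one.
cd bob
git fetch -q origin
git reset -q --hard origin/main
bob_head="$(git rev-parse HEAD)"
alice_head="$(git -C ../alice rev-parse HEAD)"
bob_content="$(cat file.txt)"
echo "bob is now at $bob_head with content: $bob_content"
```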
OS-Specific Notes
- git filter-repo Installation:
  - git filter-repo is a Python script. The most common way to install it is using Python’s package installer, pip: pip install git-filter-repo.
  - Windows: You’ll need Python and pip installed and in your PATH.
  - macOS/Linux: Python is usually pre-installed. You might need to install pip if it’s not available (python3 -m ensurepip --upgrade, or via your system’s package manager, e.g., apt install python3-pip). You might also use pip3 if python3 is your default.
- Shell Quoting for git log --pretty=format:
  - When using complex format strings with spaces or special characters in git log --pretty=format:"...", quoting rules can differ slightly between shells (Bash, Zsh, PowerShell, Windows CMD).
  - Bash/Zsh (Linux/macOS): Single quotes ('...') are generally safer for literal strings, while double quotes ("...") allow variable expansion (which you usually don’t want in the format string itself).
  - PowerShell (Windows): PowerShell has its own quoting rules. Often, enclosing the format string in double quotes works, but complex characters might need escaping with a backtick (`).
  - Windows CMD: CMD’s quoting is more limited. Using Git Bash (which comes with Git for Windows) often provides a more consistent experience for complex Git commands.
- Performance of History Rewriting:
  - git filter-branch (the old tool) is notoriously slow, especially on large repositories and on Windows due to its reliance on shell scripts and frequent process creation.
  - git filter-repo is significantly faster across all platforms because it streams history through git fast-export and git fast-import rather than running a shell command for every commit.
- Case Sensitivity in Searches:
  - When using git log -S or -G, remember that the default search is case-sensitive. Use the -i option (e.g., git log -S"mystring" -i) for case-insensitive searches. This behavior is consistent across OSs, but the underlying filesystem’s case sensitivity (e.g., typically sensitive on Linux, insensitive on Windows/default macOS) is a separate factor that can affect how files are named and found.
Common Mistakes & Troubleshooting Tips
Git Issue / Error | Symptom(s) | Troubleshooting / Solution |
---|---|---|
git reflog is Local Only | Expecting git reflog to show history from a remote or to recover commits lost on another collaborator’s machine. Not finding expected commits after fetching. | Understand that the reflog tracks only local HEAD and branch tip movements. It’s your personal safety net. |
Complex git log Filters Not Working | git log returns no commits, incorrect commits, or errors with complex filter combinations (dates, authors, regex). | Add filters one at a time to see which one excludes the expected commits, and check that dates and patterns are quoted correctly for your shell. |
Running filter-branch / filter-repo on a Live Shared Repository | CRITICAL: Collaborators encounter major issues pulling/pushing; histories diverge; duplicated commits appear; general chaos. | The Golden Rule: DO NOT REWRITE SHARED/PUBLISHED HISTORY without extreme caution and full team coordination. |
Losing Original Refs After filter-branch | Deleting the refs/original/ backup refs created by filter-branch before being absolutely certain the rewrite was successful. Realizing later that the filter had an error. | filter-branch saves the original refs under the refs/original/ namespace. Do not delete this namespace until you are 100% sure the filtered history is correct and desired. (git filter-repo has different, often safer, backup/recovery mechanisms or expects operation on a clone.) |
git blame Points to a Reformatting/Refactoring Commit | git blame output attributes a line to a commit that only changed formatting (e.g., indentation, auto-formatting) or moved code, not the commit that introduced the logic. | Use git blame -w to ignore whitespace-only changes, and -M or -C to follow lines that were moved or copied. |
git filter-repo not found or not working | Command git filter-repo results in “command not found” or Python errors. | git filter-repo is not bundled with Git and requires separate installation. Follow the installation instructions in its repository and make sure a working Python 3 interpreter is available. |
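The first row of the table can be checked directly. The sketch below, using made-up repository and file names, creates a repository with three commits, clones it, and compares the two reflogs: the clone receives the full commit history but none of the original repository's reflog entries.

```shell
# Sketch: the reflog is local to each repository and is not transferred by clone.
# origin-repo, fresh-clone and notes.txt are hypothetical names.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
git init -q "$work/origin-repo"
cd "$work/origin-repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Change $n"
done
src_entries=$(git reflog | wc -l | tr -d ' ')

git clone -q "$work/origin-repo" "$work/fresh-clone"
cd "$work/fresh-clone"
clone_commits=$(git rev-list --count HEAD)
clone_entries=$(git reflog | wc -l | tr -d ' ')

echo "reflog entries in original: $src_entries"
echo "commits visible in clone:   $clone_commits"
echo "reflog entries in clone:    $clone_entries"
```

This is why a commit that a collaborator lost on their machine has to be recovered from their reflog, not yours.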
Exercises
Use the git-history-lab repository you set up. You may need to reset its state or add new commits for some exercises.
- Reflog Branch Recovery Drill:
  - On main, create a new branch temp-feature.
  - Add two commits to temp-feature.
  - Switch back to main.
  - Accidentally delete the temp-feature branch using git branch -D temp-feature.
  - Use git reflog to find the SHA-1 of the last commit made on temp-feature.
  - Recover the temp-feature branch by checking out that commit to a new branch named temp-feature-recovered.
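One possible way to run through this drill is sketched below in a scratch repository, so your lab repository stays untouched. The commit messages and file names are assumptions made for the example; in practice you would read the git reflog output yourself and copy the hash, which the script imitates by searching the reflog for the last commit message.

```shell
# Sketch of the reflog recovery drill in a throwaway repository.
# File names and commit messages are hypothetical.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/main    # ensure the first branch is called main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > file1.txt
git add file1.txt
git commit -q -m "Initial commit"

# Create the branch and add two commits.
git checkout -q -b temp-feature
echo "part 1" >> feature.txt
git add feature.txt
git commit -q -m "Add feature part 1"
echo "part 2" >> feature.txt
git add feature.txt
git commit -q -m "Add feature part 2"
expected=$(git rev-parse HEAD)           # kept only to verify the recovery

# Switch away and "accidentally" delete the branch.
git checkout -q main
git branch -q -D temp-feature

# Find the lost tip in the reflog (%H = full hash, %gs = reflog subject).
lost=$(git reflog --format='%H %gs' \
  | grep 'commit: Add feature part 2' | head -n 1 | cut -d' ' -f1)

# Recover it onto a new branch.
git checkout -q -b temp-feature-recovered "$lost"
recovered=$(git rev-parse temp-feature-recovered)
git log --oneline temp-feature-recovered
```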
- Log Detective Work:
  - Using git log on your git-history-lab repository:
    - Find all commits made by “Jane Doe” that affected file2.txt.
    - Find all commits on main made in the last hour (if you made recent commits; otherwise, adjust the timeframe or use a specific date range from your setup) that contain the word “Update” (case-insensitive) in their commit message.
    - Display the last 3 commits on main using a custom format that shows: abbreviated hash, relative committer date, author name, and the full commit message body, each on a new line.
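If you want to compare notes after attempting these queries, the sketch below shows one plausible set of commands against a small scratch repository. The commits, the second author "John Smith", and the email addresses are invented so the script is self-contained; your lab repository will produce different results.

```shell
# Sketch: possible git log invocations for the three queries above.
# All commits, names and addresses here are hypothetical.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "John Smith"
git config user.email "john@example.com"
jane="Jane Doe <jane@example.com>"

echo "one" > file1.txt
git add file1.txt
git commit -q -m "Add file1"
echo "two" > file2.txt
git add file2.txt
git commit -q --author="$jane" -m "Add file2"
echo "more" >> file2.txt
git commit -q -a --author="$jane" -m "update file2 wording"
echo "more" >> file1.txt
git commit -q -a --author="$jane" -m "Update file1"
echo "tweak" >> file2.txt
git commit -q -a -m "Tweak file2"

# 1. Commits by Jane Doe that touched file2.txt.
jane_file2=$(git log --oneline --author="Jane Doe" -- file2.txt | wc -l | tr -d ' ')

# 2. Commits on main from the last hour whose message contains "update",
#    ignoring case (-i applies to --grep).
recent_updates=$(git log --oneline main --since="1 hour ago" -i --grep="update" \
  | wc -l | tr -d ' ')

# 3. Last 3 commits: abbreviated hash, relative committer date, author name,
#    full message, each on its own line. %B is subject plus body; use %b for
#    the body alone.
git log -3 main --pretty=format:'%h%n%cr%n%an%n%B'

echo "Jane's commits touching file2.txt: $jane_file2"
echo "recent commits mentioning update:  $recent_updates"
```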
- Blame and Trace:
  - Run git blame file1.txt.
  - Identify the commit that introduced the line “SensitiveData_XYZ”.
  - Use git show <commit_hash> for that commit to see the full context of the change.
  - If you have a commit that only changed whitespace in file1.txt (you might need to add one), run git blame -w file1.txt and compare its output to git blame file1.txt for those lines.
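The whitespace comparison in the last step can be reproduced in isolation. The following sketch assumes two invented authors: one writes a line, the other only re-indents it. Plain git blame then credits the re-indenting commit, while git blame -w looks through it to the commit that introduced the content.

```shell
# Sketch: git blame versus git blame -w on a whitespace-only change.
# Authors and file contents are hypothetical.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
cd "$work"
git init -q
git config user.name "Alice Author"
git config user.email "alice@example.com"

printf 'start\nvalue = 1\nend\n' > file1.txt
git add file1.txt
git commit -q -m "Add file1"

# A second author only changes the indentation of line 2.
printf 'start\n    value = 1\nend\n' > file1.txt
git commit -q -a --author="Bob Formatter <bob@example.com>" -m "Reindent file1"

# --line-porcelain prints an "author <name>" header for each blamed line.
plain=$(git blame --line-porcelain -L 2,2 file1.txt | sed -n 's/^author //p')
ignoring_ws=$(git blame -w --line-porcelain -L 2,2 file1.txt | sed -n 's/^author //p')

echo "git blame credits:    $plain"
echo "git blame -w credits: $ignoring_ws"
```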
- (Optional/Advanced) filter-repo Simulation - On a Clone!
  - Clone your git-history-lab repository to a new directory (e.g., git-history-lab-clone). Work only in this clone for this exercise.
  - In the clone, imagine file2.txt should never have been committed. Use git filter-repo to remove file2.txt from the entire history of the cloned repository. Command hint: git filter-repo --path file2.txt --invert-paths --force
  - Verify by checking the log and trying to find file2.txt in older commits (it should be gone).
  - Delete the git-history-lab-clone directory afterwards to ensure you don’t mix it up with your original. This exercise is purely to understand the command’s effect.
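The simulation can also be rehearsed end to end on generated data before touching your lab repository. This sketch builds a stand-in repository, clones it, and runs the hinted command only if a git-filter-repo executable is found on PATH; the detection method and the stand-in commits are assumptions of this example, not part of the exercise.

```shell
# Sketch: remove file2.txt from all history, on a clone only.
# The stand-in repository contents are hypothetical.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
git init -q "$work/git-history-lab"
cd "$work/git-history-lab"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > file1.txt
git add file1.txt
git commit -q -m "Add file1"
echo "secret" > file2.txt
git add file2.txt
git commit -q -m "Add file2"
echo "more" >> file1.txt
git commit -q -a -m "Update file1"

git clone -q "$work/git-history-lab" "$work/git-history-lab-clone"
cd "$work/git-history-lab-clone"

# Assumption: filter-repo is installed as a git-filter-repo executable on PATH.
if command -v git-filter-repo >/dev/null 2>&1; then
  git filter-repo --path file2.txt --invert-paths --force
  # No commit in the rewritten clone should mention file2.txt any more.
  remaining=$(git log --all --oneline -- file2.txt | wc -l | tr -d ' ')
  status="rewritten"
else
  remaining="n/a"
  status="skipped"
  echo "git filter-repo not found; install it separately to try this step"
fi

# The original repository is never modified by the rewrite of the clone.
original=$(git -C "$work/git-history-lab" log --oneline -- file2.txt | wc -l | tr -d ' ')
echo "status: $status, commits touching file2.txt in clone: $remaining, in original: $original"
```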
Summary
- git reflog: Your local safety net, recording movements of HEAD and branch tips. Essential for recovering “lost” commits or branches (e.g., HEAD@{index}).
- Advanced git log:
  - Filtering: --author, --committer, --since/--until, --grep, -- <path>, commit ranges.
  - Formatting: --oneline, --graph, --decorate, --pretty="format:..." with placeholders like %h, %an, %ad, %s, %d.
- Searching Code Changes:
  - git log -S"string" (pickaxe): Finds commits that changed the count of “string”.
  - git log -G"regex": Finds commits where diffs have lines matching “regex”.
- git blame <file>: Shows who last modified each line of a file, and in which commit. Options: -L, -w, -C, -M.
- git shortlog: Summarizes git log output, grouped by author. Options: -s, -n, -e.
- Advanced History Rewriting (Use with Extreme Caution):
  - Reserved for major repository surgery (e.g., removing sensitive data, large files from all history).
  - git filter-branch: Older, slower, complex. Largely superseded.
  - git filter-repo: Modern, faster, safer alternative (requires separate installation).
  - Crucial Warning: These tools rewrite history (change SHAs). Never use on shared/pushed history without full team coordination and understanding the disruptive impact. Always backup first.
Mastering these tools allows for deep insights into your project’s history and provides mechanisms for recovery and, when absolutely necessary, for carefully considered history alterations.
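The difference between -S and -G summarized above is easy to blur, so here is a small sketch that shows it, followed by a shortlog summary. The file, the function name compute, and both authors are invented for the example. The second commit edits a line containing compute without changing how many times the word occurs, so -S skips it while -G reports it.

```shell
# Sketch: -S (occurrence count changed) versus -G (diff lines match), plus shortlog.
# calc.txt, compute and both authors are hypothetical.
set -eu
work=$(mktemp -d)            # throwaway directory; remove it when done
cd "$work"
git init -q
git config user.name "Alice Author"
git config user.email "alice@example.com"

printf 'total = compute(1)\n' > calc.txt
git add calc.txt
git commit -q -m "Call compute"

# Same number of "compute" occurrences, but the line itself changes.
printf 'total = compute(2)\n' > calc.txt
git commit -q -a --author="Bob Builder <bob@example.com>" -m "Change compute argument"

s_hits=$(git log --oneline -S"compute" | wc -l | tr -d ' ')
g_hits=$(git log --oneline -G"compute" | wc -l | tr -d ' ')
echo "-S finds $s_hits commit(s), -G finds $g_hits commit(s)"

# Pass a revision explicitly: without one, shortlog reads from standard input
# when it is not attached to a terminal.
git shortlog -sne HEAD
authors=$(git shortlog -sne HEAD | wc -l | tr -d ' ')
```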
Further Reading
- Official Git Documentation:
  - git-reflog(1): https://git-scm.com/docs/git-reflog
  - git-log(1): https://git-scm.com/docs/git-log
  - git-blame(1): https://git-scm.com/docs/git-blame
  - git-shortlog(1): https://git-scm.com/docs/git-shortlog
  - git-filter-branch(1): https://git-scm.com/docs/git-filter-branch (Read mainly for historical context and its warnings)
- git filter-repo Documentation:
  - Main Page & Installation: https://github.com/newren/git-filter-repo/
- Pro Git Book:
  - Chapter 7.6 Git Tools - Rewriting History: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History (Covers filter-branch and the dangers)
  - Chapter 7.7 Git Tools - Debugging with Git (mentions blame and bisect): https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git