Chapter 16: Git Submodules and Subtrees
Chapter Objectives
By the end of this chapter, you will be able to:
- Understand the need for managing external dependencies or components within a Git repository.
- Add, initialize, and update Git submodules using commands like git submodule add, git submodule update, and git submodule sync.
- Clone projects that contain submodules and correctly initialize them.
- Make changes within submodules, commit them, and update the parent repository to track new submodule versions.
- Recognize the common challenges and workflows associated with submodules.
- Incorporate external repositories into your project using Git subtrees with commands like git subtree add, git subtree pull, and git subtree push.
- Compare and contrast Git submodules and subtrees, understanding their respective advantages, disadvantages, and ideal use cases.
Introduction
Modern software development often involves reusing code, whether it’s a third-party library, a shared internal component, or a separate project that your main project depends on. Managing these external dependencies directly within your primary Git repository can be challenging. Simply copying the code into your project makes it difficult to update the dependency or contribute changes back.
Git offers two primary mechanisms for incorporating one Git repository within another: submodules and subtrees. Both allow you to include and manage external code, but they do so in fundamentally different ways, each with its own workflow, benefits, and drawbacks.
Submodules treat the external repository as a distinct entity, linked to a specific commit in your main project. This keeps the histories separate but requires specific commands to manage. Subtrees, on the other hand, merge the external repository’s files and history directly into your main project, making it appear as a regular subdirectory.
Understanding how to use submodules and subtrees effectively is crucial for managing complex projects with external dependencies, promoting code reuse, and maintaining a clean and organized version control history. This chapter will explore both approaches, guiding you through their setup, daily usage, and helping you decide which method is best suited for your project’s needs.
Theory
Git Submodules
A Git submodule is essentially a Git repository embedded inside another (parent) Git repository. The parent repository doesn’t store the actual content of the submodule; instead, it stores a reference (a specific commit SHA-1) to the submodule’s repository and the path where the submodule should be checked out. This allows you to keep the submodule’s history separate from the parent repository’s history.
How Submodules Work:
When you add a submodule, Git creates a special file named .gitmodules in the root of your parent repository. This file stores metadata about your submodules, including their path and the URL of their repository. The parent repository also records a special type of entry in its tree object that points to a specific commit in the submodule’s repository.
This means that when someone clones your parent repository, they get the submodule directory, but it’s initially empty (or contains only the submodule’s .git metadata if already initialized). They need to explicitly initialize and update the submodule to populate it with the code from the referenced commit.
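The stored reference can be inspected directly. The following is a minimal sketch, assuming two throwaway repositories in a temporary directory. The repository names, file names, and commit identity are placeholders, and protocol.file.allow=always is set only because the sketch clones the submodule from a local path.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library repository with a single commit
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

# Hypothetical parent repository
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -q -m "P1: Initial main project commit"

# Add the library as a submodule and commit the result
git -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
git commit -q -m "P2: Add shared-library as a submodule"

# The path/URL mapping lives in .gitmodules
cat .gitmodules

# The tree entry for the submodule path is a commit reference, not a directory tree
entry=$(git ls-tree HEAD libs/my-library)
echo "$entry"

# The commit recorded by the parent matches what is checked out in the submodule
recorded=$(git rev-parse HEAD:libs/my-library)
checked_out=$(git -C libs/my-library rev-parse HEAD)
echo "recorded by parent: $recorded"
echo "checked out:        $checked_out"
```

The mode 160000 in the ls-tree output marks the entry as a commit reference (often called a gitlink) rather than a regular tree or blob.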
%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%% graph TD ParentRepo["<b>Parent Repository</b><br>(e.g., <i>MyMainProject</i>)"] subgraph ParentRepo direction LR P_GitDir[".git/ (Parent's Git DB)"] P_GitModules[".gitmodules <i>(File)</i>"] P_WorkTree["Working Tree"] end subgraph P_WorkTree direction TB P_File1["main_app.js"] P_SubmoduleDir["libs/my-library/ <i>(Submodule Directory)</i>"] end P_GitModules --> P_SubmoduleDir; P_SubmoduleDir -.-> |"Contains SHA-1 link to<br>Commit XYZ in MyLibraryRepo"| SubmoduleCommitRef; SubmoduleCommitRef["Commit XYZ SHA-1<br><i>(Stored by Parent Repo)</i>"] ExternalRepo["<b>External Submodule Repository</b><br>(e.g., <i>MyLibraryRepo</i>)"] subgraph ExternalRepo direction LR S_GitDir[".git/ (Submodule's Own Git DB)"] S_CommitHist["Commit History"] end S_CommitHist --> S_CommitX["..."] --> S_CommitXYZ["Commit XYZ<br><i>(Actual commit in MyLibraryRepo)</i>"] --> S_CommitY["..."] SubmoduleCommitRef --> S_CommitXYZ P_GitModulesContent["<b>Content of .gitmodules:</b><br><br>[submodule <i>libs/my-library</i>]<br> path = libs/my-library<br> url = <i>example.com/my-library.git</i>"] %% Styling classDef repo fill:#DBEAFE,stroke:#2563EB,stroke-width:2px,color:#1E40AF; classDef file fill:#EDE9FE,stroke:#5B21B6,stroke-width:1px,color:#5B21B6; classDef dir fill:#E0E7FF,stroke:#4338CA,stroke-width:1px,color:#3730A3; classDef commitRef fill:#FEF3C7,stroke:#D97706,stroke-width:1.5px,color:#92400E; classDef commitHistNode fill:#D1FAE5,stroke:#059669,stroke-width:1px,color:#065F46; classDef note fill:#F3F4F6,stroke:#9CA3AF,color:#1F2937,padding:10px,font-size:11px; class ParentRepo,ExternalRepo repo; class P_GitModules,P_File1 file; class P_SubmoduleDir,P_GitDir,S_GitDir dir; class SubmoduleCommitRef commitRef; class S_CommitHist,S_CommitXYZ,S_CommitX,S_CommitY commitHistNode; class P_GitModulesContent note; P_GitModules --- P_GitModulesContent;
Key Submodule Concepts and Commands:
- .gitmodules file: A plain text file in the root of the parent repository that defines the mapping between a path in your project and the submodule’s repository URL. Example:
[submodule "themes/hyde"]
    path = themes/hyde
    url = https://github.com/spf13/hyde.git
- git submodule add <repository_url> [<path>]: This command adds a new submodule to your project.
  - It clones the submodule repository from <repository_url> into the specified <path> (if <path> is omitted, Git often uses the repository’s name).
  - It creates or updates the .gitmodules file with the submodule’s information.
  - It stages the new submodule directory and the .gitmodules file. You then commit these changes to your parent repository. The staged submodule entry records the commit SHA-1 of the submodule’s HEAD at the time of adding.
- Cloning a Project with Submodules: When you clone a repository that contains submodules, the submodule directories are created, but they are empty.
- Option 1 (Clone and then initialize/update):
git clone <parent_repository_url>
cd <parent_repository_name>
git submodule init # Initializes submodules (registers paths from .gitmodules)
git submodule update # Fetches and checks out the correct commit for each submodule
- Option 2 (Recursive clone):
git clone --recurse-submodules <parent_repository_url>
This command automatically initializes and updates any submodules after the clone is complete.
- git submodule init: Initializes your submodules by copying the submodule URLs from .gitmodules into your local .git/config file. This step is necessary before you can update the submodules for the first time after cloning a project without --recurse-submodules.
- git submodule update [--init] [--recursive] [--remote]: This command updates your registered submodules to match the commit recorded in the parent repository.
  - --init: Also initializes any submodules that haven’t been initialized yet.
  - --recursive: Also updates any nested submodules (submodules within submodules).
  - --remote: By default, git submodule update checks out the commit specified in the parent repository. If you use --remote, it will instead update the submodule to the latest commit on its tracked branch (as defined in .gitmodules or the submodule’s own configuration). This is useful if you always want your submodule to track the latest changes from its remote. Use with caution, as this means the parent repo might not be pointing to the version you’re actually using locally until you commit the change in the parent.
- git submodule sync [--recursive]: If the URL of a submodule’s repository changes in the .gitmodules file (e.g., it moved to a new server), git submodule sync updates the submodule’s remote URL configuration in your local .git/config to match what’s in .gitmodules.
- Working with Submodules: When you cd into a submodule directory, you are effectively in a separate Git repository. You can fetch, pull, checkout branches, make commits, etc., just like any other repository.
  - Making Changes: If you make changes within a submodule and commit them, these changes are part of the submodule’s history, not the parent’s.
  - Updating Parent to New Submodule Commit: After committing changes in the submodule, you need to go back to the root of the parent repository, git add the submodule directory (which now points to the new commit in the submodule), and then commit this change in the parent repository. This updates the parent to track the new submodule version.
  - Detached HEAD: When you run git submodule update, Git checks out the submodule at a specific commit, putting the submodule in a “detached HEAD” state. If you want to make changes, you should typically cd into the submodule, git checkout <branch_name> (e.g., main or master), make your changes, commit, and push from within the submodule. Then, update the parent repository.
- git submodule foreach '<command>': Executes the given shell command in each checked-out submodule. For example, git submodule foreach 'git pull origin main' would attempt to pull the latest changes from the main branch for all submodules.
Command | Description |
---|---|
git submodule add <repo_url> [<path>] | Adds a new submodule. Clones the submodule, creates/updates .gitmodules, and stages changes in the parent repository. |
git clone --recurse-submodules <repo_url> | Clones a repository and automatically initializes and updates all its submodules. |
git submodule init | Initializes submodules by copying their URLs from .gitmodules to local .git/config. Necessary after a normal clone before first update. |
git submodule update [--init] [--recursive] [--remote] | Updates submodules to the commit specified in the parent repository. --init: also initializes uninitialized submodules. --recursive: also updates nested submodules. --remote: updates the submodule to the latest commit on its tracked remote branch (use with caution). |
git submodule sync [--recursive] | Updates the submodule’s remote URL in local .git/config to match what’s in .gitmodules if it has changed. |
git submodule status [--cached] [--recursive] | Shows the status of submodules, including their current commit SHA-1 and if they are out of sync with the parent. |
git submodule foreach '<command>' | Executes the given shell command in each checked-out submodule. E.g., git submodule foreach 'git status'. |
git submodule deinit [-f] <path> | Unregisters the submodule: removes its section from .git/config and empties its working directory. Its .gitmodules entry and its entry in the index remain until you remove them (for example, with git rm). |
git rm <submodule_path> | To properly remove a submodule: 1. git submodule deinit -f <path>. 2. git rm <path> (removes the submodule from the working tree and index; recent Git versions also remove its section from .gitmodules). 3. If an entry remains in .gitmodules, remove it manually and stage the file. 4. Commit changes. (Also remove the .git/modules/<path> directory manually if desired.) |
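The removal steps in the last row can be sketched end to end. This is an illustration under stated assumptions, not a definitive procedure: the repositories are throwaway ones created in a temporary directory, all names and the commit identity are placeholders, and the sketch relies on a reasonably recent Git in which git rm also edits .gitmodules.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library and parent repositories
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -q -m "P1: Initial main project commit"

# protocol.file.allow=always is needed only because the submodule URL is a local path
git -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
git commit -q -m "P2: Add shared-library as a submodule"

# 1. Unregister the submodule and empty its working directory
git submodule deinit -f libs/my-library
# 2. Remove it from the index and working tree
git rm -q libs/my-library
# 3. Record the removal in the parent repository
git commit -q -m "P3: Remove shared-library submodule"
# 4. Optional: delete the submodule's repository data kept inside the parent's .git
rm -rf .git/modules/libs/my-library

# No entry with mode 160000 should remain in the index
git ls-files --stage
```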
Challenges with Submodules:
- Complexity: The workflow can be more complex than managing a single repository, especially for newcomers. Users must remember to initialize and update submodules.
- Detached HEADs: Can be confusing if users try to commit directly in a detached HEAD state within a submodule without checking out a branch first.
- Keeping Submodules Updated: Requires diligence to ensure the parent repository always points to the desired submodule commits, and that submodule changes are pushed before the parent repository’s changes that reference them.
- Merge Conflicts: Conflicts can arise in the submodule path in the parent repository if different branches update the submodule to different commits.
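The detached HEAD issue is easy to detect before committing. The sketch below assumes throwaway repositories with placeholder names and a placeholder commit identity; it clones a parent recursively, checks whether the submodule is on a branch, and attaches it to main before any work is done. The protocol.file.allow=always setting is there only because the sketch uses local paths as URLs.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library and parent repositories
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

git init -q main-project
git -C main-project symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main-project/main_app.txt
git -C main-project add main_app.txt
git -C main-project commit -q -m "P1: Initial main project commit"
(
  cd main-project
  git -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
  git commit -q -m "P2: Add shared-library as a submodule"
)

# A recursive clone checks the submodule out at the recorded commit
git -c protocol.file.allow=always clone -q --recurse-submodules main-project main-project-clone
cd main-project-clone/libs/my-library

# symbolic-ref fails when HEAD is not attached to a branch
if git symbolic-ref -q HEAD >/dev/null; then
  state_before="on a branch"
else
  state_before="detached"
fi
echo "after clone: HEAD is $state_before"

# Attach HEAD to a branch before making commits in the submodule
git checkout -q main
state_after=$(git symbolic-ref --short HEAD)
echo "after checkout: on branch $state_after"
```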
%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%% graph TD A["Start: Need to include an external library (<i>LibX</i>)"] --> B{"Choose: Add as Submodule"}; B --> C["<b>1. Add Submodule:</b><br>In Parent Repo: <i>git submodule add <libx_url> path/to/libx</i>"]; C --> D["Git actions:<br>- Clones <i>LibX</i> into <i>path/to/libx</i>.<br>- Creates/updates <i>.gitmodules</i>.<br>- Stages <i>.gitmodules</i> and <i>path/to/libx</i> (link to <i>LibX</i> commit)."]; D --> E["In Parent Repo: <i>git commit -m \Add LibX submodule\</i>"]; E --> F["Submodule added and linked to a specific <i>LibX</i> commit."]; F --> G["<br><b>Later: Clone Parent Repo with Submodule</b>"]; G --> H{"How to clone?"}; H -- "Option 1: Standard Clone" --> I["<i>git clone <parent_url></i><br>Then <i>cd <parent_repo></i>"]; I --> J["<i>git submodule init</i> (registers submodule)"]; J --> K["<i>git submodule update</i> (fetches & checks out <i>LibX</i> code)"]; K --> L["<i>LibX</i> code populated at correct commit."]; H -- "Option 2: Recursive Clone" --> M["<i>git clone --recurse-submodules <parent_url></i>"]; M --> L; L --> N["<br><b>Later: Update Submodule to a Newer Version of <i>LibX</i></b>"]; N --> O{"Method 1: Update to parent's recorded commit (if another dev updated it)"}; O --> P["In Parent Repo: <i>git pull</i> (gets new parent commit pointing to new <i>LibX</i> SHA)"]; P --> Q["In Parent Repo: <i>git submodule update --init --recursive</i>"]; Q --> R["<i>LibX</i> directory updated to new SHA."]; N --> S{"Method 2: Update <i>LibX</i> to its latest, then update parent"}; S --> T["In Parent Repo: <i>cd path/to/libx</i>"]; T --> U["In <i>path/to/libx</i>: <i>git checkout main</i> (or desired branch)"]; U --> V["In <i>path/to/libx</i>: <i>git pull origin main</i> (fetches latest <i>LibX</i>)"]; V --> W["In Parent Repo: <i>cd ..</i> (back to parent root)"]; W --> X["In Parent Repo: <i>git add path/to/libx</i> (stages new <i>LibX</i> SHA)"]; X --> Y["In Parent 
Repo: <i>git commit -m \Update LibX submodule to latest\</i>"]; Y --> R; R --> Z["End: Submodule is up-to-date."]; %% Styling classDef startNode fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6; classDef processNode fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF; classDef commandNode fill:#E0E7FF,stroke:#4338CA,stroke-width:1px,color:#3730A3,font-family:monospace; classDef resultNode fill:#F3F4F6,stroke:#9CA3AF,color:#374151,font-size:11px; classDef outcomeNode fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46; classDef decisionNode fill:#FEF9C3,stroke:#F59E0B,stroke-width:1px,color:#78350F; class A startNode; class B,H,N,O,S decisionNode; class C,D,E,I,J,K,M,P,Q,T,U,V,W,X,Y processNode; class F,L,R resultNode; class Z outcomeNode;
Git Subtrees
The git subtree command (which is technically a contrib script but often packaged with Git distributions or available as git-subtree.sh) provides an alternative strategy for managing dependencies. Instead of linking to another repository, git subtree incorporates another repository into a subdirectory of your main repository, including its history.
How Subtrees Work:
When you add a subtree, Git essentially performs a merge of the external repository’s history into your main project, placing its files into a specified prefix (subdirectory). The external project’s history becomes part of your main project’s history. There’s no .gitmodules file or special link; the subdirectory just looks like any other directory in your project.
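As a rough illustration of that difference, the sketch below adds a library as a subtree and then inspects how the parent stores it. It assumes git subtree is installed and uses throwaway repositories; every name and the commit identity are placeholders.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library repository
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

# Hypothetical parent repository
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -q -m "P1: Initial main project commit"

# Bring the library in under vendor/our-library (run from the repository root)
git subtree add --prefix=vendor/our-library --squash "$work/shared-library" main

# The prefix is an ordinary tree entry in the parent, not a commit reference
entry=$(git ls-tree HEAD vendor/our-library)
echo "$entry"

# The library's files are regular tracked files of the parent
ls vendor/our-library

# No submodule metadata file is created
if [ -e .gitmodules ]; then echo ".gitmodules exists"; else echo "no .gitmodules file"; fi
```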
%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%% graph TD ParentRepo["<b>Parent Repository</b><br>(e.g., <i>MyMainProject</i>)"] subgraph ParentRepo direction LR P_GitDir[".git/ (Parent's Git DB)"] P_WorkTree["Working Tree"] end subgraph P_WorkTree direction TB P_File1["main_app.js"] P_SubtreeDir["libs/my-library/ <i>(Subtree Directory)</i>"] end subgraph P_SubtreeDir direction TB ST_File1["library_code.js"] ST_File2["README.md"] ST_More["... (all files from MyLibraryRepo)"] end P_GitDir --> P_CommitHist_Parent["Parent Commit History"] subgraph P_CommitHist_Parent direction LR PC1["..."] --> PC_Merge["Merge Commit<br><i>(Incorporating MyLibraryRepo's history)</i><br>or Squash Commit"] --> PC2["..."] end ExternalRepo["<b>Original External Repository</b><br>(e.g., <i>MyLibraryRepo - conceptually separate</i>)"] subgraph ExternalRepo direction LR S_CommitHist["Original Commit History<br>L1 -> L2 -> L3"] end S_CommitHist -.->|History merged into Parent| PC_Merge Note["<b>Note:</b><br>- No *.gitmodules* file.<br>- *libs/my-library/* is a regular directory in the parent.<br>- *MyLibraryRepo's* history is part of *MyMainProject's* history (potentially squashed).<br>- Users cloning *MyMainProject* get *libs/my-library/* content automatically."] %% Styling classDef repo fill:#DBEAFE,stroke:#2563EB,stroke-width:2px,color:#1E40AF; classDef file fill:#EDE9FE,stroke:#5B21B6,stroke-width:1px,color:#5B21B6; classDef dir fill:#E0E7FF,stroke:#4338CA,stroke-width:1px,color:#3730A3; classDef commitHistNode fill:#D1FAE5,stroke:#059669,stroke-width:1px,color:#065F46; classDef mergeCommitNode fill:#FEF3C7,stroke:#D97706,stroke-width:1.5px,color:#92400E; classDef noteNode fill:#F3F4F6,stroke:#9CA3AF,color:#1F2937,padding:10px,font-size:11px; class ParentRepo,ExternalRepo repo; class P_File1,ST_File1,ST_File2,ST_More file; class P_SubtreeDir,P_GitDir dir; class P_CommitHist_Parent,S_CommitHist,PC1,PC2 commitHistNode; class PC_Merge mergeCommitNode; class 
Note noteNode; ParentRepo --- Note;
Key Subtree Commands and Concepts:
- git subtree add --prefix=<subdirectory> <repository_url> <branch> [--squash]: Adds the content of <repository_url>’s <branch> into <subdirectory> in your main project.
  - --prefix=<subdirectory>: The path within your main project where the subtree will be added (e.g., --prefix=vendor/my-library).
  - <repository_url>: The URL of the external repository.
  - <branch>: The branch from the external repository to add.
  - --squash: Merges the entire history of the subtree into a single commit in your main project. This can keep your main project’s history cleaner but makes it harder to pull granular updates or contribute changes back.
- Cloning a Project with Subtrees: No special commands are needed. When someone clones your main repository, they get all the subtree content automatically because it’s part of the main repository’s files and history.
- git subtree pull --prefix=<subdirectory> <repository_url> <branch> [--squash]: Pulls updates from the external repository’s <branch> into your local <subdirectory>. This is essentially a merge operation.
- git subtree push --prefix=<subdirectory> <repository_url> <branch>: If you’ve made changes within the <subdirectory> in your main project that you want to contribute back to the original external repository, git subtree push can extract the changes relevant to the subtree and push them to the specified branch of the remote repository. This requires careful history management.
- git subtree merge --prefix=<subdirectory> <commit> [--squash]: Similar to pull, but merges a specific commit from the subtree’s history.
- git subtree split --prefix=<subdirectory> -b <new_branch_for_split_history>: This command can be used to extract the history of a subdirectory into a new branch, which can then be pushed to its own repository. This is useful if you later decide to turn a subdirectory that was managed as part of the main repo into a separate project or want to contribute changes back from a squashed subtree.
Command | Description |
---|---|
git subtree add --prefix=<dir> <repo> <branch> [--squash] | Adds the content of <repo>’s <branch> into <dir>. --squash: merges the subtree’s history into a single commit in the parent. |
git subtree pull --prefix=<dir> <repo> <branch> [--squash] | Pulls updates from the external <repo>’s <branch> into the local <dir>. This performs a merge. |
git subtree push --prefix=<dir> <repo> <branch> | Pushes changes made within the local <dir> (subtree) back to the specified <branch> of the external <repo>. Can be complex. |
git subtree merge --prefix=<dir> <commit> [--squash] | Merges a specific <commit> from the subtree’s history into the local <dir>. Similar to pull but for a specific commit. |
git subtree split --prefix=<dir> -b <new_branch> | Extracts the history of the subdirectory <dir> into a new branch named <new_branch>. Useful for preparing to push changes back or for converting a subdirectory to a separate repository. |
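The split command is perhaps the least intuitive of these, so here is a small sketch of it on an ordinary directory. It assumes git subtree is installed; the docs directory, guide.txt file, docs-only branch, and commit identity are all hypothetical names chosen for the illustration.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main

# Four commits: two touch docs/, two do not
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -q -m "P1: Initial main project commit"

mkdir docs
echo "Guide, first draft" > docs/guide.txt
git add docs/guide.txt
git commit -q -m "P2: Add guide"

echo "More core code." >> main_app.txt
git commit -q -am "P3: Change unrelated to docs"

echo "Guide, second draft" >> docs/guide.txt
git commit -q -am "P4: Extend guide"

# Extract the history of docs/ into its own branch (run from the repository root)
git subtree split --prefix=docs -b docs-only

# On the new branch, the contents of docs/ sit at the top level
files=$(git ls-tree --name-only docs-only)
count=$(git rev-list --count docs-only)
echo "files on docs-only: $files"
echo "commits on docs-only: $count"
git log --oneline docs-only
```

The docs-only branch could then be pushed to a separate repository; only the commits that changed docs/ are carried over.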
Advantages of Subtrees:
- Simplicity for End Users: Users of your main repository don’t need to know about subtrees or run special commands to get the dependency code. It’s just there after a git clone.
- Unified History (Optional): The dependency’s history can be fully integrated into the main project’s history, making it easier to browse.
- Offline Access: Once cloned, all code (including subtrees) is available.
- Works with Older Git Versions: The core idea of merging history has always been part of Git, though the git subtree command itself is a helper.
Disadvantages of Subtrees:
- History Complexity: Merging entire histories can make your main project’s history larger and more complex if not squashed.
- Contributing Back: Pushing changes back to the upstream (original dependency) repository is more complex than with submodules and requires careful use of git subtree push or git subtree split.
- Updating: Pulling updates can sometimes lead to more complex merge conflicts compared to submodules, as you’re merging divergent histories directly.
- Repository Size: The main repository will contain all the files and history of the subtree, potentially increasing its size significantly.
%%{ init: { 'theme': 'base', 'themeVariables': { 'fontFamily': 'Open Sans' } } }%% graph TD A["Start: Need to include an external library (<i>LibY</i>)"] --> B{"Choose: Add as Subtree"}; B --> C["<b>1. Add Subtree:</b><br>In Parent Repo: <i>git subtree add --prefix=path/to/liby <liby_url> <branch> [--squash]</i>"]; C --> D["Git actions:<br>- Fetches <i>LibY</i>'s history.<br>- Merges <i>LibY</i>'s files into <i>path/to/liby</i>.<br>- Creates a merge commit (or a single squashed commit) in Parent Repo."]; D --> E["Subtree content and history (potentially squashed) are now part of Parent Repo."]; E --> F["<br><b>Later: Clone Parent Repo with Subtree</b>"]; F --> G["<i>git clone <parent_url></i>"]; G --> H["Result: <i>path/to/liby</i> and its content are present immediately.<br>No extra steps needed for users."]; H --> I["<br><b>Later: Update Subtree with Newer Version of <i>LibY</i></b>"]; I --> J["In Parent Repo: <i>git subtree pull --prefix=path/to/liby <liby_url> <branch> [--squash]</i>"]; J --> K["Git actions:<br>- Fetches new history from <i>LibY</i>.<br>- Merges changes into <i>path/to/liby</i>.<br>- Creates a new merge commit (or squashed commit) in Parent Repo."]; K --> L["<i>path/to/liby</i> is updated with new content from <i>LibY</i>."]; L --> M["<br><b>(Optional) Contributing Changes Back to <i>LibY</i> (Advanced)</b>"]; M --> N["Make changes within <i>path/to/liby</i> in Parent Repo and commit them."]; N --> O["Use <i>git subtree push --prefix=path/to/liby <liby_url> <branch></i><br><i>(Can be complex, may require <i>git subtree split</i> first)</i>"]; O --> P["Changes from Parent Repo's subtree are pushed to <i>LibY</i>."]; L --> Z["End: Subtree is part of the project and can be updated."]; P --> Z; %% Styling classDef startNode fill:#EDE9FE,stroke:#5B21B6,stroke-width:2px,color:#5B21B6; classDef processNode fill:#DBEAFE,stroke:#2563EB,stroke-width:1px,color:#1E40AF; classDef commandNode 
fill:#E0E7FF,stroke:#4338CA,stroke-width:1px,color:#3730A3,font-family:monospace; classDef resultNode fill:#F3F4F6,stroke:#9CA3AF,color:#374151,font-size:11px; classDef outcomeNode fill:#D1FAE5,stroke:#059669,stroke-width:2px,color:#065F46; classDef decisionNode fill:#FEF9C3,stroke:#F59E0B,stroke-width:1px,color:#78350F; classDef advancedNode fill:#FFFBEB,stroke:#FBBF24,stroke-width:1px,color:#78350F; class A startNode; class B decisionNode; class C,D,F,G,I,J,K,N,O processNode; class E,H,L,P resultNode; class M advancedNode; class Z outcomeNode;
Comparing Submodules and Subtrees
Feature / Aspect | Git Submodule | Git Subtree |
---|---|---|
Dependency Link | Parent repository stores a link (SHA-1 reference) to a specific commit of an external repository. | Parent repository merges files and history from an external repository into a subdirectory. |
History Separation | Histories of parent and submodule remain separate and distinct. | Histories are combined (unless --squash is used during add/pull, which creates a single merge commit). |
Metadata File | .gitmodules file in the parent repository tracks submodule paths and URLs. | None. Information is part of merge commit metadata or tracked implicitly. |
Cloning Project | Requires extra steps after cloning parent: git submodule init & git submodule update, or clone with --recurse-submodules. | A simple git clone of the parent repository gets all subtree content automatically. No extra steps for users. |
Updating Dependency | Use git submodule update [--remote] in parent, or git pull within submodule directory then commit parent. | Use git subtree pull --prefix=<path> ... . |
Contributing Back to Dependency | Relatively straightforward: cd into submodule, make changes, commit, push. Then update parent to point to new submodule commit. | More complex: Typically involves git subtree push --prefix=<path> ... or git subtree split to extract relevant history. |
Repository Size (Parent) | Parent repository stays smaller as it only stores a link and path, not the submodule’s full content/history. | Parent repository size increases as it incorporates all files and history (unless squashed) of the subtree. |
Ease of Use (for project users) | More commands to learn (init, update). Can be confusing (detached HEADs). | Transparent. Dependency looks like any other directory. |
Ease of Use (for project maintainers) | Clear separation of concerns. Workflow requires diligence. | Simpler initial setup. Updates and contributions back can be more complex to manage cleanly. |
Best For | Clear separation, frequent updates to dependency, regular contributions back, keeping parent repo lean. | Tightly integrated dependency, infrequent updates or contributions back, simpler user experience for cloners. |
When to Choose Which:
- Choose Git Submodules if:
- You want a very clear separation between your project and its dependencies.
- The dependency is updated frequently, and you want to explicitly control which version your project uses.
- You frequently contribute changes back to the dependency.
- You don’t want the dependency’s entire history bloating your main repository.
- Your collaborators are comfortable with the submodule workflow.
- Choose Git Subtrees if:
- You want the dependency to be an integral part of your repository, and users shouldn’t need special commands.
- You don’t contribute back to the dependency often, or at all.
- You want a simpler initial setup for users cloning the repository.
- You are okay with the dependency’s history (or a squashed version of it) being part of your main project’s history.
- The dependency doesn’t change very often, or you are okay with larger merge commits when updating.
Practical Examples
Setup:
First, let’s create two separate repositories: one for a “main-project” and one for a “shared-library”.
# Create a directory to hold our projects
mkdir git-dependency-management
cd git-dependency-management
# Create the shared-library repository
mkdir shared-library
cd shared-library
git init -b main # Name the initial branch main, which the later examples assume (-b needs Git 2.28+)
echo "Version 1.0 of shared library" > library_code.txt
git add library_code.txt
git commit -m "L1: Initial library version 1.0"
echo "Functionality for feature X" >> library_code.txt
git add library_code.txt
git commit -m "L2: Add feature X to library"
cd .. # Back to git-dependency-management
# Create the main-project repository
mkdir main-project
cd main-project
git init -b main # Matches the branch name shown in the expected output (-b needs Git 2.28+)
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -m "P1: Initial main project commit"
# For these local examples, the relative path ../shared-library stands in for the library's URL.
# In a real scenario, shared-library would be a remote URL (e.g., on GitHub).
# Stay in the main-project directory for the commands that follow.
For the examples below, we’ll use ../shared-library as the URL for the shared-library repository. Replace this with an actual remote URL if you have one set up.
1. Git Submodule Example
a. Adding the library as a submodule:
# Ensure you are in the main-project directory
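git submodule add ../shared-library libs/my-library
# If shared-library was remote: git submodule add https://example.com/user/shared-library.git libs/my-library
# Git 2.38.1 and later refuse local-path submodule clones by default. If you see
# "transport 'file' not allowed", rerun the command for this local demo as:
# git -c protocol.file.allow=always submodule add ../shared-library libs/my-library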
Expected Output:
Cloning into '/path/to/git-dependency-management/main-project/libs/my-library'...
done.
Now check git status:
git status
Expected Output:
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: libs/my-library
Commit these changes:
git commit -m "P2: Add shared-library as a submodule"
b. Cloning a project with submodules:
Let’s simulate cloning main-project into a new directory.
cd .. # Go up from main-project to git-dependency-management
git clone main-project main-project-clone
cd main-project-clone
# Check the libs/my-library directory
ls libs/my-library
The libs/my-library directory will be empty or nearly empty.
Now, initialize and update the submodule:
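git submodule init
git submodule update
# With a local-path submodule on Git 2.38.1 or later, the update may need:
# git -c protocol.file.allow=always submodule update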
Or, if you knew it had submodules, you could have cloned with:
# cd ..
# rm -rf main-project-clone # Clean up previous clone
# git clone --recurse-submodules main-project main-project-clone
# cd main-project-clone
Now ls libs/my-library should show library_code.txt.
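The two-step init and update can also be collapsed into one command with the --init option described earlier. A minimal sketch, assuming throwaway repositories with placeholder names and a placeholder commit identity (protocol.file.allow=always is needed only because the submodule URL here is a local path):

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library and parent repositories
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

git init -q main-project
git -C main-project symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main-project/main_app.txt
git -C main-project add main_app.txt
git -C main-project commit -q -m "P1: Initial main project commit"
(
  cd main-project
  git -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
  git commit -q -m "P2: Add shared-library as a submodule"
)

# A plain clone leaves the submodule directory empty
git clone -q main-project main-project-clone
cd main-project-clone
before=$(ls -A libs/my-library | wc -l | tr -d ' ')
echo "files in libs/my-library before update: $before"

# Initialize and populate all submodules (including nested ones) in one step
git -c protocol.file.allow=always submodule update --init --recursive
ls libs/my-library
```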
c. Updating the submodule to a new version from its remote:
First, let’s add a new commit to shared-library.
cd ../shared-library # Navigate to the original shared-library
echo "Version 1.1 with bug fixes" >> library_code.txt
git add library_code.txt
git commit -m "L3: Release version 1.1 of library"
cd ../main-project # Go back to our original main-project
Now, in main-project, update the submodule:
cd libs/my-library # Go into the submodule directory
git pull origin main # Assuming 'main' is the default branch of shared-library
# Or, if you want to update to the latest commit on the submodule's remote branch
# without explicitly going into the directory:
# git submodule update --remote libs/my-library
cd ../.. # Back to main-project root (two levels up from libs/my-library)
git status
Expected Output (after cd ../..):
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: libs/my-library (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
This shows that libs/my-library (the submodule link) has been modified because it now points to a newer commit (L3).
Stage and commit this change in the parent repository:
git add libs/my-library
git commit -m "P3: Update shared-library submodule to v1.1"
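Besides git status, the git submodule status command from the table above reports whether the parent is in sync with the submodule. The sketch below replays the update on throwaway repositories (all names and the commit identity are placeholders) and captures the status line before and after the parent commit.

```shell
set -eu
# Placeholder identity so the commits succeed without any global Git configuration
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical library and parent repositories
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0 of shared library" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git -C shared-library commit -q -m "L1: Initial library version 1.0"

git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project core file." > main_app.txt
git add main_app.txt
git commit -q -m "P1: Initial main project commit"
# protocol.file.allow=always is needed only because the submodule URL is a local path
git -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
git commit -q -m "P2: Add shared-library as a submodule"

# A new upstream commit appears in the library
echo "Version 1.1 with bug fixes" >> "$work/shared-library/library_code.txt"
git -C "$work/shared-library" commit -q -am "L2: Release version 1.1 of library"

# Move the submodule checkout forward to that commit
git -C libs/my-library pull -q origin main

# A leading '+' means the checked-out commit differs from the one the parent records
status_before=$(git submodule status libs/my-library)
echo "$status_before"

# Record the new commit in the parent
git add libs/my-library
git commit -q -m "P3: Update shared-library submodule to v1.1"

# A leading space means the parent and the submodule agree again
status_after=$(git submodule status libs/my-library)
echo "$status_after"
```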
d. Making changes within the submodule and updating the parent:
cd libs/my-library # Go into the submodule
# IMPORTANT: Check out a branch if you are in a detached HEAD state
git checkout main # Or whatever branch you want to commit to
echo "Hotfix for the library" >> library_code.txt
git add library_code.txt
git commit -m "L4: Hotfix applied in library"
# In a real scenario, you would 'git push' this from the submodule directory
# git push origin main # (If it had a remote configured)
cd ../.. # Back to main-project root (two levels up from libs/my-library)
git status
Expected Output:
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: libs/my-library (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
Again, stage and commit in the parent:
git add libs/my-library
git commit -m "P4: Update submodule with library hotfix"
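The comment in step d warns about committing while the submodule is in a detached HEAD state. As a hedged sketch, the check can be scripted with git symbolic-ref, which fails when HEAD does not point at a branch. The on_branch helper is a hypothetical name, and a plain throwaway repository stands in for the submodule checkout.

```shell
#!/bin/sh
set -eu
# Hypothetical helper: succeeds when the repository in directory $1 has a
# branch checked out, fails when its HEAD is detached.
on_branch() {
    git -C "$1" symbolic-ref -q HEAD >/dev/null
}

# Throwaway repository standing in for a submodule checkout.
repo=$(mktemp -d "${TMPDIR:-/tmp}/detached-demo.XXXXXX")
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/main
echo "code" > "$repo/library_code.txt"
git -C "$repo" add library_code.txt
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "L1"

if on_branch "$repo"; then before=branch; else before=detached; fi

# 'git submodule update' leaves a submodule detached; simulated here.
git -C "$repo" checkout -q --detach
if on_branch "$repo"; then after=branch; else after=detached; fi

echo "before: $before, after: $after"
```

In a real workflow you might call on_branch libs/my-library before committing, and run git checkout main in the submodule if it reports a detached HEAD.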
2. Git Subtree Example
For a clean subtree example, start from the initial state of main-project (commit P1). Rather than resetting the main branch, create a new branch from that commit.
# Ensure you are in the main-project directory
# Replace <hash-of-P1> with the hash of commit P1 (see git log --oneline)
git checkout -b feature/subtree-integration <hash-of-P1> # Create new branch from P1
a. Adding the library as a subtree:
# Add shared-library's 'main' branch into the 'vendor/our-library' directory
# --squash is optional but often recommended for cleaner history in the parent
git subtree add --prefix=vendor/our-library ../shared-library main --squash
Expected Output (will show merge details; exact wording varies by Git version):
git fetch ../shared-library main
From ../shared-library
* branch main -> FETCH_HEAD
Added dir 'vendor/our-library'
Squashed commit '--prefix=vendor/our-library ../shared-library main' (was <hash of L4 or latest in shared-library>)
Now ls vendor/our-library will show library_code.txt.
git log in main-project will show the new commits that added these files: with --squash, a single squashed commit holding the library's content, plus a merge commit that joins it to your history.
No .gitmodules file is created.
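The sketch below illustrates what "appears as a regular subdirectory" means in practice: after git subtree add, the prefix is an ordinary tree entry with tracked files, and no .gitmodules file exists. It assumes the git subtree command is installed (some distributions package it separately) and uses throwaway repositories; git_c and the demo identity are hypothetical helpers for this example only.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d "${TMPDIR:-/tmp}/subtree-demo.XXXXXX")
cd "$work"
# Hypothetical helper: run git with a throwaway identity.
git_c() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# Library with one commit on 'main'.
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git_c -C shared-library commit -q -m "L1: Initial library code"

# Parent project with an initial commit (subtree add needs a clean tree).
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project" > README.md
git add README.md
git_c commit -q -m "P1: Initial commit"

# Add the library under vendor/our-library, squashing its history.
git_c subtree add --squash --prefix=vendor/our-library "$work/shared-library" main >/dev/null 2>&1

# A subtree prefix is a normal 'tree' entry; a submodule would be 'commit'.
entry_type=$(git ls-tree HEAD vendor/our-library | awk '{print $2}')
tracked=$(git ls-files vendor/our-library)
if [ -e .gitmodules ]; then has_gitmodules=yes; else has_gitmodules=no; fi

echo "entry type:  $entry_type"
echo "tracked:     $tracked"
echo ".gitmodules: $has_gitmodules"
```

Because the files are tracked directly by the parent, a plain git clone of main-project brings them along without extra steps.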
b. Pulling updates from the library into the subtree:
First, let’s add another commit to shared-library.
cd ../shared-library # Navigate to the original shared-library
echo "Version 2.0 of library" >> library_code.txt
git add library_code.txt
git commit -m "L5: Release version 2.0 of library"
cd ../main-project # Go back to main-project (on feature/subtree-integration branch)
Now, pull these updates into the subtree:
git subtree pull --prefix=vendor/our-library ../shared-library main --squash
Expected Output (will show merge details; Git 2.34 and later name the 'ort' strategy instead of 'recursive'):
From ../shared-library
* branch main -> FETCH_HEAD
Merge made by the 'recursive' strategy.
vendor/our-library/library_code.txt | 1 +
1 file changed, 1 insertion(+)
Squashed commit '--prefix=vendor/our-library ../shared-library main' (was <hash of L5>)
The vendor/our-library/library_code.txt file will now contain “Version 2.0 of library”.
c. Pushing changes from the subtree back to the library (Advanced):
Suppose you modify vendor/our-library/library_code.txt in main-project:
echo "Critical fix from main project" >> vendor/our-library/library_code.txt
git add vendor/our-library/library_code.txt
git commit -m "P-Subtree: Critical fix for library made in main project"
To push this specific change back to shared-library (this can be tricky: git subtree must rebuild a library-only history from the parent's commits, and that history has to be related to the library's own, which is harder to guarantee when --squash was used heavily):
# This command attempts to push the history of vendor/our-library to the 'main' branch of ../shared-library
# This is more complex if squashing was used heavily.
# git subtree push --prefix=vendor/our-library ../shared-library main
Note: Pushing subtrees can be complex. It often involves git subtree split to create a clean history branch for the subtree first, then pushing that branch. For simple, infrequent contributions, it might be easier to manually apply patches or merge changes in a separate clone of the library.
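The split-then-push approach from the note can be sketched as follows. This is an illustration under stated assumptions, not the only workflow: the subtree is added without --squash to keep the history simple, the branch names library-export and from-main-project are hypothetical, and the push targets a separate branch so the library's checked-out main is left untouched. It also assumes git subtree is installed.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d "${TMPDIR:-/tmp}/split-demo.XXXXXX")
cd "$work"
# Hypothetical helper: run git with a throwaway identity.
git_c() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# Library with one commit on 'main'.
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git_c -C shared-library commit -q -m "L1: Initial library code"

# Parent project with the library added as a (non-squashed) subtree.
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
echo "Main project" > README.md
git add README.md
git_c commit -q -m "P1: Initial commit"
git_c subtree add --prefix=vendor/our-library "$work/shared-library" main >/dev/null 2>&1

# A fix made inside the subtree, committed in the parent.
echo "Critical fix from main project" >> vendor/our-library/library_code.txt
git_c commit -q -am "P-Subtree: Critical fix for library made in main project"

# Extract the history of the prefix onto a new local branch.
git_c subtree split --prefix=vendor/our-library -b library-export >/dev/null 2>&1

# Push that branch to a separate branch in the library repository.
git push -q "$work/shared-library" library-export:refs/heads/from-main-project

exported_files=$(git ls-tree --name-only library-export)
pushed_tail=$(git -C "$work/shared-library" show from-main-project:library_code.txt | tail -n 1)
echo "files on library-export: $exported_files"
echo "last line in library:    $pushed_tail"
```

On the exported branch the library's files sit at the repository root, so a maintainer of shared-library can review and merge from-main-project like any other branch.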
OS-Specific Notes
Git submodules and git subtree commands behave consistently across Windows, macOS, and Linux. The core Git functionality they rely on is platform-independent.
- Paths in .gitmodules and --prefix: Always use forward slashes (/) for paths specified in the .gitmodules file or for the --prefix option in git subtree. Git handles path conversions internally.
- Submodule URLs: URLs for submodule repositories (e.g., https://, git@, local file paths like file:///) are handled the same way across platforms, though local file path syntax might vary slightly if constructed manually outside of Git’s typical relative path handling (e.g., ../shared-library works well).
- Shell for git submodule foreach: The command executed by foreach is run in a shell environment. On Windows, this is typically the Git Bash shell (sh.exe). Ensure commands are compatible with this shell.
No significant OS-specific differences in behavior are commonly encountered with the core functionality of submodules or subtrees.
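As a small illustration of the foreach note, the sketch below runs a POSIX-sh-compatible command in every submodule. Git sets $sm_path and $sha1 for each submodule; the single quotes keep them from being expanded by the calling shell. The repositories are throwaway demos, and git_c, the demo identity, and the protocol.file.allow override for local-path submodules are assumptions for this example.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d "${TMPDIR:-/tmp}/foreach-demo.XXXXXX")
cd "$work"
# Hypothetical helper: run git with a throwaway identity.
git_c() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# Library with one commit on 'main'.
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git_c -C shared-library commit -q -m "L1: Initial library code"

# Parent project with the library as a submodule.
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
git_c -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
git_c commit -q -m "P2: Add shared-library as submodule"

# Print each submodule's path and the commit recorded for it in the parent.
# --quiet suppresses the "Entering ..." lines that foreach normally prints.
report=$(git submodule --quiet foreach 'echo "$sm_path $sha1"')
expected="libs/my-library $(git -C "$work/shared-library" rev-parse HEAD)"
echo "$report"
```

Sticking to plain sh constructs like echo inside the quoted command keeps it portable to the shell that Git for Windows uses.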
Common Mistakes & Troubleshooting Tips
Context | Issue / Symptom | Troubleshooting / Solution |
---|---|---|
Submodules | Forgetting to initialize/update after cloning | Run git submodule init, then git submodule update. Alternatively, clone with git clone --recurse-submodules. For existing clones, git submodule update --init --recursive is a good general command. |
Submodules | Committing in a detached HEAD state in a submodule | Always cd into the submodule, git checkout <branch> (e.g., main), make changes, commit, and push (if shared). Then return to the parent and commit the updated submodule reference. |
Submodules | Pushing the parent repo before submodule changes | Always push changes made in the submodule repository first, before pushing the parent repository commit that references the new submodule state. Consider using git push --recurse-submodules=on-demand. |
Subtrees | Difficulty pushing changes back upstream | Pushing subtrees can be complex, especially after squashing. Consider using git subtree split --prefix=<path> -b <new-branch> to create a clean branch of the subtree’s history for pushing. For frequent contributions, submodules might be a better fit. Alternatively, manage contributions via patches or direct PRs to the upstream repository. |
Subtrees | Merge conflicts during subtree pull | Treat as a regular merge conflict. Resolve conflicts in the files within the subtree prefix. Ensure you are pulling from the correct upstream branch and repository. If squashing, conflicts are against the last squashed commit. |
General | Choosing the wrong tool (submodule vs. subtree) | Carefully evaluate pros/cons based on project needs: dependency update frequency, contribution workflow, history management preference, team familiarity, and desired user experience for cloners. Re-evaluate if the current method feels overly complex. |
General | Submodule URL changes | 1. Update the URL in the .gitmodules file. 2. Run git submodule sync to update local .git/config. 3. Commit the .gitmodules change. Users will need to run git submodule sync after pulling this commit. |
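The three steps in the "Submodule URL changes" row can be sketched as a script. This is a demo under assumptions: the library is "moved" by renaming a local directory, the submodule's name is taken to be its path (the default when added as below), and git_c, the demo identity, and the protocol.file.allow override are hypothetical helpers for local testing.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d "${TMPDIR:-/tmp}/seturl-demo.XXXXXX")
cd "$work"
# Hypothetical helper: run git with a throwaway identity.
git_c() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# Library with one commit on 'main'.
git init -q shared-library
git -C shared-library symbolic-ref HEAD refs/heads/main
echo "Version 1.0" > shared-library/library_code.txt
git -C shared-library add library_code.txt
git_c -C shared-library commit -q -m "L1: Initial library code"

# Parent project with the library as a submodule.
git init -q main-project
cd main-project
git symbolic-ref HEAD refs/heads/main
git_c -c protocol.file.allow=always submodule --quiet add "$work/shared-library" libs/my-library
git_c commit -q -m "P2: Add shared-library as submodule"

# Simulate the library moving to a new location.
mv "$work/shared-library" "$work/shared-library-moved"
new_url="$work/shared-library-moved"

# 1. Update the URL in .gitmodules.
git config -f .gitmodules submodule.libs/my-library.url "$new_url"
# 2. Copy the new URL into .git/config and the submodule's own remote.
git submodule --quiet sync
# 3. Commit the .gitmodules change so other clones receive it.
git add .gitmodules
git_c commit -q -m "Point my-library submodule at its new URL"

local_url=$(git config --get submodule.libs/my-library.url)
remote_url=$(git -C libs/my-library config --get remote.origin.url)
echo "parent config:    $local_url"
echo "submodule remote: $remote_url"
```

Collaborators who pull the last commit only get the updated .gitmodules; they still need to run git submodule sync in their own clones.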
Exercises
- Submodule Practice:
  - Create a new main project repository (my-app) and a new library repository (my-widget-lib).
  - Add a few commits to my-widget-lib.
  - In my-app, add my-widget-lib as a submodule in a libs/ directory. Commit this.
  - Simulate cloning my-app into a new directory (my-app-clone) and ensure you can correctly initialize and get the my-widget-lib code.
  - Make a new commit in the original my-widget-lib.
  - In your original my-app repository, update the submodule to point to this new commit from my-widget-lib and commit the change.
  - In my-app-clone, pull the changes from my-app’s remote (which would be the original my-app in this local simulation) and then update its submodule to reflect the new version.
- Subtree Practice:
  - Create another new main project repository (my-other-app) and a new utility library repository (my-util-lib).
  - Add a few commits to my-util-lib.
  - In my-other-app, add my-util-lib as a subtree into a utils/ directory. Use the --squash option. Commit this.
  - Simulate cloning my-other-app. Verify that the utility library code is present immediately without extra steps.
  - Make a new commit in the original my-util-lib.
  - In my-other-app, pull the updates from my-util-lib into your subtree, again using --squash. Verify the changes.
Summary
- Git Submodules allow you to embed external Git repositories within a parent repository, keeping their histories separate. The parent stores a reference to a specific commit of the submodule.
  - Key commands: git submodule add, init, update [--recursive, --remote], sync, foreach.
  - Requires a .gitmodules file.
  - Cloning requires extra steps to populate submodules (or use git clone --recurse-submodules).
  - Good for clear separation and frequent contributions back to the dependency.
- Git Subtrees merge an external repository’s files and history directly into a subdirectory of the parent repository.
  - Key commands: git subtree add, pull, push, merge, split.
  - No special files like .gitmodules; the dependency becomes part of the parent.
  - Simpler for users cloning the main project, as no extra steps are needed.
  - Good when you want the dependency integrated tightly and don’t contribute back often, or want a simpler user experience for cloners.
- Choosing between them depends on your project’s needs regarding history management, update frequency, contribution workflow, and team familiarity.
Both submodules and subtrees are powerful tools for managing dependencies in Git, each offering a different approach to integrating external code into your projects.
Further Reading
- Pro Git Book:
  - Chapter 7.11 Git Tools – Submodules: https://git-scm.com/book/en/v2/Git-Tools-Submodules
  - Chapter 7.12 Git Tools – Subtree Merging: https://git-scm.com/book/en/v2/Git-Tools-Subtree-Merging (Note: This covers the older “subtree merge strategy” rather than the git subtree command, but the concepts are related).
- Official Git Documentation:
  - git-submodule(1): https://git-scm.com/docs/git-submodule
  - git-subtree(1): https://git-scm.com/docs/git-subtree (May be in contrib/subtree in Git source)
- Atlassian Git Tutorials:
- Git Submodule: https://www.atlassian.com/git/tutorials/git-submodule
- Git Subtree: An Alternative to Submodules: https://www.atlassian.com/git/tutorials/git-subtree
