This post is also available in French.

A month ago we were exploring Git submodules; I told you then our next in-depth article would be about subtrees, which are the main alternative.

Update March 25, 2016: I removed all the parts about our now-deprecated git-stree tool. You should look at the awesome git-subrepo project instead if you want that kind of goodness.

As before, we’ll dive deep and perform every common use-case step by step to illustrate best practices.

Subtree fundamentals

A quick reminder of terminology first: with Git, a repo is local. The remote version, which is mostly used for archiving, collaboration, sharing, and CI triggers, is called a remote. In the remainder of this text, whenever you read “repo” or “Git repo”, remember it’s your local, interactive repo (that is, with a working directory alongside its .git root).

With subtrees, there are no nested repos: there’s only one repo, the container, just like a regular codebase. That means just one lifecycle, and no special tricks to keep in mind for commands and workflows: it’s business as usual. Ain’t life sweet?

Three approaches: pick one!

There are three technical ways to handle your subtrees; although it’s sometimes possible to mix these approaches, I recommend you pick one and stick with it, at least on a per-repo basis, to avoid trouble.

The manual way

Git does not provide a native subtree command, unlike what happens for submodules. Subtrees are not so much a feature as they are a concept, an approach to managing embedded code with Git. They mostly rely on the adequate use of classic porcelain commands (mostly merge and cherry-pick), along with a plumbing one (read-tree).

The manual approach works everywhere, and is actually quite simple, but requires a good understanding of the underlying notions so you execute the few procedures properly. We’ll use that as a starting point, because it offers the best degree of control over operations, and leaves us with complete freedom in how we manage history (including its graph) and branches…

The git subtree contrib script

In June 2012, with version 1.7.11, Git started bundling a third-party contrib script named git-subtree.sh in its official distro; it went as far as adding a git-subtree binding to it among its installed binaries, so that you could type git subtree and feel as if it were a “native” command.

Integration stops there, however; the “documentation” is not a man page, and is therefore not installed as such. The usual help calls (man git-subtree, git help subtree or git subtree --help) are not implemented. A git subtree with no arguments dumps a short synopsis, without further info. Only the text file linked at the beginning of this paragraph provides info, and it is buried down in the contrib/ directory of your Git install.

This script, which I will henceforth refer to as git subtree, has a few notable merits: mostly, it is robust and offers familiar syntaxes (add, pull, push…) on top of operations that are sometimes complex.

However, it also comes with a few operations (e.g. split) and notions (e.g. --ignore-joins and --rejoin) that are rather confusing at first, not to mention its very peculiar understanding of --squash…

Most importantly, it maintains a subtree-specific “branch” that gets merged on every git subtree pull and git subtree merge. This means it will clutter your graph forever, and I, for one, have a strong distaste for this.

Another issue is that it won’t let you pick which local subtree commits to backport with git subtree push: it’s an all-or-nothing affair. This contradicts one of the key benefits of subtrees, which is to be able to mix container-specific customizations with general-purpose fixes and enhancements.

Still, it’s been here for a while and has therefore been considerably tested (both in the test suite and battle-testing sense), which is not to be dismissed.

git-subrepo

For a while, we used our own custom solution, named git-stree, that did a reasonable job meeting all our needs, but had a number of dusty corner cases where it would just fall apart. This article used to detail that tool, but starting March 25, 2016 it’s officially deprecated, in favor of a wonderful third-party tool called git-subrepo. If you want to play with subrepo management in a flexible, well-tested, well-documented and rock-solid way, check it out.

This article won’t demonstrate the git-subrepo approaches just now, but rest assured they work. We may find time for that in the future. In the meantime, their docs and guides are great: give it a spin!

Subtrees, step by step

So, let’s start exploring every common use-case for subtrees in a collaborative project; we’ll detail both the manual and the git subtree approaches, every time.

In order to facilitate your following along, I’ve put together a few example repos with their “remotes” (actually just directories). You can uncompress the archive wherever you want, then open a shell (or Git Bash, if you’re on Windows) in the git-subs directory it creates:

Download the example repos

You’ll find three directories in there:

- main acts as the container repo, local to the first collaborator,
- plugin acts as the central maintenance repo for the module, and
- remotes contains the filesystem remotes for the two previous repos.

In the example commands below, the prompt always displays which repo we’re in.

If you’d like to test out multiple approaches in parallel, I suggest you duplicate the unzipped root git-subs directory as many times as you need (once, or twice) so you can compare the procedures as you go.

Our subtree structure

It’s pretty simple:

    .
    ├── README.md
    ├── lib
    │   └── index.js
    └── plugin-config.json

Every time, we’ll want to use that subtree in our container codebase, in the vendor/plugins/demo subfolder.

Adding a subtree

Manually

Let’s start by defining a named remote for our subtree’s central repo, so we don’t clutter our CLIs with its path/URL later:

    manually/main (master u=) $ git remote add plugin ../remotes/plugin
    manually/main (master u=) $ git fetch plugin
    warning: no common commits
    remote: Counting objects: 11, done.
    remote: Compressing objects: 100% (9/9), done.
    remote: Total 11 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (11/11), done.
    From ../remotes/plugin
     * [new branch]      master     -> plugin/master
    manually/main (master u=) $

We now need to update our index with the contents of this plugin’s master branch, and update our working directory with it; and all this needs to happen in the proper subfolder, too. This is what read-tree does. We’ll use the -u option so the working directory is maintained along with the index.

    manually/main (master u=) $ git read-tree \
      --prefix=vendor/plugins/demo -u plugin/master
    manually/main (master + u=) $ git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

        new file:   vendor/plugins/demo/README.md
        new file:   vendor/plugins/demo/lib/index.js
        new file:   vendor/plugins/demo/plugin-config.json

Awesome. Now let’s finalize that with a commit:

    manually/main (master + u=) $ git commit \
      -m "Added demo plugin subtree in vendor/plugins/demo"
    [master 76b347a] Added demo plugin subtree in vendor/plugins/demo
     3 files changed, 19 insertions(+)
     create mode 100644 vendor/plugins/demo/README.md
     create mode 100644 vendor/plugins/demo/lib/index.js
     create mode 100644 vendor/plugins/demo/plugin-config.json
    manually/main (master u+1) $

There we are! Nothing too fancy!

With git subtree

Here too, naming the subtree’s remote will shorten later CLI calls. No need for a manual fetch though: git subtree will do it when necessary.
We’ll use its add subcommand:

    git-subtree/main (master u=) $ git remote add plugin \
      ../remotes/plugin/
    git-subtree/main (master u=) $ git subtree add \
      --prefix=vendor/plugins/demo plugin master
    git fetch plugin master
    warning: no common commits
    remote: Counting objects: 11, done.
    remote: Compressing objects: 100% (9/9), done.
    remote: Total 11 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (11/11), done.
    From ../remotes/plugin
     * branch            master     -> FETCH_HEAD
     * [new branch]      master     -> plugin/master
    Added dir 'vendor/plugins/demo'
    git-subtree/main (master u+4) $

OK, notice the last prompt: the command merged our plugin’s history with our container’s. Let’s verify that with a log:

    git-subtree/main (master u+4) $ git log --oneline --graph \
      --decorate
    *   32e539d (HEAD, master) Add 'vendor/plugins/demo/' from…
    |\
    | * fe64799 (plugin/master) Fix repo name for main project…
    | * 89d24ad Main files (incl. subdir) for plugin, to populate its…
    | * cc88751 Initial commit
    * b90985a (origin/master) Main files for the project, to populate…
    * e052943 Initial import

If you’re like me, you’re not too fond of polluting your container history with the commit details from the subtree… You might think we have a solution in the --squash option git-subtree offers on its add, pull and merge subcommands. After all, git merge --squash produces a squash commit instead of a regular merge, which better matches what we’re after. Think again:

    git-subtree/main (master u+4) $ git reset --hard @{u}
    HEAD is now at b90985a Main files for the project, to populate its…
    git-subtree/main (master u=) $ git subtree add \
      --prefix=vendor/plugins/demo --squash plugin master
    git fetch plugin master
    From ../remotes/plugin
     * branch            master     -> FETCH_HEAD
    Added dir 'vendor/plugins/demo'
    git-subtree/main (master u+2) $

Noticed the u+2 in the prompt, instead of u+1?
Let’s check:

    git-subtree/main (master u+2) $ git log --oneline --graph \
      --decorate
    *   352af7a (HEAD, master) Merge commit '03e04026fdba2ff1200a226c3…
    |\
    | * 03e0402 Squashed 'vendor/plugins/demo/' content from commit…
    * b90985a (origin/master) Main files for the project, to populate…
    * e052943 Initial import

There you have it… Instead of doing a regular squash commit, it squashes the subtree’s history, makes a commit out of it on its dedicated “branch” (not an actual branch, but an unnamed, untagged sequence of commits), and merges that. This behavior makes sense when considering the technical implementation of git subtree and the features it offers. Still, I don’t like it. I just don’t think it’s worth polluting your graph like that (as you’ll see in later updates, it gets ugly pretty fast).
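If you don’t have the example archive at hand, here is a minimal sketch of the manual approach on throwaway repos. The temporary directory, the demo identity and the file contents are assumptions made up for illustration; only the remote add / fetch / read-tree / commit sequence mirrors the manual steps described in this section. The prefix is written with a trailing slash, which older Git versions required.

```shell
#!/bin/sh
# Sketch: add a subtree manually with read-tree, using throwaway repos.
set -e
# Made-up identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical plugin repo, standing in for the plugin's central repo.
git init -q plugin
cd plugin
git symbolic-ref HEAD refs/heads/master
mkdir lib
echo '// plugin' > lib/index.js
echo '# Demo plugin' > README.md
git add .
git commit -q -m "Plugin files"
cd ..

# Hypothetical container repo.
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
echo 'container' > main-file-1
git add .
git commit -q -m "Container files"

# The actual procedure: named remote, fetch, read-tree, commit.
git remote add plugin ../plugin
git fetch -q plugin
git read-tree --prefix=vendor/plugins/demo/ -u plugin/master
git commit -q -m "Added demo plugin subtree in vendor/plugins/demo"

# A single regular commit was added on top of the container history.
git log --oneline
find vendor -type f
```

The container ends up with one extra, ordinary commit: no merge, and none of the plugin’s own commits in the graph.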

Grabbing/updating a repo that uses subtrees

Alright! Now that we saw how to add a subtree, what do our colleagues have to do to get these in their local repos?

After all, if we were to use submodules, they’d need either a git clone --recursive to grab it, or the bulletproof sequence of git fetch + git submodule sync --recursive + git submodule update --init --recursive for an existing repo. Ain’t life fun.

Well, you know what? With subtrees, they don’t need to do anything special. The reason is simple: there’s just one repo: the container.

With subtrees, cloning/pulling just works.

Too good to be true? Let’s check. We’ll start by sharing our commit(s) that added the subtree, so our colleagues can clone or pull their repos from the remote. For every copy of the test folder you made, use a git push.

    git-subtree/main (master u+2) $ git push
    …
    manually/main (master u+1) $ git push
    …

To get an up-to-date repo, you just need a regular clone/pull. This works regardless of your original adding approach, so I’ll just show it once:

    manually/main (master u=) $ cd ..
    manually $ git clone remotes/main colleague
    Cloning into 'colleague'...
    done.
    manually $ cd colleague
    manually/colleague (master u=) $ tree vendor
    vendor
    └── plugins
        └── demo
            ├── README.md
            ├── lib
            │   └── index.js
            └── plugin-config.json

    3 directories, 3 files

(In the Git Bash you get on Windows, you won’t have the tree command; same for OSX or various bare-bones Linux distros: you’ll need to install the command. If you don’t have it, just check the tree using your file explorer or a basic ls -lR command instead.)
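To see the “nothing special” claim without the example archive, here is a small sketch, again on made-up throwaway repos (paths, identity and file contents are assumptions). It clones straight from the container’s working repo rather than from a separate filesystem remote, just to keep the script short.

```shell
#!/bin/sh
# Sketch: a plain clone of the container already contains the subtree files.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Made-up plugin repo.
git init -q plugin
git -C plugin symbolic-ref HEAD refs/heads/master
mkdir plugin/lib
echo '// plugin' > plugin/lib/index.js
git -C plugin add .
git -C plugin commit -q -m "Plugin files"

# Made-up container repo, with the plugin injected as a subtree.
git init -q main
git -C main symbolic-ref HEAD refs/heads/master
echo 'container' > main/main-file-1
git -C main add .
git -C main commit -q -m "Container files"
git -C main remote add plugin ../plugin
git -C main fetch -q plugin
git -C main read-tree --prefix=vendor/plugins/demo/ -u plugin/master
git -C main commit -q -m "Added demo plugin subtree"

# The colleague's side: no subtree-specific command, just a clone.
git clone -q main colleague
find colleague/vendor -type f
```

The clone knows nothing about the plugin remote, and doesn’t need to: the files are regular tracked content of the container.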

Updating a subtree in-place in the container

It can happen that subtree code can only be used or tested inside container code; most themes and plugins have such constraints. In that situation, you’ll be forced to evolve your subtree code straight inside the container codebase, before finally backporting it to its remote.

Another common occurrence, which subtrees are good at but submodules cannot cleanly accommodate, is the need to customize the subtree’s code in a container-specific way, without pushing these changes back upstream.

You should be careful to distinguish between both situations, putting each use-case into its own commits.

On the other hand, when subtree changes require adjustments in the rest of the container code, you don’t have to make two separate commits for it (one for subtree code, one for container code): the commands we’ll use later for backporting can figure the split out, and this will spare you a failing-tests, partly-implemented commit in the container codebase…

Regardless of the selected approach, these updates are freely performed on the container codebase, which is the unique repo we’re dealing with when performing them. Collaborators don’t need any special procedure: the subtree has no special status. This is an enormous advantage over submodules, for which this section would be waaaay longer…

Subtree updates can be freely performed within the container codebase.

Let’s unroll a scenario in which we’ll mix four types of commits:

- Commits touching only the subtree, intended for backport (e.g. fixes);
- Commits touching only container code;
- Commits touching both container and subtree code, the latter part being intended for backport;
- Commits touching only the subtree, in a container-specific way that is not to be backported.

You should copy-paste the following set of commands (intentionally listed without prompts) in the main folder of every copy you made (one per approach). Make sure you read the commands’ output and check nothing seems to break, though! You never know…

    git push

    echo '// Now super fast' >> vendor/plugins/demo/lib/index.js
    git commit -am "[To backport] Faster plugin"

    date >> main-file-1
    git commit -am "Container-only work"

    date >> vendor/plugins/demo/fake-work
    date >> main-file-2
    git commit -am "[To backport] Timestamping (requires container tweaks)"

    echo '// Container-specific' >> vendor/plugins/demo/lib/index.js
    git commit -am "Container-specific plugin update"
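Once such commits pile up, a path-limited log is a handy way to review which ones touched the subtree before deciding what to backport. The sketch below replays the four commits in a throwaway repo; the starting-point commit, the start tag and the temporary directory are assumptions made up so the script is self-contained.

```shell
#!/bin/sh
# Sketch: spot which local commits touched the subtree directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master

# Made-up starting point: container files plus an already-added subtree dir.
mkdir -p vendor/plugins/demo/lib
echo '// plugin' > vendor/plugins/demo/lib/index.js
date > vendor/plugins/demo/fake-work
date > main-file-1
date > main-file-2
git add .
git commit -q -m "Starting point"
git tag start   # hypothetical marker, plays the role of origin/master

# The four kinds of commits from the scenario.
echo '// Now super fast' >> vendor/plugins/demo/lib/index.js
git commit -q -am "[To backport] Faster plugin"
date >> main-file-1
git commit -q -am "Container-only work"
date >> vendor/plugins/demo/fake-work
date >> main-file-2
git commit -q -am "[To backport] Timestamping (requires container tweaks)"
echo '// Container-specific' >> vendor/plugins/demo/lib/index.js
git commit -q -am "Container-specific plugin update"

# Limiting the log to the subtree path leaves out the container-only commit.
git log --oneline --stat start..HEAD -- vendor/plugins/demo
```

Git can tell you which commits touched the directory, but not which of those are meant for upstream: that is what a message convention such as the [To backport] prefix is for.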

Backporting to the subtree’s remote

Now let’s see how to backport the necessary commits, once for each approach. We’ll start by looking at our recent commits to keep our history fresh in mind:

    manually/main (master u+4) $ git log --oneline --decorate --stat -5
    28e310b (master) Container-specific plugin update
     vendor/plugins/demo/lib/index.js | 1 +
     1 file changed, 1 insertion(+)
    71d2d12 [To backport] Timestamping (requires container tweaks)
     main-file-2                   | 1 +
     vendor/plugins/demo/fake-work | 1 +
     2 files changed, 2 insertions(+)
    c693673 Container-only work
     main-file-1 | 1 +
     1 file changed, 1 insertion(+)
    92bc02d [To backport] Faster plugin
     vendor/plugins/demo/lib/index.js | 1 +
     1 file changed, 1 insertion(+)
    4f758af (origin/master) Updated the plugin
     vendor/plugins/demo/fake-work | 2 ++
     1 file changed, 2 insertions(+)

Manually

We could create synthetic commits in the middle of nowhere, but that’s fugly. I favor creating a local branch specifically for backporting, and having it track the proper remote for our plugin:

    manually/main (master u+4) $ git checkout -b backport-plugin \
      plugin/master
    manually/main (backport-plugin u=) $

Now let’s cherry-pick the commits we’re interested in (adding a -x into the mix so the commit message has extra lines detailing the source for each cherry pick).
    manually/main (backport-plugin u=) $ git cherry-pick -x master~3
    [backport-plugin 953ec4d] [To backport] Faster plugin
     Date: Thu Jan 29 21:54:45 2015 +0100
     1 file changed, 1 insertion(+)
    manually/main (backport-plugin u+1) $ git cherry-pick -x \
      --strategy=subtree master^
    [backport-plugin 34f50a4] [To backport] Timestamping (requires con…
     Date: Thu Jan 29 21:55:00 2015 +0100
     1 file changed, 1 insertion(+)
    manually/main (backport-plugin u+1) $ git log --oneline \
      --decorate --stat -2
    34f50a4 (HEAD, backport-plugin) [To backport] Timestamping (requir…
     fake-work | 1 +
     1 file changed, 1 insertion(+)
    953ec4d [To backport] Faster plugin
     lib/index.js | 1 +
     1 file changed, 1 insertion(+)
    manually/main (backport-plugin u+2) $ git push plugin HEAD:master
    Counting objects: 7, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (6/6), done.
    Writing objects: 100% (7/7), 877 bytes | 0 bytes/s, done.
    Total 7 (delta 2), reused 0 (delta 0)
    To ../remotes/plugin
       dc995bf..34f50a4  backport-plugin -> master

Just like with git merge -s subtree plugin/master earlier on, Git’s builtin directory heuristics usually do just fine. Astute readers will probably have noticed that we didn’t even have to specify the subtree strategy whenever the heuristics worked out, thanks to non-ambiguous paths in our working directories (the backport branch has different, unprefixed contents).

However, it is prudent to specify --strategy=subtree (-s means something else in cherry-pick) to make sure files outside of the subtree (elsewhere in container code) will get quietly ignored, as would happen for main-file-2 in master^. If you forget this option, Git will refuse to complete the cherry-pick, as it would believe our side (backport-plugin) just removed that file (you’d see a deleted by us conflict). So you’d better use that specific option all the time, just to be on the safe side.
The log above confirms the backported files are put in the “plugin root,” properly unprefixed. And the final push lets us publish that backport to the central remote for the plugin.

With git subtree

Sure, there’s a pretty git subtree push subcommand, but it has a significant drawback: it backports every single commit that touched the subtree. You can’t pick the relevant commits. So our last commit, which was container-specific, gets cargoed along… Grmbl. This is not what we want here, but I’ll show you the command anyway:

    git-subtree/main (master u+4) $ git subtree push \
      -P vendor/plugins/demo plugin master
    git push using:  plugin master
    -n 1/10 (0)
    -n 2/10 (1)
    -n 3/10 (2)
    -n 4/10 (2)
    -n 5/10 (3)
    -n 6/10 (3)
    -n 7/10 (4)
    -n 8/10 (5)
    -n 9/10 (6)
    -n 10/10 (7)
    Counting objects: 11, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (9/9), done.
    Writing objects: 100% (11/11), 1.11 KiB | 0 bytes/s, done.
    Total 11 (delta 4), reused 0 (delta 0)
    To ../remotes/plugin/
       2872e5d..e857a74  e857a74119c3e1c1b237b367c4a6c8f79deca1a7 -> m…
    git-subtree/main (master u+4) $ git log --oneline --decorate -4 \
      plugin/master
    e857a74 (plugin/master) Container-specific plugin update
    ddabc13 [To backport] Timestamping (requires container tweaks)
    73a22ea [To backport] Faster plugin
    2872e5d Pseudo-commit

Note the latest (top) backport, which we don’t want here…
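For reference, here is the manual backport condensed into one self-contained sketch. The repos, identity and file contents are made up for illustration, and the final push is left out because the stand-in plugin repo here is a regular (non-bare) one. Both cherry-picks pass --strategy=subtree, so the container-side change to main-file-2 should be dropped from the second one.

```shell
#!/bin/sh
# Sketch: backport selected container commits to the plugin with cherry-pick.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Made-up plugin repo.
git init -q plugin
cd plugin
git symbolic-ref HEAD refs/heads/master
mkdir lib
echo '// plugin' > lib/index.js
echo '# Demo plugin' > README.md
echo 'start' > fake-work
git add .
git commit -q -m "Plugin files"
cd ..

# Made-up container repo with the plugin added as a subtree.
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
echo 'one' > main-file-1
echo 'two' > main-file-2
git add .
git commit -q -m "Container files"
git remote add plugin ../plugin
git fetch -q plugin
git read-tree --prefix=vendor/plugins/demo/ -u plugin/master
git commit -q -m "Added demo plugin subtree"

# One subtree-only commit, one commit mixing subtree and container changes.
echo '// Now super fast' >> vendor/plugins/demo/lib/index.js
git commit -q -am "[To backport] Faster plugin"
echo 'more' >> vendor/plugins/demo/fake-work
echo 'more' >> main-file-2
git commit -q -am "[To backport] Timestamping (requires container tweaks)"

# Backport both onto a branch rooted at the plugin's own history.
git checkout -q -b backport-plugin plugin/master
git cherry-pick -x --strategy=subtree master~1
git cherry-pick -x --strategy=subtree master

# Paths show up unprefixed, and main-file-2 is absent from the backport.
git log --oneline --stat plugin/master..backport-plugin
```

With a bare plugin remote, a git push plugin HEAD:master from the backport branch would then publish the two commits, as in the session above.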

Removing a subtree

It’s just a directory in your repo. A good ol’ git rm will do, regardless of the approach you used.

    main (master u=) $ git rm -r vendor/plugins/demo
    rm 'vendor/plugins/demo/README.md'
    rm 'vendor/plugins/demo/fake-work'
    rm 'vendor/plugins/demo/lib/index.js'
    rm 'vendor/plugins/demo/plugin-config.json'
    main (master + u=) $ git commit -m "Removing demo subtree"
    [master 3893865] Removing demo subtree
     4 files changed, 24 deletions(-)
     delete mode 100644 vendor/plugins/demo/README.md
     delete mode 100644 vendor/plugins/demo/fake-work
     delete mode 100644 vendor/plugins/demo/lib/index.js
     delete mode 100644 vendor/plugins/demo/plugin-config.json
    main (master u+1) $

Turning a directory into a subtree

This is the last “fun” use-case: you want to take code that always was an integral part of your container codebase, and extract it for sharing between multiple codebases.

Let’s start by creating a “local remote” folder. You can copy-paste these:

    cd ..
    mkdir remotes/myown
    cd remotes/myown
    git init --bare
    cd ../../main

We’ll then perform a series of mixed commits touching (or not) a subdirectory in our codebase. I’ll re-use the earlier commands, but change the directory name. Just copy-paste the commands below in your “manually” copy and, if you played with git subtree, in the matching copy as well:

    mkdir -p lib/plugins/myown/lib
    echo '// Yo!' > lib/plugins/myown/lib/index.js
    git add lib/plugins/myown
    git commit -m "Plugin sez: Yo, dawg."

    date >> main-file-1
    git commit -am "Container-only work"

    echo '// Now super fast' > lib/plugins/myown/lib/index.js
    date >> main-file-2
    git commit -am "Faster plugin (requires container tweaks)"

    git push

This should create three commits, two of which touch the lib/plugins/myown subdirectory. Then it publishes to the remote (just to avoid +n in our prompts in the following examples).

Manually

The idea is to create a special branch for the future subtree, and filter down its history so it only keeps commits that touched the subdirectory, rewriting the tree root as it goes. This sounds like heavy lifting, but it precisely matches what a mode of the “bulldozer” git filter-branch command does: the --subdirectory-filter option. See for yourself:

    manually/main (master u=) $ git checkout -b split-plugin
    manually/main (split-plugin) $ git filter-branch \
      --subdirectory-filter lib/plugins/myown
    Rewrite 973cfacecb645f66b89accedac8780c19140401b (2/2)
    Ref 'refs/heads/split-plugin' was rewritten
    manually/main (split-plugin) $ git log --oneline --decorate
    5af0de1 (HEAD, split-plugin) Faster plugin (requires container twe…
    4fc711a Plugin sez: Yo, dawg.
    manually/main (split-plugin) $ tree
    .
    └── lib
        └── index.js

    1 directory, 1 file

(Again, tree is not necessarily available on your setup; if it’s missing, go with a basic ls -lR instead.)

Now we just need to push that to the proper remote:

    manually/main (split-plugin) $ git remote add myown \
      ../remotes/myown
    manually/main (split-plugin) $ git push -u myown \
      split-plugin:master
    Counting objects: 8, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (2/2), done.
    Writing objects: 100% (8/8), 617 bytes | 0 bytes/s, done.
    Total 8 (delta 0), reused 0 (delta 0)
    To ../remotes/myown
     * [new branch]      split-plugin -> master
    Branch split-plugin set up to track remote branch master from myow…
    manually/main (split-plugin u=) $

At this stage, you can delete the split-plugin branch if you think you won’t need it for later backports. Otherwise just let it be…

There’s no need to replace the lib/plugins/myown subdirectory in master with the result of a read-tree, either: future merge -s subtree --squash calls will work just fine, as if you had injected the contents as a subtree in the first place. Isn’t it handy?

With git subtree

There is a split subcommand intended for about the same thing. Assuming you copy-pasted the series of commit commands from earlier, it would look like this:

    git-subtree/main (master u=) $ git subtree split \
      -P lib/plugins/myown -b split-plugin
    -n 1/14 (0)
    -n 2/14 (1)
    -n 3/14 (2)
    -n 4/14 (3)
    -n 5/14 (4)
    -n 6/14 (5)
    -n 7/14 (6)
    -n 8/14 (7)
    -n 9/14 (8)
    -n 10/14 (9)
    -n 11/14 (10)
    -n 12/14 (11)
    -n 13/14 (12)
    -n 14/14 (13)
    Created branch 'split-plugin'
    a54c695c65db858a68720dd9b93061ea28d13243
    git-subtree/main (master u=) $

You can then repeat the remote-updating commands we had, to publish the final result:

    git-subtree/main (split-plugin) $ git remote add myown \
      ../remotes/myown
    git-subtree/main (split-plugin) $ git push -u myown \
      split-plugin:master
    Counting objects: 8, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (2/2), done.
    Writing objects: 100% (8/8), 556 bytes | 0 bytes/s, done.
    Total 8 (delta 1), reused 0 (delta 0)
    To ../remotes/myown
     * [new branch]      split-plugin -> master
    Branch split-plugin set up to track remote branch master from myow…
    git-subtree/main (split-plugin u=) $

However, git subtree will refuse later squash pulls, as it doesn’t find any trace of earlier adds, and it doesn’t rely on Git’s builtin heuristics to figure it out, using its own technical implementation instead:

    git-subtree/main (split-plugin u=) $ git checkout master
    git-subtree/main (master u=) $ git subtree pull --squash \
      -P lib/plugins/myown myown master
    From ../remotes/myown
     * branch            master     -> FETCH_HEAD
    Can't squash-merge: 'lib/plugins/myown' was never added.

Long story short, you either forget about squashes and merge the subtree’s history from now on (ugh!), or replace the legacy subdirectory with a formal subtree addition:

    git-subtree/main (master u=) $ git rm -r lib/plugins/myown
    rm 'lib/plugins/myown/lib/index.js'
    git-subtree/main (master + u=) $ git commit \
      -m "Removing lib/plugins/myown for subtree replacement"
    [master bf59e62] Removing lib/plugins/myown for subtree replacement
     1 file changed, 1 deletion(-)
     delete mode 100644 lib/plugins/myown/lib/index.js
    git-subtree/main (master u+1) $ git subtree add \
      -P lib/plugins/myown --squash myown master
    git fetch myown master
    From ../remotes/myown
     * branch            master     -> FETCH_HEAD
    Added dir 'lib/plugins/myown'
    git-subtree/main (master u+3) $

After that, git subtree pull -P lib/plugins/myown --squash myown master will work… Now that’s a nice set of flaming hoops to jump through…
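Here is the manual extraction again as a self-contained sketch on a throwaway repo (the temporary directory, identity and the initial "Container files" commit are assumptions added so the script runs anywhere). Recent Git versions print a warning and pause before running filter-branch; the FILTER_BRANCH_SQUELCH_WARNING variable used below skips that, and older versions simply ignore it.

```shell
#!/bin/sh
# Sketch: extract a directory's history onto its own branch.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master

# Made-up starting point for the container.
date > main-file-1
date > main-file-2
git add .
git commit -q -m "Container files"

# Three commits, two of which touch lib/plugins/myown.
mkdir -p lib/plugins/myown/lib
echo '// Yo!' > lib/plugins/myown/lib/index.js
git add lib/plugins/myown
git commit -q -m "Plugin sez: Yo, dawg."
date >> main-file-1
git commit -q -am "Container-only work"
echo '// Now super fast' > lib/plugins/myown/lib/index.js
date >> main-file-2
git commit -q -am "Faster plugin (requires container tweaks)"

# Rewrite a dedicated branch so that lib/plugins/myown becomes its root.
git checkout -q -b split-plugin
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --subdirectory-filter lib/plugins/myown

# Only the two plugin commits remain, with unprefixed paths.
git log --oneline split-plugin
git ls-tree -r --name-only split-plugin
```

Only split-plugin gets rewritten; master keeps its full history, so the container itself is left untouched.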

So, which approach should I use?

The important thing is to grok the manual approach: it lets you do what you want, however you want to do it, and therefore devise a series of commands that best fits your strategic choices about branches, commits, backports, etc.