When you want to include external code in your git repository, for example a third-party library or code shared with another team, there are two popular options: git-submodule and git-subtree. Unfortunately, both suffer from some problems, and this led me to start thinking about an alternative way to handle external repositories.

I will first summarize how git-submodule and git-subtree work and discuss their drawbacks. I will then introduce an alternative to these, “git-subrepo”, and present a (partial) proof-of-concept implementation. In the following, I assume a single submodule/subtree/subrepo.

git-submodule

Submodules are the officially supported way of including external repositories; git-submodule is included with every git installation. A submodule is basically a pointer to a commit of an external git repository. This model works well when you want to include a third-party library in your project that only occasionally needs to be updated.

On the other hand, if you use a submodule for a tightly coupled library that you change often, you will find that submodules are not the answer. In this use case, you will be making changes to your top-level project and the library at the same time. When the time comes to commit these changes, you first need to commit your changes in the submodule, then do the same in your top-level project. Oh, and make sure you push the submodule changes before committing and pushing the top-level changes, or other people will run into trouble when pulling your top-level commit: it will point to a submodule commit they cannot fetch. Things get even more exciting when you try to push that submodule commit afterwards to fix things, only to find out that someone sneaked in a commit in the meantime! Now you’ll have to create a new commit on the top-level project to point it to your rebased commit, hah! In short: too much of a hassle and far too easy to make a mess.

If that’s not enough to convince you, try branching with this setup. You’ll have to manually branch (and later merge) each of your submodules when you create a new branch. In the past, I have worked with a git repository containing a dozen tightly coupled submodules for code-sharing with other teams. Needless to say, it didn’t take very long to realize that submodules are not suited for that use case.
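To make the commit ordering described above concrete, here is a minimal sketch in Python that builds the git commands in the order they need to run. The helper names (submodule_commit_commands, run_all) and the example path "lib" are made up for illustration. The sketch assumes the submodule has a branch checked out with an upstream configured, and it reuses one commit message for both repositories. The --recurse-submodules=check flag on the final push is an extra safety net that is not part of the workflow described above: it makes git refuse the push if a referenced submodule commit has not been pushed yet.

```python
import subprocess
from typing import List


def submodule_commit_commands(submodule_path: str, message: str) -> List[List[str]]:
    """Return the git commands for committing a change that spans a
    submodule and the top-level project, in the order they must run.

    Hypothetical helper for illustration; it assumes the submodule is on a
    branch with an upstream, so a plain `git push` works inside it.
    """
    return [
        # 1. Commit the modified tracked files inside the submodule.
        ["git", "-C", submodule_path, "commit", "-a", "-m", message],
        # 2. Push the submodule first, so the commit exists on its remote.
        ["git", "-C", submodule_path, "push"],
        # 3. Stage the updated submodule pointer in the top-level project.
        ["git", "add", "--", submodule_path],
        # 4. Commit the pointer together with the top-level changes.
        ["git", "commit", "-a", "-m", message],
        # 5. Push the top-level project; git aborts if a referenced
        #    submodule commit is missing from the submodule's remote.
        ["git", "push", "--recurse-submodules=check"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run: only prints the five commands, nothing is executed.
run_all(submodule_commit_commands("lib", "Fix parser bug"))
```

Even with a helper like this, the race described above remains: if step 2 is rejected because someone else pushed to the library first, you still have to rebase inside the submodule and start over.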

git-subtree

git-subtree avoids most of the problems submodules suffer from. During everyday work, you won’t have to worry about a thing, as git-subtree is only needed when interfacing with external repositories. Most of the time you are working with a single repository; the library is simply included in your repository as a subdirectory. That means that branching is transparent as well. This is far more comfortable than mucking about with submodules.

At some point you can have git-subtree extract the changes to your library from the commit history. This creates a new branch with one commit for each commit of the top-level project that includes changes to the library. The root directory of this new branch corresponds to the subdirectory where the library resides. The new branch can then be pushed to the library’s remote, where it can be merged with other branches. After all, sharing those changes is the point of having a submodule or a subtree.

Unfortunately, splitting out these subtree changes can take a long time. And unless you use the --rejoin option, git-subtree has to redo the split for all commits in your history every time. The --rejoin option merges the new subtree branch back into the main project’s branch, duplicating the existing commits. While this causes no technical problems, it complicates your history unnecessarily.
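The split described above can be illustrated with a toy model in Python. This is not how git-subtree is implemented: real git works on tree objects and also handles parents, merges and authorship. The Commit class, the split_subtree function and the example history are all invented for this sketch. It only shows the idea: walk the whole history, keep the commits that change the library's subdirectory, and re-root their contents.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Commit:
    message: str
    # Full snapshot of the project at this commit: path -> file content.
    files: Dict[str, str]


def subtree_snapshot(files: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Keep only the files below `prefix` and strip the prefix from their paths."""
    root = prefix.rstrip("/") + "/"
    return {
        path[len(root):]: content
        for path, content in files.items()
        if path.startswith(root)
    }


def split_subtree(history: List[Commit], prefix: str) -> List[Commit]:
    """Toy model of a subtree split over a linear history.

    Emits one commit per top-level commit that changes anything below
    `prefix`. Every commit in the history is visited, which hints at why
    the real operation gets slow on long histories.
    """
    result: List[Commit] = []
    previous: Dict[str, str] = {}
    for commit in history:
        snapshot = subtree_snapshot(commit.files, prefix)
        if snapshot != previous:  # the library changed in this commit
            result.append(Commit(commit.message, snapshot))
            previous = snapshot
    return result


history = [
    Commit("Add app", {"main.py": "v1"}),
    Commit("Add lib", {"main.py": "v1", "lib/util.py": "a"}),
    Commit("Change app only", {"main.py": "v2", "lib/util.py": "a"}),
    Commit("Fix lib and app", {"main.py": "v3", "lib/util.py": "b"}),
]

for commit in split_subtree(history, "lib"):
    print(commit.message, commit.files)
# prints:
# Add lib {'util.py': 'a'}
# Fix lib and app {'util.py': 'b'}
```

In this model, the commit that only touches the application is dropped, and the last commit keeps only its library part, with util.py now sitting at the root.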