That is basically it. 99% of your Git work is just creating those objects and manipulating pointers that reference them. If your first commit has the following structure:

Your first commit (how you see it)

Git will store it like this:

Your commit as seen by Git. 3 folders and 3 files create 3 ‘tree’ objects, 3 ‘blob’ objects and a ‘commit’ object.
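To make the idea of an object a bit more concrete, here is a minimal sketch of how a ‘blob’ gets its name. It assumes a repository using Git’s default SHA-1 object format, where the id is the hash of a small header followed by the file’s bytes. The function name git_blob_id and the sample contents are made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the id of a blob with this content (assumes SHA-1 object format)."""
    # The header is the object type, a space, the content length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty_id = git_blob_id(b"")
v1_id = git_blob_id(b"print('v1')\n")  # hypothetical file content
v2_id = git_blob_id(b"print('v2')\n")  # same file after an edit

print(empty_id)
print(v1_id != v2_id)  # different content, different object
```

The id depends only on the content, not on the file name or folder, which is why two identical files end up as a single object.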

What happens if we change libs/base_libs/file.py and commit again? Changing the file means Git has to create a new ‘blob’ object, since its contents have changed. It also means creating new ‘tree’ objects: one for base_libs, because the folder’s content changed, and one for each folder above it (libs and the root folder), because each now points to a new child tree. The new commit will look like this:

This is your second commit. Since we only changed the file in the bottom folder, the other files can be reused.

Notice how the files that weren’t changed are still referenced by Git using the same objects. For those files, the first and second commits point to the exact same objects. This simple concept is the engine that drives Git. What happens if we change ‘settings.py’ and commit again? Since it’s a file at the root level, changing it only requires creating a new ‘blob’ object along with a new root ‘tree’ object. It has no effect on the ‘libs’ folder, so Git can reuse that folder’s objects in the next commit.
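The reuse described above can be imitated with a toy object store. This is only a sketch: it serializes objects as JSON rather than using Git’s real formats, the file contents are invented, and helpers.py is a hypothetical stand-in for the third file in the diagram. What it does share with Git is the key idea that an object is stored under the hash of its content.

```python
import copy
import hashlib
import json

store = {}  # object id -> serialized object; a stand-in for Git's object database

def put(kind, payload):
    """Store an object under the hash of its content and return that id."""
    data = json.dumps([kind, payload], sort_keys=True).encode("utf-8")
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data  # identical content always lands on the same key
    return oid

def write_tree(node):
    """Store a file (str) as a blob, or a folder (dict) as a tree of child ids."""
    if isinstance(node, str):
        return put("blob", node)
    return put("tree", {name: write_tree(child) for name, child in node.items()})

def commit(root, parent=None):
    return put("commit", {"tree": write_tree(root), "parent": parent})

first = {
    "settings.py": "DEBUG = True\n",
    "libs": {
        "helpers.py": "# hypothetical third file\n",
        "base_libs": {"file.py": "print('v1')\n"},
    },
}
c1 = commit(first)
after_first = len(store)  # 3 blobs + 3 trees + 1 commit

second = copy.deepcopy(first)
second["libs"]["base_libs"]["file.py"] = "print('v2')\n"
c2 = commit(second, parent=c1)
after_second = len(store)  # adds 1 blob + 3 trees + 1 commit

third = copy.deepcopy(second)
third["settings.py"] = "DEBUG = False\n"
c3 = commit(third, parent=c2)  # adds 1 blob + 1 root tree + 1 commit

print(after_first, after_second - after_first, len(store) - after_second)
```

Each commit writes the whole project, yet the store only grows by the objects whose content actually changed, because everything else hashes to an id that is already there.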

Using this approach, Git doesn’t need to endlessly apply diffs to files to reach some point in your project’s lifetime. A snapshot of your project can be reconstructed by a simple tree traversal, starting from the commit object. This is why Git is not diff-based, but object-based.
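The traversal itself is short enough to sketch. The object database below is hypothetical: the ids are readable labels instead of real hashes, and the contents are invented. The snapshot function just follows pointers from a commit down to the blobs, with no diffs involved.

```python
# Hypothetical object database; ids are readable labels, not real hashes.
objects = {
    "c2": ("commit", {"tree": "t_root", "parent": "c1"}),
    "t_root": ("tree", {"settings.py": "b_settings", "libs": "t_libs"}),
    "t_libs": ("tree", {"base_libs": "t_base"}),
    "t_base": ("tree", {"file.py": "b_file_v2"}),
    "b_settings": ("blob", "DEBUG = True\n"),
    "b_file_v2": ("blob", "print('v2')\n"),
}

def snapshot(oid, prefix=""):
    """Walk from a commit or tree id and return {path: file content}."""
    kind, payload = objects[oid]
    if kind == "commit":
        # A commit points at its root tree; the parent is never needed here.
        return snapshot(payload["tree"], prefix)
    if kind == "blob":
        return {prefix: payload}
    files = {}
    for name, child in payload.items():
        path = f"{prefix}/{name}" if prefix else name
        files.update(snapshot(child, path))
    return files

files = snapshot("c2")
print(sorted(files))
```

Note that the parent commit "c1" is not even present in this toy database, and the snapshot can still be rebuilt in full.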

So Git doesn’t care about diffs at all?

Not exactly. Git tries to be very efficient in storing its objects on disk, since software projects can get bloated very quickly. Git compresses the content of your files (using zlib), but that’s not all. What if I change one line in a big file and make a new commit? According to what we learned, that will require creating a new ‘blob’ object, since its contents have changed. That will result in two big objects in Git’s object database that are very similar.
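A rough way to see the problem, using Python’s zlib module: compress two versions of a big file separately, the way individual (‘loose’) objects are stored. The file contents and the loose_blob helper are made up for this sketch, and the exact sizes will vary.

```python
import zlib

def loose_blob(content: bytes) -> bytes:
    """Header plus content, zlib-compressed, roughly as a loose blob is stored."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)

# A hypothetical big file, and the same file with a single line changed.
big_v1 = b"".join(b"line %d\n" % i for i in range(10000))
big_v2 = big_v1.replace(b"line 5000\n", b"line five thousand\n")

stored_v1 = loose_blob(big_v1)
stored_v2 = loose_blob(big_v2)

# Raw size, then the size of each separately compressed version.
print(len(big_v1), len(stored_v1), len(stored_v2))
```

zlib shrinks each version on its own, but it knows nothing about the other one, so the two objects together cost about twice as much as one.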

Git will occasionally look for such cases and will try to create ‘packfiles’ that contain several objects in one file. In those ‘packfiles’, Git takes advantage of the similarity between two nearly identical files, storing one version of the file as a whole, and the other as a delta. The version that will be stored intact is the more recent version, because that’s what you’ll most likely be working with. This technique is called ‘delta compression’, and Git tells you about it all the time, especially when you deal with a remote repository. So now when you see this once-cryptic Git message:

Git’s delta compression in action

You know Git is just being as efficient as it can be.
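To get a feel for what a delta buys you, here is a toy version built on Python’s difflib. It is not Git’s packfile format, which is a compact binary encoding; the make_delta and apply_delta helpers and the line-based instructions are assumptions made for this sketch. The idea is the same, though: describe one version as ‘copy this range from the other version’ plus a few literal pieces.

```python
from difflib import SequenceMatcher

def make_delta(base, target):
    """Describe target as copies from base plus literal inserted lines."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # reuse base[i1:i2]
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # lines missing from base
    return ops

def apply_delta(base, ops):
    """Rebuild the target from the base and the delta instructions."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

older = [f"line {i}\n" for i in range(2000)]
newer = list(older)
newer[1000] = "line one thousand\n"

# The newer version is kept whole; the older one is stored as a delta against it.
delta = make_delta(newer, older)
restored = apply_delta(newer, delta)
literal_lines = sum(len(op[1]) for op in delta if op[0] == "insert")

print(len(delta), literal_lines)
```

The older version of a 2000-line file shrinks to a handful of instructions and one literal line, at the price of needing the newer version on hand to rebuild it.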

Sum it up

Git is not diff-based, it is object-based.

Git does not apply diffs to show you a version of your project.

Git does traverse object trees to show you a version of your project.

Git does use diffs to minimize disk space for its objects.

Git diagrams were taken from ‘Git Internals’ by Scott Chacon, under the Creative Commons Attribution-ShareAlike license. It is a great book, and you should read it.