Git == DB

Git is essentially a data structure that stores key-value data. For every value we add to the repository, we get a unique key that we can then use to retrieve the value. Git uses two concepts to store the data: blobs and trees.
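The following is a toy content-addressable store in Python that makes the key-value idea concrete. It is a sketch, not Git's code. The `ContentStore` class and its `put`/`get` methods are our own names, and the key is a plain SHA-1 of the value's bytes. Git's real key also covers a small header with the object type and size, so these keys will not match Git's.

```python
import hashlib


class ContentStore:
    """Toy content-addressable key-value store (an illustration, not Git's code)."""

    def __init__(self) -> None:
        self._objects = {}  # key (hex SHA-1) -> stored bytes

    def put(self, value: bytes) -> str:
        # The key is derived from the value itself: the SHA-1 of its bytes.
        key = hashlib.sha1(value).hexdigest()
        self._objects[key] = value  # storing the same value again changes nothing
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"hello world\n")
print(key)
print(store.get(key))  # b'hello world\n'
print(store.put(b"hello world\n") == key)  # True: same content, same key
```

Because the key is derived from the content, storing the same value twice yields the same key and no duplicate copy.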

Blobs

We can use the `git hash-object` command to compute the key for a given file and, with the `-w` option, also to write it as a new object in the repository:

We got back the SHA-1 of the object we just created, which is called a “blob”. Now that we know Git stores all of its data in a key-value data structure, let’s see where it stores it:

Interesting… a folder named 84. Could it be connected to our SHA-1 hash, which starts with the same characters? Let’s look inside this folder:

Git uses the first 2 characters of the hash as the name of a subdirectory to organize the objects in the repository, and the remaining 38 characters as the file name. Each object is saved as a zlib-compressed file.
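The layout described above can be sketched in Python. It assumes the standard loose-object format. The object type, a space, the content size in bytes, a NUL byte and the raw content are concatenated, hashed with SHA-1 and zlib-compressed.

The helpers `write_loose_object` and `read_loose_object` are hypothetical names. Real Git also writes through a temporary file and marks objects read-only, which is omitted here. For a blob, the returned hash should match what `git hash-object` reports for the same contents.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def write_loose_object(git_dir: str, obj_type: str, content: bytes) -> str:
    """Store an object using Git's loose-object layout (simplified sketch).

    The full object (header + content) is zlib-compressed and written to
    <git_dir>/objects/<first 2 hex chars>/<remaining 38 hex chars>.
    """
    data = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    directory = os.path.join(git_dir, "objects", sha[:2])
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, sha[2:])
    if not os.path.exists(path):  # identical content is only stored once
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return sha


def read_loose_object(git_dir: str, sha: str) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, content


with tempfile.TemporaryDirectory() as git_dir:
    # Same content as: echo 'test content' | git hash-object -w --stdin
    sha = write_loose_object(git_dir, "blob", b"test content\n")
    print(sha)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
    print(os.listdir(os.path.join(git_dir, "objects")))  # ['d6']
    print(read_loose_object(git_dir, sha))  # ('blob', b'test content\n')
```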

Trees

As we said earlier, Git uses blobs to save the contents of individual files. A blob stores only the contents, not the file name. Things become more complex when we need to record the relationships between those files and connect each blob to its path. To solve this, Git introduces another object type: the tree. A tree lists entries, each linking a name and a file mode to a blob (or to another tree). Let’s examine the format of a tree entry:

`{file-mode} {object-type} {object-hash}\t{file-name}`



The file-mode field records the type and permissions of each entry in the tree. When Git later writes the files back to the working directory, it needs to restore them with the right mode, so this information is saved when the tree itself is created.

The following are the possible values for the file-mode field:

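040000: a folder (another tree)

100644: a regular (non-executable) file

100664: a legacy mode for group-writable files, written by early Git versions; current Git records regular files only as 100644 or 100755

100755: an executable file

120000: a symbolic link

160000: a gitlink (a submodule, which points to a commit in another repository)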
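A small parser makes the entry format concrete. It assumes the text output of `git ls-tree` (or `git cat-file -p` on a tree), where a tab, not a space, separates the hash from the file name. `FILE_MODES` and `parse_tree_entry` are illustrative names. The sample line reuses a tree hash from this article's example.

```python
# Human-readable meaning of each file mode that can appear in a tree entry.
FILE_MODES = {
    "040000": "directory (tree)",
    "100644": "regular file",
    "100664": "regular file (legacy group-writable mode)",
    "100755": "executable file",
    "120000": "symbolic link",
    "160000": "gitlink (submodule commit)",
}


def parse_tree_entry(line: str) -> dict:
    """Split one line of `git ls-tree` output into its fields.

    Format: "<mode> <type> <hash>\t<name>" (a tab before the name, so
    names containing spaces are still parsed correctly).
    """
    meta, name = line.split("\t", 1)
    mode, obj_type, obj_hash = meta.split(" ")
    return {
        "mode": mode,
        "kind": FILE_MODES.get(mode, "unknown mode"),
        "type": obj_type,
        "hash": obj_hash,
        "name": name,
    }


entry = parse_tree_entry(
    "040000 tree 6d2b647a0bb32c9e648ed130afbbfd2608427a23\tinternal-dir"
)
print(entry["kind"], entry["name"])  # directory (tree) internal-dir
```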

A tree can point to other trees; in other words, Git uses a tree to represent a folder. By nesting trees inside trees, Git represents the whole structure of files and folders with this one simple format.
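The nesting can be sketched by computing a tree's ID from Git's binary tree format. Each entry is the mode, a space, the name, a NUL byte and the 20-byte binary hash of the blob or subtree. The concatenated entries are then hashed with a `tree <size>` header.

This is a simplified model with the following assumptions:
- A folder is a Python dict.
- Every file is a regular `100644` file.
- `hash_object` and `tree_sha1` are our own helper names.

Only the empty-tree hash in the last line is a well-known Git constant.

```python
import hashlib


def hash_object(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" followed by the content, as Git does."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_sha1(directory: dict) -> str:
    """Hash a folder modeled as {name: bytes (file) or dict (subfolder)}.

    Simplification: every file is a regular 100644 file (no executables,
    symlinks or submodules).
    """
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):
            # Subfolder: hash its tree first, then point at it.
            # Inside the raw object the mode is stored as "40000" (no leading zero).
            entries.append((name + "/", b"40000", name, tree_sha1(value)))
        else:
            entries.append((name, b"100644", name, hash_object("blob", value)))
    # Git sorts entries by name, comparing folders as if their name ended in "/".
    entries.sort(key=lambda e: e[0].encode("utf-8"))
    body = b"".join(
        mode + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(sha)
        for _, mode, name, sha in entries
    )
    return hash_object("tree", body)


print(tree_sha1({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (Git's empty tree)
```

Because a folder's entry embeds its subtree's hash, changing a file deep in the hierarchy changes the hash of every tree above it. An unchanged subfolder keeps its hash.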

Let’s verify our understanding of this concept: we’ll create a new folder (in our current directory) and copy the `1.txt` file into it:

Because we didn’t change the contents of the file, its SHA-1 has not changed, even though the file now lives under a different path.

100644 blob 352806164c31ac7f77ca29ce0f78c13d357f30b0 1.txt

Our previous tree can also point to the newly created tree, which appears here as the `internal-dir` entry:

100644 blob 352806164c31ac7f77ca29ce0f78c13d357f30b0 1.txt

100644 blob 468a779ba3d7d115b10d71ecd6fb9ac4e50b8e59 2.txt

100644 blob 28ab96c2d9620d1d4f2cc9fd74afacb1cd16621d newfile.txt

040000 tree 6d2b647a0bb32c9e648ed130afbbfd2608427a23 internal-dir

Basics, basics are important

I don’t think that the basics that kids need have changed in 10,000 years.

Git add

When we run the `git add` command, two things happen behind the scenes:

New blob objects, representing the contents of the files we just added, are created in the objects directory. Tree objects are not written at this stage; Git builds them from the index when we commit.

The index file is updated with entries (path, file mode and object hash) that point to those new objects.

When we don’t add new files but only change the contents of existing ones, Git still goes through those two phases. As we said earlier, Git doesn’t care what change was made to the file. It simply calculates a new SHA-1 from the file’s full contents and stores a new blob.
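The two phases can be modeled with two dictionaries. This is a toy sketch, not Git's on-disk format. The real index (`.git/index`) is a binary file that also stores timestamps and other stat data, and `git_add`, `objects` and `index` are hypothetical names. The sketch also shows that adding an edited file produces a brand-new blob, while the old blob stays in the object store.

```python
import hashlib


def hash_blob(content: bytes) -> str:
    # Git's blob ID: SHA-1 of "blob <size>\0" followed by the contents.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def git_add(objects: dict, index: dict, path: str, content: bytes,
            mode: str = "100644") -> None:
    """Toy model of the two steps of `git add` (not Git's real index format)."""
    sha = hash_blob(content)
    objects[sha] = content      # step 1: write a blob object for the contents
    index[path] = (mode, sha)   # step 2: point the index entry for the path at it


objects: dict = {}
index: dict = {}
git_add(objects, index, "1.txt", b"first version\n")
old = index["1.txt"][1]
git_add(objects, index, "1.txt", b"second version\n")  # edit the file, add again
print(old != index["1.txt"][1], len(objects))  # True 2
```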

Git commit

A commit is basically a wrapper on top of the trees concept. When we run `git commit`, Git builds a new tree object from the information stored in the index file, saves it to the objects directory, and then creates a commit object that points to that tree.

Let’s see this in action:

We can see that a new object of type commit has been created. Every commit object contains the following data:

The ID of the tree object

Information about the commit author (and the committer)

GPG commit signature (only if the commit is signed)

Commit message
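These fields are stored as plain text lines. The sketch below assembles an unsigned commit body and hashes it like any other object. The author, the fixed timestamp and the helper names `build_commit` and `commit_id` are made up for illustration. A signed commit would carry an extra `gpgsig` header after the `committer` line. Git's empty-tree ID is used so the example stands alone.

```python
import hashlib
from typing import Optional


def build_commit(tree: str, author: str, message: str,
                 parent: Optional[str] = None,
                 timestamp: str = "1700000000 +0000") -> bytes:
    """Assemble an unsigned commit body (author and committer are the same here)."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the very first commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


def commit_id(body: bytes) -> str:
    # The commit ID is the SHA-1 of "commit <size>\0" + body, like any object.
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
body = build_commit(EMPTY_TREE, "Jane Doe <jane@example.com>", "Initial commit")
print(body.decode())
print(commit_id(body))
```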

The SHA-1 calculated for the commit object is the commit ID we’re already familiar with:

Git log

`git log` lets us view the whole repository history, in other words, the relations between commit objects.

Let’s create a new commit:

After we create a new commit, a new object with that commit’s ID appears in the objects directory. The commit’s tree points to the same blobs as before, except for a new blob for the file we just changed. Another difference from the old commit object is the parent field, which points to the previous commit. By following these parent links from commit to commit, Git can build the history log:
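Following parent links is all a basic `git log` needs. The sketch below uses a hypothetical in-memory commit graph with short fake IDs instead of real 40-character hashes. It follows one parent per commit, whereas real `git log` also handles merge commits with several parents.

```python
from typing import Dict, List, Optional

# Hypothetical commit graph: commit ID -> (parent ID or None, message).
COMMITS: Dict[str, tuple] = {
    "c3": ("c2", "Change 1.txt again"),
    "c2": ("c1", "Change 1.txt"),
    "c1": (None, "Initial commit"),
}


def log(head: str) -> List[str]:
    """Walk parent links from `head`, newest commit first."""
    history = []
    current: Optional[str] = head
    while current is not None:
        parent, message = COMMITS[current]
        history.append(f"{current} {message}")
        current = parent  # the root commit has no parent, which ends the walk
    return history


for line in log("c3"):
    print(line)
```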

`glog` command output

How does Git know what the working directory is?

In Git we can go back in time; to do so, we use the `git checkout` command. When we check out a commit, Git updates the index file and the files in the working directory to match that commit’s tree. So how does Git know where we are now? It uses the .git/HEAD file.

In this example we saw how `git checkout` replaces the contents of the files and updates the .git/HEAD file, which points to the current commit.

Now let’s try using `git checkout` with branches:

We’ve switched to the master branch, which points to exactly the same commit. Behind the scenes, the value written in the HEAD file changed from the commit’s SHA-1 to a reference to the file that matches the master branch (`ref: refs/heads/master`). That file contains exactly the same commit ID we saw in the previous step.
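Resolving HEAD by hand looks roughly like the sketch below. It assumes loose refs only, meaning branches stored as files under `.git/refs/heads`, and it ignores `.git/packed-refs`. `resolve_head`, `write_file` and the commit ID are hypothetical.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Return the commit ID that HEAD currently points to.

    HEAD holds either a commit ID (detached, e.g. after checking out a commit)
    or a symbolic reference such as "ref: refs/heads/master".
    """
    with open(os.path.join(git_dir, "HEAD")) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        ref_path = os.path.join(git_dir, *value[len("ref: "):].split("/"))
        with open(ref_path) as f:
            return f.read().strip()
    return value  # detached HEAD: the file contains the commit ID itself


def write_file(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


commit = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit ID
with tempfile.TemporaryDirectory() as git_dir:
    write_file(os.path.join(git_dir, "HEAD"), commit + "\n")  # detached HEAD
    detached = resolve_head(git_dir)
    write_file(os.path.join(git_dir, "refs", "heads", "master"), commit + "\n")
    write_file(os.path.join(git_dir, "HEAD"), "ref: refs/heads/master\n")  # on master
    print(detached == resolve_head(git_dir) == commit)  # True
```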

Git branch

Just as master points to a specific commit, creating a branch creates a new file under `.git/refs/heads` whose content is the SHA-1 of the commit the branch points to. Let’s examine this:

Because we created `branch1` from master and haven’t committed anything on it yet, both branches point to the same commit object.
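Creating a branch therefore comes down to writing one small file. This sketch assumes loose refs under `.git/refs/heads`. It skips what real Git also does, such as ref-name validation, reflogs and packed refs. The helper names and the commit ID are made up.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, commit_id: str) -> str:
    """Create a branch as a loose ref: a file holding a commit ID and a newline."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(commit_id + "\n")
    return path


def read_branch(git_dir: str, name: str) -> str:
    with open(os.path.join(git_dir, "refs", "heads", name)) as f:
        return f.read().strip()


tip = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit ID
with tempfile.TemporaryDirectory() as git_dir:
    create_branch(git_dir, "master", tip)
    # A new branch starts at master's commit, so both files hold the same ID.
    create_branch(git_dir, "branch1", read_branch(git_dir, "master"))
    print(read_branch(git_dir, "branch1") == read_branch(git_dir, "master"))  # True
```

This is why creating a branch is cheap: no files are copied. Only a 41-byte pointer (40 hex characters plus a newline) is written.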

Summary

In this article we learned how different Git commands work behind the scenes. We learned about the index file: its purpose, its format and how it serves Git. We also learned about Git’s building blocks, trees and blobs, and their contents and usage.

You can be an expert in Git without having this knowledge, but I’m sure that this knowledge will help you play with Git with more confidence.