Getting Git, part 3

hacking, February 3rd 2008

Note: updated to correct a logical error. When development converges, a child will have multiple parents — not vice-versa. Kudos to Johannes Grødem for sharp eyes and a heads-up.

Intermission

I've learned that some people are reading this and wondering whether they really need to know all this to use Git.

No.

If all you're using Git for is the "edit-diff-commit, edit-diff-commit" cycle, you don't. This seems to be 99% of what many people use a VCS for, and there is nothing wrong with that. You can even go quite a bit beyond that and still not need to know anything about what's really going on: just follow a simple recipe, and you're good to go.

It's when you move beyond the recipe level that you need to understand the model the VCS uses, just like you need to with CVS, Subversion, Darcs, or any other VCS. If you don't think that's true, you've either internalized the model without noticing, or you're just hammering out recipes.

Specifically, this series is written to teach enough of the Git model to be able to look at a bunch of disparate branches you need to merge somehow, figure out the kind of history you want to build, and then do that. No recipe in the world is going to do this for the general case: you need to know what is going on before you can decide what you want.

What's In A Commit

So, where were we? Ah, storing history. If you haven't read it yet, here's where you can read the story so far.

Commits are the third kind of object stored in the object database. You could call a commit "a moment in history", but let's see exactly what it contains:

Tree : the entire contents of the directory tree associated with the commit.

Parent(s) : a commit has zero or more parent commits. A parent commit is the "previous" commit: the changes introduced by a commit can be seen by comparing the trees of the commit and its parent(s). The common case is a single parent: this represents normal linear development. Only the very first commit in a history has no parent at all. Multiple parents represent converging lines of development. A commit with multiple parents is called a merge commit. Two parents is the norm, but several branches can be merged with a single commit, which then has more than two parents.

Comment : some text describing the commit.

Committer : the person who actually created the commit, and the date this was done.

Author : the person responsible for the change represented by the commit, and the date. Often the author and the committer are the same, but when patches are submitted by email, for example, Git automatically preserves information about the original author.
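As a rough sketch of the five fields above, here is how a commit could be modeled in Python. The `Commit` class, its field names, and the people in the example are made up for illustration. The text layout produced by `serialize` follows Git's commit object format, and the id is the SHA-1 of that text prefixed with a `commit <size>` header.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    tree: str        # id of the tree object: the full directory contents
    author: str      # "Name <email> unix-timestamp timezone"
    committer: str   # same layout as author
    message: str     # the comment
    parents: List[str] = field(default_factory=list)  # ids of parent commits

    def serialize(self) -> bytes:
        """Lay the fields out as text: headers, a blank line, then the comment."""
        lines = [f"tree {self.tree}"]
        lines += [f"parent {p}" for p in self.parents]
        lines.append(f"author {self.author}")
        lines.append(f"committer {self.committer}")
        lines.append("")
        lines.append(self.message)
        return "\n".join(lines).encode("utf-8")

    def object_id(self) -> str:
        """Name the commit by hashing its content, like any other object."""
        body = self.serialize()
        header = f"commit {len(body)}".encode("ascii") + b"\0"
        return hashlib.sha1(header + body).hexdigest()


# Hypothetical people; the tree id is Git's well-known id for the empty tree.
alice = "Alice Example <alice@example.com> 1202000000 +0000"
bob = "Bob Example <bob@example.com> 1202003600 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

root = Commit(tree=empty_tree, author=alice, committer=alice,
              message="Initial import\n")

# Alice wrote the change, Bob applied it: author and committer differ.
child = Commit(tree=empty_tree, author=alice, committer=bob,
               message="Apply patch from Alice\n",
               parents=[root.object_id()])

print(child.serialize().decode("utf-8"))
print(child.object_id())
```

Because the parent ids are part of the hashed text, changing anything in an ancestor would change the id of every commit that descends from it.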

Commits form a DAG (directed acyclic graph) via parents. When development diverges, multiple children will share the same parent. When development converges, a single child will have multiple parents. If this is not clear, get a piece of paper and draw a few DAGs: it's more effective than any fancy graphic I might cook up.
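If paper is not at hand, the same exercise can be sketched in a few lines of Python. The commit names `A` to `E` are hypothetical stand-ins for real commit ids, and the dictionary is only a toy model: each commit maps to the list of its parents, which is the only direction a commit records.

```python
from collections import defaultdict

# Each commit lists its parents; A is the first commit and has none.
#
#   A---B---D---E
#        \     /
#         C---'
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # development diverges after B ...
    "D": ["B"],       # ... so C and D share the parent B
    "E": ["D", "C"],  # development converges: E is a merge commit
}

# Children are not stored in any commit; derive them by inverting the edges.
children = defaultdict(list)
for commit, commit_parents in parents.items():
    for parent in commit_parents:
        children[parent].append(commit)

forks = sorted(c for c, kids in children.items() if len(kids) > 1)
merges = sorted(c for c, ps in parents.items() if len(ps) > 1)

print("diverges at:", forks)    # commits with several children
print("converges at:", merges)  # commits with several parents
```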

So, if you have hold of a commit object, you have hold of the entire history up to that point — but you don't know anything about the future. In other words:
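A minimal sketch of that idea in Python follows. The `history` function and the toy `parents` table are assumptions made for illustration, not Git's implementation: starting from one commit, it follows parent links until they run out, and nothing in the data leads forward to later commits.

```python
def history(commit, parents):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a merge can reach the same ancestor by two routes
        seen.add(current)
        stack.extend(parents[current])
    return seen


# Hypothetical history: B and C both follow A, D merges them, E comes after D.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}

# Holding D gives us D and all of its past; E is invisible from here.
print(sorted(history("D", parents)))
```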

Back to our regularly scheduled Erlang envy.

Review:

What is a commit object?

Can multiple commits refer to a single tree; if so, what does it mean; if not, why not?

Can multiple commits share parents; if so, what does it mean; if not, why not?

If the tip of a branch is a commit, can you guess what the history of the branch is?

Next time: tagsoup.

Also, here's some moral support for me: "Git is the next Unix", says apenwarr. I may not agree with the metaphor, but it's a nice read:

Git was originally not a version control system; it was designed to be the infrastructure so that someone else could build one on top. And they did; nowadays there are more than 100 git-* commands installed along with git. It's scary and confusing and weird, but what that means is git is a platform. It's a new set of nouns and verbs that we never had before. Having new nouns and verbs means we can invent entirely new things that we previously couldn't do.