Git is a command line-based distributed version control system. A version control system tracks and records every change in a set of text files (typically source code, but not exclusively), collectively called a repository. It is then possible to check what each change contained and when it occurred, and to recover and return to any previous state of the tracked files.

Distributed means that these tracked files can be stored on many different computer systems, such as a local computer, remote servers, or in the cloud. No copy is inherently the main one; they are all equivalent, and users can synchronise (pull and push) changes from one to another.

Git is without any doubt a powerful but relatively difficult piece of software to use. It was developed by Linus Torvalds, the main developer of the Linux kernel, for the development of the Linux kernel. GitHub is a web platform that hosts git repositories. It allows one to perform the most important functions of git through a friendly and easy-to-use graphical interface, and adds some handy project management and sharing features. We are going to focus on GitHub, and illustrate:

the creation of a repository

adding and modifying files

external contributions using forks and pull requests

opening, commenting on and closing issues

While the main context in which my colleagues come to know git and GitHub is bioinformatics-related, I’m going to demonstrate it with something many more people are familiar with. We are going to use GitHub to manage a pancakes recipe.

Note that to use GitHub, one needs to create a free account, so that all operations can be tracked and attributed to a person. Here, I will be using two accounts, namely my main account lgatto, and LaurentGatto.

A new repository

Below, I start by creating a repository under user lgatto by clicking on the + in the top right corner, and choosing New repository.

The next step is to name the repository (I call it pancakes), provide a short description, and choose whether to make it public or private (we choose the former). I also choose to initialise the repository with a README file (by default, it will be README.md, in markdown format).

We won’t use any of the suggested licences here, as those are generally targeted towards software. If we wanted to set one, we would probably use CC-BY and mention it in the README file.

We can now click the Create repository button to actually create it. We obtain the repository lgatto/pancakes populated with the README.md file.

This is a remote repository, as it lives remotely (on one of the GitHub servers). Later, we’ll see how to clone it locally. Note that the remote repository isn’t special in any way, and doesn’t necessarily define the main one. In this case, it happens to be the first one that was created, but we could also have created a local repository and pushed it remotely.
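The alternative mentioned above, creating the repository locally first and pushing it to a remote afterwards, could look roughly like the sketch below. To keep it runnable without a GitHub account or network access, a local bare repository (remote.git) stands in for the remote; with GitHub, the URL of an empty repository created through the web interface would be used instead. The user name, email address and temporary directory are placeholders.

```shell
# Work in a throwaway directory so nothing else on the system is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the remote: a bare repository (history only, no working files).
git init --bare remote.git

# Create the local repository first.
git init pancakes
cd pancakes
# Placeholder identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# Add a README and record it as the first commit.
echo "# pancakes" > README.md
git add README.md
git commit -m "Initial commit with README"

# Register the remote under the conventional name 'origin' and push the
# current branch to it.
git remote add origin "$workdir/remote.git"
git push -u origin HEAD
```

After the push, the local and the remote repositories hold the same history, and neither is more authoritative than the other.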

Adding a file

It is simple to create or upload new files to the remote repository by clicking the respective grey buttons (see the grey buttons on the previous screen-shot). I can click Create new file to open an interface where I can give the file a name (here we use pancakes.md, specifying the markdown file extension), and copy/paste and adapt the recipe from the BBC recipe site.

At the bottom of the page, I provide a commit message, which is a message that is recorded as part of the history of the repository. Using good commit messages is important as it helps understand the changes and the evolution of a repository without the need to look at the actual individual changes.

After clicking Commit new file, I see the new state of my repository, which now contains two files, namely README.md and pancakes.md.

I can inspect the history of that repository by clicking on the commits link under the repository name. At this stage I have two commits:

the initial commit that created the repository with the README file, and the one that added the recipe itself.

On the right, I can read the first few characters of the commit hashes. A hash uniquely identifies each commit and, as a consequence, each state of the repository.
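For readers who already have git on the command line, the same identifiers can be inspected locally. The sketch below builds a throwaway repository with two commits (file contents, user name and email are placeholders) and prints both the abbreviated and the full identifier of the latest commit. The actual values differ on every run, since they depend on the content, author and time of each commit.

```shell
# Throwaway repository with two commits, mirroring the example above.
workdir="$(mktemp -d)"
cd "$workdir"
git init pancakes
cd pancakes
# Placeholder identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "# pancakes" > README.md
git add README.md
git commit -m "Initial commit"

echo "Mix flour, eggs and milk." > pancakes.md
git add pancakes.md
git commit -m "Add pancake recipe"

# One line per commit: abbreviated identifier followed by the commit message.
git log --oneline

# Full identifier of the latest commit (40 hexadecimal characters with
# git's default SHA-1 format) and its abbreviated form.
full="$(git rev-parse HEAD)"
short="$(git rev-parse --short HEAD)"
echo "full:  $full"
echo "short: $short"
```

The abbreviated form is simply a prefix of the full one, long enough to be unambiguous within the repository.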

Using issues

A very attractive feature of GitHub, which is specific to the web interface and is missing from the git software, is the availability of repository-specific issues. The current repository has no (open) issues yet, as shown by the Issues (0) tab.

To open a new issue, I select that tab and then press the green New issue button, which opens an issue editing window.

I can now write a new issue using the markdown format. Here, I use an emoji, and could also easily add figures and links to other issues.

Once submitted, the issue is displayed for anyone to see and comment on.

I now switch to the LaurentGatto account and post a comment on this issue which, as a reminder, is an issue in the pancakes repository that belongs to user lgatto.

The comment is publicly visible and others could further comment and discuss.

External contributions

The following feature is probably the one where GitHub particularly shines. It is an extremely powerful mechanism to collaborate on a repository and track external contributions to it.

As user LaurentGatto, I can choose to fork lgatto’s pancakes repository (or any publicly available repository, for that matter) by clicking on the Fork button in the top right corner. Forking creates an exact copy of the repository, including all the commit history (but not the issues), in the new account, while keeping a public record of where it comes from.

The following screen-shot shows the forking transition screen.

Below, we see the LaurentGatto/pancakes repository that, just under the repository title/name, is clearly labelled as a fork of lgatto/pancakes. The little avatar, on the left of the latest commit line, indicates that the last commit was by lgatto - the whole repository history is preserved.

User LaurentGatto can now edit or upload new files. Below, he clicks the little pen on the right to edit the pancakes.md file.

LaurentGatto adds a Tips section and a new bullet point suggesting to enjoy the pancakes with chocolate spread and cold milk. The file update needs a message (and an optional extended description, which is left blank here) before the actual commit.

Below, we see the updated pancakes.md file in LaurentGatto’s repository.

Checking the commit history of the repository, we see that there’s now a third commit, Added nutella tip, by LaurentGatto, in addition to the previous ones by lgatto. Note in the title above that these changes have been recorded in LaurentGatto/pancakes, a fork of lgatto/pancakes, and do not exist in lgatto/pancakes at this time.

User LaurentGatto, if he so wishes, can now contribute his changes back to the original repository by sending a pull request from the identically named tab.

There’s currently no existing pull request (often shortened to PR) under the Pull requests tab, and a new one can be started by clicking the green New pull request button.

LaurentGatto can now see the differences between the original and the forked repositories (i.e. a single commit by user LaurentGatto - the actual difference between the file(s) could be seen scrolling down) and can initiate the pull request by clicking the green Create pull request button.

It can be useful to provide additional comments, or a general description for a PR, before actually sending it back to the original repository. Once the Create pull request button is pressed and the PR is sent, the page updates to the destination repository, i.e. lgatto/pancakes in this example.

Back in the initial lgatto/pancakes repository, we see that there’s no conflict between the current state of lgatto/pancakes and the PR from LaurentGatto. There could be a conflict if changes on both sides affected the same line.
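To make the notion of a conflict concrete, here is a minimal local sketch, under the assumption that two lines of work each rewrite the same line of pancakes.md. The branch name nutella-tip, the file contents and the identity are invented for the illustration; on GitHub, the same situation would be reported on the pull request page instead.

```shell
# Throwaway repository with a single one-line file.
workdir="$(mktemp -d)"
cd "$workdir"
git init kitchen
cd kitchen
# Placeholder identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "Serve with lemon and sugar." > pancakes.md
git add pancakes.md
git commit -m "Add serving suggestion"

# First line of work: a branch that rewrites the line.
git checkout -b nutella-tip
echo "Serve with chocolate spread." > pancakes.md
git commit -am "Suggest chocolate spread"

# Second line of work: back on the original branch, rewrite the same line.
git checkout -
echo "Serve with maple syrup." > pancakes.md
git commit -am "Suggest maple syrup"

# Both sides changed the same line, so git cannot combine them on its own.
if git merge --no-edit nutella-tip; then
    echo "merged cleanly"
else
    echo "conflict: pancakes.md needs to be resolved by hand"
    git status --short   # 'UU' marks a file modified on both sides
fi
```

Had the two commits touched different lines or different files, the merge would have gone through without any manual intervention.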

The next screen-shot shows the lgatto/pancakes repository as seen by lgatto, where the PR from LaurentGatto has now appeared and can be merged.

Below, the merge is confirmed with a small message. This mechanism, however, also allows one to explicitly review pull requests and ask for specific changes before agreeing to merge.

The following screen-shot shows the merged PR.

Here we look at the commit history of lgatto/pancakes, and we can see that LaurentGatto made a modification and that lgatto merged it into lgatto/pancakes.
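Plain git has no notion of a pull request; that is a GitHub feature. The underlying operation, however, is roughly a fetch of the contributor's history followed by a merge. The sketch below imitates the whole round trip with two local repositories, original and fork, standing in for lgatto/pancakes and LaurentGatto/pancakes. Directory names, identities and file contents are placeholders, and the --no-ff option is used here so that an explicit merge commit is recorded.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# The original repository, with a single commit by its owner.
git init original
cd original
git config user.name "Owner"                    # placeholder identity
git config user.email "owner@example.com"
echo "Mix flour, eggs and milk." > pancakes.md
git add pancakes.md
git commit -m "Add pancake recipe"
cd ..

# The fork: a full copy, including the history, owned by someone else.
git clone original fork
cd fork
git config user.name "Contributor"              # placeholder identity
git config user.email "contributor@example.com"
printf '\nTips: serve with chocolate spread.\n' >> pancakes.md
git commit -am "Added nutella tip"
cd ..

# Accepting the contribution: the owner fetches the fork's latest commit
# and merges it, recording an explicit merge commit.
cd original
git fetch ../fork HEAD
git merge --no-ff --no-edit FETCH_HEAD

# The history now shows the owner's commit, the contributor's commit and
# the merge.
git log --oneline
```

The resulting history keeps the contributor as the author of the change and the owner as the one who merged it.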

Below, we show the previously opened Pancake tips issue (issue #1), where lgatto posts a last comment and closes the issue with the green Close and comment button.

Closed issues aren’t deleted and are still visible on the repository.

Navigating commits

If we look back at the commit history, we see, towards the right, the unique commit hashes and, on their right, buttons that allow browsing the repository in a past state. Below, the commit corresponding to the addition of the pancake recipe is selected (commit hash 3898b2c2cab04761b64b78dbe3725e84b585ca2c).

We are now in the exact state of that commit.
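The command-line counterpart of browsing a past state is to check out a commit by its identifier. The following sketch uses a throwaway repository with two commits (contents and identity are placeholders), moves to the very first commit, lists the files present at that point, and then returns to the latest state.

```shell
# Throwaway repository with two commits.
workdir="$(mktemp -d)"
cd "$workdir"
git init pancakes
cd pancakes
# Placeholder identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "# pancakes" > README.md
git add README.md
git commit -m "Initial commit"

echo "Mix flour, eggs and milk." > pancakes.md
git add pancakes.md
git commit -m "Add pancake recipe"

# Identifier of the very first commit (the one without a parent).
first="$(git rev-list --max-parents=0 HEAD)"

# Check out that commit: the working directory now matches that past state,
# in which the recipe file did not exist yet.
git -c advice.detachedHead=false checkout "$first"
files_then="$(ls)"
echo "files at the first commit: $files_then"

# Return to the branch we were on, i.e. the latest state.
git checkout -
ls
```

Nothing is lost by travelling back: the later commits remain in the history and reappear as soon as the branch is checked out again.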

Cloning locally

So far, we have exclusively used GitHub. But for more substantial projects, where source code or analysis reports are written and executed, repositories are managed and updated locally. The creation of a new (local) copy is called cloning, and can be done using the URL under the Clone or download button.

From the command line, the command

git clone git@github.com:lgatto/pancakes.git

will produce a full copy (i.e. all files with complete history) in a local directory. In this case, we see that we have the two files.

The git log command recapitulates the full commit history.

It is of course also possible to add new files from a local repository. Below, we create a new (empty) file with touch waffles.md to store a recipe for waffles. We then in turn add the file to the local repository, commit the addition (with a little commit message), and actually push it to the remote repository. The screen-shot below also illustrates the status command to show the current status of the repository - here, one new added (A) file.
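The sequence of commands described above can be sketched as follows. So that it runs without network access, a local bare repository (remote.git) is cloned in place of git@github.com:lgatto/pancakes.git; in the real workflow, the commands from touch onwards would be typed inside the cloned pancakes directory. The identity and the commit message are placeholders.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the remote repository, and a local clone of it.
git init --bare remote.git
git clone remote.git pancakes
cd pancakes
# Placeholder identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# Create a new (empty) file and add it to the repository.
touch waffles.md
git add waffles.md

# Short status: 'A' marks a newly added file.
status_after_add="$(git status --short)"
echo "$status_after_add"

# Record the addition with a short message, then push it to the remote.
git commit -m "Add empty waffles recipe file"
git push origin HEAD

# The history now contains the new commit.
git log --oneline
```

Adding, committing and pushing are three distinct steps: the first selects what goes into the next commit, the second records it locally, and only the third makes it visible in the remote repository.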

Below, we see the updated repository’s git log output, with the latest commit.

When pushing a commit from a local repository, the files and the history are updated in the remote repository.

Conclusions

What do I use GitHub for? Many things, including

software development (here’s the MSnbase R/Bioconductor package repository, that has been going on since Oct 4 2010, according to the git log)

project management (here’s a current MSnbase development project) and issue tracking and discussions (the latest MSnbase issue is #425)

collaborative paper writing (here’s the repository of the Ten Simple Rules for Taking Advantage of git and GitHub paper, which was written and managed collaboratively using GitHub)

course development (here’s the repository for my Introduction to bioinformatics (WSBIM1207) course)

…

In particular, when analysing data for collaborators, we always create a GitHub repository in which all the code goes. Issues are used to discuss specific points, and very often, I ask my collaborators to create a GitHub account to join in, and to ask project-related questions directly on GitHub rather than in scattered emails. This makes it possible to track all discussions about a project in the same place.

Finally, while it isn’t a replacement for a proper backup solution (the size of files on GitHub is limited), it can be used as such. In addition, even for a single user, it makes it easy to work on different computers while keeping track of changes, irrespective of where they were made.

Further reading and references