Yesterday I was curious about how git push works over SSH. I’m getting more used to using strace to figure this kind of thing out, so I gave it a shot. If I strace a push to this site’s repository, this line shows up:

[pid 15943] execve("/usr/bin/ssh", ["ssh", "git@github.com", "git-receive-pack 'kamalmarhubi/w"...], [/* 51 vars */]) = 0

So git push eventually runs ssh git@github.com git-receive-pack <repo-path>. Trying this out in my terminal gives me this:

$ ssh git@github.com git-receive-pack kamalmarhubi/website
00bb29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pagesreport-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2:2.4.8~upload-pack-wrapper-script-1211-gc27b061
003f04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master
0000

with the terminal then waiting for my input. SSH handles authentication and the remote connection, then runs a command on the other end (here git-receive-pack) to handle the data exchange. These lines are the start of that exchange.
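Git’s side of this can be imitated from a script. Below is a minimal sketch, assuming Python 3.9+ and SSH access to the repository; receive_pack_argv and fetch_advertisement are hypothetical helper names. The argv mirrors the strace line above, with the repository path quoted inside a single command string. One loud assumption: sending a lone 0000 should tell the server there is nothing to push, so it exits instead of waiting for input the way it did in my terminal.

```python
import subprocess


def receive_pack_argv(repo: str, host: str = "git@github.com") -> list[str]:
    # Mirrors the strace output: ssh, the host, then one command string
    # with the repository path in single quotes
    return ["ssh", host, f"git-receive-pack '{repo}'"]


def fetch_advertisement(repo: str, host: str = "git@github.com") -> bytes:
    """Run the remote git-receive-pack and return everything it printed."""
    # Assumption: a lone b"0000" means "nothing to push", so the server
    # should exit rather than wait for more input as it did in the terminal
    result = subprocess.run(
        receive_pack_argv(repo, host),
        input=b"0000",
        stdout=subprocess.PIPE,
    )
    return result.stdout
```

Printing the bytes returned by fetch_advertisement("kamalmarhubi/website") shows control characters as escapes such as \n, which makes the raw framing easier to inspect than terminal output.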

A tiny bit of looking around the internet told me that the protocol is made up of lines, each prefixed by its length as 4 hex digits. After the length comes what looks like a commit SHA-1, then a ref name. The sender ends its list with 0000.
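Git’s documentation calls these pkt-lines, and the length counts its own four digits. The second line checks out: 0x3f is 63, which is 4 digits + a 40-character SHA-1 + a space + the 17 characters of refs/heads/master + a newline. Here is a minimal Python sketch of reading that framing; read_pkt_lines is a hypothetical helper, not part of any library.

```python
import io
from typing import BinaryIO, Iterator


def read_pkt_lines(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each length-prefixed line until a 0000 packet."""
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("stream ended before the 0000 terminator")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            return  # 0000: the sender has nothing more to say
        if length < 4:
            raise ValueError(f"invalid line length {length}")
        # The four hex digits count themselves, so the payload is length - 4 bytes
        payload = stream.read(length - 4)
        if len(payload) != length - 4:
            raise EOFError("line shorter than its length prefix")
        yield payload


# The refs/heads/master line from the terminal session above, plus the terminator
sample = io.BytesIO(
    b"003f04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master\n0000"
)
for payload in read_pkt_lines(sample):
    print(payload)  # b'04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master\n'
```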

There are a couple of lines here, one for each branch in the repository. The first line additionally has a bunch of stuff at the end that looks like a description of what the sending program is and some features it supports.

While I was looking into this, I used xsel to copy the output to paste into an editor. This was really confusing, because all that got pasted was the first line without all the metadata!

00bb29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages

Looking at the entire output through hexdump -C, it turns out that there’s a null byte after refs/heads/gh-pages, and a newline at the end of that line (both marked with * below):

00000000 30 30 62 62 32 39 37 39 33 63 33 39 63 38 65 34 |00bb29793c39c8e4|
00000010 62 66 65 63 36 32 37 64 36 30 39 33 38 63 34 65 |bfec627d60938c4e|
00000020 64 32 30 38 36 63 63 36 30 62 62 31 20 72 65 66 |d2086cc60bb1 ref|
00000030 73 2f 68 65 61 64 73 2f 67 68 2d 70 61 67 65 73 |s/heads/gh-pages|
00000040 *00*72 65 70 6f 72 74 2d 73 74 61 74 75 73 20 64 |.report-status d|
00000050 65 6c 65 74 65 2d 72 65 66 73 20 73 69 64 65 2d |elete-refs side-|
00000060 62 61 6e 64 2d 36 34 6b 20 71 75 69 65 74 20 61 |band-64k quiet a|
00000070 74 6f 6d 69 63 20 6f 66 73 2d 64 65 6c 74 61 20 |tomic ofs-delta |
00000080 61 67 65 6e 74 3d 67 69 74 2f 32 3a 32 2e 34 2e |agent=git/2:2.4.|
00000090 38 7e 75 70 6c 6f 61 64 2d 70 61 63 6b 2d 77 72 |8~upload-pack-wr|
000000a0 61 70 70 65 72 2d 73 63 72 69 70 74 2d 31 32 31 |apper-script-121|
000000b0 31 2d 67 63 32 37 62 30 36 31*0a*30 30 33 66 37 |1-gc27b061.003f7|
000000c0 39 32 66 34 39 36 65 37 35 33 64 62 39 33 33 30 |92f496e753db9330|
000000d0 66 30 61 34 65 38 32 39 30 62 38 61 36 63 62 61 |f0a4e8290b8a6cba|
000000e0 38 61 62 36 64 61 62 20 72 65 66 73 2f 68 65 61 |8ab6dab refs/hea|
000000f0 64 73 2f 6d 61 73 74 65 72 0a 30 30 30 30 |ds/master.0000|
000000fe
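To check this without reading hex columns, a short sketch can list the offsets of bytes outside printable ASCII (0x20 to 0x7e), which hexdump -C prints as . in its right-hand column. The capture below is reassembled by hand from the dump above, and control_bytes is a hypothetical helper. As a cross-check, the first line’s prefix 00bb is 187 bytes, which matches the newline sitting at offset 0xba.

```python
def control_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) for every byte outside printable ASCII."""
    # These are the bytes hexdump -C renders as '.' in its text column
    return [(i, b) for i, b in enumerate(data) if not 0x20 <= b <= 0x7E]


# The raw output, reassembled from the hexdump above
capture = (
    b"00bb29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages\0"
    b"report-status delete-refs side-band-64k quiet atomic ofs-delta "
    b"agent=git/2:2.4.8~upload-pack-wrapper-script-1211-gc27b061\n"
    b"003f792f496e753db9330f0a4e8290b8a6cba8ab6dab refs/heads/master\n"
    b"0000"
)
for offset, byte in control_bytes(capture):
    print(f"{offset:08x} {byte:02x}")
# 00000040 00
# 000000ba 0a
# 000000f9 0a
```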

Without doing any research, here’s what I think happened. The git folks defined a fairly simple length-prefixed, newline-separated protocol. Then at some point they wanted to add some metadata to the protocol without breaking compatibility with older versions of git. They came up with a nifty hack that exploits C’s null-terminated strings: put the metadata after a null byte but before the newline. Code that reads up to the newline still gets the whole line, metadata included. The metadata-processing code knows to look past the null byte, while the existing protocol code sees only the part before it, presumably letting that code work unchanged!
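The guess above can be acted out in Python. ctypes.create_string_buffer copies bytes into a C char array whose .value stops at the first null byte, standing in for old code that treats the line as an ordinary C string. partition(b"\0") stands in for code that knows to look past it. Both helper names are made up for illustration, and this only mimics the guess; it says nothing about how git’s own code is written.

```python
import ctypes


def c_string_view(payload: bytes) -> bytes:
    """What code treating the line as a NUL-terminated C string would see."""
    # create_string_buffer copies the bytes into a C char array; its .value
    # stops at the first NUL, much as strlen() or printf("%s") would
    return ctypes.create_string_buffer(payload).value


def split_metadata(payload: bytes) -> tuple[bytes, list[str]]:
    """What metadata-aware code can recover by looking past the NUL."""
    ref_part, _, metadata = payload.rstrip(b"\n").partition(b"\0")
    return ref_part, metadata.decode("ascii").split()


# Payload of the first line from the capture (length prefix removed)
first = (
    b"29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages\0"
    b"report-status delete-refs side-band-64k quiet atomic ofs-delta "
    b"agent=git/2:2.4.8~upload-pack-wrapper-script-1211-gc27b061\n"
)
print(c_string_view(first))  # b'29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages'
print(split_metadata(first)[1])
```

Lines without a null byte pass through c_string_view unchanged, which is what would let the old code keep working.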

And when I copied it using xsel, the stuff past the null byte got dropped, presumably for the same C-string reason.

Cute hack, and mystery solved!