Well, what makes Git super fast? A look into Git’s underbelly.
Before I begin, I will set up an empty repository.
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git init
Initialized empty Git repository in /home/nikhil/dev/blog/git/.git/
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ ls -a
. .. .git
Initializing the repository creates a .git directory. The listing below shows its contents. Note that the objects folder is empty: Git has initialized the objects directory and created pack and info subdirectories in it, but there are no regular files.
nikhil@nikhil-Inspiron-3537:~/dev/blog/git/.git$ tree
.
|-- branches
|-- config
|-- description
|-- HEAD
|-- hooks
|   |-- applypatch-msg.sample
|   |-- commit-msg.sample
|   |-- post-update.sample
|   |-- pre-applypatch.sample
|   |-- pre-commit.sample
|   |-- prepare-commit-msg.sample
|   |-- pre-rebase.sample
|   `-- update.sample
|-- info
|   `-- exclude
|-- objects
|   |-- info
|   `-- pack
`-- refs
    |-- heads
    `-- tags

9 directories, 12 files
At the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time. To demonstrate, you can use the plumbing command hash-object, which takes some data, stores it in your .git directory, and gives you back the key the data is stored as. Note that hash-object is a plumbing command and is not meant for everyday use.
nikhil@nikhil-Inspiron-3537:~/dev/blog/git/.git$ echo 'supercompiler' | git hash-object -w --stdin
755eb4004ee1ac36d0dd51008ed6279c2fb200e5
The -w option tells hash-object to store the object; otherwise, the command simply tells you what the key would be. --stdin tells the command to read the content from stdin; if you don’t specify this, hash-object expects the path to a file. The output from the command is a 40-character checksum hash. This is the SHA-1 hash, a checksum of the content you’re storing plus a header.
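To make the "content plus a header" part concrete, here is a small Python sketch of how such a key can be computed. It assumes the default SHA-1 object format and a header of the form `blob <size in bytes>` followed by a NUL byte. The name `git_blob_hash` is my own helper for illustration, not something Git provides.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 key Git would assign to `content` stored as a blob."""
    # The header is the object type, a space, the content length in bytes,
    # and a NUL byte. Git hashes header + content together.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # `echo` appends a newline, so the stored content is 14 bytes long.
    print(git_blob_hash(b"supercompiler\n"))
```

Because the header is part of the hashed data, running a plain `sha1sum` over the content alone gives a different value from the one `git hash-object` prints.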
Let us move to the objects directory and see how the content is stored:
nikhil@nikhil-Inspiron-3537:~/dev/blog/git/.git/objects$ tree
.
|-- 75
|   `-- 5eb4004ee1ac36d0dd51008ed6279c2fb200e5
|-- info
`-- pack

3 directories, 1 file
You can see a file in the objects directory. This is how Git stores the content initially: as a single file per piece of content, named with the SHA-1 checksum of the content and its header. The subdirectory is named with the first 2 characters of the SHA-1 hash, and the filename is the remaining 38 characters.
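The layout described above can be sketched in a few lines of Python. This is only an approximation of what Git does: it assumes loose objects are the zlib-compressed header plus content, it writes into a scratch directory rather than a real .git/objects, and it skips the safety measures real Git takes when writing files. `write_loose_object` is a hypothetical helper name.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> Path:
    """Store `content` as a blob under objects_dir and return the file path."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    sha = hashlib.sha1(store).hexdigest()
    # First 2 hex characters name the subdirectory, the other 38 the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Loose objects are zlib-compressed on disk.
    path.write_bytes(zlib.compress(store))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_loose_object(Path(tmp), b"supercompiler\n")
        # Prints the two-character directory followed by the 38-character file name.
        print(path.relative_to(tmp))
```

Splitting on the first two characters keeps any single directory from collecting every object in the repository.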
You can pull the content back out of Git with the cat-file command. This command is sort of a Swiss army knife for inspecting Git objects. Passing -p to it instructs the cat-file command to figure out the type of content and display it nicely for you.
nikhil@nikhil-Inspiron-3537:~/dev/blog/git/.git/objects$ git cat-file -p 755eb4004ee1ac36d0dd51008ed6279c2fb200e5
supercompiler
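Roughly speaking, reading an object back is the reverse of storing it. The sketch below is my own approximation of that step, not Git's actual code: it assumes a loose object is zlib-compressed and starts with a `<type> <size>` header terminated by a NUL byte. `parse_loose_object` is a hypothetical name.

```python
import zlib


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    # Everything before the first NUL byte is the header.
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    # Build the on-disk bytes by hand so the example needs no repository.
    raw = zlib.compress(b"blob 14\x00supercompiler\n")
    obj_type, content = parse_loose_object(raw)
    print(obj_type)
    print(content.decode(), end="")
```

The header is what lets a tool work out the object type without being told, which is the job -p does for you.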
Ok, let us play around a bit.
I create v1 of a file and write it to the repository, then modify the file and write v2 to the repository. We can read back both versions with the cat-file command, and the objects directory now holds three objects in total, each under its own hash.
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ echo "version 1" > manual.txt
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git hash-object -w manual.txt
83baae61804e65cc73a7201a7252750c76066a30
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30
version 1
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ echo "version 2" > manual.txt
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git hash-object -w manual.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
version 2
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ tree .git/objects/
.git/objects/
|-- 1f
|   `-- 7a7a472abf3dd9643fd615f6da379c4acb3e3a
|-- 75
|   `-- 5eb4004ee1ac36d0dd51008ed6279c2fb200e5
|-- 83
|   `-- baae61804e65cc73a7201a7252750c76066a30
|-- info
`-- pack

5 directories, 3 files
You can have Git tell you the object type of any object in Git, given its SHA-1 key, with cat-file -t:
nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ git cat-file -t 83baae61804e65cc73a7201a7252750c76066a30
blob
Now, there are two things that have to be mentioned here.
- Git does not store the file itself: the file name is not recorded anywhere in the object. Git stores only the contents.
- The contents are stored as a blob object.
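One way to convince yourself of the first point is to hash two differently named files with the same contents. The Python sketch below does this with a hand-rolled version of the blob hash (assuming the `blob <size>` plus NUL header and SHA-1). `blob_key` and the file names are made up for the illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_key(path: Path) -> str:
    """Hash a file the way a blob is keyed: header plus contents, no name."""
    content = path.read_bytes()
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "manual.txt"
        second = Path(tmp) / "copy-of-manual.txt"
        first.write_bytes(b"version 1\n")
        second.write_bytes(b"version 1\n")
        # Same contents, different names: the keys are identical.
        print(blob_key(first) == blob_key(second))
```

Since the name never enters the hash, both files would map to one and the same blob in the object store.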