He who dove too deep into git checkout and git reset, Part 1

Red Huq

2020-06-20 09:00

I love git. I think it's a brilliant masterpiece of design and programming by Linus Torvalds. To me it's more than just a tool; it's art. OK, I'm a little weird, but today git is the de facto tool for version control and a cornerstone of successful software development.

In this blog post series, I'm going to dive really deep into the inner workings of the commands git checkout and git reset, break them down logically, and expose the method to the madness. You might already be familiar with a few aspects of these commands, but both of them conceal surprisingly powerful and useful functionality. Some of it might be new territory, but I hope a lot of it will absolutely blow your mind. Either way, let's get started with git checkout.

Table of contents¶

A pointer or two about a pointer or two
Checking out a branch
Checking out a commit
Demystifying the detached HEAD state
When and why git checkout sometimes doesn't work
Thoughts and what's next

A pointer or two about a pointer or two¶

To understand how git checkout and git reset work, we're going to need to visit a place that most developers dread, if they don't pretend it doesn't exist altogether. Yes, we're going to visit the .git directory of a git repository.

The .git directory is chock full of information about a codebase and its history, but the only components we'll be needing here are two types of pointers.

To help explain, I'll be using this toy repository below—feel free to clone it and follow along. In the repository, you can see I made a few commits on the master branch, and then added a few commits on its two child branches branch1 and branch2.

The branch reference¶

The first pointer is a branch reference (let's call it a BRANCH ref from now on) and it identifies the most recent commit—the "tip" or present state—of a branch by storing its SHA-1 value in a file within the directory .git/refs/heads/. Let's take a peek at the BRANCH ref for the master branch.

We see that the abbreviated SHA-1 value for the most recent commit on the master branch matches the value stored in .git/refs/heads/master. As you may have guessed, the branches branch1 and branch2 have their own BRANCH refs.
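If you'd like to reproduce this check yourself, here is a minimal sketch. It builds its own throwaway repository rather than using the toy repository above, so the temporary path, the committer identity, and the file contents are all placeholders of mine. It also assumes git's default on-disk ref storage, where a freshly created BRANCH ref is a plain file.

```shell
# Build a throwaway repository with a single commit on master.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity for the commit
git config user.email "example@example.com"
echo "print('hello')" > file1.py
git add file1.py
git commit -q -m "Add file1.py"

# The BRANCH ref is a text file holding the full SHA-1 of the tip commit.
branch_ref=$(cat .git/refs/heads/master)
tip=$(git rev-parse master)
echo "ref file: $branch_ref"
echo "tip:      $tip"
```

The two printed values should be identical, because the file under .git/refs/heads/ is exactly what git reads when we ask for the tip of master.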

The HEAD reference¶

In git repositories, we typically have numerous branches that we switch between frequently. How does git know which branch we're currently on? That responsibility is given to another type of pointer called the HEAD reference (or the HEAD ref). Sound familiar? When interacting with git in the command line, you've probably seen the HEAD ref rear its head (sorry, couldn't help it).

While there's a BRANCH ref for each branch, there's only one HEAD ref. By default, the HEAD ref file stores a pointer to the BRANCH ref of the current branch. Because we're on the master branch right now, let's visualize what this looks like.

Now let's examine the HEAD ref file in the .git directory.

We can see that the file .git/HEAD contains a pointer to the BRANCH ref for the master branch—our current branch. To summarize, think of the HEAD ref as a marker for where you are currently in the repository.
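Here's a small sketch of the same inspection in a throwaway repository (again, the path and identity are placeholders, not part of the toy repository). It reads the HEAD ref file directly and then asks git to resolve it with git symbolic-ref.

```shell
# Throwaway repository with one (empty) commit on master.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# The HEAD ref file stores a pointer to a BRANCH ref, not a SHA-1.
head_contents=$(cat .git/HEAD)
echo "$head_contents"                      # ref: refs/heads/master

# git symbolic-ref follows that pointer and reports the current branch.
current=$(git symbolic-ref --short HEAD)
echo "$current"                            # master
```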

Checking out a branch¶

Let's use our newfound knowledge of these two types of pointers to unravel a command most developers are very familiar with and use often: git checkout <branch>, which lets us switch from one branch to another.

When switching branches, git assigns the HEAD ref to a different BRANCH ref. That's it.

Here's the updated diagram after executing git checkout branch1.

Let's also take a look at the HEAD ref file in the .git directory.

I've hacked my terminal to make it pretty, but try executing the default git log while on branch1. That very first line in the log has been concealing the truth the entire time!
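To see the reassignment happen, here's a rough sketch in a throwaway repository (the path, identity, and single empty commit are placeholders of mine). It records the HEAD ref file before and after git checkout branch1, then prints the decoration that git log shows on its first line.

```shell
# Throwaway repository: one commit on master, plus a branch1 pointing at it.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch branch1

before=$(cat .git/HEAD)
git checkout -q branch1
after=$(cat .git/HEAD)
echo "before: $before"                     # before: ref: refs/heads/master
echo "after:  $after"                      # after:  ref: refs/heads/branch1

# %D prints the ref names attached to the commit; it starts with "HEAD -> branch1".
decoration=$(git log -1 --format=%D)
echo "$decoration"
```

That "HEAD -> branch1" arrow is git telling us, in plain sight, that the HEAD ref points to the BRANCH ref for branch1.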

Checking out a commit¶

If git checkout <branch> moves us to another branch, then git checkout <commit> (where <commit> is a SHA-1 value) probably moves us to another commit? Indeed! But what does that even mean, and what happens to the HEAD ref?

I'll answer the second question first: git checkout <commit> assigns the HEAD ref directly to a commit, i.e., the HEAD ref file no longer stores a pointer to a BRANCH ref, but rather stores the commit's SHA-1 value.

If we were to check out the first commit on branch1 (SHA-1 2aaf9c0), the updated diagram would look like:

Now let's see what happens when we emulate this in the terminal and inspect the HEAD ref file.

Ignore the scary looking "detached HEAD state" message and focus on the contents of the HEAD ref file. As we can see, it no longer contains a pointer but the full SHA-1 value for the commit we checked out.
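Here's a minimal sketch you can run to confirm this. It uses a throwaway repository with two commits (placeholder path, identity, and file contents), so the SHA-1 values will differ from the ones in the toy repository.

```shell
# Throwaway repository with two commits on master.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
echo "one" > file1.py
git add file1.py
git commit -q -m "First commit"
echo "two" >> file1.py
git commit -q -am "Second commit"

# Check out the first commit by its SHA-1 (-q hides the detached HEAD advice).
first=$(git rev-parse HEAD~1)
git checkout -q "$first"

# The HEAD ref file now holds the commit's full SHA-1 instead of "ref: ...".
head_contents=$(cat .git/HEAD)
echo "HEAD file: $head_contents"
echo "commit:    $first"
```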

Demystifying the detached HEAD state¶

I still remember the first time I inadvertently found myself face-to-face with the scary "detached HEAD state" (is this limbo?) and those austere messages from git. It's actually pretty straightforward once you break it down.

Because the HEAD ref no longer points to a BRANCH ref, it's not coupled to a branch and its most recent commit; hence the aptly named detached HEAD state. But let's think about what this entails. We know a BRANCH ref identifies the present state of a branch, while the HEAD ref tells us where we are currently in the repository. That suggests checking out a commit enables us to visit a previous state of the codebase—effectively travelling back in time!

We can also move forward in time if we check out a more recent commit, say, the second commit on branch1 (SHA-1 138ef91). Here's the updated diagram after executing git checkout 138ef91.

The target commit can even be the most recent commit on a branch, but checking out a commit will always result in entering a detached HEAD state.
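As a quick sketch of that last point, the commands below check out the very same commit twice in a throwaway repository (placeholder path and identity): once by its SHA-1 and once by its branch name. I'm using the exit status of git symbolic-ref -q HEAD as the test for "detached", since it only succeeds when the HEAD ref points to a BRANCH ref.

```shell
# Throwaway repository with one commit on master.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

tip=$(git rev-parse master)                # SHA-1 of the most recent commit on master

# Same commit, checked out by SHA-1: HEAD stores the SHA-1 directly.
git checkout -q "$tip"
if git symbolic-ref -q HEAD >/dev/null; then by_sha="attached"; else by_sha="detached"; fi

# Same commit, checked out by branch name: HEAD points to the BRANCH ref.
git checkout -q master
if git symbolic-ref -q HEAD >/dev/null; then by_name="attached"; else by_name="detached"; fi

echo "by SHA-1: $by_sha"                   # by SHA-1: detached
echo "by name:  $by_name"                  # by name:  attached
```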

Using the detached HEAD state effectively¶

Why would anyone want to be in a detached HEAD state?

For starters, a detached HEAD state can be your best friend when fixing bugs. Say we discover a bug but don't know when or how it was introduced. By checking out past commits, we can regress to a state when the code was bug-free and then identify the commit responsible. If you adhered to git best practices by submitting atomic commits with informative commit messages, then finding the culpable commit is straightforward because you've left yourself numerous breadcrumbs.

Fixing the bug would be even easier: you could just revert that commit without worry! There's even a handy git command, git bisect, that systematically finds the offending commit using binary search. But if your commits were bloated with all sorts of changes and the commit messages were broad and ambiguous, then may god help you.
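The binary search in question is git bisect, and here's a rough sketch of it at work. Everything in it is made up for illustration: a throwaway repository with six commits, where the fourth one adds a line containing the marker BUG, and a one-line check that stands in for a real test suite. Under the hood, bisect keeps checking out commits in a detached HEAD state until it has narrowed things down.

```shell
# Throwaway repository with six commits; the fourth introduces the "bug".
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
for i in 1 2 3 4 5 6; do
  if [ "$i" -eq 4 ]; then echo "BUG" >> file1.py; else echo "line $i" >> file1.py; fi
  git add file1.py
  git commit -q -m "Commit $i"
  if [ "$i" -eq 4 ]; then culprit=$(git rev-parse HEAD); fi
done

# Mark the current commit as bad and the root commit as good.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# The check exits 0 (good) when BUG is absent and 1 (bad) when it is present.
git bisect run sh -c '! grep -q BUG file1.py' >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1           # return to master and clean up
echo "culprit: $culprit"
echo "found:   $found"
```

With atomic commits, the commit that bisect lands on is small enough that the fix (or a git revert) is usually obvious.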

Another great use of a detached HEAD state is for experimentation. We can play around with a previous state of the codebase, and even submit commits in the local repository. But once we leave the detached HEAD state, no branch points to these commits, so they become hard to find and git will eventually garbage-collect them. To keep them, simply create a new branch before switching away.

Leaving the detached HEAD state¶

There are two ways to exit a detached HEAD state: at any point, either create a new branch or switch to an existing one. Why do these work? Because first, both options involve checking out a branch. Second, we already know that checking out a branch assigns the HEAD ref to a different BRANCH ref (either newly created or an existing one, respectively); either way, the HEAD ref gets "reattached".

We'll demo this by checking out the second commit of branch1, creating a new file, submitting a commit, and then finally creating a new branch branch3. The updated diagram would look like:

Next, let's execute the corresponding commands in the terminal.

Notice that creating branch3 not only let us leave the detached HEAD state, but also made the contents of the HEAD ref file once again point to a BRANCH ref.
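For reference, here's a sketch of that sequence in a throwaway repository. The path, identity, file names (file2.py, file4.py), and commit messages are placeholders I picked; only the shape of the steps follows the demo: detach at the second commit of branch1, commit an experiment, then create branch3.

```shell
# Throwaway repository: one commit on master, two more on branch1.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
echo "base" > file1.py
git add file1.py
git commit -q -m "Add file1.py"
git checkout -q -b branch1
echo "first" > file2.py
git add file2.py
git commit -q -m "First commit on branch1"
echo "second" >> file2.py
git commit -q -am "Second commit on branch1"

# Enter a detached HEAD state at the second commit of branch1.
second=$(git rev-parse branch1)
git checkout -q "$second"

# Commit an experiment while detached.
echo "experiment" > file4.py
git add file4.py
git commit -q -m "Experiment in detached HEAD"
experiment=$(git rev-parse HEAD)

# Creating a branch reattaches HEAD and keeps the experimental commit reachable.
git checkout -q -b branch3
head_contents=$(cat .git/HEAD)
branch3_tip=$(git rev-parse branch3)
echo "$head_contents"                      # ref: refs/heads/branch3
```

Note that branch1 itself hasn't moved; only branch3 contains the experimental commit.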

When and why git checkout sometimes doesn't work¶

Sometimes checking out commits (git checkout <commit>) or existing branches (git checkout <branch>) fails and git tells us

Please commit your changes or stash them before you switch branches. Aborting

In this final section, we'll explain when and why this scenario occurs and how to deal with it.

Let's say we made a change: we modified the contents of a file, deleted it, or added a new file entirely. If that change exists in our working directory or index (some call it the staging area), attempting to check out a commit or an existing branch will fail if the change conflicts with the version of that file in the target commit or branch, that is, if completing the checkout would overwrite our change.

I'll admit that's a lot to digest so to help explain it, let's start on branch1, modify file1.py (i.e., introduce a change in the working directory), and try to switch to branch2, which also contains a file1.py.

Because our changed file1.py in the working directory would be overwritten by the same file in the working directory on branch2, git actually does us a favor and asks us what we'd like to do with the change we made (hey it could be important). And we have 3 options: either commit, stash, or discard the changes. After selecting one of these options, we'd be free to switch to branch2.
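Here's a sketch of that scenario in a throwaway repository, assuming branch1 and branch2 each committed their own version of file1.py (the path, identity, and file contents are placeholders). It tries the switch, gets refused, and then takes the stash route.

```shell
# Throwaway repository where branch1 and branch2 hold different versions of file1.py.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
echo "base" > file1.py
git add file1.py
git commit -q -m "Add file1.py"
git checkout -q -b branch1
echo "branch1 version" > file1.py
git commit -q -am "Change file1.py on branch1"
git checkout -q -b branch2 master
echo "branch2 version" > file1.py
git commit -q -am "Change file1.py on branch2"

# Back on branch1, make an uncommitted change in the working directory.
git checkout -q branch1
echo "uncommitted edit" >> file1.py

# git refuses to switch because the checkout would overwrite that edit.
if git checkout branch2 >/dev/null 2>&1; then first_try="switched"; else first_try="refused"; fi
still_on=$(git symbolic-ref --short HEAD)
echo "first try: $first_try (still on $still_on)"   # first try: refused (still on branch1)

# One of the three options: stash the change, then switch.
git stash push -q
git checkout -q branch2
now_on=$(git symbolic-ref --short HEAD)
echo "now on: $now_on"                     # now on: branch2
```

The stashed edit isn't gone; it waits in the stash until we return to branch1 and run git stash pop.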

The same rules apply if:

we tried to switch branches but the above change existed in the index instead of the working directory
we attempted to check out a commit, instead of a branch, that had a conflicting version of file1.py

To really hit this home, let's introduce a different change by creating a new file file3.py on branch1, adding it to the index, and then attempting to switch to branch2.

Notice that git doesn't stop us from checking out branch2. In fact, git allows us to bring that change over to branch2. This is because branch2 doesn't already contain a file3.py, so checking it out wouldn't overwrite anything and no conflict is present.
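A sketch of this second scenario, again in a throwaway repository with placeholder path, identity, and file contents. Here the two branches have diverged, but neither has committed a file3.py, so the staged file simply travels along.

```shell
# Throwaway repository where branch1 and branch2 have diverged.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
echo "base" > file1.py
git add file1.py
git commit -q -m "Add file1.py"
git checkout -q -b branch1
echo "branch1 change" >> file1.py
git commit -q -am "Change file1.py on branch1"
git checkout -q -b branch2 master
echo "only on branch2" > file2.py
git add file2.py
git commit -q -m "Add file2.py on branch2"

# On branch1, create file3.py and add it to the index without committing.
git checkout -q branch1
echo "print('new file')" > file3.py
git add file3.py

# The switch succeeds and the staged file comes along.
git checkout -q branch2
now_on=$(git symbolic-ref --short HEAD)
status=$(git status --porcelain)
echo "$now_on"                             # branch2
echo "$status"                             # A  file3.py
```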

One last point, and it might be obvious: checking out a branch will never fail when creating a new branch from the current commit (git checkout -b <branch>), regardless of what changes are currently present in the working directory or index. This makes sense because the new branch starts at the same commit as the original branch, so its working directory and index can never be in conflict with it.
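To round this out, here's a short sketch showing an uncommitted edit surviving the creation of a new branch. The throwaway path, identity, and the branch name branch3 are placeholders for illustration.

```shell
# Throwaway repository with one commit and an uncommitted edit.
repo=$(mktemp -d)                          # placeholder location
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name, whatever the local default is
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
echo "base" > file1.py
git add file1.py
git commit -q -m "Add file1.py"
echo "uncommitted edit" >> file1.py

# Creating a branch from the current commit leaves the working directory alone.
git checkout -q -b branch3
now_on=$(git symbolic-ref --short HEAD)
status=$(git status --porcelain)
echo "$now_on"                             # branch3
echo "$status"                             # prints " M file1.py" (modified, not staged)
```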

In summary, during a git checkout <branch> or git checkout <commit>, both the working directory and index will be updated. If there are any conflicts present, we'd need to address them.

Thoughts and what's next¶

We all know that git has a steep learning curve, mainly due to its abstract nature. The fact that most training sessions ask newcomers to memorize commands doesn't help at all. Learning git by memorizing and regurgitating didn't work for me, and I'm positive that applies to most folks. Only after dissecting a command, unraveling how it works, and understanding the underlying logic driving its functionality did I feel proficient.

I know this was probably more than you ever expected to know about using git checkout, but I'm hoping it empowered you to use this command more confidently and effectively so you spend more time developing and spend less time perusing StackOverflow or googling to refresh your memory.

The next command I plan to cover is git reset; it's not as straightforward as git checkout, but everything I covered here will serve as building blocks to digesting git reset. So keep an eye out for a future blog post on this command as well as how to use both git checkout and git reset at the file level...