How to think about what the HEAD thingy actually is in Git – 097

I learn best through hands-on experimentation. I like taking things apart, putting them back together, or trying to break stuff. So let’s break apart this HEAD thingy.

According to this SO answer, it is a file.

saraf@TheBlueDog  ~/Source/Repos/scrap/detachedHeadPractice (master)
$ cat .git/HEAD
ref: refs/heads/master

and that file contains some content. I’d assume “ref” means reference, and “master” is the master branch we’ve been playing with. I’m not sure what “refs/heads/master” actually stands for, but I’ll figure that out later.

I keep hearing HEAD referred to as a pointer. This would make sense if it is a file that only contains the location(?) to point to. That answers the literal question.
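To poke at this without risking a real project, here is a minimal sketch that builds a throwaway repo and reads HEAD two ways. The temp directory, user name, email, and commit message are made-up placeholders; `git symbolic-ref` and `git rev-parse` are standard Git commands. It also hints at the open question from above: `refs/heads/master` is the full name of the master branch reference.

```shell
# Throwaway repo; directory, identity, and message are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the unborn branch master so the output matches the post,
# whatever init.defaultBranch is set to on this machine.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

# The raw file, as in the post (assumes Git's default file-based ref storage).
cat .git/HEAD

# The same information through Git itself.
head_ref=$(git symbolic-ref HEAD)                  # the ref HEAD points to
head_commit=$(git rev-parse HEAD)                  # the commit HEAD resolves to
branch_commit=$(git rev-parse refs/heads/master)   # the commit master points to
echo "HEAD -> $head_ref -> $head_commit"
```

So HEAD names a branch, and the branch names a commit. Asking for HEAD's commit and asking for master's commit give the same hash.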

But what does the HEAD pointer do?

According to this Stack Overflow answer, “HEAD is a reference to the last commit in the currently checked-out branch.” Considering we have master checked out, that makes sense. Doing a git log would show all the commits reachable from master, which in this little repo is every commit. If we switched to another branch and did a git log, we’d see all the commits reachable from that branch.

But what about the detached HEAD state from yesterday?

Continuing with the same Stack Overflow answer, “A detached HEAD is the situation you end up in whenever you check out a commit (or tag) instead of a branch.”

Yep, we saw that yesterday. HEAD is pointing at the 2nd commit, whereas master is pointing at the 3rd.

[Image: visualization tool showing a detached HEAD state]
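The same situation can be reproduced in a scratch repo. This is only a sketch: the three empty commits and the identity settings are placeholders, and it assumes the default file-based ref storage so that `.git/HEAD` is readable with `cat`.

```shell
# Throwaway repo with three commits on master (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
for n in 1 2 3; do
  git commit -q --allow-empty -m "commit $n"
done

# Check out the 2nd commit by hash instead of by branch name.
second=$(git rev-parse master~1)
git checkout -q "$second"

# HEAD now holds a raw commit hash rather than "ref: refs/heads/...".
cat .git/HEAD

detached_head=$(git rev-parse HEAD)
master_tip=$(git rev-parse master)
# symbolic-ref -q exits non-zero (silently) when HEAD is detached.
if git symbolic-ref -q HEAD >/dev/null; then on_branch=yes; else on_branch=no; fi
echo "on a branch: $on_branch"
```

HEAD sits on the 2nd commit while master still points at the 3rd, which is the picture described above.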

Now I had to re-read this line from the same Stack Overflow answer, “In this case, you have to imagine this as a temporary branch without a name; so instead of having a named branch reference, we only have HEAD.”

A temporary branch without a name??

Okay, what I think is happening is that if you were in a detached state (as shown above) and did a git log, you’d only see 2 commits, not 3.  Since you are seeing 2 commits, you are still technically on a branch. It’s just that this branch doesn’t have a name.
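A quick way to check that guess is to count commits from both starting points. This is a sketch in a scratch repo with three placeholder commits; `git rev-list --count` simply counts the commits reachable from the given starting point.

```shell
# Throwaway repo with three commits on master (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
for n in 1 2 3; do
  git commit -q --allow-empty -m "commit $n"
done

# Detach HEAD at the 2nd commit.
git checkout -q master~1

# git log starts at HEAD and walks backwards through parent links.
git --no-pager log --oneline
count_head=$(git rev-list --count HEAD)
count_master=$(git rev-list --count master)
echo "from HEAD: $count_head, from master: $count_master"
```

The log from the detached HEAD stops short of the 3rd commit because it only follows parents backwards from wherever HEAD is.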

In other words, a Git pointer never points to just a single commit. A Git pointer points to the last commit in that series of commits (which I guess is where the temporary nameless branch concept comes from). Which now raises the question… what’s really a Git branch?

3 thoughts on “How to think about what the HEAD thingy actually is in Git – 097”

  1. “you are still technically on a branch”: I think this way of thinking about it is likely to lead you astray. This will seem (at first) like nit-picking, but I find it more helpful to think of it as “you are on a commit that is part of a branch’s history.” Subtle (and pedantic) though this may seem, it is the key to truly understanding what git branches really are.

    Why does this distinction matter? Well, consider the case where that commit shows up in more than one branch. E.g., suppose you have master as shown in your example, but you also have some ‘mybranch’ which ends in a different commit, but where that commit’s predecessor is the same commit your detached HEAD is currently on.

    Once you’re in this state, you can make a slightly different statement: “you are on a commit that is part of two different branches’ histories”.

    And here’s the super-crucial point of the ‘detached HEAD’ state: even if the commit is visible somewhere in some particular branch’s history, git doesn’t actually know about that. (It could, in principle, deduce this fact – it could look at all your branches and walk the entire history of all of them to see if it can find your commit. And then it could, in principle, produce a list of all the branches that your detached HEAD is ‘on’. But it doesn’t do this. It would be expensive, and in any case, that’s not how it works.)

    And this is the case even if your detached HEAD is pointing at exactly the same place as the tip of one or more branches. (Again, git could look at every single branch, and if any of those branches match your detached HEAD, then it could, in principle, say that you are ‘on’ all of those branches. This would be significantly less expensive than walking the history of every single branch to see if your commit shows up, but it doesn’t do this either.)

    One thing that I found helpful to bear in mind when I was trying to wrap my head around this is that you can have any number of distinct branches all on the same commit. (In fact, this is the normal state when you create a new branch from an existing branch. The commit for the new branch will be the very same commit as the one of the original branch.) So you can have 1 commit as the tip of 10 branches. The branches are distinct because you can do work on any of these branches without affecting any of the others – each is free to move independently.
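A small sketch of that idea, in a scratch repo: the branch names `feature-a` and `feature-b` and the commit messages are invented for illustration.

```shell
# Throwaway repo with a single commit (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

# Two new branches, both created on the very same commit as master.
git branch feature-a
git branch feature-b

# Do some work on one of them only.
git checkout -q feature-a
git commit -q --allow-empty -m "work on feature-a"

tip_master=$(git rev-parse master)
tip_a=$(git rev-parse feature-a)
tip_b=$(git rev-parse feature-b)
echo "master:    $tip_master"
echo "feature-a: $tip_a"
echo "feature-b: $tip_b"
```

Only `feature-a` moves; `master` and `feature-b` still share the original commit.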

    So when you are in detached HEAD mode, you are, by definition, not ‘on’ any branch. Sure, the commit you’re on may well appear in any number of branches, but that’s not information git actually has direct access to. (It could in principle calculate it, but it doesn’t.)
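For what it's worth, Git will do that calculation when asked explicitly; it just never treats the answer as "the branch you are on". The sketch below sets up the master/‘mybranch’ scenario in a scratch repo (commit messages and identity are placeholders) and uses `git for-each-ref --contains`, which lists the refs whose history includes a given commit.

```shell
# Throwaway repo: master has 3 commits, mybranch forks off after commit 2.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "commit 1"
git commit -q --allow-empty -m "commit 2"
git branch mybranch                       # mybranch starts at commit 2
git commit -q --allow-empty -m "commit 3"
git checkout -q mybranch
git commit -q --allow-empty -m "mybranch work"

# Detach HEAD at commit 2, the shared predecessor.
git checkout -q master~1

# Which branch is HEAD on? None: symbolic-ref prints nothing when detached.
current=$(git symbolic-ref -q --short HEAD || true)

# Which branches have this commit in their history? Computed on request.
containing=$(git for-each-ref --contains HEAD --format='%(refname:short)' refs/heads)
echo "on branch: '${current}'"
echo "contained in:"
echo "$containing"
```

The commit is part of two branches' histories, yet HEAD is attached to neither of them.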

    The reason this matters is that if you’re on a branch, certain things that you do will affect that branch. So if you commit new changes, that’ll update where that branch points. This is important because your branches (along with tags and other references) define all the bits of your repo you can actually get to.

    If you have a detached HEAD, it’s possible that it’s pointing at a commit that isn’t accessible through anything other than the detached HEAD. If you commit some changes, you will almost certainly have put yourself into that state. (I say ‘almost’ because if you are sufficiently careful you can create a new commit that is absolutely identical to an existing commit already in the system, in which case git won’t actually create a new commit. Commits that are identical in every respect are, by definition, the same commit. But this is something you’d have to contrive; it wouldn’t normally happen.)

    So new commits you make in the detached HEAD state don’t get stored anywhere that can be relied upon to remain accessible in the long run. They will be accessible to you for as long as your HEAD points to them, but if you do anything that points HEAD elsewhere (e.g., switching to an actual branch), your changes are no longer reachable from any branch. Well, they’ll remain accessible through the reflog for a limited time (by default, reflog entries for unreachable commits expire after 30 days), but once those entries are gone the commits are eligible for garbage collection.
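Here is a sketch of that failure mode and one way out of it, in a scratch repo. The branch name `rescued` and the commit messages are invented for illustration; giving the commit a branch name is what makes it reliably reachable again.

```shell
# Throwaway repo with two commits on master (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "second commit"

# Detach HEAD at the first commit and commit there.
git checkout -q master~1
git commit -q --allow-empty -m "made while detached"
orphan=$(git rev-parse HEAD)

# Switch back to a real branch: nothing named points at the new commit now.
git checkout -q master
containing=$(git branch --contains "$orphan")

# The reflog still remembers where HEAD has been.
git --no-pager reflog -n 3

# Rescue: attach a branch name to the commit.
git branch rescued "$orphan"
rescued_tip=$(git rev-parse rescued)
echo "rescued -> $rescued_tip"
```

Before the rescue, `git branch --contains` prints nothing for the detached commit; afterwards the `rescued` branch keeps it alive.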

    But the key point here is that anything you do while in detached HEAD state will not have any effect on any branch. This is why it is misleading to say “you are still technically on a branch”.

    One of the most important epiphanies for me with git is the realisation that a particular commit is not on a particular branch. One commit can show up on any number of branches (including 0 branches, if you’re in the detached HEAD state). Given a commit, you cannot reliably answer the question “Which branch am I on?” That information isn’t stored in the commit, nor could it be, because a commit can be in any number of different branches. (And although it is possible in principle to list all the branches the commit appears in, that’s not the same thing as being ‘on’ any of those branches, because anything you do in detached HEAD won’t affect any branches. And the defining feature of being on a particular branch is that your operations affect that branch. That is precisely what it means to be on a branch.)

    Also, if you do things like rebasing, a commit that was in a branch might not be any more. So even if commits did contain a list of the branches they are in (which they don’t), that list would go out of date over time, and since commits are immutable in git (anything that appears to modify a commit actually creates a new one), such a list, if it were baked into a commit, wouldn’t be reliable.

    Apart from metadata such as the author, date, and message, there are only two things you can determine directly from a commit: 1) what does the source tree look like at this commit? 2) what were the parents of this commit (i.e., what commit or, in the case of a merge, commits, preceded this one immediately)? Commits have no concept of a branch.
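This can be seen by dumping a raw commit object. The sketch below uses a scratch repo with placeholder commits and `git cat-file -p`, which pretty-prints an object's stored content.

```shell
# Throwaway repo with two commits on master (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "second commit"

# Print the commit object exactly as Git stores it.
commit_body=$(git cat-file -p HEAD)
echo "$commit_body"

# Pull out the structural lines: one tree, and one parent per predecessor.
tree_line=$(echo "$commit_body" | grep '^tree ')
parent_line=$(echo "$commit_body" | grep '^parent ')
expected_parent="parent $(git rev-parse HEAD~1)"
```

The output has a tree line, a parent line, author and committer lines, and the message. The word master appears nowhere in it, even though the commit was made on master.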

    (Commit messages may mention branches, but this is mere convention, and not part of git’s model.)

    A completely different way to look at this is that a branch is a concept layered on top of commits (which in turn sit on top of trees and blobs). There’s an entire layer of git in which branches are just not a concept at all; commits are in that layer. Branches are just modifiable references to commits with special behaviour for when you tell git you want to be ‘on’ one of them.
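A sketch of the "modifiable reference" part, in a scratch repo with placeholder commits. Reading `.git/refs/heads/master` directly assumes the default file-based ref storage with an unpacked ref; `git rev-parse master` gives the same answer regardless.

```shell
# Throwaway repo with one commit on master (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

# The branch is just a name for one commit hash.
before=$(git rev-parse master)
cat .git/refs/heads/master   # assumption: loose ref file exists (default storage)

# Committing while 'on' master creates a new commit and moves the reference.
git commit -q --allow-empty -m "second commit"
after=$(git rev-parse master)
parent_of_after=$(git rev-parse master~1)

echo "before: $before"
echo "after:  $after"
```

The first commit is untouched; only the reference changed, and the new commit records the old tip as its parent.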

    Sorry for the wordy answer. I hope it wasn’t too patronising. I was trying to channel my former self back when I was learning this stuff, attempting to describe all the stuff I wish I had known earlier.

  2. Hey! Wow, thanks for the detailed reply! Receiving comments like this is *exactly* why I am doing this series. I want to have a discussion about these concepts. For example, you said that it’s more helpful to think “you are on a commit that is part of a branch’s history.” Exactly. That was my original thinking as well, but after reading the SO accepted answer, I thought “perhaps I’m missing something.”

    I’m on a train heading into work, and I’ll read your reply in detail tonight, but I wanted to reach out quickly to let you know I sincerely appreciate you taking the time to write up your thoughts!
