How to think about what the HEAD thingy actually is in Git – 097

My learning preference is through hands-on experimentation. I like taking things apart, putting them back together, or trying to break stuff. So let’s break apart this HEAD thingy.

According to this SO answer, it is a file.

saraf@TheBlueDog  ~/Source/Repos/scrap/detachedHeadPractice (master)
$ cat .git/HEAD
ref: refs/heads/master

and that file contains some content. I’d assume “ref” means reference, and “master” is the master branch we’ve been playing with. I’m not sure what “refs/heads/master” actually stands for, but I’ll figure that out later.

I keep hearing HEAD referred to as a pointer. This would make sense if it is a file that only contains the location(?) where to point to. That answers the literal question.
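To poke at the "file that points somewhere" idea a bit more, here is a minimal sketch in a throwaway repository. The temp directory, the demo identity, and the forced `master` branch name are all assumptions made for illustration, and it assumes Git's default file-based ref storage (where `.git/HEAD` is a plain text file).

```shell
# Throwaway repository so nothing real is touched (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this post
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first commit"

# HEAD is a small text file naming the ref it follows.
head_file=$(cat .git/HEAD)
echo "$head_file"

# The same information, asked for through Git itself.
head_ref=$(git symbolic-ref HEAD)
echo "$head_ref"

# Resolving HEAD and resolving the branch should land on the same commit hash.
head_commit=$(git rev-parse HEAD)
branch_commit=$(git rev-parse refs/heads/master)
echo "$head_commit"
echo "$branch_commit"
```

So "refs/heads/master" looks like the name of the branch's own reference, and HEAD simply says "follow that one".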

But what does the HEAD pointer do?

According to this Stack Overflow answer, “HEAD is a reference to the last commit in the currently checked-out branch.” Considering we have master checked out, that makes sense. Doing a git log would show all the commits reachable from master, which in this small repo is every commit. If we switched to another branch and did a git log, we’d see all the commits for that branch.

But what about the detached HEAD state from yesterday?

Continuing with the same Stack Overflow answer, “A detached HEAD is the situation you end up in whenever you check out a commit (or tag) instead of a branch.”

Yep, we saw that yesterday. HEAD is pointing at the 2nd commit, whereas master is pointing at the 3rd.

[Image: visualization tool showing a detached HEAD state]

Now I had to re-read this line from the same Stack Overflow answer, “In this case, you have to imagine this as a temporary branch without a name; so instead of having a named branch reference, we only have HEAD.”

A temporary branch without a name??

Okay, what I think is happening is that if you were in a detached state (as shown above) and did a git log, you’d only see 2 commits, not 3.  Since you are seeing 2 commits, you are still technically on a branch. It’s just that this branch doesn’t have a name.
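Here is a rough way to check that guess by hand. It is only a sketch: the three empty commits, the temp directory, and the demo identity are made up for illustration, and it assumes the default file-based ref storage.

```shell
# Throwaway repository with three commits on master (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

# Check out the 2nd commit directly, which detaches HEAD.
second=$(git rev-parse HEAD~1)
git checkout -q "$second"

# .git/HEAD no longer says "ref: ..."; it now holds a bare commit hash.
detached_head_file=$(cat .git/HEAD)
echo "$detached_head_file"

# Count the commits reachable from HEAD versus from master.
head_count=$(git rev-list --count HEAD)
master_count=$(git rev-list --count master)
echo "from HEAD: $head_count, from master: $master_count"
```

In this setup the count from HEAD comes out one lower than the count from master, which matches the "2 commits, not 3" guess.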

In other words, a Git pointer never points to just a single commit. A git pointer points to the last commit in that series of commits (which I guess is where the temporary nameless branch concept comes from). Which now begs the question… what’s really a git branch?

4 thoughts on “How to think about what the HEAD thingy actually is in Git – 097”

  1. “you are still technically on a branch” – I think this way of thinking about it is likely to lead you astray. This will seem (at first) like nit picking, but I find it more helpful to think of it as “you are on a commit that is part of a branch’s history.” Subtle (and pedantic) though this may seem, it is the key to truly understanding what git branches really are.

    Why does this distinction matter? Well, consider the case where that commit shows up in more than one branch. E.g., suppose you have master as shown in your example, but you also have some ‘mybranch’ which ends in a different commit, but where that commit’s predecessor is the same commit your detached head is currently on.

    Once you’re in this state, you can make a slightly different statement: “you are on a commit that is part of two different branches’ histories”.

    And here’s the super-crucial point of the ‘detached HEAD’ state: even if the commit is visible somewhere in some particular branch’s history, git doesn’t actually know about that. (It could, in principle, deduce this fact – it could look at all your branches and walk the entire history of all of them to see if it can find your commit. And then it could, in principle, produce a list of all the branches that your detached HEAD is ‘on’. But it doesn’t do this. It would be expensive, and in any case, that’s not how it works.)

    And this is the case even if your detached HEAD is pointing at exactly the same place as the tip of one or more branches. (Again, git could look at every single branch, and if any of those branches match your detached HEAD, then it could, in principle, say that you are ‘on’ all of those branches. This would be significantly less expensive than walking the history of every single branch to see if your commit shows up, but it doesn’t do this either.)

    One thing that I found helpful to bear in mind when I was trying to wrap my head around this is that you can have any number of distinct branches all on the same commit. (In fact, this is the normal state when you create a new branch from an existing branch. The commit for the new branch will be the very same commit as the one of the original branch.) So you can have 1 commit as the tip of 10 branches. The branches are distinct because you can do work on any of these branches without affecting any of the others – each is free to move independently.
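    The "one commit, many branches" point can be tried out with a small sketch like the one below. The branch names `feature-a` and `feature-b`, the temp directory, and the demo identity are hypothetical, chosen only for illustration.

```shell
# Throwaway repository (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "shared commit"

# Two new branches created at the commit master is on.
git branch feature-a
git branch feature-b
a_before=$(git rev-parse feature-a)
b_before=$(git rev-parse feature-b)
master_before=$(git rev-parse master)

# Commit on feature-a only.
git checkout -q feature-a
git commit -q --allow-empty -m "work on feature-a"
a_after=$(git rev-parse feature-a)
b_after=$(git rev-parse feature-b)
master_after=$(git rev-parse master)

echo "feature-a: $a_before -> $a_after"
echo "feature-b: $b_before -> $b_after"
echo "master:    $master_before -> $master_after"
```

    Only the branch that was checked out when the commit was made moves; the other two stay where they were.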

    So when you are in detached HEAD mode, you are, by definition, not ‘on’ any branch. Sure, the commit you’re on may well appear in any number of branches, but that’s not information git actually has direct access to. (It could in principle calculate it, but it doesn’t.)

    The reason this matters is that if you’re on a branch, certain things that you do will affect that branch. So if you commit new changes, that’ll update where that branch points. This is important because your branches define all the bits of your repo you can actually get to. If you have a detached HEAD, it’s possible that it’s pointing at a commit that isn’t accessible through anything other than the detached HEAD. If you commit some changes, you will almost certainly have put yourself into that state. (I say ‘almost’ because if you are sufficiently careful you can create a new commit that is absolutely identical to an existing commit already in the system, in which case git won’t actually create a new commit – commits that are identical in every respect are, by definition, the same commit. But this is something you’d have to contrive – it wouldn’t normally happen.) So new commits you make in the detached HEAD state don’t get stored anywhere that can be relied upon to remain accessible in the long run. They will be accessible to you for as long as your HEAD points to them, but if you do anything that points HEAD elsewhere (e.g., switching to an actual branch), your changes are no longer accessible through any branch. Well, they’ll remain accessible through the reflog for a while (by default, reflog entries for unreachable commits expire after 30 days), but once those entries expire the commits become eligible for garbage collection, so you shouldn’t rely on them sticking around.
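    A sketch of that scenario, for anyone who wants to see it happen. The setup and the branch name `rescued` are hypothetical, and it assumes reflogs are enabled, which is the default for a normal (non-bare) repository.

```shell
# Throwaway repository (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
master_before=$(git rev-parse master)

# Detach HEAD at master's tip, then commit while detached.
git checkout -q --detach master
git commit -q --allow-empty -m "made while detached"
orphan=$(git rev-parse HEAD)
master_after=$(git rev-parse master)      # master has not moved

# Switch back to master: no branch reaches the new commit any more.
git checkout -q master
containing=$(git branch --contains "$orphan")   # expected to be empty

# The reflog still remembers where HEAD was before the switch...
previous_head=$(git rev-parse 'HEAD@{1}')

# ...and giving the commit a branch name makes it reachable again.
git branch rescued "$orphan"
rescued_tip=$(git rev-parse rescued)
echo "rescued $rescued_tip"
```

    Creating a branch before leaving the detached state is the less nerve-racking option; digging the hash back out of the reflog afterwards works too, as long as the entry has not expired.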

    But the key point here is that anything you do while in detached HEAD state will not have any effect on any branch. This is why it is misleading to say “you are still technically on a branch”.

    One of the most important epiphanies for me with git is the realisation that a particular commit is not on a particular branch. One commit can show up on any number of branches (including 0 branches, if you’re in the detached HEAD state). Given a commit, you cannot reliably answer the question “Which branch am I on?” That information isn’t stored in the commit, nor could it be because a commit can be in any number of different branches. (And although it is possible in principle to list all the branches the commit appears in, that’s not the same thing as being ‘on’ any of those branches, because anything you do in detached head won’t affect any branches. And the defining feature of being on a particular branch is that your operations affect that branch. That is precisely what it means to be on a branch.) Also, if you do things like rebasing, a commit that was in a branch might not be any more. So even if commits did contain a list of the branches they are in (which they don’t) that list would go out of date over time, and since commits are immutable in git (anything that appears to modify a commit actually creates a new one) such a list, if it were baked into a commit, wouldn’t be reliable.
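    Git will compute the "which branches contain this commit?" list if you ask for it explicitly. Below is a minimal sketch of the two-branch scenario described earlier; the setup and the commit messages are illustrative assumptions, while `mybranch` follows the name used above.

```shell
# Throwaway repository (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)

# mybranch starts at the second commit, then both branches get their own third commit.
git branch mybranch
git commit -q --allow-empty -m "third on master"
git checkout -q mybranch
git commit -q --allow-empty -m "third on mybranch"

# Detach HEAD at the shared second commit.
git checkout -q "$second"

# Ask Git to walk the branches and list those whose history contains the commit.
branches=$(git for-each-ref --contains "$second" --format='%(refname:short)' refs/heads)
echo "$branches"

# Asking what HEAD is attached to still gives no branch name, just "HEAD".
current=$(git rev-parse --abbrev-ref HEAD)
echo "$current"
```

    The commit appears in the history of two branches, yet HEAD is on neither of them, which is the distinction being drawn here.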

    There are only two things you can determine directly from a commit: 1) what does the source tree look like at this commit? 2) what were the parents of this commit (i.e., what commit or, in the case of a merge, commits, preceded this one immediately)? Commits have no concept of a branch.
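    One way to check this is to dump a raw commit object and look at what it stores. The sketch below uses a made-up two-commit repository. Besides the tree and parent lines, the object also carries author and committer lines and the message, but nothing that names a branch.

```shell
# Throwaway repository (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Print the commit object that HEAD resolves to.
git cat-file -p HEAD

# Collect just the header field names (everything before the first blank line).
fields=$(git cat-file -p HEAD | awk '/^$/ {exit} {printf "%s%s", sep, $1; sep=" "} END {print ""}')
echo "$fields"
```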

    (Commit messages may mention branches, but this is mere convention, and not part of git’s model.)

    A completely different way to look at this is that a branch is a concept layered on top of commits (which in turn sits on top of trees and blobs). There’s an entire layer of git in which branches are just not a concept at all; commits are in that layer. Branches are just modifiable references to commits with special behaviour for when you tell git you want to be ‘on’ one of them.

    Sorry for the wordy answer. I hope it wasn’t too patronising. I was trying to channel my former self back when I was learning this stuff, attempting to describe all the stuff I wish I had known earlier.

  2. Hey! Wow, thanks for the detailed reply! Receiving comments like this is *exactly* why I am doing this series. I want to have a discussion about these concepts. For example, you said that it’s more helpful to think “you are on a commit that is part of a branch’s history.” Exactly. That was my original thinking as well, but after reading the SO accepted answer, I thought “perhaps I’m missing something.”

    I’m on a train heading into work, and I’ll read your reply in detail tonight, but I wanted to reach out quickly to let you know I sincerely appreciate you taking the time to write up your thoughts!

  3. Sorry for the delay in replying – I’ve had a heck of a month and a half. Also, I thought WordPress would notify me when you replied but it didn’t for some reason.

    I can see some possible benefit of thinking of the detached HEAD state as being like an unnamed branch, because it would then let you make the simplifying assumption that whatever you do in git, you’re always working on a branch of some kind, even if it’s a special unnamed one. By contrast, having to accommodate the fact that there is a special case in which there is no branch complicates the mental model in a way that cuts across a lot of what you do in git. So I can sort of see where that SO answer is coming from.

    However, it’s not how the git docs describe it. The description of the detached HEAD state can be found in https://git-scm.com/docs/git-checkout and it says:

    “detached HEAD […] means simply that HEAD refers to a specific commit, as opposed to referring to a named branch”

    So the git docs don’t ask you to think of this state as being on a temporary branch without a name. True, it does use the phrase “named branch” which could be taken to imply that there’s also such a thing as an unnamed branch. But I’ve not come across any such concept in the docs – as far as I can tell, all branches are named branches. And the documentation typically calls out detached HEAD as a special case.

    Also, ‘git branch -a’ shows the text ‘* (no branch)’ when your HEAD is detached, which again seems to point against thinking of detached head as a temporary unnamed branch.
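    The exact wording that `git branch` prints for this state varies between Git versions (more recent releases print something along the lines of `(HEAD detached at <hash>)`), so a script probably should not parse it. Here is a sketch of two checks that do not depend on that text, run in a made-up throwaway repository.

```shell
# Throwaway repository (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"

# While on a branch, HEAD abbreviates to the branch name.
on_branch=$(git rev-parse --abbrev-ref HEAD)
echo "$on_branch"

# Detach HEAD at the current commit.
git checkout -q --detach

# symbolic-ref fails (non-zero exit status) when HEAD is not a symbolic ref.
if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
echo "$state"

# When detached, the abbreviated name is just "HEAD".
detached_name=$(git rev-parse --abbrev-ref HEAD)
echo "$detached_name"
```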

    I think that if the tooling doesn’t treat detached HEAD as an unnamed temporary branch, it’s probably going to work better to go along with what git thinks. Adopting mental models that are different from what the tools use is a risky business.

    I think with hindsight I was also conflating a couple of issues in my earlier reply: 1) how to think about detached HEAD mode, and 2) how to think about what branches really are in git. These are related, and responding correctly to finding yourself in detached HEAD mode definitely needs a clear understanding of what branches are. But they’re still different things, and I think maybe I tried to talk about both ideas at once. That’s partly because a client of mine got into a pickle recently after getting into detached HEAD mode because they had a misleading concept of how branches worked. (They thought that a particular commit was inherently on a particular branch. Mind you, they were also using submodules, which provide many additional opportunities to confuse yourself. Managing branching across submodules is vexing, and frequently puts you into detached HEAD mode.)
