
Confusing git terminology

Last week, Julia Evans published on her blog about confusing git terminology. This is an awesome post but not all explanations resonated with me so I thought I'd write my own version (or rather, add my own notes) in case others felt the same (Julia, feel free to cherry-pick from here to your blog 😉). I'll also reorder them to make it easier to cross-reference without you having to jump around.

My mental representation of git

First, let me quickly describe how I represent a git repository in my head.

A git repository is a set of directed acyclic graphs of commits. In many cases a repository has only one such graph, but there can actually be multiple (early users of GitHub Pages know about the gh-pages branch: in most cases it's an entirely separate branch, a separate graph not connected in any way to the other branches).

Then to easily reference some of those commits, we put labels on them: those are our branches and tags (among other things).

Each git repository on a machine contains such a set of directed acyclic graphs of commits, and each time you git clone, git fetch and git push you copy parts of these graphs between repositories.

You can use gitk --all or git log --all --oneline --graph to visualize the graphs known on your machine.
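To make that mental model a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how git stores anything: the commit IDs (a1, p1, ...) and the v1.0 tag are made up for illustration, and the reachable helper is my own name.

```python
# Toy model: each commit ID maps to the list of its parent IDs.
# All IDs and names below are hypothetical, for illustration only.
commits = {
    "a1": [],        # root commit of the main graph
    "b2": ["a1"],
    "c3": ["b2"],
    "p1": [],        # root of a second, unconnected graph (think gh-pages)
    "p2": ["p1"],
}

# References are just labels pointing at commits.
refs = {
    "refs/heads/main": "c3",
    "refs/heads/gh-pages": "p2",
    "refs/tags/v1.0": "b2",
}


def reachable(commit_id):
    """Return the set of commits reachable from commit_id (itself included)."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current])
    return seen


main_history = reachable(refs["refs/heads/main"])
pages_history = reachable(refs["refs/heads/gh-pages"])
print(sorted(main_history))                      # ['a1', 'b2', 'c3']
print(sorted(pages_history))                     # ['p1', 'p2']
print(main_history.isdisjoint(pages_history))    # True: two separate graphs
```

Everything else in this post is, one way or another, about walking that graph or moving those labels.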

HEAD and “heads”

As Julia says, “heads” are “branches” (contrary to tags that are immutable, those “heads” move along the graph).

The way I see HEAD though is more like “what's been checked out in the working directory”. It will thus indeed be “the current branch” most of the time, but not always (we'll come to those cases below).

One interesting thing: a remote repository also has a HEAD, which represents the “default branch” that will be checked out when you clone the repository (unless you tell git to check out a specific branch). Actually, git makes no distinction between a repository on a server that everyone will clone from (e.g. on GitHub), and any of these clones: git is decentralized above all. You can even clone from a repository you already have on your machine, and observe that the branch that will be checked out by default will be that source repository's HEAD. When you change the “default branch” of your repository on GitHub, what you're actually doing is updating its HEAD.

“reference”, “symbolic reference”

A reference is any label on a commit in the directed acyclic graph of commits. It allows you to reference (sic!) a commit by a (somewhat) simple name (much simpler than the commit ID at least). Those are branches (local and remote), tags, as well as HEAD, FETCH_HEAD, ORIG_HEAD, MERGE_HEAD, etc.

A symbolic reference is a reference that points to another reference, rather than directly to a commit. This is the case of HEAD when you checkout a branch: it points to the branch so that git knows to move that branch forward when you make a new commit.

As Julia notes, HEAD^^^ is not technically a reference: it's one of many different ways of specifying a revision (another name for a commit).

“index”, “staged”, “cached”

I have nothing to add to what Julia wrote. tl;dr: they're all the same thing, but --cached (or --staged which is a synonym) and --index mean slightly different things.

“untracked files”, “remote-tracking branch”, “track remote branch”

The word “track” here has three different meanings:

“detached HEAD state”

When the HEAD points to a (local) branch, each new commit will move the branch label to the new commit.

When the HEAD points to anything other than a (local) branch, git won't be able to move the reference to a new commit: you're in a “detached HEAD state”. If you make a new commit, only HEAD will reference it and nothing else, so if you switch to a branch you'll no longer have any reference (label) to that commit. In other words, you're in a “detached HEAD state” when HEAD is not a “symbolic reference” but directly references a commit.

Note that when you check out anything that's not a local branch (in refs/heads/), whether it's a tag or a “remote-tracking branch”, git will resolve it to the commit ID and set up HEAD to point to that ID, so you'll be in a “detached HEAD state”.
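Here is a rough sketch of the difference, again as a Python toy model rather than anything git actually runs. The "ref: " prefix mirrors what git writes in its HEAD file when a branch is checked out; the commit IDs and the resolve_head and commit helpers are assumptions of mine.

```python
# Toy model of how a new commit moves labels (all IDs are hypothetical).
refs = {"refs/heads/main": "B"}    # branch label -> commit ID
parents = {"A": [], "B": ["A"]}    # commit ID -> parent IDs

PREFIX = "ref: "


def resolve_head(head):
    """Return the commit HEAD designates, following a symbolic reference if any."""
    if head.startswith(PREFIX):
        return refs[head[len(PREFIX):]]
    return head


def commit(head, new_id):
    """Record new_id on top of HEAD and return the new value of HEAD."""
    parents[new_id] = [resolve_head(head)]
    if head.startswith(PREFIX):
        # HEAD is symbolic: the branch label moves, HEAD still points at the branch.
        refs[head[len(PREFIX):]] = new_id
        return head
    # Detached HEAD: only HEAD itself moves, no branch follows.
    return new_id


head = commit("ref: refs/heads/main", "C")
print(head, refs["refs/heads/main"])       # ref: refs/heads/main C

detached = commit("B", "D")
print(detached, refs["refs/heads/main"])   # D C
```

In the second case nothing but HEAD points at D, which is exactly why that commit becomes hard to find again once you switch away.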

“ours” and “theirs” while merging or rebasing

“Ours” and “theirs”, or “local” and “remote”, are indeed confusing.

When merging, you merge another branch into the current branch: the current branch is “ours” and the other one is thus “theirs”.

But when rebasing the current branch on top of another branch, you're repeatedly cherry-picking the commits from the current branch on top of the other branch, so the other branch is “ours” or “local”, and the commits from the current branch are “theirs”. To make things a bit clearer, I like to think of how rebase works (conceptually at least): after determining the list of commits that differ between the branches and need to be rebased, first check out the other (target) branch, then cherry-pick each commit in the list, and finally update the branch to point to the last rebased commit. Because you start by moving to the branch on top of which you want to rebase, it becomes the “ours” or “local”, and the branch you started from becomes the “theirs” or “remote”.

“Your branch is up to date with ‘origin/main’”

This is directly derived from the “tracking” of your branch, as seen above: if your current branch “tracks” refs/remotes/origin/main, then git status will display by how many commits the two branches diverge. When they don't diverge (i.e. both references point to the exact same commit), then the branch is said to be “up to date” with its “upstream”.

Remember though, as Julia points out, that refs/remotes/origin/main is only updated when you explicitly fetch from the remote repository (with git fetch, git pull, or git remote update).

“can be fast-forwarded”

This is another message you can see in the output of git status related to the state of this branch relative to its “upstream” branch. We've seen that when they both point to the same commit you'll get an “is up-to-date” message; this one is another situation when the branches have not diverged, but they're not identical either. This happens when the current branch is “behind” its “upstream”: it points to a commit that's part of the “upstream”, but “upstream” actually has more commits.

A - B (main)
     \
      C - D (origin/main)

or if you prefer

A - B (main) - C - D (origin/main)

This will typically be the case when you did git pull a few days ago to bring your main “up-to-date” with origin/main (at that time, both main and origin/main pointed to commit B) and didn't touch it since then, and things continued moving in the origin remote repository (commits C and D were added). When you git fetch origin main, you retrieve commits C and D locally into origin/main; now main can be “fast-forwarded” to commit D by just moving the main label along the graph towards origin/main.

In other words, there's no need to create a merge commit when running git merge (or git pull), and there's no risk of merge conflict. There's hardly any situation safer than a “fast-forward merge”.

Note that such a “fast-forward merge” can actually bring in merge commits (here, main can be fast-forwarded to origin/main, and bring in commits C, D, E, F, and G):

A - B (main) - C - D (origin/main)
 \            /
  E -- F --- G (origin/newfeature)
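A fast-forward boils down to an ancestry check. Here is a small Python sketch of it on the graph above; the ancestors and can_fast_forward helpers are names I made up, and the parent order of the merge commit C is an assumption.

```python
# The graph above: C is a merge commit bringing newfeature (E, F, G) into main.
parents = {
    "A": [],
    "B": ["A"],
    "E": ["A"],
    "F": ["E"],
    "G": ["F"],
    "C": ["B", "G"],   # merge commit (parent order assumed)
    "D": ["C"],
}
refs = {"main": "B", "origin/main": "D"}


def ancestors(commit_id):
    """Return commit_id plus every commit reachable from it."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def can_fast_forward(current, target):
    """A branch can be fast-forwarded if its commit is an ancestor of the target."""
    return refs[current] in ancestors(refs[target])


brought_in = ancestors(refs["origin/main"]) - ancestors(refs["main"])
print(can_fast_forward("main", "origin/main"))   # True
print(can_fast_forward("origin/main", "main"))   # False
print(sorted(brought_in))                        # ['C', 'D', 'E', 'F', 'G']
```

Moving the main label from B to D is all the "merge" has to do here, even though it brings a merge commit along.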

As for the name, I like to imagine those commits as a timeline, or a tape in a tape cassette or VHS. You were following changes but ⏸️ paused a few days ago at your last git pull. Git knows that there's origin/main ahead in a “straight line” so you can just press the “⏩ fast forward” button to safely reach that new state.

The other situations you can experience that are neither an “is up to date with” nor a “can be fast-forwarded” are:

HEAD^, HEAD~, HEAD^^, HEAD~~, HEAD^2, HEAD~2

When you need to specify commits as parameters to git commands, one way is to use the commit ID, or a reference (branch, tag) name. But git makes it easier for those commits that are not directly pointed to by a reference: if you know how to find that commit, there's no need to use git log to go search for the commit ID yourself, you can tell git how to get to it from another commit.

That's what the ^ and ~ suffixes do (there are other notations as well).

So ^ is actually a shorthand for ^1 which takes the “first parent” of the commit you apply it to. Most commits have only a single parent, but merge commits will have at least 2 (yes, at least, you can actually have merge commits with more than 2 parents), so ^ or ^1 will take the first, and ^2 the second (and ^3 the third, you got it).

HEAD^^ actually just applies the ^ operator to HEAD^, which itself had applied it to HEAD, therefore taking “two commits ago”.

To make it easier to follow the “first parents”, the ~ operator can be used. Similarly, ~ is actually a shorthand for ~1. Directly taken from the docs, ~3 is equivalent to ^^^ and directly expresses “three commits before” (or “three commits ago” when applied to HEAD). So “ten commits ago” can be written either HEAD^^^^^^^^^^ or HEAD~10, one is easier to read than the other 😉
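If it helps, here is how I would sketch those two operators in Python. It only covers the ^ and ~ suffixes described above (none of the other notations), and the graph, the commit names and the resolve function are all made up for the example.

```python
import re

# Toy graph: M is a merge commit whose first parent is B and second parent is C.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],
    "N": ["M"],
}
refs = {"HEAD": "N"}


def resolve(revision):
    """Resolve a name followed by any number of ^n / ~n suffixes to a commit ID."""
    match = re.fullmatch(r"([A-Za-z_/]+)((?:[\^~]\d*)*)", revision)
    if match is None:
        raise ValueError(f"unsupported revision: {revision}")
    name, suffixes = match.groups()
    commit = refs.get(name, name)
    for op, count in re.findall(r"([\^~])(\d*)", suffixes):
        n = int(count) if count else 1     # ^ means ^1, ~ means ~1
        if op == "^":
            if n:                          # ^0 designates the commit itself
                commit = parents[commit][n - 1]   # n-th parent
        else:
            for _ in range(n):             # ~n: follow the first parent n times
                commit = parents[commit][0]
    return commit


print(resolve("HEAD^"))     # M
print(resolve("HEAD^^"))    # B
print(resolve("HEAD~2"))    # B
print(resolve("HEAD^^2"))   # C
print(resolve("HEAD~3"))    # A
```

Note how HEAD^^2 reads as "second parent of the first parent of HEAD", whereas HEAD~2 never leaves the first-parent line.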

.. and ...

Those are generally used with git log and git diff.

The notation r1..r2 selects all commits reachable from r2 that are not reachable from r1 (note that r1 and r2 can be any form of revision: a reference or a commit ID), whereas r1...r2 selects all commits reachable from either r1 or r2 but not both.

In a typical tree with two diverging branches like this:

A - B (main)
  \ 
    C - D (test)

the notation main..test will select commits C and D only, whereas main...test will select all of B, C and D (but not A).
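Both notations are plain set operations on "what is reachable from where", which a few lines of Python can show on that same graph. This is a sketch of the definitions given above; the helper names are mine.

```python
# The graph above: main and test both branch off A.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"]}
refs = {"main": "B", "test": "D"}


def reachable(commit_id):
    """Return commit_id plus every commit reachable from it."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def two_dots(r1, r2):
    """r1..r2: reachable from r2 but not from r1."""
    return reachable(refs[r2]) - reachable(refs[r1])


def three_dots(r1, r2):
    """r1...r2: reachable from either r1 or r2, but not both."""
    return reachable(refs[r1]) ^ reachable(refs[r2])


print(sorted(two_dots("main", "test")))     # ['C', 'D']
print(sorted(three_dots("main", "test")))   # ['B', 'C', 'D']
```

The three-dot form is symmetric (swapping main and test gives the same set), the two-dot form is not.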

Note that the behavior is different with git diff, as git diff is about comparing two points in the graph, not a range of commits! git diff thus has its own definition for .. and ...: whereas git diff r1..r2 is equivalent to git diff r1 r2, showing the difference between those 2 commits, git diff r1...r2 will however find the last common ancestor of r1 and r2 (same as git merge-base r1 r2), and diff between that common ancestor and r2. In other words, git diff main...test will show the changes in test since the point it diverged from main (what changes did I add to my branch, ignoring commits added to the “upstream” since then? or what changes exist in my “upstream” branch since I branched out, ignoring changes in my branch?)
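To illustrate which two points git diff ends up comparing, here is a hedged Python sketch on the same graph (A - B for main, A - C - D for test). The merge_base and diff_endpoints helpers are hypothetical names, and the merge-base computation is deliberately naive.

```python
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"]}
refs = {"main": "B", "test": "D"}


def reachable(commit_id):
    """Return commit_id plus every commit reachable from it."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(c1, c2):
    """Naive merge base: a common ancestor that no other common ancestor descends from."""
    common = reachable(c1) & reachable(c2)
    best = [c for c in common
            if not any(c != other and c in reachable(other) for other in common)]
    return sorted(best)[0]


def diff_endpoints(spec):
    """Return the (from, to) commits that a toy 'git diff <spec>' would compare."""
    if "..." in spec:
        r1, r2 = spec.split("...")
        return merge_base(refs[r1], refs[r2]), refs[r2]
    r1, r2 = spec.split("..")
    return refs[r1], refs[r2]


print(diff_endpoints("main..test"))    # ('B', 'D')
print(diff_endpoints("main...test"))   # ('A', 'D')
```

Comparing A to D only shows what C and D changed, while comparing B to D also shows B's changes "in reverse".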

While this might seem the reverse of git log (commit B is taken into account by git log main...test but not git log main..test, and by git diff main..test but not git diff main...test), the two commands are actually rather consistent once you pair them the right way: git log main..test and git diff main...test will both only tell you about commits C and D (notice that the three-dot diff is what GitHub uses when you click on those compare links).

TL;DR: for git diff, forget about the .. notation, it's almost never what you want: use either ... or the space-separated form.

refspecs

Refspecs are used by git fetch and git push to determine what to fetch or push, respectively, and the mapping between local references and remote ones (though most of the time one uses those commands without an explicit refspec). A default refspec can also be configured for a remote (remote repository) for each action (fetch or push); one will generally be configured for fetching.

When you clone a repository, git sets up a remote named origin and configures its default refspec, generally with +refs/heads/*:refs/remotes/origin/* but this can differ depending on the options passed to git clone.

This refspec tells git that when fetching from the remote repository, all the references inside refs/heads/ (due to the * wildcard) will be fetched and stored locally into refs/remotes/origin/ (using the same name suffix). The + is equivalent to passing --force to the commands and will update the destination reference even if the new value is not “fast-forwarded” from the current value. When fetching, this means that if someone force-pushed a branch, git will update the corresponding reference in refs/remotes/ on your side to make it match the remote reference; without the +, your “remote-tracking branch” would instead stay desynchronized.
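The mapping itself is simple pattern substitution. Here is a minimal Python sketch of it, assuming a single * wildcard on each side; apply_refspec is a hypothetical helper of mine, not a git API.

```python
def apply_refspec(refspec, remote_ref):
    """Map a remote reference name through a fetch refspec.

    Returns (local_ref, force), or None when the refspec doesn't match.
    Only handles the [+]<src>:<dst> form with at most one * per side.
    """
    force = refspec.startswith("+")
    src, dst = refspec.lstrip("+").split(":")
    if "*" in src:
        prefix, suffix = src.split("*")
        if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
            return None
        # The part matched by * is copied into the destination pattern.
        matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
        return dst.replace("*", matched), force
    if remote_ref == src:
        return dst, force
    return None


default = "+refs/heads/*:refs/remotes/origin/*"
print(apply_refspec(default, "refs/heads/main"))
# ('refs/remotes/origin/main', True)
print(apply_refspec(default, "refs/tags/v1.0"))
# None
```

The second call returns None because tags live outside refs/heads/, so the default refspec doesn't cover them.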

The --tags flag is actually a shorthand to adding the refs/tags/*:refs/tags/* refspec: tags are synchronized (either fetched or pushed, depending on the command) between repositories (without overwriting existing tags at the destination).

As I said above, you can actually use those refspecs for pushing too.

For example, with git push origin HEAD:test you will update (or possibly create) a test branch on the remote repository (git will expand test to refs/heads/test) to point to the commit that's locally your HEAD (this will send the appropriate commits to the remote to make it possible). I use this from time to time on side-projects where I'm the only maintainer, to test local commits on a scratch branch and trigger my GitHub Actions; only if the build passes will I then push to main; all without having to create that test branch locally.

I sometimes also use the form git push origin main^:main to push my main branch, except for its last commit, that I will keep local as it's likely a work in progress.

People working with Gerrit will be familiar with git push origin HEAD:refs/for/main to push commits for review (refs/for is a magic namespace in Gerrit to push for review for a target branch), and now you know what it means 😉.

You might sometimes also see things like git push origin :test; this will delete the remote test branch, and is equivalent to git push origin --delete test (and it was the only way to delete a remote branch or tag before the --delete flag was added).
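All the push examples above follow the same [+]source:destination shape, which this small Python sketch tries to spell out. describe_push is a made-up helper; it doesn't handle special forms (such as a lone ":") and doesn't expand short names like test into refs/heads/test the way git does.

```python
def describe_push(refspec):
    """Split a push refspec into what it asks the remote to do."""
    force = refspec.startswith("+")
    body = refspec[1:] if force else refspec
    src, sep, dst = body.partition(":")
    if not sep:
        dst = src    # no colon: assume the same name is used on the remote
    if src == "":
        # Empty source: "push nothing" into dst, i.e. delete it.
        return {"action": "delete", "remote_ref": dst, "force": force}
    return {"action": "update", "local_rev": src, "remote_ref": dst, "force": force}


for spec in ["HEAD:test", "main^:main", "HEAD:refs/for/main", ":test"]:
    print(spec, "->", describe_push(spec))
```

Note that the source side is any revision (HEAD, main^, a commit ID), whereas the destination side is always a reference name.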

“reset”, “revert”, “restore”

Those three terms are all meant to somehow destroy something, but in different ways. Heck, there's even a section of the docs dedicated to disambiguating them!

checkout

The git checkout command can do two seemingly unrelated things:

Technically, those are actually quite similar as they're about changing files in your working directory, and in the case of “switching” also changing what HEAD points to.

Nowadays, you should rather use the git switch and git restore commands to the same effect.

“tree-ish”

In git, each commit is a snapshot of the state of the repository, along with some metadata (among them the commit message, committer, and author). That snapshot is stored as a tree object. A “tree-ish” is anything that resolves to a tree object: either the tree ID itself, or a commit-ish (a commit ID, a reference name, possibly using the ^ or ~ operators as seen above).

Technically you can also refer to a subtree (directory) of a given tree-ish by suffixing it with : followed by the path of the directory. While I sometimes use this notation with git show to refer to files (show me the content of the given file inside that commit), I've never ever used it for a subtree (this can apparently be used with git restore --source=, git checkout, and git reset; looks like a very advanced feature to me).

reflog

The reflog, or reference log, is kind of an audit log of any change ever done to references in your local repository.

You'll almost never use it, but it can save you in some gnarly situations, to recover things you accidentally deleted.
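Conceptually (and only conceptually: this Python sketch says nothing about how git actually stores or expires its reflog), it's a journal appended to on every reference update. The branch name, commit IDs and messages are invented for the example.

```python
# Toy model: a journal of (old value, new value, message) per reference.
refs = {"refs/heads/feature": "C"}
reflog = {}


def update_ref(name, new_value, message):
    """Move a reference and remember where it pointed before."""
    old_value = refs.get(name)
    reflog.setdefault(name, []).append((old_value, new_value, message))
    refs[name] = new_value


update_ref("refs/heads/feature", "D", "commit: add tests")
update_ref("refs/heads/feature", "A", "reset: moving to A")   # oops, D is "lost"

# Recovery: the last journal entry still knows where the branch was before.
previous, _, _ = reflog["refs/heads/feature"][-1]
update_ref("refs/heads/feature", previous, "reset: moving back")

print(refs["refs/heads/feature"])           # D
print(len(reflog["refs/heads/feature"]))    # 3
```

Commit D never had any other label pointing at it after the bad reset; the journal is the only thing that made it findable again.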

merge vs rebase vs cherry-pick

I have to say I don't quite understand how those terms are confusing 🤷

I suppose this is due to superficial knowledge of git; knowing mostly git commands and not really having a mental representation of the concepts at hand. Git core concepts aren't that hard to comprehend, but if nobody explains them to you and you only learned to use git by memorizing a few commands, you can quickly get lost, particularly when told to change your workflow (fwiw, this is I think the main reason we created internal training sessions at work, starting from those concepts towards the commands that manipulate them, dispensed to all new hires).

The commands can sometimes be confusing to use though:

The thing to remember: git rebase can be destructive, so use it with care and don't hesitate to create a branch as a bookmark before you rebase, and/or abort your rebase if you feel like you're losing control of it. That being said, my personal workflow involves rebasing a lot.

git rebase --onto

When you use git rebase main to rebase your current branch on top of main (e.g. just before merging it, as a “fast-forward merge”, because you like your history to be linear; or just to avoid all those merge commits whenever you want to sync your feature branch with new changes from main), git will first find the last common ancestor between your current branch and main, and get the list of commits in your branch since that point (this is the exact equivalent of git log main.. or git log main..HEAD if you remember). It will then replay them on top of main.

So main is used twice here: to find which commits to rebase, and “onto” which base.

Imagine you started working on a new feature, so you branched from main at some point. Then management decides that the feature becomes a priority and should be released early, without other features that already landed on main. So a new branch (let's call it release-X) is created from an earlier point of main than you branched from, then possibly a few bugfixes are cherry-picked too. You would then want to take all the commits from your branch and move them as if you branched from that new branch (or any earlier point from main than you initially branched from): git rebase --onto release-X main.
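Here is that scenario as a Python toy model, to show the two separate roles of the arguments. Everything in it is an assumption for illustration: the graph, the feature branch name, the X' naming of replayed commits, and the fact that the branch history is linear (no merges to replay).

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],   # main: A - B - C - D
    "X": ["C"], "Y": ["X"],                        # feature, branched from main at C
    "R": ["B"],                                    # release-X, branched at B plus one fix
}
refs = {"main": "D", "feature": "Y", "release-X": "R"}


def reachable(commit_id):
    """Return commit_id plus every commit reachable from it."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def rebase_onto(onto, upstream, branch):
    """Replay the commits of upstream..branch on top of onto, then move branch."""
    to_replay = reachable(refs[branch]) - reachable(refs[upstream])
    ordered = []
    commit = refs[branch]
    while commit in to_replay:              # walk first parents, newest first
        ordered.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    ordered.reverse()                       # replay oldest first
    base = refs[onto]
    for commit in ordered:
        new_id = commit + "'"               # a replayed commit is a new commit
        parents[new_id] = [base]
        base = new_id
    refs[branch] = base
    return ordered


replayed = rebase_onto("release-X", "main", "feature")
print(replayed)             # ['X', 'Y']
print(refs["feature"])      # Y'
print(parents["X'"])        # ['R']
```

With this model, a plain git rebase main would be rebase_onto("main", "main", "feature"): same upstream to pick the commits, same branch as the new base.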

commit, more confusing terms, and all the rest…

I'll stop there: I have nothing to add to what Julia says on “commit”.

I might actually do a followup post with some of the things she left out. I'd personally add fork vs. clone too.